
Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms

The MIT Press
Neural Computation

Neural Computation vol. 5, pp. 165-197 (1993)
O. NERRAND, P. ROUSSEL-RAGOT, L. PERSONNAZ, G. DREYFUS
Ecole Supérieure de Physique et de Chimie Industrielles de la Ville de Paris
10, rue Vauquelin
75005 PARIS - FRANCE
S. MARCOS
Laboratoire des Signaux et Systèmes
Ecole Supérieure d'Electricité
Plateau de Moulon
91192 GIF SUR YVETTE - FRANCE
Abstract
The paper proposes a general framework which encompasses the training of neural networks and the
adaptation of filters. We show that neural networks can be considered as general non-linear filters
which can be trained adaptively, i.e. which can undergo continual training with a possibly infinite
number of time-ordered examples. We introduce the canonical form of a neural network. This
canonical form permits a unified presentation of network architectures and of gradient-based training
algorithms for both feedforward networks (transversal filters) and feedback networks (recursive
filters). We show that several algorithms used classically in linear adaptive filtering, and some
algorithms suggested by other authors for training neural networks, are special cases in a general
classification of training algorithms for feedback networks.
INTRODUCTION
The recent development of neural networks has made comparisons between "neural" approaches and classical ones an absolute necessity, in order to assess unambiguously the potential benefits of using neural nets to perform specific tasks. These comparisons can be performed either on the basis of simulations, which are necessarily limited in scope to the systems which are simulated, or on a conceptual basis, by endeavouring to put into perspective the methods and algorithms related to the various approaches.
The present paper belongs to the second category. It proposes a general framework which
encompasses algorithms used for the training of neural networks and algorithms used for the
estimation of the parameters of filters. Specifically, we show that neural networks can be used
adaptively, i.e. can undergo continual training with a possibly infinite number of time-ordered
examples, in contradistinction to the traditional training of neural networks with a finite number of
examples presented in an arbitrary order; therefore, neural networks can be regarded as a class of
non-linear adaptive filters, either transversal or recursive, which are quite general because of the
ability of feedforward nets to approximate non-linear functions. We further show that algorithms
which can be used for the adaptive training of feedback neural networks fall into four broad classes;
these classes include, as special instances, the methods which have been proposed in the recent past
for training neural networks adaptively, as well as algorithms which have been in current use in
linear adaptive filtering. Furthermore, this framework allows us to propose a number of new
algorithms which may be used for non-linear adaptive filtering and for non-linear adaptive control.
The first part of the paper is a short presentation of adaptive filters and neural networks. In the
second part, we define the architectures of neural networks for non-linear filtering, either transversal
or recursive; we introduce the concept of canonical form of a network. The third part is devoted to
the adaptive training of neural networks; we first consider transversal filters, whose training is
relatively straightforward; we subsequently consider the training of feedback networks for non-linear
recursive adaptive filtering, which is a much richer problem; we introduce undirected, semi-directed,
and directed algorithms, and put them into the perspective of standard approaches in adaptive
filtering (output error and equation error approaches) and adaptive control (parallel and series-
parallel approaches), as well as of algorithms suggested earlier for the training of neural networks.
1. SCOPES OF ADAPTIVE FILTERS AND OF NEURAL NETWORKS
1.1. ADAPTIVE FILTERS
Adaptive filtering is of central importance in many applications of signal processing, such as the
modelling, estimation and detection of signals. Adaptive filters also play a crucial role in system
modelling and control. These applications are related to communications, radar, sonar, biomedical
electronics, geophysics, etc.
A general discrete-time filter defines a relationship between an input time sequence {u(n), u(n–1),
…} and an output time sequence {y(n), y(n–1), …}, u(n) and y(n) being either one-dimensional or
multidimensional signals. In the following, we consider filters having one input and one output. The
generalization to multidimensional signals is straightforward.
There are two types of filters: (i) transversal filters (termed Finite Impulse Response or FIR filters in
linear filtering) whose outputs are functions of the input signals only; and (ii) recursive filters
(termed Infinite Impulse Response or IIR filters in linear filtering) whose outputs are functions both
of the input signals and of a delayed version of the output signals. Hence, a transversal filter is
defined by:
$$y(n) = \Phi[u(n), u(n-1), \ldots, u(n-M+1)], \qquad (1)$$
where M is the length of the finite memory of the filter, and a recursive filter is defined by
$$y(n) = \Phi[u(n), u(n-1), \ldots, u(n-M+1), y(n-1), y(n-2), \ldots, y(n-N)], \qquad (2)$$
where N is the order of the filter.
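As an illustration of relations (1) and (2), the following Python sketch applies a user-supplied function `phi` (standing for Φ) to a sliding window of inputs and, for the recursive case, to past outputs as well. The function names, the zero initial conditions and the deque-based windows are assumptions of this sketch, not part of the paper.

```python
from collections import deque
from typing import Callable, Iterable, List


def transversal_filter(phi: Callable[[List[float]], float],
                       u: Iterable[float], M: int) -> List[float]:
    """Relation (1): y(n) = Phi[u(n), u(n-1), ..., u(n-M+1)].

    Samples before the start of the record are assumed to be zero.
    """
    u_win = deque([0.0] * M, maxlen=M)   # u_win[0] is u(n)
    y = []
    for un in u:
        u_win.appendleft(un)             # shift the window by one sample
        y.append(phi(list(u_win)))
    return y


def recursive_filter(phi: Callable[[List[float], List[float]], float],
                     u: Iterable[float], M: int, N: int) -> List[float]:
    """Relation (2): y(n) = Phi[u(n), ..., u(n-M+1), y(n-1), ..., y(n-N)].

    Inputs and outputs before the start of the record are assumed to be zero.
    """
    u_win = deque([0.0] * M, maxlen=M)   # u_win[0] is u(n)
    y_win = deque([0.0] * N, maxlen=N)   # y_win[0] is y(n-1)
    y = []
    for un in u:
        u_win.appendleft(un)
        yn = phi(list(u_win), list(y_win))
        y_win.appendleft(yn)             # y(n) becomes y(n-1) at the next step
        y.append(yn)
    return y
```

For instance, after `import math`, `recursive_filter(lambda uw, yw: math.tanh(uw[0] + 0.5 * yw[0]), u, 1, 1)` is a simple non-linear recursive filter; in the neural filters discussed below, `phi` would be computed by a feedforward network.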
The ability of a filter to perform the desired task is expressed by a criterion; this criterion may be
either quantitative, e.g., maximizing the signal to noise ratio for spatial filtering [see for instance
Applebaum and Chapman 1976], minimizing the bit error rate in data transmission [see for instance
Proakis 1983], or qualitative, e.g. listening for speech prediction [see for instance Jayant and Noll
1984]. In practice, the criterion is usually expressed as a weighted sum of squared differences
between the output of the filter and the desired output (e.g. the least squares (LS) criterion).
An adaptive filter is a system whose parameters are continually updated, without explicit control by
the user. The interest in adaptive filters stems from two facts: (i) tailoring a filter of given
architecture to perform a specific task requires a priori knowledge of the characteristics of the input
signal; since this knowledge may be absent or partial, systems which can learn the characteristics of
the signal are desirable; (ii) filtering nonstationary signals necessitates systems which are capable of
tracking the variations of the characteristics of the signal.
The bulk of adaptive filtering theory is devoted to linear adaptive filters, defined by relations (1) and (2) where the function Φ is linear. Linear filters have been extensively studied, and are appropriate for many purposes in signal processing. A family of particularly efficient adaptation algorithms has been specially designed for transversal linear filtering; they are referred to as the recursive least squares (RLS) algorithms and their fast (FRLS) versions [Bellanger 1987, Haykin 1991].
Linear adaptive filters are widely used for system and signal modelling, because of their simplicity and because, in many cases (such as the estimation of Gaussian signals), they are optimal.
Despite their popularity, they remain inappropriate in many cases, especially for modelling non-
linear systems; investigations along these lines have been performed for adaptive detection [see for
instance Picinbono 1988], prediction and estimation [see for instance McCannon et al. 1982].
Unfortunately, when dealing with non-linear filters, no general adaptation algorithm is available, so
that heuristic approaches are used.
By contrast, general methods for training neural networks are available; furthermore, neural networks are known to be universal approximators [see for instance Hornik et al. 1989], so that they can approximate any smooth non-linear function on a compact domain with arbitrary accuracy. Since both the adaptation of filters
[Haykin 1991, Widrow and Stearns 1985] and the training of neural networks involve gradient
techniques, we propose to build on this algorithmic similarity a general framework which
encompasses neural networks and filters. We do this in such a way as to clarify how neural networks
can be applied to adaptive filtering problems.
1.2. NEURAL NETWORKS
The reader is assumed to be familiar with the scope and principles of the operation of neural
networks; in order to help clarify the relations between neural nets and filters, the present section
presents a broad classification of neural network architectures and functions, restricted to networks
with supervised training.
1.2.1. - Functions of neural networks.
The functions of neural networks depend on the network architectures and on the nature of the input
data:
- network architectures: neural networks can have either a feedforward structure or a feedback
structure;
- input data: the succession of input data can be either time-ordered or arbitrarily ordered.
Feedback networks (also termed recurrent networks) have been used as associative memories, which
store and retrieve either fixed points or trajectories in state space. The present paper stands in a
completely different context: we investigate feedback neural networks which are never left to evolve
under their own dynamics, but which are continually fed with new input data. In this context, the
purpose of using neural networks is not that of storing and retrieving data, but that of capturing the
(possibly non-stationary) characteristics of a signal or of a system.
Feedforward neural networks have been used basically as classifiers for patterns whose sequence of
presentation is not significant and carries no information, although the ordering of components
within an input vector may be significant.
In contrast, the time ordering of the sequence of input data is of fundamental importance for filters:
the input vectors can be, for instance, the sequence of values of a sampled signal. At time n, the
network is presented with a window of the last M values of the sampled signal {u(n), u(n-1), ..., u(n-M+1)},
and, at time n+1, the input is shifted by one time period: {u(n+1), u(n), ..., u(n-M+2)}. In this
context, feedforward networks are used as transversal filters, and feedback networks are used as
recursive filters.
A very large number of examples of feedforward networks for classification can be found in the
literature. Neural network associative memories have also been very widely investigated [Hopfield
1982, Personnaz 1986, Pineda 1987]. Feedforward networks have been used for prediction [Lapedes
and Farber 1988, Pearlmutter 1989, Weigend et al. 1990]. Examples of feedback networks for
filtering can be found in [Robinson and Fallside 1989, Elman 1990, Poddar and Unnikrishnan 1991].
Note that the above classification is not meant to be rigid. For instance, Chen et al. [Chen et al. 1990]
encode a typical filtering problem (channel equalization) into a classification problem. Conversely,
Waibel et al. [Waibel et al. 1989] use a typical transversal filter structure as a classifier.
1.2.2. - Non-adaptive and adaptive training.
At present, in the vast majority of cases, neural networks are not used adaptively: they are first
trained with a finite number of training samples, and subsequently used, e.g. for classification
purposes. Similarly, non-adaptive filters are first trained with a finite number of time-ordered
samples, and subsequently used with fixed coefficients. In contrast, adaptive systems are trained
continually while being used with an infinite number of samples. The instances of neural networks
being trained adaptively are quite few [Williams and Zipser 1989a, Williams and Zipser 1989b,
Williams and Peng 1990, Narendra and Parthasarathy 1990, Narendra and Parthasarathy 1991].
2. STRUCTURE OF NEURAL NETWORKS FOR NON-LINEAR FILTERING
2.1. MODEL OF DISCRETE-TIME NEURON
The behaviour of a discrete-time neuron is defined by relation (3):
$$z_i(n) = f_i[v_i(n)] = f_i\Big[\sum_{j \in P_i} \sum_{\tau=0}^{q_{ij}} c_{ij,\tau}\, z_j(n-\tau)\Big] \qquad (3)$$
where:
$f_i$ is the activation function of neuron i,
$v_i$ is the potential of neuron i,
$z_j$ can be either the output of neuron j or the value of a network input j,
$P_i$ is the set of indices of the afferent neurons and network inputs to neuron i,
$c_{ij,\tau}$ is the weight of the synapse which transfers information from neuron or network input j to neuron i with (discrete) delay τ,
$q_{ij}$ is the maximal delay between neuron j and neuron i.
It should be clear that several synapses can transfer information from neuron (or network input) j to neuron i, each synapse having its own delay τ and its own weight $c_{ij,\tau}$.
Obviously, one must have $c_{ii,0} = 0$ for all i for causality to be preserved.
If neuron i is such that $i \notin P_i$ and $q_{ij} = 0$ for all $j \in P_i$, neuron i is said to be static.
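A minimal sketch of relation (3) for a single neuron i follows. It assumes a hypothetical layout in which each synapse is keyed by `(j, tau)`, and a caller-supplied function `z(j, tau)` that returns $z_j(n-\tau)$. It also checks the causality condition $c_{ii,0} = 0$, computes the maximal delays $q_{ij}$, and tests the static-neuron condition stated above.

```python
from typing import Callable, Dict, Tuple

# Hypothetical layout: weights[(j, tau)] = c_{ij,tau}, one entry per synapse
# from neuron (or network input) j to neuron i with delay tau.
Weights = Dict[Tuple[str, int], float]


def neuron_output(i: str, f: Callable[[float], float], weights: Weights,
                  z: Callable[[str, int], float]) -> float:
    """Relation (3): z_i(n) = f_i[ sum_{j in P_i} sum_tau c_{ij,tau} z_j(n - tau) ].

    z(j, tau) must return z_j(n - tau), the value of the output of neuron j
    (or of network input j) tau time steps ago.
    """
    if weights.get((i, 0), 0.0) != 0.0:
        raise ValueError("c_{ii,0} must be zero for causality")
    v = sum(c * z(j, tau) for (j, tau), c in weights.items())   # potential v_i(n)
    return f(v)


def max_delays(weights: Weights) -> Dict[str, int]:
    """q_ij for every j in P_i: the largest delay among the synapses from j."""
    q: Dict[str, int] = {}
    for (j, tau) in weights:
        q[j] = max(q.get(j, 0), tau)
    return q


def is_static(i: str, weights: Weights) -> bool:
    """Neuron i is static if i is not in P_i and q_ij = 0 for all j in P_i."""
    return all(j != i and tau == 0 for (j, tau) in weights)
```

With `f = math.tanh`, `neuron_output` gives a standard sigmoid-like neuron; a neuron with a synapse `(i, 1)` (self-feedback with unit delay) is not static.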
2.2. STRUCTURE OF NEURAL NETWORKS FOR FILTERING
The architecture of a network, i.e. the topology of the connections and the distribution of delays, may
be fully or partially imposed by the problem that must be solved: the problem defines the sequence
of input signal values and of desired outputs; in addition, a priori knowledge of the problem may
give hints which help in designing an efficient architecture (see for instance the design of the
feedforward network described in [Waibel et al. 1989]).
In order to clarify the presentation and to make the implementation of the training algorithms easier,
the canonical form of the network is especially convenient. We first introduce the canonical form of
feedback networks; the canonical form of feedforward networks will appear as a special case.
2.2.1. - The canonical form of feedback networks
The dynamics of a discrete-time feedback network can be described by a finite-difference equation
of order N, which can be expressed by a set of N first-order difference equations involving N
variables (termed state variables) in addition to the M input variables. Thus, any feedback network
can be cast into a canonical form which consists of a feedforward (static) network
- whose outputs are the outputs of the neurons which have desired values, and the values of the
state variables,
- whose inputs are the inputs of the network and the values of the state variables, the latter being
delayed by one time unit (Figure 1).
[Diagram: a feedforward network receives the external network inputs at time n and the state variables at time n, and produces the output at time n and the state variables at time n+1; the latter are fed back to its inputs through unit delays.]
Figure 1:
General canonical form of a feedback neural network.
Note that the choice of the set of state variables is not necessarily unique: therefore, a feedback
network may have several canonical forms. The state of the network is the set of values of the state
variables.
In the following, all vectors will be denoted by uppercase letters.
The behaviour of a single-input-single-output network is described by the state equation (4) and output equation (4'):
$$S(n+1) = \varphi[S(n), U(n)] \qquad (4)$$
$$y(n) = \psi[S(n), U(n)] \qquad (4')$$
where U(n) is the vector of the M last successive values of the external input u and S(n) is the vector of the N state variables (state vector). The output of the network may be a state variable.
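The state and output equations (4) and (4') can be iterated directly once φ and ψ are given. In this sketch, `phi` and `psi` are arbitrary Python callables standing for the two functions (in a neural filter, both would be computed by the feedforward part of the canonical form); the initial state `s0` and zero inputs before the first sample are assumptions supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def run_canonical(phi: Callable[[Vector, Vector], Vector],
                  psi: Callable[[Vector, Vector], float],
                  u: Sequence[float], M: int,
                  s0: Vector) -> Tuple[List[float], Vector]:
    """Iterate the canonical form
        S(n+1) = phi[S(n), U(n)]   (state equation (4))
        y(n)   = psi[S(n), U(n)]   (output equation (4'))
    with U(n) = [u(n), ..., u(n-M+1)] (zero before the first sample) and S(0) = s0.
    Returns the output sequence and the final state.
    """
    s = list(s0)
    U = [0.0] * M
    ys = []
    for un in u:
        U = [un] + U[:-1]        # U(n): shift the external inputs by one sample
        ys.append(psi(s, U))     # output at time n
        s = phi(s, U)            # state variables at time n+1 (unit delay)
    return ys, s
```

Taking N = 1, S(n) = [y(n-1)] and `phi` returning `[psi(s, U)]` makes the output a state variable, as allowed above; with `psi = lambda s, U: U[0] + 0.5 * s[0]` and the input [1, 0, 0], the outputs are [1.0, 0.5, 0.25].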
The transformation of a non-canonical feedback neural network filter into its canonical form requires the determination of M and of N. In the single-input-single-output case, the maximum number E of external inputs (M ≤ E) is computed as follows: construct the network graph whose nodes are the neurons and the input, and whose edges are the connections between neurons, weighted by the values of the delays; find the direct path of maximum weight D from input to output; one has E = D+1. The determination of the order N of the network from the network graph is less straightforward; it is described in Appendix 1.
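The computation of E can be sketched as a longest-path search over the delay-weighted graph. This sketch assumes that the paths from the input to the output contain no cycle, which holds for the feedforward networks of section 2.2.2; how feedback loops should be handled in a feedback network is not covered here. The node names and the adjacency-list format are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical format: edges[a] = [(b, delay), ...], one entry per connection
# from node a to node b (several connections with different delays are allowed).
Graph = Dict[str, List[Tuple[str, int]]]


def max_path_delay(edges: Graph, source: str, sink: str) -> int:
    """D: the largest total delay along a path from the input to the output.

    Assumes the paths from source to sink contain no cycle (e.g. a feedforward
    network); a cycle would make the recursion fail with RecursionError.
    """
    @lru_cache(maxsize=None)
    def longest(node: str) -> Optional[int]:
        if node == sink:
            return 0
        best: Optional[int] = None
        for nxt, delay in edges.get(node, []):
            rest = longest(nxt)
            if rest is not None and (best is None or delay + rest > best):
                best = delay + rest
        return best              # None if the sink is unreachable from node

    D = longest(source)
    if D is None:
        raise ValueError("no path from input to output")
    return D


def max_external_inputs(edges: Graph, source: str, sink: str) -> int:
    """E = D + 1: the number of external inputs u(n), ..., u(n-D)."""
    return max_path_delay(edges, source, sink) + 1
```

For the graph `{"u": [("a", 1), ("b", 0)], "a": [("y", 2)], "b": [("y", 1), ("y", 0)]}`, the longest path u, a, y has total delay D = 3, hence E = 4.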
If the task to be performed does not suggest or impose any structure for the filter, one may use either
a multi-layer Perceptron, or the most general form of feedforward network in the canonical form, i.e.
a fully connected network; in that case, the number of neurons, of state variables and of delayed
inputs must be found by trial and error.
If we assume that the state variables are delayed values of the output, or if we assume that the state
of the system can be reconstructed from values of the input and output, then all state variables have
desired values. Such is the case for the NARMAX model [Chen and Billings 1989] and for the
systems investigated in [Narendra 1990]. Figure 2 illustrates the most general canonical form of a network having a single output y(n) and N state variables {y(n-1), ..., y(n-N)}. It features M external inputs, N feedback inputs and one output; it can implement a fairly large class of functions Φ of the form (2); the non-recursive part of the network (which implements the function ψ of relation (4')) is a fully-connected feedforward net.
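[Diagram: the inputs $z_1 = u(n)$, ..., $z_M = u(n-M+1)$ and $z_{M+1} = y(n-1)$, $z_{M+2} = y(n-2)$, ..., $z_{M+N} = y(n-N)$ feed a fully connected feedforward net of neurons with activation function f (outputs $z_{M+N+1}$, $z_{M+N+2}$, ...); its output y(n) is fed back to the inputs through unit delays.]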
Figure 2:
Canonical form of a network with a fully-connected feedforward net,
whose state variables are delayed values of the output.
More specific architectures are described in the literature, implementing various classes of functions φ and ψ in relations (4) and (4'). Some examples of such architectures are presented in Appendix 2.
2.2.2. - Special case: the canonical form of feedforward networks
Similarly, any feedforward network with delays, with input signal u, can be cast into the form of a feedforward network of static neurons, whose inputs are the successive values u(n), u(n-1), ..., u(n-M+1); this puts the network under the form of a transversal filter obeying relation (1):
$$y(n) = \Phi[u(n), u(n-1), \ldots, u(n-M+1)] = \Phi[U(n)].$$
The transformation of a non-canonical feedforward neural network filter to its canonical form
requires the determination of the maximum value M, which is done as explained above in the case of
feedback networks. An example described in Appendix 1 shows that this transformation may
introduce the replication of some weights, known as "shared weights".
3. TRAINING ADAPTIVE NEURAL NETWORKS FOR ADAPTIVE FILTERING
3.1. CRITERION
The task to be performed by a neural network used as a filter is defined by a (possibly infinite)
sequence of inputs u and of corresponding desired outputs d. At each sampling time n, an error e(n)
is defined as the difference between the desired output d(n) and the actual output of the network y(n):
e(n)=d(n)-y(n). For instance, in a modelling process, d(n) is the output of the process to be modelled;
in a predictor, d(n) is the input signal at time n+1.
The training algorithms aim at finding the network coefficients so as to satisfy a given quality
criterion. For example, in the case of non-adaptive training (as defined in Section 1.2.2), the most
popular criterion is the Least Squares (LS) criterion; the cost function to be minimized is
$$J(C) = \frac{1}{K} \sum_{p=1}^{K} e(p)^2.$$
Thus, the coefficients minimizing J(C) are first computed with a finite number K of samples; the
network is subsequently used with these fixed coefficients.
In the context of adaptive training, taking into account all the errors since the beginning of the
optimization does not make sense; thus, one can implement a forgetting mechanism. In the present
paper, we use a rectangular "sliding window" of length Nc; hence the following cost function:
$$J(n, C) = \frac{1}{2} \sum_{p=n-N_c+1}^{n} e(p)^2.$$
The choice of the length Nc of the window is task-dependent, and is related to the typical time scale
of the non-stationarity of the signal to be processed.
In the following, the notation J(n) will be used instead of J(n, C). The computation of e(p) will be
discussed in sections 3.3 and 3.4.2.
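Both criteria can be written in a few lines. In this sketch the outputs y(p) are simply given numbers, whereas in adaptive training they depend on the coefficients; sample times are used directly as list indices, which is an assumption of the sketch.

```python
from typing import Sequence


def ls_cost(d: Sequence[float], y: Sequence[float]) -> float:
    """Non-adaptive LS criterion: J(C) = (1/K) * sum_{p=1}^{K} e(p)^2."""
    K = len(d)
    return sum((dp - yp) ** 2 for dp, yp in zip(d, y)) / K


def sliding_window_cost(d: Sequence[float], y: Sequence[float],
                        n: int, Nc: int) -> float:
    """Adaptive criterion: J(n) = 1/2 * sum_{p=n-Nc+1}^{n} e(p)^2, e(p) = d(p) - y(p).

    Only the Nc most recent errors are kept (rectangular sliding window);
    n - Nc + 1 must be a valid sample index.
    """
    if n - Nc + 1 < 0:
        raise ValueError("the window extends before the first sample")
    return 0.5 * sum((d[p] - y[p]) ** 2 for p in range(n - Nc + 1, n + 1))
```

With d = [1, 2, 3, 4], y = [1, 1, 1, 1], n = 3 and Nc = 2, only the errors 2 and 3 enter the window, giving J(3) = (4 + 9)/2 = 6.5.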
3.2. ADAPTIVE TRAINING ALGORITHMS
Adaptive algorithms compute, in real time, coefficient modifications based on past information. In the present paper, we consider only gradient-based algorithms, which require the estimation of the gradient of the cost function, $\nabla J(n)$, and possibly the estimation of J(n); these computations make use of data available at time n.
In the simplest and most popular formulation, a single modification of the vector of coefficients, $\Delta C(n) = C(n) - C(n-1)$, is computed between time n and time n+1; such a method, usual in adaptive filtering, is termed a purely recursive algorithm.
The modification of the coefficients is often performed by the steepest-descent method, whereby $\Delta C(n) = -\mu \nabla J(n)$. In order to improve upon the steepest-descent method, quasi-Newton methods can be used [Press et al. 1986], whereby $\Delta C(n) = +\mu D(n)$, where D(n) is a vector obtained by a linear transformation of the gradient.
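The purely recursive update can be sketched as follows. The matrix `A` is a hypothetical stand-in for the linear transformation that produces D(n): here D(n) = -A ∇J(n), so that the identity matrix recovers steepest descent. How a quasi-Newton method builds A is not shown.

```python
from typing import List, Optional, Sequence


def coefficient_step(grad: Sequence[float], mu: float,
                     A: Optional[Sequence[Sequence[float]]] = None) -> List[float]:
    """Return Delta C(n) for a purely recursive algorithm.

    A is None : steepest descent, Delta C(n) = -mu * grad J(n).
    otherwise : Delta C(n) = mu * D(n), with D(n) = -A grad J(n), a linear
                transformation of the gradient (A is hypothetical, e.g. an
                approximation of the inverse Hessian).
    """
    if A is None:
        return [-mu * g for g in grad]
    D = [-sum(a * g for a, g in zip(row, grad)) for row in A]
    return [mu * di for di in D]
```

The new coefficients are then C(n) = C(n-1) + ΔC(n), computed once between time n and time n+1.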
Purely recursive algorithms were introduced in order to avoid time-consuming computations
between the reception of two successive samples of the input signal. If the application under
investigation does not have stringent time requirements, then other possibilities can be considered.
For instance, if it is desired to get closer to the minimum of the cost function, several iterations of the
gradient algorithm can be performed between time n and time n+1. In that case, the coefficient-modification vector $\Delta C(n)$ is computed iteratively as:
$$\Delta C(n) = C_{K_n}(n) - C_0(n),$$
where $K_n$ is the number of iterations at time n, with
$$C_k(n) = C_{k-1}(n) + \mu_k D_{k-1}(n) \quad (k = 1, \ldots, K_n),$$
where $D_{k-1}(n)$ is obtained from the coefficients computed at iteration k-1, and
$$C_0(n) = C_{K_{n-1}}(n-1).$$
If Kn>1, the tracking capabilities of the system in the non-stationary case, or the speed of
convergence to a minimum in the stationary case, may be improved with respect to the purely
recursive algorithm. The applicability of this method depends specifically on the ratio of the typical
time scale of the non-stationarity to the sampling period.
As a final variant, it may be possible to update the coefficients with a period T>1 if the time scale of the non-stationarity is large with respect to the sampling period:
$$C_0(n) = C_{K_{n-T}}(n-T).$$
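The following sketch combines these variants: K gradient iterations per update, and an update only every T samples. It assumes a constant step $\mu_k = \mu$ and the steepest-descent direction $D_{k-1}(n) = -\nabla J(n)$ evaluated at the current iterate; `grad_J(n, C)` is a hypothetical callable returning $\nabla J(n)$ for coefficients C.

```python
from typing import Callable, List


def adapt(C0: List[float], grad_J: Callable[[int, List[float]], List[float]],
          n_samples: int, mu: float, K: int = 1, T: int = 1) -> List[List[float]]:
    """Coefficients C(n) for n = 0 .. n_samples-1.

    Every T samples, K steepest-descent iterations are run:
        C_k(n) = C_{k-1}(n) - mu * grad J(n; C_{k-1}(n)),  k = 1..K,
    starting from C_0(n) = C_K(n-T), the result of the previous update.
    Between updates the coefficients are held. K = T = 1 gives the purely
    recursive algorithm.
    """
    C = list(C0)
    trajectory = [list(C)]
    for n in range(1, n_samples):
        if n % T == 0:
            for _ in range(K):
                g = grad_J(n, C)                  # gradient at the latest iterate
                C = [c - mu * gi for c, gi in zip(C, g)]
        trajectory.append(list(C))
    return trajectory
```

On the toy quadratic J = (c - 3)²/2 with µ = 0.5, one iteration per sample moves c from 0 to 1.5, whereas two iterations move it to 2.25, closer to the minimum, which illustrates why Kn > 1 may speed up convergence.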
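Whichever algorithm is chosen, the central problem is the estimation of the gradient $\nabla J(n)$:
$$\frac{\partial J(n)}{\partial c_{ij}} = \frac{\partial}{\partial c_{ij}} \left[\frac{1}{2} \sum_{p=n-N_c+1}^{n} e(p)^2\right].$$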
At present, two techniques are available for this computation: the forward computation of the
gradient and the popular backpropagation of the gradient.
i) The forward computation of the gradient is based on the following relation:
$$\frac{\partial J(n)}{\partial c_{ij}} = -\sum_{p=n-N_c+1}^{n} e(p)\, \frac{\partial y(p)}{\partial c_{ij}}.$$
The partial derivatives of the output at time n with respect to the coefficients appearing on the right-
hand side are computed recursively in the forward direction, from the partial derivatives of the inputs
to the partial derivatives of the outputs of the network.
ii) In contrast, backpropagation uses a chain derivation rule to compute the gradient of J(n). The
required partial derivatives of the cost function J(n) with respect to the potentials are computed in the
backward direction, from the output to the inputs.
The advantages and disadvantages of these two techniques will be discussed in sections 3.3 and
3.4.2.
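For a linear transversal filter, $y(p) = \sum_k c_k u(p-k)$, the forward computation (i) is immediate since $\partial y(p)/\partial c_k = u(p-k)$, so that $\partial J(n)/\partial c_k = -\sum_p e(p)\, u(p-k)$. The sketch below implements this formula; the function names are illustrative. With Nc = 1 and steepest descent, the resulting update $\Delta c_k(n) = \mu\, e(n)\, u(n-k)$ is the classical LMS (Widrow-Hoff) rule.

```python
from typing import List, Sequence


def window(u: Sequence[float], p: int, M: int) -> List[float]:
    """U(p) = [u(p), u(p-1), ..., u(p-M+1)], zero before the first sample."""
    return [u[p - k] if p - k >= 0 else 0.0 for k in range(M)]


def output(c: Sequence[float], u: Sequence[float], p: int) -> float:
    """Linear transversal filter: y(p) = sum_k c_k u(p-k)."""
    return sum(ck * uk for ck, uk in zip(c, window(u, p, len(c))))


def cost(c: Sequence[float], u: Sequence[float], d: Sequence[float],
         n: int, Nc: int) -> float:
    """J(n) = 1/2 * sum_{p=n-Nc+1}^{n} e(p)^2, all outputs computed with c."""
    return 0.5 * sum((d[p] - output(c, u, p)) ** 2 for p in range(n - Nc + 1, n + 1))


def forward_gradient(c: Sequence[float], u: Sequence[float], d: Sequence[float],
                     n: int, Nc: int) -> List[float]:
    """dJ(n)/dc_k = -sum_p e(p) dy(p)/dc_k, where dy(p)/dc_k = u(p-k)."""
    grad = [0.0] * len(c)
    for p in range(n - Nc + 1, n + 1):
        U = window(u, p, len(c))
        e = d[p] - output(c, u, p)
        for k, uk in enumerate(U):
            grad[k] -= e * uk
    return grad
```

Since J(n) is quadratic in the coefficients here, a central finite difference of `cost` reproduces `forward_gradient` up to rounding, which is a convenient check.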
In the following, we show how to compute the coefficient modifications for feedforward and
feedback neural networks, and we put into perspective the training algorithms developed recently for
neural networks and the algorithms used classically in adaptive filtering.
3.3. TRAINING FEEDFORWARD NEURAL NETWORKS FOR NON-LINEAR TRANSVERSAL
ADAPTIVE FILTERING.
We consider purely recursive algorithms (i.e. T=1 and Kn=1). The extension to non-purely recursive
algorithms is straightforward.
As shown in section 2.2.2, any discrete-time feedforward neural network can be cast into a
canonical form in which all neurons are static. The output of such a network is computed from the M
past values of the input, and the output at time n does not depend on the values of the output at
previous times.
Therefore, the cost function
$$J(n) = \frac{1}{2} \sum_{p=n-N_c+1}^{n} e(p)^2$$
is a sum of Nc independent terms. Its gradient can be computed, from the Nc+M-1 past input data and the Nc corresponding desired outputs, as a sum of Nc independent terms: therefore, the modification of the coefficients, at time n, is the sum of Nc elementary modifications computed from Nc independent, identical elementary blocks (each of them with coefficients C(n-1)), between time n and time n+1.
We introduce the following notation, which will be used both for feedforward and for feedback
networks: the blocks are numbered by m; all values computed from block m of the training network
will be denoted with superscript m. For instance, ym(n) is the output value of the network computed
by the m-th block at time n: it is the value that the output of the filter would have taken on, at time n-
Nc+m, if the vector of coefficients of the network at that time had been equal to C(n-1).
With this notation, the cost function taken into account for the modification of the coefficients at
time n becomes:
$$J(n) = \frac{1}{2} \sum_{m=1}^{N_c} \left[e^m(n)\right]^2$$
where $e^m(n) = d(n-N_c+m) - y^m(n)$ is the error for block m computed at time n.
As mentioned in section 3.2, two techniques are available for computing the gradient of the cost
function: the forward computation technique (used classically in adaptive filtering) and the
backpropagation technique (used classically for neural networks) [Rumelhart et al. 1986].
Thus, each block, from block m=1 to block m=Nc, computes a partial modification $\Delta c_{ij}^m$ of the coefficients, and the total modification, at time n, is:
$$\Delta c_{ij}(n) = \sum_{m=1}^{N_c} \Delta c_{ij}^m(n),$$
as illustrated in Figure 3.
[Diagram: at time n-1, blocks 1, 2 and 3, each with coefficients C(n-2), process $U^1(n-1) = [u(n-3), \ldots, u(n-M-2)]$, $U^2(n-1) = [u(n-2), \ldots, u(n-M-1)]$, $U^3(n-1) = [u(n-1), \ldots, u(n-M)]$ with desired values d(n-3), d(n-2), d(n-1), and yield $\Delta c_{ij}(n-1) = \Delta c_{ij}^1(n-1) + \Delta c_{ij}^2(n-1) + \Delta c_{ij}^3(n-1)$ and $c_{ij}(n-1) = c_{ij}(n-2) + \Delta c_{ij}(n-1)$. At time n, the blocks, each with coefficients C(n-1), process $U^1(n)$, $U^2(n)$, $U^3(n)$ (the last one being $[u(n), \ldots, u(n-M+1)]$) with d(n-2), d(n-1), d(n), and yield $\Delta c_{ij}(n) = \Delta c_{ij}^1(n) + \Delta c_{ij}^2(n) + \Delta c_{ij}^3(n)$ and $c_{ij}(n) = c_{ij}(n-1) + \Delta c_{ij}(n)$.]
Figure 3:
Computation of two successive coefficient modifications
for a non-linear transversal filter (Nc=3).
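The block decomposition of Figure 3 can be sketched for the simplest non-linear transversal filter, a single neuron $y(p) = \tanh(\sum_k c_k u(p-k))$, chosen here for brevity. Each block m uses the same coefficients C(n-1), its own input window $U^m(n)$ and its own desired value $d(n-N_c+m)$, and returns its elementary modification $\mu\, e^m(n)\, \partial y^m/\partial c_k$; the total modification is the sum over the blocks. The function names are illustrative.

```python
import math
from typing import List, Sequence


def block_modifications(c: Sequence[float], u: Sequence[float], d: Sequence[float],
                        n: int, Nc: int, mu: float) -> List[List[float]]:
    """Elementary modifications Delta c^m(n), m = 1..Nc, for the non-linear
    transversal filter y(p) = tanh(sum_k c_k u(p-k)) (a single neuron).

    All blocks use the same coefficients c = C(n-1); block m handles time
    p = n - Nc + m, i.e. input window U^m(n) and desired value d(n-Nc+m).
    """
    M = len(c)
    mods = []
    for m in range(1, Nc + 1):
        p = n - Nc + m
        U = [u[p - k] if p - k >= 0 else 0.0 for k in range(M)]
        y = math.tanh(sum(ck * uk for ck, uk in zip(c, U)))    # y^m(n)
        e = d[p] - y                                           # e^m(n)
        # mu * e^m * dy^m/dc_k, with dy/dc_k = (1 - y^2) u(p-k) for tanh
        mods.append([mu * e * (1.0 - y * y) * uk for uk in U])
    return mods


def total_modification(mods: List[List[float]]) -> List[float]:
    """Delta c(n): sum over the Nc blocks of the elementary modifications."""
    return [sum(col) for col in zip(*mods)]
```

Because the blocks share no state, they could be evaluated in any order or in parallel; the sum equals -µ times the gradient of the sliding-window cost J(n).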
It was mentioned above that either the forward computation method or the backpropagation method
can be used for the estimation of the gradient of the cost function. Both techniques lead to exactly the
same numerical results; it has been shown [Pineda 1989] that backpropagation is less
computationally expensive than forward computation. Therefore, for the training of feedforward
networks operating as non-linear transversal filters, backpropagation is the preferred technique for
gradient estimation. However, as we shall see in the following, this is not always the case for the
training of feedback networks.
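To illustrate that both techniques give the same numbers, the sketch below computes the gradient of $e^2/2$ for a small static network (one tanh hidden layer and a linear output, an arbitrary choice) in two ways. The forward computation performs one derivative pass per coefficient, from the inputs towards the output, whereas backpropagation performs a single pass from the output back to the inputs, which is why its cost grows more slowly with the number of coefficients.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(W: Matrix, v: Sequence[float], x: Sequence[float]) -> Tuple[List[float], float]:
    """Static net: hidden h_j = tanh(sum_k W[j][k] x_k), linear output y = sum_j v_j h_j."""
    h = [math.tanh(sum(w * xk for w, xk in zip(row, x))) for row in W]
    return h, sum(vj * hj for vj, hj in zip(v, h))


def grad_forward_computation(W: Matrix, v: Sequence[float], x: Sequence[float], d: float):
    """One derivative pass per coefficient, from the inputs towards the output."""
    h, y = forward(W, v, x)
    e = d - y
    gW = [[0.0] * len(x) for _ in W]
    for a in range(len(W)):
        for b in range(len(x)):
            # derivatives of every hidden output with respect to W[a][b]
            dh = [(1.0 - h[j] ** 2) * x[b] if j == a else 0.0 for j in range(len(W))]
            dy = sum(vj * dhj for vj, dhj in zip(v, dh))
            gW[a][b] = -e * dy                    # dJ/dW[a][b] for J = e^2 / 2
    gv = [-e * hj for hj in h]                    # dy/dv_j = h_j
    return gW, gv


def grad_backpropagation(W: Matrix, v: Sequence[float], x: Sequence[float], d: float):
    """One backward pass, from the output towards the inputs."""
    h, y = forward(W, v, x)
    delta_y = -(d - y)                            # dJ/dy
    gv = [delta_y * hj for hj in h]
    # dJ/d(potential of hidden neuron j)
    delta_h = [delta_y * vj * (1.0 - hj ** 2) for vj, hj in zip(v, h)]
    gW = [[dj * xk for xk in x] for dj in delta_h]
    return gW, gv
```

Up to floating-point rounding, both functions return identical gradients for any weights and data.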
3.4. TRAINING FEEDBACK NEURAL NETWORKS FOR NON-LINEAR RECURSIVE
ADAPTIVE FILTERING.
This section is devoted to the adaptive training of feedback networks operating as recursive filters.
This problem is definitely richer, and more difficult, than the training of feedforward networks for
adaptive transversal filtering. We present a wide variety of algorithms, and elucidate their
relationships to adaptation algorithms used in linear adaptive filtering and to neural network training
algorithms.
3.4.1. - General presentation of the algorithms for training feedback networks:
Since the state variables and the output of the network at time n depend on the values of the state variables of the network at time n-1, the computation of the gradient of the cost function requires the computation of partial derivatives from time 0 up to the present time n. This is clearly not practical, since (i) the amount of computation would grow without bound, and (ii) in the case of non-stationary signals, taking into account the whole past history does not make sense. Therefore, the estimation of the gradient of the cost function is performed by truncating the computations to a fixed number of sampling periods Nt into the past. Thus, one has to use Nt computational blocks (defined below), numbered from m=1 to m=Nt: the outputs $y^m(n)$ are computed through Nt identical versions of the feedforward part of the canonical form of the network (each of them with coefficients C(n-1)). Clearly, Nt must be larger than or equal to Nc in order to compute the Nc last errors $e^m(n)$.
Here again, we first consider the case where T=1 and Kn=1.
Figure 4 shows the m-th computational block for the forward computation technique: the state input vector is denoted by $S_{in}^m(n)$; the state output vector is denoted by $S_{out}^m(n)$. The canonical feedforward (FF) net computes the output from the external inputs $U^m(n)$ and the state inputs $S_{in}^m(n)$. The Forward Computation (FC) net computes the partial derivatives required for the coefficient modification, and the partial derivatives of the state vector which may be used by the next block. The Nt blocks compute sequentially the Nt outputs $\{y^m\}$ and the partial derivatives $\{\partial y^m / \partial c_{ij}\}$, in the forward direction (m=1 to Nt). The Nc errors $\{e^m\}$ (computed from the outputs of the last Nc blocks) and the corresponding partial derivatives are used for the computation of the coefficient modification, which is the sum of Nc terms:
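$$\Delta c_{ij}(n) = -\mu \frac{\partial J(n)}{\partial c_{ij}} = \mu \sum_{m=N_t-N_c+1}^{N_t} e^m \frac{\partial y^m}{\partial c_{ij}} = \sum_{m=N_t-N_c+1}^{N_t} \Delta c_{ij}^m(n).$$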
Details of the computations are to be found in Appendix 3.
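The forward computation through Nt blocks can be sketched for the simplest feedback network one can write, the hypothetical first-order recursive neuron $y(p) = \tanh(a\,y(p-1) + b\,u(p))$, whose only state variable is y(p-1). The sketch uses the undirected choice of state inputs described in section 3.4.2.1 (each block receives the state output of the previous block); the state input of block 1 and its partial derivatives are passed in by the caller, since their choice is precisely the subject of section 3.4.2. It returns the state output of block 1 and its derivatives, which can serve as the state input of block 1 at time n+1.

```python
import math
from typing import List, Sequence, Tuple


def feedback_training_blocks(a: float, b: float, u: Sequence[float], d: Sequence[float],
                             n: int, Nt: int, Nc: int, mu: float,
                             s_in1: float, ds_in1: Tuple[float, float]
                             ) -> Tuple[List[float], float, Tuple[float, float]]:
    """Forward computation through Nt blocks for the hypothetical neuron
    y(p) = tanh(a*y(p-1) + b*u(p)). Block m handles time p = n - Nt + m and
    receives the state output of block m-1. s_in1 is the state input of
    block 1 and ds_in1 = (ds/da, ds/db) its partial derivatives.
    Returns ([Delta a, Delta b], state output of block 1, its derivatives).
    """
    if Nt < Nc:
        raise ValueError("Nt must be >= Nc")
    s, ds_a, ds_b = s_in1, ds_in1[0], ds_in1[1]
    delta = [0.0, 0.0]
    out1 = (0.0, (0.0, 0.0))
    for m in range(1, Nt + 1):
        p = n - Nt + m
        y = math.tanh(a * s + b * u[p])            # canonical FF net
        g = 1.0 - y * y                             # f'(v) for f = tanh
        dy_a = g * (s + a * ds_a)                   # FC net: dy/da
        dy_b = g * (u[p] + a * ds_b)                # FC net: dy/db
        if m > Nt - Nc:                             # block with a desired value
            e = d[p] - y                            # e^m = d(n-Nt+m) - y^m
            delta[0] += mu * e * dy_a               # "products" part
            delta[1] += mu * e * dy_b
        if m == 1:
            out1 = (y, (dy_a, dy_b))
        s, ds_a, ds_b = y, dy_a, dy_b               # state output -> next block
    return delta, out1[0], out1[1]
```

With `ds_in1 = (0.0, 0.0)`, the state input of block 1 is treated as a constant, and the returned modification equals -µ times the finite-difference gradient of the truncated cost.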
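[Diagram: the canonical FF net (non-linear) maps the external inputs $U^m(n)$ and the state inputs $S_{in}^m(n)$ to the output $y^m$ and the state outputs $S_{out}^m(n)$; the error $e^m = d(n-N_t+m) - y^m$ is formed; the FC net (linear), using $f_i'(v_i^m)$, maps $\partial S_{in}^m(n)/\partial c_{ij}$ to $\partial S_{out}^m(n)/\partial c_{ij}$ and $\partial y^m/\partial c_{ij}$; the "products" part computes $\Delta c_{ij}^m(n) = \mu\, e^m\, \partial y^m/\partial c_{ij}$.]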
Figure 4:
Training block m at time n with a desired output value: computation of a partial coefficient
modification using the forward computation of the gradient for a feedback neural network. If the
output of block m has no desired value, it has no "products" part and does not contribute directly to
coefficient modifications: it just transmits the state variables and their derivatives to the next block.
In order for the blocks to be able to perform the above computations, the values of the state inputs
Sinm(n) and of their partial derivatives with respect to the weights must be determined. The choice
of these values is of central importance; it gives rise to four families of algorithms.
3.4.2. - Choice of the state inputs and of their partial derivatives.
3.4.2.1. - Choice of the state inputs:
The most "natural" choice of the state inputs of block m is to take the values of the state variables computed by block m-1: $S_{in}^m(n) = S_{out}^{m-1}(n)$, with $S_{in}^1(n) = S_{out}^1(n-1)$. Thus, the trajectory of the
network in state space, computed at time n, is independent of the trajectory of the process: the input
of block m is not directly related to the actual values of the state variables of the process to be
modelled by the network, hence the name undirected algorithm. If the coefficients are mismatched,
this choice may lead to large errors and to instabilities. Figure 5a shows pictorially the desired
trajectory of the state of the network and the trajectory which is computed at time n when an
undirected algorithm is used (Nt=3, Nc=2). We show in section 3.4.2.2 that, in that case, one must
use the forward computation technique to compute the coefficient modifications (Figure 5b).
This choice of the state inputs has been known as the output error approach in adaptive filtering and
as the parallel approach in automatic control. It does not require that all state variables have desired
values.
In order to reduce the risk of instabilities, an alternative approach may be used, called a
semi-directed algorithm. In this approach, the state of the network is constrained to be identical to the
desired state for the first block (m = 1):
$S_{in}^m(n) = S_{out}^{m-1}(n)$, with $S_{in}^1(n) = [d(n-N_t), d(n-N_t-1), \ldots, d(n-N_t-M+1)]$. This is possible only
when the chosen model is such that desired values are available for all state variables; this is the case
for the NARMAX model. Figure 6a shows pictorially the desired trajectory of the state of the
network and the trajectory computed at time n when a semi-directed algorithm is used
(Nt = 4, Nc = 2). We show in section 3.4.2.2 that, in that case, one can use the backpropagation
technique to compute the coefficient modifications (Figure 6b).
The trajectory of the state of the network can be further constrained by choosing the state inputs of
all blocks to be equal to their desired values:
$S_{in}^m(n) = [d(n-N_t+m-1), d(n-N_t+m-2), \ldots, d(n-N_t+m-M)]$ for all m.
With this choice, the training is under control of the desired values, hence of the process to be
modelled, at each step of the computations necessary for the adaptation (hence the name directed
algorithm); therefore, it can be expected that the influence of the mismatch of the model to the
process is less severe than in the previous cases. Figure 7a shows pictorially the desired trajectory of
the state of the network and the trajectory which is computed at time n when a directed algorithm is
used (Nt = Nc = 3). We show in section 3.4.2.2 that, in that case, one can use the backpropagation
technique to compute the coefficient modifications (Figure 7b). In directed algorithms, all blocks are
independent, just as in the case of the training of feedforward networks (section 3.3); therefore, one
has Nt = Nc.
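To make the three choices of state inputs concrete, the sketch below computes the Nt block outputs at time n for a NARX-type model whose state is the vector of the M last outputs, so that desired values exist for every state variable. It is an illustration under stated assumptions, not the paper's code: the one-step model `g` (with the coefficients C(n-1) frozen inside it), the scalar external input `u[k]` used in place of the vector $U^m(n)$, and the function name `block_outputs` are all hypothetical.

```python
from typing import Callable, List, Sequence


def block_outputs(mode: str, g: Callable[[float, List[float]], float],
                  u: Sequence[float], d: Sequence[float], n: int, Nt: int,
                  M: int, s1_prev: Sequence[float]) -> List[float]:
    """Outputs y^1..y^Nt computed at time n.

    mode: "UD" (undirected), "SD" (semi-directed) or "D" (directed).
    g: hypothetical one-step model y(k) = g(u(k), [y(k-1), ..., y(k-M)]).
    s1_prev: S_out^1(n-1), used only by the undirected algorithm.
    """
    def desired_state(k: int) -> List[float]:
        # [d(k-1), d(k-2), ..., d(k-M)]: desired state input for time k
        return [d[k - i] for i in range(1, M + 1)]

    outputs: List[float] = []
    s_out: List[float] = []
    for m in range(1, Nt + 1):
        k = n - Nt + m                    # block m is compared with d(n-Nt+m)
        if mode == "D" or (mode == "SD" and m == 1):
            s_in = desired_state(k)       # forced onto the desired trajectory
        elif m == 1:
            s_in = list(s1_prev)          # UD: S_in^1(n) = S_out^1(n-1)
        else:
            s_in = s_out                  # UD, SD: S_in^m(n) = S_out^{m-1}(n)
        y = g(u[k], s_in)
        s_out = [y] + s_in[:-1]           # shift register: new state output
        outputs.append(y)
    return outputs


def g(uk: float, s: List[float]) -> float:
    """Toy model with frozen coefficients: y(k) = u(k) + 0.5 * y(k-1)."""
    return uk + 0.5 * s[0]


u, d = [1.0] * 10, [float(k) for k in range(10)]
for mode in ("UD", "SD", "D"):
    print(mode, block_outputs(mode, g, u, d, n=5, Nt=3, M=2, s1_prev=[0.0, 0.0]))
# UD [1.0, 1.5, 1.75]
# SD [2.0, 2.0, 2.0]
# D [2.0, 2.5, 3.0]
```

Only the source of each block's state input changes between the three modes; the blocks themselves are identical.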
[Figure 5 diagram: (a) state-space trajectories over times n-4 to n, computed with coefficients C(n-1), with errors $e^2$ and $e^3$; the legend distinguishes the initialization of the partial derivatives, the computed partial derivatives, the initialization of the feedback inputs, the computed output values and the desired outputs. (b) Blocks 1, 2 and 3, each with coefficients C(n-1) and external inputs $U^1(n)$, $U^2(n)$, $U^3(n)$; block 1 receives $S_{out}^1(n-1)$ and $\partial S_{out}^1(n-1)/\partial c_{ij}$, each following block receives the state outputs and their derivatives from the previous block; blocks 2 and 3 are compared with d(n-1) and d(n) and deliver partial coefficient modifications.]
Figure 5:
Undirected algorithm (with Nt=3 and Nc=2).
a) Pictorial representation of the desired trajectory, and of the trajectory computed at time n, in
state space; the trajectory at time n is computed by the blocks shown on Figure 5b.
b) Computational system at time n. The detail of each block is shown on Figure 4. Note that the
output of block 1 has no desired value.
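[Figure 6 diagram: (a) state-space trajectories over times n-5 to n, computed with coefficients C(n-1), with errors $e^3$ and $e^4$ (output values and desired output values). (b) Blocks 1 to 4, each with coefficients C(n-1) and external inputs $U^1(n)$ to $U^4(n)$; the state input of block 1 is the vector of desired values $D^1(n)$, and the state input of each following block is the state output of the previous block; the derivatives $\partial J/\partial S_{out}^m$ and $\partial J/\partial S_{in}^m$ are propagated backwards between blocks; blocks 3 and 4 are compared with d(n-1) and d(n), and every block delivers a partial coefficient modification.]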
Figure 6:
Semidirected algorithm (with Nt=4 and Nc=2).
a) Pictorial representation of the desired trajectory, and of the trajectory computed at time n, in
state space; the trajectory at time n is computed by the blocks shown on Figure 6b.
b) Computational system at time n. The detail of each block is shown on Figure 8. Note that the
outputs of blocks 1 and 2 have no desired values, but do contribute an additive term to the coefficient
modifications.
The choice of the state inputs made in the directed algorithm has been known as the equation error approach in
adaptive filtering and as the series-parallel approach in automatic control. It is an extension of the
teacher forcing technique [Jordan 1985] used for neural network training.
If some state inputs do not have desired values, hybrid versions of the above algorithms can be used:
those state inputs for which no desired values are available are taken equal to the corresponding
computed state variables (as in an undirected algorithm), whereas the other state inputs may be taken
equal to their desired values (as in a directed or in a semi-directed algorithm).
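A hybrid choice can be sketched component by component. In the sketch below, `None` marks a state variable without a desired value (an assumed convention), and the function name `hybrid_state_inputs` is hypothetical. The derivative part is an extrapolation of the consistency argument of section 3.4.2.2, not something the text specifies: a component taken from a desired value does not depend on the coefficients, so its derivative is set to zero, while a computed component keeps its computed derivative.

```python
from typing import List, Optional, Sequence, Tuple


def hybrid_state_inputs(computed: Sequence[float],
                        computed_derivs: Sequence[Sequence[float]],
                        desired: Sequence[Optional[float]]
                        ) -> Tuple[List[float], List[List[float]]]:
    """Hybrid choice of the state inputs of a block and of their derivatives.

    computed:        state variables computed by the network (undirected style)
    computed_derivs: their partial derivatives with respect to the coefficients
    desired:         desired value of each state variable, or None if unavailable
    """
    values: List[float] = []
    derivs: List[List[float]] = []
    for c, dc, dv in zip(computed, computed_derivs, desired):
        if dv is None:                    # no desired value: keep the computed state
            values.append(c)
            derivs.append(list(dc))
        else:                             # desired value: independent of coefficients
            values.append(dv)
            derivs.append([0.0] * len(dc))
    return values, derivs


values, derivs = hybrid_state_inputs([0.3, -1.2], [[1.0, 2.0], [3.0, 4.0]], [0.25, None])
print(values, derivs)   # [0.25, -1.2] [[0.0, 0.0], [3.0, 4.0]]
```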
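[Figure 7 diagram: (a) state-space trajectories over times n-5 to n, computed with coefficients C(n-1), with errors $e^1$, $e^2$ and $e^3$ (output values and desired output values). (b) Blocks 1, 2 and 3, each with coefficients C(n-1), external inputs $U^1(n)$ to $U^3(n)$, state inputs equal to the desired values $D^1(n)$, $D^2(n)$, $D^3(n)$, and its own BP net; the blocks are compared with d(n-2), d(n-1) and d(n) and deliver partial coefficient modifications.]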
Figure 7:
Directed algorithm (with Nt=Nc=3).
a) Pictorial representation of the desired trajectory, and of the trajectory computed at time n, in
state space; the trajectory at time n is computed by the blocks shown on Figure 7b.
b) Computational system at time n. The detail of each block is shown on Figure 8.
Note that, in a directed algorithm, each block is independent from the others and must have a desired
output value.
3.4.2.2. - Consistent choices of the partial derivatives of the state inputs:
The choices of the state inputs lead to corresponding choices for the initialization of the partial
derivatives, as illustrated in Figures 5a, 6a, 7a.
In the case of the undirected algorithm, one has $S_{in}^m(n) = S_{out}^{m-1}(n)$; therefore, a consistent choice
of the values of the partial derivatives of the state inputs consists in taking the values of the partial
derivatives of the state outputs computed by the previous block:
$$\frac{\partial S_{in}^m(n)}{\partial c_{ij}} = \frac{\partial S_{out}^{m-1}(n)}{\partial c_{ij}},$$
except for the first block, where one has:
$$\frac{\partial S_{in}^1(n)}{\partial c_{ij}} = \frac{\partial S_{out}^1(n-1)}{\partial c_{ij}}.$$
In the case of the semi-directed algorithm, the state input values of the first block are taken equal to
the corresponding desired values; the latter do not depend on the coefficients; therefore, their partial
derivatives can consistently be taken equal to zero. For the other blocks, the values of the partial
derivatives of the state inputs are taken equal to the values of the partial derivatives of the state
outputs computed by the previous block.
In the case of the directed algorithm, one can consistently take the partial derivatives of the state
inputs of all blocks equal to zero.
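The three consistent choices of the partial derivatives can be summarized in a small dispatch function. This is a sketch under assumptions: derivatives are stored as nested lists with one row per state variable and one column per coefficient $c_{ij}$, blocks are numbered from 1, and the name `state_input_derivatives` is hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]   # rows: state variables, columns: coefficients c_ij


def state_input_derivatives(mode: str, m: int, prev_block: Optional[Matrix],
                            first_block_prev: Optional[Matrix],
                            n_states: int, n_coeffs: int) -> Matrix:
    """Consistent choice of dS_in^m(n)/dc_ij (section 3.4.2.2).

    mode: "UD", "SD" or "D"; m: block index (1-based).
    prev_block: dS_out^{m-1}(n)/dc_ij, needed for m > 1 in UD and SD.
    first_block_prev: dS_out^1(n-1)/dc_ij, needed for m = 1 in UD.
    """
    zeros = [[0.0] * n_coeffs for _ in range(n_states)]
    if mode == "D":
        return zeros                                  # all blocks: desired values
    if m == 1:
        if mode == "SD":
            return zeros                              # first block forced to desired values
        assert first_block_prev is not None
        return [row[:] for row in first_block_prev]   # UD: carried over from time n-1
    assert prev_block is not None
    return [row[:] for row in prev_block]             # UD, SD: from the previous block


print(state_input_derivatives("SD", 1, None, None, 1, 2))        # [[0.0, 0.0]]
print(state_input_derivatives("UD", 1, None, [[3.0, 4.0]], 1, 2))  # [[3.0, 4.0]]
```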
The parameters T, Kn, Nt, Nc being fixed, the first three algorithms described above are summarized
on the first line of each section of Table 1. The first part of the acronyms refers to the choice of the
state inputs and the second part refers to the choice of the partial derivatives of the state inputs. They
include algorithms which have been used previously by other authors: the "Real-Time Recurrent
Learning Algorithm" [Williams and Zipser 1989a] is an undirected algorithm (using the forward
computation technique) with Nt = Nc = 1. This algorithm is known as the Recursive Prediction Error
algorithm, or IIR-LMS algorithm, in linear adaptive filtering [Widrow and Stearns 1985]. The
"Teacher-Forced Real-Time Recurrent Learning Algorithm" [Williams and Zipser 1989a] is a hybrid
algorithm with Nt=Nc=1.
| Algorithm | Initialization: state input of the first block, $S_{in}^1(n)$ | State input of a current block, $S_{in}^m(n)$ | Initialization: partial derivatives for the first block, $\partial S_{in}^1(n)/\partial c_{ij}$ | Partial derivatives for a current block, $\partial S_{in}^m(n)/\partial c_{ij}$ |
|---|---|---|---|---|
| Undirected (UD) algorithm (output error; parallel) | $S_{out}^1(n-1)$ | $S_{out}^{m-1}(n)$ | $\partial S_{out}^1(n-1)/\partial c_{ij}$ | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
| UD-D algorithm | $S_{out}^1(n-1)$ | $S_{out}^{m-1}(n)$ | zero | zero |
| UD-SD algorithm | $S_{out}^1(n-1)$ | $S_{out}^{m-1}(n)$ | zero | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
| Semi-directed (SD) algorithm | desired values | $S_{out}^{m-1}(n)$ | zero | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
| SD-D algorithm | desired values | $S_{out}^{m-1}(n)$ | zero | zero |
| SD-UD algorithm | desired values | $S_{out}^{m-1}(n)$ | $\partial S_{out}^1(n-1)/\partial c_{ij}$ | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
| Directed (D) algorithm (equation error; teacher forcing; series-parallel) | desired values | desired values | zero | zero |
| D-SD algorithm | desired values | desired values | zero | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
| D-UD algorithm | desired values | desired values | $\partial S_{out}^1(n-1)/\partial c_{ij}$ | $\partial S_{out}^{m-1}(n)/\partial c_{ij}$ |
Table 1:
Three families of algorithms for the training of feedback neural networks. In each section, the first
line describes the algorithm with consistent choices of the partial derivatives of the state inputs (sec. 3.4.2.2).
The above algorithms have been introduced in the framework of the forward computation of the
gradient of the cost function. However, the estimation of the gradient of the cost function by
backpropagation is attractive with respect to computation time, as mentioned in section 3.3.4. If this
technique is used, the computation is performed with Nt blocks, where each coefficient $c_{ij}$ is
replicated in each block m as $c_{ij}^m$. Therefore, one has:
$$\frac{\partial v_i^m}{\partial c_{ij}^m} = z_j^m.$$
The training block m is shown in Figure 8: after computing the Nc errors using the Nt blocks in the
forward direction, the Nt blocks compute the derivatives of J(n) with respect to the potentials $\{v_i^m\}$
in the backward direction. The modification of the coefficients is computed from the Nt blocks as:
$$\Delta c_{ij}(n) = -\mu \frac{\partial J(n)}{\partial c_{ij}} = -\mu \sum_{m=1}^{N_t} \frac{\partial J(n)}{\partial v_i^m}\, z_j^m = \sum_{m=1}^{N_t} \Delta c_{ij}^m(n).$$
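[Figure 8 diagram: training block m at time n. The canonical FF net (non-linear), with coefficients $c_{ij}^m$ and activation functions $f_i$, maps the external inputs $U^m(n)$ and the state inputs $S_{in}^m(n)$ to $y^m$ and $S_{out}^m(n)$; $e^m$ is formed from $d(n-N_t+m)$ and $y^m$; the linear BP net, with coefficients $c_{ij}^m$ and derivatives $f_i'(v_i^m)$, receives $e^m$ and $\partial J/\partial S_{out}^m$ and delivers $\partial J/\partial v_i^m$ and $\partial J/\partial S_{in}^m$; a "products" unit forms $\Delta c_{ij}^m(n) = -\mu\, (\partial J/\partial v_i^m)\, z_j^m$.]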
Figure 8:
Training block m at time n with a desired output value: computation of a partial coefficient
modification using the backpropagation technique for the estimation of the gradient for a feedback
neural network. If block m has no desired value, then em=0, but it does contribute an additive term to
the coefficient modification. It should be noticed that forward propagation through all blocks must be
performed before backpropagation.
It is important to notice that backpropagation assumes implicitly that the partial derivatives of the
state inputs of the first copy are taken equal to zero. Therefore, the backpropagation technique will
lead to the same coefficient modifications as the forward propagation technique if and only if it is
used within algorithms complying with this condition, i.e. within directed or semi-directed
algorithms (Figures 6b and 7b); backpropagation cannot be used consistently within undirected and
hybrid algorithms. When both backpropagation and forward computation techniques can be used,
backpropagation is the best choice because of its lower computational complexity.
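The equivalence condition stated above can be checked numerically on a toy example. The sketch below is an illustration under assumptions, not the paper's code: it uses a first-order recurrent model $s(k) = \tanh(a\,s(k-1) + b\,u(k))$ whose output is its single state variable, two coefficients a and b, Nt = 4 blocks of which the last Nc = 2 have desired values, and a first state input fixed to a desired value as in a semi-directed algorithm. `grad_forward` propagates the partial derivatives from block to block, starting from a chosen value `ds0` of $\partial S_{in}^1/\partial c$; `grad_backprop` backpropagates through the blocks with replicated coefficients and sums the contributions of the copies. All names and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def outputs(a: float, b: float, s0: float, u: Sequence[float]) -> List[float]:
    """Outputs y^1..y^Nt of the blocks for s(k) = tanh(a*s(k-1) + b*u(k))."""
    y: List[float] = []
    s = s0                                   # S_in^1(n)
    for uk in u:
        s = math.tanh(a * s + b * uk)
        y.append(s)
    return y


def cost(a, b, s0, u, d, Nc) -> float:
    """J(n) = 1/2 * sum of the squared errors of the last Nc blocks."""
    y = outputs(a, b, s0, u)
    return 0.5 * sum((d[m] - y[m]) ** 2 for m in range(len(u) - Nc, len(u)))


def grad_forward(a, b, s0, u, d, Nc, ds0=(0.0, 0.0)) -> Tuple[float, float]:
    """Forward computation: (ds/da, ds/db) is propagated from block to block,
    starting from the chosen value ds0 of dS_in^1/d(a, b)."""
    s, (sa, sb) = s0, ds0
    ga = gb = 0.0
    for m, uk in enumerate(u):
        v = a * s + b * uk                   # potential v^m
        fp = 1.0 - math.tanh(v) ** 2         # f'(v^m)
        sa, sb = fp * (s + a * sa), fp * (uk + a * sb)
        s = math.tanh(v)
        if m >= len(u) - Nc:                 # block with a desired value
            e = d[m] - s
            ga, gb = ga - e * sa, gb - e * sb
    return ga, gb


def grad_backprop(a, b, s0, u, d, Nc) -> Tuple[float, float]:
    """Backpropagation through the blocks with replicated coefficients; the
    gradient reaching S_in^1 is dropped, i.e. dS_in^1/d(a, b) is taken as 0."""
    y = outputs(a, b, s0, u)
    s_in = [s0] + y[:-1]                     # S_in^m = S_out^{m-1}
    ga = gb = g_next = 0.0                   # g_next = dJ/dS_in^{m+1}
    for m in reversed(range(len(u))):
        e = d[m] - y[m] if m >= len(u) - Nc else 0.0
        dJ_dv = (-e + g_next) * (1.0 - y[m] ** 2)
        ga += dJ_dv * s_in[m]                # dv^m/da^m = S_in^m
        gb += dJ_dv * u[m]                   # dv^m/db^m = u^m
        g_next = dJ_dv * a                   # dJ/dS_in^m, sent to block m-1
    return ga, gb


a, b, s0 = 0.7, -0.4, 0.2                    # s0 plays the role of a desired value
u = [0.5, -1.0, 0.3, 0.8]                    # Nt = 4 blocks
d = [0.1, -0.2, 0.05, 0.3]                   # desired outputs of the blocks
print(grad_forward(a, b, s0, u, d, Nc=2))
print(grad_backprop(a, b, s0, u, d, Nc=2))   # equal up to rounding
```

With `ds0=(0.0, 0.0)` both functions return the same gradient, which also agrees with a finite-difference estimate of `cost`. A non-zero `ds0`, as in an undirected algorithm, generally changes the result of `grad_forward`, and there is no place to inject it in `grad_backprop`.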
An example of the use of a directed algorithm for identification and control of non-linear processes
can be found in [Narendra and Parthasarathy 1990].
3.4.2.3. - Other choices of the partial derivatives of the state inputs:
Because adaptive neural networks require real-time operation, tradeoffs between consistency and
computation time may be necessary: setting the partial derivatives $\partial S_{in}^m/\partial c_{ij}$ equal to zero may save
time by making the computation by backpropagation possible even for undirected algorithms (UD-D
or UD-SD algorithms). The full variety of algorithms is shown in Table 1: in each group, the first
line shows the characteristics of the fully consistent algorithm, whereas the other two lines show
other possibilities which are not fully consistent, but which can nevertheless be used with advantage.
The SD-UD, D-SD and D-UD algorithms have been included for completeness: computation time
permitting, the accuracy of the computation may be improved by setting the partial derivatives of the
state inputs to non-zero values in the directed or semi-directed case.
Undirected algorithms have been in use in linear adaptive filtering: the extended LMS algorithm is a
UD-D algorithm (see table 1) with Nt=Nc=1 [Shynk 1989]; the a posteriori error algorithm is also a
UD-D algorithm with Nt=2, Nc=1 [Shynk 1989].
The truncated backpropagation through time algorithm [Williams and Peng 1990] is a UD-D
algorithm with Nc = 1 and Nt > 1, with a special feature: in order to save computation time, the
coefficients of the blocks 1 to Nt-1 are the coefficients which were computed at the corresponding
times.
CONCLUSION
The present paper provides a comprehensive framework for the adaptive training of neural networks,
viewed as non-linear filters, either transversal or recursive. We have introduced the concept of
canonical form of a neural network, which provides a unifying view of network architectures and
allows a general description of training methods based on gradient estimation. We have shown that
backpropagation is always advantageous for training feedforward networks adaptively, but that it is
not necessarily the best method for training feedback networks. In the latter case, four families of
training algorithms have been proposed; some of these algorithms have been in use in classical linear
adaptive filtering or adaptive control, whereas others are original.
The unifying concepts thus introduced are helpful in bridging the gap between neural networks and
adaptive filters. Furthermore, they raise a number of challenging problems, both for basic and for
applied research. From a fundamental point of view, general approaches to the convergence and
stability of these algorithms are still lacking; a preliminary study along these lines has been presented
[Dreyfus et al. 1992]; from the point of view of applications, the real-time operation of non-linear
adaptive systems requires specific silicon implementations, thereby raising the questions of the speed
and accuracy required for the computations.
ACKNOWLEDGEMENTS
The authors are very grateful to O. MACCHI for numerous discussions which have been very
helpful in putting neural networks into the perspective of adaptive filtering. C. VIGNAT has been
instrumental in formalizing some computational aspects of this work. We thank H. GUTOWITZ for
his critical reading of the manuscript.
APPENDIX 1
We consider a discrete-time neural network with any arbitrary structure, and its associated network
graph as defined in section 2.2.
The set of state variables is the minimal set of variables which must be initialized in order to allow
the computation of the state of all neurons at any time n>0, given the values of the external inputs at
all times from 0 to n. The order of the network is the number of state variables.
Clearly, the only neurons whose state must be initialized are the neurons which are within loops (i.e.
within cycles in the network graph). Therefore, in order to determine the order of the network, the
network graph should be pruned by suppressing all external inputs and all edges which are not within
cycles (this may result in a disconnected graph).
To determine the order, it is convenient to further simplify the network graph as follows: (i) merge
parallel edges into a single edge whose delay is the maximum delay of the parallel edges; (ii) if two
edges of a loop are separated by a neuron which belongs to this loop only, suppress the neuron and
merge the edges into a single edge whose delay is the sum of the delays of the edges.
We now consider the neurons which are still represented by nodes in the simplified network graph.
We denote by N the order of the network.
If, for each node i of the simplified graph, we denote by $A_i$ the delay of the synapse afferent to
neuron i which has the largest delay (i.e. the weight of the edge directed towards i which has the
largest weight), then a simple upper bound for N is given by:
$$N \le \sum_i A_i.$$
The state $x_i$ of a neuron i which has an afferent synapse of delay $A_i$ cannot be computed at times
$n < A_i$; the computation of the states of the other neurons may require the values of $x_i$ at times 0, 1, ...,
$A_i - 1$; thus, the contribution of neuron i to the order of the network is smaller than or equal to $A_i$.
Let the quantity $\omega_i$ be defined as:
$$\omega_i = A_i - \min_{j \in R_i} (A_j - \tau_{ji}) \ \text{ if } \ A_i - \min_{j \in R_i} (A_j - \tau_{ji}) > 0, \qquad \omega_i = 0 \ \text{ otherwise},$$
where $\tau_{ji}$ is the delay of the edge directed from i to j, and $R_i$ stands for the set of indices of the nodes j which are linked to i by an edge directed from i to
j (i.e. the set of neurons to which neuron i projects efferent synapses).
Then the order of the network is given by:
$$N = \sum_i \omega_i.$$
The necessity of imposing the state of neuron i at time k ($0 \le k \le A_i - 1$) depends on whether this value
is necessary for the computation of the state of a neuron j to which neuron i sends its state: if $k + \tau_{ji}$ is
smaller than the maximum delay $A_j$ of the synapses afferent to j, it is not necessary to transmit the
state of neuron i at time k to neuron j, since the latter does not have the information required to
compute its state at time $k + \tau_{ji}$; the information on the state of neuron i at time k is necessary only if
one has $k \ge A_j - \tau_{ji}$.
Therefore, the minimum number of successive values required for neuron i is equal to:
$$A_i - \min_{j \in R_i} (A_j - \tau_{ji}) \ \text{ if } \ A_i - \min_{j \in R_i} (A_j - \tau_{ji}) > 0, \ \text{ zero otherwise,}$$
that is, to $\omega_i$. Clearly, this result is in accord with the upper bound given above.
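The order formula translates directly into code. The sketch below is a hypothetical helper (`network_order`), not part of the paper: it assumes that the network graph has already been pruned to its cycles as described above and is given as a list of (i, j, delay) triples for edges directed from neuron i to neuron j. For each neuron, $A_i$ is the largest afferent delay, and the contribution $\omega_i = \max(A_i - \min_{j \in R_i}(A_j - \tau_{ji}), 0)$ is summed, where $\tau_{ji}$ is the delay of the edge from i to j. Parallel edges may be left in the list, since taking the minimum over them amounts to keeping the largest delay, as in simplification rule (i).

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, int]   # (i, j, tau): edge from neuron i to neuron j, delay tau


def network_order(edges: Iterable[Edge]) -> int:
    """Order N = sum_i omega_i of a pruned network graph (Appendix 1)."""
    edge_list: List[Edge] = list(edges)
    nodes = {i for i, _, _ in edge_list} | {j for _, j, _ in edge_list}
    # A_i: largest delay among the synapses afferent to neuron i
    A: Dict[int, int] = {k: max((t for _, j, t in edge_list if j == k), default=0)
                         for k in nodes}
    N = 0
    for i in nodes:
        # A_j - tau_ji for every neuron j in R_i (efferent edges of i)
        slack = [A[j] - t for src, j, t in edge_list if src == i]
        if slack:
            N += max(A[i] - min(slack), 0)     # omega_i
    return N


# x1(n) = f(x1(n-1), x2(n)) and x2(n) = g(x1(n-2)): a second-order network
print(network_order([(1, 1, 1), (1, 2, 2), (2, 1, 0)]))   # 2
```

In the example, substituting $x_2(n)$ gives $x_1(n) = f(x_1(n-1), g(x_1(n-2)))$, so two state variables are indeed needed, whereas the upper bound $\sum_i A_i$ gives 3.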
The above results determine the number of state variables related to each neuron. The choice of the
set of state variables is not unique. The presence of parallel edges within a loop, or the presence of
feedforward connections between loops, may require the replication of some neurons and of some
coefficients.
Figure A1.1.a shows a feedback network and Figure A1.1.b shows its canonical form; the order of
the network is 6. The example shows that some weights are replicated.
[Figure A1.1 diagram: (a) feedback network of five neurons (1 to 5) with external input u and output y, synapse delays shown in rectangles; (b) its canonical form, with external inputs u(n-1) and u(n-2), the six state inputs x1(n-2), x2(n-1), x2(n-2), x3(n-1), x5(n), x5(n-1), the corresponding state outputs x1(n-1), x2(n), x2(n-1), x3(n), x5(n+1), x5(n) fed back through unit delays, and replicated coefficients.]
Figure A1.1:
a) Example of a feedback neural network. Numbers in rectangles are synapse delay values, u is the
external input and y is the output of the network.
b) Canonical form of the network (E=8, M=2, N=6). The cij, notation of relation (3) is used.
APPENDIX 2
This appendix describes several architectures of feedback neural networks which have been
proposed in the literature. We present their canonical form, so that they can be easily compared.
The discrete-time mathematical model of a time-invariant dynamical process is of the form
$$S(n+1) = \varphi[S(n), U(n)], \qquad Y(n) = \psi[S(n), U(n)],$$
where vector U is the input of the dynamical system, vector S denotes the state of the system, and
vector Y is the output of the system. Since neural networks with hidden neurons are able to
approximate a large class of non-linear functions, they can be used for implementing the functions $\varphi$ and
$\psi$.
The network proposed by Jordan [Jordan 1986] is trained to produce a given sequence y(n) for a
given constant input P ("plan"). Thus it is used as an associative memory. The network and its
canonical form are shown in Figure A2.1. The representation of the network in its canonical form
shows that the network is of order 2, although the representation used by Jordan exhibits four
connections with unit delays. Note that the state variables are not delayed values of the output. The
presence of hidden neurons allows this network to learn any function $y(n) = \psi[S(n), U(n)]$.
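[Figure A2.1 diagram: (a) Jordan network with plan input P, state units S1 and S2, hidden units H1 and H2, and output units O1 and O2, with unit-delay connections; (b) canonical form, with state inputs s1(n), s2(n), state outputs s1(n+1), s2(n+1) through unit delays, and outputs y1(n), y2(n).]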
Figure A2.1:
a) Network architecture proposed by Jordan.
b) Canonical form.
The network suggested by Elman [Elman 1988] is used as a non-linear filter. Its canonical form is
shown on Figure A2.2.
Figure A2.2:
Canonical form of the network architecture proposed by Elman.
Each state variable is computed as a fixed non-linear function f of a weighted sum of the external
inputs and state inputs. Therefore, the class of state functions which can be implemented is restricted to
the form:
$$S(n+1) = \varphi[S(n), U(n)] = f[A\,S(n) + B\,U(n)],$$
where A and B are the synaptic matrices.
Similarly, the output is computed as a fixed non-linear function f of a weighted sum of the state
variables, so that the class of output functions that can be implemented is restricted to:
$$Y(n) = \psi[S(n), U(n)] = f[C\,S(n)],$$
where C is the synaptic matrix.
The network proposed in [Williams and Zipser 1989a, Williams and Peng 1990] is used as a non-
linear filter. The state of the network at time n+1 is computed as a weighted sum of the inputs and of
the state values at time n, followed by a fixed non-linearity fi. As a result, the network can only
implement non-linear functions of the form fi(AS(n) + BU(n)).
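The restricted forms above (Elman's $S(n+1) = f[A\,S(n) + B\,U(n)]$, $Y(n) = f[C\,S(n)]$, and the similar state update of the Williams and Zipser network) can be seen in a one-step sketch of the canonical form. The code assumes f = tanh and arbitrary illustrative matrices A, B, C; it shows the structure of one time step only, not any author's training procedure, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xk for w, xk in zip(row, x)) for row in W]


def elman_step(A: Matrix, B: Matrix, C: Matrix, s: Sequence[float],
               u: Sequence[float], f: Callable[[float], float] = math.tanh
               ) -> Tuple[List[float], List[float]]:
    """One step: S(n+1) = f[A S(n) + B U(n)] and Y(n) = f[C S(n)]."""
    y = [f(v) for v in matvec(C, s)]
    s_next = [f(p + q) for p, q in zip(matvec(A, s), matvec(B, u))]
    return s_next, y


# Two state variables, one external input, one output (arbitrary coefficients).
A = [[0.5, -0.2], [0.1, 0.3]]
B = [[1.0], [-1.0]]
C = [[1.0, 1.0]]
s = [0.0, 0.0]
for u_n in ([0.5], [0.0], [-0.5]):
    s, y = elman_step(A, B, C, s, u_n)
    print(y)
```

Whatever the values of A, B and C, each state variable is one fixed non-linearity applied to a weighted sum, which is the restriction pointed out above; a network with hidden neurons between the inputs and the state outputs does not have it.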
The network used by Poddar and Unnikrishnan [Poddar and Unnikrishnan 1991] consists of a
"feedforward" network of pairs of neurons; each neuron, except the output neuron, and each external
input, is associated with a "memory neuron". If $x_i(n)$ is the value of the output of neuron i and $x_j(n)$ the
value of the output of the associated memory neuron j at time n, the output of the memory neuron at
time n+1 is $x_j(n+1) = \alpha_i x_i(n) + (1-\alpha_i) x_j(n)$, with $0 < \alpha_i \le 1$. If $\alpha_i = 1$, the memory neurons introduce only
delays, so that the network is a non-linear transversal filter. If $\alpha_i \ne 1$, the memory neurons are linear
first-order low-pass filters, and the network is actually a feedback network. A state output is
associated with each memory neuron.
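The behaviour of a memory neuron can be sketched directly from the recurrence $x_j(n+1) = \alpha_i x_i(n) + (1-\alpha_i) x_j(n)$. The function name `memory_neuron` and the zero initial state are assumptions made for illustration.

```python
from typing import List, Sequence


def memory_neuron(x_i: Sequence[float], alpha: float, x_j0: float = 0.0) -> List[float]:
    """Outputs x_j(0), x_j(1), ... of a memory neuron driven by x_i(0), x_i(1), ...

    x_j(n+1) = alpha * x_i(n) + (1 - alpha) * x_j(n), with 0 < alpha <= 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    x_j = [x_j0]
    for value in x_i:
        x_j.append(alpha * value + (1.0 - alpha) * x_j[-1])
    return x_j


print(memory_neuron([1.0, 2.0, 3.0], alpha=1.0))   # [0.0, 1.0, 2.0, 3.0]: pure delay
print(memory_neuron([1.0, 1.0, 1.0], alpha=0.5))   # [0.0, 0.5, 0.75, 0.875]: low-pass
```

With alpha = 1 the memory neuron is a unit delay, so no feedback appears in the canonical form; any other admissible value makes $x_j$ a state variable.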
Figure A2.3a shows an example of such an architecture where neurons 3, 4, 7 and 8 are the memory
neurons associated to the two inputs 1 and 2 and to the two neurons 5 and 6, respectively. The
canonical form is shown in Figure A2.3b where x3, x4, x7, x8 are chosen as state variables.
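[Figure A2.3 diagram: (a) memory-neuron network with inputs x1 and x2, neurons x5 and x6, output neuron x9, memory neurons x3, x4, x7, x8, and unit delays; (b) canonical form, with external inputs x1(n), x2(n), state inputs x3(n), x4(n), x7(n), x8(n), state outputs x3(n+1), x4(n+1), x7(n+1), x8(n+1) through unit delays, and output x9(n).]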
Figure A2.3:
a) Network architecture proposed by Poddar and Unnikrishnan.
b) Canonical form.
For process identification and control problems, the most general structure used by Narendra and
Parthasarathy [Narendra and Parthasarathy 1991] is a model of the specific form $y(n) = \Phi_1[u(n-1), u(n-2), \ldots] + \Phi_2[y(n-1), y(n-2), \ldots]$, where $\Phi_1$ and $\Phi_2$ are implemented by MLP networks with 20
neurons in the first hidden layer and 10 neurons in the second hidden layer.
APPENDIX 3
For simplicity, we present the training of the fully connected neural net of Figure 2: we denote the
external inputs by $z_1$ to $z_M$, the feedback inputs by $z_{M+1}$ to $z_{M+N}$, and the outputs of the neurons by
$z_{M+N+1}$ to $z_{M+N+\nu}$ (where $\nu$ is the number of neurons). The neurons are ordered in the following
way: the p-th neuron receives the outputs of all neurons and inputs indexed q < p (fully connected).
At time n, we have to consider the following cost function:
$$J(n) = \frac{1}{2} \sum_{m=N_t-N_c+1}^{N_t} (e^m)^2,$$
where Nt is the number of blocks used to compute the Nc values $e^m$ ($N_t \ge N_c$).
In this appendix, we present the contribution of block m ($1 \le m \le N_t$) to the gradient estimation. This
contribution is computed from the external input vector, the desired value and the state input vector.
We denote the available values of the coefficients at time n by {cij}.
The canonical FF net of the m-th block, with coefficients $\{c_{ij}^m\} = \{c_{ij}\}$, computes the outputs
$z_i^m = f_i(v_i^m)$ of all neurons and the state output vector $S_{out}^m(n)$ from the external input vector
$$U^m(n) = [u(n-N_t+m), u(n-N_t+m-1), \ldots, u(n-N_t+m-M+1)] = [z_1^m, z_2^m, \ldots, z_M^m]$$
and the state input vector
$$S_{in}^m(n) = [z_{M+1}^m, z_{M+2}^m, \ldots, z_{M+N}^m]$$
as follows:
(i) For i = 1 to M (external inputs): $z_i^m = u(n-N_t+m-i+1)$;
(ii) For i = M+1 to M+N (state inputs): $z_i^m$ is given by the chosen algorithm (Table 1);
(iii) For i = M+N+1 to M+N+ν-1 (hidden neurons): $z_i^m = f_i(v_i^m)$ with $v_i^m = \sum_{j \in P_i} c_{ij}^m z_j^m$, where $P_i$ is the set of indices of the units that send their outputs to neuron i;
(iv) For i = M+N+ν (linear output neuron):
$$y^m = z_{M+N+\nu}^m = v_{M+N+\nu}^m = \sum_{j \in P_{M+N+\nu}} c_{M+N+\nu,j}^m\, z_j^m.$$
Thus, the state output vector is
$$S_{out}^m(n) \equiv [z_{M+N+\nu}^m, z_{M+N+\nu+1}^m, \ldots, z_{M+N+\nu+N-1}^m] = [y^m, z_{M+1}^m, \ldots, z_{M+N-1}^m],$$
and, if $N_t-N_c+1 \le m \le N_t$, we obtain from the desired value $d(n-N_t+m)$ and the output $y^m$:
$$e^m = d(n-N_t+m) - y^m.$$
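Steps (i) to (iv) can be sketched as follows. This is a minimal illustration under assumptions: tanh is used for the non-linear activation functions $f_i$, the coefficients are stored in a dictionary keyed by (i, j) with 1-based indices (missing pairs count as zero), and the function name `block_forward` is hypothetical. The state output is built as in the relation above, by placing $y^m$ first and then the state inputs without the last one.

```python
import math
from typing import Dict, List, Sequence, Tuple


def block_forward(c: Dict[Tuple[int, int], float], u_vec: Sequence[float],
                  s_in: Sequence[float], nu: int
                  ) -> Tuple[float, List[float], Dict[int, float], Dict[int, float]]:
    """Forward pass of the canonical FF net of block m (steps (i) to (iv)).

    c: coefficients c_ij (1-based indices, j < i); missing pairs count as 0.
    u_vec: external inputs z_1..z_M; s_in: state inputs z_{M+1}..z_{M+N}.
    nu: number of neurons; neurons M+N+1..M+N+nu-1 use f_i = tanh (assumed),
        neuron M+N+nu is the linear output neuron.
    Returns y^m, S_out^m(n), and the dicts z and v (kept for the gradient step).
    """
    M, N = len(u_vec), len(s_in)
    z: Dict[int, float] = {k: val for k, val in enumerate([*u_vec, *s_in], start=1)}
    v: Dict[int, float] = {}
    out = M + N + nu
    for i in range(M + N + 1, out + 1):
        # P_i = {1, ..., i-1} for the fully connected ordered network
        v[i] = sum(c.get((i, j), 0.0) * z[j] for j in range(1, i))
        z[i] = v[i] if i == out else math.tanh(v[i])
    y = z[out]
    s_out = [y, *s_in[:-1]]          # [y^m, z_{M+1}^m, ..., z_{M+N-1}^m]
    return y, s_out, z, v


# M = 1 external input, N = 2 state inputs, nu = 2 neurons (one hidden, one output)
c = {(4, 1): 1.0, (5, 4): 2.0, (5, 2): 0.5}
y, s_out, z, v = block_forward(c, [0.0], [1.0, 3.0], nu=2)
print(y, s_out)   # 0.5 [0.5, 1.0]
```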
In the following, we present two methods for the computation of the gradient of J(n): the forward
computation and the backpropagation techniques.
1) Forward computation (Figure 4):
We consider the whole set of Nt blocks as a static network on which we perform the forward
computation technique.
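It is based on the following relation:
$$\frac{\partial J(n)}{\partial c_{ij}} = \frac{\partial}{\partial c_{ij}} \left[ \frac{1}{2} \sum_{m=N_t-N_c+1}^{N_t} (e^m)^2 \right] = -\sum_{m=N_t-N_c+1}^{N_t} e^m \frac{\partial y^m}{\partial c_{ij}}.$$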
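The linear FC net of the m-th block computes, with coefficients $\{c_{ij}^m\}$ and $\{f_i'(v_i^m)\}$, the set of partial
derivatives of the state output (including $y^m$) with respect to all coefficients $c_{ij}$, $\partial S_{out}^m(n)/\partial c_{ij}$.
For the $(M+N)\nu + \nu(\nu-1)/2$ coefficients $c_{ij}$ (i > j):
(i) For p = 1 to M (external inputs): $\partial z_p^m/\partial c_{ij} = 0$;
(ii) For p = M+1 to M+N (feedback inputs): $\partial z_p^m/\partial c_{ij}$ is given by the chosen algorithm (Table 1);
(iii) For p = M+N+1 to M+N+ν-1 (hidden neurons):
$$\frac{\partial z_p^m}{\partial c_{ij}} = f_p'(v_p^m) \left[ \delta_{pi}\, z_j^m + \sum_{h \in P_p} c_{ph}^m \frac{\partial z_h^m}{\partial c_{ij}} \right],$$
where $\delta_{pi}$ is the Kronecker symbol;
(iv) For p = M+N+ν (linear output neuron):
$$\frac{\partial y^m}{\partial c_{ij}} = \frac{\partial z_{M+N+\nu}^m}{\partial c_{ij}} = \delta_{M+N+\nu,i}\, z_j^m + \sum_{h \in P_{M+N+\nu}} c_{M+N+\nu,h}^m \frac{\partial z_h^m}{\partial c_{ij}}.$$
Thus the partial derivatives of the state output are given by:
$$\frac{\partial S_{out}^m(n)}{\partial c_{ij}} \equiv \left[ \frac{\partial z_{M+N+\nu}^m}{\partial c_{ij}}, \frac{\partial z_{M+N+\nu+1}^m}{\partial c_{ij}}, \ldots, \frac{\partial z_{M+N+\nu+N-1}^m}{\partial c_{ij}} \right] = \left[ \frac{\partial y^m}{\partial c_{ij}}, \frac{\partial z_{M+1}^m}{\partial c_{ij}}, \ldots, \frac{\partial z_{M+N-1}^m}{\partial c_{ij}} \right].$$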
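Once all partial derivatives of the output values $y^m$ are computed for the Nt blocks, the gradient of
J(n) is obtained from:
$$\nabla J(n) = \left[ \frac{\partial J(n)}{\partial c_{ij}} \right]_{i>j}, \quad \text{where} \quad \frac{\partial J(n)}{\partial c_{ij}} = -\sum_{m=N_t-N_c+1}^{N_t} e^m \frac{\partial y^m}{\partial c_{ij}}.$$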
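If the steepest-descent method is used, the coefficient modifications are given by:
$$\Delta c_{ij}(n) = -\mu \frac{\partial J(n)}{\partial c_{ij}} = \mu \sum_{m=N_t-N_c+1}^{N_t} e^m \frac{\partial y^m}{\partial c_{ij}} = \sum_{m=N_t-N_c+1}^{N_t} \Delta c_{ij}^m(n).$$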
2) Backpropagation (Figure 8):
Considering the effect of the coefficient cij only, one has:
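$$dJ(n) = \sum_{m=1}^{N_t} \frac{\partial J(n)}{\partial c_{ij}^m}\, dc_{ij}^m \quad \text{with} \quad dc_{ij}^m = dc_{ij} \ \ \forall m, \quad \text{thus} \quad \frac{dJ(n)}{dc_{ij}} = \sum_{m=1}^{N_t} \frac{\partial J(n)}{\partial c_{ij}^m}.$$
Then the gradient of J(n) can be written as:
$$\left( \frac{\partial J(n)}{\partial c_{ij}} \right)_{c_{pq}\ \text{constant},\ (p,q) \ne (i,j)} = \sum_{m=1}^{N_t} \left( \frac{\partial J(n)}{\partial c_{ij}^m} \right)_{c_{pq}^{m'}\ \text{constant for}\ (p,q) \ne (i,j)\ \text{or}\ m' \ne m},$$
where
$$\frac{\partial J(n)}{\partial c_{ij}^m} = \frac{\partial J(n)}{\partial v_i^m} \frac{\partial v_i^m}{\partial c_{ij}^m} = \frac{\partial J(n)}{\partial v_i^m}\, z_j^m \quad \text{for } i = M+N+1 \text{ to } M+N+\nu, \ j = 1 \text{ to } i-1.$$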
This means that standard backpropagation can be applied to the whole set of Nt blocks considered as
a static network with replicated coefficients.
The linear BP net of the m-th block computes, with coefficients $\{c_{ij}^m\}$ and $\{f_i'(v_i^m)\}$, the set of partial
derivatives of J(n) with respect to the potentials $v_i^m$ of all neurons.
We define the following set of variables $q_i^m$:
(i) for i = M+N+ν+N-1 down to M+N+ν+1 (state outputs other than $y^m$): if m = Nt then $q_i^m = 0$, otherwise $q_i^m = q_{i-N-\nu+1}^{m+1}$;
(ii) for i = M+N+ν (linear output neuron): if m = Nt then $q_i^m = e^m$, otherwise $q_i^m = e^m + q_{M+1}^{m+1}$, with $e^m = 0$ for a block that has no desired value (note that $q_i^m = -\partial J(n)/\partial v_i^m$);
(iii) for i = M+N+ν-1 down to M+N+1 (hidden neurons): $q_i^m = f_i'(v_i^m) \sum_{h \in R_i} c_{hi}^m q_h^m$, where $R_i$ is the set of indices of the neurons to which the i-th neuron transmits its output ($q_i^m = -\partial J(n)/\partial v_i^m$);
(iv) for i = M+N (last feedback input): $q_i^m = \sum_{h \in R_i} c_{hi}^m q_h^m$;
(v) for i = M+N-1 down to M+1 (other feedback inputs): $q_i^m = \sum_{h \in R_i} c_{hi}^m q_h^m + q_{i+N+\nu}^m$.
Note that computation by backpropagation assumes implicitly that the derivatives of the feedback
inputs of the first block (m=1) with respect to the coefficients are equal to zero; this is in contrast to
the forward computation of the gradient, where these values can be initialized arbitrarily.
Note also that with the forward computation technique, the number of partial derivatives to compute
for each block is $\nu^2 [M+N+(\nu-1)/2]$, whereas with the backpropagation method this number is $\nu$.
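Once all partial derivatives of J(n) with respect to the potentials $v_i^m$ of all neurons are computed for
the Nt blocks, the gradient of J(n) is obtained from:
$$\nabla J(n) = \left[ \frac{\partial J(n)}{\partial c_{ij}} \right]_{i>j}, \quad \text{where} \quad \frac{\partial J(n)}{\partial c_{ij}} = \sum_{m=1}^{N_t} \frac{\partial J(n)}{\partial v_i^m}\, z_j^m.$$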
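If the steepest-descent method is used, the coefficient modifications are given by:
$$\Delta c_{ij}(n) = -\mu \frac{\partial J(n)}{\partial c_{ij}} = -\mu \sum_{m=1}^{N_t} \frac{\partial J(n)}{\partial v_i^m}\, z_j^m = \sum_{m=1}^{N_t} \Delta c_{ij}^m(n).$$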
LITERATURE REFERENCES
Bellanger, M.G. 1987. Adaptive Digital Filters and Signal Analysis: Marcel Dekker.
Chen, S., and S.A. Billings. 1989. Representations of Non-Linear Systems: the NARMAX Model. Int. J.
Control 49, 1013-1032.
Chen, S., G.J. Gibson, C.F.N. Cowan and P.M. Grant. 1990. Adaptive Equalization of Finite
Nonlinear Channels Using Multilayer Perceptrons. Signal Processing 20, 107-119.
Dreyfus, G., O. Macchi, S. Marcos, L. Personnaz, P. Roussel-Ragot, D. Urbani, C. Vignat. 1992.
Adaptive Training of Feedback Neural Networks for Non-linear Filtering and Control. In
Proceedings of the Second IEEE Conf. on Signal Processing.
Elman, J. L. 1990. Finding Structure in Time. Cognitive Science 14, 179-211.
Fallside, F. 1990. Analysis of Linear Predictive data as Speech and of ARMA Processes by a Class
of Single-Layer Connectionist Models. In Neurocomputing: Algorithms, Architectures and
Applications, F. Fogelman-Soulié and J. Hérault, eds., 265-283.
Haykin, S. 1991. Adaptive Filter Theory: Prentice-Hall International Editions.
Hopfield, J.J. 1982. Neural Networks and Physical Systems with Emergent Collective Computational
Abilities. Proc. of the Natl. Acad. Sci. USA 79, 2554-2558.
Hornik, K., M. Stinchcombe and H. White. 1989. Multilayer Feedforward Networks are Universal
Approximators. Neural Networks 2, 359-366.
Jayant, N.S. and P. Noll. 1984. Digital Coding of Waveforms. Principles and Applications to Speech
and Video. Signal Processing Series, A. Oppenheim, ed.: Prentice-Hall.
Jordan, M. I. 1985. The Learning of Representations for Sequential Performance. Doctoral
Dissertation, University of California, San Diego.
Jordan, M. I. 1989. Serial Order: A Parallel, Distributed Processing Approach. In Proceedings of the
Eighth Annual Conference of the Cognitive Science Society, 531-546: Erlbaum.
Lapedes, A., and R. Farber. 1988. How Neural Nets Work. In Neural Information Processing
Systems, D. Z. Anderson, ed., 442-456.
McCannon, T.E., N.C. Gallagher, D.Minoo-Hamedani and G.L. Wise. 1982. On the design of
nonlinear discrete-time predictors. IEEE Trans. on Information Theory 28, 366-371.
Narendra, K. S., and K. Parthasarathy. 1990. Identification and Control of Dynamical Systems Using
Neural Networks. IEEE Transactions on Neural Networks, 1, 4-27.
Narendra, K. S., and K. Parthasarathy. 1991. Gradient Methods for the Optimization of Dynamical
Systems Containing Neural Networks. IEEE Trans. on Neural Networks 2, 252-262.
Nicolau, E. and D. Zaharia. 1989. Adaptive arrays. In Studies in Electrical and Electronic
Engineering 35: Elsevier.
Pearlmutter, B. 1989. Learning State Space Trajectories in Recurrent Neural Networks. Neural
Computation 1, 263-269.
Personnaz, L., I. Guyon and G. Dreyfus. 1986. Collective Computational Properties of Neural
Networks: New Learning Mechanisms. Phys. Rev. A 34, 4217-4228.
Picinbono, B. 1988. Adaptive methods in temporal processing. In Underwater Acoustic Data
Processing, Y.T. Chan, ed., 313-327. Kluwer Academic Publishers.
Pineda, F. J. 1989. Recurrent Backpropagation and the Dynamical Approach to Adaptive Neural
Computation. Neural Comp. 1, 161-172.
Pineda, F. 1987. Generalization of Backpropagation to Recurrent Neural Networks. Phys. Rev. Lett.
59, 2229-2232.
Poddar, P., and K.P. Unnikrishnan. 1991. Non-Linear Prediction of Speech Signals using Memory
Neuron Networks. In Neural Networks for Signal Processing, Proceedings of the 1991 IEEE
Workshop, B. H. Juang, S. Y. Kung, and C. A. Kamm, eds., 395-404.
Press, W.H., B.P. Flannery, S.A. Teukolsky, W.T. Vetterling. 1986. Numerical Recipes. Cambridge
University Press.
Proakis, J.G. 1983. Digital communications: Mc Graw Hill.
Robinson, A. J., and F. Fallside. 1989. A Dynamic Connectionist Model for Phoneme Recognition.
In Neural Networks from Models to Applications, L. Personnaz and G. Dreyfus, eds., 541-550. Paris:
IDSET.
Rumelhart, D., G. Hinton, and R. Williams. 1986. Learning Internal Representations by Error
Propagation. In Parallel Distributed Processing, D. Rumelhart and J. McClelland, eds. MIT Press.
Shynk, J.J. 1989. Adaptive IIR Filtering. IEEE ASSP Magazine. April, 4-21.
Waibel, A., T. Hanazawa, G. Hinton, K. Shikano, and K. Lang. 1989. Phoneme Recognition Using
Time-Delay Neural Networks. IEEE Trans. on Acoustics, Speech, and Signal Processing 37, 328-339.
Weigend, A. S., B. A. Huberman, and D. E. Rumelhart. 1990. Predicting the Future: a Connectionist
Approach. International Journal of Neural Systems 1, 193-209.
Widrow, B. and S.D. Stearns. 1985. Adaptive Signal Processing: Prentice-Hall.
Williams, R.J. and D. Zipser. 1989a. A Learning Algorithm for Continually Running Fully Recurrent
Neural Networks. Neural Comp. 1, 270-280.
Williams, R.J. and D. Zipser. 1989b. Experimental Analysis of the Real-Time Recurrent Learning
Algorithm. Connection Science 1, 87-111.
Williams, R.J. and J. Peng. 1990. An Efficient Gradient-Based Algorithm for On-Line Training of
Recurrent Network Trajectories. Neural Comp. 2, 490-501.
... For modeling a dynamical process, as in this study, several types of multilayer perceptron can be used, depending on how the dynamic behavior of the basin is taken into account (Nerrand et al. 1993). Three of them are considered: ...
... • The feed-forward directed model: it uses exogenous variables and the previously observed outputs, up to the instant of simulation, as inputs. This type of model can be used when the observed output values are certain to be available in real-time conditions, or when the output noise is considered lower than the state noise (Nerrand et al. 1993). This model represents the physical phenomenon only over the most recent time step, since it already has the observed value of the output at time k and estimates its value at time k + 1. ...
... • The recurrent model: it uses exogenous variables and the previous simulated outputs as inputs (recurrent inputs). This type of model can be used when the availability of the observed output values is not guaranteed or when the state noise is considered to be lower than the output noise (Nerrand et al. 1993). This model's lead time is limited to the response time of the system, unless a forecast of the exogenous variables is provided. ...
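The difference between the two model types lies in what is fed back as the past output. The following is a minimal Python sketch, not taken from the cited studies. It assumes a hypothetical first-order model `f(y_prev, u)` standing in for a trained multilayer perceptron. The model is run in directed mode, with observed outputs as inputs, and in recurrent mode, with its own estimates as inputs. A hypothetical measurement error is then injected to show which mode is exposed to it. The sketch illustrates how the inputs are formed, not how either model is trained.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[float, float], float]

def f(y_prev: float, u: float) -> float:
    """Hypothetical one-step model standing in for a trained MLP:
    estimates the output at time k+1 from the output at time k and the input u(k)."""
    return math.tanh(0.8 * y_prev + 0.5 * u)

def directed_predictions(model: Model, y_obs: Sequence[float], u: Sequence[float]) -> List[float]:
    """Feed-forward directed mode: every estimate uses the OBSERVED output y_obs[k]."""
    return [model(y_obs[k], u[k]) for k in range(len(u))]

def recurrent_predictions(model: Model, y0: float, u: Sequence[float]) -> List[float]:
    """Recurrent mode: every estimate reuses the model's OWN previous estimate."""
    preds = []
    y_hat = y0  # only the initial observed output is used
    for uk in u:
        y_hat = model(y_hat, uk)
        preds.append(y_hat)
    return preds

u = [1.0, 0.0, -1.0, 0.5]
y_true = [0.0]
for uk in u:  # noise-free "process" outputs generated by the same model
    y_true.append(f(y_true[-1], uk))

y_meas = list(y_true)
y_meas[1] += 0.1  # hypothetical measurement error on the observed output at time 1

print(directed_predictions(f, y_meas, u))      # the estimate of y(2) inherits the error
print(recurrent_predictions(f, y_meas[0], u))  # unaffected: observed outputs after y(0) are never used
```

With noise-free observations both modes return the same sequence. They diverge as soon as observed and estimated outputs differ, which is why the choice between them is tied to the noise hypotheses stated in the two bullets.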
Article
Flash floods frequently hit the Mediterranean regions and cause numerous fatalities and heavy damage. Forecasting them is still a challenge, because the processes involved are poorly known and because heavy convective rainfall is difficult to forecast. In any case, early warning remains a strong need. In this study, the authors propose a deep artificial neural network for flash flood forecasting whose specific architecture takes better account of the spatial variability and scales of rainfall, as well as of the hydrological responses. The outcomes of the deep model are then compared with those of a previously published classical global multilayer perceptron. For this purpose, a database of 58 heavy rainfall events, extracted from 16 years of hydrometeorological observations on a well-studied basin in Southern France, is used to train a deep recurrent neural network. The results are twofold. First, the deep model extends the lead time from two to three hours, thus providing forecasts suitable for operational use. Second, the model selection process converged towards an architecture that explicitly considers the spatial scales of the basin. More generally, this study shows that a rigorous selection process using several well-known regularization methods enabled the deep model to converge towards a parsimonious model that highlights some known physical processes of the basin: the roles of elevation and of distance to the outlet. This work thus provides an interesting piece of evidence for the controversy on the interpretability of modern AI.
... More specifically, in the case of Artificial Neural Networks, the dynamics of the model are obtained by including delay blocks in the structure [1]. The dynamics can be introduced either by keeping the feedforward structure of the network (Time Delay Neural Networks, TDNN [2][3][4][5]) or by introducing feedback [6]. In the latter case, the delays are mandatory; otherwise, the output of a neuron would depend on its own value at the same instant and could not be computed. ...
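The two ways of introducing dynamics can be contrasted on a single neuron. In this sketch the weights are arbitrary illustrative values, not taken from the cited work.

- A transversal neuron sees a tapped delay line of past inputs, so its impulse response is finite.
- A recursive neuron also receives its own output through a unit delay, so its response persists.

The feedback term uses y(k-1), never y(k), which is what makes each step computable.

```python
import math
from collections import deque
from typing import List, Sequence

def transversal_neuron(u: Sequence[float], w: Sequence[float]) -> List[float]:
    """Feedforward (TDNN-style) neuron: y(k) = tanh(sum_i w[i] * u(k - i))."""
    taps = deque([0.0] * len(w), maxlen=len(w))  # u(k), u(k-1), ..., zero-initialized
    out = []
    for uk in u:
        taps.appendleft(uk)  # newest sample at index 0; the oldest drops off the right
        out.append(math.tanh(sum(wi * ti for wi, ti in zip(w, taps))))
    return out

def recursive_neuron(u: Sequence[float], v: float, a: float) -> List[float]:
    """Feedback neuron: y(k) = tanh(v * u(k) + a * y(k-1)).
    The unit delay on the feedback path makes the recursion computable;
    without it, y(k) would appear on both sides of the equation."""
    y_prev = 0.0  # initial state y(-1)
    out = []
    for uk in u:
        y_prev = math.tanh(v * uk + a * y_prev)
        out.append(y_prev)
    return out

impulse = [1.0] + [0.0] * 7
print(transversal_neuron(impulse, [0.5, 0.3, 0.2]))  # exactly zero after 3 samples (finite memory)
print(recursive_neuron(impulse, v=0.5, a=0.9))        # decays but stays nonzero (infinite impulse response)
```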
Article
In this paper, a new algorithm for the training of Locally Recurrent Neural Networks (LRNNs) is presented, which aims to reduce computational complexity and at the same time guarantee the stability of the network during the training. The main feature of the proposed algorithm is the capability to represent the gradient of the error in an explicit form. The algorithm builds on the interpretation of Fibonacci’s sequence as the output of an IIR second-order filter, which makes it possible to use Binet’s formula that allows the generic terms of the sequence to be calculated directly. Thanks to this approach, the gradient of the loss function during the training can be explicitly calculated, and it can be expressed in terms of the parameters, which control the stability of the neural network.
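The Fibonacci/IIR correspondence that the abstract builds on can be checked numerically. This sketch covers that correspondence only, not the proposed training algorithm. It assumes the second-order all-pole filter y(n) = y(n-1) + y(n-2) + x(n), whose poles φ = (1 + √5)/2 and ψ = (1 - √5)/2 are the roots of z² - z - 1 = 0. The filter's impulse response is F(n+1), and Binet's formula F(n) = (φⁿ - ψⁿ)/√5 gives the same values in closed form.

```python
import math
from typing import List

def iir_impulse_response(n_samples: int) -> List[int]:
    """Impulse response of y(n) = y(n-1) + y(n-2) + x(n) with zero initial conditions."""
    y1 = y2 = 0  # y(n-1), y(n-2)
    out = []
    for n in range(n_samples):
        x = 1 if n == 0 else 0  # unit impulse
        y = y1 + y2 + x
        out.append(y)
        y2, y1 = y1, y
    return out

def binet(n: int) -> int:
    """Binet's formula F(n) = (phi**n - psi**n) / sqrt(5), rounded to the nearest integer."""
    sqrt5 = math.sqrt(5.0)
    phi = (1 + sqrt5) / 2  # pole outside the unit circle: this filter is unstable
    psi = (1 - sqrt5) / 2  # pole inside the unit circle
    return round((phi ** n - psi ** n) / sqrt5)

h = iir_impulse_response(30)
# Sample n of the impulse response equals F(n + 1): 1, 1, 2, 3, 5, 8, ...
assert all(h[n] == binet(n + 1) for n in range(30))
print(h[:10])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Because |φ| > 1, this particular filter is unstable. That hints at why the abstract's explicit link between the gradient and the parameters controlling the poles is useful for keeping a trained network stable.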
... The MLP is used in approximation problems [7] such as stock market forecasting [8], aviation prognostics [9], data mining [10,11], filtering [12], control applications [13], energy systems [44], atmospheric science [45], hydrology [46], renewable energy systems [47], ecological modeling [48], electric load forecasting [49], rainfall-runoff modeling [50], and weather forecasting [51]. ...
Preprint
This thesis presents a novel approach to neural network training that addresses the challenge of determining the optimal number of learning factors. The proposed Adaptive Multiple Optimal Learning Factors (AMOLF) algorithm dynamically adjusts the number of learning factors based on the error change per multiply, leading to improved training efficiency and accuracy. The thesis also introduces techniques for grouping weights based on the curvature of the objective function and for compressing large Hessian matrices. Experimental results demonstrate the superior performance of AMOLF compared to existing methods like OWO-MOLF and Levenberg-Marquardt.
... Neural networks have been applied to pattern recognition and object classification problems since the development of the backpropagation algorithm provided an efficient method for their supervised training. Neural networks have also been used as nonlinear filters and have been trained to synthesize the response of nonlinear systems (Nerrand, Roussel-Ragot, Personnaz, Dreyfus, & Marcos, 1993). Traditional neural networks, however, worked well with static data but were cumbersome for temporal data. ...
Article
Research techniques of prognostics for gas turbines and diesel engines have advanced in recent years. An analysis of trends in these techniques would benefit researchers assessing growth in the field and planning future research efforts. Prognostics research techniques were identified in 1,734 papers published between 1997 and 2016, drawn both from the Prognostics and Health Management (PHM) Society and, through CiteSeerX, from venues other than the PHM Society. In order to categorize papers by research technique, a taxonomy of prognostics was created. Additionally, the papers were categorized into two topics: gas turbines and diesel engines. In a large proportion of papers, trends in research techniques of prognostics for gas turbines and diesel engines reflected improvements in the speed of multi-core computer processors, the development of optimized learning methods, and the availability of large training sets. The variety of prognostics research techniques identified in this review demonstrates the growth in prognostics research and the increased use of this knowledge in the field. This systematic analysis of trends in research techniques of prognostics for gas turbines and diesel engines is useful to assess growth and utilization of knowledge in the larger field, and to provide a rationale (i.e., strategy, basis, structure) for planning the most effective use of limited research resources and funding.
... It has been shown (Nerrand et al., 1993) that any feedback (recurrent) network can be put into a particular form, called the canonical form, which is the minimal state representation of the function implemented by the network. This canonical form consists of an acyclic graph and of unit-delay connections linking some outputs of this graph to its inputs. ...
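The structure described in this excerpt can be sketched as a loop around a feedforward function. This is a hypothetical two-state example with hand-picked weights, meant only to show the structure; it does not derive the minimal canonical form of an arbitrary network. The acyclic part maps the external input u(k) and the state x(k) to the output y(k) and the next state x(k+1). The unit-delay connections correspond to the assignment that makes x(k+1) the state used at step k+1.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, float]

def feedforward_part(u: float, x: State) -> Tuple[float, State]:
    """Acyclic part of the canonical form (hypothetical weights).
    Maps (external input u(k), state x(k)) to (output y(k), next state x(k+1))."""
    h1 = math.tanh(0.7 * u + 0.2 * x[0] - 0.4 * x[1])  # hidden neurons
    h2 = math.tanh(-0.3 * u + 0.5 * x[0] + 0.1 * x[1])
    x_next = (h1, h2)          # state outputs, fed back through unit delays
    y = 1.5 * h1 - 0.8 * h2    # linear output neuron
    return y, x_next

def run_canonical(u_seq: Sequence[float], x0: State = (0.0, 0.0)) -> List[float]:
    """Iterate the canonical form: the only feedback path is the unit delay x(k+1) -> x(k)."""
    x = x0
    ys = []
    for u in u_seq:
        y, x = feedforward_part(u, x)  # x now holds x(k+1), used at the next step
        ys.append(y)
    return ys

print(run_canonical([1.0, 0.0, 0.0, -1.0]))
```

Because all feedback passes through the state, the input at time 0 still influences the output at later times even when later inputs are zero.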
Thesis
Characterizing shaly-sand reservoirs from well-log data is a practical means of describing reservoirs in oil fields. In recent years, several studies in petroleum engineering have applied artificial intelligence. This work presents a petrophysics-based method that uses well logs and core data to predict and record depth data in the shaly-sand reservoirs of the Triassic formation of the Hassi R'Mel field (Algerian Sahara). In the study of oil deposits, predicting absolute permeability and porosity is a fundamental element of reservoir description. It has a direct impact on the other petrophysical parameters, on water-injection programs, and on more efficient reservoir management. The Triassic formations of the Hassi R'Mel field are composed of sandstone and shaly sand with dolomite. Log records from 10 wells in this field are the starting point for characterizing its reservoir. This work presents a hybrid "neuro-fuzzy" model that uses well-log data to estimate porosity and permeability. A fuzzy-logic approach is used to compare core permeability with the permeability computed by the neural networks, and likewise for porosity, on the basis of the data available at the wells. Fuzzy logic is used to select the best drilling logs associated with the porosity and permeability database. The neural network is used as a nonlinear regression method to build a transformation between selected well logs and porosity and permeability measurements.
This intelligent technique is a powerful tool for estimating reservoir properties from log parameters and for oil and natural-gas development projects.
... Feedback (recurrent) networks are nonlinear dynamical systems that evolve continuously until they reach an equilibrium state, and they require the notion of time to be taken into account. They are used to model nonlinear dynamic processes, for process control, and for filtering [92,93]. ...
Thesis
Epilepsy is a chronic neuropathology characterized by transient paroxysmal clinical manifestations arising from an abnormal and excessive discharge of a neuronal population. Electroencephalography (EEG) is the reference modality of brain exploration for detecting and diagnosing epileptiform activity; it assesses the brain's bioelectrical activity through a set of electrodes placed on the scalp. Furthermore, monitoring patients at risk of epileptic seizures is essential to ensure optimal treatment and to prevent complications from subsequent seizures. Seizure prediction thus allows patients to receive an early warning and to act effectively through medication or other preventive measures. The scientific scope of this thesis is the development of new approaches for detecting and predicting the occurrence of an epileptic seizure, and for localizing the continuous cortical generators, by processing EEG signals. The proposed approaches rely mainly on signal-processing techniques. These include quadratic time-frequency distributions (QTFDs) such as the spectrogram (SP), the smoothed pseudo Wigner-Ville distribution (SPWVD), and the Choi-Williams distribution (CWD). They also include new relevant features extracted from the EEG signals and nonlinear complexity measures such as Rényi entropy (RE), which can then be incorporated into high-performance supervised-learning classifiers to detect and predict possible critical paroxysmal anomalies. The proposed algorithm is evaluated on the Children's Hospital Boston (CHB-MIT) database and on the Temple University Hospital (TUH) database. It yields encouraging results in terms of overall classification accuracy (Acc), sensitivity (Sens), and false-alarm rate (FPR).
Chapter
The object of the study is a new methodological and practical approach to the problem of adaptive neural network filtering. A hybrid adaptive approach to the operational assessment of the security of critical resources is described, which combines traditional Kalman filtering methods with the capabilities of trainable artificial neural networks. The features of this approach are analyzed: it allows the filtering weights to be trained and adjusted to the statistical characteristics of the security indicators of critical resources, whether these indicators are measured and observed linearly or nonlinearly.
Article
Karst aquifers can provide previously untapped freshwater resources and have thus generated considerable interest among stakeholders involved in the water supply sector. Here we compare the capacity of two systemic models to simulate the discharge and piezometry of a karst aquifer. Systemic models have the advantage of allowing the study of heterogeneous, complex karst systems without relying on extensive geographical and meteorological datasets. The effectiveness and complementarity of the two models are evaluated for a range of hydrologic conditions and for three methods to estimate evapotranspiration (Monteith, a priori ET, and effective rainfall). The first model is a reservoir model (referred to as VENSIM, after the software used), which is designed with just one reservoir so as to be as parsimonious as possible. The second model is a neural network (NN) model. The models are designed to simulate the rainfall–runoff and rainfall–water level relations in a karst conduit. The Lez aquifer, a karst aquifer located near the city of Montpellier in southern France and a critical water resource, was chosen to compare the two models. Simulated discharge and water level were compared after completing model design and calibration. The results suggest that the NN model is more effective at incorporating the nonlinearity of the karst spring for extreme events (extreme low and high water levels), whereas VENSIM provides a better representation of intermediate-amplitude water level fluctuations. VENSIM is sensitive to the method used to estimate evapotranspiration, whereas the NN model is not. Given that the NN model performs better for extreme events, it is better for operational applications (predicting floods or determining water pumping height). VENSIM, on the other hand, seems more appropriate for representing the hydrologic state of the basin during intermediate periods, when several effects are at work: rain, evapotranspiration, development of vegetation, etc. 
A proposal for improving both models is also provided.
Article
Deriving gradient algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we show how to derive such algorithms via a set of simple block diagram manipulation rules. The approach provides a common framework to derive popular algorithms including backpropagation and backpropagation-through-time without a single chain rule expansion. Additional examples are provided for a variety of complicated architectures to illustrate both the generality and the simplicity of the approach.
Article
Error backpropagation in feedforward neural network models is a popular learning algorithm that has its roots in nonlinear estimation and optimization. It is being used routinely to calculate error gradients in nonlinear systems with hundreds of thousands of parameters. However, the classical architecture for backpropagation has severe restrictions. The extension of backpropagation to networks with recurrent connections will be reviewed. It is now possible to efficiently compute the error gradients for networks that have temporal dynamics, which opens applications to a host of problems in systems identification and control.
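As a concrete instance of computing error gradients through temporal dynamics, the sketch below uses a single recurrent neuron y(k) = tanh(w·y(k-1) + v·u(k)) and the cost J = ½ Σ_k (d(k) - y(k))². The neuron, data, and weights are illustrative assumptions. The sketch propagates the sensitivities ∂y(k)/∂w and ∂y(k)/∂v forward in time, in the spirit of real-time recurrent learning (Williams and Zipser 1989a), rather than backward as in backpropagation through time. It then checks the result against central finite differences.

```python
import math
from typing import Sequence, Tuple

def cost_and_gradient(w: float, v: float, u: Sequence[float],
                      d: Sequence[float]) -> Tuple[float, float, float]:
    """Return (J, dJ/dw, dJ/dv) for y(k) = tanh(w*y(k-1) + v*u(k)) with y(-1) = 0
    and J = 0.5 * sum_k (d(k) - y(k))**2."""
    y_prev = 0.0
    dyw_prev = 0.0  # dy(k-1)/dw
    dyv_prev = 0.0  # dy(k-1)/dv
    J = gw = gv = 0.0
    for uk, dk in zip(u, d):
        y = math.tanh(w * y_prev + v * uk)
        g = 1.0 - y * y  # derivative of tanh at the current activation
        # Sensitivities propagate forward in time because y(k-1) also depends on w and v.
        dyw = g * (y_prev + w * dyw_prev)
        dyv = g * (uk + w * dyv_prev)
        e = dk - y
        J += 0.5 * e * e
        gw -= e * dyw
        gv -= e * dyv
        y_prev, dyw_prev, dyv_prev = y, dyw, dyv
    return J, gw, gv

u = [0.5, -1.0, 0.3, 0.8, -0.2]
d = [0.2, -0.4, 0.1, 0.5, 0.0]
w, v = 0.6, 0.9
J, gw, gv = cost_and_gradient(w, v, u, d)

# Central finite-difference check of the analytic gradient.
eps = 1e-6
fd_w = (cost_and_gradient(w + eps, v, u, d)[0] - cost_and_gradient(w - eps, v, u, d)[0]) / (2 * eps)
fd_v = (cost_and_gradient(w, v + eps, u, d)[0] - cost_and_gradient(w, v - eps, u, d)[0]) / (2 * eps)
print(gw, fd_w)
print(gv, fd_v)
```

The forward recursion keeps memory constant in the sequence length, at the price of one sensitivity per weight per neuron; for larger networks that trade-off is what distinguishes forward-mode schemes from backpropagation through time.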
Chapter
It is very difficult to present in one lecture all the material corresponding to the title of this paper. There are entire books on adaptive filtering, and we do not even intend to summarize them [1]–[4]. Adaptive systems have been the topic of many papers in previous meetings of the NATO Advanced Study Institute on Underwater Acoustics and Signal Processing, and this tutorial lecture will not make a new contribution to the field. Rather, after a long period of production, it is now time to give an overview of this material. As the field is extremely broad, it would of course be impossible to cover all its problems in one lecture, and we will focus on some points of particular interest to us.
Chapter
Since the first application of linear predictive or autoregressive analysis to speech by Atal [1], its use has become widespread throughout speech analysis, for coding, recognition, and synthesis; see, for example, successive Proceedings of the International Conference on Acoustics, Speech & Signal Processing (ICASSP).
Article
Computational properties of use to biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons). The physical meaning of content-addressable memory is described by an appropriate phase space flow of the state of a system. A model of such a system is given, based on aspects of neurobiology but readily adapted to integrated circuits. The collective properties of this model produce a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size. The algorithm for the time evolution of the state of the system is based on asynchronous parallel processing. Additional emergent collective properties include some capacity for generalization, familiarity recognition, categorization, error correction, and time sequence retention. The collective properties are only weakly sensitive to details of the modeling or the failure of individual devices.
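The content-addressable memory described in this abstract can be illustrated with a minimal sketch. It makes four assumptions:

- neuron states are ±1;
- the weight matrix follows the Hebbian outer-product rule, with a zero diagonal;
- updates are asynchronous and made in a fixed order, whereas the original model updates neurons in random order;
- ties are broken toward +1.

A stored pattern is retrieved from a corrupted probe.

```python
from typing import List, Sequence

def hebbian_weights(patterns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Outer-product (Hebbian) rule: W_ij = (1/n) * sum_p x_i^p * x_j^p, with W_ii = 0."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W: List[List[float]], probe: Sequence[int], max_sweeps: int = 10) -> List[int]:
    """Asynchronous updates s_i <- sign(sum_j W_ij * s_j) until no neuron changes."""
    s = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1  # ties broken toward +1 (a convention choice)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # fixed point reached
            break
    return s

stored = [1, -1, 1, 1, -1, -1, 1, -1]
W = hebbian_weights([stored])
probe = list(stored)
probe[2] = -probe[2]  # corrupt one component
print(recall(W, probe) == stored)  # True: the full memory is recovered
```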
Article
One problem with some renewable energy sources is that their output is non-dispatchable, because it depends on variable weather conditions that can be neither predicted nor controlled. From this point of view, short-term forecasting is essential for effectively integrating solar energy sources, and it is a very useful tool for the reliability and stability of the grid, ensuring that an adequate supply is present. This paper presents a new methodology, based on a dynamic artificial neural network, for forecasting the output of a PV generator one hour ahead. The results of this study show that the proposed methodology could be used to forecast the power output of PV systems one hour ahead with an acceptable degree of accuracy.
Article
The highest Total Electron Content (TEC) values in the world normally occur in the Equatorial Ionization Anomaly (EIA) region, resulting in the largest ionospheric range delays observed for any potential Space Based Augmentation System (SBAS). Reliable forecasting of TEC is crucial for satellite-based navigation systems. The day-to-day variability of the location and intensity of the anomaly peak is very large. This severely limits the applicability of commonly used ionospheric models to low-latitude regions. It is therefore necessary to develop a mathematical ionospheric forecasting and mapping model for TEC based on the physical parameters that influence the ionosphere. A model, IRPE-TEC, has been developed from real-time low-latitude total electron content data, using GPS measurements from a number of stations situated around the northern crest of the EIA during 2007 through 2011. It predicts vertical TEC values during the low and moderate solar activity levels of the 24th solar cycle. This model is compared with standard ionospheric models such as the International Reference Ionosphere (IRI) and the Parameterized Ionospheric Model (PIM) to establish its applicability in the equatorial region for accurate predictions.