Article

Back propagation with expected source values

Authors: Tariq Samad

Abstract

The back propagation learning rule converges significantly faster if expected values of source units are used for updating weights. The expected value of a unit can be approximated as the sum of the output of the unit and its error term. Results from numerous simulations demonstrate the comparative advantage of the new rule.
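The update this implies can be made concrete. In standard back propagation the weight change from source unit i to destination unit j is proportional to the destination's error term times the source's output; under the rule described in the abstract, the source output is replaced by its approximated expected value (output plus error term). The sketch below is an illustrative reading of that sentence, not code from the paper; the names and the learning rate are assumptions.

```python
def esv_weight_update(delta_j, o_i, delta_i, eta=0.25):
    """Illustrative weight update for the connection from source unit i to unit j.

    Standard BP:                 dw = eta * delta_j * o_i
    ESV reading of the abstract: replace the source output o_i by its expected
    value, approximated as (o_i + delta_i), giving
                                 dw = eta * delta_j * (o_i + delta_i)
    """
    expected_source = o_i + delta_i
    return eta * delta_j * expected_source

# Scalar example: a source unit with a low output but a positive error term
# (it "should" have fired more) contributes a larger weight change than under
# standard BP, which uses o_i alone.
print(esv_weight_update(delta_j=0.2, o_i=0.1, delta_i=0.3))  # 0.02 vs 0.005 for plain BP
```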


... This method uses the energy function defined by equation (9). However, in the batch EBP algorithm, the weights are updated after accumulating the errors corresponding to all input patterns, and thus it makes use of the energy function defined by equation (8). [4, 9] ...
... As with  , further increases in Beta beyond some problem-specific value result in oscillations and nonconvergence. [8] ...
... This method makes two trivial changes to the original EBP algorithm: it monitors the value of the energy function E given by equation (8), and it dynamically adjusts the value of the learning rate coefficient η. ...
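As a concrete illustration of "monitoring E and dynamically adjusting η", a bold-driver-style rule (one common scheme of this kind; the growth and shrink factors below are assumed, not taken from the cited method) can be sketched as:

```python
def adjust_learning_rate(E_new, E_old, eta, grow=1.05, shrink=0.7):
    """Bold-driver-style adjustment: increase eta slightly while the batch
    error E keeps falling, cut it sharply when E rises. The factors are
    illustrative defaults, not the cited method's values."""
    return eta * grow if E_new < E_old else eta * shrink

# e.g. adjust_learning_rate(E_new=0.41, E_old=0.47, eta=0.1) -> 0.105
```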
Article
Full-text available
Error back-propagation (EBP) is a widely used training algorithm for feedforward artificial neural networks (FFANNs). The main problem with the EBP algorithm is that it is very slow, and convergence to the optimal solution is not guaranteed. This has motivated a search for improvements that speed up the algorithm. In this research we use several methods to speed up the EBP algorithm. A multi-layer neural network was designed for building a pattern compression, encoding, and recognition system, and several acceleration methods for EBP were applied and compared.
... Though this gradient-descent technique has been used successfully on a number of interesting problems, its applicability to complex real-world problems has often been limited by its slow convergence to a solution (i.e., a set of weights that meet the desired error-criterion). Several modifications for improving the learning speed have been proposed in the literature [3, 4, 8, 13, 16, 21]. ...
... There have been many attempts to speed up BP to make it more feasible for complex real-world applications [3, 4, 13, 16, 21]. This paper focuses on methods to speed up BP by effectively dealing with the so-called flat-spot regions of the sigmoid activation function (computed by individual network nodes), where its derivative (with respect to its net input) approaches zero. ...
... The speedup of each technique with respect to BP is also shown. We calculate the speedup of a method m as: speedup(m) = ((avg. epochs using BP) − (avg. epochs using m)) / (avg. epochs using m) (16). We report the best results for each technique on each data-set. These results are reported in Table 2. ...
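A small worked version of equation (16), with illustrative epoch counts that are not results from the cited paper:

```python
def speedup(avg_epochs_bp, avg_epochs_m):
    """Speedup of method m over plain BP, per equation (16) quoted above."""
    return (avg_epochs_bp - avg_epochs_m) / avg_epochs_m

# Illustrative numbers only: if BP needs 200 epochs on average and method m
# needs 80, the reported speedup is 1.5.
print(speedup(200, 80))
```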
Article
Generalized delta rule, popularly known as Back-Propagation (BP) [15, 22], is probably one of the most widely used procedures for training multi-layer feedforward networks of sigmoid units. Though this gradient-descent technique has been used successfully on a number of interesting problems, its applicability to complex real-world problems has often been limited by its slow convergence to a solution (i.e., a set of weights that meet the desired error-criterion). Several modifications for improving the learning speed have been proposed in the literature [3, 4, 8, 13, 16, 21]. The formulation of the BP learning rule together with a mathematical property of the sigmoid activation function sometimes prevents weight updates from taking place even in the presence of considerable error. This phenomenon is referred to in the literature as flat-spots. Modifications to BP that help speed up learning in the presence of flat spots are therefore of interest in practice. In this paper we investiga...
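The flat-spot phenomenon mentioned here comes from the sigmoid derivative y(1−y), which vanishes when a unit saturates at 0 or 1. The sketch below illustrates the effect and one widely used remedy, Fahlman's sigmoid-prime offset; it is a generic illustration, not the specific modification investigated in the cited paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(y, offset=0.0):
    """Derivative of the logistic sigmoid written in terms of its output y.
    When y saturates near 0 or 1, y*(1-y) vanishes (a flat spot) and almost no
    error is propagated even if the unit is badly wrong. Adding a small
    constant (Fahlman's sigmoid-prime offset, commonly ~0.1) keeps the
    effective derivative away from zero."""
    return y * (1.0 - y) + offset

y = sigmoid(np.array([-8.0, 0.0, 8.0]))
print(sigmoid_prime(y))       # ~[3e-4, 0.25, 3e-4]: saturated units barely learn
print(sigmoid_prime(y, 0.1))  # the offset restores a usable gradient signal
```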
... For fair evaluation of the generalization capability of two variants of INCA, the benchmark regression problems from the literature are considered. We have chosen 16 different regression problems of varying complexities such as simple, radial function, harmonic function, complicated interaction function, time-series prediction and some real world problems [19, 21, 34-38]. We have considered 5 real world problems (1)-(5), and 11 artificial function approximation problems (6)-(10). ...
... (8) SAF Problem: We consider the following three-dimensional regression problem [34]. ...
Article
Full-text available
We propose the incremental node creation algorithm (INCA). INCA emphasizes architectural adaptation and functional adaptation in a unified framework. INCA starts from a single hidden node and then adds and trains nodes one by one incrementally. Two variants of INCA are developed, namely cascade and flat. In the cascade variant, every hidden node is added in a new hidden layer that is connected to the network inputs and all pre-existing hidden nodes. In contrast, the flat variant adds nodes one by one to a single hidden layer. Sixteen regression problems are used to investigate which network growing strategy provides the better generalization performance. Simulation results reveal that both architectures perform well on all the investigated regression problems of varying complexities. In general, cascade is better than the flat architecture except on some real-world problems. The trigonometric sine activation function provides better approximation capability than the log-sigmoid function except on some regression problems.
... All variants of the generalized constructive algorithm were tested by extensive simulation with ten regression tasks with changed activation function. For fair evaluation of the generalization performance of all variants of the algorithm, benchmark regression functions from the literature are employed [20, 23, 26, 28-31]. The generated data of all regression tasks was normalized in the interval [-1, 1] and then partitioned into the training set (trS), validation set (vaS), and testing set (teS). ...
... Example 2: Consider now the three-dimensional (3-D) analytical function called SAF, given in [28]. 1600 uniformly distributed random points were generated in the 3-D space, 0 ≤ X ≤ 1. The first 300 exemplars were used for trS, the following 300 exemplars were used for vaS, and the final 1000 exemplars were used for teS. ...
Article
Full-text available
Activation function plays an important role in the convergence of training algorithms. In this paper, we propose a new class of adaptive sigmoidal activation functions (ASAF) with two trainable parameters. ASAF possess the universal approximation property for continuous functions. We propose a constructive training algorithm using ASAF. The proposed algorithm emphasizes architectural adaptation and functional adaptation during training. This algorithm is a constructive approach to building a single-hidden-layer neural network incrementally. To achieve functional adaptation, the trainable parameters of ASAF are trained along with the other weights. Four variants of the proposed algorithm are developed using ASAF. All the variants are empirically evaluated on ten regression functions in terms of learning accuracy and generalization capability. Simulation results reveal that the adaptive sigmoidal activation function presents several advantages over the fixed log-sigmoid function, resulting in increased flexibility, smoother learning, and better generalization performance.
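The exact ASAF parameterization is not reproduced in this listing, so the sketch below assumes a generic two-parameter sigmoid (an output gain a and a steepness b) purely to illustrate how such parameters can be trained by gradient descent alongside the weights.

```python
import numpy as np

def adaptive_sigmoid(x, a, b):
    """Assumed two-parameter sigmoid for illustration: 'a' scales the output
    range and 'b' controls the steepness. Not necessarily the ASAF form."""
    return a / (1.0 + np.exp(-b * x))

def adaptive_sigmoid_param_grads(x, a, b):
    """Partial derivatives w.r.t. the trainable parameters a and b, so both can
    be updated by the same gradient-descent step as the ordinary weights."""
    s = 1.0 / (1.0 + np.exp(-b * x))
    return s, a * x * s * (1.0 - s)   # (d f / d a,  d f / d b)
```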
... Gaussian noise ∼ N(0, 0.05) was added to the output y(t), and the first 1000 values were discarded to let the system reach its steady state. • Samad: This dataset is generated from the function [44]: ...
Article
Full-text available
This article approaches the scene classification problem by proposing an enhanced bag of features (BoF) model and a modified radial basis function neural network (RBFNN) classifier. The proposed BoF model integrates the image features extracted by histogram of oriented gradients, local binary pattern and wavelet coefficients. The extracted features are obtained in a hierarchical multi-resolution manner. The proposed approach is able to capture multi-level (pixel-, patch-, and image-level) features. The histograms of features constructed by the BoF model are then used for training a modified RBFNN classifier. As a modification, we propose using a new variant of particle swarm optimization, in which the parameters are updated adaptively, for determining the centers of the Gaussian functions in the RBFNN. Experimental results demonstrate that our proposed approach significantly outperforms the state-of-the-art methods on scene classification of the OT, FP, and LSP benchmark datasets.
... The Samad function is used to generate this synthetic dataset [42]: ...
Article
This paper presents a new evolutionary cooperative learning scheme, able to solve function approximation and classification problems with improved accuracy and generalization capabilities. The proposed method optimizes the construction of radial basis function (RBF) networks, based on a cooperative particle swarm optimization (CPSO) framework. It allows for using variable-width basis functions, which increase the flexibility of the produced models, while performing full network optimization by concurrently determining the rest of the RBF parameters, namely center locations, synaptic weights and network size. To avoid the excessive number of design variables, which hinders the optimization task, a compact representation scheme is introduced, using two distinct swarms. The first swarm applies the non-symmetric fuzzy means algorithm to calculate the network structure and RBF kernel center coordinates, while the second encodes the basis function widths by introducing a modified neighbor coverage heuristic. The two swarms work together in a cooperative way, by exchanging information towards discovering improved RBF network configurations, whereas a suitably tailored reset operation is incorporated to help avoid stagnation. The superiority of the proposed scheme is illustrated through implementation in a wide range of benchmark problems, and comparison with alternative approaches.
... The performance of ELM-based fading channel modelling is compared with that of the backpropagation (BP) algorithm (Tariq 1991) and the Levenberg-Marquardt algorithm (Bogdan et al. 2010), a popular algorithm for SLFNs. All of the simulations are carried out in MATLAB 7.12.0. ...
Article
Full-text available
Due to the complexity and extensive application of wireless systems, fading channel modeling is of great importance for designing a mobile network, especially for high speed environments. High mobility challenges the speed of channel estimation and model optimization. In this study, we propose a single-hidden layer feedforward neural network (SLFN) approach to modelling fading channels, including large-scale attenuation and small-scale variation. The arrangements of SLFN in path loss (PL) prediction and fading channel estimation are provided, and the information in both of them is trained with extreme learning machine (ELM) algorithm and a faster back-propagation (BP) algorithm called Levenberg-Marquardt algorithm. Computer simulations show that our proposed SLFN estimators could obtain PL prediction and the instantaneous channel transfer function of sufficient accuracy. Furthermore, compared with BP algorithm, the ability of ELM to provide millisecond-level learning makes it very suitable for fading channel modelling in high speed scenarios.
... • Samad: This dataset is generated from the function [53] ...
Article
This paper presents a novel algorithm for training radial basis function (RBF) networks, in order to produce models with increased accuracy and parsimony. The proposed methodology is based on a nonsymmetric variant of the fuzzy means (FM) algorithm, which has the ability to determine the number and locations of the hidden-node RBF centers, whereas the synaptic weights are calculated using linear regression. Taking advantage of the short computational times required by the FM algorithm, we wrap a particle swarm optimization (PSO) based engine around it, designed to optimize the fuzzy partition. The result is an integrated framework for fully determining all the parameters of an RBF network. The proposed approach is evaluated through its application on 12 real-world and synthetic benchmark datasets and is also compared with other neural network training techniques. The results show that the RBF network models produced by the PSO-based nonsymmetric FM algorithm outperform the models produced by the other techniques, exhibiting higher prediction accuracies in shorter computational times, accompanied by simpler network structures.
... Such implicit learning parameters are adapted according to the initial settings of the weights and steepness parameters, and to minimization of the total error E. How important choosing the initial setting is has been demonstrated by the results of the four-bit parity problem when the SD2 distribution was used. Another advantage of the BP + framework is that other accelerating techniques, such as momentum, alternative cost functions, and expected source values (Samad, 1991), are allowed inside it without modifications. On the optimization side, the definition of the sensitivity of a unit given in Section 3.2 is particularly useful because it can remove units independently of the training set. ...
Article
Methods to speed up learning in back propagation and to optimize the network architecture have been recently studied. This paper shows how adaptation of the steepness of the sigmoids during learning treats these two topics in a common framework. The adaptation of the steepness of the sigmoids is obtained by gradient descent. The resulting learning dynamics can be simulated by a standard network with fixed sigmoids and a learning rule whose main component is a gradient descent with adaptive learning parameters. A law linking variation on the weights to variation on the steepness of the sigmoids is discovered. Optimization of units is obtained by introducing a tendency to decay to zero in the steepness values. This decay corresponds to a decay of the sensitivity of the units. Units with low final sensitivity can be removed after a given transformation of the biases of the network. A decreasing initial distribution of the steepness values is suggested to obtain a good compromise between speed of learning and network optimization. Simulation of the proposed procedure has shown an improvement of the mean convergence rate with respect to standard back propagation and good optimization performance. Several 4-3-1 networks for the four-bit parity problem were discovered.
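A minimal sketch of steepness adaptation by gradient descent for a single sigmoid unit y = σ(β·net); the learning rate and naming are assumptions, and the paper's decay term on the steepness is omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def steepness_step(net, beta, dE_dy, lr=0.1):
    """One gradient-descent step on the steepness beta of y = sigmoid(beta*net).
    Since dy/dbeta = net * y * (1 - y), the chain rule gives
    dE/dbeta = dE/dy * net * y * (1 - y)."""
    y = sigmoid(beta * net)
    return beta - lr * dE_dy * net * y * (1.0 - y)

print(steepness_step(net=0.8, beta=1.0, dE_dy=0.3))  # beta nudged downhill in E
```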
... 6. The last example we consider is a "Simple 3-dimensional Analytical Function" (SAF) [28] ...
Article
In this paper a new strategy for adaptively and autonomously constructing a multi-hidden-layer feedforward neural network (FNN) is introduced. The proposed scheme belongs to a class of structure level adaptation algorithms that adds both new hidden units and new hidden layers one at a time when it is determined to be needed. Using this strategy, a FNN may be constructed having as many hidden layers and hidden units as required by the complexity of the problem being considered. Simulation results applied to regression problems are included to demonstrate the performance capabilities of the proposed scheme.
... However, its slow convergence makes it inappropriate for learning real-world tasks. Several authors have explored modifications of BP which result in faster learning [2, 4, 8, 1, 6]. Most of these modifications are either extremely heuristic or else make use of sophisticated optimization techniques. ...
Article
Back-propagation (BP) [9, 5] is one of the most widely used procedures for training multi-layer artificial neural networks with sigmoid units. Though successful in a number of applications, its convergence to a set of desired weights can be excruciatingly slow. Several modifications have been proposed for improving the learning speed [2, 4, 8, 1, 6]. The phenomenon of flat-spots is known to play a significant role in the slow convergence of BP [2]. The formulation of the BP Learning rule prevents the network from learning effectively in the presence of flat-spots. In this paper we propose a new approach to minimize the error such that flat-spots occurring in the output layer are appropriately handled, thereby permitting the network to learn even in the presence of flat-spots. The improvement provided by the technique is demonstrated on a number of standard benchmark data-sets. More importantly, the speedup in learning is obtained with little or no increase in the computational require...
... Several authors have explored modifications to BP in an attempt to reduce the number of training epochs needed to learn a task [2, 4, 8, 1, 6]. Such modifications are of significant practical interest if BP is to be applied to real-world problems. ...
Article
Full-text available
Generalized delta rule, popularly known as back-propagation (BP) [9, 5], is probably one of the most widely used procedures for training multi-layer feed-forward networks of sigmoid units. Despite reports of success on a number of interesting problems, BP can be excruciatingly slow in converging on a set of weights that meet the desired error criterion. Several modifications for improving the learning speed have been proposed in the literature [2, 4, 8, 1, 6]. BP is known to suffer from the phenomenon of flat spots [2]. The slowness of BP is a direct consequence of these flat-spots together with the formulation of the BP learning rule. This paper proposes a new approach to minimizing the error that is suggested by the mathematical properties of the conventional error function and that effectively handles flat-spots occurring in the output layer. The robustness of the proposed technique is demonstrated on a number of data-sets widely studied in the machine learning community.
... We need to further understand the ESV modification. It is not clear from the intuitive argument given in [Samad, 1991] why we get the improvements that were observed. Further empirical and theoretical investigation is needed to determine whether the rule gives better performance simply due to roughly and dynamically adjusting the step size, or if it is approximating a more useful weight update value. ...
Article
Despite its notoriously slow learning time, backpropagation (BP) is one of the most widely used neural network training algorithms. Two major reasons for this slow convergence are the step size problem and the flat spot problem [Fahlman, 1988]. In [Samad, 1991] a simple modification, the expected source values (ESV) rule, is proposed for speeding up the BP algorithm. We have extended the ESV rule by coupling it with a flat-spot removal strategy presented in [Fahlman, 1988], as well as incorporating a momentum term to combat the step size problem. The resulting rule has shown dramatically improved learning time over standard BP, measured in training epochs. Two versions of the ESV modification are mentioned in [Samad, 1991], on-demand and up-front, but simulation results are given mostly for the on-demand case. Our results indicate that the up-front version works somewhat better than the on-demand version in terms of learning speed. We have also analyzed the interactions between the thre...
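Putting the pieces together, a rough sketch of a combined update in the spirit described above (ESV source values, a flat-spot offset on the destination error term, and a momentum term) might look as follows; shapes, names, and constants are all assumptions rather than the exact formulation of [Samad, 1991] or [Fahlman, 1988].

```python
import numpy as np

def esv_momentum_update(W, V, o_src, o_dst, err_dst, delta_src,
                        eta=0.1, mu=0.9, offset=0.1):
    """Illustrative combination, for one weight matrix W (dst_units x src_units):
      * flat-spot removal: the destination error term uses y*(1-y) + offset,
        so saturated units keep learning,
      * ESV: source outputs are replaced by o_src + delta_src,
      * momentum: a fraction mu of the previous weight change V is re-applied."""
    delta_dst = err_dst * (o_dst * (1.0 - o_dst) + offset)    # flat-spot-corrected error term
    expected_src = o_src + delta_src                          # ESV source values
    V_new = eta * np.outer(delta_dst, expected_src) + mu * V  # gradient step + momentum
    return W + V_new, V_new

# Toy usage: 3 destination units, 4 source units, random signals.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4)); V = np.zeros_like(W)
W, V = esv_momentum_update(W, V,
                           o_src=rng.uniform(size=4), o_dst=rng.uniform(size=3),
                           err_dst=rng.normal(size=3), delta_src=rng.normal(size=4))
```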
Article
Three improvements to gradient-based training algorithms are proposed, accelerating convergence of the conventional methods by up to two orders of magnitude. Premature saturation of hidden nodes is circumvented; weights leading to linear output nodes are updated non-recursively; and training of feedback structures is facilitated by two preliminary feedforward-training phases prior to the final feedback training. Performance evaluation on two simulated processes demonstrates the effect of complex search spaces on the conventional, and the new, algorithms.
Chapter
A very simple and efficient method for artificial neural networks training is proposed. Extensive simulation has established that it works very well as an ANN training law. It is faster than BackPropagation, it is suitable for on-chip learning and it can be implemented in parallel computers very easily.
Chapter
Publisher Summary: The backpropagation algorithm is the most popular and widely used of artificial neural networks. A backpropagation neural network (BNN) is constructed from simple processing units called "neurons" or "nodes," which are arranged in a series of layers bounded by input and output layers encompassing a variable number of hidden layers. Each neuron is connected to other neurons in the network by connections of different strengths or weights. This chapter presents an overview of the current usage of the BNN in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies. Emphasis is placed on practical aspects related to the selection of the training and testing sets, the preprocessing of the data, the choice of an architecture with adequate parameters, and the comparison of models. Advantages and limitations of BNN are discussed, as well as the usefulness of hybrid systems which mix and match a BNN with other intelligent techniques for solving complex modeling problems.
Article
Training feedforward networks, layer–by–layer, with the Levenberg–Marquardt back–propagation algorithm is presented in this paper. The Levenberg–Marquardt backpropagation technique has been noted as an efficient method for training feedforward neural networks in terms of training accuracy, convergence properties and overall training time. We introduce a method to further improve the computation and memory complexity of this algorithm by modifying the weights layer–by–layer. Four examples from the literature and from an engineering application are provided to demonstrate the outperformance of the technique over the general Levenberg–Marquardt backpropagation, which is based on adjusting all the weights simultaneously. These examples show that further improvement, in both the training time and convergence property, can be obtained using the new approach.
Article
Interfacial friction and material flow stress can be evaluated through the use of calibration curves in ring compression testing. In this study the neural network approach has been extended to their evaluation adaptive to ring geometries of wider range. The ring geometries covered were in the range of 6:3:0.5 to 6:3:2 (OD:ID:T0), which are the most commonly used values. Data for training the networks were acquired in the same way as in the development of the calibration curves. A serial scheme for the evaluation was found to be effective when multilayered BP (backpropagation) networks were employed. Network construction, network training including the selection of learning parameters, and implementation of the trained network are also detailed in this paper. Predictions for different ring geometries and friction factors were conducted and satisfactory results were obtained with prediction error of about 5%, at maximum, for both friction and flow stress.
Article
Error back propagation (EBP) is a widely used training algorithm for feedforward neural networks (FFNNs), but low learning rate limits its applications in the networks with complex topology architecture and large patterns. In this work, two modifications on Levenberg-Marquardt algorithm for FFNNs were made. One modification was made on the objective function, while the other was the evaluation of the initial weights and biases. The modified algorithm gave a better convergence rate compared to the traditional EBP algorithm and it was less computationally intensive and required less memory. The performance of the algorithm was verified separately with polymer and protein systems. The results showed that the BP network based on modified Levenberg-Marquardt algorithm could be used to predict the binodal curve of H2O/DMAc (N,N-dimethylacetamide) /PSf (polysulfone) system and lysozyme solubility in aqueous salt solution.
Article
The objective of this paper is to present the development and numerical testing of a robust fault detection and identification (FDI) system using artificial neural networks (ANNs), for incipient (slowly developing) faults occurring in process systems. The challenge in using ANNs in FDI systems arises because of one's desire to detect faults of varying severity, faults from noisy sensors, and multiple simultaneous faults. To address these issues, it becomes essential to have a learning algorithm that ensures quick convergence to a high level of accuracy. A recently developed accelerated learning algorithm, namely a form of an adaptive back propagation (ABP) algorithm, is used for this purpose. The ABP algorithm is used for the development of an FDI system for a process composed of a direct current motor, a centrifugal pump, and the associated piping system. Simulation studies indicate that the FDI system has significantly high sensitivity to incipient fault severity, while exhibiting insensitivity to sensor noise. For multiple simultaneous faults, the FDI system detects the fault with the predominant signature. The major limitation of the developed FDI system is encountered when it is subjected to simultaneous faults with similar signatures. During such faults, the inherent limitation of pattern-recognition-based FDI methods becomes apparent. Thus, alternate, more sophisticated FDI methods become necessary to address such problems. Even though the effectiveness of pattern-recognition-based FDI methods using ANNs has been demonstrated, further testing using real-world data is necessary.
Article
In cascade-correlation (CC) and constructive one-hidden-layer networks, structural level adaptation is achieved by incorporating new hidden units with identical activation functions one at a time into the active evolutionary net. Functional level adaptation has not received considerable attention, since selecting the activation functions will increase the search space considerably, and a systematic and rigorous algorithm for accomplishing the search will be required as well. In this paper, we present a new strategy that is applicable to both fixed-structure and constructive network training, by using different activation functions having hierarchical degrees of nonlinearities as the constructive learning of a one-hidden-layer feed-forward neural network (FNN) progresses. Specifically, the orthonormal Hermite polynomials are used as the activation functions of the hidden units, which have certain interesting properties that are beneficial in network training. Simulation results for several noisy regression problems have revealed that our scheme can produce FNNs that generalize much better than one-hidden-layer constructive FNNs with identical sigmoidal activation functions, in particular as applied to rather complicated problems.
Article
This paper presents two new modifications to the input-side training in constructive one-hidden-layer feedforward neural networks (FNNs). One is based on scaling of the network output error to which the output of a hidden unit is expected to maximally correlate. Results from extensive simulations of many regression problems are then summarized to demonstrate that constructive FNNs' generalization capabilities may be significantly improved by the new technique. The second contribution is a proposal for a new criterion for input-side weight pruning. This pruning technique removes redundant input-side weights simultaneously with the network constructive scheme, leading to a smaller network with comparable generalization capabilities. Simulation results are provided to illustrate the effectiveness of the proposed pruning technique.
Conference Paper
Presents a neural network (NN) based methodology for modeling and prediction of hydraulic servo actuators performance, via experimental data. The predictability of a trained NN to modeling a hydraulic actuator is first compared to a linear model. The result demonstrates the excellent ability of the NN in terms of multi-step prediction. Next, a state-space model using neural networks (NNs) to approximate the multivariable states (i.e., displacement, velocity and line pressures) of the hydraulic actuator is developed. The training algorithm and the criterion for the measurement of output errors are discussed. Test results show that NNs are capable of modelling and predicting the highly nonlinear hydraulic actuator even in a noisy environment
Article
This paper presents the results of a study in the design of a neural network based adaptive robotic control scheme. The neural network used here is a two hidden layer feedforward network and the learning scheme is the well-known backpropagation algorithm. The neural network essentially provides the inverse of the plant and acts in conjunction with a standard PD controller in the feedback loop. The objective of the controller is to accurately control the end position of a single link manipulator in the presence of large payload variations, variations in the link length and also variations in the damping constant. Based on results of this study, guidelines are presented in selecting the number of neurons in the hidden layers and also the parameters for the learning scheme used for training the network. Results also indicate that increasing the number of neurons in the hidden layer will improve the convergence speed of learning scheme up to a certain limit beyond which the addition of neurons will cause oscillations and instability. Guidelines for selecting the proper learning rate, momentum and fast backpropagation constant that ensure stability and convergence are presented. Also, a relationship between the r.m.s. error and the number of iterations used in training the neural network is established.
Article
The intended aim of the study is to develop an approach to the identification of the loads acting on aircraft wings, which uses an artificial neural network to model the load-strain relationship in structural analysis. As the first step of the study, this paper describes the application of an artificial neural network to identify the loads distributed across a cantilevered beam. The distributed loads are approximated by a set of concentrated loads. The paper demonstrates that using an artificial neural network to identify loads is feasible and that a well trained artificial neural network reveals an extremely fast convergence and a high degree of accuracy in the process of load identification for a cantilevered beam model.
Article
Full-text available
Two neural network based approaches, a multilayered feed forward neural network trained with supervised Error Back Propagation technique and an unsupervised Adaptive Resonance Theory-2 (ART2) based neural network were used for automatic detection/diagnosis of localized defects in ball bearings. Vibration acceleration signals were collected from a normal bearing and two different defective bearings under various load and speed conditions. The signals were processed to obtain various statistical parameters, which are good indicators of bearing condition, and these inputs were used to train the neural network and the output represented the ball bearing states. The trained neural networks were used for the recognition of ball bearing states. The results showed that the trained neural networks were able to distinguish a normal bearing from defective bearings with 100% reliability. Moreover, the networks were able to classify the ball bearings into different states with success rates better than those achieved with the best among the state-of-the-art techniques.
Article
Error back propagation (EBP) is now the most used training algorithm for feedforward artificial neural networks (FFANNs). However, it is generally believed that it is very slow if it does converge, especially if the network size is not too large compared to the problem at hand. The main problem with the EBP algorithm is that it has a constant learning rate coefficient, and different regions of the error surface may have different characteristic gradients that may require a dynamic change of the learning rate coefficient based on the nature of the surface. Also, the characteristic of the error surface may be unique in every dimension, which may require one learning rate coefficient for each weight. To overcome these problems several modifications have been suggested. This survey is an attempt to present them together and to compare them. The first modification was the momentum strategy, where a fraction of the last weight correction is added to the currently suggested weight correction. It has both an accelerating and a decelerating effect where they are necessary. However, this method can give only a relatively small dynamic range for the learning rate coefficient. To increase the dynamic range of the learning rate coefficient, such methods as the "bold driver" and SAB (self-adaptive back propagation) were proposed. A modification to the SAB that eliminates the requirement of selection of a "good" learning rate coefficient by the user gave the SuperSAB. A slight modification to the momentum strategy produced a new method that controls the oscillation of weights to speed up learning. A modification to the EBP algorithm in which the gradients are rescaled at every layer helped to improve the performance. Use of the "expected output" of a neuron instead of the actual output for correcting weights improved the performance of the momentum strategy. The conjugate gradient method and "self-determination of adaptive learning rate" require no learning rate coefficient from the user. Use of energy functions other than the sum of the squared error has shown an improved convergence rate. An effective learning rate coefficient selection needs to consider the size of the training set. All these methods to improve the performance of the EBP algorithm are presented here.
Article
The back-propagation learning rule is modified by using the classical gradient descent algorithm (which uses only a proportional term) with integral and derivative terms of the gradient. The effect of these terms on the convergence behaviour of the objective function is studied and compared with MOM (momentum equation). It is observed that, with an appropriate tuning of the proportional-integral-derivative (PID) parameters, the rate of convergence is greatly improved and the local minima can be overcome. The integral action also helps in locating a minimum quickly. A guideline is presented to appropriately tune the PID parameters and an “integral suppression scheme” is proposed that effectively uses the PID principles, resulting in faster convergence at a desired minimum.
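A sketch of the PID idea described above, treating the gradient as the proportional term and adding its running sum and its first difference; the gains and the toy quadratic are assumptions for illustration, not the tuning guideline of the paper.

```python
import numpy as np

class PIDGradientUpdate:
    """Gradient descent with proportional, integral, and derivative terms of the
    gradient, as described above. Gains and the toy problem are illustrative."""

    def __init__(self, shape, kp=0.1, ki=0.01, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = np.zeros(shape)   # running sum of gradients (I term)
        self.prev_grad = np.zeros(shape)  # previous gradient (for the D term)

    def step(self, w, grad):
        self.integral += grad
        deriv = grad - self.prev_grad
        self.prev_grad = grad.copy()
        return w - (self.kp * grad + self.ki * self.integral + self.kd * deriv)

# Toy quadratic E(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
opt = PIDGradientUpdate(w.shape)
for _ in range(200):
    w = opt.step(w, grad=w)
print(w)  # approaches the minimum at the origin
```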
Article
Full-text available
This thesis studies the theory of stochastic adaptive computation based on neural networks. A mathematical theory of computation is developed in the framework of information geometry, which generalises Turing machine (TM) computation in three aspects - It can be continuous, stochastic and adaptive - and retains the TM computation as a subclass called "data processing". The concepts of Boltzmann distribution, Gibbs sampler and simulated annealing are formally defined and their interrelationships are studied. The concept of "trainable information processor" (TIP) - parameterised stochastic mapping with a rule to change the parameters - is introduced as an abstraction of neural network models. A mathematical theory of the class of homogeneous semilinear neural networks is developed, which includes most of the commonly studied NN models such as back propagation NN, Boltzmann machine and Hopfield net, and a general scheme is developed to classify the structures, dynamics and learning rules. All the previously known general learning rules are based on gradient following (GF), which are susceptible to local optima in weight space. Contrary to the widely held belief that this is rarely a problem in practice, numerical experiments show that for most non-trivial learning tasks GF learning never converges to a global optimum. To overcome the local optima, simulated annealing is introduced into the learning rule, so that the network retains adequate amount of "global search" in the learning process. Extensive numerical experiments confirm that the network always converges to a global optimum in the weight space. The resulting learning rule is also easier to be implemented and more biologically plausible than back propagation and Boltzmann machine learning rules: Only a scalar needs to be back-propagated for the whole network. Various connectionist models have been proposed in the literature for solving various instances of problems, without a general method by which their merits can be combined. Instead of proposing yet another model, we try to build a modular structure in which each module is basically a TIP. As an extension of simulated annealing to temporal problems, we generalise the theory of dynamic programming and Markov decision process to allow adaptive learning, resulting in a computational system called a "basic adaptive computer", which has the advantage over earlier reinforcement learning systems, such as Sutton's "Dyna", in that it can adapt in a combinatorial environment and still converge to a global optimum. The theories are developed with a universal normalisation scheme for all the learning parameters so that the learning system can be built without prior knowledge of the problems it is to solve.
Article
Neural networks are a group of computer-based pattern recognition methods that have recently been applied to clinical diagnosis and classification. In this study, we applied one type of neural network, the backpropagation network, to the diagnostic classification of giant cell arteritis (GCA). The analysis was performed on the 807 cases in the vasculitis database of the American College of Rheumatology. Classification was based on the 8 clinical criteria previously used for classification of this data set: 1) age > or = 50 years, 2) new localized headache, 3) temporal artery tenderness or decrease in temporal artery pulse, 4) polymyalgia rheumatica, 5) abnormal result on artery biopsy, 6) erythrocyte sedimentation rate > or = 50 mm/hour, 7) scalp tenderness or nodules, and 8) claudication of the jaw, of the tongue, or on swallowing. To avoid overtraining, network training was terminated when the generalization error reached a minimum. True cross-validation classification rates were obtained. Neural networks correctly classified 94.4% of the GCA cases (n = 214) and 91.9% of the other vasculitis cases (n = 593). In comparison, classification trees correctly classified 91.6% of the GCA cases and 93.4% of the other vasculitis cases. Neural nets and classification trees were compared by receiver operating characteristic (ROC) analysis. The ROC curves for the two methods crossed, indicating that the better classification method depended on the choice of decision threshold. At a decision threshold that gave equal costs to percentage increases in false-positive and false-negative results, the methods were not significantly different in their performance (P = 0.45). Neural networks are a potentially useful method for developing diagnostic classification rules from clinical data.
Article
A fast training algorithm is developed for two-layer feedforward neural networks based on a probabilistic model for hidden representations and the EM algorithm. The algorithm decomposes training the original two-layer networks into training a set of single neurons. The individual neurons are then trained via a linear weighted regression algorithm. Significant improvement on training speed has been made using this algorithm for several bench-mark problems. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
Article
Training of artificial neural networks is normally a time consuming task due to iterative search imposed by the implicit nonlinearity of the network behaviour. In this work, three improvements to "batch-mode" offline training methods, gradient-based or gradient-free, are proposed. For nonlinear multilayer perceptrons (NMLP) with linear output layers, a method based on linear regression in the output layer is presented. For arbitrary NMLPs, an algorithm is developed for detecting "saturated" hidden nodes and re-activating them while transferring their contribution onto the bias node in the same layer. For state-feedback NMLPs with incomplete learning data in the state variables, a method is shown that interpolates the unknown state values to form an intermediate training set used for finding good initial weights for the final training with only the original training set. In addition, three conventional gradient-based training methods-steepest-descent gradient search, conjugate gradient, and Gauss-Newton-are compared mutually and with the above improvements on the same two example problems. Where conventional methods get stuck in bad local minima, saturation avoidance leads to satisfactory results, and the speed-up achieved by the two other improvements is about two orders of magnitude.
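A rough illustration of the second improvement (detecting saturated hidden nodes and re-activating them while transferring their contribution onto a bias); thresholds, shapes, and the re-initialization scheme are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def reactivate_saturated(W_in, b_in, W_out, b_out, h_mean,
                         low=0.05, high=0.95, scale=0.1, rng=None):
    """Illustrative version of the 'saturated hidden node' fix described above,
    for one sigmoid hidden layer. A node whose mean activation over the batch
    sits near 0 or 1 is treated as saturated: its (almost constant) output
    contribution is folded into the next layer's bias, its outgoing weights are
    zeroed, and its incoming weights are re-initialized so it can learn again."""
    rng = rng or np.random.default_rng(0)
    for j, hj in enumerate(h_mean):
        if hj < low or hj > high:                    # saturated node detected
            b_out += W_out[:, j] * hj                # transfer contribution to the bias
            W_out[:, j] = 0.0
            W_in[j, :] = rng.normal(scale=scale, size=W_in.shape[1])
            b_in[j] = 0.0                            # fresh small weights re-activate the node
    return W_in, b_in, W_out, b_out
```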
Article
Regression problem is an important application area for neural networks (NNs). Among a large number of existing NN architectures, the feedforward NN (FNN) paradigm is one of the most widely used structures. Although one-hidden-layer feedforward neural networks (OHL-FNNs) have simple structures, they possess interesting representational and learning capabilities. In this paper, we are interested particularly in incremental constructive training of OHL-FNNs. In the proposed incremental constructive training schemes for an OHL-FNN, input-side training and output-side training may be separated in order to reduce the training time. A new technique is proposed to scale the error signal during the constructive learning process to improve the input-side training efficiency and to obtain better generalization performance. Two pruning methods for removing the input-side redundant connections have also been applied. Numerical simulations demonstrate the potential and advantages of the proposed strategies when compared to other existing techniques in the literature.
Conference Paper
Investigates the experimental modeling of the dynamic behavior of a force-acting industrial hydraulic actuator using a neural network (NN). Due to variable environmental stiffness as well as the characteristics of hydraulic components, the dynamics of the system is time-varying and highly nonlinear. It is therefore desirable to develop a nonlinear modeling scheme, based on NNs, to estimate and predict the output of the system online. In this paper, the predictability of an online-trained NN modeling a hydraulic force-acting system is first compared to a linear model. The result demonstrates that the NN outperforms its linear counterpart in terms of multi-step prediction. Then, a more detailed discussion of the online training of the NN is provided. The related aspects include the choice of the window length, the NN's structure and the criterion for terminating the training. The work studied in this paper should help in the design of appropriate force-control law and/or fault diagnosis algorithms
Conference Paper
A sigmoid function has been utilized for the input/output functions of backpropagation-type neural networks. It, however, has a local minimum problem; if the output of the sigmoid function becomes 0 or 1, no further learning occurs even if there are errors between the teaching inputs and the outputs of the output unit. The offset method, which applies offset values to the intermediate-layer cells, is thought to be effective in solving the local minimum problem. In this paper, we propose two formulations of the offset function: the linearly decremental offset function, which decrements the offset value as the learning iterations increase, and the logarithmic error offset function, which varies the offset value according to the logarithm of the output error. The performance of these methods is evaluated by a handwriting recognition test.
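The two offset formulations can be sketched as simple schedules that feed an additive term into the hidden-layer sigmoid derivative; the functional forms and constants below are assumptions for illustration, not the exact definitions in the paper.

```python
import numpy as np

def linear_decremental_offset(epoch, max_epochs, offset0=0.1):
    """First formulation: the offset starts at offset0 and decreases linearly
    to zero as learning iterations accumulate (constants are assumptions)."""
    return offset0 * max(0.0, 1.0 - epoch / max_epochs)

def logarithmic_error_offset(output_error, scale=0.05):
    """Second formulation: the offset grows with the logarithm of the output
    error, so badly wrong but saturated units still receive weight updates."""
    return scale * np.log1p(np.abs(output_error))

# The offset is added to the hidden-layer sigmoid derivative, e.g.
# y * (1 - y) + offset, which is what lets learning continue when y is 0 or 1.
print(linear_decremental_offset(epoch=10, max_epochs=100))   # 0.09
print(logarithmic_error_offset(output_error=0.4))            # ~0.017
```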
Conference Paper
In this paper we present the results of a learning algorithm based on a recently developed algorithm (the so-called feedforward algorithm) in conjunction with structural adaptation. Simulation results of training on random associations and on the two-spirals problem are presented for the proposed algorithm and the backpropagation algorithm.
Conference Paper
This paper presents a design methodology for on-line self-tuning adaptive control (OLSTAC) of a single flexible link manipulator (FLM) using backpropagation neural networks (BPNN). The particular problem discussed is the on-line system identification of an FLM using BPNN and the OLSTAC of an FLM using a separate neural network as a controller. A finite-element model of an FLM is obtained using ANSYS. The pseudo-link concepts developed in [2] are used to determine on-line the angular displacement of the end effector of the FLM. The illustrative simulation results are promising and show that the OLSTAC technique can be applied to flexible structures such as an FLM, resulting in reduced error and increased robustness.
Article
This paper investigates the properties of the so-called feedforward method, which is a very simple training law suitable for on-chip learning. Its merit is conceptual and implementational simplicity. Its signals do not propagate in both directions and it works for various types of activation function, a feature that makes it particularly effective in the case of unmodeled activation functions. Extensive simulation has shown that this method is usually faster than backpropagation
Article
The study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A neural network, the binary diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The binary diamond is a multilayer, feedforward neural network, which learns from examples in unsupervised one-shot mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done using a realistic database, consisting of approximately 90000 Landsat 4 Thematic Mapper pixels. The binary diamond and the nearest neighbor performances were close, with some advantages to the binary diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories and analyzing nonboundary pixels, are addressed and evaluated
Article
Full-text available
Thesis (Ph. D.)--Harvard University, 1975. Includes bibliographical references.
Article
While there exist many techniques for finding the parameters that minimize an error function, only those methods that solely perform local computations are used in connectionist networks. The most popular learning algorithm for connectionist networks is the back-propagation procedure, which can be used to update the weights by the method of steepest descent. In this paper, we examine steepest descent and analyze why it can be slow to converge. We then propose four heuristics for achieving faster rates of convergence while adhering to the locality constraint. These heuristics suggest that every weight of a network should be given its own learning rate and that these rates should be allowed to vary over time. Additionally, the heuristics suggest how the learning rates should be adjusted. Two implementations of these heuristics, namely momentum and an algorithm called the delta-bar-delta rule, are studied and simulation results are presented.
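The delta-bar-delta rule mentioned here gives every weight its own learning rate, raised additively while the current gradient agrees in sign with a decayed average of past gradients and cut multiplicatively when it does not. A compact sketch (the default constants are illustrative, not the values used in the cited study):

```python
import numpy as np

class DeltaBarDelta:
    """Per-weight adaptive learning rates in the delta-bar-delta style."""

    def __init__(self, shape, lr=0.01, kappa=0.001, phi=0.5, theta=0.7):
        self.lr = np.full(shape, lr)   # one learning rate per weight
        self.bar = np.zeros(shape)     # exponential average of past gradients
        self.kappa, self.phi, self.theta = kappa, phi, theta

    def step(self, w, grad):
        agree = self.bar * grad
        self.lr = np.where(agree > 0, self.lr + self.kappa, self.lr)      # additive increase
        self.lr = np.where(agree < 0, self.lr * (1 - self.phi), self.lr)  # multiplicative decrease
        self.bar = (1 - self.theta) * grad + self.theta * self.bar
        return w - self.lr * grad
```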
Samad, T., & Harper, P. (1987). Associative memory storage using a variant of the generalized delta rule. Proceedings of the First IEEE International Conference on Neural Networks, III, 173-184.
Stornetta, W. S., & Huberman, B. A. (1987). An improved three-layer backpropagation algorithm. In M. Caudill & C. Butler (Eds.), Proceedings of the First IEEE International Conference on Neural Networks, II, 637-644.
Werbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University Committee on Applied Mathematics.
Samad, T. Refining and redefining the backpropagation learning rule for connectionist networks.