-
ABSTRACT: Injecting weight noise during training has been a simple strategy to improve the fault tolerance of multilayer perceptrons (MLPs) for almost two decades, and several online training algorithms have been proposed in this regard. However, there are some misconceptions about the objective functions being minimized by these algorithms. Some existing results misinterpret that the prediction error of a trained MLP affected by weight noise is equivalent to the objective function of a weight noise injection algorithm. In this brief, we would like to clarify these misconceptions. Two weight noise injection scenarios will be considered: one is based on additive weight noise injection and the other is based on multiplicative weight noise injection. To avoid the misconceptions, we use their mean updating equations to analyze the objective functions. For injecting additive weight noise during training, we show that the true objective function is identical to the prediction error of a faulty MLP whose weights are affected by additive weight noise. It consists of the conventional mean square error and a smoothing regularizer. For injecting multiplicative weight noise during training, we show that the objective function is different from the prediction error of a faulty MLP whose weights are affected by multiplicative weight noise. With our results, some existing misconceptions regarding MLP training with weight noise injection can now be resolved.
IEEE Transactions on Neural Networks 03/2011; · 2.95 Impact Factor
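The two injection scenarios named in the abstract above can be illustrated with a minimal sketch. It only shows how a weight vector is perturbed before an update step, not the authors' analysis. The function name `perturb_weights`, the zero-mean Gaussian noise, and the parameter `sigma` are assumptions made for illustration; the abstract does not specify the noise distribution.

```python
import random


def perturb_weights(weights, sigma, mode, rng):
    """Return a noisy copy of `weights` (hypothetical helper).

    additive:        w_i + b_i
    multiplicative:  w_i + b_i * w_i
    where b_i is drawn independently from a zero-mean Gaussian with
    standard deviation `sigma` (an assumed noise model).
    """
    if mode not in ("additive", "multiplicative"):
        raise ValueError("mode must be 'additive' or 'multiplicative'")
    noisy = []
    for w in weights:
        b = rng.gauss(0.0, sigma)
        if mode == "additive":
            noisy.append(w + b)
        else:
            # Noise scales with the weight, so a zero weight stays zero.
            noisy.append(w + b * w)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    w = [0.8, -1.5, 0.0]
    print(perturb_weights(w, 0.1, "additive", rng))
    print(perturb_weights(w, 0.1, "multiplicative", rng))
```

In an injection-based training loop, the network output and gradient would be evaluated at the perturbed weights at each step while the update is applied to the clean weights; that loop is omitted here.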
-
ABSTRACT: Improving the fault tolerance of a neural network is an important issue that has been studied for more than two decades. Various algorithms have been proposed since, and many of them have succeeded in attaining a fault tolerant neural network. On-line node fault injection-based algorithms are one class of these algorithms. Despite their simple implementation, theoretical analyses of these algorithms are far from complete. In this paper, an on-line node fault injection training algorithm is studied. In node fault injection training, we assume that the hidden nodes are random neurons whose outputs can be zero in a random manner. So, in each update step, we randomly set the hidden outputs to zero. The network output and the gradient vector are calculated with these zero-output hidden nodes, and the standard online weight update algorithm is then applied to update the weight vector. The corresponding objective function is derived and the convergence of the algorithm is proved. By a theorem from H. White, we show that the weight vector obtained by this algorithm converges with probability one to a local minimum of the objective function derived.
Technologies and Applications of Artificial Intelligence (TAAI), 2010 International Conference on; 12/2010
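The update step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a single hidden layer with tanh activations, one linear output, squared error, and independent faults with probability `p_fault` are all assumed, and the names `train_step`, `W`, `b`, `v`, and `lr` are hypothetical.

```python
import math
import random


def train_step(x, t, W, b, v, p_fault, lr, rng):
    """One online update with randomly zeroed hidden outputs.

    W[j] holds the input weights of hidden node j, b[j] its bias and v[j]
    its output weight. Parameters are modified in place; the error measured
    on the faulty network is returned.
    """
    n_hidden = len(v)
    # Sample which hidden nodes are forced to output zero for this step only.
    mask = [0.0 if rng.random() < p_fault else 1.0 for _ in range(n_hidden)]
    act = [
        math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(W[j], x)) + b[j])
        for j in range(n_hidden)
    ]
    h = [m * a for m, a in zip(mask, act)]
    y = sum(v_j * h_j for v_j, h_j in zip(v, h))
    err = t - y
    for j in range(n_hidden):
        # The gradient through node j vanishes when its output was zeroed.
        delta = err * v[j] * mask[j] * (1.0 - act[j] ** 2)
        v[j] += lr * err * h[j]
        for i in range(len(x)):
            W[j][i] += lr * delta * x[i]
        b[j] += lr * delta
    return err


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[0.5], [-0.3]]
    b = [0.0, 0.0]
    v = [0.2, 0.1]
    # Toy data (an assumption): learn t = sin(x) from random samples.
    for _ in range(2000):
        x = rng.uniform(-1.0, 1.0)
        train_step([x], math.sin(x), W, b, v, p_fault=0.2, lr=0.05, rng=rng)
    print(W, b, v)
```

Setting `p_fault` to 0 reduces the step to ordinary online gradient descent, which makes the effect of the injected faults easy to compare.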
-
ABSTRACT: In the last two decades, many online fault/noise injection algorithms have been developed to attain a fault tolerant neural network. However, not much theoretical work related to their convergence and objective functions has been reported. This paper studies six common fault/noise-injection-based online learning algorithms for radial basis function (RBF) networks, namely 1) injecting additive input noise, 2) injecting additive/multiplicative weight noise, 3) injecting multiplicative node noise, 4) injecting multiweight fault (random disconnection of weights), 5) injecting multinode fault during training, and 6) weight decay with injecting multinode fault. Based on the Gladyshev theorem, we show that the convergence of these six online algorithms is almost sure. Moreover, their true objective functions being minimized are derived. For injecting additive input noise during training, the objective function is identical to that of the Tikhonov regularizer approach. For injecting additive/multiplicative weight noise during training, the objective function is the simple mean square training error. Thus, injecting additive/multiplicative weight noise during training cannot improve the fault tolerance of an RBF network. Similar to injecting additive input noise, the objective functions of the other fault/noise-injection-based online algorithms contain a mean square error term and a specialized regularization term.
IEEE Transactions on Neural Networks 07/2010; · 2.95 Impact Factor
-
ABSTRACT: This paper presents a model to generate and study Gnutella topology from an original point of view. Instead of using characteristics of the final topology, the network is constructively created from scratch and its connectedness is studied by simulation. As the resultant topology has the same node degree distribution as that measured from the true Gnutella, and a virus outbreak simulation has shown that the network is not connected, it is argued that the true Gnutella might not be a connected network. To improve the connectivity of the model, a modification of the connection mechanism is proposed and the topological change of the network is studied by simulation. Although the node degree distribution of the resultant topology deviates from the measurement results, this new connection mechanism can indeed improve the connectedness of the network, as confirmed by the virus outbreak simulation.
International Journal of Computers and Applications 01/2008; 30.
-
ABSTRACT: In this paper, an objective function for training a fault tolerant neural network is derived based on the idea of Kullback-Leibler (KL) divergence. The new objective function is then applied to a radial basis function (RBF) network that is subject to multiplicative weight noise. Simulation results have demonstrated that the RBF network trained in accordance with the new objective function has better fault tolerance than the one trained by explicit regularization. As KL divergence is related to Bayesian learning, the relation between the proposed objective function and other Bayesian type objective functions is discussed.
TENCON 2007 - 2007 IEEE Region 10 Conference; 12/2007
-
ABSTRACT: The error sensitivity measure is a commonly used factor for searching the optimal structure of a neural network. Starting with the derivation of a recursive equation for the update of a reduced order parametric vector based on the full order parametric vector, the error sensitivity measure for use in linear regressor and RBF network pruning is re-derived, and an approximated error sensitivity measure identical to that proposed in optimal brain damage is obtained. Considering that the training is accomplished by the recursive least square method, an on-line training-pruning algorithm is proposed.
Machine Learning and Cybernetics, 2003 International Conference on; 12/2003
-
ABSTRACT: Bluff is a liar dice game that is quite popular in China and Hong Kong. As the rules of the game are simple and the game can be played by two to six people, it has become part of the subculture of local pubs and clubs. This paper introduces and formulates this dice game (in 2-person game form) as a kind of decision making problem. A risk-averse heuristic algorithm for playing the game is proposed. Preliminary testing results indicate that this computer game program already shows minimal intelligence.
Machine Learning and Cybernetics, 2003 International Conference on; 12/2003
-
J. Sum
ABSTRACT: This paper presents two SOM-like algorithms that are extended from two alternative soft competition algorithms namely maximum likelihood competitive learning (MLCL) and fuzzy competitive learning (FCL). Simulation results on the topographic map formation are presented and a possible application of such algorithms for data transmission is elucidated. It is observed that under certain circumstances, the performance of these SSOM algorithms in a vowel data transmission problem can be comparable to and sometimes even better than that of using SOM.
Systems, Man and Cybernetics, 2003. IEEE International Conference on; 11/2003
-
ABSTRACT: Ant routing is a method for network routing in agent technology. Although its effectiveness and efficiency have been demonstrated and reported in the literature, its properties have not yet been well studied. This paper presents some preliminary analysis on an ant algorithm in regard to its population growing property and jumping behavior. Results conclude that as long as the value max, {iΩ<sub>j</sub>|} is known, the practitioner is able to design the algorithm parameters, such as the number of agents being created for each request, k, and the maximum allowable number of jumps of an agent, in order to meet the network constraint.
IEEE Transactions on Parallel and Distributed Systems 04/2003; 14(3):193- 202. · 1.40 Impact Factor
-
J. Sum
ABSTRACT: Owing to the computational complexity requirement, pruning a fully connected recurrent neural network (RNN) would be ineffective for large RNNs. In this paper, several non-heuristic pruning algorithms for fully connected RNNs are investigated; some are extended from extended Kalman filter based approaches and some are based on weight magnitude, together with some techniques for the pruning procedure. Their effectiveness, in terms of computational complexity, network size and generalization ability, is evaluated and presented. This paper presents the issue of computational complexity.
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on; 12/2002
-
ABSTRACT: The training of neural networks using the extended Kalman filter (EKF) algorithm is plagued by the drawback of high computational complexity and storage requirement that may become prohibitive even for networks of moderate size. In this paper, we present a local EKF training and pruning approach that can solve this problem. In particular, the by-products obtained along with the local EKF training can be utilized to measure the importance of the network weights. Compared with the original global approach, the proposed local EKF training and pruning approach results in a much lower computational complexity and storage requirement. Hence, it is more practical in solving real world problems. The performance of the proposed algorithm is demonstrated on one medium-scale and one large-scale problem, namely, sunspot data prediction and handwritten digit recognition.
International Journal of Neural Systems 01/2001; 10(6):425-38. · 4.28 Impact Factor
-
ABSTRACT: The regularization of employing the forgetting recursive least square (FRLS) training technique on feedforward neural networks is studied. We derive our result from the corresponding equations for the expected prediction error and the expected training error. By comparing these error equations with other equations obtained previously from the weight decay method, we have found that the FRLS technique has an effect which is identical to that of using the simple weight decay method. This new finding suggests that the FRLS technique is another online approach for the realization of the weight decay effect. Besides, we have shown that, under certain conditions, both the model complexity and the expected prediction error of the model being trained by the FRLS technique are better than the one trained by the standard RLS method.
IEEE Transactions on Neural Networks 12/1999; · 2.95 Impact Factor
-
ABSTRACT: Pruning a neural network to a reasonably smaller size, and if possible to give a better generalization, has long been investigated. Conventionally, the common technique of pruning is based on an error sensitivity measure, and the nature of the problem being solved is usually stationary. In this article, we present an adaptive pruning algorithm for use in a nonstationary environment. The idea relies on the use of the extended Kalman filter (EKF) training method. Since EKF is a recursive Bayesian algorithm, we define a weight-importance measure in terms of the sensitivity of the a posteriori probability. Making use of this new measure and the adaptive nature of EKF, we devise an adaptive pruning algorithm called adaptive Bayesian pruning. Simulation results indicate that in a noisy nonstationary environment, the proposed pruning algorithm is able to remove network redundancy adaptively and yet preserve the same generalization ability.
Neural Computation 06/1999; 11(4):965-76. · 1.88 Impact Factor
-
ABSTRACT: In the use of the extended Kalman filter approach in training and pruning a feedforward neural network, one usually encounters the problems of how to set the initial condition and how to use the result obtained to prune a neural network. In this paper, some cues on the setting of the initial condition are presented and illustrated with a simple example. Then, based on three assumptions: 1) the size of the training set is large enough; 2) the training is able to converge; and 3) the trained network model is close to the actual one, an elegant equation linking the error sensitivity measure (the saliency) and the result obtained via the extended Kalman filter is devised. The validity of the devised equation is then verified by a simulated example.
IEEE Transactions on Neural Networks 02/1999; 10(1):161-6. · 2.95 Impact Factor
-
ABSTRACT: This paper presents an algorithm to form a topographic map resembling the self-organizing map. The idea stems from defining an energy function which reveals the local correlation between neighboring neurons. The larger the value of the energy function, the higher the correlation of the neighboring neurons. On this account, the proposed algorithm is defined as the gradient ascent of this energy function. Simulations on two-dimensional maps are illustrated.
IEEE Transactions on Neural Networks 10/1997; · 2.95 Impact Factor
-
ABSTRACT: In this paper, we investigate the attraction basin of the bidirectional associative memory (BAM) model. The BAM is a two-layer heteroassociator that stores a prescribed set of bipolar library pairs. It consists of two layers of neurons. One layer has n neurons and the other has p neurons. We will first point out why the conventional energy approach cannot tell us about the attraction basin of each library pair. We then rigorously derive the statistical dynamics of the BAM, which shows how the upper bound on the number of errors changes during recalling for an arbitrary error pattern in the initial state. From the dynamics, we can estimate the attraction basin for the worst case errors, as well as the memory capacity and the number of errors in the retrieved pairs. The memory capacity is alpha rn, where alpha r (0 < alpha r < 1) depends on the ratio [formula: see text]. The number of errors in the retrieved pairs is [formula: see text] when the number of library pairs is alpha n. When r = 1, the lower bound on the attraction basin for the worst case errors is about 0.0068n.
International Journal of Neural Systems 01/1997; 7(6):715-25. · 4.28 Impact Factor
-
ABSTRACT: This paper presents an alternative membership function for fuzzy c-mean. According to this membership function and Bezdek's definition, we derive two sequential algorithms for fuzzy c-mean. Both of them are stochastic gradient descent algorithms which minimize Bezdek's objective functional. Analytical results indicate that both algorithms are actually compatible with each other. The convergence properties of both algorithms are studied. As the update equations are so simple, these sequential algorithms are embedded into a neural network to form a class of fuzzy neural networks analogous to unsupervised neural networks, such that competitive learning is a special case.
Fuzzy Systems, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the Third IEEE Conference on; 07/1994
-
ABSTRACT: Analyzes the convergence property of the one-dimensional self-organizing map (SOM). The key to the proof is the application of Ljung's theorem [1977]. With the aid of the theorem, the authors can conclude that convergence of the one-dimensional self-organizing map is almost certain if the following conditions are fulfilled: (i) the map is initially ordered, (ii) the neighborhood interacting function (NIF) is non-increasing outward throughout the neighborhood interacting set (NIS) and (iii) the input distribution is stationary. Note that these conditions are less restrictive than those obtained previously in two respects: (i) there is no limit on the size of the NIS and (ii) the input distribution is not required to be uniform.
Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on; 05/1994
-
ABSTRACT: This paper presents the learning mechanism of a fuzzy neural network model, the fuzzy self-organizing map (FSOM). The analysis of the convergence of the learning mechanism is elucidated. When the dimension of the input data is one, we can prove that the convergence of the learning mechanism is almost sure. When the input data dimension is higher than one, the mechanism fulfils only the necessary condition for convergence. Simulation results are given to illustrate the model.
Neural Networks, 1994. IEEE World Congress on Computational Intelligence., 1994 IEEE International Conference on;