-
ABSTRACT: This paper develops a new model, a neural network pushdown automaton (NNPDA), which is a hybrid system that couples a recurrent network to an external stack memory. More importantly, an NNPDA should be capable of learning and recognizing some class of context-free grammars. As such, this model is a significant extension of previous work where neural network finite state automata simulated and learned regular grammars. We explore the capabilities of such a model by inferring automata from sample strings, the problem of grammatical inference. It is important to note that our focus is only on inference, not on prediction or translation. We are concerned with the problem of inferring an unknown system model from observed sample strings, not with predicting the next string element in a sequence. In some ways, our problem can be thought of as one of system identification [Ljung87].
11/1998;
-
ABSTRACT: It is often difficult to predict the optimal neural network size for a particular application. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. might offer a solution to this problem. We prove that one method, Recurrent Cascade Correlation, due to its topology, has fundamental limitations in representation and thus in its learning capabilities. With monotone (i.e., sigmoid) and hard-threshold activation functions, it cannot represent certain finite state automata. We give a "preliminary" approach to getting around these limitations by devising a simple constructive training method that adds neurons during training while still preserving the powerful fully-recurrent structure. We illustrate this approach by simulations which learn many examples of regular grammars that the Recurrent Cascade Correlation method is unable to learn.
04/1996;
-
ABSTRACT: The authors propose a model of a time warping recurrent neural
network (TWRNN) to handle temporal pattern classification where severely
time warped and deformed data may occur. This model is shown to have
built-in time warping ability. The authors analyze the properties of
TWRNN and show that for trajectory classification it has several
advantages over such schemes as dynamic programming, hidden Markov
models, time-delayed neural networks, and neural network finite
automata. A numerical example of trajectory classification is presented.
This problem, which features variable sampling rates, internal states,
continuous dynamics, heavily time-warped data, and deformed phase space
trajectories, is shown to be difficult for the other schemes. The TWRNN
learned it easily. The authors also attempted to train a TDNN on the
same problem, without success.
Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
-
ABSTRACT: A dynamic time warping based speech recognition system with neural
network trained templates is proposed. The algorithm for training the
templates is derived based on minimizing classification error of the
speech classifier. A speaker-independent isolated digit recognition
experiment is conducted and achieves a 0.89% average recognition error
rate with only one template for each digit, indicating that the derived
templates are able to capture the speaker-invariant features of speech
signals. Both nondiscriminative and discriminative versions of the
neural net template training algorithm are considered. The former is
based on maximum likelihood estimation. The latter is based on
minimizing classification error. It is demonstrated through experiments
that the discriminative training algorithm is far superior to the
nondiscriminative one, providing both smaller recognition error rate and
greater discrimination power. Experiments using different feature
representation schemes are considered. It is demonstrated that the
combination of the feature vector and the delta feature vector yields
the best recognition result.
Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
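For readers unfamiliar with the matching step that this abstract builds on, the following is a minimal sketch of plain dynamic time warping used for template classification, with one template per class. It does not reproduce the authors' template training algorithm or their feature representation; the function names `dtw_distance` and `classify`, the absolute-difference local cost, and the use of scalar sequences are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers.

    Assumption: local cost is the absolute difference between elements.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sample, templates):
    """Return the label whose single template is closest to the sample.

    `templates` is a hypothetical dict mapping label -> template sequence.
    """
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

In this sketch a sample that is merely a time-stretched copy of a template has zero distance to it, which is the property that makes a single template per class plausible.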
-
ABSTRACT: A discriminative training algorithm for predictive neural network
models is proposed. The algorithm is applied to a speaker independent
isolated digit recognition experiment. The recognition error rate is
reduced from 2.52% when the classifier is trained with a
non-discriminative algorithm to 0.58% when the discriminative algorithm
is applied. The increase in classifier discrimination ability is also
demonstrated.
Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
-
ABSTRACT: It is shown that a recurrent, second-order neural network using a
real-time, feedforward training algorithm readily learns to infer
regular grammars from positive and negative string training samples.
Numerous simulations which show the effect of initial conditions,
training set size and order, and neuron architecture are presented. All
simulations were performed with random initial weight strengths and
usually converge after approximately a hundred epochs of training. The
authors discuss a quantization algorithm for dynamically extracting
finite-state automata during and after training. For a well-trained
neural net, the extracted automata constitute an equivalence class of
state machines that are reducible to the minimal machine of the inferred
grammar. It is then shown through simulations that many of the neural
net state machines are dynamically stable and correctly classify long
unseen strings.
Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on; 08/1991
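The abstract above does not state the network equations. As a rough illustration, a second-order recurrent network is commonly written with a weight tensor that multiplies each state neuron by each input symbol, so that the next state is S_i(t+1) = g(sum over j,k of W[i][j][k] * S_j(t) * I_k(t)). The sketch below assumes that form, a logistic activation, and one-hot input symbols; the names `step` and `run` are hypothetical, and no training or automaton extraction is shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step(weights, state, symbol_onehot):
    """One second-order state update.

    weights[i][j][k] couples next-state neuron i to the product of
    current-state neuron j and input neuron k (assumed form).
    """
    return [
        sigmoid(
            sum(
                weights[i][j][k] * state[j] * symbol_onehot[k]
                for j in range(len(state))
                for k in range(len(symbol_onehot))
            )
        )
        for i in range(len(weights))
    ]


def run(weights, initial_state, string, alphabet_size):
    """Feed a string of integer symbols through the network, return the final state."""
    state = list(initial_state)
    for symbol in string:
        onehot = [1.0 if k == symbol else 0.0 for k in range(alphabet_size)]
        state = step(weights, state, onehot)
    return state
```

One common convention, assumed here rather than taken from the abstract, is to designate one state neuron as the response neuron and accept a string when its final activation exceeds 0.5.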
-
ABSTRACT: In principle, a potentially infinitely large neural network
(either in number of neurons or in the precision of a single neural
activity) could possess an equivalent computational power to a Turing
machine. The authors show such an equivalence of Turing machines to
several explicitly constructed neural networks. It is proven that for
any given Turing machine there exists a recurrent neural network with
local, second-order, and uniformly connected weights (i.e., the weights
connecting the second-order product of local `input neurons' with their
corresponding `output neurons') which can simulate it. The numerical
implementation and learning of such a neural Turing machine are also
discussed.
Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on; 08/1991
-
ABSTRACT: A discussion is presented of the advantage of using a linear
recurrent network to encode and recognize sequential data. The hidden
Markov model (HMM) is shown to be a special case of such linear
recurrent second-order neural networks. The Baum-Welch reestimation
formula, which has proved very useful in training HMM, can also be used
to learn a linear recurrent network. As an example, a network has
successfully learned the stochastic Reber grammar with only a few
hundred sample strings in about 14 iterations. The relative merits and
limitations of the Baum-Welch optimal ascent algorithm in comparison
with the error correction-gradient descent-learning algorithm are
discussed.
Neural Networks, 1990., 1990 IJCNN International Joint Conference on; 07/1990
-
ABSTRACT: A scheme is presented to construct automatically a neural network architecture that takes advantage of both the parallel and sequential strategies to solve a pattern classification or decision problem. The scheme optimizes an entropy measure to train nodes that extract attributes from the training patterns. The sequential extraction of attributes with ranking order could alleviate significantly the scale-up problem of an all-parallel network. Examples of decision-tree problems demonstrate amply the superior performance of this PSIN (parallel sequential induction network) against the usual backpropagation procedure in multilayered networks.
Neural Networks, 1988., IEEE International Conference on; 08/1988
-
ABSTRACT: An adaptive template method for pattern recognition is proposed. The template adaptation algorithm is derived based on minimizing the classification error of the classifier. The authors have applied this method to a multispeaker English E-set recognition experiment and achieved a 90.38% average recognition rate with only one template for each letter. This indicates that the derived templates are able to capture the speaker-invariant features of speech signals.
Neural Networks for Signal Processing [1992] II., Proceedings of the 1992 IEEE-SP Workshop;
-
ABSTRACT: A new technique for speech signal processing called nonlinear
resampling transformation (NRT) is proposed. The representation of a
speech pattern derived from this technique has two important features:
first, it reduces redundancy; second, it effectively removes the
nonlinear variations of speech signals in time. The authors have applied
NRT to the TI isolated-word database, achieving a 99.66% recognition rate
on a 10-digit multi-speaker task for a linear predictive neural net
classifier. In their experiment, the authors have also found that
discriminative training is superior to nondiscriminative training for
linear predictive neural network classifiers.
Neural Networks for Signal Processing [1991]., Proceedings of the 1991 IEEE Workshop;