Y.C. Lee

University of Maryland, College Park, College Park, MD, USA

Publications (11); total impact: 0

  • Article: The Neural Network Pushdown Automaton: Architecture, Dynamics and Training
    ABSTRACT: This paper develops a new model, a neural network pushdown automaton (NNPDA), which is a hybrid system that couples a recurrent network to an external stack memory. More importantly, an NNPDA should be capable of learning and recognizing some class of context-free grammars. As such, this model is a significant extension of previous work in which neural network finite state automata simulated and learned regular grammars. We explore the capabilities of such a model by inferring automata from sample strings, the problem of grammatical inference. It is important to note that our focus is only on inference, not on prediction or translation. We will be concerned with the problem of inferring an unknown system model based on observing sample strings, not with predicting the next string element in a sequence. In some ways, our problem can be thought of as one of system identification [Ljung87].
    11/1998;
  • Article: Constructive Learning of Recurrent Neural Networks: Limitations of Recurrent Cascade Correlation and a Simple Solution
    ABSTRACT: It is often difficult to predict the optimal neural network size for a particular application. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. might offer a solution to this problem. We prove that one method, Recurrent Cascade Correlation, has fundamental limitations in representation, and thus in its learning capabilities, because of its topology. With monotone (i.e., sigmoid) and hard-threshold activation functions, it cannot represent certain finite state automata. We give a "preliminary" approach to getting around these limitations by devising a simple constructive training method that adds neurons during training while still preserving the powerful fully-recurrent structure. We illustrate this approach with simulations that learn many examples of regular grammars that the Recurrent Cascade Correlation method is unable to learn.
    04/1996;
  • Conference Proceeding: Time warping recurrent neural networks and trajectory classification
    ABSTRACT: The authors propose a model of a time warping recurrent neural network (TWRNN) to handle temporal pattern classification where severely time-warped and deformed data may occur. This model is shown to have built-in time warping ability. The authors analyze the properties of the TWRNN and show that for trajectory classification it has several advantages over such schemes as dynamic programming, hidden Markov models, time-delayed neural networks (TDNN), and neural network finite automata. A numerical example of trajectory classification is presented. This problem, which features variable sampling rates, internal states, continuous dynamics, heavily time-warped data, and deformed phase space trajectories, is shown to be difficult for the other schemes. The TWRNN learned it easily. The authors also attempted the same problem with a TDNN, and that training failed.
    Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
  • Conference Proceeding: Speech recognition using dynamic time warping with neural network trained templates
    Y. Liu, Y.-C. Lee, H.-H. Chen, G.-Z. Sun
    ABSTRACT: A dynamic time warping based speech recognition system with neural network trained templates is proposed. The algorithm for training the templates is derived based on minimizing classification error of the speech classifier. A speaker-independent isolated digit recognition experiment is conducted and achieves a 0.89% average recognition error rate with only one template for each digit, indicating that the derived templates are able to capture the speaker-invariant features of speech signals. Both nondiscriminative and discriminative versions of the neural net template training algorithm are considered. The former is based on maximum likelihood estimation. The latter is based on minimizing classification error. It is demonstrated through experiments that the discriminative training algorithm is far superior to the nondiscriminative one, providing both smaller recognition error rate and greater discrimination power. Experiments using different feature representation schemes are considered. It is demonstrated that the combination of the feature vector and the delta feature vector yields the best recognition result
    Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
  • Conference Proceeding: Discriminative training algorithm for predictive neural network models
    Y. Liu, Y.-C. Lee, H.-H. Chen, G.-Z. Sun
    ABSTRACT: A discriminative training algorithm for predictive neural network models is proposed. The algorithm is applied to a speaker-independent isolated digit recognition experiment. The recognition error rate is reduced from 2.52%, when the classifier is trained with a non-discriminative algorithm, to 0.58% when the discriminative algorithm is applied. The increase in classifier discrimination ability is also demonstrated.
    Neural Networks, 1992. IJCNN., International Joint Conference on; 07/1992
  • Conference Proceeding: Second-order recurrent neural networks for grammatical inference
    ABSTRACT: It is shown that a recurrent, second-order neural network using a real-time, feedforward training algorithm readily learns to infer regular grammars from positive and negative string training samples. Numerous simulations which show the effect of initial conditions, training set size and order, and neuron architecture are presented. All simulations were performed with random initial weight strengths and usually converge after approximately a hundred epochs of training. The authors discuss a quantization algorithm for dynamically extracting finite-state automata during and after training. For a well-trained neural net, the extracted automata constitute an equivalence class of state machines that are reducible to the minimal machine of the inferred grammar. It is then shown through simulations that many of the neural net state machines are dynamically stable and correctly classify long unseen strings
    Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on; 08/1991
  • Conference Proceeding: Turing equivalence of neural networks with second order connection weights
    G.-Z. Sun, H.-H. Chen, Y.-C. Lee
    ABSTRACT: In principle, a potentially infinitely large neural network (either in number of neurons or in the precision of a single neural activity) could possess an equivalent computational power to a Turing machine. The authors show such an equivalence of Turing machines to several explicitly constructed neural networks. It is proven that for any given Turing machine there exists a recurrent neural network with local, second-order, and uniformly connected weights (i.e., the weights connecting the second-order product of local 'input neurons' with their corresponding 'output neurons') which can simulate it. The numerical implementation and learning of such a neural Turing machine are also discussed.
    Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on; 08/1991
  • Conference Proceeding: Recurrent neural networks, hidden Markov models and stochastic grammars
    ABSTRACT: A discussion is presented of the advantage of using a linear recurrent network to encode and recognize sequential data. The hidden Markov model (HMM) is shown to be a special case of such linear recurrent second-order neural networks. The Baum-Welch reestimation formula, which has proved very useful in training HMM, can also be used to learn a linear recurrent network. As an example, a network has successfully learned the stochastic Reber grammar with only a few hundred sample strings in about 14 iterations. The relative merits and limitations of the Baum-Welch optimal ascent algorithm in comparison with the error correction-gradient descent-learning algorithm are discussed
    Neural Networks, 1990., 1990 IJCNN International Joint Conference on; 07/1990
  • Conference Proceeding: Parallel sequential induction networks: a new paradigm of neural network architecture
    G.Z. Sun, H.H. Chen, Y.C. Lee
    ABSTRACT: A scheme is presented to construct automatically a neural network architecture that takes advantage of both the parallel and sequential strategies to solve a pattern classification or decision problem. The scheme optimizes an entropy measure to train nodes that extract attributes from the training patterns. The sequential extraction of attributes with ranking order could alleviate significantly the scale-up problem of an all parallel network. Examples of decision-tree problems demonstrate amply the superior performance of this PSIN (parallel sequential induction network) against the usual backpropagation procedure in multilayered networks.
    Neural Networks, 1988., IEEE International Conference on; 08/1988
  • Conference Proceeding: Adaptive template method for speech recognition
    Y. Liu, Y.-C. Lee, H.-H. Chen, G.-Z. Sun
    ABSTRACT: An adaptive template method for pattern recognition is proposed. The template adaptation algorithm is derived based on minimizing the classification error of the classifier. The authors have applied this method to a multispeaker English E-set recognition experiment and achieved a 90.38% average recognition rate with only one template for each letter. This indicates that the derived templates are able to capture the speaker-invariant features of speech signals
    Neural Networks for Signal Processing [1992] II., Proceedings of the 1992 IEEE-SP Workshop;
  • Conference Proceeding: Nonlinear resampling transformation for automatic speech recognition
    ABSTRACT: A new technique for speech signal processing called nonlinear resampling transformation (NRT) is proposed. The representation of a speech pattern derived from this technique has two important features: first, it reduces redundancy; second, it effectively removes the nonlinear variations of speech signals in time. The authors have applied NRT to the TI isolated-word database, achieving a 99.66% recognition rate on a 10-digit multi-speaker task with a linear predictive neural net classifier. In their experiment, the authors have also found that discriminative training is superior to nondiscriminative training for linear predictive neural network classifiers.
    Neural Networks for Signal Processing [1991]., Proceedings of the 1991 IEEE Workshop;

Institutions

  • 1988–1992
    • University of Maryland, College Park
      • Department of Physics
      • Institute for Advanced Computer Studies
      College Park, MD, USA