# David E. Rumelhart's research while affiliated with Stanford University and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.


## Publications (104)

Fourteen rhesus monkeys and two human Os were trained to discriminate between identical blocks of wood placed 13 in apart, using cues that were provided by a pointer that was placed at random in positions spaced 1.0 in apart between the manipulanda. Monkeys made increasingly more errors as a function of increasing distance between the manipulandum...

- multiple simultaneous constraints
- parallel distributed processing [PDP] / examples of PDP models
- representation and learning in PDP models
- origins of parallel distributed processing

(PsycINFO Database Record (c) 2012 APA, all rights reserved)

We discuss the development of a neural network for facial expression recognition. It aims at recognizing and interpreting facial expressions in terms of signaled emotions and level of expressiveness. We use the backpropagation algorithm to train the system to differentiate between facial expressions. We show how the network generalizes to new faces...

In this paper we present a hybrid multilayer perceptron (MLP)/hidden Markov model (HMM) speaker-independent continuous-speech recognition system, in which the advantages of both approaches are combined by using MLPs to estimate the state-dependent observation probabilities of an HMM. New MLP architectures and training procedures are presented w...

We present a speaker-independent, continuous-speech recognition system based on a hybrid multilayer perceptron (MLP)/hidden Markov model (HMM). The system combines the advantages of both approaches by using MLPs to estimate the state-dependent observation probabilities of an HMM. New MLP architectures and training procedures are presented that...

Earlier hybrid multilayer perceptron (MLP)/hidden Markov model (HMM) continuous speech recognition systems have not modeled context-dependent phonetic effects, sequences of distributions for phonetic models, or gender-based speech consistencies. In this paper we present a new MLP architecture and training procedure for modeling context-depende...

In this paper we present a training method and a network architecture for estimating context-dependent observation probabilities in the framework of a hybrid hidden Markov model (HMM)/multilayer perceptron (MLP) speaker-independent continuous speech recognition system. The context-dependent modeling approach we present here computes the HMM conte...

We describe a technique for mapping out human somatosensory cortex using functional magnetic resonance imaging (fMRI). To produce cortical activation, a pneumatic apparatus presented subjects with a periodic series of air puffs in which a sliding window of five locations moved along the ventral surface of the left arm in a proximal-to-distal or dis...

Internal models of the environment have an important role to play in adaptive systems in general and are of particular importance for the supervised learning paradigm. In this paper we demonstrate that certain classical problems associated with the notion of the "teacher" in supervised learning can be solved by judicious use of learned internal mod...

morphic Extensions to the Relational Model. PhD dissertation, The University of Iowa, Dept. of Computer Science, August 1989. D. Eichmann. A hybrid approach to software repository retrieval: Blending faceted classification and type signatures. In Third International Conference of Software Engineering and Knowledge Engineering, pages 236-240, Skokie...



Interest in the study of neural networks has grown remarkably in the last several years. This effort has been characterized in a variety of ways: as the study of brain-style computation, connectionist architectures, parallel distributed-processing systems, neuromorphic computation, artificial neural systems. The common theme to these efforts has be...

Just four years ago, the only widely reported commercial application of neural network technology outside the financial industry was the airport baggage explosive detection system developed at Science Applications International Corporation (SAIC). Since that time scores of industrial and commercial applications have come into use, but the details o...

An optimal control theory of story comprehension and recall is proposed within the framework of a “situation”‐state space. A point in situation‐state space is specified by a collection of propositions, each of which can have the values of either “present” or “absent.” A trajectory in situation‐state space is a temporally ordered sequence of situati...

We present a neural network algorithm that simultaneously performs segmentation and recognition of input patterns, self-organizing to detect input pattern locations and pattern boundaries. We outline the algorithm, demonstrate this neural network architecture and algorithm on character recognition using the NIST database, and report results he...


The authors show how the effective number of parameters changes during backpropagation training by analyzing the eigenvalue spectra of the covariance matrix of hidden unit activations and of the matrix of weights between inputs and hidden units. They use the standard example of time series prediction of the sunspot series. The effective ranks of th...

Inspired by the information theoretic idea of minimum description length, the authors add a term to the backpropagation cost function that penalizes network complexity. The authors give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayes perspective, the co...

Inspired by the information theoretic idea of minimum description length, we add a term to the usual back-propagation cost function that penalizes network complexity. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about prior distribution of the weights. This method, called weight-elimination, is contr...
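
The complexity term described above takes the form lambda * sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2) in the published weight-elimination method. A minimal sketch (the parameter values below are illustrative, not taken from the paper):

```python
def weight_elimination_penalty(weights, w0=1.0, lam=0.01):
    """Weight-elimination complexity penalty:
    lam * sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2).
    Weights much smaller than the scale w0 contribute almost nothing,
    while weights much larger than w0 each contribute close to lam,
    so the term approximates a count of significant weights."""
    return lam * sum((w / w0) ** 2 / (1.0 + (w / w0) ** 2) for w in weights)

# A near-zero weight is nearly free; a large weight costs roughly lam.
small = weight_elimination_penalty([0.001], w0=1.0, lam=1.0)
large = weight_elimination_penalty([10.0], w0=1.0, lam=1.0)
```

In training, this penalty is added to the usual error term, so gradient descent trades data fit against the number of effective weights.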


We have designed a feed-forward neural network to classify low-resolution mass spectra of unknown compounds according to the presence or absence of 100 organic substructures. The neural network, MSnet, was trained to compute a maximum-likelihood estimate of the probability that each substructure is present. We discuss some design considerations and...

mass spectral classification; structure elucidation; neural networks; back propagation We have designed a feed-forward neural network to classify low-resolution mass spectra of unknown compounds according to the presence or absence of 100 organic substructures. The neural network, MSnet, was trained to compute a maximum-likelihood estimate of the p...

We investigate the effectiveness of connectionist architectures for predicting the future behavior of nonlinear dynamical systems. We focus on real-world time series of limited record length. Two examples are analyzed: the benchmark sunspot series and chaotic data from a computational ecosystem. The problem of overfitting, particularly serious for...

This chapter reviews and examines a variant type of computational unit which we have recently proposed for use in multi-layer neural networks [3]. Instead of the output of this unit depending on a weighted sum of the inputs, it depends on a weighted product. In justifying the introduction of a new type of unit we explore at some length the rational...

We introduce a new form of computational unit for feedforward learning networks of the backpropagation type. Instead of calculating a weighted sum this unit calculates a weighted product, where each input is raised to a power determined by a variable weight. Such a unit can learn an arbitrary polynomial term, which would then feed into higher level...
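
A product unit as described computes prod_i x_i^(w_i) = exp(sum_i w_i * ln(x_i)), rather than a weighted sum. A tiny sketch, restricted to positive inputs for simplicity (an assumption made here, not the general formulation):

```python
import math

def product_unit(inputs, weights):
    """Weighted product: each input is raised to a power given by its
    (learnable) weight and the results are multiplied. Computed as
    exp(sum_i w_i * ln(x_i)), so the unit can represent an arbitrary
    monomial such as x1**2 * x2**-1 with a single unit."""
    return math.exp(sum(w * math.log(x) for x, w in zip(inputs, weights)))

# With weights (2, -1) the unit computes x1**2 / x2:
value = product_unit([3.0, 2.0], [2.0, -1.0])  # 9 / 2
```

Because the exponents are weights, a gradient-based procedure can learn which polynomial term the unit should compute.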

This article presents a simulation-based tutorial system for exploring parallel distributed processing (PDP) models of information processing. The system consists of software and an accompanying handbook. The intent of the package is to make the ideas underlying PDP accessible and to disseminate some of the main simulation programs that we have dev...

We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal 'hidden' u...
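
The procedure described above adjusts weights to descend the output-error surface, with hidden-unit error terms obtained by propagating the output error backwards. A minimal pure-Python sketch on XOR (the network size, learning rate, and epoch count are illustrative choices, not the paper's experiments):

```python
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-2-1 network; the third weight in each row acts as a bias.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    return h, y

def train_epoch(lr=0.5):
    total = 0.0
    for x, t in data:
        h, y = forward(x)
        total += 0.5 * (t - y) ** 2
        # delta at the output unit, then propagated back to the hidden layer
        d_out = (y - t) * y * (1 - y)
        d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(2)]
        w_out[0] -= lr * d_out * h[0]
        w_out[1] -= lr * d_out * h[1]
        w_out[2] -= lr * d_out
        for j in range(2):
            w_hidden[j][0] -= lr * d_hid[j] * x[0]
            w_hidden[j][1] -= lr * d_hid[j] * x[1]
            w_hidden[j][2] -= lr * d_hid[j]
    return total

before = train_epoch()
for _ in range(2000):
    last = train_epoch()
# the summed squared error typically decreases as the hidden units
# come to represent useful internal features of the task
```

XOR is the classic case requiring hidden units: no single-layer weighting of the raw inputs can separate the classes, but the backpropagated deltas let the hidden layer discover an intermediate representation.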

This paper presents a generalization of the perceptron learning procedure for learning the correct sets of connections for arbitrary networks. The rule, called the generalized delta rule, is a simple scheme for implementing a gradient descent method for finding weights that minimize the sum squared error of the system's performance. The major theore...

We describe a distributed model of information processing and memory and apply it to the representation of general and specific information. The model consists of a large number of simple processing elements which send excitatory and inhibitory signals to each other via modifiable connections. Information processing is thought of as the process whe...

Responds to D. Broadbent's (see record 1986-08237-001 ) comments on the present 2nd and 1st authors' (see record 1986-08244-001 ) article on distributed memory. Broadbent concedes that the present authors are probably correct in supposing that memory representations are distributed but argues that psychological evidence is irrelevant to the present...


This paper reports the results of our studies with an unsupervised learning paradigm which we have called “Competitive Learning.” We have examined competitive learning using both computer simulation and formal analysis and have found that when it is applied to parallel networks of neuron-like elements, many potentially useful learning tasks can be...
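
The winner-take-all scheme described above can be sketched as follows; the cluster data, unit count, learning rate, and random seed are illustrative assumptions, not taken from the paper:

```python
import random

random.seed(1)

def competitive_learning(patterns, n_units=2, lr=0.2, epochs=50):
    """Minimal competitive learning: for each input, the units compete,
    the unit whose weight vector is nearest wins, and only the winner
    moves its weight vector toward that input (winner-take-all)."""
    dim = len(patterns[0])
    w = [[random.random() for _ in range(dim)] for _ in range(n_units)]
    for _ in range(epochs):
        for x in patterns:
            # the winner is the unit closest (squared Euclidean) to the input
            dists = [sum((wi - xi) ** 2 for wi, xi in zip(unit, x)) for unit in w]
            win = dists.index(min(dists))
            # move only the winning unit toward the input pattern
            w[win] = [wi + lr * (xi - wi) for wi, xi in zip(w[win], x)]
    return w

# two obvious clusters near (0, 0) and (1, 1); each unit should
# specialize on one cluster without any supervision
pats = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.9), (0.9, 1.0)]
units = competitive_learning(pats)
```

After training, each unit's weight vector sits near the mean of the cluster it has come to respond to, which is the sense in which the units discover categories in the input.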

A common terminology is essential when working in any area, and the study of typing is no exception. To aid ourselves and others, we have compiled a glossary of basic definitions useful in the description of the phenomena of typing. The glossary, which also contains a categorization of errors, has proved useful in several ways. Not only does it kee...

The study of typing comprises a fascinating mixture of elements from motor skills, typewriter mechanics, anatomy, and cognitive control structures. Our research group initially started to study typing because it seemed an ideal example of highly skilled performance, with readily available experimental subjects and, with the advent of computer-contr...

We review the major phenomena of skilled typing and propose a model for the control of the hands and fingers during typing. The model is based upon an Activation-Trigger-Schema system in which a hierarchical structure of schemata directs the selection of the letters to be typed and, then, controls the hand and finger movements by a cooperative, rel...

The interactive activation model of context effects in letter perception is reviewed, elaborated, and tested. According to the model, context aids the perception of target letters as they are processed in the perceptual system. The implication that the duration and timing of the context in which a letter occurs should greatly influence the percepti...

Describes a model in which perception results from excitatory and inhibitory interactions of detectors for visual features, letters, and words. A visual input excites detectors for visual features in the display and for letters consistent with the active features. Letter detectors in turn excite detectors for consistent words. It is suggested that...

This report is the first part of a two-part series introducing an interactive activation model of context effects in perception. In this part, the model for the perception of letters in words and other contexts is developed and applied to a number of experiments in the recent literature. The model is used to account for the perceptual advantage for...

Learning is not a simple unitary process. This paper identifies three qualitatively different phases of the learning process. In one phase, the learner acquires facts and information, accumulating more structures onto the already existing knowledge structures. This phase of learning is adequate only when the material being learned is part of a prev...

Describes development of a model for the recognition of tachistoscopically presented words. It is a "sophisticated guessing" model which takes explicit account of the geometry of the characters which make up the words or letter strings. Explicit attempts are made to account for word frequency effects, effects due to letter transition probabilities,...

A theory of analogical reasoning is proposed in which the elements of a set of concepts, e.g., animals, are represented as points in a multidimensional Euclidean space. Four elements A,B,C,D, are in an analogical relationship A:B::C:D if the vector distance from A to B is the same as that from C to D. Given three elements A,B,C, an ideal solution p...
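
Under the parallelogram model described above, the ideal solution point for A:B::C:? is C + (B - A), and the chosen answer is the candidate concept nearest that point. A toy sketch with made-up 2-D coordinates (the concept names and positions are purely illustrative):

```python
def solve_analogy(A, B, C, candidates):
    """A:B::C:? in a Euclidean concept space: compute the ideal point
    C + (B - A), then return the candidate nearest to it."""
    ideal = [c + (b - a) for a, b, c in zip(A, B, C)]

    def dist2(p):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, ideal))

    return min(candidates, key=lambda item: dist2(item[1]))

# hypothetical 2-D "size / wildness" space
animals = [("mouse", [0.1, 0.2]), ("cat", [0.3, 0.5]),
           ("tiger", [0.45, 0.85]), ("wolf", [0.6, 0.9]),
           ("elephant", [0.9, 0.3])]
dog = [0.5, 0.5]
wolf = [0.6, 0.9]
cat = [0.3, 0.5]
# dog:wolf :: cat:?  ->  ideal point is cat + (wolf - dog)
name, _ = solve_analogy(dog, wolf, cat, animals)
```

The vector difference B - A captures the relation (here, roughly "wild counterpart"), and applying the same displacement to C selects the analogous concept.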

Additive AND/OR graphs are defined as AND/ OR graphs without circuits, which can be considered as folded AND/OR trees; i. e. the cost of a common subproblem is added to the cost as many times as the subproblem occurs, but it is computed only once. Additive ...

Describes methods for determining sidedness and eye dominance in infants under 12 wk. of age, in 2-5 yr. olds, and in Ss over 5 yr. of age. The effects of imitation on developing left or right handedness is discussed. Research is noted which indicates the deleterious effects of crossed dominance. It is suggested that those children and adults who a...

## Citations

... The self-organizing map (SOM), introduced by Kohonen [85], is an unsupervised machine learning method that performs an ordered mapping of the input data into a lower-dimensional space. Essentially, the SOM is an artificial neural network (ANN) that is trained through a competitive learning framework, i.e., the ANN nodes compete with each other for the right to "respond" to the input data [86]. ...

... After a fixed number of time steps, an activation-weighted sum of all memories is added back to the cell state of the LSTM. regularities can be viewed as an implementation of semantic memory (McClelland and Rumelhart, 1987;McClelland and Rogers, 2003;Rogers and McClelland, 2004;Saxe et al., 2019). ...

... Automated and accurate classification of objects into stars and galaxies from optical (and near infrared) imaging data is an issue of considerable interest. Artificial neural network based approaches to the star galaxy classification problem include SOM (Miller & Coe 1996), decision tree induction (Weir, Fayyad & Djorgovski 1995) and back propagation, which is the basis for SExtractor, a widely used tool for star-galaxy separation (Bertin & Arnouts 1996). One of the drawbacks of classification tools such as SExtractor that employ back propagation is that it is difficult to modify them for specific needs. ...

... One of the most widely used models in the literature is the Bilingual Interactive Activation Model (BIA; van Heuven, Dijkstra, & Grainger 1998), which was developed on the basis of the interactive model of word processing (Rumelhart & McClelland 1981). This model assumes a lexicon shared between the two languages, an assumption that carries over to the model's reformulation, the Bilingual Interactive Activation Plus Model (BIA+; Dijkstra & van Heuven 2002). ...

... As mentioned previously, in deep learning, the training process aims to minimize the cost function of the neural network by changing the values of its parameters, i.e. the weights and biases. For this purpose, Gradient Descent is known as one of the most popular optimization algorithms (Rumelhart et al., 1986). This technique consists of two steps that are performed iteratively through the training dataset. ...
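
The two iterative steps mentioned in the snippet above (compute the gradient of the cost, then update the parameters against it) can be illustrated on a one-variable cost; the function and step size here are illustrative, not from the cited work:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Basic gradient descent: repeatedly step the parameter in the
    direction opposite the cost gradient. 'grad' is the derivative of
    the cost with respect to the parameter."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# minimize f(x) = (x - 3)**2, whose derivative is 2 * (x - 3);
# the iterates converge toward the minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step shrinks the distance to the minimum by a constant factor here (1 - 2 * lr), which is why the learning rate must be small enough for the iteration to contract rather than diverge.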

... As one kind of intelligence optimization algorithms, the BP multi-layer feed-forward ANN algorithm was proposed by D.E. Rumelhart firstly [31]. In complicated systems with several effective input parameters, ANN can be used to predict output data. ...

... Some methods propose adding terms to the cost function that take into account the sum of the squared connection weights (Σ w_ij²), the sum of the absolute values of the weights (Σ |w_ij|), or a logarithmic function log(1 + w²). Weigend [4] proposed a penalty term of the form: ...