Network: Comput. Neural Syst. 9 (1998) 279–302. Printed in the UK PII: S0954-898X(98)91912-1
Generalization and exclusive allocation of credit in
unsupervised category learning
Jonathan A Marshall and Vinay S Gupta
Department of Computer Science, CB 3175, Sitterson Hall, University of North Carolina, Chapel
Hill, NC 27599–3175, USA
Received 19 August 1997
Abstract. A new way of measuring generalization in unsupervised learning is presented. The
measure is based on an exclusive allocation, or credit assignment, criterion. In a classifier that
satisfies the criterion, input patterns are parsed so that the credit for each input feature is assigned
exclusively to one of multiple, possibly overlapping, output categories. Such a classifier achieves
context-sensitive, global representations of pattern data. Two additional constraints, sequence
masking and uncertainty multiplexing, are described; these can be used to refine the measure of
generalization. The generalization performance of EXIN networks, winner-take-all competitive
learning networks, linear decorrelator networks, and Nigrin's SONNET-2 network is compared.
1. Generalization in unsupervised learning
The concept of generalization in pattern classification has been extensively treated in the
literature on supervised learning, but rather little has been written on generalization with
regard to unsupervised learning. Indeed, it has been unclear what generalization even means
in unsupervised learning. This paper provides an appropriate definition for generalization
in unsupervised learning, a metric for generalization quality, and a qualitative evaluation
(using the metric) of generalization in several simple neural network classifiers.
The essence of generalization is the ability to appropriately categorize unfamiliar
patterns, based on the categorization of familiar patterns. In supervised learning, the output
categorizations for a training set of input patterns are given explicitly by an external teacher,
or supervisor. Various techniques have been used to ensure that test patterns outside this
set are correctly categorized, according to an external standard of correctness.
After supervised learning, a system’s ability to generalize can be measured in terms
of task performance. For instance, a face-recognition system can be tested using different
image viewpoints or illumination conditions, and performance can be evaluated in terms of
how accurately the system’s outputs match the actual facial identities in images. However,
in some situations, it may not be appropriate to measure the ability to generalize in terms
of performance on a specific task. For example, on which task would one measure the
generalization quality of the human visual system? Human vision is capable of so many
tasks that no one task is appropriate as a benchmark. In fact, much of the power of the
human visual system is its usefulness in completely novel tasks.
For general-purpose systems, like the human visual system, it would be useful to define
generalization performance in a task-independent way, rather than in terms of a specific task.
E-mail: marshall@cs.unc.edu
0954-898X/98/020279+24$19.50 © 1998 IOP Publishing Ltd
It would be desirable to have a ‘general-purpose’ definition of generalization quality, such
that if a system satisfies the definition, it is likely to perform well on many different tasks.
This paper proposes such a general-purpose definition, based on unsupervised learning.
The definition measures how well a system’s internal representations correspond to the
underlying structure of its input environment, under manipulations of context, uncertainty,
multiplicity, and scale (Marshall 1995). For this definition, a good internal representation
is the goal, rather than good performance on a particular task.
In unsupervised learning, input patterns are assigned to output categories based on some
internal standard, such as similarity to other classified patterns. Patterns are drawn from a
training environment, or probability distribution, in which some patterns may be more likely
to occur than others. The classifications are typically determined by this input probability
distribution, with frequently-occurring patterns receiving more processing and hence finer
categories. The categorization of patterns with a low or zero training probability, i.e. the
generalization performance, is determined partly by the higher-probability patterns. There
may be classifier systems that categorize patterns in the training environment similarly, but
which respond to unfamiliar patterns in different ways. In other words, the generalizations
produced by different classifiers may differ.
2. A criterion for evaluating generalization
How can one judge whether the classifications and parsings that an unsupervised classifier
generates are good ones? Several criteria (e.g., stability, dispersion, selectivity, convergence,
and capacity) for benchmarking unsupervised neural network classifier performance have
been proposed in the literature. This paper describes an additional criterion: an exclusive
allocation (or credit assignment) measure (Bregman 1990, Marshall 1995). Exclusive
allocation as a criterion for evaluating classifications was first discussed by Marshall (1995).
This paper refines and formalizes the intuitive concept of exclusive allocation, and it
describes in detail how exclusive allocation can serve as a measure for generalization in
unsupervised classifiers.
This paper also describes two regularization constraints, sequence masking and
uncertainty multiplexing, which can be used to evaluate further the generalization
performance of unsupervised classifiers. In cases where there exist multiple possible
classifications that would satisfy the exclusive allocation criterion, these regularizers allow
a secondary measurement and ranking of the quality of the classifications.
The principle of credit assignment states that the ‘credit’ for a given input feature should
be assigned, or allocated, exclusively to a single classification. In other words, any given
piece of data should count as evidence for one pattern at a time and should be prevented
from counting as evidence for multiple patterns simultaneously. This intuitively simple
concept has not been stated in a mathematically precise way; such a precise statement is
given in this paper.
There are many examples (e.g., from visual perception of orientation, stereo depth,
and motion grouping, from visual segmentation, from other perceptual modalities, and from
‘blind source separation’ tasks) where a given datum should be allowed to count as evidence
for only one pattern at a time (Bell and Sejnowski 1995, Comon et al 1991, Hubbard and
Marshall 1994, Jutten and Herault 1991, Marshall 1990a, c, Marshall et al 1996, 1997, 1998,
Morse 1994, Schmitt and Marshall 1998). A good example comes from visual stereopsis,
where a visual feature seen by one eye can be potentially matched with many visual features
seen by the other eye (the ‘correspondence’ problem). Human visual systems assign the
credit for each such monocular visual feature to at most one unique binocular match; this
property is known as the uniqueness constraint (Marr 1982, Marr and Poggio 1976). In
stereo transparency (Prazdny 1985), individual visual features should be assigned to the
representation of only one of multiple superimposed surfaces (Marshall et al 1996).
2.1. Neural network classifiers
A neural network categorizes an input pattern by activating some classifier output neurons.
These activations constitute a representation of the input pattern, and the input features
of that pattern are said to be assigned to that output representation. An input pattern
that is not part of the training set, but which contains features present in two or more
training patterns, can exist. Such an input is termed a superimposition of input patterns.
Presentation of superimposed input patterns can lead to simultaneous activation of multiple
representations (neurons).
2.2. An exclusive allocation measure
One way to define an exclusive allocation measure for a neural network classifier is to
specify how input patterns (both familiar and unfamiliar) should ideally be parsed, in terms
of a given training environment (the familiar patterns), and then to measure how well the
network’s actual parsings compare with the ideal. Consider, for instance, the network shown
in figure 1, which has been trained to recognize patterns ab and bc (Marshall 1995). Each
output neuron is given a ‘label’ (ab, bc) that reflects the familiar patterns to which the
neuron responds. The parsings that the network generates are evaluated in terms of those
labels. When ab or bc is presented, then the ‘best’ parsing is for the correspondingly labelled
output neuron to become fully active and for the other output neuron to become inactive
(figure 1(A)). In a linear network, when half a pattern is missing (say the input pattern is a),
and the other half does not overlap with other familiar patterns, the corresponding output
neuron should become half-active (figure 1(B)).
However, when the missing half renders the pattern’s classification ambiguous (say
the input pattern is b), the partially matching alternatives (ab and bc) should not both
be half-active. Instead, the activation should be distributed among the partially matching
alternatives. One such parsing, in which the activation from b is distributed equally between
ab and bc, results in two activations at 25% of the maximum level (figure 1(C)). This parsing
would represent the network’s uncertainty about the classification of input pattern b.
Another such parsing, in which the activation from b is distributed unequally, to ab
and not to bc, results in 50% activation of neuron ab (figure 1(D)). This parsing would
represent a ‘guess’ by the network that the ambiguous input pattern b should be classified
as ab. Although the distribution of credit from b to ab and bc is different in the two
parsings of figure 1(C) and 1(D), both parsings allocate the same total amount of credit.
(An additional criterion, ‘uncertainty multiplexing,’ which distinguishes between parsings
like the ones in figures 1(C) and 1(D), will be presented in subsection 4.7.)
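The claim that both parsings allocate the same total credit can be verified numerically (an illustrative sketch; sizes follow figure 1, where each output neuron has size 2):

```python
# Both parsings of the ambiguous pattern b allocate the same total
# size-normalized credit, though they distribute it differently.
sizes = {"ab": 2.0, "bc": 2.0}

def total_credit(activations):
    return sum(activations[n] * sizes[n] for n in activations)

parsing_C = {"ab": 0.25, "bc": 0.25}  # uncertainty split, figure 1(C)
parsing_D = {"ab": 0.50, "bc": 0.00}  # a 'guess', figure 1(D)
print(total_credit(parsing_C), total_credit(parsing_D))  # 1.0 1.0
```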
By the same reasoning, it would be incorrect to parse pattern abc as ab (figure 1(E)),
because then the contribution from c is ignored. (That can happen if the inhibition between
ab and bc is too strong.) It would also be incorrect to parse abc as ab + bc (figure 1(F))
because b would be represented twice.
A correct parsing in this case would be to equally activate neurons ab and bc at 75%
of the maximum level (figure 1(G)). That this is correct can be verified by comparing the
sum of the input signals, 1 + 1 + 1 = 3, with the sum of the ‘size-normalized’ output
signals. Each output neuron encodes a pattern of a certain preferred ‘size’ (or ‘scale’)
Figure 1. Parsings for exclusive allocation. (A) Normal parsing; the familiar pattern ab activates
the correspondingly labelled output neuron. (B) The unfamiliar pattern a half-activates the best-
matching output neuron, ab. (C) The unfamiliar input pattern b matches ab and bc equally
well, and its excitation credit is divided equally between the corresponding two output neurons,
resulting in a 25% activation for each of the two neurons. (D) The excitation credit from b is
allocated entirely to neuron ab (producing a 50% activation), and not to neuron bc. (E) Incorrect
parsing in response to unfamiliar pattern abc: neuron ab is fully active, but the credit from input
unit c is lost. (F) Another incorrect parsing of abc: the credit from unit b is counted twice,
contributing to the full activation of both neurons ab and bc. (G) Correct parsing of abc: the
credit from b is divided equally between the best matches ab and bc, resulting in a 75% activation
of both neurons ab and bc. (Redrawn with permission, from Marshall (1995), copyright Elsevier
Science.)
(Marshall 1990b, 1995), which in figure 1 is the sum of the weights of the neuron’s input
connections. The sum of the input weights to neuron ab is 1 + 1 + 0 = 2, and the sum
of the input weights to neuron bc is 0 + 1 + 1 = 2. Thus, both of these output neurons
are said to have a size of 2. The size-normalized output signal for each output neuron is
computed by multiplying its activation by its size. The sum of the size-normalized output
signals in figure 1(G) is (0.75 × 2) + (0.75 × 2) = 3. Because this equals the sum of the
input signals, the exclusively-allocated parsing in figure 1(G) is valid (unlike the parsings
in figure 1(E) and 1(F)).
Given the examples above, exclusive allocation can be informally defined as the
conjunction of the following pair of conditions. Exclusive allocation is said to be achieved
when:
Condition 1. The activation of every output neuron is accounted for exactly once by
the input activations.
Condition 2. The total input equals the total size-normalized output, as closely as
possible.
These two informal exclusive allocation conditions are made more precise in subsequent
sections. They are used below to evaluate the generalization performance of several neural
network classifiers.
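Condition 2 lends itself to a direct numerical check (an illustrative Python sketch, not part of any of the networks under comparison); applied to the three parsings of abc in figure 1, it accepts only the correct one.

```python
# Condition 2: total input should equal total size-normalized output.
def satisfies_condition2(inputs, activations, sizes, tol=1e-9):
    total_in = sum(inputs)
    total_out = sum(a * s for a, s in zip(activations, sizes))
    return abs(total_in - total_out) <= tol

sizes = [2.0, 2.0]        # output neurons ab and bc, each of size 2
x_abc = [1.0, 1.0, 1.0]   # input pattern abc

print(satisfies_condition2(x_abc, [1.00, 0.00], sizes))  # figure 1(E): False
print(satisfies_condition2(x_abc, [1.00, 1.00], sizes))  # figure 1(F): False
print(satisfies_condition2(x_abc, [0.75, 0.75], sizes))  # figure 1(G): True
```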
3. Generalization performance of several networks
3.1. Response to familiar and unfamiliar patterns
This section compares the generalization performance of three neural network classifiers: a
winner-take-all network, an EXIN network, and a linear decorrelator network. First, each
network will be described briefly.
3.1.1. Winner-take-all competitive learning. Among the simplest unsupervised learning
procedures is the winner-take-all (WTA) competitive learning rule, which divides the space
of input patterns into hyper-polyhedral decision regions, each centered around a ‘prototype’
pattern. The ART-1 network (Carpenter and Grossberg 1987) and the Kohonen network
(Kohonen 1982) are examples of essentially WTA neural networks. When an input pattern
first arrives, it is assigned to the one category whose prototype pattern best matches it.
The activation of neurons encoding other categories is suppressed (e.g., through strong
inhibition). The prototype of the winner category is then modified to make it slightly closer
to the input pattern. This is done by strengthening the winner’s input connection weights
from features in the input pattern and/or weakening the winner’s input connection weights
from features not in the input pattern. In these networks, generalization is based purely on
similarity of patterns to individual prototypes.
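A minimal WTA competitive-learning step can be sketched as follows (the learning rate and squared-Euclidean distance are illustrative choices, not those of any particular model above):

```python
# Schematic WTA step: find the prototype nearest the input, then move
# only that winner toward the input.
def wta_step(prototypes, x, lr=0.1):
    dist = lambda p: sum((pi - xi) ** 2 for pi, xi in zip(p, x))
    winner = min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))
    prototypes[winner] = [p + lr * (xi - p)
                          for p, xi in zip(prototypes[winner], x)]
    return winner

protos = [[1.0, 1.0, 0.0, 0.0],   # prototype ab
          [0.0, 0.0, 1.0, 1.0]]   # prototype cd
winner = wta_step(protos, [1.0, 1.0, 1.0, 0.0])  # input abc
print(winner)  # 0: abc is nearest to prototype ab
```

After the step, the winner's prototype has shifted slightly toward abc (its weight from feature c rises from 0.0 to 0.1), while the loser's prototype is untouched.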
There can exist input patterns (e.g., abc or b in figure 1) that are not part of the training
set but which contain features present in two or more training patterns. Such inputs may
bear similarities to multiple individual prototypes and may be quite different from any
one individual prototype. However, a WTA network cannot activate multiple categories
simultaneously. Hence, the network cannot parse the input in a way that satisfies the
second exclusive allocation condition, so generalization performance suffers.
3.1.2. EXIN networks. In the EXIN (EXcitatory + INhibitory learning) neural network
model (Marshall 1990b, 1995), this problem is overcome by using an anti-Hebbian inhibitory
learning rule in addition to a Hebbian excitatory learning rule. If two output neurons are
frequently coactive, which would happen if the categories that they encode overlap or
have common features, the lateral inhibitory weights between them become stronger. On
the other hand, if the activations of the two output neurons are independent, which would
happen if the neurons encode dissimilar categories, then the inhibitory weights between them
become weaker. This results in category scission between independent category groupings
and allows the EXIN network to generate near-optimal parsings of multiple superimposed
patterns, in terms of multiple simultaneous activations (Marshall 1995).
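The inhibitory learning rule can be sketched schematically (the exact EXIN equations are given in Marshall (1990b, 1995); the rates and functional form here are illustrative only): inhibition between two output neurons grows with their coactivity and decays toward zero otherwise.

```python
# Schematic anti-Hebbian-style inhibitory update in the spirit of EXIN.
def exin_inhib_step(w, y_i, y_j, grow=0.5, decay=0.05):
    coactivity = y_i * y_j
    return w + grow * coactivity - decay * w

w = 0.5
# Neurons ab and cd are rarely coactive during training on ab, abc, cd,
# so the inhibition between them decays toward zero:
for _ in range(100):
    w = exin_inhib_step(w, 0.0, 0.0)
print(w < 0.01)  # True
```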
3.1.3. Linear decorrelator networks. Linear decorrelator networks (Oja 1982, Földiák
1989) also use an anti-Hebbian inhibitory learning rule that can cause the lateral inhibitory
connections to vanish during learning. This allows simultaneous neural activations.
However, the linear decorrelator network responds essentially to differences, or distinctive
features (Anderson et al 1977, Sattath and Tversky 1987) among the patterns, rather than
to the patterns themselves (Marshall 1995).
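The anti-Hebbian lateral update used by such networks can be sketched as follows (a schematic in the spirit of Földiák (1989); the rate and form are illustrative): the lateral weight changes in proportion to the negative product of the two output activations, so it stops changing once the responses are decorrelated.

```python
# Schematic anti-Hebbian lateral update for a linear decorrelator.
def decorrelate_step(w_ij, y_i, y_j, lr=0.1):
    """Coactive outputs push the lateral weight toward stronger
    mutual suppression; decorrelated outputs leave it unchanged."""
    return w_ij - lr * y_i * y_j

w = 0.0
# Two outputs that repeatedly fire together accumulate mutual inhibition:
for _ in range(10):
    w = decorrelate_step(w, 1.0, 1.0)
print(round(w, 6))  # -1.0
```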
3.1.4. Example. Figure 2 (Marshall 1995) compares the exclusive allocation performance
of winner-take-all competitive learning networks, linear decorrelator networks, and EXIN
networks and illustrates the intuitions on which the rest of this paper is based. The initial
connectivity pattern in the three networks is identical (figure 2(A)). The networks are trained
on patterns ab, abc, and cd, which occur with equal probability. Within each of the three
networks, a single neuron learns to respond selectively to each of the familiar patterns. In
the WTA and EXIN networks, the neuron labelled ab develops strong input connections
from a and b and weak connections from c and d. Similarly, the neurons labelled abc
and cd develop appropriate selective excitatory input connections. In the WTA network,
the weights on the lateral inhibitory connections among the output neurons remain uniform,
fixed, and strong enough to ensure WTA behaviour. In the EXIN network, the inhibition
between neurons ab and abc and between neurons abc and cd becomes strong because of
the overlap in the category exemplars; the inhibition between neurons ab and cd becomes
Figure 2. Comparison of WTA, linear decorrelator, and EXIN networks. (A) Initially, neurons
in the input layer project excitatory connections non-specifically to neurons in the output layer.
Also, each neuron in the output layer projects lateral inhibitory connections non-specifically to
all its neighbours (shaded arrows). (B), (C), (D) The excitatory learning rule causes each type
of neural network to become selective for patterns ab, abc, and cd after a period of exposure to
those patterns; a different neuron becomes wired to respond to each of the familiar patterns. Each
network’s response to pattern abc is shown. (E) In WTA, the compound pattern abcd (filled
lower circles) causes the single ‘nearest’ neuron (abc) (filled upper circle) to become active
and suppress the activation of the other output neurons. (G) In EXIN, the inhibitory learning
rule weakens the strengths of inhibitory connections between neurons that code non-overlapping
patterns, such as between neurons ab and cd. Then when abcd is presented, both neurons ab
and cd become active (filled upper circles), representing the simultaneous presence of the
familiar patterns ab and cd. (F) The linear decorrelator responds similarly to EXIN for input
pattern abcd. However, in response to the unfamiliar pattern c, both WTA (H) and EXIN (J)
moderately activate (partially filled circles) the neuron whose code most closely matches the
pattern (cd), whereas the linear decorrelator (I) activates a more distant match (abc). (Reprinted,
with permission, from Marshall (1995), copyright Elsevier Science.)
weak because the category exemplars have no common features. The linear decorrelator
network learns to respond to the differences among the patterns, rather than to the patterns
themselves. For example, the neuron labelled abc really becomes wired to respond optimally
to pattern c-and-not-d. In the linear decorrelator, the weights on the lateral connections
vanish, when the responses of the three neurons become fully decorrelated.
Each of the three networks responds correctly to the patterns in the training set, by
activating the appropriate output neuron (figure 2(B), 2(C) and 2(D)). Now consider the
response of these trained networks to the unfamiliar pattern abcd. The WTA network
responds by activating neuron abc (figure 2(E)) because the input pattern is closest to this
prototype. However, the response of the WTA network to pattern abcd is the same as the
response to pattern abc. The activation of neuron abc can be credited to the input features a,
b, and c. The input feature d is not accounted for in the output: it is not accounted for by
the activation of neuron cd because the activation of neuron cd is zero. Thus, condition 1
from the pair of exclusive allocation conditions is not fully satisfied. Also, the total input
(1 + 1 + 1 + 1 = 4) does not equal the total size-normalized output (0 + (1 × 3) + 0 = 3),
so the WTA network does not satisfy condition 2 for input pattern abcd.
On the other hand, in the linear decorrelator and the EXIN networks, neurons ab
and cd are simultaneously activated (figure 2(F) and 2(G)), because during training, the
inhibition between neurons ab and cd became weak or vanished. Thus, all input features
are fully represented in the output, and the exclusive allocation conditions are met for input
pattern abcd, in the linear decorrelator and EXIN networks. These two networks exhibit
a global context-sensitive constraint satisfaction property (Marshall 1995) in their parsing
of abcd: the contextual presence or absence of small distinguishing features, or nuances,
(like d) dramatically alters the parsing. When abc is presented, the network groups a, b,
and c together as a unit, but when d is added, the network breaks c away from a and b
and binds it with d instead, forming two separate groupings, ab and cd.
It is evident from figure 2(C), 2(F) and 2(I) that the size-normalization value for each
neuron must be computed not by examining the neuron’s input weight values per se (which
would give the wrong values for the linear decorrelator), but rather by examining the size of
the training patterns to which the neuron responds. Thus, the output neuron sizes are (2, 3, 2)
for all the networks shown in figure 2.
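These figure 2 responses can be checked against condition 2 directly (an illustrative sketch; the sizes (2, 3, 2) are read from the training patterns, as just described):

```python
# Output-neuron sizes taken from the training patterns ab, abc, cd.
sizes = {"ab": 2.0, "abc": 3.0, "cd": 2.0}

def total_normalized_output(activations):
    return sum(activations[n] * sizes[n] for n in activations)

total_input_abcd = 4.0                      # features a, b, c, d all active
wta  = {"ab": 0.0, "abc": 1.0, "cd": 0.0}   # figure 2(E)
exin = {"ab": 1.0, "abc": 0.0, "cd": 1.0}   # figure 2(G)

print(total_normalized_output(wta))   # 3.0 != 4.0: condition 2 fails
print(total_normalized_output(exin))  # 4.0 == 4.0: condition 2 holds
```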
Now consider the response of the three networks to the unfamiliar pattern c. As shown
in figure 2(H), 2(I) and 2(J), the WTA and the EXIN networks respond by partially activating
neuron cd. However, in the linear decorrelator network, neuron abc is fully activated. Since,
during training, this neuron was fully activated when the pattern abc was presented, its full
activation is not accounted for by the presence of feature c alone in the input pattern. Thus,
condition 1 is not satisfied by the linear decorrelator network for input pattern c. Note that
abc (figure 2(I)) also does not satisfy condition 2 for pattern c, because 1 ≠ 1 × 3.
The example of figure 2 thus illustrates that allowing multiple simultaneous neural
activations and learning common features, rather than distinctive features, among the input
patterns enables an EXIN network to satisfy exclusive allocation constraints and to exhibit
good generalization performance when presented with multiple superimposed patterns. In
contrast, WTA networks (by definition) cannot represent multiple patterns simultaneously.
Although linear decorrelator networks can represent multiple patterns simultaneously, they
are not guaranteed to satisfy the exclusive allocation constraints.
A basic idea of this paper is that exclusive allocation provides a meaningful,
self-consistent way of specifying how a network should respond to unfamiliar patterns
and is therefore a valuable criterion for generalization.
3.2. Equality of total input and total output
Condition 2 will be used to compare a linear decorrelator network, an EXIN network,
and a SONNET-2 (Self-Organizing Neural NETwork-2) network (Nigrin 1993). (Since a
WTA network does not allow simultaneous activation of multiple category winners, it is
not considered in this example.)
3.2.1. SONNET-2. SONNET-2 is a fairly complex network, involving the use of inhibition
between connections (Desimone 1992, Reggia et al 1992, Yuille and Grzywacz 1989),
rather than between neurons, to implement exclusive allocation. The discussion in this
paper will focus on the differences in how EXIN and SONNET-2 networks implement the
inhibitory competition among perceptual categories. Because the inhibition in SONNET-2
acts between connections, rather than between neurons, it is more selective. Connections
from one input neuron to different output neurons compete for the ‘right’ to transmit signals;
this competition is implemented through an inhibitory signal that is a combination of the
excitatory signal on the connection and the activation of the corresponding output neuron.
For example, figures 3(C), 3(F) and 3(I) show that connections from input feature b to the
two output neurons compete with each other; other connections in figures 3(C), 3(F) and 3(I)
do not. As in EXIN networks, the excitatory learning rule involves prototype modification of
output layer competition winners, and the inhibitory learning rule is based on coactivation of
the competing neurons; hence SONNET-2 displays the global context-sensitive constraint
satisfaction property (abc versus abcd) and the sequence masking property (Cohen and
Grossberg 1986, 1987) (abc versus c) (Nigrin 1993) displayed by EXIN networks.
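The distinction between connection-level and neuron-level inhibition can be sketched very schematically (this is not Nigrin's formulation; the normalization form and all names are illustrative assumptions): links fanning out from the same input feature suppress one another, while links from different features do not interact.

```python
# Very schematic sketch of connection-level competition: a link's
# transmitted signal is suppressed in proportion to the signals on
# rival links from the SAME input feature to other output neurons.
# 'strengths' maps (feature, output) pairs to excitatory signals.
def transmitted(strengths, feature, output):
    rivals = sum(s for (f, o), s in strengths.items()
                 if f == feature and o != output)
    return strengths[(feature, output)] / (1.0 + rivals)

strengths = {("b", "ab"): 1.0, ("b", "bc"): 1.0,
             ("a", "ab"): 1.0, ("c", "bc"): 1.0}
print(transmitted(strengths, "b", "ab"))  # 0.5: b's credit is contested
print(transmitted(strengths, "a", "ab"))  # 1.0: a's link has no rival
```

This mirrors the figure 3 behaviour discussed below: only the two links from feature b compete, so features a and c each deliver their full credit.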
3.2.2. Example. Figure 3 shows three networks (linear decorrelator, EXIN, SONNET-2)
trained on patterns ab and bc, which are assumed to occur with equal training probability.
The linear decorrelator network can end up in one of many possible final configurations,
subject to the constraint that the output neurons are maximally decorrelated. A problem
with linear decorrelators is that they are not guaranteed to come up with a configuration that
responds well to unfamiliar patterns. To illustrate this point, a configuration that responds
correctly to familiar patterns but does not generalize well to unfamiliar patterns has been
chosen for figures 3(A), 3(D) and 3(G).
When the unfamiliar and ambiguous pattern b is presented, the EXIN and SONNET-2
networks respond correctly by activating both neuron ab and neuron bc to about 25% of their
maximum level (figures 3(E) and 3(F)), thus representing the uncertainty in the classification.
This parsing is considered a good one because there are two alternatives for matching input
pattern b (ab and bc) and the input pattern comprises half of both alternatives. Condition 2 is
thus satisfied for this input pattern. The linear decorrelator network activates both neurons ab
and bc to 50% of their maximum level because the neurons receive no inhibitory input
(figure 3(D)); condition 2 is not satisfied because 0 + 1 + 0 ≠ (2 × 0.5) + (2 × 0.5).
When the unfamiliar (but not ambiguous) pattern ac is presented, the two neurons in
the linear decorrelator network receive a net input of zero and hence do not become active.
This linear decorrelator network thus does not satisfy condition 1 for this input pattern. In
the EXIN network, ab and bc are active to about 25% of their maximum activation. This
behaviour arises because neurons ab and bc still exert an inhibitory influence on each other
because of the overlap in their category prototypes, even though the subpatterns a and c
in the pattern ac have nothing in common. The EXIN network thus does not fully satisfy
condition 1 for this input pattern. On the other hand, in the SONNET-2 network, both
neurons ab and bc are correctly active to 50% of their maximum level. This parsing is
Figure 3. Comparison of linear decorrelator, EXIN, and SONNET-2 networks. Three different
networks trained on patterns ab and bc. (A), (B), (C) A different neuron becomes wired to
respond to each of the familiar patterns. The response of each network to pattern ab is shown.
(D) When an ambiguous pattern b is presented, in the linear decorrelator network, both neurons
ab and bc are active at 50% of their maximum level; the input does not fully account for
these activations (see text). (E), (F) The EXIN and SONNET-2 networks respond correctly by
partially activating both neurons ab and bc, to about 25% of the maximum activation. (G) When
pattern ac is presented, the two neurons in the linear decorrelator network receive a net input
of zero and hence do not become active. (H) In the EXIN network, neurons ab and bc still
compete with each other, even though the subpatterns a and c are disjoint. This results in
incomplete representation of the input features at the output. (I) In SONNET-2, links from a
to ab and from c to bc do not inhibit each other; this ensures that neurons ab and bc are active
sufficiently (at about 50% of their maximum level) to fully account for the input features.
considered to be correct because the subpatterns a and c within the input pattern comprise
half of the prototypes ab and bc respectively, and there is only one (partially) matching
alternative for each subpattern. The SONNET-2 network responds in this manner because
the link from input feature a to neuron ab does not compete with the link from input
feature c to neuron bc.
The SONNET-2 network satisfies condition 2 for all three input patterns shown in
figure 3, the EXIN network satisfies condition 2 on two of the input patterns, and the
linear decorrelator network satisfies condition 2 on one of the input patterns. The greater
selectivity of inhibition in SONNET-2 leads to better satisfaction of the exclusive allocation
constraints and thus better generalization. The example of figure 3 thus elaborates the
concept of exclusive allocation and is incorporated in the formalization below.
288 J A Marshall and V S Gupta
3.3. Summary of generalization behaviour examples
Figure 4 summarizes the comparison between the networks that have been considered.
A '+' in the table indicates that the given network possesses the corresponding property
to a satisfactory degree; a '−' indicates that it does not. The general complexity of the
neural dynamics and the architecture of the networks have also been compared; the '<'
signs indicate that complexity increases from left to right in the table. WTA networks
have fixed, uniform inhibitory connections and are considered to be the simplest of all
the networks discussed. Linear decorrelators use a single learning rule for both excitatory
and inhibitory connections; further, the inhibitory connection weights can all vanish under
certain conditions. EXIN networks use slightly different learning rules for feedforward and
lateral connections. SONNET-2 implements inhibition between input-layer-to-output-layer
connections, rather than between neurons. Thus, sophisticated generalization performance
is obtained at the cost of increased complexity.
Example WTA LD EXIN SONNET-2
Figure 2:
abc
versus
abcd
– + + +
Figure 2:
cd
versus
c
+ – + +
Figure 3:
b
versus
ac
– – – +
Complexity of network < < <
Figure 4. Summary of exclusive allocation examples. The table indicates how well
different networks behave on the representative examples of exclusive allocation discussed in
subections 3.1 and 3.2. Key: +, network behaved properly; , network did not behave properly;
<, network complexity increases from left to right.
4. Formal expression of generalization conditions
Section 3 compared the generalization performance of several networks qualitatively. The
exclusive allocation conditions will now be framed in formal terms, so that a quantitative
computation of how well a network adheres to the exclusive allocation constraints, and a
quantitative measure of generalization performance, will be theoretically possible.
As mentioned earlier, classifications done by an unsupervised classifier are determined
by the patterns present in the training environment. Hence, to formalize the two exclusive
allocation conditions, a precise way to describe the concepts or category prototypes learned
by the network must be provided. Deriving such a description is analogous to the
rule-extraction task (Craven and Shavlik 1994): ‘Given a trained neural network and
the examples used to train it, produce a concise and accurate symbolic description of the
network’ (p 38). What does a neuron’s activation mean? A possible description of the
patterns encoded by the neuron can be obtained from the connection weights. However,
because of the possible presence of lateral interactions, feedback, etc, connection weights
may not provide an accurate picture of the patterns learned by the neuron.
Another approach would be to use symbolic if–then rules (Craven and Shavlik 1994).
Such a description can be quite comprehensive and elaborate; however, the number of rules
required to describe a network can grow exponentially with the number of input features.
Moreover, the method described by Craven and Shavlik (1994) for obtaining the rules uses
examples not contained in the training set; the case of real-valued input features is also not
considered.
4.1. Label vectors to describe network behaviour
The approach taken in this paper is to derive a label for each output neuron, based on the
neuron’s activations in response to the familiar input patterns. The label is symbolized as a
label vector and quantitatively summarizes the features to which the neuron responds. An
advantage of this method is that the label can be computed by using examples drawn only
from the training set.
An input pattern is represented by neural activations in the input layer. Each neuron
in the output layer responds to one or more patterns; the label defines a prototype for this
group of patterns. The label for each output neuron is expressed in terms of the input units
that feed it; multilayered networks would be analysed by considering successive layers in
sequence.
Consider an input pattern $X = (x_1, x_2, \ldots, x_I)$, drawn from the network's input space.
Let the network's training set be defined by probability distribution $S$ on the input space,
and let $p_S(X)$ be the probability of $X$ being the input pattern on any training presentation.
When $X$ is presented to a network, the activation of the $i$th input neuron is $x_i$, and the
activation of the $j$th output neuron is $y_j(X)$. $Y(X) = (y_1(X), y_2(X), \ldots, y_J(X))$ is the
vector of output activation values in response to input $X$. Abbreviate $y_j \equiv y_j(X)$, and
assume $0 \le x_i, y_j \le 1$. Define

$$ L'_{ij} = \int_X p_S(X) \cdot x_i \cdot y_j \, dX. \qquad (1) $$

If $S$ consists of a finite number of patterns instead of a continuum, then the definition
becomes

$$ L'_{ij} = \sum_X p_S(X) \cdot x_i \cdot y_j. \qquad (2) $$

The $L'_{ij}$ values are normalized to obtain the label values that will be used in expressing the
exclusive allocation conditions:

$$ L_{ij} = \frac{L'_{ij}}{\max_k L'_{kj}}. \qquad (3) $$

The label $L_j$ of the $j$th output neuron is the vector $(L_{1j}, L_{2j}, \ldots, L_{Ij})$, where $I$ is the
number of input units.
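In code, equations (2) and (3) amount to accumulating an expected co-activation table and then normalizing each output neuron's column. The sketch below is illustrative only: the black-box `respond` function and the toy two-neuron network (trained on patterns ab and bc, as in figure 3) are assumptions for the example, not part of the original text.

```python
# Sketch of label-vector computation (equations (2)-(3)) for a finite
# training set.  The network is treated as a black box: respond(X)
# returns the output activation vector Y(X).

def compute_labels(training_set, respond, n_inputs, n_outputs):
    """training_set: list of (probability, X) pairs; X is a tuple of
    input activations in [0, 1]."""
    # L'_ij = sum over patterns of p_S(X) * x_i * y_j   (equation (2))
    L0 = [[0.0] * n_outputs for _ in range(n_inputs)]
    for p, X in training_set:
        Y = respond(X)
        for i in range(n_inputs):
            for j in range(n_outputs):
                L0[i][j] += p * X[i] * Y[j]
    # L_ij = L'_ij / max_k L'_kj   (equation (3): per-column normalization)
    labels = [[0.0] * n_outputs for _ in range(n_inputs)]
    for j in range(n_outputs):
        col_max = max(L0[i][j] for i in range(n_inputs))
        for i in range(n_inputs):
            labels[i][j] = L0[i][j] / col_max if col_max > 0 else 0.0
    return labels

# Toy black-box network trained on patterns ab and bc (cf. figure 3):
# output neuron 0 responds fully to ab, output neuron 1 to bc.
def respond(X):
    return (1.0 if X == (1, 1, 0) else 0.0,
            1.0 if X == (0, 1, 1) else 0.0)

S = [(0.5, (1, 1, 0)), (0.5, (0, 1, 1))]
labels = compute_labels(S, respond, n_inputs=3, n_outputs=2)
# Neuron 0's label comes out as (1, 1, 0); neuron 1's as (0, 1, 1).
```

Note that only training-set presentations and the network's input–output behaviour are used; no connection weights are inspected, consistent with the black-box character of the label method.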
The label $L_j$ is a summary that characterizes the set of patterns to which neuron $j$
responds. The use of labels for this characterization is reasonable for most unsupervised
networks, where learning is based on pattern similarity, and where the decision regions
thus tend to be convex. However, labels would not be appropriate for characterizing
networks with substantially non-convex decision regions, e.g., the type of network produced
by many supervised learning procedures. The process of computing labels is essentially a
rule extraction process, to infer the structure of a network, given knowledge only of the
training input probabilities and the network's 'black box' input–output behaviour on the
training data. Each component $L_{ij}$ of a label is analogous to a weight in an inferred model
of the black box network. One benefit of this approach is that it facilitates comparing the
generalization behaviour of different networks, without regard to differences in their internal
structure or operation.
4.2. Exclusive allocation conditions
When an input pattern is presented to a network, the network parses that pattern and
represents the parsing via activations in the output layer. The activation of each input
neuron can be decomposed into parts, each part being accounted for by (assigned to) a
different output neuron. Thus, for condition 1 to be satisfied, the sum of these parts should
equal the activation of the input neuron, and together they should be able to account for the
activation of all output neurons.
One can describe the decomposition by using parsing coefficients. The parsing
coefficient $C_{ij}(\hat{X}, \hat{Y})$ describes how much of the 'credit' for the activation of input neuron $i$
is assigned to output neuron $j$, given an input pattern vector $\hat{X}$ and an output vector $\hat{Y}$.
Abbreviate $C_{ij} \equiv C_{ij}\left( X, Y(X) \right)$. If the exclusive allocation constraints are fully satisfied,
then for each input pattern $X$ (and its corresponding output vector $Y(X)$) in the full pattern
space there should exist parsing coefficients $C_{ij} \ge 0$ such that for all output neurons $j$,

$$ \sum_{i: L_{ij} \ne 0} x_i \, \frac{C_{ij}}{\sum_k C_{ik}} \, \frac{L_{ij}}{\sum_k L_{kj}} = y_j. \qquad (4) $$

In equation (4), the normalized label values $L_{ij} / \sum_k L_{kj}$ are analogous to the weights of a
neural network. The parsing coefficients $C_{ij} / \sum_k C_{ik}$ describe how the credit for each input
activation is allocated to output neurons, so that the L-weighted, C-allocated inputs exactly
produce the outputs. It is assumed that $\sum_j C_{ij} > 0$ for all $i$. The sum is taken only over
the non-zero $L_{ij}$ values; otherwise the $C_{ij}$ coefficients would be underconstrained.
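Equation (4) can be checked mechanically for any proposed parsing. The fragment below is a sketch (function and variable names are illustrative) that applies the check to the figure 3 network trained on ab and bc, using the parsing in which all credit for the familiar pattern ab goes to neuron ab:

```python
# Illustrative check of condition 1 (equation (4)): given labels L[i][j],
# input X, output Y, and candidate parsing coefficients C[i][j], verify
# that the L-weighted, C-allocated inputs reproduce every output.

def condition1_holds(X, Y, L, C, tol=1e-9):
    n_in, n_out = len(X), len(Y)
    for j in range(n_out):
        label_sum = sum(L[k][j] for k in range(n_in))
        lhs = 0.0
        for i in range(n_in):
            if L[i][j] == 0:              # zero labels are excluded from the sum
                continue
            credit_sum = sum(C[i][k] for k in range(n_out))
            lhs += X[i] * (C[i][j] / credit_sum) * (L[i][j] / label_sum)
        if abs(lhs - Y[j]) > tol:
            return False
    return True

# Network of figure 3, trained on ab and bc: labels over inputs a, b, c.
L = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Familiar pattern ab: neuron ab fully active, all credit assigned to it.
X, Y = (1.0, 1.0, 0.0), (1.0, 0.0)
C = [[1.0, 0.0],    # credit for input a -> neuron ab
     [1.0, 0.0],    # credit for input b -> neuron ab
     [0.0, 1.0]]    # input c is inactive; keep sum_j C_ij > 0
ok = condition1_holds(X, Y, L, C)
```

With this parsing the check succeeds; an output such as (0.5, 0.5) for the same input would fail it, because no allocation of credit can account for both half-activations.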
The idea of using parsing coefficients to express exclusive allocation constraints is
similar to the idea of using dynamic gating weights for credit assignment (Rumelhart and
McClelland 1986, Morse 1994). The dynamic gating weights are computed using the actual
or static weights on the connections in the network. In contrast, parsing coefficients are
computed using the label vector for the output neurons and are independent of the actual
connection weights. As seen in the examples in section 3, networks that respond identically
to patterns in the training set can have very different connection weights (the weights may
even have different signs). Hence it is difficult to compare the generalization properties
of these networks using dynamic gating weights. On the other hand, label vectors are
computed from the response of a network to patterns in the training set; the label vectors in
these different networks (figure 2 or figure 3) are identical. The label vector method treats
each network as a black box (independent of the network’s connectivity and weights, which
are internal to the box), examining just the networks’ inputs and outputs. This facilitates a
comparison of the generalization behaviour of the networks.
4.3. Minimization form of conditions
It is possible (e.g., in the presence of noise) that a network does not satisfy equation (4)
exactly. Yet it would still be desirable to measure how close the network comes to
satisfying the exclusive allocation conditions expressed by this equation. Hence the
exclusive allocation requirement should instead be framed as a minimization condition.
By squaring the difference between the left-hand and right-hand sides in equation (4) and
summing over all output neurons, one obtains
$$ E_1(\hat{X}, \hat{Y}) = \frac{1}{J} \sum_j \left[ \hat{y}_j - \sum_{i: L_{ij} \ne 0} \hat{x}_i \, \frac{C_{ij}(\hat{X}, \hat{Y})}{\sum_k C_{ik}(\hat{X}, \hat{Y})} \, \frac{L_{ij}}{\sum_k L_{kj}} \right]^2 \qquad (5) $$
where $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_I)$ and $\hat{Y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_J)$ are placeholder variables. The
normalization $1/J$ adjusts for the number of output units $J$.
By integrating over the network's parsings of all possible input patterns, one obtains
the quantity

$$ E_1 = \int_X E_1\left( X, Y(X) \right) \, dX. \qquad (6) $$
Thus, for each input pattern $X$, the objective is to find a set of parsing
coefficients $C_{ij}\left( X, Y(X) \right)$ that minimizes the measure $E_1$ of the network's exclusive
allocation 'deficiency', in a least-squares sense. The measure $E_1$ is computed across all
patterns in the full pattern space, whereas (as shown in equation (1)) the labels are computed
across only the training set $S$. How the parsing coefficients can be obtained, for the purpose
of measuring network behaviour, is a separate question, not treated in detail in this paper.
This analysis is concerned non-constructively with the existence of parsing coefficients
that satisfy or minimize the equations. In practice, the minimization can be performed in a
number of ways, e.g., using an iterative procedure (Morse 1994) to find the coefficients.
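As a minimal numerical sketch of this minimization (not the iterative procedure of Morse (1994)), one can grid-search the parsing coefficients for a small case. For the figure 3 network and the ambiguous pattern b, the only free quantity is the fraction of input b's credit assigned to neuron ab; the search recovers the equal split:

```python
# Sketch of the E_1 minimization (equation (5)) for one input/output pair.
# Labels are those of the figure 3 network (neurons ab and bc).

def e1(X, Y, L, C):
    n_in, n_out = len(X), len(Y)
    total = 0.0
    for j in range(n_out):
        label_sum = sum(L[k][j] for k in range(n_in))
        lhs = 0.0
        for i in range(n_in):
            if L[i][j] == 0:
                continue
            credit_sum = sum(C[i][k] for k in range(n_out))
            lhs += X[i] * (C[i][j] / credit_sum) * (L[i][j] / label_sum)
        total += (Y[j] - lhs) ** 2
    return total / n_out          # the 1/J normalization

L = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # labels for inputs a, b, c
# Ambiguous pattern b, parsed with both neurons at 25% (cf. figure 3(E)).
X, Y = (0.0, 1.0, 0.0), (0.25, 0.25)

# Crude grid search over f, the fraction of input b's credit given to ab.
best = min(
    (e1(X, Y, L, [[1, 0], [f, 1 - f], [0, 1]]), f)
    for f in (i / 100 for i in range(1, 100))
)
# The equal split f = 0.5 drives E_1 to zero for this output vector.
```

An exhaustive grid is of course only feasible for toy cases; the point is that the deficiency measure is an ordinary least-squares objective over the coefficients.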
4.4. Condition 2 is necessary but not sufficient
The $E_1$ scores produced by equation (5) can be used as a criterion to grade a network's
generalization behaviour on particular input pattern parsings. For instance, in figure 2(I), it
is easily seen that equation (5) will produce poor scores for the linear decorrelator's parsing
of pattern c, with any set of parsing coefficients.

However, for certain input patterns there can be more than one parsing that would yield
good $E_1$ scores; some of these parsings may reflect better generalization behaviour than
others. An extreme example is illustrated in figure 5, which shows a network with two
input neurons, marked a and b, and two output neurons, marked p and q. The network is
trained on two patterns, $(1, 0)$ and $(\varepsilon, 1)$, which occur with equal probability during training,
with $0 < \varepsilon \ll 1$. By equations (1)–(3), the neuron labels in this network are

$$ L_{ap} = 1 \qquad L_{bp} = 0 \qquad L_{aq} = \varepsilon \qquad L_{bq} = 1. $$

Suppose that, after the network has been trained, the pattern $X = (1, 0)$ is presented.
As shown in figure 5(A), the network could respond by activating output neuron p fully;
$Y(X) = (1, 0)$. Using the parsing coefficients

$$ C_{ap} = 1 \qquad C_{bp} = 0 \qquad C_{aq} = 0 \qquad C_{bq} = 0, $$

this response satisfies equation (4) for all input and output neurons. However, a network
could instead respond as in figure 5(B), where the output is $Y(X) = \left( 0, \varepsilon/(1+\varepsilon) \right)$. In this
case, one set of valid parsing coefficients would be

$$ C_{ap} = 0 \qquad C_{bp} = 0 \qquad C_{aq} = 1 \qquad C_{bq} = 0. $$

Even for this parsing, equation (4) is fully satisfied. If this same relationship holds when
$\varepsilon$ is made vanishingly small, then equation (4) will always be satisfied for neuron q with
the given set of parsing coefficients. This example shows that in the limit as $\varepsilon \to 0$, the
activation of an input neuron $i$ could be assigned to an inactive output neuron $j$ if the
corresponding label $L_{ij}$ were zero, and condition 1 would be satisfied. For this reason,
equation (4) excludes labels of value zero.

Condition 2 is imposed to repair further this anomaly; the parsing in figure 5(B) does
not satisfy the equations listed below. Define

$$ Y_1(\hat{X}) = \left\{ \hat{Y} : E_1(\hat{X}, \hat{Y}) = \min_{\hat{\hat{Y}}} E_1(\hat{X}, \hat{\hat{Y}}) \right\}. \qquad (7) $$
Figure 5. Credit assignment example. The label vector of each of the two output neurons
(marked p and q) has two elements, corresponding to the two input neurons (marked a and b).
The label of neuron p is $(1, 0)$; the label of neuron q is $(\varepsilon, 1)$. These values are indicated by
the numbers and arrows. Thin arrows denote a weak connection, with weight $\varepsilon$. Dashed arrows
denote a connection with weight 0. (A) When pattern a is presented, the depicted network
fully activates the output neuron marked p; this parsing satisfies conditions 1 and 2. (B) When
pattern a is presented, the depicted network activates the output neuron marked q to a small
value, $\varepsilon/(1+\varepsilon)$; this parsing satisfies exclusive allocation condition 1, but not condition 2. (C) In
response to input b, neuron p becomes active in the depicted network. However, if $C_{bp} = 1$,
this parsing would satisfy condition 2 but not condition 1. (D) Here the label of neuron p
is $(\varepsilon, 1)$; the label of neuron q is $(0, 1)$. When input pattern a is presented, the network's best
response (according to the neuron labels) is to activate neuron p to the level $\varepsilon/(1 + \varepsilon)$. This
parsing satisfies condition 1, and condition 2 should be designed so that this parsing is judged
to satisfy it.
$Y_1(\hat{X})$ is the set of all output vectors that would best satisfy condition 1, given input $\hat{X}$.
Next, define the function

$$ M_2(\hat{X}, \hat{Y}) = \left[ \sum_i \hat{x}_i - \sum_j \hat{y}_j \sum_k L_{kj} \right]^2. \qquad (8) $$

This function measures the difference between the total input and the total size-normalized
output. The factor $\sum_k L_{kj}$ represents the size of output neuron $j$.
Next, define

$$ E_2(\hat{X}, \hat{Y}) = \left[ M_2(\hat{X}, \hat{Y}) - \min_{\hat{\hat{Y}} \in Y_1(\hat{X})} M_2(\hat{X}, \hat{\hat{Y}}) \right]^2. \qquad (9) $$

The '$\min M_2$' term represents the best condition 2 score for any parsing of any output
vector that best satisfies condition 1. The equation thus measures the difference between
the condition 2 score for the given parsing and the condition 2 score for the best parsing.
Finally, define

$$ E_2 = \int_X E_2\left( X, Y(X) \right) \, dX. \qquad (10) $$

This equation computes an overall score for how well all the network's parsings satisfy
condition 2.
For input pattern $X = (1, 0)$ in figures 5(A) and 5(B), both the output vectors
$Y(X) = (1, 0)$ and $Y(X) = \left( 0, \varepsilon/(1+\varepsilon) \right)$ are included in the set $Y_1(X)$. However, the value
of $M_2(\hat{X}, \hat{Y})$ equals zero when $\hat{Y} = (1, 0)$ and exceeds zero when $\hat{Y} = \left( 0, \varepsilon/(1+\varepsilon) \right)$. The
minimum value of the $M_2$ function in (9) is zero. Thus, $E_2(\hat{X}, \hat{Y}) = 0$ when $\hat{Y} = (1, 0)$,
and $E_2(\hat{X}, \hat{Y}) > 0$ when $\hat{Y} = \left( 0, \varepsilon/(1+\varepsilon) \right)$. The measure $E_2$ is minimized in networks
that behave like the network of figure 5(A), rather than like the network of figure 5(B).
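The figure 5 comparison reduces to a few lines of arithmetic. The sketch below (variable names are illustrative) evaluates $M_2$ and $E_2$ for the two candidate parsings and confirms that condition 2 separates them:

```python
# Sketch of the condition-2 measure (equations (8)-(9)) on the figure 5
# example.  M_2 compares total input with total size-normalized output;
# E_2 scores a parsing relative to the best condition-1 parsing.

def m2(X, Y, L):
    n_in = len(X)
    sizes = [sum(L[k][j] for k in range(n_in)) for j in range(len(Y))]
    return (sum(X) - sum(y * s for y, s in zip(Y, sizes))) ** 2

eps = 0.01
L = [[1.0, eps], [0.0, 1.0]]          # labels of neurons p and q
X = (1.0, 0.0)
Y_good = (1.0, 0.0)                   # figure 5(A): activate p fully
Y_bad = (0.0, eps / (1 + eps))        # figure 5(B): tiny activation of q

candidates = [Y_good, Y_bad]          # both satisfy condition 1 (the set Y_1)
best_m2 = min(m2(X, Y, L) for Y in candidates)
e2_good = (m2(X, Y_good, L) - best_m2) ** 2
e2_bad = (m2(X, Y_bad, L) - best_m2) ** 2
# E_2 = 0 for the parsing of figure 5(A); E_2 > 0 for figure 5(B).
```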
As figures 5(A) and 5(B) illustrate, condition 2 is a necessary part of the definition of
exclusive allocation. Figure 5(C) shows that condition 2 alone is not sufficient to define
exclusive allocation; the parsing

$$ C_{ap} = 0 \qquad C_{bp} = 1 \qquad C_{aq} = 0 \qquad C_{bq} = 0 $$

satisfies condition 2 but not condition 1. Therefore, both conditions are necessary in the
definition of exclusive allocation.

The equations above for condition 2 were designed to ensure that the parsing shown in
figure 5(D) is considered valid. In this case, the total input ($= 1$) does not equal the total
size-normalized output ($= \varepsilon/(1 + \varepsilon)$). Nevertheless, the parsing is the best one possible,
given the labels shown. In equation (9), the quality of a parsing is measured relative to the
quality of the best parsing, rather than to an arbitrary external standard. For this reason, the
clause 'as closely as possible' is included in condition 2.
4.5. Inexactness tolerance
Consider equations (7)–(9). In a realistic environment with noise, there might exist an
output vector $\hat{Y}$ that is spuriously excluded from the set $Y_1(\hat{X})$, yet whose $M_2$ value is
significantly smaller than that of any output vector in $Y_1(\hat{X})$. This situation can occur
because equation (7) requires that the $E_1(\hat{X}, \hat{Y})$ value exactly equal the smallest such value
for any $\hat{\hat{Y}}$; but noise can preclude exact equality.
Hence, near-equality, rather than exact equality, should be required; some degree of
inexactness tolerance is necessary. Equation (7) must therefore be revised. One way to
remedy the problem is to replace equation (7) with

$$ Y_1(\hat{X}) = \left\{ \hat{Y} : E_1(\hat{X}, \hat{Y}) \le T_1 \min_{\hat{\hat{Y}}} E_1(\hat{X}, \hat{\hat{Y}}) \right\} \qquad (11) $$

where $T_1 \ge 1$ is an inexactness tolerance parameter. Using this new equation, $E_2$ measures
the degree to which condition 2 is satisfied, relative to the best $M_2$ value chosen from
among the parsings that satisfy condition 1 tolerably well. $T_1$ thus becomes an additional
free parameter of the evaluation process.
The two exclusive allocation conditions will be discussed further in section 5. But first,
two additional constraints that refine the measure of generalization will be introduced. The
exclusive allocation conditions leave ambiguous the choice between certain parsings (for
example, between figures 1(C) and 1(D)). The two additional constraints are useful because
they further limit the allowable choices, thereby regularizing or disambiguating the parsings.
The additional constraints are optional: there may be some instances for which the added
regularization is not needed.
4.6. Sequence masking constraint
The ‘sequence masking’ property (Cohen and Grossberg 1986, 1987, Marshall 1990b, 1995,
Nigrin 1993) concerns the responses of a system to patterns of different sizes (or scales). It
holds that large, complete output representations are better than small ones or incomplete
ones. For example, it is better to parse input pattern ab as a single output category ab
(figure 6(A)) than as two smaller output categories a + b (figure 6(B)). It is also better to
parse input ab as the complete output category ab (figure 6(A)) than as an incomplete part
of a larger output category abcd (figure 6(C)).
Figure 6. Sequence masking. Input pattern ab is presented. Possible output responses satisfying
conditions 1 and 2 are shown. (A) Output is ab. (B) Output is a+b. (C) Output is 50% activation
of abcd.
A new sequence masking constraint can optionally be imposed, to augment the exclusive
allocation criterion, as part of the definition of generalization. The sequence masking
constraint biases the network evaluation measure toward preferring parsings that exhibit the
sequence masking property. The sequence masking constraint can be stated as
Condition 3. Large, complete output representations are better than small ones or
incomplete ones.
One way to implement condition 3 is as follows. Let

$$ Y_2(\hat{X}) = \left\{ \hat{Y} : E_2(\hat{X}, \hat{Y}) \le T_2 \min_{\hat{\hat{Y}} \in Y_1(\hat{X})} E_2(\hat{X}, \hat{\hat{Y}}) \right\} \qquad (12) $$

$$ M_3(\hat{Y}) = \left[ \sum_j \frac{\left( \hat{y}_j \sum_k L_{kj} \right)^2}{1 + \sum_k L_{kj}} \right]^{-1} \qquad (13) $$

$$ E_3(\hat{X}, \hat{Y}) = \left[ M_3(\hat{Y}) - \min_{\hat{\hat{Y}} \in Y_2(\hat{X})} M_3(\hat{\hat{Y}}) \right]^2. \qquad (14) $$
Here $Y_2(\hat{X})$ is the set of output vectors satisfying condition 1 that also best satisfy
condition 2, in response to a given input pattern $\hat{X}$. The parameter $T_2 \ge 1$ specifies the
inexactness tolerance of the evaluation process with regard to satisfaction of condition 2.
The function $M_3$ computes a bias in favour of larger, complete output representations.
Using this function, the network of figure 6(A) would have an $M_3$ score of 3/4, the network
of figure 6(B) would have an $M_3$ score of 2/2, and the network of figure 6(C) would have an
$M_3$ score of 5/4; smaller values are considered to be better.
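These three scores can be checked in a few lines. The algebraic form of $M_3$ used below, the reciprocal of $\sum_j (\hat{y}_j \sum_k L_{kj})^2 / (1 + \sum_k L_{kj})$, is one reading of equation (13) that reproduces all three reported values; the code itself is an illustrative sketch, not an implementation from the original text:

```python
# Check of the sequence-masking bias M_3 against the three figure 6
# parsings of input pattern ab (scores 3/4, 2/2 and 5/4).

def m3(Y, sizes):
    # sizes[j] = sum_k L_kj, the 'size' of output neuron j's label
    return 1.0 / sum((y * s) ** 2 / (1 + s) for y, s in zip(Y, sizes))

sizes = [1, 1, 2, 4]                  # output neurons a, b, ab, abcd
score_A = m3((0, 0, 1, 0), sizes)     # parse ab as ab          -> 3/4
score_B = m3((1, 1, 0, 0), sizes)     # parse ab as a + b       -> 2/2
score_C = m3((0, 0, 0, 0.5), sizes)   # parse ab as half of abcd -> 5/4
# Smaller is better: the single complete category ab wins.
```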
In equation (14), the '$\min M_3$' term represents the best condition 3 score for any parsing
of any output vector that best satisfies conditions 1 and 2. The equation thus measures the
difference between the condition 3 score for the given parsing and the condition 3 score for
the best parsing.
To compute an overall score for how well the network's parsings satisfy condition 3,
define

$$ E_3 = \int_X E_3\left( X, Y(X) \right) \, dX. \qquad (15) $$

This equation integrates the $E_3$ scores across all possible input patterns.
The sequence masking constraint should be imposed in measurements of generalization
when larger, complete representations are more desirable than small ones or incomplete
ones.
4.7. Uncertainty multiplexing constraint
If an input pattern is ambiguous (i.e., there exists more than one valid parsing), then
conditions 1 and 2 do not indicate whether a particular parsing should be selected or whether
the representations for the multiple parsings should be simultaneously active. For instance,
in subsection 3.2.2 (figures 1(C) or 3(E) and 3(F)), when pattern b is presented, conditions 1
and 2 can be satisfied if neuron ab is half-active and neuron bc is inactive, or if ab is inactive
and bc is half-active, or by an infinite number of combinations of activations between these
two extreme cases.
Marshall (1990b, 1995) discussed the desirability of representing the ambiguity in such
cases by partially activating the alternative representations, to equal activation values. The
output in which neurons ab and bc are equally active at the 25% level (figures 1(C), 3(E) and
3(F)) would be preferred to one in which they were unequally active: for example, when
ab is 50% active and bc is inactive (figure 1(D)). This type of representation expresses
the network’s uncertainty about the true classification of the input pattern, by multiplexing
(simultaneously activating) partial activations of the best classification alternatives.
A new ‘uncertainty multiplexing’ constraint can optionally be imposed, to augment
the exclusive allocation criterion. The uncertainty multiplexing constraint regularizes the
classification ambiguities by limiting the allowable relative activations of the representations
for the multiple alternative parsings for ambiguous input patterns. The uncertainty
multiplexing constraint can be stated as
Condition 4. When there is more than one best match for an input pattern, the
best-matching representations divide the input signals equally.
The notion of best match is specified by conditions 1 and 2, and (optionally) 3. (Other
definitions for best match can be used instead.)
One way to implement the uncertainty multiplexing constraint is as follows. Let

$$ Y_3(\hat{X}) = \left\{ \hat{Y} : E_3(\hat{X}, \hat{Y}) \le T_3 \min_{\hat{\hat{Y}} \in Y_2(\hat{X})} E_3(\hat{X}, \hat{\hat{Y}}) \right\} \qquad (16) $$

$$ Y_4(\hat{X}) = \mathrm{mean}\left( Y_3(\hat{X}) \right) \qquad (17) $$

$$ E_4(\hat{X}, \hat{Y}) = \left[ \hat{Y} - Y_4(\hat{X}) \right]^2 \qquad (18) $$

where $\mathrm{mean}(\alpha) \equiv \left( \int_\alpha \alpha \, d\alpha \right) / \|\alpha\|$ is the element-by-element mean of the set of vectors $\alpha$,
where $\|\alpha\|$ describes the size or 'measure' of the region $\alpha$, and where $\alpha^2$ refers to the dot
product of vector $\alpha$ with itself. Function $Y_2$ selects the set of output vectors satisfying
condition 1 that also best satisfy condition 2. Of this set of output vectors, function $Y_3$
selects the subset that also best satisfies condition 3. Function $Y_4$ averages all the output
vectors in this subset together. Finally, $E_4$ treats this average as the 'ideal' output vector
and measures the deviation of a given output vector from the ideal. Parameter $T_3 \ge 1$
specifies the inexactness tolerance of the evaluation process with regard to satisfaction of
condition 3.
By these equations, the parsing of input pattern b in figure 1(C) in which neurons ab
and bc are equally active at the 25% level would be preferred to a parsing in which they
were unequally active: for example, when ab is 50% active and bc is inactive.
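For a finite set of best parsings, equations (17) and (18) reduce to an ordinary mean and squared deviation. The sketch below treats the two extreme parsings of pattern b as stand-ins for the set $Y_3$ (a simplification: the true set is a continuum of intermediate parsings with the same mean); names are illustrative:

```python
# Sketch of the uncertainty-multiplexing score (equations (17)-(18)) for
# ambiguous pattern b (figure 1(C)): the 'ideal' output is the mean of
# the equally good parsings, and E_4 penalizes deviation from it.

def e4(Y, candidates):
    n = len(candidates)
    ideal = [sum(c[j] for c in candidates) / n for j in range(len(Y))]
    return sum((y - m) ** 2 for y, m in zip(Y, ideal))

# Extreme parsings of pattern b: all credit to ab, or all credit to bc.
candidates = [(0.5, 0.0), (0.0, 0.5)]
balanced = e4((0.25, 0.25), candidates)   # multiplexed parsing -> E_4 = 0
one_sided = e4((0.5, 0.0), candidates)    # definite guess      -> E_4 > 0
```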
If enforcement of uncertainty multiplexing is desired but enforcement of sequence
masking is not desired, then equation (16) can be replaced by $Y_3(\hat{X}) = Y_2(\hat{X})$.
To compute an overall score for how well the network's parsings satisfy condition 4,
define

$$ E_4 = \int_X E_4\left( X, Y(X) \right) \, dX. \qquad (19) $$
The uncertainty multiplexing constraint should be imposed in measurements of
generalization when balancing ambiguity among likely alternatives is more desirable than
making a definite guess.
4.8. Scoring network performance
To compare objectively the generalization performance of specific networks, given a
particular training environment, one can formulate a numerical score that incorporates the
four criteria $E_1$, $E_2$, $E_3$, and $E_4$. One can assign each of these factors a weighting to yield
an overall network performance score. For instance, the score can be defined as

$$ E_{T_1, T_2, T_3} = a_1 E_1 + a_2 E_2 + a_3 E_3 + a_4 E_4 \qquad (20) $$

where $a_1$, $a_2$, $a_3$, and $a_4$ are weightings that reflect the relative importance of the four
generalization conditions, and $T_1$, $T_2$, and $T_3$ are parameters of the evaluation process,
specifying the degrees to which various types of inexactness are tolerated (equations (11),
(12) and (16)). The score of each network can then be computed numerically (and
laboriously). A full demonstration of such a computation would be an interesting next
step for this research. The choice of weightings for the various factors would affect the
final rankings.
It is theoretically possible to eliminate the free parameters for inexactness tolerance
by replacing them with a fixed calculation, such as a standard deviation from the mean.
However, such expedients have not been explored in this research. In any case, a single set
of parameter values or calculations should be chosen for any comparison across different
network types.
Fully comparing the generalization performance of the classifiers themselves (e.g., all
linear decorrelators versus all EXIN networks) might require evaluating $\int E(S) \, dS$ across all
possible training environments $S$. Such a calculation is obviously infeasible, except perhaps
by stochastic analysis. Nevertheless, the comparisons can be understood qualitatively on
the basis of key examples, like the ones presented in this paper.
5. Assessing generalization in EXIN network simulations
This section discusses how the generalization criteria can be applied to measure the
performance of an EXIN network. The EXIN network was chosen as an example because
it yields good but not perfect generalization; thus, it shows effectively how the criteria
operate.
Figure 7(A) shows an EXIN network that has been trained with six input patterns
(Marshall 1995). Figure 7(B) shows the multiplexed, context-sensitive response of the
network to a variety of familiar and unfamiliar input combinations. All 64 possible binary
input patterns were tested, and reasonable results were produced in each case (Marshall
1995); figure 7(B) shows 16 of the 64 tested parsings.
Given the four generalization conditions described in the preceding sections, the
performance of this EXIN network (Marshall 1995) will be summarized below. A sample
Figure 7. Simulation results. (A) The state of an EXIN network after 3000 training presentations
of input patterns drawn from the set {a, ab, abc, cd, de, def}. The input pattern coded by each
output neuron (the label) is listed above the neuron body. The approximate ‘size’ normalization
factor of each output neuron is shown inside the neuron. Strong excitatory connections (weights
between 0.992 and 0.999) are indicated by lines from input (lower) neurons to output (upper)
neurons. All other excitatory connections (weights between 0 and 0.046) are omitted from the
figure. Strong inhibitory connections (weights between 0.0100 and 0.0330), indicated by thick
lateral arrows, remain between neurons coding patterns that overlap. The thickness of the lines
is proportional to the inhibitory connection weights. All other inhibitory connections (weights
between 0 and 0.0006) are omitted from the figure. (B) 16 copies of the network. Each copy
illustrates the network’s response to a different input pattern. Network responses are indicated
by filling of active output neurons; fractional height of filling within each circle is proportional to
neuron activation value. Rectangles are drawn around the networks that indicate the responses
to one of the training patterns. (Redrawn with permission, from Marshall (1995), copyright
Elsevier Science.)
of the most illustrative parsings, and the degree to which they satisfy the conditions, will be
discussed. The most complex example below, pattern abcdf, is examined in greater detail,
using the equations described in the preceding section to evaluate the parsing.
5.1. EXIN network response to training patterns
Consider the response of the network to a pattern in the training set, such as a (figure 7(B)).
The active output neuron has the label a. It is fully active, so it fully accounts for the input
pattern a. No other output neuron is active, so the activations across the output layer are
fully accounted for by the input pattern. Thus, condition 1 is satisfied on the patterns from
the training set. The total input almost exactly equals the total size-normalized output, so
condition 2 is well satisfied on these patterns. Condition 3 is also satisfied for the training
patterns: for example, pattern ab activates output neuron ab, not a or abc. Since the
training patterns are unambiguous, condition 4 is satisfied by default on these patterns. As
seen in figure 7(B), the generalization conditions are satisfied for all patterns in the training
set (marked by rectangles).
5.2. EXIN network response to ambiguous patterns
Consider the response of the network to an ambiguous pattern such as d. Pattern d is part of
familiar patterns cd, de, and def, and it matches cd and de most closely. The corresponding
two output neurons are active, both between the 25% and 50% levels. Conditions 1 and 2
appear to be approximately satisfied on this pattern: the activation of d is accounted for
by split activation across cd and de, and the activations of cd and de are accounted for by
disjoint fractions of the activation of d. Condition 3 is well satisfied because neuron def
is inactive. Condition 4 is also approximately (but not perfectly) satisfied, since the two
neuron activations are nearly equal.
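The split-credit arithmetic for pattern d can be illustrated with a small idealized calculation. The exact 50/50 split assumed below is an illustration of ours, not a simulated value; the text above notes that the simulated activations lie between the 25% and 50% levels, somewhat above this lower bound.

```python
# Idealized credit split for the ambiguous input pattern d (activation 1),
# divided evenly between output neurons cd and de (assumed 50/50 split).
credit = {"cd": 0.5, "de": 0.5}
size = {"cd": 2, "de": 2}  # number of features in each neuron's label

# Attributed activation (condition 1): received credit divided by size.
attributed = {j: credit[j] / size[j] for j in credit}
print(attributed)  # {'cd': 0.25, 'de': 0.25}: both at the 25% level

# Condition 2: total input credit equals total size-normalized output.
total_output_credit = sum(attributed[j] * size[j] for j in attributed)
assert total_output_credit == 1.0
```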
Now consider pattern b, which is part of patterns ab and abc. The network activates
neuron ab at about the 50% level. Since b constitutes 50% of the pattern ab, the activation
of neuron ab fully accounts for the input pattern. Likewise, the activation of ab is fully
accounted for by the input pattern. Pattern b is more similar to ab than to abc, so it is
correct for neuron abc to be inactive in this case, by condition 3.
Similarly, pattern c is part of abc and cd. However, neuron abc is slightly active, and
neuron cd is active at a level slightly less than 50%. Condition 1 is satisfied on pattern c:
the sum of the output activations attributable to c is still the same as the sum of the activations
attributable to b in the previous example, and the activations of neurons abc and cd are
attributable to disjoint fractions (approximately 25% and 75%) of the activation of c; thus,
condition 2 is well satisfied. Condition 3 is not as well satisfied here as in the previous
example. The difference can be explained by the weaker inhibition between abc and cd
than between ab and abc; more coactivation is thus allowed.
Input pattern c is unambiguous, by condition 4. To satisfy condition 4 on an input
pattern, a network must determine which output neurons represent the best matches for the
input pattern. The simultaneous partial activation of abc and cd reflects a tolerance for some
inexactness in the EXIN network's determination of the best matches. Alternatively,
as described in subsection 3.2.2, greater selectivity in the interneuron inhibition (as in
SONNET-2) can be used to satisfy condition 2 more exactly.
The results in figure 7 show that when presented with ambiguous patterns, the EXIN
network activates the best match, and when there is more than one best match, it permits
simultaneous activation of the best matches. Thus, the generalization behaviour on
ambiguous patterns meets the exclusive allocation conditions satisfactorily.
5.3. EXIN network response to multiple superimposed patterns
Consider the response of the network to pattern abcd. Pattern abcd can be compared with
patterns ab and cd; the response to abcd is the superposition of the separate responses
to ab and cd. Conditions 1, 2 and 4 are clearly met here. As discussed in subsection 3.1.4,
this is in contrast to the response of a WTA network, where the output neuron abc would
Generalization and exclusive allocation 299
become active. Condition 3 is also met here; there is no output neuron labelled abcd, and
if neuron abc were fully active, then the input from neuron d could be accounted for only
by partial activation of another output neuron, such as de.
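The bookkeeping behind this parsing can be sketched in a few lines of Python. This is a minimal illustration of the exclusive-allocation accounting; the dictionary representation and names are ours, not the authors' implementation.

```python
# Idealized exclusive-allocation parsing of pattern abcd as ab + cd.
# Each input feature's credit goes entirely to one output neuron.
x = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}          # input activations
parse = {"a": "ab", "b": "ab", "c": "cd", "d": "cd"}  # credit assignment
size = {"ab": 2, "cd": 2}                             # label sizes

# Attributed activation (condition 1): credit received, divided by size.
activation = {j: 0.0 for j in size}
for i, j in parse.items():
    activation[j] += x[i] / size[j]
print(activation)  # {'ab': 1.0, 'cd': 1.0}: both neurons fully active

# Condition 2: total input equals total size-normalized output.
assert sum(x.values()) == sum(activation[j] * size[j] for j in size)
```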
When f is added to abcd to yield the pattern abcdf, a chain reaction alters the network’s
response, from def down to a in the output layer. The presence of d and f causes the def
neuron to become approximately 50% active. In turn, this inhibits the cd neuron more,
which then becomes less active. As a result, the abc neuron receives less inhibition and
becomes more active. This in turn inhibits the activation of neuron ab. Because neuron ab
is less active, neuron a then becomes more active. These increases and decreases tend to
balance one another, thereby keeping conditions 1 and 2 satisfied on pattern abcdf. The
dominant parsing appears to be ab+ cd+ def, but the overlap between cd and def prevents
those two neurons from becoming fully coactive. As a result, the alternative parsings
involving abc or a can become partially active. No strong violations of conditions 3 and 4
are apparent. The responses to patterns cdf, abcf, and bcdf are also shown for comparison.
The patterns listed above were selected for discussion on the basis of their interesting
properties. The network’s response to all the other patterns can also be evaluated using
the exclusive allocation criterion. In each case, the EXIN network adheres well to the
four generalization conditions. Thus, the simulation indicates the high degree to which
EXIN networks show exclusive allocation, sequence masking, and uncertainty multiplexing
behaviours.
5.4. An example credit assignment computation
The generalization conditions can be formalized in a number of ways; the equations given
above represent one such formalization. For example, a different computation could be
used to express exclusive allocation deficiency, instead of the least-squares method of
equations (5), (8), (9), (14) and (18). Nonlinearities could be introduced in the credit
assignment scheme (equation (4)). The formalization given here expresses the generalization
conditions in a relatively simple manner that is suitable from a computational viewpoint.
Figure 8 describes a computation of the extent to which the network in figure 7 satisfies
the generalization conditions for a particular input pattern. The table in the rectangle
describes approximate parsing coefficients for pattern abcdf. The coefficients shown in
the table were estimated manually, to two decimal places. These coefficients represent the
portion of the credit that is assigned between each input neuron activation and each output
neuron activation. For example, the activation of input neuron a is 1; 21% of its credit
is allocated to output neuron a, and 79% is allocated to ab. The input to neuron ab
is 0.79 + 0.38 = 1.17, the sum of the contributions it receives from different input neurons
weighted by the activation of the input neurons. This input is divided by the neuron’s
normalization factor (‘size’), 2. This normalization factor is derived from the neuron’s
label, which is determined by the training (familiar) patterns to which the neuron responds
(equations (2) and (3)). The resulting attributed activation value, 0.59, is very close to the
actual activation, 0.58, of neuron ab in the simulation. The existence of parsing coefficients
(e.g., those in figure 8) that produce attributed activations that are all close to the actual
activations shows that condition 1 (equation (4)) is well satisfied for the input pattern abcdf.
Condition 2 is well satisfied because $\sum_i x_i$ (which equals 5) is very close to
$\sum_j y_j \sum_k L_{kj}$ (which equals $(0.19 \times 1) + (0.58 \times 2) + (0.24 \times 3) + (0.58 \times 2) + (0.00 \times 2) + (0.56 \times 3) = 4.91$). Numerical values for conditions 3 and 4 can also be calculated, but
the calculations would be much more computationally intensive, as they call for evaluation
of all possible parsings of an input pattern, within a given training environment.
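The computation described here can be reproduced with a short script. The coefficient matrix below is transcribed from figure 8; the placement of its zero entries is our reading of the figure, so treat this as an illustrative sketch rather than the authors' code.

```python
import numpy as np

# Input activations x_i for pattern abcdf (feature e is absent).
x = np.array([1, 1, 1, 1, 0, 1], dtype=float)  # a, b, c, d, e, f

# Parsing coefficients C_ij: fraction of input neuron i's credit assigned
# to output neuron j (values transcribed from figure 8).
outputs = ["a", "ab", "abc", "cd", "de", "def"]
C = np.array([
    #  a     ab    abc   cd    de    def
    [0.21, 0.79, 0.00, 0.00, 0.00, 0.00],  # from a
    [0.00, 0.38, 0.62, 0.00, 0.00, 0.00],  # from b
    [0.00, 0.00, 0.12, 0.88, 0.00, 0.00],  # from c
    [0.00, 0.00, 0.00, 0.30, 0.00, 0.70],  # from d
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00],  # from e (inactive)
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # from f
])

# Normalization factor ("size") of each output neuron: the number of
# features in its label, i.e. sum_k L_kj.
size = np.array([1, 2, 3, 2, 2, 3], dtype=float)

# Condition 1: attributed activation = (sum_i C_ij x_i) / size_j.
attributed = (C.T @ x) / size
actual = np.array([0.19, 0.58, 0.24, 0.58, 0.00, 0.56])  # simulated values

for name, att, act in zip(outputs, attributed, actual):
    print(f"{name:>4}: attributed {att:.2f}, actual {act:.2f}")

# Condition 2: total input vs. total size-normalized output.
print(f"total input: {x.sum():.2f}")                            # 5.00
print(f"total normalized output: {(actual * size).sum():.2f}")  # 4.91
```

Here $C^{\top}x$ gives each output neuron's total parse energy; dividing by the label size yields the attributed activations, which are compared against the simulated activation values.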
Input activations $x_i$ (pattern abcdf):  a = 1, b = 1, c = 1, d = 1, e = 0, f = 1

To output   Parsing coefficients C_ij, from input neuron:   Total parse energy   Neuron size       Attributed    Actual
neuron        a      b      c      d      e      f          $\sum_i C_{ij}x_i$   $\sum_k L_{kj}$   activation    activation $y_j$
a            .21    .00    .00    .00    .00    .00               .21               / 1 =             .21           .19
ab           .79    .38    .00    .00    .00    .00              1.17               / 2 =             .59           .58
abc          .00    .62    .12    .00    .00    .00               .74               / 3 =             .25           .24
cd           .00    .00    .88    .30    .00    .00              1.18               / 2 =             .59           .58
de           .00    .00    .00    .00    .00    .00               .00               / 2 =             .00           .00
def          .00    .00    .00    .70    .00   1.00              1.70               / 3 =             .57           .56

Attributed activation of output neuron j: $(\sum_i C_{ij} x_i) / (\sum_k L_{kj})$.
Figure 8. Parsing coefficients and attributed activations. The table inside the rectangle shows
parsing coefficients: the inferred decomposition of the credit from each input neuron into
each output neuron to produce the activations shown in figure 7. Because the results of this
computation are very close to the EXIN network’s simulated results (compare grey shaded
columns on the right), it can be concluded that the EXIN network satisfies condition 1 for
exclusive allocation very well, on pattern abcdf. (Redrawn with permission, from Marshall
(1995), copyright Elsevier Science.)
6. Discussion
The exclusive allocation criterion was used to compare qualitatively the generalization
performance of four unsupervised classifiers: WTA competitive learning networks, linear
decorrelator networks, EXIN networks, and SONNET-2 networks. The comparisons suggest
that more sophisticated generalization performance is obtained at the cost of increased
complexity. The exclusive allocation behaviour of an EXIN network was examined in more
detail, and one parsing was analysed quantitatively. The concept of exclusive allocation,
or credit assignment, is a useful way of defining generalization because it lends itself
naturally to the problem of decomposing and identifying independent
sources underlying superimposed or ambiguous signals (blind source separation) (Bell and
Sejnowski 1995, Comon et al 1991, Jutten and Herault 1991).
This paper has described formal criteria for evaluating the generalization properties
of unsupervised neural networks, based on the principles of exclusive allocation, sequence
masking, and uncertainty multiplexing. The examples and simulations show that satisfaction
of the generalization conditions can enable a network to do context-sensitive parsing, in
response to multiple superimposed patterns as well as ambiguous patterns. The method
describes a network in terms of its response to patterns in the training set and then places
constraints on the response of the network to all patterns, both familiar and unfamiliar.
The concepts of exclusive allocation, sequence masking, and uncertainty multiplexing
thus provide a principled basis for evaluating the generalization capability of unsupervised
classifiers.
The criteria in this paper define success for a system in terms of the quality of the
system’s internal representations of its input environment, rather than in terms of a particular
external task. The internal representations are inferred, without actually examining the
system’s internal processing, weights, etc, through a black-box approach of ‘labelling’:
observing the system’s responses to its training (‘familiar’) inputs. Then the system’s
generalization performance is evaluated by examining its responses to both familiar and
unfamiliar inputs. This definition is useful when a system’s generalization cannot be
measured in terms of performance on a specific external task, either when objective
classifications (‘supervision’) of input patterns are unavailable or when the system is general
purpose.
Acknowledgments
This research was supported in part by the Office of Naval Research (Cognitive and Neural
Sciences, N00014-93-1-0208) and by the Whitaker Foundation (Special Opportunity Grant).
We thank George Kalarickal, Charles Schmitt, William Ross, and Douglas Kelly for valuable
discussions.
References
Anderson J A, Silverstein J W, Ritz S A and Jones R S 1977 Distinctive features, categorical perception, and
probability learning: Some applications of a neural model Psychol. Rev. 84 413–51
Bell A J and Sejnowski T J 1995 An information-maximization approach to blind separation and blind
deconvolution Neural Comput. 7 1129–59
Bregman A S 1990 Auditory Scene Analysis: The Perceptual Organization of Sound (Cambridge, MA: MIT Press)
Carpenter G A and Grossberg S 1987 A massively parallel architecture for a self-organizing neural pattern
recognition machine Comput. Vision, Graphics Image Process. 37 54–115
Cohen M A and Grossberg S 1986 Neural dynamics of speech and language coding: developmental programs,
perceptual grouping, and competition for short term memory Human Neurobiol. 5 1–22
——1987 Masking fields: A massively parallel neural architecture for learning, recognizing and predicting multiple
groupings of patterned data Appl. Opt. 26 1866–91
Comon P, Jutten C and Herault J 1991 Blind separation of sources, part II: problems statement Signal Process. 24
11–21
Craven M W and Shavlik J W 1994 Using sampling and queries to extract rules from trained neural networks
Machine Learning: Proc. 11th Int. Conf. (San Francisco, CA: Morgan Kaufmann) pp 37–45
Desimone R 1992 Neural circuits for visual attention in the primate brain Neural Networks for Vision and Image
Processing ed G A Carpenter and S Grossberg (Cambridge, MA: MIT Press) pp 343–64
Földiák P 1989 Adaptive network for optimal linear feature extraction Proc. Int. Joint Conf. on Neural Networks
(Washington, DC) (Piscataway, NJ: IEEE) vol I, pp 401–5
Hubbard R S and Marshall J A 1994 Self-organizing neural network model of the visual inertia phenomenon in
motion perception Technical Report 94-001 Department of Computer Science, University of North Carolina
at Chapel Hill, 26 pp
Jutten C and Herault J 1991 Blind separation of sources, part I: an adaptive algorithm based on neuromimetic
architecture Signal Process. 24 1–10
Kohonen T 1982 Self-organized formation of topologically correct feature maps Biol. Cybern. 43 59–69
Marr D 1982 Vision: A Computational Investigation into the Human Representation and Processing of Visual
Information (San Francisco, CA: Freeman)
Marr D and Poggio T 1976 Cooperative computation of stereo disparity Science 194 283–7
Marshall J A 1990a Self-organizing neural networks for perception of visual motion Neural Networks 3 45–74
——1990b A self-organizing scale-sensitive neural network Proc. Int. Joint Conf. on Neural Networks
(San Diego, CA) (Piscataway, NJ: IEEE) vol III, pp 649–54
——1990c Adaptive neural methods for multiplexing oriented edges Proc. SPIE 1382 (Intelligent Robots and
Computer Vision IX: Neural, Biological, and 3-D Methods, Boston, MA) ed D P Casasent, pp 282–91
——1992 Development of perceptual context-sensitivity in unsupervised neural networks: Parsing, grouping, and
segmentation Proc. Int. Joint Conf. on Neural Networks (Baltimore, MD) (Piscataway, NJ: IEEE) vol III,
pp 315–20
——1995 Adaptive perceptual pattern recognition by self-organizing neural networks: Context, uncertainty,
multiplicity, and scale Neural Networks 8 335–62
Marshall J A, Kalarickal G J and Graves E B 1996 Neural model of visual stereomatching: Slant, transparency,
and clouds Network: Comput. Neural Syst. 7 635–70
Marshall J A, Kalarickal G J and Ross W D 1997 Transparent surface segmentation and filling-in using local
cortical interactions Investigative Ophthalmol. Visual Sci. 38 641
Marshall J A, Schmitt C P, Kalarickal G J and Alley R K 1998 Neural model of transfer-of-binding in visual
relative motion perception Computational Neuroscience: Trends in Research, 1998 ed J M Bower, to appear
Morse B 1994 Computation of object cores from grey-level images PhD Thesis Department of Computer Science,
University of North Carolina at Chapel Hill
Nigrin A 1993 Neural Networks for Pattern Recognition (Cambridge, MA: MIT Press)
Oja E 1982 A simplified neuron model as a principal component analyzer J. Math. Biol. 15 267–73
Reggia J A, D’Autrechy C L, Sutton G G and Weinrich M 1992 A competitive redistribution theory of neocortical
dynamics Neural Comput. 4 287–317
Rumelhart D E and McClelland J L 1986 On the learning of past tenses of English verbs Parallel Distributed
Processing: Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models
(Cambridge, MA: MIT Press) pp 216–71
Sattath S and Tversky A 1987 On the relation between common and distinctive feature models Psychol. Rev. 94
16–22
Schmitt C P and Marshall J A 1998 Grouping and disambiguation in visual motion perception: A self-organizing
neural circuit model, in preparation
Yuille A L and Grzywacz N M 1989 A winner-take-all mechanism based on presynaptic inhibition feedback Neural
Comput. 1 334–47
... Unlike the EXIN model, the LISSOM model has modiiable lateral excitatory pathways and uses an instar lateral inhibitory synaptic plasticity rule. However, like the EXIN rules, the LISSOM rules produce a sparse, distributed coding that reduces redundancies (Marshall, 1995a; Marshall & Gupta, 1998; Sirosh et al., 1996). Lateral excitatory pathways in the LISSOM model help the development of a topographic RF arrangement. ...
... Strong lateral inhibition between two neurons tends to make them less likely to be coactivated, causing the two to become selective to diierent inputs according to the excitatory synaptic plasticity rule (Section 2.5.2). Thus, when the network is exposed to normal stimuli, the lateral inhibitory weights and the excitatory aaerent weights are modiied so that each neuron becomes selective to diierent inputs and the RFs of all Layer 2 neurons cover the input space (Marshall, 1995a; Marshall & Gupta, 1998). This leads to improved discrimination and sparse coding (Marshall, 1995a). ...
... The EXIN lateral inhibitory synaptic plasticity rule directly reduces inhibition to neurons inactivated by peripheral scotomas or lesions, thus making them more likely to respond to some visual stimuli. The EXIN lateral inhibitory synaptic plasticity rule enhances the eeciency of a neural network's representation of perceptual patterns, by recruiting unused and under-used neurons to represent input patterns (Marshall, 1995a; Marshall & Gupta, 1998). In comparison, the LISSOM lateral inhibitory synaptic plasticity rule weakens lateral inhibitory pathways from inactive neurons to active neurons, thereby tending to make the active neurons more strongly active and to suppress the inactive neurons more strongly. ...
Article
Full-text available
The position, size, and shape of the receptive field (RF) of some cortical neurons change dynamically, in response to artificial scotoma conditioning (Pettet & Gilbert, 1992) and to retinal lesions (Chino et al., 1992; Darian-Smith & Gilbert, 1995) in adult animals. The RF dynamics are of interest because they show how visual systems may adaptively overcome damage (from lesions, scotomas, or other failures), may enhance processing efficiency by altering RF coverage in response to visual demand, and may perform perceptual learning. This paper presents an afferent excitatory synaptic plasticity rule and a lateral inhibitory synaptic plasticity rule -- the EXIN rules (Marshall, 1995a) -- to model persistent RF changes after artificial scotoma conditioning and retinal lesions. The EXIN model is compared to the LISSOM model (Sirosh et al., 1996) and to a neuronal adaptation model (Xing & Gerstein, 1994). The rules within each model are isolated and are analyzed independently, to elucidate t...
... But there have been very few studies comparing abstract synaptic plasticity rules with experimental data (e.g., Bear et al., 1987;Dudek & Bear, 1992). In addition, only a few models use and emphasize the role of inhibitory synaptic plasticity (e.g., Marshall, 1990abc, 1995aMarshall & Gupta, 1998;Sirosh & Miikkulainen, 1996) in development of important computational and neurobiological properties. ...
... Thus, whenever a neuron is active, its output excitatory connections to active neurons tend to become slightly stronger, while its output excitatory connections to inactive neurons tend to become slightly weaker. Neuron activations remain within ?C; B] according to the shunting equation (Section 2.1); this in turn causes the excitatory weight values to be bounded between 0 and Q(B) (Grossberg, 1982 (Marshall, 1995a;Marshall & Gupta, 1998) where > 0 is a small learning rate constant, and H and R are half-recti ed non-decreasing functions. 8 Thus, whenever a neuron is active, its output inhibitory connections to other active neurons tend to become slightly stronger (i.e., more inhibitory), while its output inhibitory connections to inactive neurons tend to become slightly weaker. ...
... There have been only a few experiments on lateral inhibitory synaptic plasticity (e.g., Levy suggested several lateral inhibitory synaptic plasticity rules, including the outstar lateral inhibitory rule, to model some aspects of classical conditioning. To motivate further experimentation on lateral inhibitory synaptic plasticity, predictions of the outstar lateral inhibitory synaptic plasticity rule (Marshall, 1990a(Marshall, , 1995aMarshall & Gupta, 1998) are presented. As in the case of the excitatory synaptic plasticity rules, changes in the lateral inhibitory synaptic weights under the outstar lateral inhibitory synaptic plasticity rule are studied as a function of input excitation to model neurons in Section 3.5.1, ...
Article
Full-text available
A large variety of synaptic plasticity rules have been used in models of excitatory synaptic plasticity (Brown et al., 1990). These rules are generalizations of the Hebbian rule and have some properties consistent with experimental data on long-term excitatory synaptic plasticity, but they also have some properties inconsistent with experimental data. For example, the BCM rule (Bear et al., 1987; Bienenstock et al., 1982) produces homosynaptic potentiation and depression, which has been observed experimentally (Artola et al., 1990; Dudek & Bear, 1992; Kirkwood et al., 1993; Fr'egnac et al., 1994; Yang & Faber, 1991). But the BCM rule is also inconsistent with some experimental results; e.g., the BCM rule cannot produce heterosynaptic depression (Abraham & Goddard, 1983; Lynch et al., 1977). In addition, long-term synaptic plasticity in inhibitory pathways has been emphasized in some models of cortical function (Marshall, 1990abc, 1995a; Sirosh et al., 1996), but experimental data on in...
... This paper presents a novel account of the e ects of these pharmacological treatments, based on the EXIN synaptic plasticity rules (Marshall, 1995), which include both an instar a erent excitatory and an outstar lateral inhibitory rule. Functionally, the EXIN plasticity rules enhance the e ciency, discrimination, and context-sensitivity of a neural network's representation of perceptual patterns (Marshall, 1995;Marshall & Gupta, 1998). The EXIN model decreases lateral inhibition from neurons outside the infusion site (control regions) to neurons inside the infusion region, during monocular deprivation. ...
... Some experimental ideas are suggested in Section 5.2. It is hypothesized that the EXIN rules, which were developed from computational considerations (Marshall, 1990a(Marshall, , 1995Marshall & Gupta, 1998), have a neurophysiological realization in the synaptic microcircuitry of cortical tissue and in the neuropharmacology of cortical plasticity. Reiter and Stryker (1988) locally infused muscimol, a GABA agonist selective for GABA A receptors, into the primary visual cortex of kittens during MD. ...
... A functional feature of the EXIN lateral inhibitory plasticity rule is that it enhances e ciency of representation by recruiting unused or under-used neurons (Marshall, 1995) in the presence of peripheral scotomas or lesions to represent some input information . The EXIN rules also produce neurons with high selectivity and sparse distribution coding (Marshall, 1995;Marshall & Gupta, 1998). It is hypothesized that anti-Hebbian outstar lateral inhibitory plasticity may be a general part of cortical development, and speci c experiments to test the model's predictions are proposed. ...
Article
Full-text available
Infusion of a GABA agonist (Reiter & Stryker, 1988) and infusion of an NMDA receptor antagonist (Bear et al., 1990), in the primary visual cortex of kittens during monocular deprivation, shifts ocular dominance toward the closed eye, in the cortical region near the infusion site. This reverse ocular dominance shift has been previously modeled by variants of a covariance synaptic plasticity rule (Bear et al., 1990; Clothiaux et al., 1991; Miller et al., 1989; Reiter & Stryker, 1988). Kasamatsu et al. (1997, 1998) showed that infusion of an NMDA receptor antagonist in adult cat primary visual cortex changes ocular dominance distribution, reduces binocularity, and reduces orientation and direction selectivity. This paper presents a novel account of the effects of these pharmacological treatments, based on the EXIN synaptic plasticity rules (Marshall, 1995), which include both an instar afferent excitatory and an outstar lateral inhibitory rule. Functionally, the EXIN plasticity rules enha...
... In the model, a erent excitatory synaptic plasticity plays the primary role in OD plasticity under the classical rearing paradigms, and lateral inhibitory interactions produce secondary OD changes. The EXIN lateral inhibitory synaptic plasticity rule controls the development of lateral inhibitory pathway weights as a function of neuronal activation and contributes to the development of input feature selectivity and high discriminability of model cortical neurons and to sparse neuronal coding of input features (Marshall, 1995a;Marshall & Gupta, 1998). ...
... It has been proposed that several input feature selectivities depend on intracortical inhibition (Bonds & DeBruyn, 1985;Sillito, 1975Sillito, , 1997Sillito, , 1979Somers et al., 1995;Somogyi & Martin, 1985). The EXIN rules produce neurons with high selectivity and sparse distributed coding (Marshall, 1995a;Marshall & Gupta, 1998). In the EXIN model, strong lateral inhibitory pathways develop between neurons with overlapping receptive elds (Marshall, 1995a), consistent with experimental results suggesting that a neuron receives the strongest inhibition when the orientation of the input stimulus is the same as the neuron's preferred orientation (Blakemore & Tobin, 1972;Ferster, 1989), or when the position of the input stimulus is in the neuron's receptive eld (DeAngelis et al., 1992). ...
Article
Full-text available
Previous models of visual cortical ocular dominance (OD) plasticity (e.g., Clothiaux et al., 1991; Miller et al., 1989) are based on afferent excitatory synaptic plasticity alone; these models do not consider the role of lateral interactions and synaptic plasticity in lateral pathways in OD plasticity. Recent models of other cortical properties and functions have emphasized lateral intracortical interactions, however, and long-range lateral pathways develop during the early postnatal stages (Callaway & Katz, 1990). Thus, a biologically plausible model of OD plasticity should consider the development of intracortical pathways and its effects on OD and other cortical properties during early postnatal stages. In this paper, the EXIN model (Marshall, 1995a), which consists of afferent excitatory and lateral inhibitory synaptic plasticity, is used to model OD plasticity during the "classical" rearing paradigms such as normal rearing, monocular deprivation, reverse suture, strabismus, binocu...
Article
Full-text available
In nationwide mammography screening, thousands of mammography examinations must be processed. Each consists of two standard views of each breast, and each mammogram must be visually examined by an experienced radiologist to assess it for any anomalies. The ability to detect an anomaly in mammographic texture is important to successful outcomes in mammography screening and, in this study, a large number of mammograms were digitized with a highly accurate scanner; and textural features were derived from the mammograms as input data to a SONNET selforganizing neural network. The paper discusses how SONNET was used to produce a taxonomic organization of the mammography archive in an unsupervised manner. This process is subject to certain choices of SONNET parameters, in these numerical experiments using the craniocaudal view, and typically produced O(10), for example, 39 mammogram classes, by analysis of features from O(10(3)) mammogram images. The mammogram taxonomy captured typical subtleties to discriminate mammograms, and it is submitted that this may be exploited to aid the detection of mammographic anomalies, for example, by acting as a preprocessing stage to simplify the task for a computational detection scheme, or by ordering mammography examinations by mammogram taxonomic class prior to screening in order to encourage more successful visual examination during screening. The resulting taxonomy may help train screening radiologists and conceivably help to settle legal cases concerning a mammography screening examination because the taxonomy can reveal the frequency of mammographic patterns in a population.
Article
A large and influential class of neural network architectures uses postintegration lateral inhibition as a mechanism for competition. We argue that these algorithms are computationally deficient in that they fail to generate, or learn, appropriate perceptual representations under certain circumstances. An alternative neural network architecture is presented here in which nodes compete for the right to receive inputs rather than for the right to generate outputs. This form of competition, implemented through preintegration lateral inhibition, does provide appropriate coding properties and can be used to learn such representations efficiently. Furthermore, this architecture is consistent with both neuroanatomical and neurophysiological data. We thus argue that preintegration lateral inhibition has computational advantages over conventional neural network architectures while remaining equally biologically plausible.
Article
A theory of postnatal activity-dependent neural plasticity based on synaptic weight modification is presented. Synaptic weight modifications are governed by simple variants of a Hebbian rule for excitatory pathways and an anti-Hebbian rule for inhibitory pathways. The dissertation focuses on modeling the following cortical phenomena: long-term potentiation and depression (LTP and LTD); dynamic receptive field changes during artificial scotoma conditioning in adult animals; adult cortical plasticity induced by bilateral retinal lesions, intracortical microstimulation (ICMS), and repetitive peripheral stimulation; changes in ocular dominance during "classical" rearing conditioning; and the effect of neuropharmacological manipulations on plasticity. Novel experiments are proposed to test the predictions of the proposed models, and the models are compared with other models of cortical properties. The models presented in the dissertation provide insights into the neural basis of perceptual ...
Article
Full-text available
Stereomatching of oblique and transparent surfaces is described using a model of cortical binocular ‘tuned’ neurons selective for disparities of individual visual features and neurons selective for the position, depth and 3D orientation of local surface patches. The model is based on a simple set of learning rules. In the model, monocular neurons project excitatory connection pathways to binocular neurons at appropriate disparities. Binocular neurons project excitatory connection pathways to appropriately tuned ‘surface patch’ neurons. The surface patch neurons project reciprocal excitatory connection pathways to the binocular neurons. Anisotropic intralayer inhibitory connection pathways project between neurons with overlapping receptive fields. The model’s responses to simulated stereo image pairs depicting a variety of oblique surfaces and transparently overlaid surfaces are presented. For all the surfaces, the model (i) assigns disparity matches and surface patch representations based on global surface coherence and uniqueness, (ii) permits coactivation of neurons representing multiple disparities within the same image location, (iii) represents oblique slanted and tilted surfaces directly, rather than approximating them with a series of frontoparallel steps, (iv) assigns disparities to a cloud of points at random depths, like human observers and unlike K. Prazdny’s [Biol. Cybern. 52, 93-99 (1985; Zbl 0557.92003)] method, and (v) causes globally consistent matches to override greedy local matches. The model represents transparency, unlike the model of D. Marr and T. Poggio [Science 194, 283-287 (1976)], and it assigns unique disparities, unlike the model of Prazdny.
Article
Purpose. Segmentation of transparently overlaid surfaces requires spatially nonlocal communication between the cortical representation of various image features e.g., between two X-junctions on a single surface. The nonlocal communication is needed to allow the local evidence for the validity or invalidity of transparent segmentations to he checked for global consistent1), within each surface. For example, in the figure, both X-junctions are individually consistent with Metelli transparency but are globally inconsistent with each other, and the image evoke.s only little or no perceived transparency. We modeled a cortical circuit that uses layered filiing-in interaction-, to perform nonlocal communication, determines the validity/invalidity of the transparency interpretations, and represents both the reflective and Iransmissive components of surfaces, thus performing surface-based segmentation of transparently overlaid surfaces. Methods. A layered neural circuit with local multiplicative, additive, and other interactions within minicolumns, local lateral diffusive filling-in connections, and dynamic binding to establish layer connectivity was computationally simulated. The simulated neural circuit also used a local anti-binding rule between modeled cortical complex cell.s to allow filling-in to occur independently through overlaid surfaces The simulated cortical response was tested on images containing instances of valid and invalid Metelli transparency. Results. The simulation represented valid transparency configurations by weakening the boundary activations between overlaid surfaces "and by activating neurons within the reflectiveness and transinisMvencss layers to correct values. The simulation leprescnted invalid iranspaiency configurations by strengthening the boundary activations so that the surfaces arc represented as being contiguous, rather than overlaid, and by inactivating neurons u ithm the tranxmissiveness layers. Conclusions. 
The proposed cortical circuit model shows how nonlocal communication can occur within the representation of overlaid surfaces, via filling-in and dynamic binding interactions in multiple overlapping layers. The model also shows how simple local neural interactions can balance perceptual tradeoffs between surface lightness and Metelli transparency.
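The lateral diffusive filling-in described in this abstract can be caricatured in one dimension (an illustrative sketch only; the function name, parameters, and update rule are assumptions, not the paper's layered circuit): activity spreads between neighboring units except across boundary locations, so each bounded region relaxes toward a uniform filled-in value.

```python
def fill_in(values, blocked=frozenset(), steps=500, rate=0.2):
    """Toy 1-D diffusive filling-in sketch.

    `values`  -- initial activities of a row of units
    `blocked` -- set of edges (i, i+1) where a boundary blocks diffusion
    Activity diffuses between neighbors; each region enclosed by
    boundaries converges to the mean of its initial activity.
    """
    v = list(values)
    for _ in range(steps):
        new = v[:]
        for i in range(len(v) - 1):
            if (i, i + 1) in blocked:
                continue  # a boundary activation blocks filling-in here
            flow = rate * (v[i] - v[i + 1])
            new[i] -= flow
            new[i + 1] += flow
        v = new
    return v
```

Weakening a boundary (removing an edge from `blocked`) lets the two adjacent regions merge into a single filled-in surface, which is the sense in which valid transparency configurations above permit filling-in across overlaid surfaces.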
Article
A winner-take-all mechanism is a device that determines the identity and amplitude of its largest input (Feldman and Ballard 1982). Such mechanisms have been proposed for various brain functions. For example, a theory of visual velocity estimation (Grzywacz and Yuille 1989) postulates that a winner-take-all mechanism selects the strongest responding cell in the cortex's middle temporal area (MT). This theory proposes a circuitry that links the directionally selective cells in the primary visual cortex to MT cells, making them velocity selective. Generally, several velocity cells would respond, but only the winner would determine the perception. In another theory, a winner-take-all mechanism guides the spotlight of attention to the most salient image part (Koch and Ullman 1985). Such mechanisms also improve the signal-to-noise ratios of VLSI emulations of brain functions (Lazzaro and Mead 1989). Although computer algorithms for winner-take-all mechanisms exist (Feldman and Ballard 1982; Koch and Ullman 1985), good biologically motivated models do not. A candidate biological mechanism is lateral (mutual) inhibition (Hartline and Ratliff 1957). In some theoretical mutual-inhibition networks, the inhibition sums linearly with the excitatory inputs and the result is passed through a threshold nonlinearity (Hadeler 1974). However, these networks work only if the difference between winner and losers is large (Koch and Ullman 1985). We propose an alternative network, in which the output of each element feeds back to inhibit the inputs to other elements. The action of this presynaptic inhibition is nonlinear, with a possible biophysical substrate. This paper shows that the new network converges stably to a solution that both relays the winner's identity and amplitude and suppresses information about the losers with arbitrary precision. We prove these results mathematically and illustrate the effectiveness of the network and some of its variants by computer simulations.
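The feedback idea can be sketched in a few lines. This is a deliberately simplified subtractive, rectified variant in which each unit's output inhibits the other units' inputs; the paper's actual mechanism is nonlinear presynaptic inhibition with different dynamics, so treat this only as an illustration of the winner-take-all behavior (winner's identity and amplitude preserved, losers suppressed).

```python
def winner_take_all(x, steps=100):
    """Toy winner-take-all via feedback inhibition (illustrative only).

    Each unit's output feeds back to subtract from the inputs of all
    other units; outputs are rectified at zero.  For distinct inputs
    this converges so that only the largest input survives, at its
    full amplitude.
    """
    y = list(x)
    for _ in range(steps):
        y = [max(0.0, xi - max(y[j] for j in range(len(y)) if j != i))
             for i, xi in enumerate(x)]
    return y
```

Note that, unlike the linear-threshold networks criticized above, the winner here retains its input amplitude rather than merely out-competing the losers by a margin.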
Article
Edge linearization operators are often used in computer vision and in neural network models of vision to reconstruct noisy or incomplete edges. Such operators gather evidence for the presence of an edge at various orientations across all image locations and then choose the orientation that best fits the data at each point. One disadvantage of such methods is that they often function in a winner-take-all fashion: the presence of only a single orientation can be represented at any point, so multiple edges cannot be represented where they intersect. For example, the neural Boundary Contour System of Grossberg and Mingolla implements a form of winner-take-all competition between orthogonal orientations at each spatial location to promote sharpening of noisy, uncertain image data. But that competition may produce rivalry, oscillation, instability, or mutual suppression when intersecting edges (e.g., a cross) are present. This "cross problem" exists for all techniques, including Markov Random Fields, where a representation of a chosen, favored orientation suppresses representations of alternate orientations. A new adaptive technique, using both an inhibitory learning rule and an excitatory learning rule, weakens inhibition between neurons representing poorly correlated orientations. It may reasonably be assumed that neurons coding dissimilar orientations are less likely to be coactivated than neurons coding similar orientations. Multiplexing by superposition is thereby generated: combinations of intersecting edges become represented by simultaneous activation of multiple neurons, each of which represents a
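The flavor of such an inhibitory learning rule can be sketched as follows (the function name, learning rate, and exact update are assumptions for illustration, not the published EXIN equations): each inhibitory weight tracks the recent coactivation of its pair of neurons, so pairs that are rarely active together, such as orthogonal orientations at a crossing, gradually lose their mutual inhibition and can become coactive.

```python
import numpy as np

def update_inhibitory_weights(W, activations, lr=0.1):
    """Illustrative inhibitory learning step.

    W[i, j] relaxes toward the pairwise coactivation of neurons i
    and j.  Rarely coactive pairs lose their mutual inhibition,
    permitting multiplexing by superposition; frequently coactive
    pairs retain (or regain) strong mutual inhibition.
    """
    y = np.asarray(activations, dtype=float)
    coact = np.outer(y, y)          # pairwise coactivation this step
    np.fill_diagonal(coact, 0.0)    # no self-inhibition
    return W + lr * (coact - W)     # move weights toward coactivation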
Article
Investigates the relationship between the classificatory structure of objects and the dissimilarity between them by discussing the common and distinctive features models. Data indicate that given a feature structure, the 2 models produce different orderings of dissimilarity between objects. However, if one model holds in some feature structure, then the other model also holds, albeit in a different feature structure. It is suggested that the choice of a model and the specification of the feature structure are not always determined by the observed dissimilarity.