November 14, 2007 23:32 WSPC IJCIA07˙PSOMT
International Journal of Computational Intelligence and Applications
© World Scientific Publishing Company
MATCH TRACKING STRATEGIES FOR FUZZY ARTMAP
NEURAL NETWORKS
PHILIPPE HENNIGES, ERIC GRANGER AND ROBERT SABOURIN
Laboratoire d'imagerie, de vision et d'intelligence artificielle
Dépt. de génie de la production automatisée
École de technologie supérieure, Montreal, Canada

LUIZ S. OLIVEIRA
Dept. de Informática Aplicada
Pontifícia Universidade Católica do Paraná (PUCPR), Curitiba, Brazil
Received (received date)
Revised (revised date)
Training fuzzy ARTMAP neural networks for classification using data from complex real-world
environments may lead to category proliferation and yield poor performance. This problem is known
to occur whenever the training set contains noisy or overlapping data. Moreover, when the training set
contains inconsistent cases (i.e., identical input patterns that belong to different recognition classes),
fuzzy ARTMAP will fail to converge. To circumvent these problems, some alternatives to the network’s
original match tracking (MT) process have been proposed in the literature, such as using negative MT, and
removing MT altogether. In this paper, the impact on fuzzy ARTMAP performance of training with
different MT strategies is assessed empirically, using different synthetic data sets, and the NIST SD19
data set (a handwritten numerical character recognition problem). During computer simulations, fuzzy
ARTMAP is trained with the original (positive) match tracking (MT+), with negative match tracking
(MT-), and without MT algorithm (WMT). Their performance is compared to that of fuzzy ARTMAP
where the MT parameter is optimized during training using a Particle Swarm Optimisation (PSO)-
based strategy, denoted PSO(MT). Through a comprehensive set of simulations, it has been observed
that by training with MT-, fuzzy ARTMAP expends fewer resources than with other MT strategies,
but can achieve a significantly higher generalization error, especially for data with overlapping class
distributions. In particular, degradation of error in fuzzy ARTMAP performance due to overtraining
is more pronounced for MT- than for MT+. Generalization error achieved using WMT is significantly
higher than other strategies on data with complex non-linear decision bounds. Furthermore, the number
of internal categories required to represent decision boundaries increases significantly. Optimizing
the value of the match tracking parameter using PSO(MT) yields the lowest overall generalization
error, and requires fewer internal categories than WMT, but generally more categories than MT+ and
MT-. However, this strategy requires a large number of training epochs to converge. Based on these
empirical results with PSO(MT), the MT process as such can provide a significant increase in fuzzy
ARTMAP performance, assuming that the MT parameter is tuned for the specific application in mind.
Keywords: Pattern Recognition, Classification, Supervised Learning, Neural Networks, Adaptive Res-
onance Theory (ART), Fuzzy ARTMAP, Match Tracking, Particle Swarm Optimisation, Character
Recognition, NIST SD19.
Corresponding author: École de technologie supérieure, 1100 Notre-Dame Ouest, Montreal, Quebec, H3C
1K3, Canada, email: eric.granger@etsmtl.ca, phone: 1-514-396-8650, fax: 1-514-396-8595.
1. Introduction
The fuzzy ARTMAP neural network architecture is capable of self-organizing stable recognition
categories in response to arbitrary sequences of analog or binary input patterns. It can
perform fast, stable, on-line, unsupervised or supervised, incremental learning, classification,
and prediction [6,7]. As such, it has been successfully applied to complex real-world
pattern recognition tasks such as the recognition of radar signals [15,31], multi-sensor image
fusion, remote sensing and data mining [9,30,34,38], recognition of handwritten characters [3,13,22],
and signature verification [28].
A drawback of fuzzy ARTMAP is its inability to learn decision boundaries between class
distributions that consistently yield low generalization error for a wide variety of pattern
recognition problems. For instance, when trained for automatic classification of handwritten
characters, fuzzy ARTMAP cannot achieve a level of performance that is competitive
with some other commonly-used models [16]^a. In the context of batch supervised learning
of a finite training set, the main factors affecting fuzzy ARTMAP’s capacity to generalize
are:
(1) internal dynamics of network: prototype choice and class prediction functions, learning
rule, match tracking process, hyper-parameter values, and representation of categories
with hyper-rectangles.
(2) learning process: supervised learning strategy (and thus, the number of training
epochs), proportion of patterns in the training subset to those in validation and test
subsets, user-defined hyper-parameter values, data normalisation technique, sequential
gradient-based learning, and data presentation order.
(3) data set structure: overlap and dispersion of patterns, etc., and therefore of the geometry
of decision boundaries among patterns belonging to different recognition classes.
Several ARTMAP networks have been proposed to refine the decision boundaries created
by fuzzy ARTMAP. For instance, many variants attempt to improve the accuracy of fuzzy
ARTMAP predictions by providing for probabilistic (density-based) predictions [10,14,24,35,37,39].
When learning data from complex real-world environments, fuzzy ARTMAP is known
to suffer from overtraining, often referred to in the literature as the category proliferation
problem. It occurs when the training data set contains overlapping class distributions [18,21,23].
Increasing the amount of training data requires significantly more internal category neurons,
and therefore computational complexity, while yielding a higher generalisation error. The
category proliferation problem is directly connected to the match tracking (MT) process
of fuzzy ARTMAP. During fuzzy ARTMAP training, when a mismatch occurs between
predicted and desired output responses, MT allows alternate category neurons to be selected.
^a In handwritten character recognition, statistical classifiers (e.g., linear and quadratic discriminant functions, Gaussian
mixture classifiers, and k-Nearest-Neighbor (kNN)), neural networks (e.g., the Multi-Layer Perceptron (MLP) and the
Radial Basis Function (RBF) network), and Support Vector Machines (SVM) are commonly used for classification
due to their learning flexibility and inexpensive computation [25]. Such recognition problems typically exhibit
complex decision boundaries, with moderate overlap between character classes.
The match tracking process is parameterized by the hyper-parameter ε, which was originally
introduced as a small positive value [10]. In the fuzzy ARTMAP literature, this parameter is commonly
set to a value (ε = 0+) that minimizes network resources. Such a choice may
however contribute to overtraining, and significantly degrade the capacity to generalize. As
a result, some authors have studied the impact on performance of removing MT altogether,
and conclude that the usefulness of MT is questionable [2,26]. However, training
without MT may lead to a network with a greater number of internal categories, and possibly
a higher generalization error.
In an extreme case, a well-known convergence problem occurs when learning inconsistent
cases – identical training subset patterns that belong to different classes [10]. The consequence
is a failure to converge, as identical prototypes linked to these inconsistent cases
proliferate. This anomalous situation is a result of the original match tracking process.
This convergence problem may be circumvented by using the feature of ARTMAP-IC [10]
called negative match tracking (i.e., setting ε < 0 after a mismatch reset). This allows fuzzy
ARTMAP training to converge and find solutions with fewer internal categories, but may
however lead to a higher generalization error.
In this paper, the impact on fuzzy ARTMAP performance of training with different MT
strategies – the original positive MT (MT+), negative MT (MT-) and without MT (WMT)
– is assessed empirically. As an alternative, a Particle Swarm Optimization (PSO)-based
approach called PSO(MT) is used to optimize the value of the MT hyper-parameter ε during
fuzzy ARTMAP training, such that the generalization error is minimized. The architecture,
weights, and MT parameter are in effect selected to minimize generalisation error
by virtue of ARTMAP training, which allows the network architecture (i.e., the
number of category neurons) to grow with the problem's complexity. An experimental protocol
has been defined such that the generalization error and resource requirements of fuzzy
ARTMAP trained with different MT strategies may be compared using different types of
pattern recognition problems. The first two types consist of synthetic data with overlapping
class distributions, and with complex decision boundaries but no overlap, respectively. The
third type consists of real-world data – handwritten numerical characters extracted from
NIST SD19.
In the next section, the MT strategies for fuzzy ARTMAP training are briefly reviewed.
Section III presents the experimental methodology, i.e., the protocol, data sets and performance
measures employed for proof-of-concept computer simulations. Section IV presents
and discusses experimental results obtained with synthetic and NIST SD19 data.
2. Fuzzy ARTMAP Match Tracking
2.1. The fuzzy ARTMAP neural network:
ARTMAP refers to a family of neural network architectures based on Adaptive Resonance
Theory (ART) [4] that is capable of fast, stable, on-line, unsupervised or supervised, incremental
learning, classification, and prediction [6,7]. ARTMAP is often applied using the
simplified version shown in Figure 1. It is obtained by combining an ART unsupervised
neural network [4] with a map field. The ARTMAP architecture called fuzzy ARTMAP [7]
can process both analog and binary-valued input patterns by employing fuzzy ART [5] as the
ART network.

Fig. 1. An ARTMAP neural network architecture specialized for pattern classification.
The fuzzy ART neural network consists of two fully connected layers of nodes: an M-node
input layer, F1, and an N-node competitive layer, F2. A set of real-valued weights
W = {w_ij ∈ [0,1] : i = 1, 2, ..., M; j = 1, 2, ..., N} is associated with the F1-to-F2 layer
connections. Each F2 node j represents a recognition category that learns a prototype vector
w_j = (w_1j, w_2j, ..., w_Mj). The F2 layer of fuzzy ART is connected, through learned associative
links, to an L-node map field Fab, where L is the number of classes in the output space.
A set of binary weights Wab = {wab_jk ∈ {0,1} : j = 1, 2, ..., N; k = 1, 2, ..., L} is associated
with the F2-to-Fab connections. The vector wab_j = (wab_j1, wab_j2, ..., wab_jL) links F2 node j
to one of the L output classes.
2.2. Algorithm for supervised learning of fuzzy ARTMAP:
In batch supervised training mode, ARTMAP classifiers learn an arbitrary mapping between
training set patterns a = (a_1, a_2, ..., a_m) and their corresponding binary supervision
patterns t = (t_1, t_2, ..., t_L). These patterns are coded to have unit value t_K = 1 if K is the target
class label for a, and zero elsewhere. The following algorithm describes fuzzy ARTMAP
learning:
(1) Initialisation: Initially, all the F2 nodes are uncommitted, all weight values w_ij are
initialized to 1, and all weight values wab_jk are set to 0. An F2 node becomes committed
when it is selected to code an input vector a, and is then linked to an Fab node. Values
of the learning rate β ∈ [0,1], the choice α > 0, the match tracking 0 < ε ≪ 1, and the
baseline vigilance ρ̄ ∈ [0,1] hyper-parameters are set.
(2) Input pattern coding: When a training pair (a, t) is presented to the network, a undergoes
a transformation called complement coding, which doubles its number of components.
The complement-coded input pattern has M = 2m dimensions and is defined
by A = (a, a^c) = (a_1, a_2, ..., a_m; a^c_1, a^c_2, ..., a^c_m), where a^c_i = (1 − a_i), and a_i ∈ [0,1]. The
vigilance parameter ρ is reset to its baseline value ρ̄.
(3) Prototype selection: Pattern A activates layer F1 and is propagated through weighted
connections W to layer F2. Activation of each node j in the F2 layer is determined by
the Weber law choice function:

    T_j(A) = |A ∧ w_j| / (α + |w_j|),    (1)

where |·| is the L1 norm operator defined by |w_j| ≡ Σ_{i=1..M} |w_ij|, ∧ is the fuzzy AND
operator, (A ∧ w_j)_i ≡ min(A_i, w_ij), and α is the user-defined choice parameter. The F2
layer produces a binary, winner-take-all pattern of activity y = (y_1, y_2, ..., y_N) such that
only the node j = J with the greatest activation value J = argmax{T_j : j = 1, 2, ..., N}
remains active; thus y_J = 1 and y_j = 0 for j ≠ J. If more than one T_j is maximal, the
node j with the smallest index is chosen. Node J propagates its top-down expectation,
or prototype vector w_J, back onto F1 and the vigilance test is performed. This test
compares the degree of match between w_J and A against the dimensionless vigilance
parameter ρ ∈ [0,1]:

    |A ∧ w_J| / |A| = |A ∧ w_J| / M ≥ ρ.    (2)

If the test is passed, then node J remains active and resonance is said to occur. Otherwise,
the network inhibits the active F2 node (i.e., T_J is set to 0 until the network
is presented with the next training pair (a, t)) and searches for another node J that
passes the vigilance test. If such a node does not exist, an uncommitted F2 node becomes
active and undergoes learning (Step 5). The depth of search attained before an
uncommitted node is selected is determined by the choice parameter α.
(4) Class prediction: Pattern t is fed directly to the map field Fab, while the F2 category y
learns to activate the map field via associative weights Wab. The Fab layer produces a
binary pattern of activity yab = (yab_1, yab_2, ..., yab_L) = t ∧ wab_J, in which the most active Fab
node K = argmax{yab_k : k = 1, 2, ..., L} yields the class prediction (K = k(J)). If node K
constitutes an incorrect class prediction, then a match tracking (MT) signal raises the
vigilance parameter ρ such that:

    ρ = |A ∧ w_J| / M + ε,    (3)

where ε = 0+, to induce another search among F2 nodes in Step 3. This search continues
until either an uncommitted F2 node becomes active (and learning directly ensues in
Step 5), or a node J that has previously learned the correct class prediction K becomes
active.
(5) Learning: Learning input a involves updating prototype vector w_J, and, if J corresponds
to a newly-committed node, creating an associative link to Fab. The prototype
vector of F2 node J is updated according to:

    w'_J = β(A ∧ w_J) + (1 − β)w_J,    (4)

where β is a fixed learning rate parameter. The algorithm can be set to slow learning
with 0 < β < 1, or to fast learning with β = 1. With complement coding and fast
learning, fuzzy ARTMAP represents category j as an m-dimensional hyperrectangle
R_j that is just large enough to enclose the cluster of training set patterns a to which it
has been assigned. That is, an M-dimensional prototype vector w_j records the largest
and smallest component values of training subset patterns a assigned to category j.
The vigilance test limits the growth of hyperrectangles – a ρ close to 1 yields small
hyperrectangles, while a ρ close to 0 allows large hyperrectangles. A new association
between F2 node J and Fab node K (k(J) = K) is learned by setting wab_Jk = 1 for k = K,
where K is the target class label for a, and 0 otherwise. The next training subset pair
(a, t) is presented to the network in Step 2.
Network training proceeds from one epoch to the next, and is halted for validation after
each epoch^b. Given a finite training data set, batch supervised learning ends after the epoch
for which the generalisation error is minimized on an independent validation data set. With
the large data sets considered in this paper, learning through this hold-out validation (HV)
is an appropriate validation strategy. If data were limited, k-fold cross-validation would be
a more suitable strategy, at the expense of some estimation bias due to crossing [18,33].
Once the weights W and Wab have been found through this process, ARTMAP can predict
a class label for an input pattern by performing Steps 2, 3 and 4 without any vigilance
or match tests. During testing, a pattern a that activates node J is predicted to belong to
class K = k(J). The time complexity required to process one input pattern, during either a
training or testing phase, is O(MN).
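To make the five steps above concrete, the learning algorithm can be sketched in Python. This is a simplified, single-pattern-at-a-time sketch under fast learning; the class and method names are ours, and details such as the epoch loop and hold-out validation are omitted:

```python
import numpy as np

class FuzzyARTMAP:
    """Minimal fuzzy ARTMAP sketch: winner-take-all F2 layer,
    complement coding, and positive match tracking (MT+)."""

    def __init__(self, m, alpha=0.001, beta=1.0, rho_bar=0.0, eps=0.001):
        self.M = 2 * m                  # complement-coded input dimension
        self.alpha = alpha              # choice parameter
        self.beta = beta                # learning rate (1 = fast learning)
        self.rho_bar = rho_bar          # baseline vigilance
        self.eps = eps                  # match tracking parameter
        self.W = np.empty((0, self.M))  # F2 prototype vectors, one row per node
        self.labels = []                # class label k(J) for each F2 node

    def _code(self, a):
        # Step 2: complement coding doubles the input dimension
        return np.concatenate([a, 1.0 - a])

    def _choice(self, A):
        # Eq. (1): Weber-law choice function T_j(A)
        inter = np.minimum(A, self.W).sum(axis=1)          # |A ^ w_j|
        return inter / (self.alpha + self.W.sum(axis=1))

    def train_pattern(self, a, k):
        A = self._code(a)
        rho = self.rho_bar                                  # reset vigilance
        if len(self.labels) == 0:
            self._commit(A, k)
            return
        T = self._choice(A)
        while np.max(T) > -np.inf:
            J = int(np.argmax(T))                           # Step 3: prototype selection
            match = np.minimum(A, self.W[J]).sum() / self.M
            if match < rho:                                 # Eq. (2): vigilance test fails
                T[J] = -np.inf
                continue
            if self.labels[J] == k:                         # Step 5: resonance and learning
                self.W[J] = (self.beta * np.minimum(A, self.W[J])
                             + (1 - self.beta) * self.W[J])  # Eq. (4)
                return
            rho = match + self.eps                          # Eq. (3): match tracking
            T[J] = -np.inf
        self._commit(A, k)                                  # recruit an uncommitted node

    def _commit(self, A, k):
        self.W = np.vstack([self.W, A])                     # fast learning: w_J = A
        self.labels.append(k)

    def predict(self, a):
        # Steps 2-4 without vigilance or match tests
        A = self._code(a)
        return self.labels[int(np.argmax(self._choice(A)))]
```

After training on two patterns from two classes, such a network commits one F2 node per class and predicts nearby test patterns accordingly.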
2.3. Match tracking strategies:
During training, when a mismatch occurs between a predicted response yab and a desired
response t for an input pattern a, the original positive MT process (MT+) of fuzzy ARTMAP
raises the internal vigilance parameter to ρ = |A ∧ w_J| / M + ε in order to induce another
search among F2 category nodes. MT+ is parameterized by the MT hyper-parameter ε,
which was introduced as a small positive value, 0 < ε ≪ 1 [7].
It is well documented that training fuzzy ARTMAP with data from overlapping class
distributions may lead to category proliferation, and that this problem is connected to the
MT process. In this case, increasing the amount of training data requires significantly more
resources (i.e., more internal category neurons, thus memory space and computational
complexity), yet yields a higher generalisation error [18,21,23]. In addition, the MT
parameter is commonly set to the value ε = +0.001 in the fuzzy ARTMAP literature to minimize
network resources [10]. Such a choice may however play a significant role in category
proliferation, and considerably degrade the capacity to generalize.
^b An epoch is defined as one complete presentation of all the patterns of the training set.
Consequently, some authors have challenged the need for a MT process [1,26]. Training
without MT (WMT) implies creating a new category each time that a predicted response yab
does not match a desired response t. Note that training fuzzy ARTMAP WMT is equivalent
to performing MT but setting ε = 1. Training WMT may however lead to a network with a
greater number of internal categories, and possibly a higher generalization error.
In an extreme case, a convergence problem occurs whenever the training set contains
identical patterns that belong to different recognition classes [10]. The effect is a proliferation
of identical prototypes associated with the inconsistent cases, and a failure to converge. Consider
for example that on the first training epoch, fuzzy ARTMAP learns two completely overlapping,
minimum-sized prototypes, w_A.1 (linked to class A) and w_B.1 (linked to class B), for
two identical pulse patterns, a1 and a2. In a subsequent epoch, w_A.1 is initially selected to
learn a2, since T_A.1 = T_B.1 ≈ 1, and w_A.1 was created prior to w_B.1 (index A.1 is smaller than
B.1). Since w_A.1 is not linked to class B, mismatch reset raises the vigilance parameter ρ to
(|A2 ∧ w_A.1| / M) + ε, where |A2 ∧ w_A.1| = |A2 ∧ w_B.1|. As a result, w_B.1 can no longer pass
the vigilance test required to become selected for a2, and fuzzy ARTMAP must create another
minimum-sized prototype w_B.2 = w_B.1. From epoch to epoch, the same phenomenon
repeats itself, yielding ever more prototypes w_B.n = w_B.1 for n = 3, 4, ....
ARTMAP-IC [10] is an extension of fuzzy ARTMAP that produces a binary winner-take-all
pattern y during training, but uses distributed activation of committed F2 nodes during testing.
ARTMAP-IC further extends fuzzy ARTMAP in two ways. First, it biases distributed test set predictions
according to the number of times F2 nodes are assigned to training set patterns. Second, it
uses a negative MT process (MT-) to address the problem of inconsistent cases, whereby
identical training set patterns correspond to different class labels.
With negative MT (MT-), ρ is also initially raised after a mismatch reset, but is allowed to
decay slightly before a different node J is selected. To that end, the MT parameter is set to a small
negative value, ε ≲ 0 (typically a value of ε = −0.001), which allows identical inputs
that predict different classes to establish distinct recognition categories. In the example
above, mismatch reset raises ρ, but w_B.1 would still pass the vigilance test. This allows the
network to learn fully overlapping prototypes for training set patterns that belong to different classes.
In some applications, incorporating the MT- feature of ARTMAP-IC into fuzzy ARTMAP
may be essential to avoid the convergence problem observed with the original
MT+. Training fuzzy ARTMAP with MT- would thereby find solutions with fewer internal
categories, but may nonetheless lead to a higher generalization error.
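The three fixed strategies discussed above differ only in the value of ε added to the post-reset vigilance of Equation (3). A minimal sketch (the function name and strategy labels are ours):

```python
def raise_vigilance(match, strategy):
    """Return the vigilance value after a mismatch reset, per Eq. (3),
    where `match` = |A ^ w_J| / M for the node that caused the reset."""
    eps = {"MT+": +0.001,   # original positive match tracking
           "MT-": -0.001,   # negative match tracking (ARTMAP-IC)
           "WMT": 1.0}[strategy]   # no match tracking: eps = 1 always forces a new category
    return match + eps
```

On the inconsistent-case example, MT+ raises ρ just above the match value of the identical competing prototype (forcing proliferation), while MT- leaves ρ just below it (allowing convergence).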
An alternate approach consists in optimizing the MT hyper-parameter during batch supervised
learning of a fuzzy ARTMAP neural network. In effect, both the network (weights and
architecture) and the value of ε are co-optimized for a given problem, using the same cost function.
The next subsection presents a Particle Swarm Optimization (PSO)-based approach
called PSO(MT) that automatically selects a value (magnitude and polarity) of ε during
fuzzy ARTMAP training such that the generalization error is minimized. This approach is
based on the PSO training strategy proposed in [16], but focuses only on a one-dimensional
optimization space of ε ∈ [−1, 1].
Fig. 2. PSO update of a particle's position s^q to s^{q+1} in a 2-dimensional space during iteration q+1.
2.4. Particle Swarm Optimisation (PSO) of the match tracking parameter
PSO is a population-based stochastic optimization technique that was inspired by the social
behavior of bird flocking and fish schooling [19]. It shares many similarities with evolutionary
computation techniques such as genetic algorithms (GAs), yet has no evolution operators
such as crossover and mutation. PSO belongs to the class of evolutionary algorithm techniques
that utilize neither the "survival of the fittest" concept nor a direct selection function.
A solution with a lower fitness value can therefore survive during the optimization and
potentially visit any point of the search space [12]. Finally, while GAs were conceived to deal
with binary coding, PSO was designed for, and has proved very effective at, solving real-valued
global optimization problems, which makes it suitable for this study.
With PSO, each particle corresponds to a single solution in the search space, and the
population of particles is called a swarm. All particles are assigned position values, which
are evaluated according to the fitness function being optimized, and velocity values, which
direct their movement. Particles move through the search space by following the particles
with the best fitness. Assuming a d-dimensional search space, the position of particle i
in a P-particle swarm is represented by a d-dimensional vector s_i = (s_i1, s_i2, ..., s_id), for
i = 1, 2, ..., P. The velocity of this particle is denoted by the vector v_i = (v_i1, v_i2, ..., v_id), while
the best previously-visited position of this particle is denoted p_i = (p_i1, p_i2, ..., p_id). For
each new iteration q+1, the velocity and position of particle i are updated according to:

    v^{q+1}_i = w^q v^q_i + c1 r1 (p^q_i − s^q_i) + c2 r2 (p^q_g − s^q_i)    (5)

    s^{q+1}_i = s^q_i + v^{q+1}_i    (6)

where p_g represents the global best particle position in the swarm, w^q is the particle inertia
weight, c1 and c2 are two positive constants called the cognitive and social parameters,
respectively, and r1 and r2 are random numbers uniformly distributed in the range [0,1].
The role of w^q in Equation 5 is to regulate the trade-off between exploration and exploitation.
A large inertia weight facilitates global search (exploration), while a small one
tends to facilitate fine-tuning of the current search area (exploitation). This is why inertia
Algorithm 1: PSO learning strategy for fuzzy ARTMAP.

A. Initialization:
   set the maximum number of iterations qmax and/or fitness objective E
   set PSO parameters P, vmax, w^0, c1, c2, r1 and r2
   initialize particle positions at random such that p^0_g, s^0_i and p^0_i ∈ [−1,1]^d, for i = 1, 2, ..., P
   initialize particle velocities at random such that 0 ≤ v^0_i ≤ vmax, for i = 1, 2, ..., P

B. Iterative process:
   set iteration counter q = 0
   while q ≤ qmax or E(p^q_g) ≥ E do
       for i = 1, 2, ..., P do
           train fuzzy ARTMAP using hold-out validation and s^q_i
           compute fitness value E(s^q_i) of resulting network
           if E(s^q_i) < E(p^q_i) then
               update particle's best personal position: p^q_i = s^q_i
           end
       end
       select the particle with best global fitness: g = argmin{E(s^q_i) : i = 1, 2, ..., P}
       for i = 1, 2, ..., P do
           update velocity: v^{q+1}_i = w^q v^q_i + c1 r1 (p^q_i − s^q_i) + c2 r2 (p^q_g − s^q_i)
           update position: s^{q+1}_i = s^q_i + v^{q+1}_i
       end
       q = q + 1
       update particle inertia w^q
   end
weight values are defined by some monotonically decreasing function of q. Proper fine-tuning
of c1 and c2 may result in faster convergence of the algorithm and alleviation of
local minima. Kennedy and Eberhart propose that the cognitive and social scaling parameters
be selected such that c1 = c2 = 2 [20]. Finally, the parameters r1 and r2 are used to
maintain the diversity of the population. Figure 2 depicts the update by PSO of a particle's
position from s^q_i to s^{q+1}_i.
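Equations (5) and (6) can be sketched as a single synchronous, vectorized update step. This is a sketch under two assumptions not stated in the equations themselves: velocities are clamped to ±vmax (a common PSO refinement, consistent with the vmax bound used at initialization) and positions are kept inside the [−1, 1] search space used for ε:

```python
import numpy as np

rng = np.random.default_rng(1)

def pso_step(s, v, p_best, g_best, w_q, c1=2.0, c2=2.0, v_max=0.2):
    """One synchronous PSO update per Eqs. (5)-(6) for a whole swarm.
    s, v, p_best, g_best are arrays of shape (P, d)."""
    r1 = rng.random(s.shape)   # uniform in [0, 1], redrawn each iteration
    r2 = rng.random(s.shape)
    v_new = w_q * v + c1 * r1 * (p_best - s) + c2 * r2 * (g_best - s)  # Eq. (5)
    v_new = np.clip(v_new, -v_max, v_max)     # assumed velocity clamping
    s_new = np.clip(s + v_new, -1.0, 1.0)     # Eq. (6), bounded search space
    return s_new, v_new
```

For PSO(MT), d = 1 and each position component is a candidate value of ε.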
Algorithm 1 shows the pseudo-code of a PSO learning strategy specialized for supervised
training of fuzzy ARTMAP neural networks. It essentially seeks to minimize the fuzzy
ARTMAP generalisation error E(s^q_i) in the d-dimensional space of hyper-parameter values.
For enhanced computational throughput and global search capabilities, Algorithm 1 is
inspired by the synchronous parallel version of PSO [32]. It utilizes a basic type of neighborhood
called global best or gbest, which is based on a sociometric principle that conceptually
connects all the members of the swarm to one another. Accordingly, each particle
is influenced by the very best performance of any member of the entire swarm. Exchange
of information only takes place between the particle's own experience (the location of its
personal best p^q_i, pbest) and the experience of the best particle in the swarm (the location
of the global best p^q_g, gbest).
The PSO(MT) approach is obtained by setting d = 1, and the particle positions to MT
parameter values, s^q_i = ε^q_i. Measurement of any fitness value E(s^q_i) in this algorithm
involves computing the generalisation error on a validation subset for the fuzzy ARTMAP
network which has been trained using the MT hyper-parameter value at particle position ε^q_i.
When selecting p^q_i or p^q_g, if the two fitness values being compared are equal, then the
particle/network requiring fewer F2 category nodes is chosen. The same training and
validation sets are used throughout this process. Following the last iteration of Algorithm 1,
the overall generalisation error is computed on a test set for the network corresponding to
particle position p^q_g.
3. Experimental Methodology
In order to observe the effects of MT strategies from the perspective of different data structures,
several data sets were selected for computer simulations. Four synthetic data sets
are representative of pattern recognition problems that involve either (1) simple decision
boundaries with overlapping class distributions, or (2) complex decision boundaries, where
class distributions do not overlap on decision boundaries. A set of handwritten numerical
characters from the NIST SD19 database is representative of complex real-world pattern
recognition problems. Prior to a simulation trial, these data sets were normalized according
to the min-max technique, and partitioned into three parts – training, validation, and test
subsets.
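The min-max normalization step can be sketched as follows (a standard formulation; the function name is ours, and the training-set minima and maxima are reused for the validation and test subsets):

```python
import numpy as np

def min_max_normalize(X, lo=None, hi=None):
    """Scale each feature column of X to [0, 1]. On the training set,
    lo/hi are computed; for validation/test data, pass the training
    lo/hi so all subsets share the same scaling."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return np.clip((X - lo) / span, 0.0, 1.0), lo, hi
```

Clipping keeps validation and test patterns inside [0, 1], as required by the complement coding of Step 2.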
During each simulation trial, the performance of fuzzy ARTMAP is compared from the
perspective of different training subset sizes and match tracking strategies. In order to assess
the effect of training subset size on performance, the number of training subset patterns
used for supervised learning was progressively increased, while the corresponding validation
and test subsets were held fixed. The performance is compared for fuzzy ARTMAP neural
networks trained according to four different MT strategies: MT+ (ε = 0.001), MT- (ε =
−0.001), WMT (equivalent to setting ε = 1) and PSO(MT). Training is performed by setting
the other three hyper-parameters such that the resources (number of categories, training
epochs, etc.) are minimized: α = 0.001, β = 1 and ρ̄ = 0. In all cases, training is performed
using the HV strategy [33] described in Subsection 2.2.
The PSO(MT) strategy also uses the hold-out validation technique on the fuzzy ARTMAP
network to calculate the fitness of each particle, and thereby find the network and ε value
that minimize generalization error. Other fuzzy ARTMAP hyper-parameters are left un-
changed. In all simulations involving PSO, the search space of the MT parameter was set
to the range ε ∈ [-1, 1]. Each simulation trial was performed with P = 15 particles, and
ended after a maximum of q_max = 100 iterations (although none of our simulations ever
attained that limit). A fitness objective E was not considered to end training, but a trial
was ended if the global best fitness E(p_g^q) remained constant for 10 consecutive
iterations. The initial position s_1^0 of one particle was set according to MT- (ε = -0.001).
All remaining particle vectors were initialized randomly, according to a uniform distri-
bution over the search space. The PSO parameters were set as follows: c1 = c2 = 2; r1 and
r2 were random numbers uniformly distributed in [0, 1]; w^q was decreased linearly from
0.9 to 0.4 over the q_max iterations; and the maximum velocity v_max was set to 0.2. At the end
of a trial, the fuzzy ARTMAP network with the best global fitness value p_g^q was retained.
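A minimal sketch of the PSO loop described above, with the stated settings (P = 15, q_max = 100, inertia decreased linearly from 0.9 to 0.4, v_max = 0.2, one particle initialized at the MT- value, early stop after 10 stagnant iterations). In the actual experiments each fitness evaluation trains a fuzzy ARTMAP network and measures its hold-out validation error; here a convex stand-in fitness keeps the sketch self-contained:

```python
import random

def pso_minimize(fitness, p=15, q_max=100, v_max=0.2, lo=-1.0, hi=1.0,
                 c1=2.0, c2=2.0, stall_limit=10, seed=1):
    """Minimal 1-D PSO over epsilon in [lo, hi], matching the settings in the text."""
    rnd = random.Random(seed)
    s = [rnd.uniform(lo, hi) for _ in range(p)]
    s[0] = -0.001                        # one particle starts at MT- (epsilon = -0.001)
    v = [0.0] * p
    pbest = s[:]
    pbest_f = [fitness(x) for x in s]
    g = min(range(p), key=lambda i: pbest_f[i])
    gbest, gbest_f, stall = pbest[g], pbest_f[g], 0
    for q in range(q_max):
        w = 0.9 - (0.9 - 0.4) * q / (q_max - 1)   # linearly decreasing inertia
        improved = False
        for i in range(p):
            r1, r2 = rnd.random(), rnd.random()
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - s[i]) + c2 * r2 * (gbest - s[i])
            v[i] = max(-v_max, min(v_max, v[i]))  # velocity clamping
            s[i] = max(lo, min(hi, s[i] + v[i]))
            f = fitness(s[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = s[i], f
                if f < gbest_f:
                    gbest, gbest_f, improved = s[i], f, True
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:                  # global best unchanged for 10 iterations
            break
    return gbest, gbest_f

# Stand-in fitness: a convex surrogate for the validation-error surface over epsilon
best_eps, best_err = pso_minimize(lambda e: (e - 0.5) ** 2)
```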
November 14, 2007 23:32 WSPC IJCIA07˙PSOMT
Match Tracking Strategies for Fuzzy ARTMAP 11
Independent trials were repeated 4 times^c with different initializations of the particle
vectors, and the network with the greatest global fitness p_g^q of the four was retained.
Since fuzzy ARTMAP performance is sensitive to the presentation order of the train-
ing data, each simulation trial was repeated 10 times with either 10 different randomly
generated data sets (synthetic data), or 10 different randomly selected data presentation
orders (NIST SD19 data). The average performance of fuzzy ARTMAP was assessed in
terms of resources required during training, and its generalisation error on the test sets. The
amount of resources required during training is measured by compression and convergence
time. Compression refers to the average number of training patterns per category prototype
created in the F2 layer. Convergence time is the number of epochs required to complete
learning for a learning strategy. It does not include presentations of the validation subset
used to perform hold-out validation. Generalisation error is estimated as the ratio of in-
correctly classified test subset patterns over all test set patterns. Given that compression
indicates the number of F2 nodes, the combination of compression and convergence time
provides useful insight into the amount of processing required by fuzzy ARTMAP during
training to produce its best asymptotic generalisation error. Average results, with corre-
sponding standard error, are always obtained as a result of the 10 independent simulation
trials.
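The two performance measures can be stated directly; the function names below are ours:

```python
def compression(n_train_patterns, n_f2_nodes):
    """Average number of training patterns per category prototype in the F2 layer."""
    return n_train_patterns / n_f2_nodes

def generalization_error(predictions, labels):
    """Ratio of incorrectly classified test patterns over all test patterns."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)
```

For example, a network that learns 5,000 training patterns with 250 F2 nodes has a compression of 20 patterns per node.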
The Quadratic Bayes classifier (CQB) and k-Nearest-Neighbour with Euclidean dis-
tance (kNN) classifier were included for reference with generalisation error results. These
are classic parametric and non-parametric classification techniques from statistical pattern
recognition, which are immune to the effects of overtraining. For each computer simulation,
the value of k employed with kNN was selected among k = 1, 3, 5, 7, and 9, using
hold-out validation. The rest of this section gives some additional details on the synthetic
and real data sets employed during computer simulations.
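The hold-out selection of k for the reference kNN classifier can be sketched as follows; the implementation is a plain Euclidean-distance kNN, and the toy data is illustrative:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Plain k-NN with Euclidean distance (squared distance suffices for ranking)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def select_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7, 9)):
    """Pick the k that minimizes error on the hold-out validation subset."""
    def val_error(k):
        wrong = sum(knn_predict(train_X, train_y, x, k) != y
                    for x, y in zip(val_X, val_y))
        return wrong / len(val_y)
    return min(candidates, key=val_error)

# Toy two-class data (illustrative only)
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
best_k = select_k(train_X, train_y, [(0.5, 0.5), (5.5, 5.5)], [0, 1])
```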
3.1. Synthetic data sets:
All four synthetic data sets described below are composed of a total of 30,000 randomly-
generated patterns, with 10,000 patterns for the training, validation, and test subsets. They
correspond to two-class problems with a two-dimensional input feature space. Each data subset
is composed of an equal number of 5,000 patterns per class. In addition, the area occupied
by each class is equal. During simulation trials, the number of training subset patterns used
for supervised learning was progressively increased from 10 to 10,000 patterns according
to a logarithmic rule: 5, 6, 8, 10, 12, 16, 20, 26, 33, 42, 54, 68, 87, 110, 140, 178, 226, 286,
363, 461, 586, 743, 943, 1197, 1519, 1928, 2446, 3105, 3940, 5000 patterns per class. This
corresponds to 30 different simulation trials over the entire 10,000 pattern training subset.
These data sets have been selected to facilitate the observation of fuzzy ARTMAP be-
havior on different tractable problems. Of the four sets, two have simple linear decision
boundaries with overlapping class distributions, Dµ(ξtot) and Dσ(ξtot), and two have com-
plex non-linear decision boundaries without overlap, DCIS and DP2. The total theoretical
^c From a previous study with our data sets, it was determined that performing 4 independent trials of the PSO
learning strategy with only 15 particles leads to better optimization results than performing 1 trial with 60 particles.
12 P. Henniges, E. Granger, R. Sabourin and L. S. Oliveira
Fig. 3. Representation of the synthetic data sets used for computer simulations: (a) Dµ(ξtot), (b) Dσ(ξtot), (c) DCIS, (d) DP2.
probability of error associated with Dµ and Dσ is denoted by ξtot. Note that with DCIS
and DP2, the decision boundaries between class distributions are longer, and fewer
training patterns are available in the neighborhood of these boundaries than with Dµ(ξtot)
and Dσ(ξtot). In addition, note that the total theoretical probability of error with DCIS and
DP2 is 0, since class distributions do not overlap on the decision boundaries. The four synthetic
data sets are now described:
Dµ(ξtot): As represented in Figure 3(a), this data consists of two classes, each defined by a
multivariate normal distribution in a two-dimensional input feature space. It is assumed
that the data is randomly generated by sources with the same Gaussian noise. Both sources
are described by variables that are independent and have equal variance σ², therefore the
distributions are hyperspherical. In fact, Dµ(ξtot) refers to 13 data sets, where the degree
of overlap, and thus the total probability of error between classes, differs for each set.
The degree of overlap is varied from a total probability of error of ξtot = 1% to ξtot =
25%, in 2% increments, by adjusting the mean vector µ2 of class 2.
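One way to generate such data (an illustrative sketch, not the authors' generator) is to choose the distance between the class means from the desired total error, using the standard relation error = Φ(−d/(2σ)) for two equal-prior spherical Gaussians with common variance σ²:

```python
import math
import random

def mean_separation(sigma, error_rate):
    """Distance d between the two class means giving total probability of error
    `error_rate` for two equal-prior spherical Gaussians with common variance
    sigma^2, from error = Phi(-d / (2 * sigma))."""
    target = 1.0 - error_rate            # solve Phi(z) = 1 - error_rate by bisection
    lo_z, hi_z = 0.0, 10.0
    for _ in range(100):
        mid = (lo_z + hi_z) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < target:
            lo_z = mid
        else:
            hi_z = mid
    return 2.0 * sigma * lo_z

def sample_class(mean, sigma, n, rnd):
    """n points from a hyperspherical 2-D Gaussian."""
    return [(rnd.gauss(mean[0], sigma), rnd.gauss(mean[1], sigma)) for _ in range(n)]

d = mean_separation(1.0, 0.13)           # D_mu(13%): shift mu_2 along one axis
rnd = random.Random(0)
class1 = sample_class((0.0, 0.0), 1.0, 1000, rnd)
class2 = sample_class((d, 0.0), 1.0, 1000, rnd)
```

For Dσ(ξtot), the means would instead be held fixed and the common variance adjusted until the same integral of overlap is reached.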
Dσ(ξtot): As represented in Figure 3(b), this data is identical to Dµ(ξtot), except that the degree
of overlap between classes is varied by adjusting the variance σ² of both classes. Note
that for the same degree of overlap, Dσ(ξtot) data sets have a larger overlap boundary
than Dµ(ξtot), yet they are not as dense.
DCIS: As represented in Figure 3(c), the Circle-in-Square problem 6 requires a classifier to
identify the points of a square that lie inside a circle, and those that lie outside the
circle. The circle's area equals half that of the square. The problem consists of one
non-linear decision boundary where the classes do not overlap.
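A Circle-in-Square generator under these constraints can be sketched as follows (unit square, circle centred at (0.5, 0.5) with area one half; the sampling code is ours). Uniform sampling gives approximately equal class sizes because the two regions have equal area:

```python
import math
import random

R = math.sqrt(0.5 / math.pi)   # circle area = pi * R^2 = 0.5, half the unit square

def cis_label(x, y, cx=0.5, cy=0.5):
    """Class 1 if the point lies inside the circle, class 0 otherwise."""
    return 1 if (x - cx) ** 2 + (y - cy) ** 2 <= R ** 2 else 0

rnd = random.Random(42)
points = [(rnd.random(), rnd.random()) for _ in range(20000)]
inside = sum(cis_label(x, y) for x, y in points)
```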
DP2: As represented in Figure 3(d), each decision region of the DP2 problem is delimited by
one or more of the four following polynomial and trigonometric functions:

f1(x) = 2 sin(x) + 5    (7)
f2(x) = (x − 2)² + 1    (8)
f3(x) = −0.1x² + 0.6 sin(4x) + 8    (9)
f4(x) = (x − 10)²/2 + 7.902    (10)

and belongs to one of the two classes, indicated by the Roman numerals I and II 36.
The problem consists of four non-linear boundaries, and the class definitions do not overlap. Note
that equation f4(x) was slightly modified from the original equation such that the area
occupied by each class is approximately equal.
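The four boundary functions translate directly to code. The sign of the quadratic term in f3 follows the standard P2 definition (the minus sign was lost in extraction); the assignment of regions to classes I and II follows Figure 3(d) and is not reproduced here:

```python
import math

def f1(x):
    return 2 * math.sin(x) + 5

def f2(x):
    return (x - 2) ** 2 + 1

def f3(x):
    return -0.1 * x ** 2 + 0.6 * math.sin(4 * x) + 8

def f4(x):
    return (x - 10) ** 2 / 2 + 7.902
```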
3.2. NIST Special Database 19 (SD19):
Automatic reading of numerical fields has been attempted in several domains of application
such as bank cheque processing, postal code recognition, and form processing. Such appli-
cations have been very popular in handwriting recognition research, due to the availability
of relatively inexpensive CPU power, and to the possibility of considerably reducing the
manual effort involved in these tasks 29.
The NIST SD19 17 data set has been selected due to the great variability and difficulty of
such handwriting recognition problems (see Figure 4). It consists of images of handwritten
sample forms (hsf) organized into eight series, hsf-{0,1,2,3,4,6,7,8}. SD19 is divided into
three sections, which contain samples representing isolated handwritten digits ('0', '1', ..., '9')
extracted from hsf-{0123}, hsf-7 and hsf-4.
For our simulations, the data in hsf-{0123} has been further divided into a training subset
(150,000 samples), validation subset 1 (15,000 samples), validation subset 2 (15,000 sam-
ples) and validation subset 3 (15,000 samples). The training and validation subsets contain
an equal number of samples per class. All 60,089 samples in hsf-7 have been used as a stan-
dard test subset. The distribution of samples per class in the test set is approximately equal.
The set of features extracted from the samples is a mixture of concavity, contour, and surface
characteristics 29. Accordingly, 78 features are used to describe concavity, 48 features are
(a) Handwritten sample form. (b) Images of extracted digits.
Fig. 4. Examples in the NIST SD19 data of: (a) a handwriting sample form, and (b) some images of handwritten
digits extracted from the forms.
used to describe contour, and 6 features are used to describe surface. Each sample is there-
fore composed of 132 features, normalized between 0 and 1 by summing the respective
feature values and then dividing each one by this summation. With this feature set, the
NIST SD19 database exhibits complex decision boundaries, with moderate overlap
between digit classes. Some experimental results obtained with Multi-Layer Perceptron
(MLP), Support Vector Machine (SVM), and k-NN classifiers are reported in 16.
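One reading of this normalization (the 78 + 48 + 6 grouping is from the text; normalizing each group by its own sum is our assumption) can be sketched as:

```python
def normalize_groups(features, groups=(78, 48, 6)):
    """Divide each feature by the sum of its group (concavity, contour, surface),
    so every value falls in [0, 1]."""
    out, start = [], 0
    for size in groups:
        block = features[start:start + size]
        s = sum(block) or 1.0           # guard against an all-zero group
        out.extend(v / s for v in block)
        start += size
    return out

vec = normalize_groups([1.0] * 132)      # a toy 132-feature sample
```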
During simulations, the number of training subset patterns used for supervised learning
was progressively increased from 100 to 150,000 patterns, according to a logarithmic
rule. The training subsets consist of the first 10, 16, 28, 47, 80, 136, 229, 387,
652, 1100, 1856, 3129, 5276, 8896, and all 15,000 patterns per class.
4. Simulation Results
4.1. Synthetic data with overlapping class distributions:
Figure 5 presents the average performance obtained when fuzzy ARTMAP is trained with
the four MT strategies – MT-, MT+, WMT and PSO(MT) – on Dµ(13%). The generalisation
errors for the Quadratic Bayes classifier (CQB), as well as the theoretical probability of
Fig. 5. Average performance of fuzzy ARTMAP (with MT+, MT-, WMT and PSO(MT)) versus training subset
size for Dµ(ξtot = 13%): (a) generalisation error, (b) compression, (c) convergence time, (d) MT parameter for
PSO(MT). Error bars are standard error of the sample mean.
error (ξtot ), are also shown for reference.
As shown in Figure 5(a), PSO(MT) generally yields the lowest generalisation error
across training set sizes, followed by WMT, MT+, and then MT-. With more than 20 training
patterns per class, the error of both the MT- and MT+ algorithms tends to increase in a manner
that is indicative of fuzzy ARTMAP overtraining 18. However, with more than about 500
training patterns per class, the generalization error for MT- grows more rapidly with the
training set size than for MT+, WMT and PSO(MT). With a training set of 5000 patterns
per class, a generalization error of about 21.22% is obtained with MT+, 26.17% with MT-,
16.22% with WMT, and 15.26% with PSO(MT). The degradation in performance of MT- is
accompanied by a notably higher compression and a lower convergence time than the other MT
strategies. MT- produces networks with fewer but larger categories than the other MT strategies
because of the MT polarity. Those large categories contribute to a lower resolution of the
decision boundary, and thus a greater generalization error.
Fig. 6. Average performance of fuzzy ARTMAP (with MT+, MT-, WMT and PSO(MT)) as a function of ξtot for
all Dµ(ξtot) data sets: (a) net generalisation error, (b) compression, (c) convergence time. Error bars are standard
error of the sample mean.
By training with WMT, the generalization error is significantly lower than with both MT- and
MT+, especially with a large number of training patterns, but the compression is the lowest
of all training strategies. Based on the error alone, the usefulness of the MT algorithm
(MT- or MT+) is debatable for overlapping data, especially in applications where
resource requirements are not an issue.
By training with PSO(MT), fuzzy ARTMAP yields a significantly lower generalization
error than all other strategies, and a compression that falls between that of WMT and MT- or
MT+. With a training set of 5000 patterns per class, a compression of about 8.0 is obtained
with MT+, 26.4 with MT-, 4.8 with WMT, and 5.3 with PSO(MT). The convergence time is
generally comparable for WMT, MT- and MT+. However, PSO(MT) requires a consider-
able number of training epochs to complete the optimization process. With a training set of
5000 patterns per class, a convergence time of about 8.2 epochs is obtained with MT+, 3.6
with MT-, 12.3 with WMT, and 2534 with PSO(MT).
Empirical results indicate that the MT process of fuzzy ARTMAP has a considerable
impact on the performance obtained with overlapping data, especially when ε is optimized. As
shown in Figure 5(d), when α = 0.001, β = 1 and ρ = 0, and class distributions overlap, the
value of ε that minimizes error tends from about 0 towards 0.8 as the training set size grows.
Higher ε settings tend to create a growing number of category hyperrectangles close to the
boundary between classes. The generalisation error of PSO(MT) tends toward that of WMT
on this data set. Furthermore, PSO(MT) and WMT do not show the performance degradation
due to overtraining observed with MT+ and MT-.
Very similar tendencies are found in simulation results where fuzzy ARTMAP is trained
using the other Dµ(ξtot) and Dσ(ξtot) data sets. However, as ξtot increases, the performance
degradation due to training subset size tends to become more pronounced, and occurs for
fewer training set patterns. Let us define the net error as the difference between the gen-
eralization error obtained by using all the training data (5,000 patterns per class) and the
theoretical probability of error ξtot of the data set. Figure 6 shows the performance of fuzzy
ARTMAP as a function of ξtot for all Dµ(ξtot) data sets. As shown, PSO(MT) always
provides the lowest net error over all ξtot values for overlapping data, followed by WMT, MT+
and MT-. Again, MT- obtains the highest compression, whereas PSO(MT) obtains a com-
pression between those of WMT and MT+. The convergence time of PSO(MT) is orders of
magnitude longer than that of the other strategies.
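The net error defined above is a simple difference; using the Dµ(13%) figures quoted earlier as a worked example:

```python
def net_error(gen_error_pct, xi_tot_pct):
    """Generalization error obtained with the full training set, minus the
    theoretical probability of error of the data set (both in %)."""
    return gen_error_pct - xi_tot_pct

# D_mu(13%), 5,000 training patterns per class (values quoted in the text):
mt_minus = net_error(26.17, 13.0)   # MT-
pso_mt = net_error(15.26, 13.0)     # PSO(MT)
```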
Figure 7 presents an example of the decision boundaries obtained for Dµ(ξtot = 13%) when
fuzzy ARTMAP is trained with 5,000 patterns per class and different MT strategies. For
overlapping class distributions, MT- tends to create far fewer F2 nodes (908 categories
with 5000 patterns per class) than the other MT strategies because of the polarity of ε.
Although this leads to a higher compression, and can resolve inconsistent cases, the larger
categories produce a coarse granulation of the decision boundary, and thus a higher gen-
eralization error. With PSO(MT) and WMT, the lower error is a consequence of the finer
resolution in the overlap regions of the decision boundary between classes.
4.2. Synthetic data with complex decision boundaries:
Figure 8 presents the average performance obtained when fuzzy ARTMAP is trained on
DCIS using the four MT strategies – MT-, MT+, WMT and PSO(MT). The generalisation
error for the k-NN classifier, as well as the theoretical probability of error, ξtot, are also
shown for reference.
In this case, MT+, MT- and PSO(MT) obtain a similar generalization error across train-
ing set sizes, while WMT yields an error that is significantly higher than the other strategies
for larger training set sizes. For example, with a training set of 5000 patterns per class, a
generalization error of about 1.51% is obtained with MT+, 1.64% with MT-, 4.36% with WMT,
and 1.47% with PSO(MT). The compression of fuzzy ARTMAP as a function of training set
(a) MT+. (b) MT-.
(c) WMT. (d) PSO(MT).
Fig. 7. An example of decision boundaries formed by fuzzy ARTMAP in the input space for Dµ(ξtot = 13%).
Training is performed (a) with MT+, (b) with MT-, (c) with WMT, and (d) with PSO(MT) on 5,000 training patterns per class.
The optimal decision boundary for Dµ(ξtot =13%)is also shown for reference. Note that virtually no training,
validation or test subset patterns are located in the upper-left and lower-right corners of these figures.
size grows in a similar way for MT-, MT+ and PSO(MT). With a training set of 5000
patterns per class, a compression of 107 is obtained with MT+, 108 with MT-, 14 with WMT,
and 109 with PSO(MT). WMT does not create a network with higher compres-
sion because the data structure leads to the creation of many small categories that overlap on the
decision boundary between classes. However, WMT requires the fewest training
epochs to converge, while PSO(MT) requires a considerable number of epochs. With a
training set of 5000 patterns per class, a convergence time of about 18.4 epochs is required
with MT+, 14.4 with MT-, 6.6 with WMT, and 4186 with PSO(MT).
Empirical results indicate that the MT process of fuzzy ARTMAP also has a consider-
able impact on the performance obtained on data with complex decision boundaries, especially
when ε is optimized.
Fig. 8. Average performance of fuzzy ARTMAP (with MT+, MT-, WMT and PSO(MT)) versus training subset
size for DCIS: (a) generalisation error, (b) compression, (c) convergence time, (d) MT parameter for PSO(MT).
Error bars are standard error of the sample mean.
As shown in Figure 8(d), when α = 0.001, β = 1 and ρ = 0, and decision boundaries are
complex, the value of ε that minimizes error tends from about 0.4 towards 0 as the training
set size grows. Lower ε settings tend to create fewer category hyperrectangles close to the
boundary between classes. The generalisation error of PSO(MT) tends toward that of MT+
and MT- on this data set.
Similar tendencies are found in simulation results where fuzzy ARTMAP is trained
using the DP2 data set. However, since the decision boundaries are more complex with DP2,
a greater number of training patterns is required for fuzzy ARTMAP to asymptotically
approach its minimum generalisation error. Moreover, none of the MT strategies tested on data
with non-linear decision boundaries produce overtraining 18.
Figure 9 presents an example of the decision boundaries obtained for DCIS when fuzzy
ARTMAP is trained with 5,000 patterns per class and different MT strategies. For data
with complex decision boundaries, training fuzzy ARTMAP with WMT yields a higher generaliza-
tion error since it initially tends to create some large categories, and then compensates by
(a) MT+. (b) MT-.
(c) WMT. (d) PSO(MT).
Fig. 9. An example of decision boundaries formed by fuzzy ARTMAP in the input space for DCIS. Training
is performed (a) with MT+, (b) with MT-, (c) with WMT, and (d) with PSO(MT) on 5,000 training patterns per class. The
optimal decision boundary for DCIS is also shown for reference.
creating many small categories. This leads to coarse granulation of the decision boundary,
and thus a higher generalization error.
Table 1 shows the average generalisation error obtained with the reference classifiers
and the fuzzy ARTMAP neural network using different MT strategies on Dµ(ξtot), DCIS
and DP2. Training was performed on 5,000 patterns per class. When using PSO(MT),
the generalisation error of fuzzy ARTMAP is always lower than when using MT+, MT-
and WMT, but is always significantly higher than that of the Quadratic Bayes and k-NN
classifiers. When the data contains overlapping class distributions, the values of ε that minimize
error tend towards +1. In contrast, when decision boundaries are complex, these ε values
tend towards 0.
Table 1. Average generalisation error of reference and fuzzy ARTMAP classifiers using different MT strategies on synthetic data sets. Values in parentheses are standard error of the sample mean.

Data set  | CQB         | k-NN        | FAM w/ MT+  | FAM w/ MT-  | FAM w/ WMT  | FAM w/ PSO(MT) | ε
Dµ(1%)    | 1.00(0.04)  | 1.08(0.03)  | 1.87(0.04)  | 2.31(0.19)  | 1.30(0.03)  | 1.24(0.04)     | 0.61(0.06)
Dµ(3%)    | 3.08(0.05)  | 3.31(0.06)  | 5.44(0.09)  | 7.52(0.16)  | 3.84(0.09)  | 3.66(0.06)     | 0.75(0.05)
Dµ(5%)    | 4.87(0.07)  | 5.26(0.08)  | 8.48(0.13)  | 11.15(0.36) | 6.01(0.07)  | 5.75(0.08)     | 0.79(0.04)
Dµ(7%)    | 7.00(0.10)  | 7.48(0.11)  | 11.85(0.15) | 16.05(0.47) | 8.63(0.20)  | 8.07(0.08)     | 0.73(0.04)
Dµ(9%)    | 9.12(0.08)  | 9.88(0.08)  | 15.01(0.14) | 19.88(0.74) | 11.30(0.21) | 10.62(0.11)    | 0.72(0.02)
Dµ(11%)   | 11.00(0.08) | 11.81(0.13) | 18.06(0.18) | 23.85(0.37) | 13.29(0.15) | 12.72(0.12)    | 0.77(0.05)
Dµ(13%)   | 13.16(0.15) | 14.27(0.18) | 21.22(0.17) | 26.17(0.41) | 16.22(0.19) | 15.26(0.16)    | 0.74(0.05)
Dµ(15%)   | 15.11(0.15) | 16.13(0.13) | 23.69(0.16) | 29.05(0.48) | 18.40(0.32) | 17.42(0.15)    | 0.74(0.04)
Dµ(17%)   | 16.96(0.10) | 18.39(0.09) | 26.25(0.16) | 31.87(0.25) | 20.49(0.13) | 19.79(0.33)    | 0.71(0.08)
Dµ(19%)   | 19.25(0.16) | 20.71(0.16) | 29.13(0.09) | 34.19(0.44) | 23.30(0.26) | 22.24(0.11)    | 0.79(0.05)
Dµ(21%)   | 20.97(0.13) | 22.70(0.16) | 31.63(0.14) | 36.28(0.34) | 25.86(0.54) | 24.35(0.12)    | 0.79(0.05)
Dµ(23%)   | 22.99(0.12) | 25.04(0.13) | 33.77(0.21) | 38.15(0.28) | 28.40(0.41) | 26.72(0.19)    | 0.71(0.03)
Dµ(25%)   | 25.11(0.10) | 27.23(0.12) | 36.08(0.14) | 39.52(0.18) | 31.05(0.40) | 29.05(0.14)    | 0.72(0.04)
DCIS      | N/A         | 0.86(0.03)  | 1.51(0.04)  | 1.64(0.04)  | 4.36(0.43)  | 1.47(0.04)     | 0.01(0.00)
DP2       | N/A         | 1.65(0.04)  | 3.45(0.19)  | 4.33(0.22)  | 7.13(0.48)  | 3.44(0.06)     | 0.01(0.00)
4.3. NIST SD19 data:
Figure 10 presents the average performance obtained when fuzzy ARTMAP is trained on
the NIST SD19 data using the four MT strategies – MT-, MT+, WMT and PSO(MT). The
generalisation error for the k-NN classifier is also shown for reference.
As shown in this figure, MT- and MT+ obtain a similar average generalization error
across training set sizes. Using a training set of 52,760 patterns, a generalization error of
about 5.81% is obtained with MT+, 6.02% with MT-, 32.84% with WMT, and 5.57% with
PSO(MT). When optimizing the MT parameter with PSO(MT), the generalization error is
lower than with the other MT strategies for small numbers of training patterns, and similar to MT-
and MT+ for greater numbers of training patterns. WMT is unable to create a fuzzy ARTMAP
network with low generalization error on NIST SD19. Since the NIST database possesses com-
plex decision boundaries with a small degree of overlap, WMT cannot generate a good repre-
sentation of the decision boundaries because it generates too many categories that overlap
between classes.
Using all the training data, MT- achieves the highest compression, followed by MT+,
PSO(MT) and WMT. However, with small amounts of training patterns, PSO(MT) generates
the highest compression. For example, with a training set of 52,760 patterns, a compression
rate of about 237.4 is obtained with MT+, 281.9 with MT-, 2.7 with WMT, and 141.6 with
PSO(MT). WMT obtains the lowest compression rate because it creates many very small
categories to define the decision boundaries. With a training set of 52,760 patterns, a con-
vergence time of about 15.7 epochs is obtained with MT+, 6.8 with MT-, 1 with WMT, and 381
with PSO(MT). WMT still possesses the fastest convergence time. The low generalization
error of PSO(MT) requires a high convergence time (about 24.3 times higher than MT+ with
all training patterns).
As shown in Figure 10(d), when α = 0.001, β = 1 and ρ = 0, and decision boundaries
Fig. 10. Average performance of fuzzy ARTMAP (with MT+, MT-, WMT and PSO(MT)) versus training subset
size for the NIST SD19 data set: (a) generalisation error, (b) compression, (c) convergence time, (d) MT parameter
for PSO(MT). Error bars are standard error of the sample mean.
are complex, the value of ε that minimizes error tends from about -0.2 towards 0 as the
training set size grows. As with DCIS and DP2, the generalisation error of PSO(MT) tends
toward that of MT+ and MT- on this data set. Despite the promising results of training fuzzy
ARTMAP with PSO(MT), other pattern classifiers (such as SVM) have achieved signifi-
cantly lower generalization error 27,29.
5. Conclusions
A fuzzy ARTMAP neural network applied to complex real-world problems such as hand-
written character recognition may achieve poor performance and encounter a convergence
problem whenever the training set contains very similar or identical patterns that belong to
different classes. In this paper, the impact on fuzzy ARTMAP performance of adopting dif-
ferent MT strategies – the original positive MT (MT+), negative MT (MT-) and without MT
(WMT) – is assessed. As an alternative, the value of the MT parameter is optimized along
with the network weights using a Particle Swarm Optimization (PSO)-based strategy called
PSO(MT). An experimental protocol has been defined such that the generalization error
and resource requirements of fuzzy ARTMAP trained with different MT strategies may be
assessed on different types of synthetic data and on a real-world handwritten numerical
character recognition problem.
Overall, empirical results indicate that the MT process used for batch supervised learn-
ing has a significant impact on fuzzy ARTMAP performance. When data is defined by over-
lapping class distributions, training with MT- tends to produce fewer categories than the
other MT strategies, although this advantage coincides with a higher generalization error.
The need for MT+ or MT- is debatable, as WMT yields a significantly lower generalization
error. However, PSO(MT) has been shown to create fuzzy ARTMAP networks with a finer
resolution on decision bounds, and an even lower error than WMT. In addition, it has been
shown to eliminate the degradation of error due to overtraining. To represent overlapping
class distributions with PSO(MT), the lowest errors are obtained for MT parameter val-
ues that tend toward the maximum value (ε = 1) as the training set size grows. PSO(MT)
thereby favors the creation of new internal categories to define the decision boundaries.
When data is defined by complex decision boundaries, training with PSO(MT) creates
the decision boundaries that yield the lowest generalization error, followed most closely by
MT+ and then MT-. Training with WMT yields a considerably higher generalization error
and lower compression than the other MT strategies, especially for larger training
set sizes. To represent complex decision boundaries with PSO(MT), the lowest errors are
obtained for MT parameter values that tend toward 0 as the training set size grows.
Finally, with the NIST SD19 data set, when using all training patterns the generalization
error obtained with PSO(MT) is about 0.84% lower than that of MT-, but comes at the expense
of lower compression and a convergence time that can be two orders of magnitude greater
than those of the other strategies. Training with a Multi-Objective PSO (MOPSO)-based strategy,
where the cost function accounts for both generalization error and compression, would provide
solutions that require fewer internal categories. In addition, lightweight versions of PSO
may reduce the convergence time.
In this paper, training fuzzy ARTMAP with PSO(MT) has been shown to produce a sig-
nificantly lower generalization error than the other MT strategies. These results are always
produced at the expense of a significantly higher number of training epochs. Nonetheless,
the results obtained with PSO(MT) underline the importance of optimizing the MT param-
eter during training, for different problems. The MT parameter values found using this
strategy vary significantly according to training set size and data set structure, and differ
considerably from the popular choice (ε = 0+), especially when data has overlapping class
distributions.
Acknowledgements
This research was supported in part by the Natural Sciences and Engineering Research
Council of Canada, and le Fonds québécois de la recherche sur la nature et les technologies.
References
1. Anagnostopoulos, G. C., Georgiopoulos, M., Verzi, S. J., and Heileman, G. L., ”Boosted Ellipsoid
ARTMAP,” Proc. SPIE – Applications and Science of Computational Intelligence V,4739, 74-85,
2002.
2. Anagnostopoulos, G. C., and Georgiopoulos, M., "Putting the Utility of Match Tracking in Fuzzy
ARTMAP Training to the Test," Lecture Notes in Computer Science, 2774, 1-6, 2003.
3. Bote-Lorenzo, M. L., Dimitriadis, Y., Gómez-Sánchez, E., "Automatic extraction of human-
recognizable shape and execution prototypes of handwritten characters," Pattern Recognition,
36:7, 1605-1617, 2003.
4. Carpenter, G. A., and Grossberg, S., "A Massively Parallel Architecture for a Self-Organizing
Neural Pattern Recognition Machine," Computer Vision, Graphics, and Image Processing, 37,
54-115, 1987.
5. Carpenter, G. A., Grossberg, S., and Rosen, D. B., “Fuzzy ART: Fast Stable Learning and Cate-
gorisation of Analog Patterns by an Adaptive Resonance System,Neural Networks,4:6, 759-771,
1991.
6. Carpenter, G. A., Grossberg, S., and Reynolds, J. H., “ARTMAP: Supervised Real-Time Learn-
ing and Classification of Nonstationary Data by a Self-Organizing Neural Network,Neural Net-
works,4, 565-588, 1991.
7. Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B., “Fuzzy
ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Mul-
tidimensional Maps,” IEEE Trans. on Neural Networks,3:5, 698-713, 1992.
8. Carpenter, G. A., and Ross, W. D, “ART-EMAP: A Neural Network Architecture for Object
Recognition by Evidence Accumulation,” IEEE Trans. on Neural Networks, 6:4, 805-818, 1995
9. Carpenter, G.A., Gjaja, M.N., Gopal, S., and Woodcock, C.E., "ART Neural Networks for Re-
mote Sensing: Vegetation Classification from Landsat TM and Terrain Data," IEEE Trans. on Geo-
sciences and Remote Sensing, 35:2, 1997.
10. Carpenter, G. A., and Markuzon, N., “ARTMAP-IC and Medical Diagnosis: Instance Counting
and Inconsistent Cases,” Neural Networks,11:2, 323-336, 1998.
11. Carpenter, G. A., Milenova, B. L., and Noeskeand, B. W., “Distributed ARTMAP: a neural
network for fast distributed supervised learning,Neural Networks,11, 793813, 1998.
12. Eberhart, R. C. and Shi, Y., “Comparison Between Genetic Algorihms and Particle Swarm In-
telligence,” in Evolutionary Programming VII, V. W. Porto et al, eds., Springer, 611-616, 1998
13. G´
omez-S´
anchez, E., Gago-Gonzalez, J. A., Dimitriadis, Y. A., Cano-Izquierdo, J. M., Lopez
Coronado, J., “Experimental study of a novel neuro-fuzzy system for on-line handwritten
UNIPEN digit recognition,” Pattern Recognition Letters,19, 357-364, 1998.
14. G´
omez-S´
anchez, E., Dimitriadis, Y. A., Cano-Izquierdo, J. M., Lopez-Coronado, J.,
µARTMAP: Use of Mutual Information for Category Reduction in Fuzzy ARTMAP,IEEE
Trans. on Neural Networks,13:1, 58-69, 2002.
15. Granger, E., Rubin, M., Grossberg, S., and Lavoie, P., “A What-and-Where Fusion Neural Net-
work for Recognition and Tracking of Multiple Radar Emitters, Neural Networks,14, 325-344,
2001.
16. Granger, E., Henniges, P., Sabourin, R., and Oliveira, L. S., “Supervised Learning of Fuzzy
ARTMAP Neural Networks Through Particle Swarm Optimization,” Journal of Pattern Recogni-
tion Research,2:1, 27-60, 2007.
17. Grother, P. J., “NIST Special Database 19 - Handprinted forms and characters database,” Na-
tional Institute of Standards and Technology (NIST), 1995.
18. Henniges, P., Granger, E., and Sabourin, R., “Factors of Overtraining with Fuzzy ARTMAP Neu-
ral Networks,” International Joint Conference on Neural Networks 2005, 1075-1080, Montreal,
Canada, August 1-4, 2005.
19. Kennedy, J., and Eberhart, R. C., “Particle Swarm Optimization,” Proc. IEEE Int'l Conference on Neural Networks, 1942-1948, 1995.
20. Kennedy, J., and Eberhart, R. C., Swarm Intelligence, Morgan Kaufmann, 2001.
21. Koufakou, A., Georgiopoulos, M., Anagnostopoulos, G., and Kasparis, T., “Cross-Validation in Fuzzy ARTMAP for Large Databases,” Neural Networks, 14, 1279-1291, 2001.
22. Lee, S.-J., and Tsai, H.-L., “Pattern Fusion in Feature Recognition Neural Networks for Handwritten Character Recognition,” IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 28:4, 612-617, 1998.
23. Lerner, B., and Vigdor, B., “An Empirical Study of Fuzzy ARTMAP Applied to Cytogenetics,” IEEE Convention of Electrical and Electronics Engineers in Israel, 301-304, 2004.
24. Lim, C. P., and Harrison, R. F., “Modified Fuzzy ARTMAP Approaches for Bayes Optimal Classification Rates: An Empirical Demonstration,” Neural Networks, 10:4, 755-774, 1997.
25. Liu, C.-L., Sako, H., and Fujisawa, H., “Performance Evaluation of Pattern Classifiers for Handwritten Character Recognition,” Int'l J. on Document Analysis and Recognition, 4, 191-204, 2002.
26. Marriott, S., and Harrison, R. F., “A modified fuzzy ARTMAP architecture for the approximation of noisy mappings,” Neural Networks, 8:4, 619-641, 1995.
27. Milgram, J., Cheriet, M., and Sabourin, R., “Estimating Accurate Multi-class Probabilities with Support Vector Machines,” International Joint Conference on Neural Networks 2005, 1906-1911, Montreal, Canada, August 1-4, 2005.
28. Murshed, N. A., Bortolozzi, F., and Sabourin, R., “A Cognitive Approach to Signature Verification,” International Journal of Pattern Recognition and Artificial Intelligence (Special Issue on Bank Cheques Processing), 11:7, 801-825, 1997.
29. Oliveira, L. S., Sabourin, R., Bortolozzi, F., and Suen, C. Y., “Automatic Recognition of Handwritten Numerical Strings: A Recognition and Verification Strategy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:11, 1438-1454, 2002.
30. Parsons, O., and Carpenter, G. A., “ARTMAP neural network for information fusion and data mining: map production and target recognition methodologies,” Neural Networks, 16, 1075-1089, 2003.
31. Rubin, M. A., “Application of Fuzzy ARTMAP and ART-EMAP to Automatic Target Recognition Using Radar Range Profiles,” Neural Networks, 8:7, 1109-1116, 1995.
32. Schutte, J. F., Reinbolt, J. A., Fregly, B. J., Haftka, R. T., and George, A. D., “Parallel Global Optimization with Particle Swarm Algorithm,” International J. of Numerical Methods in Engineering, 61, 2296-2315, 2004.
33. Stone, M., “Cross-Validatory Choice and Assessment of Statistical Predictions,” Journal of the Royal Statistical Society, Series B, 36, 111-147, 1974.
34. Sumathi, S., Sivanandam, S. N., and Jagadeeswari, R., “Design of Soft Computing Models for Data Mining Applications,” Indian J. of Engineering and Materials Sciences, 7:3, 107-121, 2000.
35. Srinivasa, N., “Learning and Generalization of Noisy Mappings Using a Modified PROBART Neural Network,” IEEE Trans. on Signal Processing, 45:10, 2533-2550, 1997.
36. Valentini, G., “An Experimental Bias-Variance Analysis of SVM Ensembles Based on Resampling Techniques,” IEEE Trans. Systems, Man, and Cybernetics – Part B: Cybernetics, 35:6, 1252-1271, 2005.
37. Verzi, S. J., Heileman, G. L., Georgiopoulos, M., and Healy, M. J., “Boosting the Performance of ARTMAP,” IEEE International Joint Conference on Neural Networks Proceedings 1998, Anchorage, USA, 396-401, 1998.
38. Waxman, A. M., Verly, J. G., Fay, D. A., Liu, F., Braun, M. I., Pugliese, B., Ross, W., and Streilein, W., “A Prototype System for 3D Color Fusion and Mining of Multisensor/Spectral Imagery,” Proc. of the 4th International Conference on Information Fusion, Vol. 1, pp. WeC1-(3-10), Montreal, Canada, August 7-10, 2001.
39. Williamson, J. R., “A Constructive, Incremental-Learning Neural Network for Mixture Modeling and Classification,” Neural Computation, 9:7, 1517-1543, 1997.