The Kernel Addition Training Algorithm: Faster Training for CMAC
Based Neural Networks
David Cornforth1 David Newth2
1,2School of Environmental and Information Sciences, Charles Sturt University, Australia
(E-mail : firstname.lastname@example.org, email@example.com)
Abstract: The rapidly increasing size of databases creates a need for new algorithms to solve multi-class categorisation problems. Machine learning techniques such as neural networks have been successfully applied to this class of problems. However training times for these techniques can blow out as the size of the database increases. Some of the desirable features of algorithms for large databases are low order time complexity, training with only a single pass of the data, and accountability for class assignment decisions. We propose a new training algorithm for Cerebellar Model Articulation Controller (CMAC) based classifiers, which possesses these features. The training algorithm proposed here is based on a kernel addition method. An empirical investigation of this training method has found it to be superior to traditional techniques both in accuracy and time required to learn mappings between input vectors and class labels.
Keywords: CMAC, Neural Networks, Training
1. Introduction
A well-studied class of machine learning problems
is that of categorisation, or classification. Here the key
is to determine some relationship between a set of
input vectors that represent stimuli, and a
corresponding set of values on a nominal scale that
represent category or class. The relationship is
obtained by applying an algorithm to training samples
that are 2-tuples <u, z>, consisting of an input vector
u and a class label z. The learned relationship can then
be applied to instances of u not included in the
training set, in order to discover the corresponding
class label z. A number of machine learning
techniques, including Genetic Algorithms and
Neural Networks, have been shown to be very
effective for solving such problems.
There are many large databases in existence that
could yield valuable information if efficient and
scalable methods of automated classification could be
found. Some of the desirable features of
algorithms for automated classification are: low order
time complexity, training with only a single pass of
the data, and accountability for class assignment
decisions.
Many algorithms for automated classification have
an inherently non-linear relationship between time
taken by the algorithm to run and the number of
training examples. Analysis methods that work well
for small data sets are completely impractical when
applied to larger data sets. For example, training of a
neural network using Backpropagation is known to be
NP-complete . Some studies suggest that
Evolutionary Algorithms have polynomial time
complexity.
A CMAC based neural network can be trained
faster than a neural network using back-propagation,
but the method still requires multiple passes of the
training data. In this paper we propose a Kernel
Addition Training Algorithm (KATA) as a more
effective learning algorithm for the Cerebellar Model
Articulation Controller (CMAC). Our proposed
method requires only a single pass of the data and we
provide a probability model for class assignment
decisions.
2. Cerebellar Model Articulation Controller
The Cerebellar Model Articulation Controller, or
CMAC, is a class of sparse coarse-coded associative
memory algorithms that mimic the functionality of the
mammalian cerebellum. Originally CMAC was
proposed as a function modeler for robotic controllers,
but has been extensively used in reinforcement
learning and also as a classifier system.
We visualise an input vector u of size d as a point
in d-dimensional space. The input space is quantised
using a set of overlapping tiles as shown in Fig. 1. For
input spaces of high dimensionality, the tiles form
hypercubes. A query is performed by first activating
all the tiles that contain a query point. The activated
tiles in turn activate memory cells, which contain
stored values: the weights of the system (Fig. 2). The
summing of these values produces an overall output.
A change of value of the input vector results in a
change in the set of activated tiles, and therefore a
change in the set of memory cells participating in the
CMAC output. The CMAC output is therefore stored
in a distributed fashion, such that the output
corresponding to any point in input space is derived
from the value stored in a number of memory cells.
Fig. 1. The CMAC tile configuration, with a query point
activating a tile in both the tile sets, after .
Fig. 2. Active tiles activate memory locations. These contain
values that are summed to produce the output.
The memory size required by a CMAC depends on
the number of tilings and the size of tiles. If the tiles
are large, such that each tile covers a large proportion
of the input space, a coarse division of input space is
achieved, but local phenomena have a wide area of
influence. If the tiles are small, a fine division of input
space is achieved and local phenomena have a small
area of influence. The number of tiles, and therefore
the number of memory cells, is usually so large
as to be prohibitive given memory
constraints. Many of these tiles are never used due to
the sparse coverage of the input space. It is usual to
employ a consistent random hash function to collapse
the large tiling space into a smaller memory cell space.
A CMAC learns a mapping from an input space
U ∈ ℜ^d to an output space Z ∈ ℜ, where d is the
number of dimensions, or the size of the input vector.
Following existing convention this can be broken into
three mappings:
E: the input space to the multi-layer tiling system;
H: the multi-layer tiling system to the memory table;
W: the memory table to the output (a weighted summation).
The mapping E can be implemented using simple
integer division in each dimension. The integer values
for each dimension are combined to form one address
for each tiling layer. Addresses for the other tiling
layers are calculated in a similar way. The mapping H
receives q addresses that must be mapped to memory
cells. This mapping is usually implemented by a
hashing function. The mapping W is a weighted
summation of the contents of the memory cells. These
values are set during training.
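As a concrete illustration, the E and H mappings can be sketched in a few lines of Python. This is a minimal sketch under assumed parameters (a common tile width, layers offset by equal fractions of a tile); the function and variable names are ours, not taken from the paper's implementation.

```python
# Minimal sketch of the E mapping (quantisation by integer division) and
# the H mapping (a dict standing in for a hash table with chaining).
# The uniform-offset scheme and all names are illustrative assumptions.

def tile_addresses(u, tile_width, q):
    """E: map input vector u to one tile address per tiling layer.
    Layer i is offset by i/q of the tile width; integer division in
    each dimension yields one tile index per dimension."""
    addresses = []
    for i in range(q):
        offset = i * tile_width / q
        index = tuple(int((x + offset) // tile_width) for x in u)
        addresses.append((i, index))   # layer number keeps layers distinct
    return addresses

# H: only tiles that are actually visited get a memory cell, which keeps
# the table far smaller than the full tiling space.
memory = {}
for a in tile_addresses([0.37, 0.81], tile_width=0.25, q=4):
    memory.setdefault(a, 0.0)
```

Combining the per-dimension integer indices into one tuple corresponds to forming a single address per tiling layer, as described above.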
Fig. 3. Kernel functions (step and linear) embedded in a 2-dimensional tiling.
An improvement over the Albus CMAC is the now
widely adopted practice of embedding kernel
functions into the quantising regions. This
modifies the output mapping to a weighted summation:

z = Σ_{i=1..q} k(d_i) y_{a_i} / Σ_{i=1..q} k(d_i)    (1)

Each weight y is indexed by address a, and the
kernel function k is applied to some distance measure
d of a point from the centre of the tile. The number of
tiling layers is q. Some common kernel functions are
illustrated in Fig. 3.
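The two kernel shapes in Fig. 3 can be written down directly. Here d is the distance of the query point from the tile centre and r the tile half-width; these parameter names are our assumptions.

```python
# Step and linear kernels as sketched in Fig. 3. d is the distance from
# the tile centre, r the tile half-width (names are our assumptions).

def step_kernel(d, r):
    """Constant response anywhere inside the tile."""
    return 1.0 if d <= r else 0.0

def linear_kernel(d, r):
    """Response falls off linearly from the tile centre to its edge."""
    return max(0.0, 1.0 - d / r)
```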
The Albus CMAC is adapted as a classifier by
adopting a suitable mapping between output variable
z and class label c. One possible mapping takes
c = 1 if z exceeds a fixed threshold, and c = 0
otherwise. This is sufficient for
two-class problems. For problems with more than
two classes, we define threshold values so as to
divide up the scalar range of z into the number of
classes to be represented. We designate this a scalar
mapping. We note that the scalar mapping is not ideal,
as it represents categorical data on a continuous scale,
and there is no information about degree of
membership of a class.
2.2. Albus Training Algorithm
The Albus CMAC is trained by evaluating the
error as the difference between desired output zd and
actual output z, and updating the active weights at
each time step t:

y_a(t+1) = y_a(t) + β (z_d − z) / q, for each active cell a    (3)
Such error minimisation algorithms have been
proved to converge. A gain term β is introduced
to control the convergence. In our experiments, this
was set to 1.0 at the start of training, and reduced
during training, as this guarantees quick convergence.
The number of epochs used must be
sufficient to allow convergence, but not too many so
as to cause over fitting. In the algorithm used in this
work, after one epoch of training on two thirds of the
data, the trained classifier was tested on the remaining
one third. The accuracy was calculated as the number
of correctly classified samples divided by the number
of samples used during training. After each epoch, the
accuracy was compared to that from the previous
epoch. If the accuracy had increased by less than
0.1%, then training was terminated.
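The error-correction update can be sketched as follows: the output error is shared equally among the q active cells and scaled by the gain β. All names are illustrative, and the single-sample loop below only demonstrates the error-correcting behaviour, not a full training run.

```python
# Sketch of the Albus error-correction update: the output is the sum of
# the active cells' weights, and the output error is shared equally
# among those q cells, scaled by the gain beta. Names are illustrative.

def cmac_output(memory, cells):
    return sum(memory.get(a, 0.0) for a in cells)

def albus_update(memory, cells, z_desired, beta, q):
    error = z_desired - cmac_output(memory, cells)
    for a in cells:
        memory[a] = memory.get(a, 0.0) + beta * error / q

# Repeated presentation of one sample drives the output to the target.
memory = {}
cells = [(0, (1, 3)), (1, (1, 2))]
for _ in range(5):
    albus_update(memory, cells, z_desired=1.0, beta=1.0, q=len(cells))
```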
2.3. Kernel Addition Training Algorithm
Consider an alternative output mapping, using a
separate CMAC for each class:

z_v = (z_1, z_2, ..., z_m)

where m is the number of classes. We designate
this a vector mapping. We further wish to train each
CMAC so that zv represents a relative probability of
selecting class c. Then it is possible to take account of
a priori probability using Bayes' Law:

P(c_i | x) = P(x | c_i) P(c_i) / P(x)

where P represents probability. The frequency
of samples occurring in each class may be used to
estimate P(ci). The output zi is used to estimate P(x|ci).
There is no need to calculate the denominator, as
assignment to the highest probability class requires
only a comparison of the numerators.
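The class-assignment rule can be sketched directly: the i-th CMAC output stands in for P(x|c_i), the class frequencies estimate P(c_i), and the denominator P(x) is dropped because only the arg-max matters. Names below are illustrative.

```python
# Sketch of Bayes-weighted class assignment: z[i] estimates P(x|c_i),
# class_counts give the prior P(c_i); the common denominator P(x) is
# omitted since only the largest product is needed.

def assign_class(z, class_counts):
    total = sum(class_counts)
    scores = [z_i * n_i / total for z_i, n_i in zip(z, class_counts)]
    return scores.index(max(scores))   # highest relative probability wins

# Example: class 1 has the largest likelihood-times-prior product.
label = assign_class(z=[0.2, 0.3, 0.1], class_counts=[500, 400, 100])
```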
KATA uses a vector class mapping. Since the
magnitude of z, not its value, is required for class
assignment, there is no need for incremental training.
As each training vector is presented, a kernel function
value for each activated tile is added to the value of
the corresponding memory cell. The weights converge
to a value that depends upon the kernel function.
Assuming n training points distributed uniformly over
the cell, the expected value of a cell after training will
be n·k_e, where k_e is the expected value of the kernel
function. If the kernel function is the step function,
the value of each memory cell after training is a count
of the number of times the corresponding tile was
accessed during training.
After training, a CMAC forms a piecewise model
of the probability density function for the
corresponding class. There is no need to normalise the
output as in equation (1), so the output is given by:

z_i = Σ_{j=1..q} k(d_j) y_{a_j}

The KATA CMAC is trained using the value of the
kernel function itself: for each active cell a,

y_a(t+1) = y_a(t) + k(d_a)    (7)
In contrast to the Albus training algorithm, KATA
is not an iterative algorithm. The weights are updated
during a single presentation of the training data at the
inputs. The model avoids overtraining by distributing
the contribution of one cell over q−1 others, due to the
overlapping arrangement of cells. The issue of
overfitting during training is addressed in the empirical
trials by the use of cross validation, which introduces
previously unseen input data during testing. The effect
of outliers is minimised by the cumulative nature of
the algorithm: any outlier will have minimal effect
on the output because of the smoothing afforded by
the overlapping tiles.
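A minimal single-pass KATA sketch follows, assuming a helper that returns the activated cells and their kernel values; that helper, and all other names, are illustrative rather than the paper's code.

```python
# Single-pass KATA sketch: one weight table per class. Presenting a
# sample of class c adds each activated tile's kernel value to the
# corresponding cell of class c's table. Names are illustrative.

def kata_train(samples, num_classes, active_cells_with_kernel):
    memories = [dict() for _ in range(num_classes)]
    counts = [0] * num_classes            # class frequencies for the prior
    for u, c in samples:                  # exactly one pass over the data
        counts[c] += 1
        for a, k in active_cells_with_kernel(u):
            memories[c][a] = memories[c].get(a, 0.0) + k
    return memories, counts

# With a step kernel, each cell simply counts how often its tile was
# activated by samples of its class (one layer, unit tiles, for brevity).
def cells(u):
    return [((0, int(u[0] // 1.0)), 1.0)]

mem, counts = kata_train([([0.5], 0), ([0.7], 0), ([1.5], 1)], 2, cells)
```

Because updates are pure additions, the result is independent of the order in which samples are presented, as noted in the conclusions.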
3. Experiments and Results
From equations (3) and (7), it can be seen that
KATA should be much faster than one iteration of the
Albus algorithm. The speed advantage will not be as
great as suggested by these equations alone,
as there is a software overhead associated with
as there is a software overhead associated with
training. However, we would expect that KATA
would be faster than Albus. This conclusion was
tested using computer models of the two algorithms
for comparison purposes.
3.1. Experimental Test Problem
The two CMAC learning algorithms were tested
using the parity problem. This problem was chosen
because of its low spatial frequency, ensuring that
there will be enough samples to discriminate classes
in tests with a high number of dimensions or a high
number of classes. In this problem the input space is
partitioned into m regions in each dimension. If the
inputs are x_i, and the range of each input is r, then the
output is

z = ( Σ_{i=1..d} ⌊x_i m / r⌋ ) mod m    (8)
If there are just two input variables the problem is
known as Exclusive-OR (or XOR). This is known to
be a difficult problem as it is not linearly separable.
The parity problem for three inputs and two
classes is shown in Fig. 4.
Fig. 4. A three-dimensional input space for the parity
problem. Light regions represent class 0 and dark regions
represent class 1.
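The parity labelling can be sketched as follows, assuming each input dimension is divided into m equal regions over the range r and the region indices are summed modulo m; this reduces to XOR for two inputs and two classes.

```python
# Parity labelling sketch: divide each dimension into m regions over the
# range r; the class label is the sum of region indices modulo m.

def parity_class(x, r, m):
    return sum(int(xi * m / r) for xi in x) % m

# Two inputs in [0, 1), two classes: the familiar XOR pattern.
labels = [parity_class([a, b], r=1.0, m=2)
          for a in (0.25, 0.75) for b in (0.25, 0.75)]
```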
Fig. 5. Training time with 2 dimensions and 2 classes.
Fig. 6. Accuracy with 2 dimensions and 2 classes.
Data sets were generated using randomly generated
x values, and assigning a class label to each record
according to (8) above. Seven data sets were
generated, containing from 2 to 5 dimensions and
from 2 to 5 classes. Each database consisted of 1
million samples, with input variables drawn from a
uniform random distribution.
Fig. 7. Training time with 3 dimensions and 2 classes.
Fig. 8. Accuracy with 3 dimensions and 2 classes.
Both versions of CMAC used the same parameters.
Input space was uniformly quantised in all
dimensions. Tile spacing was based on the work of
. A hashing function with chaining was used to
achieve zero collisions. The distance measure used for
kernels was Euclidean, and the kernel function used
was linear. Both versions were implemented in C++,
using similar data structures and components in order
to make the resulting code as similar as possible, and
thereby enable meaningful comparisons of running
times.
The performance of the two algorithms was tested
using a cross validation method. The data sets used
were each divided into three parts at random. Training
was performed using two parts of the data, and the
trained model was tested on the remaining one part.
This was done three times using a different part for
testing. In this manner the model was tested on all
data, and reported a number of correctly classified
samples, which was divided by the size of the data set
to obtain a percentage accuracy. The choice of the
fraction one third is a compromise between using all
data to train, which may result in over fitting the
model, and using less data to be computationally
efficient.
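The three-fold cross validation described above can be sketched as follows; the shuffling seed and all names are ours.

```python
import random

# Three-fold cross-validation sketch: shuffle the indices, split them
# into three parts, and let each part serve once as the test set while
# the other two parts train the model.

def three_fold_indices(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::3] for i in range(3)]
    for i in range(3):
        test = folds[i]
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test
```

Iterating over all three splits tests the model on every sample exactly once, which is what allows accuracy to be reported over the whole data set.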
Figs. 5 to 18 show the results for the parity
problem. Training time in seconds, and accuracy in
percent correct, are shown on the vertical axis on
alternating figures. The number of samples is shown
on the horizontal axis of all figures. The graphs of
training time against number of samples suggest a
linear relationship between these variables. This is
apparent for both algorithms. In all tests the KATA
algorithm was about 2.5 to 3 times as fast as Albus.
We note that in all these tests, the Albus training
algorithm used a variable number of training epochs,
which explains in part the occasional outliers. So the
relative speed advantage of KATA depends on the
number of iterations of the Albus algorithm. KATA is
faster than a single iteration of Albus, as expected.
Fig. 9. Training time with 4 dimensions and 2 classes.
Fig. 10. Accuracy with 4 dimensions and 2 classes.
Fig. 11. Training time with 5 dimensions and 2 classes.
Fig. 12. Accuracy with 5 dimensions and 2 classes.
Fig. 13. Training time with 2 dimensions and 3 classes.
Fig. 14. Accuracy with 2 dimensions and 3 classes.
Fig. 15. Training time with 2 dimensions and 4 classes.
Fig. 16. Accuracy with 2 dimensions and 4 classes.
Fig. 17. Training time with 2 dimensions and 5 classes.
Fig. 18. Accuracy with 2 dimensions and 5 classes.
The graphs of accuracy against number of samples
show that KATA is consistently superior to the
iterative training technique. When the problem
becomes more difficult, using more classes or
dimensions, the performance of the classifier is bound
to deteriorate, because the number of samples
available for each homogeneous block of the input
space decreases. Since the Albus technique uses an
error minimisation, this is an inherently biased model,
whereas KATA uses an unbiased model of input
space. Therefore the accuracy of the classifier trained
using KATA degrades more slowly.
4. Conclusion
The main result of this work is the demonstration
that the Albus perceptron or CMAC can be trained
using a non-iterative method, using only a single pass
of the data. KATA provides a lower order relationship
between training time and number of samples, in
contrast to error minimisation algorithms, in which
the training time depends on the number of training
iterations required. This allows CMAC based
classifiers to be applied to very large databases.
KATA allows a CMAC to be trained in a single pass
of the data, avoiding the need for training data to be
held in memory during training. This enables the
processing of large databases to be computationally
feasible. The output encoding of KATA offers
accountability for class assignment decisions and
allows a priori probability to be accounted for. KATA
is not sensitive to the order in which input samples are
presented. For the experiments presented here we
have found KATA to be consistently faster and more
accurate than the error minimisation technique
proposed by Albus. This new training algorithm has
great potential for application to very large databases.
Acknowledgments
This work was funded in part by a CSU Faculty
Seed Grant. The authors wish to thank the New South
Wales Centre for Parallel Computing (NSWCPC) for
the use of their SGI Power Challenge machine.
References
J.S. Albus, “A New Approach to Manipulator Control:
the Cerebellar Model Articulation Controller (CMAC)”,
Trans. ASME, Series G. Journal of Dynamic Systems,
Measurement and Control, Vol. 97, pp. 220-233, 1975.
 J.S. Albus, “Mechanisms of Planning and Problem
Solving in the Brain”, Mathematical Biosciences, Vol.
45, pp. 247-293, 1979.
 P.C.E. An, W.T. Miller, and P.C. Parks, “Design
Improvements in Associative Memories for Cerebellar
Model Articulation Controllers”, Proc. ICANN, pp.
T.H. Cormen, C.E. Leiserson, and R.L. Rivest,
Introduction to Algorithms, McGraw-Hill, 1990.
 D. Cornforth and D. Elliman, “Modelling Probability
Density Functions for Classifying using a CMAC”, in M.
Taylor and P. Lisboa (Eds.), Techniques and
Applications of Neural Networks, Ellis Horwood, 1993.
 D. Cornforth, Classifiers for Machine Intelligence, PhD
Thesis, Nottingham University, UK, 1994.
 T.G. Dietterich and G. Bakiri, “Solving Multiclass
Learning Problems Via Error-Correcting Output Codes”,
Journal of Artificial Intelligence Research, Vol. 2, pp.
 R.O. Duda, and P.E. Hart, Pattern Classification and
Scene Analysis, John Wiley and sons, 1973.
 Z.J. Geng and W. Shen, “Fingerprint Classification
Using Fuzzy Cerebellar Model Arithmetic Computer
Neural Networks”, Journal of Electronic Imaging, Vol.
6, No. 3, pp. 311-318 1997.
 F.J. Gonzalez-Serrano, A. R. Figueiras-Vidal, and A.
Artes-Rodriguez, “Generalizing CMAC Architecture and
Training”, IEEE Transactions on Neural Networks, Vol.
9, No. 6, pp. 1509-1514, 1998.
J. Han and M. Kamber, Data Mining Concepts and
Techniques, Morgan Kaufman, 2001.
 J. He and X. Yao, “Drift Analysis and Average Time
Complexity of Evolutionary Algorithms”, Artificial
Intelligence (to Appear), 2001.
 J. Holland, Adaptation in Natural and Artificial
Systems: An Introductory Analysis with Applications to
Biology, Control, and Artificial Intelligence, MIT Press,
 C.A. Kulikowski and S. Weiss (Eds.), Computer
Systems That Learn: Classification and Prediction
Methods From Statistics, Neural Nets, Machine
Learning, and Expert Systems, Morgan Kaufman, 1991.
 S.H. Lane, D.A. Handelman, and J.J. Gelfand, “Theory
and Development of Higher-Order CMAC Neural
Networks”, IEEE Control Systems, pp. 23-30, 1992.
 C. Lin and C. Chiang, “Learning convergence of
CMAC Technique”, IEEE Transactions on neural
networks, Vol. 8, No. 6, pp. 1282-1292, 1997.
 P.C. Parks and J. Militzer, “Improved Allocation of
Weights for Associative Memory Storage in Learning
Control Systems”, IFAC Design Methods of Control
Systems, Zurich, Switzerland, pp. 507-512, 1991.
 A. Roy, S. Govil and R. Miranda, “A Neural-Network
Learning Theory and a Polynomial Time RBF
Algorithm”, IEEE Transactions on Neural Networks,
Vol. 8, No. 6, pp. 1301-1313, 1997.
 J.C. Santamaria, R. S. Sutton and A. Ram,
“Experiments with Reinforcement Learning in Problems
with Continuous State and Actions Spaces”, Technical
Report UM-CS-1996-088, Department of Computer
Science, University of Massachusetts, MA., 1996.
 M. Wiering, R. Salustowicz, and J. Schmidhuber,
“Reinforcement Learning Soccer Teams with Incomplete
World Models”, Autonomous Robots, Vol. 7, pp. 77-88,
 Y. Wong, “CMAC Learning is Governed by a Single
Parameter”, IEEE International Conference on Neural
Networks, San Francisco, Vol. 1, pp. 1439-1443, 1993.
S. Yao and B. Zhang, “The Learning Convergence of
CMAC in Cyclic Learning”, Proceedings of the
International Joint Conference on Neural Networks, 1993.