IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2006
Context Quantization by Kernel Fisher Discriminant
Mantao Xu, Xiaolin Wu, Senior Member, IEEE, and Pasi Fränti
Abstract—Optimal context quantizers for minimum conditional entropy can be constructed by dynamic programming in the probability simplex space. The main difficulty, operationally, is the resulting complex quantizer mapping function in the context space, in which the conditional entropy coding is conducted. To overcome this difficulty, we propose new algorithms for designing context quantizers in the context space based on the multiclass Fisher discriminant and the kernel Fisher discriminant (KFD). In particular, the KFD can describe linearly nonseparable quantizer cells by projecting input context vectors onto a high-dimensional curve, in which these cells become better separable. The new algorithms outperform the previous linear Fisher discriminant method for context quantization. They approach the minimum empirical conditional entropy context quantizer designed in the probability simplex space, but with a practical implementation that employs a simple scalar quantizer mapping function rather than a large lookup table.

Index Terms—Context quantization, entropy coding, Fisher discriminants, image compression.
I. INTRODUCTION

A KEY and important task in compressing a discrete random sequence X_0, X_1, X_2, ... is the estimation of the conditional probability P(X_i | X^{i-1}), where X^{i-1} = X_0, X_1, ..., X_{i-1} is the prefix or context of X_i. Given a class of source models, the model order or the number of model parameters must be carefully chosen in the principle of minimum description length or universal source coding. The pioneer solution to the problem is Rissanen's algorithm Context [1], which dynamically selects a variable-order subset of the past samples in X^{i-1}, called the context of X_i. The algorithm structures the contexts of different orders by a tree, and it can be shown to be, under certain assumptions, universal in terms of approaching a minimum adaptive code length for a class of finite-memory sources. A more recent and increasingly popular universal source-coding technique is context tree weighting [2]. The idea is to weight the probability estimates associated with different branches of a context tree to obtain a better estimate.

Although the tree-based context modeling techniques have had remarkable success in text compression, applying them to image compression poses great difficulty. The context tree can
Manuscript received June 1, 2004; revised January 14, 2005. This work was supported in part by NSERC, in part by the National Science Foundation, and in part by a Nokia Research Fellowship. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Thierry Blu.
M. Xu and P. Fränti are with the Department of Computer Science, University of Joensuu, 80101 Joensuu, Finland (e-mail: xu@cs.joensuu.fi; franti@).
X. Wu is with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8G 4K1, Canada, and also with the University of Joensuu, 80101 Joensuu, Finland (e-mail: email@example.com).
Digital Object Identifier 10.1109/TIP.2005.860357
only model a sequence but not a two-dimensional (2-D) signal like images. In order to apply the context tree-based techniques to image coding, one needs to schedule the pixels (or transform coefficients) of an image into a linear sequence, as proposed by the authors of [3], [4]. Recently, Mrak et al. investigated how to optimize the ordering of the context parameters within the context trees [5], but any linear ordering of pixels will inevitably destroy the intrinsic 2-D sample structures of an image. This is why most image/video compression algorithms choose a priori 2-D context models with fixed complexity, based on domain knowledge such as the correlation structure of the pixels and typical input image size, and estimate only the model parameters. For instance, the JBIG standard [6] for binary image compression uses a fixed 2-D context template, and the actual coding is implemented by sequentially applying arithmetic coding based on the estimated conditional probabilities.
Estimating the conditional probabilities P(X_i | C_i) becomes difficult if the order of the context model is high and/or if the symbol alphabet is large with respect to the length of the input signal, which is the case for image/video compression. Context quantization is a common technique to overcome this difficulty. For example, the state-of-the-art lossless image compression algorithm CALIC [10] and the JPEG 2000 entropy-coding algorithm EBCOT [11] quantize the context C_i into a relatively small number M of conditioning states, and estimate P(X_i | Q(C_i)) instead of P(X_i | C_i), where Q denotes a context quantizer.

Context quantization is a form of vector quantization because the causal context C_i = (X_{i-t_1}, X_{i-t_2}, ..., X_{i-t_K}) of a source symbol X_i is a K-dimensional vector (i.e., the context model has order K). Naturally, the objective of optimal context quantization should be the minimization of the conditional entropy H(X_i | Q(C_i)): one would like to make H(X_i | Q(C_i)) as close to H(X_i | C_i) as possible for a given number M of conditioning states, or, equivalently, minimize the Kullback–Leibler distance between P(X_i | C_i) and P(X_i | Q(C_i)). Although the Kullback–Leibler distance (relative entropy) is not strictly a distance metric, for its violation of symmetry and the triangular inequality, the standard practice is to use it as a nonnegative "distortion" measure of the context quantizer. Note that the conditional entropy referring to the true source entropy is not the actual code length, which should include the model cost.

The problem of context quantization minimizing the Kullback–Leibler distance was first studied by Wu [7] and then by Chen [12] for the application of wavelet image compression. Greene et al. also developed an optimal context quantization algorithm for compression of binary images [13]. Recently,
1057-7149/$20.00 © 2006 IEEE
Forchhammer et al. proposed a context quantizer design algorithm under the criterion of minimal adaptive code lengths, and applied it to lossless video coding [9]. A more theoretical treatment of the problem can be found in [8].
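To make the objective of context quantization concrete, here is a small, self-contained illustration (not from the paper; the joint counts and the quantizer are invented) of how merging raw contexts into conditioning states changes the empirical conditional entropy H(X | Q(C)) computed from counts:

```python
import math
from collections import defaultdict

def cond_entropy(joint):
    """H(X | S) in bits, from a dict {(state, symbol): count}."""
    total = sum(joint.values())
    state_total = defaultdict(int)
    for (s, _x), n in joint.items():
        state_total[s] += n
    h = 0.0
    for (s, _x), n in joint.items():
        h -= (n / total) * math.log2(n / state_total[s])
    return h

# invented joint counts for a binary source with four raw contexts
counts = {(0, 0): 40, (0, 1): 10,   # context 0: P(X=1|C=0) = 0.2
          (1, 0): 35, (1, 1): 15,   # context 1: P(X=1|C=1) = 0.3
          (2, 0): 15, (2, 1): 35,   # context 2: P(X=1|C=2) = 0.7
          (3, 0): 10, (3, 1): 40}   # context 3: P(X=1|C=3) = 0.8

# a context quantizer merging similar contexts into M = 2 conditioning states
Q = {0: 0, 1: 0, 2: 1, 3: 1}
merged = defaultdict(int)
for (c, x), n in counts.items():
    merged[(Q[c], x)] += n

h_raw = cond_entropy(counts)   # H(X | C), about 0.802 bits
h_q = cond_entropy(merged)     # H(X | Q(C)), about 0.811 bits
assert h_raw <= h_q            # quantization can only increase conditional entropy
```

A good quantizer merges contexts with similar conditional distributions, so the increase over H(X | C) stays small for a given M.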
The existing context quantizer design algorithms can be classified into two approaches: those that form coding contexts directly in the context space of conditioning events (or the feature space in the terminology of classification and pattern recognition), and those that form coding contexts in the probability simplex space. In the context space, one can apply the generalized Lloyd method [14] to design a context quantizer by clustering raw contexts of a training set according to the Kullback–Leibler distance, but this iterative approach of gradient descent cannot guarantee the globally optimal solution. If the random variable to be coded is binary, the context quantization can be converted to a scalar quantization problem in the probability simplex space of P(X_i | C_i). This change of space makes it possible to design a globally optimal context quantizer by dynamic programming (DP). For the sake of rigor, we remind the reader that the above-mentioned optimality is with respect to the statistics of the chosen training data. In practice, if the statistics of an input image mismatch those of the training set, then the coding performance becomes of course suboptimal. Nevertheless, designing an optimal context quantizer still has practical significance because situations exist where a suitable training set can be found. Furthermore, an off-line optimized context quantizer can be used in conjunction with adaptive arithmetic coding to compensate for any coding loss due to the mismatch of statistics.
Regardless of what space is chosen to design the context quantizer, an input context (feature) vector (a realization of the random variable C_i) has to be mapped to a coding state (a context quantizer cell) when it comes to actual context-based entropy coding. All existing design methods face a common operational difficulty of a complex quantizer mapping function Q. Unlike in conventional VQ, the cells (coding states) of an optimal context quantizer are not convex or even connected in the context space. Since the quantizer partitions the context space of C_i irregularly, its description seems only possible via table lookup. Unfortunately, the table size required by Q grows exponentially in the order of the context. To circumvent this difficulty, one can coarsely prequantize the raw contexts, i.e., limit the resolution of the context space, or project the context space onto a lower-dimensional subspace by the linear Fisher discriminant (LFD) [7]. However, all these techniques compromise optimality. In this paper, we address this operational problem. We have made a measured progress in meeting the objective by designing context quantizers using kernel Fisher discriminants. Section II characterizes the structure of the cells of a context quantizer in both the probability simplex space and the context space, and exposes the complexity of the quantizer mapping function. The main results of this research, i.e., the context quantizer design algorithms based on multiclass LFD and KFD, are presented in Section III. The details of the design algorithm by using KFD are given in Section IV. Section V presents some experimental results, and the conclusion follows in Section VI.
II. STRUCTURE AND COMPLEXITY OF QUANTIZER MAPPING

A context quantizer Q partitions a K-dimensional context space into M cells. The criterion of minimizing the Kullback–Leibler distance in context quantizer design leads to complex structures and shapes of quantizer cells, which are in general not convex or even connected. However, the associated sets of probability mass functions are simple convex sets in the probability simplex space, owing to a necessary condition for minimum conditional entropy. If X_i is a binary random variable, then the probability simplex is one-dimensional (1-D), and the quantizer cells are simple intervals. Let p = P(X_i = 0 | C_i) (as a function of context C_i) be a random variable; then the conditional entropy of a context quantizer that partitions the unit interval into M intervals I_1, ..., I_M can be expressed by

H(X_i | Q(C_i)) = sum_{q=1..M} Pr{p in I_q} * H_b(E[p | p in I_q])    (1)

where H_b(t) = -t log2 t - (1 - t) log2(1 - t) is the binary entropy function. Thus, the minimal conditional entropy context quantizer (MCECQ) design can be reduced to a scalar quantization problem in [0, 1], even though the context C_i is drawn from a K-dimensional vector space. The globally optimal solution of this scalar quantization problem can be found by DP in O(Mn) time, where n is the number of raw, i.e., unquantized, contexts, thanks to a so-called concave Monge property of the objective function (1). When p is scalar quantized for minimal empirical conditional entropy of a training set, the optimal MCECQ cells are formed implicitly in the context space. Note that P(X_i | C_i) is seldom known exactly in practice; otherwise one would directly drive an entropy coder with it.
Fig. 1. Example distribution of MCECQ cells in the context space, for M = 3 and the source of least significant bits of DPCM errors of the image cameraman. The two axes represent the values of the first two elements of the raw context [the two directional gradients given in (12) and (13)]; the three symbols in the scatter plot mark the raw contexts of the three cells.
Instead, a training set is used to estimate P(X_i | C_i). Our experiments showed that the partition of the context space by an optimal context quantizer is generally very complex in shape and structure, resulting in a highly irregular quantizer mapping function Q. An example of the distribution of MCECQ cells in the context space is given in Fig. 1. Only in special cases are the cells bounded by quadratic surfaces. Consequently, the implementation of an arbitrary quantizer mapping function becomes an operational difficulty in using MCECQ in practice, which is the main issue that motivated this research.

The simplest way of implementing Q in the context space is to use a lookup table. However, since the number of all possible raw contexts grows exponentially in the order of the contexts, building a huge table with one entry per raw context is clearly impractical. Hashing techniques can be used to avoid the excessive memory use of the table, by exploiting the fact that the actual number of different raw contexts appearing in an input image is much smaller than the number of possible ones; but this saving of memory is at the expense of increased time of the quantizer mapping operation when collisions in table access occur. To achieve constant execution time of the quantizer mapping function, the size of the hashing table has to be larger than the number of distinct contexts in use; in the case of image coding, the table size needs to be comparable to the image size, since many raw contexts have a very low frequency of occurrence.

Another way of implementing Q is through projection. Wu proposed a suboptimal context quantization scheme in which the training context vectors are projected onto a direction w such that the two marginal posterior distributions have maximum separation [7]. Then, a DP algorithm was used to form an M-partition of the corresponding 1-D projection space to minimize the conditional entropy, in which the intervals define the context quantizer cells. In this design approach the context quantizer is a scalar one in the projection direction w, i.e., a subspace of the original context space. Wu et al. further characterized when such a scalar quantizer in a projection direction is optimal [8]. Although the projection approach is suboptimal in general, it simplifies the quantizer mapping function, which has operational advantages in practice.
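The 1-D DP shared by both design routes above can be sketched as follows. This is our own plain O(Mn^2) recursion over raw contexts sorted by their empirical probability p, not the paper's implementation, which exploits the Monge-property speedup; the toy cell statistics are invented:

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mcecq_1d(cells, M):
    """cells: list of (count, p) per raw context, p = empirical P(X=1|C).
    Returns (conditional-entropy code length in bits, interval boundaries)."""
    cells = sorted(cells, key=lambda t: t[1])
    n = len(cells)
    # prefix sums of counts and of counts*p give O(1) merged-cell cost
    cnt = [0.0] * (n + 1)
    ones = [0.0] * (n + 1)
    for i, (c, p) in enumerate(cells):
        cnt[i + 1] = cnt[i] + c
        ones[i + 1] = ones[i] + c * p
    def cost(i, j):  # code length of merging sorted cells i..j-1 into one state
        c = cnt[j] - cnt[i]
        return c * hb((ones[j] - ones[i]) / c) if c else 0.0
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(M + 1)]
    arg = [[0] * (n + 1) for _ in range(M + 1)]
    dp[0][0] = 0.0
    for m in range(1, M + 1):
        for j in range(1, n + 1):
            for i in range(m - 1, j):
                v = dp[m - 1][i] + cost(i, j)
                if v < dp[m][j]:
                    dp[m][j], arg[m][j] = v, i
    bounds, j = [], n          # recover interval boundaries (sorted indices)
    for m in range(M, 0, -1):
        bounds.append(j)
        j = arg[m][j]
    return dp[M][n], bounds[::-1]

# toy example: four raw contexts with (count, empirical P(X=1|C))
bits, bounds = mcecq_1d([(50, 0.1), (30, 0.15), (40, 0.5), (20, 0.9)], M=2)
# the two low-p contexts are merged into one state, the other two into another
```

The convexity of the optimal cells in p guarantees that only interval partitions of the sorted list need to be considered, which is what makes the DP exact.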
III. IMPROVED DESIGN ALGORITHMS OF FISHER DISCRIMINANTS

The progress made by this paper is to combine the advantages of the two MCECQ design approaches in the probability simplex space and in the projection context space of Fisher's discriminant. Namely, we seek to attain simultaneously the optimality of MCECQ in the probability simplex space and the simplicity of quantizer mapping in the projection space.
A. Multiclass LFD

In [7], an LFD was used to separate the two posterior distributions, which is a two-class classification problem. However, the success of this approach is limited to cases where the quantizer cells are linearly separable to a certain degree; for more difficult, linearly nonseparable shapes of context cells, a departure from [7] is needed. We seek to separate the optimal MCECQ cells formed in the probability simplex space via a suitable, nonlinear projection of the context space. The goal
is to apply the discriminant classifier to form a convex partition in the projection subspace that best matches the optimal MCECQ cells in the probability simplex space. The multiclass Fisher discriminant lends us a tool to design a classifier that approximates the optimal partition of contexts in the probability simplex space by an optimized partition in a projection subspace. The separation of the input classes (i.e., the MCECQ cells formed in the context space) along a projection direction can be measured by the so-called F-ratio validity index

F(w) = (w^T S_b w) / (w^T S_w w)    (3)

where w represents a discriminant vector in the raw context space. The multiclass LFD is the maximization of the F-ratio validity index in (3), i.e.,

w* = arg max_w (w^T S_b w) / (w^T S_w w)    (4)

where S_b and S_w in (4) are the between-class covariance matrix and the within-class covariance matrix in the context space, respectively:

S_b = sum_j n_j (m_j - m)(m_j - m)^T,   S_w = sum_j sum_{c_i = j} (x_i - m_j)(x_i - m_j)^T

where c_i is the class label of each sample x_i, m is the mean vector of all raw context samples, and m_j and n_j are the mean vector and sample size of class j. After the projection direction w is determined by (4), one can still apply DP to the projected contexts to optimize the context quantizer the same way as in [7].
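A toy numerical sketch of the F-ratio is given below. The data are invented 2-D "cells", and instead of solving the generalized eigenproblem behind (4) we simply scan candidate unit directions, which suffices to show what the criterion rewards:

```python
import math

def f_ratio(w, classes):
    """F-ratio: between-class over within-class variance of the projections
    of the class samples onto direction w (2-D points for simplicity)."""
    proj = [[w[0] * x + w[1] * y for (x, y) in cls] for cls in classes]
    n = sum(len(p) for p in proj)
    grand = sum(sum(p) for p in proj) / n
    means = [sum(p) / len(p) for p in proj]
    between = sum(len(p) * (m - grand) ** 2 for p, m in zip(proj, means))
    within = sum((v - m) ** 2 for p, m in zip(proj, means) for v in p)
    return between / within

def lfd_direction(classes, steps=360):
    """Maximize the F-ratio over unit directions by a coarse angular scan,
    a toy substitute for solving the generalized eigenproblem."""
    best_f, best_w = -1.0, (1.0, 0.0)
    for k in range(steps):
        t = math.pi * k / steps            # a half circle covers all directions
        w = (math.cos(t), math.sin(t))
        f = f_ratio(w, classes)
        if f > best_f:
            best_f, best_w = f, w
    return best_w, best_f

# three cells separated along x, with most of their spread along y
classes = [[(0.0, 0.0), (0.2, 1.0), (0.1, 2.0)],
           [(2.0, 0.1), (2.2, 1.1), (2.1, 2.1)],
           [(4.0, 0.0), (4.2, 1.0), (4.1, 2.0)]]
w, f = lfd_direction(classes)   # the best direction aligns with the x axis
```

Because the within-class scatter lies along y, the F-ratio is maximized by projecting onto x, exactly the direction that keeps the three cells apart after projection.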
The multiclass LFD outperformed the two-class LFD in terms of designing context quantizers of shorter code length in our experiments (see Section V). But the contexts of different MCECQ cells (input classes for the Fisher discriminant) are generally not linearly separable in the context space, as illustrated by Fig. 1. A superior alternative is to use a nonlinear classifier of higher discriminating power. Encouraged by the success of kernel-based learning machines, such as the support vector machine, kernel principal component analysis, and KFD analysis, in many other classification and learning applications [16]-[19], we propose a new design technique of context quantizers by using the multiclass kernel Fisher discriminant.

B. Multiclass KFD

The multiclass kernel Fisher discriminant has been intensively studied as a generalization of discriminant analysis using the kernel approach [16], [18]. As an extension of the Fisher discriminant, the kernel
one is known for its high discriminating power on input clusters of complex structures. The kernel discriminant first maps the source feature vectors (or context vectors in MCECQ design) into some new feature space F in which the input classes are better separable. A linear discriminant is then computed to separate the input classes in F. Implicitly, this process constructs a nonlinear classifier of high discriminating power in the original feature space. In our application of context quantization, the objective of the kernel discriminant is, given the MCECQ cells as input classes, to find a projection direction in a new feature space F in which the cells are most separable. A DP algorithm is then applied to design an MCECQ in the projection subspace. The resulting MCECQ in F implicitly constructs a context quantizer in the context space. Let Phi be the nonlinear mapping from the context space to some high-dimensional Hilbert space F. Our goal is to find the direction w in F such that the F-ratio validity index

F(w) = (w^T S_b^Phi w) / (w^T S_w^Phi w)    (5)

is maximized, where S_b^Phi and S_w^Phi are the between-class and within-class covariance matrices of the mapped samples. Since the space F is of very high, possibly infinite, dimension, w cannot be computed directly.
A technique to overcome this difficulty is the Mercer kernel k(x, y) = <Phi(x), Phi(y)>, which is the dot product in the Hilbert feature space F. A popular choice for the kernel function that has been proved useful (e.g., in support vector machines) is the Gaussian radial basis function (RBF), k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). It is known that, under some mild assumptions on S_b^Phi and S_w^Phi, any solution w maximizing (5) can be written as a linear span of all mapped context samples [16]:

w = sum_{i=1..n} alpha_i Phi(x_i).    (6)

As a result, the F-ratio F(w) can be reformulated in terms of the coefficient vector alpha as

F(alpha) = (alpha^T M alpha) / (alpha^T N alpha)    (7)

where the matrices M and N are built from the kernel matrix K, with K_{ij} = k(x_i, x_j), the membership vectors corresponding to the class labels, and 1, the vector of all ones. The projection of a context x onto w is then computed as <w, Phi(x)> = sum_i alpha_i k(x_i, x), where k is the RBF kernel function. The superior discriminating power of KFD over the LFD
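The kernel trick can be exercised on a tiny example. The sketch below is our own illustration, not the paper's algorithm: it uses the known least-squares formulation of the two-class KFD, solving (K + lambda*I) alpha = y with labels ±1, rather than the Rayleigh-quotient form (7), and the 1-D data and sigma are invented. It shows an RBF projection separating classes that no linear projection of the raw contexts could, since one class surrounds the other:

```python
import math

def rbf(x, y, sigma=0.7):
    return math.exp(-abs(x - y) ** 2 / (2 * sigma ** 2))

def solve(A, b):
    """Tiny dense Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# class +1 forms two clusters that surround class -1: not linearly separable
xs = [0.0, 0.3, 3.7, 4.0, 1.8, 2.0, 2.2]
ys = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

K = [[rbf(a, b) for b in xs] for a in xs]
lam = 1e-3  # regularization, for the same reason the text adds eps*I or eps*K
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(len(xs))] for i in range(len(xs))]
alpha = solve(A, ys)

def project(x):
    """Projection onto the kernel discriminant via an expansion as in (6)."""
    return sum(a * rbf(xi, x) for a, xi in zip(alpha, xs))

pos = [project(x) for x in xs[:4]]
neg = [project(x) for x in xs[4:]]
assert min(pos) > max(neg)   # the classes separate along the kernel discriminant
```

In the input space no threshold splits the two classes, yet along the kernel discriminant the projections fall into disjoint ranges, which is precisely what the quantizer-by-projection design needs.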
Fig. 2. Separability of two MCECQ cells, shown in (a) and (b), in the projection subspace formed by the KFD.
Fig. 3. Separability of two MCECQ cells, shown in (a) and (b), in the projection subspace formed by the LFD.
method of [7] for MCECQ design is illustrated in Figs. 2 and 3. The plots are for context vectors drawn from the test sources; the histograms of the projected MCECQ cells are far better separated by the KFD than by the LFD. Note that the projection of the KFD is in general nonlinear, unlike the classic LFD.

Computationally, the KFD problem is to find the leading eigenvector of N^{-1} M. As the dimension of N equals the number of source samples n, and N is a highly singular matrix estimated from only those n source samples, some form of regularization is necessary. The simplest solution is to add either the identity matrix or the kernel matrix K to N, i.e., N is replaced by N + eps*I or N + eps*K. This makes the problem numerically more stable, because the within-class covariance estimate becomes more positive definite for large eps. It is also roughly equivalent to adding independent noise to each of the samples.
IV. IMPLEMENTATION OF KFD FOR CONTEXT QUANTIZATION

In the above formulations, the matrices M and N are too large when the number of training samples is too high. More importantly, in context quantization applications, we are not able to use all the basis functions corresponding to all raw training contexts. Efficient schemes exist for solving the two-class KFD. However, they cannot be directly applied to estimating the multiclass KFD. A possible solution applicable to any choice of M and N is to restrict the discriminant w to be in a subspace of F, as proposed in [18] and [19]. Instead of using (6), we express w in a subspace spanned by a small number l of mapped base samples z_1, ..., z_l:

w = sum_{j=1..l} beta_j Phi(z_j)    (8)

and the samples z_j can be either selected from all raw training context samples or estimated by some clustering
algorithms. Without loss of generality, if we choose each z_j in (8) from the training set, then beta is the l-dimensional coefficient vector and the between-class and within-class covariance matrices are formed accordingly with the reduced kernel expansion. This approximation is known generically as the reduced set method for support vector machines. The approximation can be incrementally improved by adding a raw context sample, or a new context base, one at a time to the existing expansion, i.e., incrementing the dimensionality l by one at a time. Such incremental expansion can be done in a greedy fashion, as follows. For each iteration, we first randomly select a subset of the remaining training set, and then we conduct an exhaustive search in this subset, instead of in the whole remaining training set, for the training context that best improves the objective when being added to (8). The proper size of the random subset was found to be 59 in order to obtain nearly as good a performance as if the search was through the entire remaining training set. Since l is much smaller than n, incrementing the kernel expansion (8) by one base context is inexpensive; consequently, the approximation of the kernel discriminant in the l-dimensional subspace of F has a time complexity drastically lower than that of the full KFD. The pseudocode of this practical approximation algorithm of KFD for context quantization is presented in Fig. 4.
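The greedy loop can be sketched as follows. This is our simplified stand-in, not the paper's procedure: it scores a candidate base by a one-dimensional two-class Fisher criterion of the single feature k(z, .) rather than by the full multiclass F-ratio (7), and the training data are synthetic:

```python
import math, random

def rbf(x, y, sigma=1.0):
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def fisher_score(z, data):
    """Two-class Fisher criterion of the single feature x -> k(z, x);
    a simplified stand-in for evaluating the full F-ratio."""
    feats = {c: [rbf(z, x) for x in xs] for c, xs in data.items()}
    means = {c: sum(f) / len(f) for c, f in feats.items()}
    within = sum(sum((v - means[c]) ** 2 for v in f) for c, f in feats.items())
    c1, c2 = feats.keys()
    return (means[c1] - means[c2]) ** 2 / (within + 1e-12)

def greedy_bases(data, n_bases, pool_size=59, seed=1):
    """Grow the expansion basis greedily: each step scores only a random
    subset of `pool_size` remaining samples (59 is the size quoted in the
    text) and keeps the best one, skipping near-duplicate bases."""
    rng = random.Random(seed)
    remaining = [x for xs in data.values() for x in xs]
    bases = []
    for _ in range(n_bases):
        pool = rng.sample(remaining, min(pool_size, len(remaining)))
        def gain(z):
            if any(rbf(z, b) > 0.999 for b in bases):
                return -1.0            # nearly identical to a chosen base
            return fisher_score(z, data)
        best = max(pool, key=gain)
        bases.append(best)
        remaining.remove(best)
    return bases

rng0, rng1 = random.Random(7), random.Random(8)
data = {0: [rng0.gauss(0.0, 0.4) for _ in range(40)],
        1: [rng1.gauss(3.0, 0.4) for _ in range(40)]}
bases = greedy_bases(data, n_bases=3)
```

Restricting each search to a small random pool is what keeps the per-step cost independent of the training-set size, at a small loss against the exhaustive search.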
We build the context quantizer in three steps. In the first step, we apply the DP algorithm to design the MCECQ in the probability simplex space. This produces the MCECQ cells that constitute the input classes of the KFD. In the second step, we map the cells back to the context space and use the KFD to find a projection direction in F (corresponding to a curve in the context space) in which the MCECQ cells have maximum separation. In the final step, we compute all projection values of the training contexts and put them into a sorted list. Since each class in the projection direction is, in general, not convex, in order to make the underlying classification problem tractable and, more importantly, to make the quantizer mapping function simple, DP is used again to construct a convex partition of the projection subspace that minimizes the conditional entropy.

Fig. 4. Pseudocode of context quantization by KFD.
Once the KFD context quantizer is designed, the decoder can map a raw context to a coding state in entropy decoding by projecting the context according to (8) and locating the interval that contains the projection value.
V. EXPERIMENTAL RESULTS
We implemented the proposed context quantizers and eval-
uated them in DPCM predictive lossless coding of gray scale
images. The prediction residuals are coded by binary arithmetic
coding that uses context states optimized by the proposed
algorithms. The binary random variables to be coded are
the binary decisions in resolving the value of the prediction
residual. In particular, we are interested in two binary sources:
the signs of DPCM prediction errors on grey scale images,
and the least significant bits of the DPCM prediction errors.
These binary sources are among the most difficult to compress, with their self-entropy being maximum (1 bit per sample), and they thus present great challenges to context-based entropy coding.
Consequently, they serve as good, demanding test cases for the
performance of different context quantizers.
The causal context in which the current pixel is coded consists of three gradients in a local window. The gradients are used for context modeling because they capture the variance and signal the presence of edge structures in the image signal, while keeping the dimensionality of the feature space low. We did not use higher order context models to avoid overfitting in the coding phase. Even this three-dimensional feature space generates a very large number of raw contexts. A scalar prequantizer is therefore used to reduce the number of raw contexts to a manageable level (the number of prequantization levels was chosen to be 6 in our experiments). Since a gradient is the difference of adjacent samples, it obeys a geometric distribution for natural images. The above scalar prequantization merges the raw contexts into equally probable cells.
The training set of raw contexts was generated out of 23 images that were sampled from two archives of benchmark gray scale images on the Internet. The test set consisted of ten images outside of the training set. The kernel and regularization parameters used to construct the kernel discriminants for the two binary sources, respectively, can be estimated by applying cross-validation estimation of the minimized misclassification rate or of the desirable minimum conditional entropy. Either the encoding or the decoding of each binary symbol by a KFD context quantizer needs to project a context onto the discriminant direction in O(l) time according to (8). Thus, the encoding or decoding complexity of a KFD context quantizer is O(lT), where T is the length of the input sequence.
We compare three context quantizers of the Fisher discriminant type reviewed and developed in this paper. Namely, LFD-I: the two-class LFD scheme of [7]; LFD-II: the multiclass LFD scheme discussed in Section III-A; and KFD: the MCECQ design algorithm based on KFD developed in Section III-B and Section IV. All three context quantizer design algorithms
output convex quantizer cells in the context space with simple
quantizer mapping functions. As a performance benchmark,
we also include the ideal results, i.e., the conditional entropy
rates of the MCECQ quantizer in the probability simplex space,
against which the testing results of the three practical methods
are measured. These rates were obtained by MCECQ designed
for the sample statistics of each individual test image. Clearly,
these rates serve as a theoretical lower bound with respect to the
context model in question, since they are the best achievable in
the ideal situation when the training data and input image have identical statistics, and as though the quantizer mapping function, regardless of how complex, could be precisely implemented.

Fig. 5. Average bit rates achieved by the four context quantizers on coding the sign of DPCM error pixel in bits/sample.

Fig. 6. Average bit rates achieved by the four context quantizers on coding the least significant bit of DPCM error pixel in bits/sample.
Figs. 5 and 6 plot the average bit rates achieved by the three
MCECQ design methods in the context space, LFD-I, LFD-II,
and KFD, on coding the sign and the least significant bit of
DPCM errors for the ten test images. The DPCM errors are
generated by the median predictor used by JPEG-LS. The bit
rates are presented as functions of the number of context quan-
tizer cells. As lower bounds for the achievable bit rates by any
TABLE I
BIT RATES OF SIGNS OF DPCM ERRORS FOR DIFFERENT METHODS

TABLE II
BIT RATES OF LEAST SIGNIFICANT BITS OF DPCM ERRORS FOR DIFFERENT METHODS
convex partition of the context space, we also include in the figures the corresponding average conditional entropy rates of optimal MCECQs designed in the probability simplex space, as explained above. It can be observed from our experimental results, as expected, that LFD-II outperforms LFD-I, and KFD outperforms both, since KFD has higher discriminating power than the other two with its capability of forming more complex quantizer cells. In fact, the KFD method achieves bit rates that are less than 0.5% away from the lower bound.
We apply the three context quantizers designed from the
training set to encode the signs and the least significant bits of
DPCM errors from ten test images outside of the training set.
All three context quantizers have 12 cells; in other words, the
conditional entropy coding is carried out in 12 coding states.
Tables I and II show the actual code lengths obtained by the
three context quantizers. Not surprisingly, the KFD, in general,
outperforms the two linear ones.
Table III presents the lossless bit rates of the ten gray-level
test images achieved by adaptive binary arithmetic coding that
uses the modeling contexts designed by the proposed MCECQ
methods for each binary decision. As references in comparison,
the bit rates of the JPEG-LS lossless image-coding standard
are also listed in the table. The comparison is fair and meaningful because JPEG-LS uses the same context template as in our experiments but employs a heuristic context quantization scheme. Since an alternative method for lossless coding of grayscale images is to code each bitplane using a high-order binary context as in JBIG, we also include in Table III the lossless
TABLE III
BIT RATES OF LOSSLESS IMAGE COMPRESSION BY DIFFERENT METHODS
bit rates obtained by the JBIG standard. The proposed KFD-based context quantizer outperforms all other methods consistently on each test image, albeit its improvement over JPEG-LS is quite modest, indicating that the heuristic context quantizer of JPEG-LS is already very good compared with a heavily optimized one. We envision this work to be a useful algorithmic tool to evaluate the quality of more practical context quantizers.
VI. CONCLUSION

We proposed new context quantizer design algorithms in the context space, based on the multiclass Fisher discriminant and the KFD. We succeeded in approaching the lower bound of the achievable bit rates with a practical implementation that employs a simple quantizer mapping function rather than a large lookup table.
 J. Rissanen, “A universal data compression system,” IEEE Trans. Inf.
Theory, vol. 29, no. 5, pp. 656–664, Sep. 1983.
 F. Willems, Y. Shtarkov, and T. Tjalkens, “The context-tree weighting
method: basic properties,” IEEE Trans. Inf. Theory, vol. 41, no. 3, pp.
653–664, May 1995.
 N. Ekstrand, “Lossless compression of grayscale images via context
tree weighting,” in Proc. IEEE Data Compression Conf., Apr. 1996, pp.
 M. Arimura, H. Yamamoto, and S. Arimoto, “A bitplane tree weighting
method for lossless compression of gray scale images,” IEICE Trans.
Fundamentals, vol. E80-A, no. 11, pp. 2268–2271, Nov. 1997.
 M. Mrak, D. Marpe, and T. Wiegand, “A context modeling algorithm
and its application in video compression,” in Proc.Int. Conf. Image Pro-
cessing, Barcelona, Spain, Sep. 2003, pp. 845–848.
 Coded Representation of Picture and Audio Information—Progressive
Bi-Level Image Compression, ISO/IEC Draft International Standard
11544, Apr. 1992.
X. Wu, “Context quantization with Fisher discriminant for adaptive
embedded wavelet image coding,” in Proc. IEEE Data Compression
Conf., Mar. 1999, pp. 102–111.
X. Wu, P. A. Chou, and X. Xue, “Minimum conditional entropy context
quantization,” in Proc. IEEE Int. Symp. Information Theory, Sorrento,
Italy, Jun. 2000, p. 43.
 S. Forchhammer, X. Wu, and J. D. Andersen, “Lossless image data se-
quence compression using optimal context quantization,” IEEE Trans.
Image Process., vol. 13, no. 4, pp. 509–517, Apr. 2004.
X. Wu and N. Memon, “Context-based, adaptive, lossless image codec,”
IEEE Trans. Commun., vol. 45, no. 4, pp. 437–444, Apr. 1997.
D. Taubman, “High performance scalable image compression with
EBCOT,” IEEE Trans. Image Process., vol. 9, no. 7, pp. 1158–1170,
Jul. 2000.
 J. Chen, “Context modeling based on context quantization with applica-
tion in wavelet image coding,” IEEE Trans. Image Process., vol. 13, no.
1, pp. 26–32, Jan. 2004.
D. Greene, F. Yao, and T. Zhang, “A linear algorithm for optimal context
clustering with application to bi-level image coding,” in Proc. Int. Conf.
Image Processing, Oct. 1998, pp. 508–511.
A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
New York: Kluwer, 1992.
IEEE Trans. Computers, vol. C-24, no. 3, pp. 281–289, 1975.
 S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher
discriminant analysis with kernels,” in Proc. IEEE Workshop on Neural
Networks for Signal Processing IX, Aug. 1999, pp. 41–48.
S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. J. Smola, and K.-R.
Müller, “Invariant feature extraction and classification in kernel spaces,”
in Advances in Neural Information Processing Systems 12, S. A. Solla,
T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA: MIT Press, 2000.
S. Mika, G. Rätsch, and K.-R. Müller, “A mathematical programming
approach to the Kernel Fisher algorithm,” in Advances in Neural Infor-
mation Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp,
Eds. Cambridge, MA: MIT Press, 2001, pp. 591–597.
 S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K. R.
Müller, “Constructing descriptive and discriminative nonlinear features:
Rayleigh coefficients in Kernel feature spaces,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 25, no. 5, pp. 623–633, May 2003.
 S. Mika, A. J. Smola, and B. Schölkopf, “An improved training algo-
rithm for kernel Fisher discriminants,” in Proc. 8th Int. Workshop on
Artificial Intelligence and Statistics, San Francisco, CA, 2001, pp. 98–104.
G. Baudat and F. Anouar, “Generalized discriminant analysis using a
kernel approach,” Neur. Comput., vol. 12, no. 10, pp. 2385–2404, Oct.
2000.
in Proc. 1st Int. Workshop on Pattern Recognition With Support Vector
Machine, Aug. 2002, pp. 24–39.
B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G.
Rätsch, and A. J. Smola, “Input space vs. feature space in kernel-based
methods,” IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1000–1017,
Sep. 1999.
 A. J. Smola and B. Schölkopf, “Sparse greedy matrix approximation for
machine learning,” in Proc. 17th Int. Conf. Machine Learning, Stanford,
CA, Jun. 2000, pp. 911–918.
 [Online]. Available: http://links.uwaterloo.ca/bragzone.base.html
 [Online]. Available: http://www.cipr.rpi.edu/resource/stills/index.html
G. C. Cawley and N. L. C. Talbot, “Efficient leave-one-out cross-validation
of kernel Fisher discriminant classifiers,” Pattern Recognit., vol. 36,
no. 11, pp. 2585–2592, Nov. 2003.
 G. Fung, M. Dundar, J. Bi, and B. Rao, “A fast iterative algorithm for
Fisher discriminant using heterogeneous kernels,” presented at the 21st
Int. Conf. Machine Learning, Banff, AB, Canada, Jul. 2004.
 Information Technology—Lossless and Near-Lossless Compression of
Continuous-Tone Still Images, ISO/IEC Final Draft International Stan-
dard FDIS14495-1, 1998.
from Nankai University, Tianjin, China, in 1991, and the
Ph.D. degree from the University of Joensuu, Joensuu, Finland, in 2005.
His research interests include medical pattern recog-
nition and image compression.
Xiaolin Wu (M’88–SM’96) received the B.Sc.
degree in computer science from Wuhan University,
Wuhan, China, and the Ph.D. degree in computer
science from the University of Calgary, Calgary, AB,
Canada, in 1982 and 1988, respectively.
He is currently a Professor in the Department
of Electrical and Computer Engineering, Mc-
Master University, Hamilton, ON, Canada, and
a Research Professor of computer science at the
Polytechnic University, Brooklyn, NY, and holds
the NSERC-DALSA Research Chair in digital
cinema. His research interests include image processing, multimedia coding
and communications, data compression, and signal quantization, and he has
published over 100 research papers in these fields.
Pasi Fränti received the M.Sc. and Ph.D. degrees in
computer science from the University of Turku, Fin-
land, in 1991 and 1994, respectively.
From 1996 to 1999, he was a Postdoctoral
Researcher with the Academy of Finland. Since
2000, he has been a Professor with the University
of Joensuu, Joensuu, Finland. His primary research
interests are in image compression, vector quantiza-
tion, and clustering algorithms.