
Electronic copy available at: http://ssrn.com/abstract=1107815


Measures of fit in multiple correspondence analysis of crisp and

fuzzy coded data

Zerrin Aşan1 and Michael Greenacre2

1Department of Statistics, Anadolu University, Eskişehir, Turkey

Email: zasan@anadolu.edu.tr

2Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain

Email: michael@upf.es

Abstract

When continuous data are coded to categorical variables, two types of coding are possible: crisp

coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where

each observation is transformed to a set of “degrees of membership” between 0 and 1, using so-called

membership functions.

It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence

analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the

solution in a low-dimensional space. Since the crisp data only code the categories to which each

individual case belongs, an alternative measure of fit is simply to count how well these categories are

predicted by the solution. Another approach is to consider multiple correspondence analysis

equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the

categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal

tables of the Burt matrix – the measure of fit is then computed as the quality of explaining these tables

only.

The correspondence analysis of fuzzy coded data, called “fuzzy multiple correspondence analysis”,

suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions

are made of the categories which have highest degree of membership. But here one can also defuzzify

the results of the analysis to obtain estimated values of the original data, and then calculate a measure

of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance.

Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way

associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the

crisp case can be applied to analyse the off-diagonal part of this matrix.

In this paper these alternative measures of fit are defined and applied to a data set of continuous

meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit

is further discussed when the data set consists of a mixture of discrete and continuous variables.

Key words: data coding, defuzzification, fuzzy coding, indicator matrix, joint correspondence

analysis, measure of fit, multiple correspondence analysis, Burt matrix.

Acknowledgments

The first author thanks Sevil Sentürk for assistance and useful comments. The second author thanks

the Fundación BBVA for financial support in this research, as well as the Spanish Ministry of

Education and Science for partial support in the form of grant MEC-SEJ2006-14098.


1. Introduction

Multiple correspondence analysis (MCA) is the correspondence analysis (CA) of a data set of

categorical variables that are coded as zero-one (dummy) variables in an indicator matrix (also called

“logical coding”) or as a matrix composed of all two-way contingency tables called a “Burt matrix”

(see, for example, Benzécri (1973) or, for a recent account, Greenacre and Blasius (2006)). In CA the

eigenvalues, or principal inertias, expressed as percentages relative to their total, are used as a measure

of fit, but in MCA it is well known that these percentages give pessimistic estimates of the quality of

the solution. For example, Lebart (1984, 2006) states that in MCA the “percentages of variance are

misleading measures of information”. The main issue here is to consider the type of data entering the

CA algorithm and whether it makes sense to measure the fit in terms of the usual reconstruction of

these data by the low-dimensional solution. In the case of the input indicator matrix, it seems obvious

that we are not interested in an exact reconstruction of the zeros and ones in the data, so to measure the

fit in this way makes little sense. An alternative that seems more appropriate to the discrete zero-one

data would be to measure how well the MCA solution predicts the categories to which each case

belongs. In the case of the input Burt matrix, the reason for the low percentages is clear when we

consider what is included in “total inertia”, which constitutes the denominator in calculating the

measure of fit. Down the diagonal of the Burt matrix are cross-tabulations of each variable with itself,

inflating the total inertia by amounts that are, firstly, of no interest and, secondly, impossible to

explain in a low-dimensional solution. Greenacre (1988, 1991, 2007: chap. 19) presented an

alternative way of performing CA in this situation, leading to much higher percentages of inertia,

because only the “interesting” inertia in the off-diagonal tables of the Burt matrix is being explained.

This approach, called joint correspondence analysis (JCA), contains simple CA (i.e., with two

variables) as an exact special case, and so provides a more natural generalization of the bivariate

problem to the multivariate one.

The present work extends these ideas to the CA of a fuzzy-coded matrix, which is a generalization of

the “crisp” zero-one indicator matrix that allows the coded values to be “fuzzy”, i.e., to be real

numbers between 0 and 1 (inclusive) for each category, while maintaining the property that the coded

values for each variable sum to 1. To our knowledge, fuzzy coding was introduced into the French

literature on CA in 1977 by Guitonneau and Roux (1977), while van Rijckevorsel (1988) attributes the

idea originally to the doctoral thesis of Bordet (1973). The idea has since entered different fields

of application of multivariate methods, for example see Chevenet (1994) for an application of fuzzy

coding in an ecological context.

We use the term “fuzzy MCA” for the CA of the fuzzy coded matrix, and we will show that the

problem of pessimistic percentages of explained inertia also occurs in fuzzy MCA, albeit attenuated.


Various alternative measures of fit will be discussed – these all measure how the CA solution recovers

the data in different ways. The standard way is to measure fit to the input data, whatever these data

may be. The alternatives look at the nature of the input data and define the measure of fit in terms of

recovering what is considered to be the “interesting” part of the data. We will be concerned mainly

with the following three alternatives:

1. Suppose we are interested in simply predicting the categories to which each case belongs or for

which it has highest degree of membership, rather than the exact values 1 and 0 (crisp coding) or

the degrees of membership themselves (fuzzy coding). Then the measure of fit can be calculated

as the percentage of categories correctly predicted (Sections 4.1 and 4.2)

2. Suppose we are interested in predicting the original continuous variables. We can estimate the

continuous variables from the MCA solution, which is inherently fuzzy in both crisp and fuzzy

analyses. Then a measure of fit is calculated between these estimates and the original data

(Sections 4.3 and 4.4)

3. Suppose we are interested in the two-way relationships summarized in the Burt matrix, based on

crisp or fuzzy coded data. Then we can use JCA, already existent for crisp MCA, to explain the

off-diagonal tables which cross-tabulate distinct pairs of variables, ignoring the cross-tabulations

of each variable with itself (Sections 4.5 and 4.6).

The proposed approaches will be illustrated using meteorological data from 40 cities of Turkey. We

first describe the coding of these data in Section 2 and various “standard” results of analysing these

data in Section 3, using principal component analysis (PCA), MCA and fuzzy MCA. Then we will

describe the properties of each of the above measures of fit in Section 4, with a discussion in Section 5

where we also give some guidelines as to the circumstances in which one will be preferable to

the others, and some proposals for the analysis of mixed continuous–categorical data.


2. Continuous data and their recoding for CA

Table 1 contains average values of five meteorological variables for 40 cities of Turkey, based on

measurements taken in 2004 (Turkey Statistical Yearbook, 2004) – note that we use this example as a

convenient illustration of our approach rather than a substantive meteorological application.

There are several standard ways in the CA literature to recode continuous data in a form suitable for

CA, for example re-expressing each variable relative to its range, or replacing data values by their

ranks (see Greenacre, 2007: chapter 23); here we consider the discretization of the continuous scales

into a set of categorical ones. In other words, suppose we define three categories for each variable,

which we could call “low” (1), “middle” (2) and “high” (3), then we can replace each value by its

category number and use MCA to analyse the data. For each variable two boundary points need to be

chosen in order to divide the scale into three intervals. These boundary points can be chosen in

different ways, and thanks to the principle of distributional equivalence in CA (see, for example,

Greenacre, 2007: pages 37–38), the particular choice is not so critical. Clearly this recoding of the

data loses a lot of information, so an alternative fuzzy coding can be performed which conserves more

of the information in the data while still maintaining the low-middle-high idea. Again, there are

several ways of coding the data into fuzzy categories, called fuzzification (as opposed to

discretization), described in Loslever and Bouilland (1999) and Murtagh (2005), for example. We

have chosen the system of so-called “three-point triangular membership functions” shown in Figure 1

(also called piecewise linear functions, or second order B-splines – see van Rijckevorsel (1988) for a

theoretical account of this topic). The three values of the minimum, median and maximum of the

variable are used as so-called “hinge” points to define triangles which convert the continuous scale

into values on a 0 to 1 scale such that the membership values sum to 1. The first and last triangles are

“shouldered”, with maxima of 1 at the continuous variable’s minimum and maximum values

respectively, while the middle triangle has a maximum of 1 at the median. The definition of the set of

triangular membership functions is thus as follows, for the input continuous variable x which has

minimum value m1, median m2 and maximum value m3:

z1(x) = (m2 – x)/(m2 – m1) for x ≤ m2, 0 otherwise

z2(x) = (x – m1)/(m2 – m1) for x ≤ m2, (m3 – x)/(m3 – m2) for x > m2, 0 otherwise     (1)

z3(x) = (x – m2)/(m3 – m2) for x > m2, 0 otherwise

These functions give the three degrees of membership that code the variable x into the three fuzzy categories. Notice that a particular input value will belong to no more than two fuzzy categories, and that the membership degrees sum to 1. Alternatives to triangular membership functions can be
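As an illustrative sketch (in Python, not part of the original analysis), the membership functions (1) can be coded directly. The hinge values 4.14, 7.10 and 8.33 are the sunshine minimum, median and maximum quoted in Section 4.3, and 7.55 is Adana’s sunshine value; the code reproduces the fuzzy values [ 0 0.634 0.366 ] used later in the text.

```python
import numpy as np

def fuzzy_code(x, m1, m2, m3):
    """Triangular membership coding (1), with hinges at the
    minimum m1, median m2 and maximum m3 of the variable."""
    if x <= m2:
        return np.array([(m2 - x) / (m2 - m1), (x - m1) / (m2 - m1), 0.0])
    return np.array([0.0, (m3 - x) / (m3 - m2), (x - m2) / (m3 - m2)])

# Sunshine in Adana: 7.55 hours; hinges 4.14 (min), 7.10 (median), 8.33 (max)
z = fuzzy_code(7.55, 4.14, 7.10, 8.33)       # approx [0, 0.634, 0.366]

# Memberships sum to 1, and defuzzifying (the weighted average of the
# hinges, with the memberships as weights) recovers the value exactly
x_back = z @ np.array([4.14, 7.10, 8.33])    # 7.55
```

At most two of the three memberships are non-zero, as noted above.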


trapezoidal, Gaussian and generalized bell (or Cauchy) membership functions, which have various

theoretical advantages (see, for example, Jang, Sun and Mizutani, 1997; Senturk, 2006), and we could

also consider more than three fuzzy categories. These specific aspects about fuzzy coding have been

dealt with extensively in the literature – see, for example, Zhou, Purvis and Kazabov (1997) and

Verkuilen (2005) for a discussion of choice of membership functions (the latter reference gives a more

statistical treatment of the subject and shows connections to psychometric scaling). Here we have

chosen one of the simplest forms of coding for illustrative purposes, since our purpose is to deal with

properties of the CA of such data.

The fuzzy coded data are given in Table 2. To compare this with crisp coded indicator data, we

constructed a table consisting of zero-one dummies where two out of the three categories are zeros and

the value of one is assigned to the category with highest membership value according to Table 2 – that

is, the boldface values in Table 2 are replaced by 1s and the rest 0s.
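This crisp recoding is just an argmax over each fuzzy triple; a minimal sketch (the helper name is ours, not the paper’s):

```python
import numpy as np

def crisp_from_fuzzy(z_fuzzy):
    """Replace a fuzzy triple by a 0/1 indicator for the category
    with the highest degree of membership."""
    z = np.zeros_like(z_fuzzy)
    z[np.argmax(z_fuzzy)] = 1.0
    return z

# Sunshine in Adana, fuzzy coded [0, 0.634, 0.366] -> indicator [0, 1, 0]
ind = crisp_from_fuzzy(np.array([0.0, 0.634, 0.366]))
```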


3. PCA, MCA and fuzzy MCA

We analyse the three forms of the data (the original values in Table 1, the fuzzy coded values in Table 2, and the crisp indicator coding) to make an initial comparison

of their results, focusing on the value and interpretation of the standard measure of fit in each case.

3.1 Principal component analysis (PCA)

Figure 2 shows the principal component analysis (PCA) of Table 1, where the data have been

standardized, and where the map has been scaled to be a biplot (see, for example, Gower and Hand,

1996). The map explains 75.6% of the variance, but what exactly does this measure of fit mean?

Taking the view of PCA as a method of matrix approximation, we can think of Figure 2 as a way of

approximating the standardized data. Suppose that the original data are denoted by the 40×5 matrix X

= [xij], and the standardized data by Y = [yij], with elements yij = (xij – x̄j)/sj, where x̄j and sj are the mean and standard deviation respectively of the j-th column variable. Then the PCA of Y is based on the singular-value decomposition (SVD):

(1/√n) Y = U Dα Vᵀ,   where UᵀU = VᵀV = I,     (2)

where n = 40. The biplot of Figure 2 is constructed using the first two columns of U Dα and √n V, denoted by F (40×2) and Γ (5×2) respectively, so that approximated standardized values ŷij can be computed by calculating all 200 scalar products between the two sets of points:

ŷij = fi1 γj1 + fi2 γj2,   i = 1,…,40; j = 1,…,5,   i.e. Ŷ = FΓᵀ.

This is usually thought of geometrically as a projection of the row

(city) points onto the vectors defined by the variables, with appropriate calibration of the axes which

depends on the length of these vectors (Gabriel and Odoroff, 1990). The total variance of the matrix Y,

with value 5 in this case because there are five variables each with variance 1, can be decomposed into

two parts, thanks to the orthogonality between the solution estimates ŷij and the errors yij – ŷij:

(1/n) Σi Σj yij² = (1/n) Σi Σj ŷij² + (1/n) Σi Σj (yij – ŷij)²     (3)

i.e., in our example 5 = 3.782 + 1.218; hence the figure of 75.6% is the explained part (3.782) of the

decomposition relative to the total, whereas the unexplained, or error, part (1.218) is 24.4% of the

total. Literally, 75.6% tells us how close the map is to the standardized data in the least-squares sense.

Putting this another way by using an analogy with regression analysis, the two principal axes of the

map are orthogonal latent variables predicting the data in Y, the first axis explaining 43.3% of the

variance, and the second an additional 32.3%, that is a coefficient of determination R2 = 0.756. The

explained variance of 3.782 can be further decomposed into parts for the five variables in order to see


which variables are explained better than others: 3.782 = 0.831 + 0.793 + 0.620 + 0.801 + 0.737 (for

perfect explanation of a variable, its explained part would be 1).
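The standardization, the SVD (2) and the decomposition (3) can be sketched as follows; the data here are simulated (Table 1 is not reproduced), so the numerical percentages will differ from the 75.6% above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))          # stand-in for the 40x5 data of Table 1
n = X.shape[0]

# Standardize: y_ij = (x_ij - mean_j) / s_j
Y = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of equation (2): (1/sqrt(n)) Y = U D_alpha V^T
U, d, Vt = np.linalg.svd(Y / np.sqrt(n), full_matrices=False)

# Two-dimensional biplot coordinates and reconstruction Y_hat = F Gamma^T
F = U[:, :2] * d[:2]                  # rows: first two columns of U D_alpha
G = np.sqrt(n) * Vt[:2].T             # variables: first two columns of sqrt(n) V
Y_hat = F @ G.T

# Decomposition (3): total variance (= 5) = explained + error
total = (Y ** 2).sum() / n
explained = (Y_hat ** 2).sum() / n
error = ((Y - Y_hat) ** 2).sum() / n
pct_fit = 100 * explained / total     # the analogue of the 75.6% figure

# Explained variance split into per-variable parts (each at most 1)
parts = (Y_hat ** 2).sum(axis=0) / n
```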

3.2 Multiple correspondence analysis (MCA)

Figure 3 shows the CA of the crisp data, in other words the MCA of the indicator matrix Z (n×J), where Q = 5 is the number of variables and J = 15 the total number of categories. This analysis is based on the SVD of the centred and scaled

matrix (see, for example, Greenacre, 2006, p. 52):

(1/(Q√n)) (Z – (1/n) 1 1ᵀ Z) D^(–1/2) = U Dα Vᵀ,   where UᵀU = VᵀV = I,     (4)

where D is the diagonal matrix of column masses cj, i.e., the marginal frequencies of each category

relative to their grand total of nQ = 200. The asymmetric map in Figure 3 is constructed from

coordinates in the first two columns of √n U Dα and D^(–1/2) V, again denoted by F (40×2) and Γ (15×2).

The two principal axes explain 26.2% and 19.7% of the inertia, totalling 45.9% for the two-

dimensional map. This is much less than the PCA, and once more we have to ask what this measure

of quality really means. Again, this is a measure of how well the data are reconstructed from the map,

but the data themselves are just zeros and ones whereas the reconstructed data from the MCA map are

real numbers. Data are again reconstructed using scalar products between these two sets of points as

follows (see, for example, Greenacre, 2007, page 101):

ẑij = Q cj (1 + fi1 γj1 + fi2 γj2),   or in matrix terms:   Ẑ = Q (11ᵀ + FΓᵀ) D.     (5)

For example, the reconstructed data for the variable “Sun” in the city of Adana can be calculated from

the two-dimensional coordinates of Sun1, Sun2 and Sun3 and Adana to be 0.0622, 0.5630 and 0.3748

(notice that these approximated values sum to 1 – they are, in fact, fuzzy). The explained variance of

45.9% again measures how close the approximations [ 0.0622 0.5630 0.3748 ] are to the “observed”

indicator values [ 0 1 0 ], evaluated globally over all entries in Table 3. It is now understandable why

the measure of fit is doomed to be low, because the reconstructed data will always be fuzzy while the

original data are discrete.
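A sketch of this computation on simulated indicator data (the real Table 1 is not reproduced): build Z, apply the SVD (4), and reconstruct with (5). The reconstructed triples for each variable sum to 1, and the total inertia equals (J – Q)/Q = 2, as noted in the footnote of Section 3.3.

```python
import numpy as np

rng = np.random.default_rng(1)
n, Q, k = 40, 5, 3                      # cases, variables, categories each
Z = np.zeros((n, Q * k))
for q in range(Q):
    # one category per variable per case; each category occurs at least once
    cats = np.concatenate([np.arange(k), rng.integers(0, k, n - k)])
    Z[np.arange(n), q * k + cats] = 1.0

c = Z.sum(axis=0) / (n * Q)             # column masses c_j

# Centred and scaled matrix of equation (4), then its SVD
S = (Z - Z.mean(axis=0)) / (Q * np.sqrt(n)) / np.sqrt(c)
U, d, Vt = np.linalg.svd(S, full_matrices=False)

F = np.sqrt(n) * U[:, :2] * d[:2]       # row principal coordinates
G = Vt[:2].T / np.sqrt(c)[:, None]      # column standard coordinates

# Reconstruction (5): Z_hat = Q (1 1^T + F Gamma^T) D
Z_hat = Q * (1 + F @ G.T) * c

# Each reconstructed block of k values per variable sums to 1 (it is fuzzy),
# even though the entries need not all lie between 0 and 1
block_sums = Z_hat.reshape(n, Q, k).sum(axis=2)
```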

MCA is often defined alternatively as the CA of the Burt matrix, the matrix composed of all two-way

cross-tabulations of the variables: B = ZTZ. In our example this will be a 15×15 matrix formed of 25

3×3 tables cross-tabulating all pairs of the five categorical variables. The standard coordinates of the

variables, i.e. the coordinates of the categories in Figure 3, are identical in this analysis, but the

eigenvalues are the squares of those of the indicator matrix, leading to an apparent improvement in the

percentages of inertia explained. In our example the percentages of inertia in two dimensions are now


43.7% and 24.9%, giving a total of 68.6%. This measure of fit can still be criticized as being low, however, due to an artefact of the Burt matrix B (Greenacre, 1991). The tables down the diagonal of B, which cross-tabulate each variable with itself, have high inertias and these

artificially inflate the total inertia of B – in fact, in this example these diagonal tables account for

63.8% of the inertia of B. Percentages of explained inertia are thus low because the denominator in

the calculation is artificially high, and a way to resolve this is to avoid fitting these diagonal blocks of

B (see Sections 4.5 and 4.6 later).

3.3 Fuzzy MCA

Figure 4 shows the CA of the fuzzy coded data of Table 2, or fuzzy MCA. The theory here is identical

to that of MCA given in Section 3.2, simply replacing Z in (4) by the fuzzy coded matrix Z*, and

recomputing the column margins whose relative values define the diagonal matrix D. Several

properties of crisp MCA carry over to fuzzy MCA:

(a) Each set of categories has weighted average at the origin of the map, because the sum of each

set of fuzzy coded values is also 1.

(b) As in any CA, the row and column sums of the approximated data, reconstructed from the

low-dimensional solution (using (5) in the present context), are the same as the original data.

In addition, in the reconstructed fuzzy data, each set of membership levels sums to 1; as in the

case of crisp MCA, these sets of values are not necessarily all positive.

(c) A fuzzy Burt matrix can also be computed as B* = Z*TZ*: the CA of B* also has identical

standard (row or column) category coordinates to those (columns) of Z*, and the principal

inertias of B* are equal to the squares of those of Z*.

Since the fuzzy coded matrix Z* has fewer zeros than the indicator matrix, its total inertia will be lower

– in the example it is 1.025, compared to the value of exactly 2 for Z‡. Since both the input data and

the approximated values from the fuzzy MCA map are fuzzy, it is to be expected that fuzzy MCA will

do better in the reconstruction of data, in terms of the usual measure of fit. In the present example the

percentage of inertia explained in the two-dimensional map is 55.0%, about 9 percentage points higher

than the MCA of the crisp indicator matrix. Notice how much better separated the cities are from one

another in Figure 4 compared to Figure 3 – the cities that coincided in Figure 3 are now separated.

But our thesis is that the value of 55.0% variance explained is still artificially low, and can be

improved by reconsidering the way this quantity is measured in fuzzy MCA.

‡ The inertia of an indicator matrix constructed from Q variables that give rise to J categories is equal to

(J – Q)/Q (see Benzécri (1973) or Greenacre (1984) for a proof).


4. Measures of fit in crisp and fuzzy MCA

In this section we consider three alternative measures of fit introduced in Section 1, and apply each of

them to crisp and fuzzy MCA respectively.

4.1 Predicting categories: crisp MCA

In regular (crisp) MCA, the coding indicates the categories of each variable to which each individual

(a city in our application) belongs. An obvious measure of the quality of the solution is to count how

many of these categories are correctly predicted. For example, from the approximated values

[ 0.0622 0.5630 0.3748 ] given in Section 3.2 for the city of Adana, we would predict that Adana is

in category Sun2, which is a correct prediction and in this sense without error. Performing this

calculation for the whole table we obtain 171 correct predictions out of the total of 200, giving a prediction accuracy of 85.5%, much higher than the usual measure of fit of 45.9% from the MCA of

the indicator matrix. The process of converting the fuzzy values in the approximated vector to discrete

ones, for example converting [ 0.0622 0.5630 0.3748 ] to [ 0 1 0 ], is called “defuzzification”, in this

case defuzzifying to discrete indicator data. Hence, once the approximated values from the MCA map

are defuzzified, the quality of the map is 85.5% as far as predicting the categories of the continuous

variables is concerned. The quality is much lower if we defuzzify the results of the MCA back to

estimations of the original data, as we shall describe in Section 4.3.
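Counting correct predictions amounts to comparing argmaxes of observed and reconstructed blocks; a minimal sketch (the helper is ours), applied to the Adana sunshine block quoted above:

```python
import numpy as np

def prediction_accuracy(Z, Z_hat, Q, k):
    """Fraction of (case, variable) pairs whose reconstructed block
    puts its highest value on the observed category."""
    obs = Z.reshape(-1, Q, k).argmax(axis=2)
    pred = Z_hat.reshape(-1, Q, k).argmax(axis=2)
    return float((obs == pred).mean())

# Observed indicator [0, 1, 0] versus reconstruction [0.0622, 0.5630, 0.3748]:
# the highest reconstructed value falls on Sun2, a correct prediction
Z = np.array([[0.0, 1.0, 0.0]])
Z_hat = np.array([[0.0622, 0.5630, 0.3748]])
acc = prediction_accuracy(Z, Z_hat, Q=1, k=3)
```

Applied to all 200 (case, variable) pairs, this count gives the 85.5% reported above.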

4.2 Predicting categories: fuzzy MCA

Taking the example of sunshine in Adana again, the fuzzy coded data are (from Table 2)

[ 0 0.634 0.366 ], while the approximated data are computed from the coordinates in Figure 4, using

the same reconstruction formula (5) as in crisp MCA, to be [ 0.112 0.393 0.496 ]. The percentage

55.0% computed in Section 3.3 is an evaluation of how close the approximations are to their original

fuzzy values, computed over the whole data set. If we now had to predict the category with highest

membership value (sunshine category Sun2), then we would be wrong: the highest reconstructed value

is 0.496 for category Sun3. We counted the correct category predictions using the fuzzy MCA map in

this way, and obtained 168 correct predictions out of 200, that is an accuracy of 84.0%, slightly less

than in the crisp case. In other words, defuzzifying (to discrete indicator values) both the fuzzy data

and their approximations, we obtain a prediction accuracy of 84.0%. Although the fuzzy MCA is

approximating the fuzzy data better (55.0% inertia explained, compared to 45.9% for crisp MCA), it is

giving slightly less accuracy in predicting the categories with highest degree of membership – this is

intuitively understandable because the indicator matrix codes these categories crisply, so should do

better in predicting them.


4.3 Estimating original data: crisp MCA

Since the approximations we obtain from crisp and fuzzy MCA solutions are fuzzy, adding up to 1 in

each case, we can defuzzify them to estimate the original variables. For example, consider again the

approximation of sunshine in Adana from the crisp MCA solution: [ 0.062 0.563 0.375 ]. Using

these values to weight the minimum (4.14), median (7.10) and maximum (8.33) hours of sunshine, we

obtain the estimate: 0.062×4.14 + 0.563×7.10 + 0.375×8.33 = 7.38, compared to the original value of

7.55. The rationale for computing the estimate in this way is that this weighted average calculation is

exactly the way we invert the fuzzification of the data according to the triangular membership

functions (see the next section).

We need to standardize the data and their estimations to be able to measure the fit, as we did in PCA.

The problem with this defuzzification is that the estimated values do not have the same variable means

as the original data, neither are the errors orthogonal to the estimated values, so we do not obtain a

decomposition of the form (3). The best we can do in this situation is to calculate a stress measure in

the form of the sum-of-squared errors, which is equal to 2.25 in this case, and then express it relative

to the total variance of 5, i.e., 44.5% stress. This error percentage is still better than the unexplained

inertia of 54.1% in the case of crisp MCA (its explained inertia was 45.9%), even though one would

not expect the crisp MCA to recover the original data well when only coarse zero-one indicator data

have been fitted. This is another demonstration that the explained variance in MCA is pessimistic,

even by this very conservative measure of error percentage. We would, however, expect the fuzzy

MCA to perform better on this criterion, as explained next, because the input fuzzy data code the

continuous data precisely.

4.4 Estimating original data: fuzzy MCA

In our application the fuzzy coding of the data is a 1-to-1 mapping, thanks to the use of triangular

membership functions, and the process can be reversed, that is, defuzzified back to the original data.

Using the same example as in the previous paragraphs, Adana’s original sunshine value of 7.55 can be

recovered back from the fuzzy values as the centroid of the minimum, median and maximum values,

using its fuzzy values as weights: 7.55 = 0×4.14 + 0.634×7.10 + 0.366×8.33 (this is a direct result of

(1)). In the same way, we can obtain a corresponding estimate from the fuzzy approximation

[ 0.112 0.393 0.496 ]: 0.112×4.14 + 0.393×7.10 + 0.496×8.33, which is equal to 7.39. After

performing this defuzzification for all 200 data values, several interesting results emerge:

1. The defuzzified estimates do have the same (column) means as the original data, and these

means are equal to weighted averages, like those above, using the means of the columns of Z*

as weights (the column means of Z* are the same as those of Z). For example, the means of

the first three columns of Z* corresponding to temperature are 0.191, 0.613 and 0.196:


defuzzified this gives a value of 0.191×4.14 + 0.613×7.10 + 0.196×8.33 = 6.78, which is the

average temperature in the original data.

2. As in PCA, the estimated data, obtained by defuzzification, are orthogonal to the “error”

differences between the original and estimated data.

3. As a consequence of 2., if the estimated data are standardized in the same way as the original

data, i.e., using the original standard deviations, then the total variance of the five variables,

equal to 5, does decompose exactly into two components, as in the variance decomposition (3)

of PCA.

(the proofs of these three results are given in the Appendix). The decomposition of variance in our

example, in the format of (3), is as follows: 5 = 3.471 + 1.529, in other words, 69.4% of the variance

in the original standardized data is being explained. Again, we can decompose the explained variance

into parts for each variable, as in PCA: 3.471 = 0.758 + 0.749 + 0.502 + 0.815 + 0.646, showing that

variable 3 (precipitation) is the least well-represented variable.
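The defuzzification by weighted averaging can be sketched as follows, using the sunshine hinges and the Adana values quoted in the text:

```python
import numpy as np

def defuzzify(memberships, hinges):
    """Invert the triangular coding (1): the weighted average of the
    hinge points (minimum, median, maximum), with memberships as weights."""
    return float(np.asarray(memberships) @ np.asarray(hinges))

hinges = [4.14, 7.10, 8.33]     # sunshine minimum, median and maximum

# Defuzzifying the exact fuzzy codes recovers the original value
x = defuzzify([0.0, 0.634, 0.366], hinges)        # approx 7.55 (Adana)

# Defuzzifying the fuzzy MCA approximation gives the estimate
x_hat = defuzzify([0.112, 0.393, 0.496], hinges)  # approx 7.39
```

With all 200 cells defuzzified in this way, and the estimates standardized with the original standard deviations, the orthogonal decomposition 5 = 3.471 + 1.529 above follows.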

4.5 Approximating two-way interactions: crisp JCA

In Section 3.2 we found that crisp MCA of the indicator matrix explained 45.9% while the analysis of

the Burt matrix B explained 68.6%. We also noted that 63.8% of the inertia of the B was due to the

tables down the diagonal (called the “diagonal tables”, i.e., the ones that cross-tabulate each variable

with itself). Since we are only interested in the two-way tables that cross-tabulate pairs of distinct

variables (called the “off-diagonal tables”), Greenacre (1988) proposed an alternative algorithm for

fitting only these off-diagonal cross-tables, excluding the diagonal tables. This method, called joint

correspondence analysis (JCA), iteratively replaces the diagonal tables until they are perfectly fitted by

the solution, and hence do not interfere with the fitting of the off-diagonal ones. Greenacre (1998,

p.67) pointed out that the inertia in the diagonal tables of the final JCA solution needs to be discounted

both from the eigenvalues and the total inertia, in order to arrive at the correct measure of fit. This

leads to an explained inertia of 87.5% (see also Nenadić and Greenacre, 1998, p. 532, for a worked

example; Greenacre, 2007, p.207, for the theory; and Nenadić and Greenacre, 2006, for the software).
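As a sketch of why JCA is needed, the following builds a Burt matrix from simulated indicator data and computes the share of its total inertia due to the diagonal tables. It uses two standard facts: the total inertia of B is the average of the inertias of its Q×Q sub-tables, and each diagonal table has inertia equal to its number of categories minus 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, Q, k = 40, 5, 3
Z = np.zeros((n, Q * k))
for q in range(Q):
    # one category per variable per case; each category occurs at least once
    cats = np.concatenate([np.arange(k), rng.integers(0, k, n - k)])
    Z[np.arange(n), q * k + cats] = 1.0

B = Z.T @ Z                             # Burt matrix: all two-way cross-tables

def table_inertia(N):
    """Inertia of a single contingency table: chi-square / grand total."""
    N = np.asarray(N, dtype=float)
    E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return ((N - E) ** 2 / E).sum() / N.sum()

# Inertia of every k x k sub-table of B; the average is the inertia of B
sub = [[table_inertia(B[q*k:(q+1)*k, s*k:(s+1)*k]) for s in range(Q)]
       for q in range(Q)]
total_inertia = float(np.mean(sub))

# Share of inertia due to the diagonal tables (each has inertia k - 1)
diag_share = sum(sub[q][q] for q in range(Q)) / (Q * Q * total_inertia)
```

For weakly associated variables the diagonal share dominates, which is the inflation JCA removes from the fit calculation.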

4.6 Approximating two-way interactions: fuzzy MCA

By analogy with the crisp case, we can also define a fuzzy Burt matrix, to be B* = Z*TZ*, and in an

identical way the CA of B* has principal inertias equal to the squares of those of Z*. This leads to an

improvement in the measure of fit from 55.0% to 80.7%. The diagonal tables of B*, which are now

tridiagonal as a result of the fuzzy coding, still contribute heavily to the total inertia: 57.6% of the

inertia of B* is due to these tables. A “fuzzy JCA” can thus be performed to fit just the off-diagonal

tables, as in the previous section. The inertia of the diagonal tables at the final solution again has to be


discounted from the eigenvalues and total inertia, and this leads to an explained inertia of 88.9%. The

two-dimensional JCA solution is given in Figure 5 – the category positions resemble strongly those of

the fuzzy MCA in Figure 4, apart from the usual reduction in scale.

4.7 Summary of results

In Table 3 we summarize the numerical results of the measure of fit for each of the two-dimensional

solutions described above. Recall that the PCA of the original data, standardized, explains 75.6% of

the variance (Section 3.1). In the case of the fuzzy coded data, notice that the “classic” measure of fit

of 55.0% is the lowest for all the variants we have discussed, supporting our thesis that this is a

pessimistic and artificially low estimate of fit. Even the defuzzified solution fits the original data by

69.4%, using the same fit criterion as in PCA – this is almost 15 percentage points higher than the

classic value, and we would say that the fit is at least as good as 69.4%, not 55.0%. The fit can be

judged even higher depending on which alternative view one has of what is being approximated. The

most optimistic value is the fuzzy JCA value of 88.9%, but this measures fit to the two-way

associations as summarized in the fuzzy Burt matrix, not to the raw or recoded data.

5. Discussion and conclusions

It is instructive to compare the fit of the two-dimensional PCA, 75.6%, with that of the defuzzified

solution from the fuzzy MCA, 69.4%. Both of these are measured in the same way, by an orthogonal

decomposition of variance of the original (standardized) data into explained and unexplained parts.

Both have solutions that are two-dimensional and the defuzzified matrix is also mathematically of rank

2. Why then does the fuzzy MCA fare worse even though it uses more parameters? Of course, the

comparison is unfair because PCA will, by definition, optimally recover the original 40×5 data matrix.

But PCA is only capable of explaining linear relationships, and operates in the "small" 5-dimensional space of the original variables, whereas the fuzzy MCA operates in the 10-dimensional space of the fuzzy variables§. Putting this another way, fuzzy MCA can reveal both linear and quadratic relationships – it

has potentially more to explain. To make a fairer comparison, we performed a PCA on the 10-column

matrix which included the original data and the quadratic transformations of each variable in five

additional columns – the two-dimensional solution now explained only 53.7% of the total variance, while, restricted to just the first five columns of original data, it explained 63.5% of their variance.
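The comparison just described can be sketched as follows; the data here are random stand-ins for the 40×5 matrix, so the percentages will not reproduce the 53.7% and 63.5% reported above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                 # random stand-in for the 40x5 data

# 10-column matrix: the original variables plus their squares
XQ = np.column_stack([X, X**2])
S = (XQ - XQ.mean(axis=0)) / XQ.std(axis=0)  # standardize all 10 columns

# two-dimensional PCA via the SVD
U, d, Vt = np.linalg.svd(S, full_matrices=False)
S_hat = U[:, :2] @ np.diag(d[:2]) @ Vt[:2, :]

total = (S**2).sum()
fit_all = (S_hat**2).sum() / total           # fit to all 10 columns
# fit restricted to the first five (original) columns
fit_first5 = 1 - ((S[:, :5] - S_hat[:, :5])**2).sum() / (S[:, :5]**2).sum()
```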

§ As in crisp MCA, the dimensionality of a matrix with Q variables and a total of J categories is J–Q. The fuzzy matrix has J columns but there are Q linear relationships among the columns.

Another way to compare the PCA and fuzzy MCA on a more equal footing is to use only two membership functions that code just the linear information, also known as doubling in CA (see Murtagh, 2005). The two endpoints m1 (minimum) and m3 (maximum) are used as hinges, and the

membership functions for the “positive” and “negative” doubled variables are simply:

z+(x) = (x – m1)/(m3 – m1),    z–(x) = (m3 – x)/(m3 – m1) = 1 – z+(x)    (6)

These two values sum to one and code how close the data are to the respective endpoints;

defuzzification is achieved as before by weighted averaging: x = z+(x) m3 + z–(x) m1. The fuzzy coded

data are now 5-dimensional and the fuzzy MCA in two dimensions explains 77.2%, actually higher

than the PCA. This is due to the fact that the fuzzy MCA standardizes the data differently from PCA

as a result of the chi-square metric – Greenacre (1984, pp.175–179) calls this standardizing by

“polarization” rather than variance – so the measures of fit are not strictly comparable. If one

defuzzifies the approximations from the fuzzy MCA solution to get standardized estimates of the

original data, as we did in Section 4.4, then the measure of fit is 75.0%, just fractionally lower than the

optimal PCA fit. On the other hand, if we were to standardize the PCA results in the same way as the

fuzzy MCA, then the PCA would also give a fit close to, but lower than, 77.2% – each solution is optimal within its own context.
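The doubled coding of (6) and its defuzzification can be sketched directly; `double` is an illustrative helper, and the few values are taken from the SUN column of Table 1:

```python
import numpy as np

def double(x):
    """Doubled coding of (6): hinges at the minimum m1 and the maximum m3."""
    m1, m3 = x.min(), x.max()
    z_pos = (x - m1) / (m3 - m1)
    z_neg = (m3 - x) / (m3 - m1)     # equals 1 - z_pos
    return z_pos, z_neg, m1, m3

x = np.array([7.55, 7.09, 8.33, 4.14, 6.35])   # a few SUN values from Table 1
z_pos, z_neg, m1, m3 = double(x)

# memberships sum to one, and weighted averaging recovers the data exactly
x_defuzz = z_pos * m3 + z_neg * m1
```

Because the coding is linear in x, the defuzzification x = z+(x)m3 + z–(x)m1 is exact, which is why this variant loses no information.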

Our application consists of continuous variables only, but in the French literature the justification for

fuzzy coding has mostly been to permit continuous variables to be analysed jointly with categorical

ones – see, for example, Ghermani, Roux and Roux (1977) and Guitonneau and Roux (1977). The

situation of mixed discrete-continuous data presents a particular problem for defining measures of fit that take the different characteristics of logical and fuzzy coding into account in an equitable way.

Various approaches are possible. For example, Gower (1971) defines a distance function which

attempts to equalize the contributions of the different variables to the total variance. Escofier (1979)

defines a doubling transformation of continuous data, different from (6) above, which is more suitable

for analysing continuous data jointly with dichotomous categorical data. Escofier and Pagès (1986)

consider groups of homogeneous variables, for example the group of continuous variables (in original

form or fuzzified) and the group of categorical variables, standardize them internally using the first

eigenvalue as a surrogate for the table variance, and then proceed to joint analysis. Most of these

approaches can be reduced to a type of reweighting of the variables to equalize in some sense their

contributions to the joint analysis.
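As an illustration of the reweighting idea, Gower's coefficient rescales each continuous contribution by the variable's range and scores categorical variables by simple matching; `gower_distance` below is an illustrative sketch of this idea, not Gower's exact similarity formula with its missing-value weights:

```python
def gower_distance(x, y, ranges, is_cat):
    """Gower-type dissimilarity for mixed data: range-scaled absolute
    differences for continuous variables, simple mismatch (0/1) for
    categorical ones, averaged over the variables."""
    total = 0.0
    for xi, yi, r, cat in zip(x, y, ranges, is_cat):
        total += (0.0 if xi == yi else 1.0) if cat else abs(xi - yi) / r
    return total / len(x)

# one continuous variable (range 10) and one categorical variable
d = gower_distance([2.0, "a"], [7.0, "b"], [10.0, None], [False, True])
# continuous part 0.5, categorical mismatch 1.0, average 0.75
```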

We have demonstrated a broad spectrum of measures of fit, and the classic measure based on percentages of inertia is shown to be the lowest (55.0%). For small data matrices such as the one considered in our application, where each individual case (here, a city) is of interest, we propose the measure that compares the defuzzified approximations to the original data in a PCA-type measure of fit. We have proved that this gives an orthogonal decomposition of variance, just as in PCA. For large data matrices where individual cases are not of interest, and interest centres rather on the interrelationships between variables, the MCA or JCA of the fuzzy Burt matrix gives more appropriate measures of fit.

References

Benzécri, J.-P. (1979). Sur le calcul des taux d’inertie dans l’analyse d’un questionnaire: Addendum et

Erratum à (Bin. Mult.). Les Cahiers de l’Analyse des Données, 4, 377–378.

Benzécri, J.-P. (1973). L'Analyse des Données. Tome II: L'Analyse des Correspondances. Paris: Dunod.

Bordet (1973). Études de Données Géophysiques. Doctoral thesis (3ème cycle), Université de Paris VI.

Chevenet, F. (1994). A fuzzy coding for the analysis of long term ecological data. Freshwater Biology, 31, 295–309.

Escofier, B. (1979). Traitement simultané de variables qualitatives et quantitatives en analyse

factorielle. Les Cahiers de l’Analyse des Données, 4, 137–146.

Escofier, B. and Pagès, J. (1986). Le traitement des variables qualitatives et des tableaux mixtes par

analyse factorielle multiple. In E. Diday (ed.), Data Analysis and Informatics IV, 179–191.

Amsterdam: North Holland.

Gabriel, K.R. and Odoroff, C.L. (1990). Biplots in biomedical research. Statistics in Medicine, 9, 469–485.

Ghermani, B.M., Roux, C. and Roux, M. (1977). Sur le codage logique des données héterogènes. Les

Cahiers de l’Analyse des Données, 1, 115–118.

Gower, J.C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 857–

871.

Gower, J.C. and Hand, D. (1996). Biplots. London: Chapman and Hall.

Greenacre, M.J. (1988). Correspondence analysis of multivariate categorical data by weighted least-

squares. Biometrika, 75, 457–467.

Greenacre, M.J. (1990). Some limitations of multiple correspondence analysis. Computational

Statistics Quarterly, 3, 149–256.

Greenacre, M.J. (1991). Interpreting multiple correspondence analysis. Applied Stochastic Models and

Data Analysis, 7, 195–210.

Greenacre, M.J. (1995). Multivariate generalisations of correspondence analysis. In Cuadras, C.M. and

Rao, C.R. (eds), Multivariate Analysis: Future Directions 2, pp. 327–340. Amsterdam: North

Holland.

Greenacre, M.J. (2006). From simple to multiple correspondence analysis. In Greenacre, M.J. and

Blasius, J. (eds), Multiple Correspondence Analysis and Related Methods, pp. 41–76. London:

Chapman & Hall / CRC.

Greenacre, M.J. (2007). Correspondence Analysis in Practice. Second Edition. London: Chapman &

Hall / CRC.

Greenacre, M.J. and Blasius, J. (2006), eds. Multiple Correspondence Analysis and Related Methods.

London: Chapman & Hall / CRC.

Guitonneau and Roux (1977). Sur la taxinomie du genre Erodium. Les Cahiers de l'Analyse des Données, 1, 97–113.


Jang, J.S.R., Sun, C.T. and Mizutani, E. (1997). Neuro Fuzzy and Soft Computing: a Computational

Approach to Learning and Machine Intelligence. Prentice Hall, USA.

Loslever, P. and Bouilland, S. (1999). Marriage of fuzzy sets and multiple correspondence analysis:

examples with subjective interval and biomedical signals. Fuzzy Sets and Systems, 107, 255–275.

Murtagh, F. (2005). Correspondence Analysis and Data Coding with R and Java. London: Chapman

& Hall / CRC.

Nenadić, O. and Greenacre, M.J. (2006). Computation of multiple correspondence analysis. In

Greenacre, M.J. and Blasius, J. (eds), Multiple Correspondence Analysis and Related Methods, pp.

523–551. London: Chapman & Hall / CRC.

Nenadić, O. and Greenacre, M.J. (2007). Correspondence analysis in R, with two- and three-

dimensional graphics: the ca package. Journal of Statistical Software 20(3). URL

http://www.jstatsoft.org/v20/i03/.

Şentürk, S. (2006). Fuzzy Logic Approach in Experimental Design. Doctoral thesis, Department of Statistics, Faculty of Science, Anadolu University, Eskişehir, Turkey.

Turkey Statistical Yearbook (2004). Turkish Statistical Institute, Ankara. URL

http://www.turkstat.gov.tr/yillik/yillik_ing.pdf.

van Rijckevorsel, J.L.A. (1988). Fuzzy coding and B-splines. In J.L.A. van Rijckevorsel and J. de

Leeuw (eds), Component and Correspondence Analysis, 33–54. Chichester: Wiley.

Verkuilen, J. (2005). Assigning membership in a fuzzy set analysis. Sociological Methods and

Research, 33, 462–496.

Zhou, Purvis and Kasabov (1997). A membership function selection method for fuzzy neural networks. Information Science Discussion Paper Series, 97/15, University of Otago, New Zealand.


APPENDIX

Some theoretical results about CA of fuzzy coded data

Consider the n×Q data matrix X and corresponding fuzzy coded matrix Z*, using the triangular

membership functions (1). In the CA of Z* the row masses are all equal to 1/n and the column masses cj in the vector c are the column averages divided by Q; Dc denotes the diagonal matrix of the column masses.

From the definition of the triangular membership functions (1) we can write the relationship between X and Z* as the following linear defuzzification formula:

X = Z*M    (7)

where

$$\mathbf{M} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{m}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{m}_Q \end{bmatrix}$$

and mq is the vector of the parameters of the membership function for variable q (in our example the minimum, median and maximum values). The defuzzified approximate values, obtained from the reconstructed values in Ẑ*, are similarly obtained as X̂ = Ẑ*M.

The CA of Z* implies the same type of decomposition as for the crisp equivalent Z given in (4), which can be written (cf. (5)):

$$\tfrac{1}{nQ}\mathbf{Z}^{*} = \tfrac{1}{n}\mathbf{1}\mathbf{c}^{\mathsf{T}} + n^{-1/2}\,\mathbf{U}\mathbf{D}_{\alpha}\mathbf{V}^{\mathsf{T}}\mathbf{D}_{c}^{1/2}, \quad \text{where } \mathbf{U}^{\mathsf{T}}\mathbf{U} = \mathbf{V}^{\mathsf{T}}\mathbf{V} = \mathbf{I} \qquad (8)$$

To estimate the data from a K-dimensional approximation, we use the first K columns of U and V, denoted by U[K] and V[K] respectively, and the first K singular values in the diagonal matrix Dα[K]:

$$\tfrac{1}{nQ}\hat{\mathbf{Z}}^{*} = \tfrac{1}{n}\mathbf{1}\mathbf{c}^{\mathsf{T}} + n^{-1/2}\,\mathbf{U}_{[K]}\mathbf{D}_{\alpha[K]}\mathbf{V}_{[K]}^{\mathsf{T}}\mathbf{D}_{c}^{1/2} \qquad (9)$$

We now prove the results of Section 4.4.
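Before turning to the proofs, relationship (7) can be checked numerically for a single variable, for which M reduces to the vector of hinge values; `triangular_code` is an illustrative helper implementing the triangular membership functions on toy data:

```python
import numpy as np

def triangular_code(x):
    """Fuzzy coding with triangular membership functions hinged at the
    minimum, median and maximum of the variable, as in (1)."""
    m1, m2, m3 = x.min(), np.median(x), x.max()
    Z = np.zeros((len(x), 3))
    lo = x <= m2
    Z[lo, 0] = (m2 - x[lo]) / (m2 - m1)
    Z[lo, 1] = (x[lo] - m1) / (m2 - m1)
    Z[~lo, 1] = (m3 - x[~lo]) / (m3 - m2)
    Z[~lo, 2] = (x[~lo] - m2) / (m3 - m2)
    return Z, np.array([m1, m2, m3])

x = np.array([7.55, 7.09, 8.33, 4.14, 6.35, 7.19])
Z, m = triangular_code(x)
# each row of memberships sums to one, and (7) recovers the data: x = Z m
x_back = Z @ m
```

With several variables the codings are stacked side by side and M is block diagonal, one block per variable, exactly as displayed above.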


1. The means x̄ of X are the same as those of X̂, and both are equal to the defuzzified averages of the columns of Z* (or of Ẑ*).

Proof:

$$\bar{\mathbf{x}}^{\mathsf{T}} = \tfrac{1}{n}\mathbf{1}^{\mathsf{T}}\mathbf{X} = \tfrac{1}{n}\mathbf{1}^{\mathsf{T}}\mathbf{Z}^{*}\mathbf{M}$$

Hence the means x̄ are the defuzzified column means of Z*. Since Z* and Ẑ* have the same column means (this is a standard property of CA approximations), the same argument shows that the defuzzified column means of Ẑ*, namely $\tfrac{1}{n}\mathbf{1}^{\mathsf{T}}\hat{\mathbf{Z}}^{*}\mathbf{M} = \tfrac{1}{n}\mathbf{1}^{\mathsf{T}}\hat{\mathbf{X}}$, are also equal to $\bar{\mathbf{x}}^{\mathsf{T}}$; that is, the means of X and X̂ are the same.

2. The deviations between the data and the data estimated by defuzzifying the reconstructed data from fuzzy MCA are orthogonal to these estimates.

Proof:

The result holds first for the fuzzy coded matrix and its estimated values from CA. Let the subindex [–K] indicate the remaining singular components from the (K+1)-th onwards, so that, for example, U = [ U[K] U[–K] ]. Then, from (8) and (9),

$$(\hat{\mathbf{Z}}^{*})^{\mathsf{T}}(\mathbf{Z}^{*} - \hat{\mathbf{Z}}^{*}) = \left(Q\,\mathbf{c}\mathbf{1}^{\mathsf{T}} + n^{1/2}Q\,\mathbf{D}_{c}^{1/2}\mathbf{V}_{[K]}\mathbf{D}_{\alpha[K]}\mathbf{U}_{[K]}^{\mathsf{T}}\right)\left(n^{1/2}Q\,\mathbf{U}_{[-K]}\mathbf{D}_{\alpha[-K]}\mathbf{V}_{[-K]}^{\mathsf{T}}\mathbf{D}_{c}^{1/2}\right) = \mathbf{0}$$

because $\mathbf{1}^{\mathsf{T}}\mathbf{U}_{[-K]} = \mathbf{0}^{\mathsf{T}}$ (the rows have equal masses, so their coordinates have arithmetic mean zero) and $\mathbf{U}_{[K]}^{\mathsf{T}}\mathbf{U}_{[-K]} = \mathbf{0}$ (orthogonality of the singular vectors). The linear operation of defuzzifying does not change this property:

$$\hat{\mathbf{X}}^{\mathsf{T}}(\mathbf{X} - \hat{\mathbf{X}}) = \mathbf{M}^{\mathsf{T}}(\hat{\mathbf{Z}}^{*})^{\mathsf{T}}(\mathbf{Z}^{*} - \hat{\mathbf{Z}}^{*})\mathbf{M} = \mathbf{0}$$

3. As a result of 2, the sum of squares of X decomposes into two components:

$$\mathrm{trace}[\mathbf{X}^{\mathsf{T}}\mathbf{X}] = \mathrm{trace}[(\mathbf{X} - \hat{\mathbf{X}})^{\mathsf{T}}(\mathbf{X} - \hat{\mathbf{X}})] + \mathrm{trace}[\hat{\mathbf{X}}^{\mathsf{T}}\hat{\mathbf{X}}]$$

and this property is maintained for any common centring and standardization of X – in our application, centring is with respect to the common means and standardization is with respect to the standard deviations of the original variables.

The above results generalize to any fuzzy coding for which the defuzzification transformation is linear, as in (7).
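Results 2 and 3 can be checked numerically; the sketch below uses a plain truncated SVD of a centred random matrix in place of the fuzzy CA decomposition, since the orthogonality argument is the same for any least-squares rank-K approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
Xc = X - X.mean(axis=0)                          # centre, as in the application

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 2
X_hat = U[:, :K] @ np.diag(d[:K]) @ Vt[:K, :]    # rank-K approximation
R = Xc - X_hat                                   # deviations

# Result 2: deviations are orthogonal to the estimates
cross = R.T @ X_hat
# Result 3: orthogonal decomposition of the sum of squares
lhs = (Xc**2).sum()
rhs = (R**2).sum() + (X_hat**2).sum()
```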


Table 1: Averages of five meteorological variables observed in 40 cities of Turkey during 2004: SUN

– daily hours of sunshine; HUM – humidity (%); PRE – precipitation (mm); ALT – altitude (m); MAX

– maximum temperature (°C).

City                  SUN    HUM   PRE      ALT    MAX
Adana                 7.55   66    647.1    27     45.6
Afyon                 7.09   64    434.4    1034   39.8
Anamur                8.33   69    993.5    5      44.2
Ankara                7.19   60    377.7    891    40.8
Antakya               7.15   70    1124.1   100    43.9
Antalya               8.28   64    1052.3   54     45.0
Aydın                 7.42   63    857.7    57     44.6
Balıkesir             6.56   70    588.5    147    43.7
Bolu                  5.49   73    536.4    742    39.4
Bursa                 6.35   69    696.3    100    43.8
Çanakkale             7.31   73    615.4    6      38.8
Diyarbakır            8.00   54    491.4    677    46.2
Edirne                6.24   70    585.9    51     42.2
Erzincan              6.57   60    366.8    1218   40.6
Erzurum               7.05   64    447.0    1758   35.6
Eskişehir             6.46   68    373.9    801    40.6
Gaziantep             8.00   60    548.8    855    44.0
Göztepe (Istanbul)    6.23   75    677.2    33     40.5
Isparta               7.29   61    581.0    997    38.0
İslâhiye              7.46   60    842.0    518    45.4
İzmir                 8.06   62    691.1    29     43.0
Karaköse              6.24   68    533.3    1631   39.9
Kars                  6.27   70    501.2    1775   35.4
Kastamonu             6.12   70    461.6    800    42.2
Kayseri               7.11   65    374.6    1093   40.7
Kırşehir              7.17   63    377.8    1007   40.2
Konya                 7.29   60    325.9    1031   40.6
Kütahya               6.02   67    564.7    969    38.8
Malatya               7.40   54    387.5    948    42.2
Merzifon              6.35   67    392.4    755    42.6
Muğla                 7.48   62    1196.3   646    41.6
Rize                  4.14   77    2300.4   9      38.2
Samsun                4.46   75    650.3    4      38.4
Siirt                 7.43   51    726.5    896    46.0
Sivas                 6.43   64    417.0    1285   40.0
Tekirdağ              5.40   76    575.4    549    46.8
Trabzon               4.36   72    833.8    3      38.4
Şanlıurfa             8.28   49    463.1    30     38.2
Van                   7.43   59    380.4    1661   37.5
Zonguldak             5.54   72    1220.2   137    40.5


Table 2: Fuzzy coding of the data in Table 1: each variable is recoded as three variables and the data are the membership grades according to the triangular membership functions depicted in Figure 1. The crisp data are zeros apart from ones in the positions of the boldface entries, which indicate highest membership.

[40 × 15 matrix of membership grades, with columns Sun1, Sun2, Sun3, Hum1, Hum2, Hum3, Pre1, Pre2, Pre3, Alt1, Alt2, Alt3, Max1, Max2, Max3, and one row per city of Table 1]


MEASURE OF FIT                                                        Crisp MCA   Fuzzy MCA
Least-squares fit to indicator data / fuzzy data, respectively          45.9%       55.0%
Least-squares fit to Burt matrix / fuzzy Burt matrix, respectively      68.6%       80.7%
JCA of Burt matrix / fuzzy Burt matrix, respectively                    87.5%       88.9%
Predicting the category with highest membership degree                  85.5%       84.0%
Fit to original standardized data                                         *         69.4%
_________________________________________________________________________________
* a stress measure of error, evaluated as 44.6% in this case

Table 3: Measures of fit for MCA of crisp and fuzzy coded data.
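The "predicting the category with highest membership degree" row of Table 3 counts, variable by variable, how often the reconstructed profile peaks at the same category as the coded data. A sketch of that count, where `pct_correct` and the toy data are illustrative assumptions:

```python
import numpy as np

def pct_correct(Z, Z_hat, cats_per_var):
    """For each variable, compare the category with highest membership in
    the coded data Z against the highest estimated membership in Z_hat,
    and return the percentage of agreements over all cases and variables."""
    n, hits, start = Z.shape[0], 0, 0
    for J in cats_per_var:
        a = Z[:, start:start + J].argmax(axis=1)
        b = Z_hat[:, start:start + J].argmax(axis=1)
        hits += (a == b).sum()
        start += J
    return 100.0 * hits / (n * len(cats_per_var))

# toy check: a perfect reconstruction predicts every category correctly
rng = np.random.default_rng(2)
Z = rng.random((40, 15))
score = pct_correct(Z, Z.copy(), [3, 3, 3, 3, 3])
```

In the application, Z_hat would be the reconstructed indicator or fuzzy matrix from the K-dimensional solution.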


Figure 1: Triangular membership functions to recode a continuous variable into three categories

(coloured as red, green and blue respectively), using the minimum, median and maximum as hinge

points. An example is given of a value below the median on the continuous scale which is coded as

[0.22 0.78 0]



Figure 2: Principal component analysis (PCA) biplot of Table 1, where variables have been

standardized, showing that 75.6% of the variance is explained by the first two principal components.

The variable vectors all lie in the unit circle and have been multiplied by a scaling factor of √n = √40

to show them more clearly.

[Principal axes: 2.164 (43.3%) and 1.617 (32.3%)]


Figure 3: Asymmetric map of multiple correspondence analysis (MCA) of Table 2, with cities in

principal coordinates and categories in standard coordinates; 45.9% of the inertia is explained in two

dimensions. Several cities have exactly the same set of categories and thus the same positions, for

example Samsun, Trabzon and Zonguldak in upper left quadrant (their categories are Sun1, Hum3,

Pre2, Alt1 and Max2, so these cities are situated at the average of these category points in the

asymmetric map); the locations of other groups of coincident points are indicated by arrows.

[Principal axes: 0.542 (26.2%) and 0.395 (19.7%)]


Figure 4: Fuzzy MCA of Table 2, with 55.0% explained inertia in two dimensions. This is the

asymmetric map, where the standard coordinates of the column category points have been multiplied

by 2/3 to make the map more legible. Categories of each variable are connected.

[Principal axes: 0.326 (31.8%) and 0.238 (23.2%)]


Figure 5: JCA of fuzzy Burt matrix, explaining 88.9% of the inertia in the off-diagonal fuzzy cross-

tabulations. Category points are plotted in principal coordinates, using the principal inertias at

convergence of the algorithm.
