Measures of fit in multiple correspondence analysis of crisp and fuzzy coded data

SSRN Electronic Journal 01/2008; DOI: 10.2139/ssrn.1107815
Source: RePEc

ABSTRACT When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of “degrees of membership” between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix; the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called “fuzzy multiple correspondence analysis”, suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories that have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix. In this paper these alternative measures of fit are defined and applied to a
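The two codings contrasted in the abstract, and the Burt matrix built from them, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which membership functions are used, so triangular functions defined by "hinge" points are assumed here, and defuzzification is assumed to be the membership-weighted average of those hinges. The function names (`crisp_code`, `fuzzy_code`, `defuzzify`, `burt_matrix`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def crisp_code(x, cuts):
    """Crisp (indicator) coding: one column per interval, a single 1 per row."""
    x = np.asarray(x, dtype=float)
    idx = np.digitize(x, cuts)                 # interval index in 0..len(cuts)
    Z = np.zeros((x.size, len(cuts) + 1))
    Z[np.arange(x.size), idx] = 1.0
    return Z

def fuzzy_code(x, hinges):
    """Fuzzy coding with triangular membership functions (an assumption).

    Each value gets memberships in the two neighbouring categories,
    interpolated linearly between hinge points; every row sums to 1.
    """
    hinges = np.asarray(hinges, dtype=float)
    x = np.clip(np.asarray(x, dtype=float), hinges[0], hinges[-1])
    # Index of the hinge to the left of each value.
    k = np.clip(np.searchsorted(hinges, x, side="right") - 1, 0, hinges.size - 2)
    w = (x - hinges[k]) / (hinges[k + 1] - hinges[k])
    Z = np.zeros((x.size, hinges.size))
    rows = np.arange(x.size)
    Z[rows, k] = 1.0 - w
    Z[rows, k + 1] = w
    return Z

def defuzzify(Z, hinges):
    """Assumed defuzzification: membership-weighted average of the hinges."""
    return Z @ np.asarray(hinges, dtype=float)

def burt_matrix(*coded):
    """Stack coded variables side by side and form Z'Z.

    Off-diagonal blocks are the two-way cross-tabulations between variables;
    with fuzzy input the same product gives a fuzzy Burt matrix.
    """
    Z = np.hstack(coded)
    return Z.T @ Z

x1 = np.array([0.0, 2.5, 5.0, 10.0])
x2 = np.array([1.0, 4.0, 7.0, 9.0])
F1 = fuzzy_code(x1, [0, 5, 10])   # fuzzy coding into 3 categories
C2 = crisp_code(x2, [3, 6])       # crisp coding into 3 categories
B = burt_matrix(F1, C2)           # 6 x 6 (fuzzy) Burt matrix
```

With hinge-based triangular functions, `defuzzify(F1, [0, 5, 10])` returns the original values of `x1` whenever they lie within the hinge range, which is what makes a variance-style measure of fit possible for fuzzy coded data.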

  • ABSTRACT: This article provides a largely nontechnical discussion of the acquisition of membership values in fuzzy set analyses. First the basic properties of a membership are discussed. Then the three common strategies of membership assignment (direct subjective assignment, indirect subjective assignment, and transformation) are critically examined in turn. Examples are used to illustrate the techniques. The connection with existing psychometric and statistical methods is particularly emphasized, focusing on the notion of a membership value as a random variable as a means to assess uncertainty in assignment.
    Sociological Methods & Research 05/2005; 33(4):462-496. DOI:10.1177/0049124105274498 · 1.52 Impact Factor
  • ABSTRACT: Multiple correspondence analysis (MCA) is a descriptive, multidimensional method that can investigate several empirical situations, each situation being described by a set of categorical variables. This paper shows how the fuzzy sets principle can be used to transform raw continuous data into categorical data. The transformation is considered in two main stages: data characterizing and data coding. Data characterizing is performed to build analysis variables from complex empirical variables, such as multidimensional signals, using fuzzy windowing of the time and/or space axes. The analysis variables are indicators summarizing the information within the windows so obtained. Data coding is performed to build homogeneous analysis variables, i.e. variables that are based on a qualitative scale, using fuzzy windowing. Both methodological and practical aspects are considered in this paper through two examples. The first example considers the comparative analysis of force and force derivative signals in several load lifting conditions. The second example considers the analysis and modelling of individual agreements between a graphical view showing two bargraphs and the assertion "the height of the first bar is large and the height of the second is large", the agreement being given through an interval.
    Fuzzy Sets and Systems 11/1999; 107(3):255-275. DOI:10.1016/S0165-0114(97)00317-5 · 1.99 Impact Factor
  • ABSTRACT: A general coefficient measuring the similarity between two sampling units is defined. The matrix of similarities between all pairs of sample units is shown to be positive semidefinite (except possibly when there are missing values). This is important for the multidimensional Euclidean representation of the sample and also establishes some inequalities amongst the similarities relating three individuals. The definition is extended to cope with a hierarchy of characters.
    Biometrics 12/1971; 27(4):857-871. DOI:10.2307/2528823 · 1.57 Impact Factor

