
The Guttman effect: Its interpretation and a new redressing method

Article · January 2005
Università degli Studi di Roma “La Sapienza”
Sergio Camiz
n. 5 / 2004 - February 2004
DIPARTIMENTO DI MATEMATICA
“GUIDO CASTELNUOVO”
The Guttman effect
25/02/04 grecia.wpd Pag. 1 / 17
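    |  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10 C11 C12 C13 C14 C15
----+------------------------------------------------------------
U1  |   1   1   1   .   .   .   .   .   .   .   .   .   .   .   .
U2  |   1   1   1   1   .   .   .   .   .   .   .   .   .   .   .
U3  |   1   1   1   1   1   .   .   .   .   .   .   .   .   .   .
U4  |   .   1   1   1   1   1   .   .   .   .   .   .   .   .   .
U5  |   .   .   1   1   1   1   1   .   .   .   .   .   .   .   .
U6  |   .   .   .   1   1   1   1   1   .   .   .   .   .   .   .
U7  |   .   .   .   .   1   1   1   1   1   .   .   .   .   .   .
U8  |   .   .   .   .   .   1   1   1   1   1   .   .   .   .   .
U9  |   .   .   .   .   .   .   1   1   1   1   1   .   .   .   .
U10 |   .   .   .   .   .   .   .   1   1   1   1   1   .   .   .
U11 |   .   .   .   .   .   .   .   .   1   1   1   1   1   .   .
U12 |   .   .   .   .   .   .   .   .   .   1   1   1   1   1   .
U13 |   .   .   .   .   .   .   .   .   .   .   1   1   1   1   1
U14 |   .   .   .   .   .   .   .   .   .   .   .   1   1   1   1
U15 |   .   .   .   .   .   .   .   .   .   .   .   .   1   1   1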
Table 1 - A band data table.
Figure 1 - Horse-shoe effect of rows and columns of
Table 1 on the plane spanned by principal axes 1 and 2
of its Principal Components Analysis.
Figure 2 - Arch effect of rows and columns (superposed)
of Table 1 on the plane spanned by factors 1 and 2 of
Correspondence Analysis.
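    |  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10 C11 C12 C13 C14 C15
----+------------------------------------------------------------
U1  |   5   3   1   .   .   .   .   .   .   .   .   .   .   .   .
U2  |   3   5   3   1   .   .   .   .   .   .   .   .   .   .   .
U3  |   1   3   5   3   1   .   .   .   .   .   .   .   .   .   .
U4  |   .   1   3   5   3   1   .   .   .   .   .   .   .   .   .
U5  |   .   .   1   3   5   3   1   .   .   .   .   .   .   .   .
U6  |   .   .   .   1   3   5   3   1   .   .   .   .   .   .   .
U7  |   .   .   .   .   1   3   5   3   1   .   .   .   .   .   .
U8  |   .   .   .   .   .   1   3   5   3   1   .   .   .   .   .
U9  |   .   .   .   .   .   .   1   3   5   3   1   .   .   .   .
U10 |   .   .   .   .   .   .   .   1   3   5   3   1   .   .   .
U11 |   .   .   .   .   .   .   .   .   1   3   5   3   1   .   .
U12 |   .   .   .   .   .   .   .   .   .   1   3   5   3   1   .
U13 |   .   .   .   .   .   .   .   .   .   .   1   3   5   3   1
U14 |   .   .   .   .   .   .   .   .   .   .   .   1   3   5   3
U15 |   .   .   .   .   .   .   .   .   .   .   .   .   1   3   5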
Table 2 - A data table of variables having Gaussian distributions along a common factor.
Sergio Camiz
Dipartimento di Matematica Guido Castelnuovo - Università di Roma La Sapienza
e-mail: sergio.camiz@uniroma1.it
Abstract. The Guttman effect is introduced as a result of factor analyses of band data tables. The problems it raises are discussed, in particular with reference to vegetation analysis. Two lines of investigation have emerged: one uses different analysis techniques, based on the Gaussian model (Ihm and van Groenewoud, 1975; Johnson and Goodall, 1980), and the other redresses the scatter plots to remove the effect (Gauch and Hill, 1980; Delicado and Aluja, 2003). In this paper the various methods are described and a new, simple redressing method is introduced, based on the interpretation of the scatter plots, their parabolic pattern and some of their properties.
1. Introduction
This effect was first identified by Guttman (1953). Take either a band data table, such as the one in Table 1, or a table obtained from variables having a Gaussian distribution along a given factor, as in Table 2. If this table is submitted to Principal Components Analysis (hereafter PCA), the row points on the plane spanned by the first two principal components form a horse-shoe, like the one represented in Figure 1. The same table, submitted to Correspondence Analysis (hereafter CA), produces an arch pattern of row points on the plane spanned by the first two factors, as represented in Figure 2. In CA the pattern of column points is very similar to that of row points, whereas in PCA the column vectors are represented differently.
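The arch can be reproduced numerically. The following is a minimal sketch, assuming NumPy. It builds Table 1 and Table 2 from their band rule: a 1, or the 5-3-1 profile, within two positions of the diagonal. It then runs a plain SVD-based correspondence analysis (the helper name `correspondence_analysis` is hypothetical, not taken from any package) and prints the first two row coordinates. The first coordinate should be monotone along the gradient. The second should be symmetric, with both ends on one side of the axis and the centre on the other, which is the arch of Figure 2.

```python
import numpy as np

def correspondence_analysis(N):
    """Plain CA via SVD of the standardised residuals.

    Returns the singular values and the row/column principal coordinates;
    the trivial dimension is removed by the centring."""
    P = N / N.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = U / np.sqrt(r)[:, None] * sv        # row principal coordinates
    G = Vt.T / np.sqrt(c)[:, None] * sv     # column principal coordinates
    return sv, F, G

n = 15
i, j = np.indices((n, n))
table1 = (np.abs(i - j) <= 2).astype(float)                 # Table 1
table2 = np.select([i == j, np.abs(i - j) == 1, np.abs(i - j) == 2],
                   [5.0, 3.0, 1.0], default=0.0)             # Table 2

for name, N in [("Table 1", table1), ("Table 2", table2)]:
    sv, F, G = correspondence_analysis(N)
    f1 = F[:, 0] if F[0, 0] < F[-1, 0] else -F[:, 0]         # fix arbitrary sign
    f2 = F[:, 1]
    print(name, "eigenvalues:", np.round(sv[:3] ** 2, 3))
    print("  axis 1:", np.round(f1, 2))
    print("  axis 2:", np.round(f2, 2))
```

The eigenvalues printed are the squared singular values (principal inertias). The sign of each axis is arbitrary in any eigenanalysis, which is why the first axis is reoriented before printing.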
Since then, these particular shapes have been studied both to understand their theoretical rationale and to overcome
them, that is, to make the pattern linear. Indeed, in some cases a rectilinear pattern is considered a better representation of the data structure, in particular if one wishes to use the coordinates for other purposes, such as identifying the position of a point along the underlying factor. Since most of these studies were limited to simulated data sets, they aimed at linearising the arch, that is, at making the distribution rectilinear. However, they took into account neither the multidimensional pattern usually resulting from the analysis of real cases nor its meaning, so they remained incomplete. Only recently have Delicado and Aluja (2003) given adequate consideration to the second dimension, in a very interesting way. In a contemporary study, Camiz and Polenta (2003) propose an analogous redressing method, much simpler in its rationale, based on the usual interpretation of point patterns on factor planes and on the parabolic shape of the scatter. In this paper, after a review of the main suggestions found in the literature for both issues (section 2), the methods based on the Gaussian model are presented (section 3), followed by the Delicado and Aluja (2003) redressing (section 4). Then a two-dimensional interpretation of the Guttman effect is given (section 5), and the new redressing method of Camiz and Polenta (2003) is proposed (section 6) and compared with some other methods in an application to real data (section 7).
2. Guttman effect and redressing
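Benzécri (1973-82) showed that the pattern of both row and column points of a band matrix on the plane spanned by the first and the n-th factor of CA is an n-th degree polynomial. Van Rijckevorsel (1987) studies the problem in the framework of homogeneity analysis. Benzécri's approach was recently taken up by Baccini et al. (1994), who give a theoretical explanation of the Guttman effect. Let (X, Y) be a random vector having a standardised bivariate normal distribution with correlation coefficient ρ, with probability density function
$$ f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \right). $$
Its expression in terms of the Tchebicheff-Hermite polynomials $H_k$, through the Mehler expansion (Watson, 1933), is
$$ f(x, y) = \varphi(x)\,\varphi(y) \sum_{k=0}^{\infty} \frac{\rho^k}{k!}\, H_k(x)\, H_k(y), $$
where φ is the standard normal density. Discretising (X, Y) into classes $I_i$ and $J_j$ yields a contingency table with joint probability
$$ p_{ij} = P(X \in I_i,\; Y \in J_j), $$
which can be approximated by
$$ p_{ij} \approx p_{i\cdot}\, p_{\cdot j} \left( 1 + \sum_{k=1}^{n} \sqrt{\lambda_k}\; a_{ik}\, b_{jk} \right), $$
the reconstruction formula given by CA. Here n is the dimension of the solution and $a_{ik}$, $b_{jk}$ are the standardised row and column coordinates. The eigenvalues are the even powers of the correlation coefficient, $\lambda_k = \rho^{2k}$, and the eigenvectors are the Hermite polynomials, $a_{ik} \approx H_k(x_i)/\sqrt{k!}$ and $b_{jk} \approx H_k(y_j)/\sqrt{k!}$, evaluated at the class centres $x_i$ and $y_j$.

Figure 3 - (a) Example of two unimodal distributions having different modes along the same gradient; (b) the joint distribution of the two unimodal variables.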
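The statement that the CA eigenvalues are $\rho^{2k}$ and the eigenvectors are Hermite polynomials can be checked numerically. The sketch below (Python with NumPy) discretises a bivariate normal density on a fine grid and treats it as a contingency table. It then compares the CA singular values with the powers of ρ, and the first two standard row coordinates with $H_1(x) = x$ and $H_2(x) = x^2 - 1$. The value ρ = 0.6, the grid of 161 class centres on [-4, 4] and the helper `ca_svd` are illustrative assumptions. The density is used without its normalising constant, which CA ignores.

```python
import numpy as np

def ca_svd(N):
    """CA via SVD; returns singular values and row/column standard coordinates."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return sv, U / np.sqrt(r)[:, None], Vt.T / np.sqrt(c)[:, None]

rho = 0.6
x = np.linspace(-4.0, 4.0, 161)           # class centres, shared by X and Y
X, Y = np.meshgrid(x, x, indexing="ij")
# Bivariate standard normal density (up to a constant, which CA ignores)
table = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))

sv, A, B = ca_svd(table)
print("CA eigenvalues :", np.round(sv[:4] ** 2, 4))
print("rho^(2k)       :", np.round(rho ** (2 * np.arange(1, 5)), 4))

centre = np.abs(x) <= 2.5                  # avoid grid-truncation effects in the tails
print("corr(a_1, H_1) :", abs(np.corrcoef(A[centre, 0], x[centre])[0, 1]))
print("corr(a_2, H_2) :", abs(np.corrcoef(A[centre, 1], x[centre] ** 2 - 1)[0, 1]))
```

Because the second coordinate behaves like $x^2 - 1$ while the first behaves like $x$, plotting one against the other traces a parabola. This is the arch.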
The second issue, namely the redressing of the horseshoe / arch pattern, has long been treated within vegetation science. The multidimensional structure of a vegetation data table is evident, and it very often depends on a major environmental factor (or a combination of several co-occurrent factors). In this case the eigenanalysis-based methods, such as PCA and CA, show the Guttman effect, since the relation between factors and plant species is unimodal. The detection of the Guttman effect on the factor planes is thus evidence of the seriation of species and relevés along an environmental gradient. Vegetation scientists, however, are not satisfied with this information and would like to estimate the position of both species and relevés along the detected gradient. This indirect identification of gradients through exploratory analysis tools was already criticised by Camiz (1991; 1994), who argues that this aim, typical of inferential statistics, is far beyond the limits of exploratory methods.
Nevertheless, there are cases in which the position of an item along the pattern would be used for classification purposes and for reorganising the data table, even while remaining within exploratory analysis. This occurs in so-called table structuration, where both rows and columns are rearranged so that similar items are grouped together and the order of groups and items reflects their order along the environmental factor. Both issues are of major importance in vegetation science, in particular because a structured table is a basic tool for the definition of syntaxa, i.e. vegetation taxonomic units. In general, dealing with a linear structure is helpful, since the coordinates are then directly related to the identified factors, which is not true in the case of the Guttman effect.
In vegetation studies, perhaps Benzécri's (1973-82) proof was not known, or authors wished to appeal to intuition. As a consequence, the Guttman effect was explained as a multidimensional extension of the two-dimensional pattern of a sample taken from a population represented by two similar unimodal distributions,
Figure 4 - Original and adjusted position of points on a
plane according to the DECORANA adjustment based on
moving average.
whose modes are shifted along the real line (Figure 3a). If the sample is represented on a plane whose two coordinates are the frequencies of the two distributions, its pattern corresponds to the curve in Figure 3b (Wartenberg et al., 1987; see also Orlóci, 1978). The Guttman effect would then result from the composition of several such patterns. This explanation is not exact, and it is surprising that it is still accepted today. As a matter of fact, the practical, empirical interpretation can be based on the usual way of reading PCA and CA graphics: the items at the two ends of the distribution are negatively correlated, and those close to each other behave similarly. The point is that the Guttman effect originates from an eigenanalysis representation of distributions having their modes at different values of a common factor. Studies were therefore carried out to clarify empirically the relations among mode positions, distribution turnover, distribution ranges, length of the factor, etc. (among others, Gauch and Whittaker, 1972; Austin, 1976a; Kessel and Whittaker, 1976; Gauch et al., 1977; Prentice, 1980). In addition, other studies compared different analysis techniques, including Kruskal's (1964a; 1964b) Non-Metric MultiDimensional Scaling (Austin, 1976b; Kenkel and Orlóci, 1986). The latter was considered the most effective in minimising the Guttman effect. Nevertheless, PCA and, even more, CA remain in the current practice of vegetation data analysis.
An important development, so to say, was the discovery that CA is less heavily affected by the Guttman effect than PCA. In PCA the folding of the pattern tails heavily biases the order of the items along the first factor. Since this folding does not occur in CA, the coordinates of items along the first axis would be a sufficient approximation of their position along the factor. It was at least argued that the order of the items along the factor would be sufficiently well represented, although their reciprocal distances would not. CA thus became the common method for the analysis of vegetation tables because of its higher robustness to the Guttman effect. This is the wrong reason: the true one is that a vegetation table is a transformation of a contingency table. A drawback of this choice was that the awareness that PCA factors represent groups of co-occurrent species, and thus species associations (Noy-Meir, 1971), was eventually lost. In all cases, attention was focused on the position of items along the gradient, so the problem was treated as one-dimensional. This certainly depended on a superficial reading of Benzécri's (1973-82) explanation of the polynomial relations among factors. Once it was stated that the coordinates along the second factor are but a second-degree polynomial transformation of those on the first, those on the third a third-degree transformation, etc., all interest in the width of the arch was lost. Indeed, Gauch and Hill (1980) state that the effect is «simply a mathematical artifact, corresponding to non real data structure» (p. 48). It is therefore not surprising that they proposed a detrending method aimed at straightening the arch pattern resulting from CA. Their proposal, implemented both in DECORANA (Hill, 1979) and in ter Braak's (1988) CANOCO, was to
remove the arch from the second factor, and for this purpose they suggested two methods.
i) Divide the first-axis range into intervals, compute the average of the second coordinates of the points falling in each interval, then subtract the interval average from the second coordinates of those points. Although this procedure makes the arch pattern almost rectangular, it severely affects the reciprocal positions of items, in particular along the second dimension (Figure 4). This seemed unimportant to the supporters of detrending, since «the preservation of the second dimension of the solution provides no additional information about the underlying order. Thus there is no reason to avoid detrending» (Peet et al., 1988: p. 926).
ii) A better technique they propose is to fit the coordinates on each axis following the first with a polynomial regression on the first-axis coordinates. At each position along the first axis, the fitted mean is then subtracted from the coordinate. This procedure distorts the reciprocal positions of the points less, but it gives no real advance towards solving the problem, since it is questionable whether the adjusted distribution has any meaning as far as the following factors are concerned.
At present both DECORANA and CANOCO are widely used in vegetation studies, and detrending has become a common (wrong) practice.
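The two detrending rules described above can be stated in a few lines. The sketch below is a minimal NumPy illustration, not DECORANA's or CANOCO's code; the function names and default number of segments are hypothetical choices. It implements i) subtraction of segment means and ii) subtraction of a polynomial fit of the second coordinates on the first. Both rules act only on the second coordinate and never look at how points are spread around the arch, which is the source of the distortion discussed above.

```python
import numpy as np

def detrend_by_segments(x, y, n_segments=10):
    """Method i): split the axis-1 range into equal intervals and subtract,
    within each interval, the mean of the axis-2 coordinates."""
    edges = np.linspace(x.min(), x.max(), n_segments + 1)
    seg = np.digitize(x, edges[1:-1])        # interval index 0 .. n_segments-1
    y_adj = np.asarray(y, dtype=float).copy()
    for s in np.unique(seg):
        y_adj[seg == s] -= y_adj[seg == s].mean()
    return y_adj

def detrend_by_polynomial(x, y, degree=2):
    """Method ii): regress axis-2 coordinates on a polynomial of the axis-1
    coordinates and keep the residuals."""
    return y - np.polyval(np.polyfit(x, y, degree), x)

x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x ** 2 - 0.7 + 0.05 * np.sin(7.0 * x)   # an arch with some extra structure
print(np.round(detrend_by_segments(x, y, 5), 3))
print(np.round(detrend_by_polynomial(x, y), 3))
```

With method i) the result depends on the arbitrary number of segments. With method ii) an exact parabola is flattened to zero, whatever information its width carried.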
3. Guttman effect and Gaussian model
A different approach, theoretically more relevant, was to limit attention to the one-dimensional environmental factor, thus abandoning the hope of a multidimensional representation, and to develop a method based on different assumptions. This led Ihm and Van Groenewoud (1975; 1984) to develop an analysis based on the Gaussian model. This model had already been investigated by Gauch and Chase (1974) and Gauch et al. (1974), who considered the Gaussian curve a reasonable approximation of the ecological niche. That assumption was criticised by Austin (1976a; 1980; 1987), who also proposes other models. In particular, he states that the symmetry of the distribution is far from evident, whereas unimodality is acceptable, at least as long as competition among species is not taken into account.
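The Gaussian model is based on the idea that each of the p variables has a symmetrical distribution on the real line, given by
$$ y_j(x) = c_j \exp\left( -\frac{(x - u_j)^2}{2 t_j^2} \right), \qquad j = 1, \dots, p, \qquad (1) $$
where $c_j$ is the maximum of variable j, reached at the range mid-point $u_j$, which is also the mean, and $t_j$ is its standard deviation. With this model, the cover y of a plant species at the value x of an environmental gradient may be estimated.
The Ihm and Van Groenewoud (1975) method is based on the hypothesis that all variables have equal variance ($t_j = t$ for every j), which allows major simplifications and gives a very simple solution. In addition, $y_j(x)$ is assumed to be defined for every real value x. For every pair of variables (j, k), a dissimilarity index is given by minus twice the natural logarithm of the integral of the product of the two distributions $y_j$ and $y_k$,
$$ d_{jk} = -2 \ln \int_{-\infty}^{+\infty} y_j(x)\, y_k(x)\, dx. \qquad (2) $$
Assuming the Gaussian distribution (1) for the ys, this gives the dissimilarities
$$ d_{jk} = \frac{(u_j - u_k)^2}{2t^2} - 2\ln c_j - 2\ln c_k - 2\ln\left(t\sqrt{\pi}\right), $$
thus a linear function of the squared differences among the means, with marginal values
$$ d_{j\cdot} = \frac{1}{p}\sum_{k=1}^{p} d_{jk} = \frac{(u_j - \bar{u})^2 + s_u^2}{2t^2} - 2\ln c_j - 2\,\overline{\ln c} - 2\ln\left(t\sqrt{\pi}\right) $$
and
$$ d_{\cdot\cdot} = \frac{1}{p^2}\sum_{j=1}^{p}\sum_{k=1}^{p} d_{jk} = \frac{s_u^2}{t^2} - 4\,\overline{\ln c} - 2\ln\left(t\sqrt{\pi}\right), $$
where $\bar{u} = \frac{1}{p}\sum_{k} u_k$, $s_u^2 = \frac{1}{p}\sum_{k}(u_k - \bar{u})^2$ and $\overline{\ln c} = \frac{1}{p}\sum_{k}\ln c_k$. To transform the matrix $\mathbf{D} = (d_{jk})$ into scalar products, the Gower (1966) transformation
$$ b_{jk} = -\frac{1}{2}\left(d_{jk} - d_{j\cdot} - d_{\cdot k} + d_{\cdot\cdot}\right) $$
is applied. It gives the rank one matrix $\mathbf{B} = (b_{jk})$, such that
$$ b_{jk} = \frac{(u_j - \bar{u})(u_k - \bar{u})}{2t^2}, \qquad \mathbf{B} = \frac{1}{2t^2}\,(\mathbf{u} - \bar{u}\mathbf{1})(\mathbf{u} - \bar{u}\mathbf{1})^{\top}, $$
and whose characteristic equation is
$$ \mathbf{B}\mathbf{w} = \lambda \mathbf{w}. $$
This gives the eigenvector
$$ \mathbf{w} = \mathbf{u} - \bar{u}\mathbf{1} $$
associated with the eigenvalue
$$ \lambda = \frac{1}{2t^2}\sum_{j=1}^{p}(u_j - \bar{u})^2 = \frac{p\, s_u^2}{2t^2}. $$
Since the functions $y_j$ are unknown, the expected value $y_j(x_i)$ of the j-th variable on the i-th observation, located at $x_i$ along the factor, is estimated by the observed value $y_{ij}$. If the observations are error-free and the $x_i$ are chosen at regular distance $\Delta x$, the sum
$$ \Delta x \sum_{i=1}^{n} y_{ij}\, y_{ik} \qquad (3) $$
is a numerical approximation of the integral in (2). Although the $x_i$ are not always regularly spaced, one may consider them independent and uniformly distributed along the factor in a given interval [-A, A]. In this case (3), with $\Delta x = 2A/n$, is a Monte-Carlo estimate of the integral in (2). The matrix of the resulting dissimilarities can then be analysed, giving estimates of both the eigenvalue λ and the eigenvector $\mathbf{u} - \bar{u}\mathbf{1}$ of $\mathbf{B}$.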
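The steps of the Ihm and Van Groenewoud method can be followed numerically. The sketch below is a minimal Python illustration under these assumptions: Gaussian responses with maxima $c_j$, means $u_j$ and a common standard deviation t, error-free observations, and positions drawn uniformly on [-A, A]. The function names and all parameter values are hypothetical. The sketch estimates the integrals by a Monte-Carlo sum, forms the dissimilarities $-2\ln(\cdot)$, applies the Gower double centring, and checks that the leading eigenvector is proportional to the centred means.

```python
import numpy as np

def gaussian_responses(x, c, u, t):
    """Gaussian model for every variable: y_j(x) = c_j exp(-(x - u_j)^2 / (2 t^2))."""
    return c * np.exp(-(x[:, None] - u[None, :]) ** 2 / (2.0 * t ** 2))

def ihm_van_groenewoud(Y, A):
    """Y: n x p table observed at n points assumed uniform on [-A, A].
    Returns the leading eigenvalue and eigenvector of the Gower matrix."""
    n, p = Y.shape
    integrals = (2.0 * A / n) * (Y.T @ Y)   # Monte-Carlo estimate of int y_j y_k dx
    D = -2.0 * np.log(integrals)            # dissimilarities d_jk
    J = np.eye(p) - np.ones((p, p)) / p     # centring matrix
    Bmat = -0.5 * J @ D @ J                 # Gower: -1/2 (d_jk - d_j. - d_.k + d..)
    eigval, eigvec = np.linalg.eigh(Bmat)   # eigenvalues in ascending order
    return eigval[-1], eigvec[:, -1]

rng = np.random.default_rng(0)
p, t, A, n = 8, 1.0, 8.0, 20000
u = np.linspace(-3.0, 3.0, p)             # true means (modes)
c = 1.0 + np.arange(p) % 3                # unequal maxima, removed by the centring
x = rng.uniform(-A, A, n)                 # unknown positions along the factor
Y = gaussian_responses(x, c, u, t)

lam, w = ihm_van_groenewoud(Y, A)
print("estimated eigenvalue:", lam)
print("sum((u - mean)^2) / (2 t^2):", np.sum((u - u.mean()) ** 2) / (2 * t ** 2))
print("|corr(eigenvector, u)|:", abs(np.corrcoef(w, u)[0, 1]))
```

The unequal maxima $c_j$ enter the dissimilarities only as additive row and column terms, so the double centring removes them. The common variance t, on the other hand, is essential: with species of very different ranges the rank one structure no longer holds.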
Following similar ideas, Johnson and Goodall (1980) propose a technique that they try to extend to the multidimensional case (Goodall and Johnson, 1982; 1987). The method is essentially devoted to the indirect identification of the ecological gradients underlying the structure of vegetation tables. They therefore distinguish two cases. For species present in a relevé, they fit a Gaussian model to the species cover; for absent species, they fit a parabola to the probability of absence of the species. Parameters are estimated by maximum likelihood. Considering the Gaussian model (1) in the reduced form
$$ y_{ij} = K_j \exp\left( -\frac{(x_i - a_j)^2}{2 b_j^2} \right), $$
they use its logarithm
$$ \ln y_{ij} = \ln K_j - \frac{(x_i - a_j)^2}{2 b_j^2} $$
to fit, for each species, a Gaussian curve by least squares, minimising the sum of squares
$$ S_j = \sum_{i=1}^{n} \left( \ln y_{ij} - \ln K_j + \frac{(x_i - a_j)^2}{2 b_j^2} \right)^2, \qquad (4) $$
where n is the number of relevés, $x_i$ is the position of the i-th relevé along the gradient, and $K_j$, $a_j$, $b_j$ are species-specific parameters of the model to be estimated. Note that in (4) only $y_{ij}$, the cover value of the j-th species in the i-th relevé, is known. In particular, since $x_i$ is unknown, standard regression techniques cannot be used to estimate $K_j$, $a_j$, $b_j$. They therefore use an iterative approach, in which the $x_i$ are estimated by maximum likelihood. The likelihood function is built taking into account either the species cover, if the species is present, or its
Figure 5 - The Principal Curve as the curve whose
points are the mean of those projected onto them
(Delicado and Aluja, 2003).
Figure 6 - The Principal Curve of Oriented Points as
a set of Principal Oriented Points (Delicado and
Aluja, 2003).
probability of absence, in the latter case using the fitted parabola. The procedure may thus be summarised as follows:
1. start from a rough approximation of the $x_i$ values on the true gradient;
2. fit the bell-shaped response curve of each species j on the basis of the $y_{ij}$ in each relevé i, according to the position $x_i$ along the gradient;
3. obtain a better approximation of the $x_i$ by applying the maximum likelihood method to both functions (Gaussian and parabola);
4. iterate from step 2 until convergence.
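The alternating scheme in steps 1 to 4 can be sketched as follows. This is a deliberately simplified sketch, not Johnson and Goodall's algorithm. It keeps only the present species and fits each response by least squares as a quadratic in x on the log scale, which is the logarithm of a Gaussian curve. It ignores the parabola for the probability of absence. It also replaces the maximum likelihood update of the $x_i$ with a grid search that minimises the same sum of squares. Keeping the current position among the candidates guarantees that the objective never increases. All function names, the grid and the simulated data are hypothetical.

```python
import numpy as np

def fit_species(x, logY, present):
    """Step 2: for each species j, least-squares fit of
    ln y_ij = b0 + b1 x_i + b2 x_i^2 over the relevés where j is present."""
    coefs = np.zeros((logY.shape[1], 3))
    for j in range(logY.shape[1]):
        m = present[:, j]
        X = np.column_stack([np.ones(m.sum()), x[m], x[m] ** 2])
        coefs[j], *_ = np.linalg.lstsq(X, logY[m, j], rcond=None)
    return coefs

def predict(xs, coefs):
    """Fitted log-cover for positions xs (rows) and species (columns)."""
    return coefs[:, 0] + np.outer(xs, coefs[:, 1]) + np.outer(xs ** 2, coefs[:, 2])

def sum_of_squares(x, logY, present, coefs):
    return float(np.sum(((logY - predict(x, coefs)) ** 2)[present]))

def update_positions(x, logY, present, coefs, grid):
    """Step 3 (simplified): move each relevé to the candidate position that
    minimises its own residual sum of squares. The current position is kept
    among the candidates, so the objective cannot increase."""
    x_new = x.copy()
    for i in range(len(x)):
        cand = np.append(grid, x[i])
        m = present[i]
        ss = np.sum((logY[i, m] - predict(cand, coefs[m])) ** 2, axis=1)
        x_new[i] = cand[np.argmin(ss)]
    return x_new

def alternate_fit(Y, x0, n_iter=20, grid=None):
    """Steps 1-4: alternate species fits and relevé positions."""
    grid = np.linspace(-4.0, 4.0, 161) if grid is None else grid
    present = Y > 0
    logY = np.log(np.where(present, Y, 1.0))
    x = np.asarray(x0, dtype=float).copy()
    coefs = fit_species(x, logY, present)
    history = [sum_of_squares(x, logY, present, coefs)]
    for _ in range(n_iter):
        x = update_positions(x, logY, present, coefs, grid)
        coefs = fit_species(x, logY, present)
        history.append(sum_of_squares(x, logY, present, coefs))
    return x, coefs, history

rng = np.random.default_rng(1)
x_true = np.linspace(-3.0, 3.0, 30)                  # simulated relevé positions
u = np.linspace(-3.0, 3.0, 10)                       # simulated species optima
Y = 10.0 * np.exp(-(x_true[:, None] - u[None, :]) ** 2 / 2.0)
Y *= rng.lognormal(0.0, 0.1, Y.shape)                # multiplicative noise
Y[Y < 0.5] = 0.0                                     # low cover recorded as absence
x0 = x_true + rng.normal(0.0, 1.0, x_true.size)      # a rough starting approximation
x_hat, coefs, history = alternate_fit(Y, x0)
print(np.round(history[:5], 3), "->", round(history[-1], 3))
```

The quadratic coefficients map back to the reduced Gaussian form through $b_j^2 = -1/(2\beta_2)$ and $a_j = -\beta_1/(2\beta_2)$, valid only when the fitted $\beta_2$ is negative. Nothing in this sketch enforces that constraint.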
The model was tested for robustness by Goodall and Johnson (1987). Note that, for better convergence, the estimation of the individual species variances was replaced by a single pooled variance. The application of this method, albeit very interesting and very sophisticated, seems limited to vegetation analysis. For other kinds of data, the proposed double fit, as well as some other adjustments not described here, is questionable.
In general, when dealing with the Gaussian model it is evident that the equal-variance hypothesis is far from reasonable. It is sufficient to look at a typical vegetation table to see that some species are present in all relevés, even in relevés very far from each other along the first factor, while others have a very limited range.
All these discussions clearly rest on an unstated assumption: the dispersion of the points around the factor (curve) line is simply noise, or is otherwise caused by the arch effect, so that a one-dimensional solution is acceptable. This is totally false. To show it, it is sufficient to display the pattern of items on factor planes for a multiband matrix, such as the one in Table 3, discussed later.
4. Guttman effect and Principal Curves
More recently, Delicado and Aluja (2003) have proposed the use of Principal Curves as a correction of the Guttman effect. Principal curves of a random variable X are one-dimensional parametrised curves having the self-consistency property that
Figure 7 - Once identified a principal curve, points are projected onto
it (a); for a linear representation, the coordinate along the curve and
the length of the projection are plotted (b). It is evident the difference
with the non-linear regression results (c).
each point α(t) on the curve is the mean of all points whose projection on the curve (i.e. the orthogonal projection on the straight line tangent to the curve) is exactly α(t) (Figure 5; Hastie and Stuetzle, 1989). Delicado (2001) starts from the concept of the principal direction associated with a point x, defined as the straight line through x that minimises the variance of the projections of the other points on the line. He then defines a set of principal oriented points as a set of points each coinciding with the mean on its corresponding principal direction. In this way he can define a principal curve of oriented points as a principal curve composed of principal oriented points (Figure 6). This allows Delicado and Huerta (2003) to propose an implementation of the method (for other algorithms, see Hastie and Stuetzle, 1989; Kégl et al., 2000). Given a plane scatter of points, Delicado and Aluja (2003) build a principal curve of the scatter (Figure 7a) and then project each point onto it. This yields two coordinates: one along the curve (e.g. the length of the curve segment between the curve mid-point and the projection) and the other the distance of the point from its projection on the curve (Figure 7b). This kind of redressing is totally different from both detrending methods proposed by Gauch and Hill (1980). It takes into account the scattering of points along a second dimension, which both Gauch and Hill methods heavily bias without any justification. In particular, in Figure 7c the same points are projected vertically onto the curve, as in parabolic regression, and the difference in results is striking.
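The two coordinates of Figure 7b can be computed once a curve is available. The sketch below assumes the curve is already given as an ordered polyline. It does not implement the principal oriented points algorithm of Delicado and Huerta, and the function name is hypothetical. Each point is projected orthogonally onto the nearest segment. The function returns the signed arc length of the projection, measured from the curve mid-point (half of its total length), and the distance of the point from its projection.

```python
import numpy as np

def project_onto_polyline(points, curve):
    """Orthogonal projection of 2-D points onto an ordered polyline.

    Returns (s, d): s is the arc length of the projection measured from the
    mid-point of the curve (half its total length), d the distance between
    each point and its projection."""
    start, vec = curve[:-1], np.diff(curve, axis=0)
    seg_len = np.linalg.norm(vec, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    s = np.empty(len(points))
    d = np.empty(len(points))
    for n, p in enumerate(points):
        # parameter of the foot of the perpendicular on each segment, clipped to [0, 1]
        tpar = np.clip(np.sum((p - start) * vec, axis=1) / seg_len ** 2, 0.0, 1.0)
        feet = start + tpar[:, None] * vec
        dist = np.linalg.norm(p - feet, axis=1)
        k = np.argmin(dist)                       # nearest segment
        s[n] = cum[k] + tpar[k] * seg_len[k] - cum[-1] / 2.0
        d[n] = dist[k]
    return s, d

t = np.linspace(-1.5, 1.5, 301)
curve = np.column_stack([t, t ** 2])              # a parabolic arch as the curve
pts = np.array([[0.0, 0.5], [1.0, 0.5], [-1.2, 1.8]])
s, d = project_onto_polyline(pts, curve)
vertical = pts[:, 1] - pts[:, 0] ** 2             # vertical residuals, as in Figure 7c
print(np.round(s, 3), np.round(d, 3), np.round(vertical, 3))
```

Comparing `d` with `vertical` shows the point made with Figure 7c. On the flanks of the arch the vertical residual overstates the distance from the curve, because there the curve is far from horizontal.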
5. Interpreting the Guttman effect in two dimensions
Instead of limiting attention to a single-band matrix, as is usually done in the literature, let us now consider a multiband matrix with bands of different widths, such as the one shown in Table 3, where rows B and columns P are supplemental. In both PCA and CA
    | P: columns 1-20    | Q: columns 1-24        | R: columns 1-28            | S: columns 1-30
----+--------------------+------------------------+----------------------------+------------------------------
B1 | | 1 | |
B2 | | 1 | |
B3 | | 1 | |
B4 | | 1 | |
B5 | | 1 | |
B6 | | 1 | |
B7 | | 1 | |
B8 | | 1 | |
B9 | | 1 | |
B10 | | 1 | |
B11 | | 1 | |
B12 | | 1 | |
B13 | | 1 | |
B14 | | 1 | |
B15 | | 1 | |
B16 | | 1 | |
B17 | | 1 | |
B18 | | 1 | |
B19 | | 1 | |
B20 | | 1 | |
----+--------------------+------------------------+----------------------------+------------------------------
C1 |1 |11111 |111111111 |
C2 | 1 | 11111 | 111111111 |
C3 | 1 | 11111 | 111111111 |
C4 | 1 | 11111 | 111111111 |
C5 | 1 | 11111 | 111111111 |
C6 | 1 | 11111 | 111111111 |
C7 | 1 | 11111 | 111111111 |
C8 | 1 | 11111 | 111111111 |
C9 | 1 | 11111 | 111111111 |
C10 | 1 | 11111 | 111111111 |
C11 | 1 | 11111 | 111111111 |
C12 | 1 | 11111 | 111111111 |
C13 | 1 | 11111 | 111111111 |
C14 | 1 | 11111 | 111111111 |
C15 | 1 | 11111 | 111111111 |
C16 | 1 | 11111 | 111111111 |
C17 | 1 | 11111 | 111111111 |
C18 | 1 | 11111 | 111111111 |
C19 | 1 | 11111 | 111111111 |
C20 | 1| 11111| 111111111|
----+--------------------+------------------------+----------------------------+------------------------------
D1 |1 |11111 |111111111 |11111111111
D2 | 1 | 11111 | 111111111 | 11111111111
D3 | 1 | 11111 | 111111111 | 11111111111
D4 | 1 | 11111 | 111111111 | 11111111111
D5 | 1 | 11111 | 111111111 | 11111111111
D6 | 1 | 11111 | 111111111 | 11111111111
D7 | 1 | 11111 | 111111111 | 11111111111
D8 | 1 | 11111 | 111111111 | 11111111111
D9 | 1 | 11111 | 111111111 | 11111111111
D10 | 1 | 11111 | 111111111 | 11111111111
D11 | 1 | 11111 | 111111111 | 11111111111
D12 | 1 | 11111 | 111111111 | 11111111111
D13 | 1 | 11111 | 111111111 | 11111111111
D14 | 1 | 11111 | 111111111 | 11111111111
D15 | 1 | 11111 | 111111111 | 11111111111
D16 | 1 | 11111 | 111111111 | 11111111111
D17 | 1 | 11111 | 111111111 | 11111111111
D18 | 1 | 11111 | 111111111 | 11111111111
D19 | 1 | 11111 | 111111111 | 11111111111
D20 | 1| 11111| 111111111| 11111111111
Table 3 - A multiband data table.
supplemental elements are those that do not take part in the eigenanalysis but are nevertheless projected onto the factors. Their position is thus interpreted like that of all other elements, but they do not influence the construction of the factors. Rather, they may be used to interpret the factors, as external suggestions. Here their use is instrumental: we do not want them to influence the factors, but we use them to set a limit to the scattering of all other elements. We limit our comments here to the pattern of points on the plane spanned by the first two factors.
PCA and CA applied to this data table give the patterns shown in Figure 8 and Figure 9 and in Figure 10 and Figure 11, respectively. This allows a better interpretation of the Guttman effect than is possible with a single-band matrix. Actually, in the band matrix represented in Table 1 the two extreme columns are totally opposed, since no row has a 1 in both. Consequently, the two columns are opposed to each other on the first axis. The same holds for the central columns, which have no row with a 1 in common with the extremes. For this reason they are placed on the first plane as far as possible from both, that is, at one end of the second axis, whereas the two extremes lie at the opposite end. Intermediate columns, which share some rows with 1s with these, take intermediate positions, thus giving the typical arch pattern. The same happens for the rows.
Now, looking at the pattern of both columns (Figure 8) and rows (Figure 9) on the first factor
Figure 8 - The pattern of columns of Table 3 on the first
plane of principal components analysis.
Figure 9 - The pattern of rows of Table 3 on the first
plane of principal components analysis.
Figure 10 - Representation of the four column bands of
Table 3 on the first factor plane of Correspondence
Analysis.
Figure 11 - Representation of the three row bands of
Table 3 on the first factor plane of Correspondence
Analysis.
plane of the PCA of Table 3, one can observe that the bands are represented along different curves: the narrower the band, the closer to the centroid. This means that the first principal plane returns two independent pieces of information, as it should. These are the position of a column or row along a (curvilinear) factor, and the width of the range of that column or row along the factor itself. This is a consequence of the correlation structure among adjacent rows and columns. In particular, the length of the column vectors depends on the correlation of the column with this factor plane. The folding of both the extreme columns and rows toward the centre therefore originates from the lower correlation of these columns with the others. This is made clear in Figure 8, where the narrower bands are closer to the centre than the larger ones.
The same occurs in CA, but in the opposite direction, owing to the centroid property of the representation. The larger bands, which are tied to more items, are represented closer to the centre. The supplemental elements, namely the band with only one 1, are set at the extreme of the graphics, forming a convex hull that contains all the other points. This is evident for both columns (Figure 10) and rows (Figure 11). Moreover, instead of folding, the extreme elements tend to level off along the convex hull, owing to the reduced relation structure towards the ends of the band. Analogous comments apply to the following factors.
Summarising, the band structure of a matrix is represented as an arch pattern and the different
Figure 12 - Polar coordinates on the plane, adjusted to
the distance of the convex hull to the origin.
positions of the points between the centre and the convex hull of the band made of single 1s depend on the width of the band. In this way the two-dimensional representation is fully interpreted. This interpretation entails two facts. First, the Guttman effect has both a theoretical explanation (Benzécri, 1973-82; Baccini et al., 1994) and a practical interpretation: once it is detected, a band structure of the data table should be assumed. Second, both dimensions have a meaning, in terms of the position of the band mid-point and of the band width. Neither can be ignored; rather, both should be taken into account in order to understand the data structure.
Based on these facts, it becomes evident that detrending, as suggested by Gauch and Hill (1980), is meaningless. Gaussian ordinations, in turn, are but a partial representation of a more complex phenomenon, since they do not take into account the information tied to the variance. In the case of vegetation, the band represents the turnover of species along an environmental gradient. Even if the position of the mode may be more or less accurately estimated, the information concerning the range of the gradient where a species may be found is totally lost.
We are now able to interpret correctly the pattern of items once the Guttman effect is detected, and the factor of interest is often the curvilinear one. We may therefore wish to represent this factor as a straight line, together with the information concerning the dispersion of species along it. This calls for a different way of redressing, namely the one depicted in Figure 7b. The proposal of Delicado and Aluja (2003) seems adequate, but some criticisms may justify another suggestion, shown in the next section.
6. A redressing proposal for the Guttman effect in Correspondence Analysis
The use of principal curves by Delicado and Aluja (2003) is quite interesting, since it actually solves the old problem of linearising the non-linear representation of a factor underlying a distribution without introducing any arbitrary bias. Nevertheless, since they aim at finding a curve passing through the centre of the distribution, they assume a kind of symmetry of the scatter along the perpendicular to the curve. This is far from being the case with the Guttman effect. In addition, they do not take into account the known information on the shape of the distribution and its meaning. Finally, the proposed algorithm is quite cumbersome.
Our proposal (Camiz and Polenta, 2003) starts from the existence of a limit to the scattering of elements on the first factor plane of CA, namely the convex envelope of the supplemental lines with only one 1. The idea is to take as first coordinate the position along the curvilinear pattern and, as second coordinate, the relative distance to that envelope (or to the centroid). These
Figure 13 - Linearised representation of the four column
bands of Table 3 using angles and distances ratios as
found in Figure 10.
Figure 14 - Linearised representation of the three row
bands of Table 3 using angles and distances ratios as
found in Figure 11.
two coordinates represent respectively the position and dispersion statistics of the distribution of
the elements on the underlying factor.
In order to obtain such coordinates, we submit to CA a data table to which two sets of virtual elements are added: row-units, each having only one 1 in a different column position, and column-variables, each having only one 1 in a different unit. These two sets are used as supplemental elements in CA and provide, by linear interpolation, the convex hull of the distribution. We then consider the system of polar coordinates on the first factor plane of CA: the distance of a point to the origin, and the angle between the vertical axis and the straight line connecting the point to the origin. In formulae, if x, y are the coordinates of a point on the first two factors, the transformation considered is
$$ \theta = \arctan\frac{x}{y}, \qquad r = \sqrt{x^2 + y^2}. \qquad (5) $$
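The supplemental single-1 elements and the polar transformation can be computed directly from the CA solution. The sketch below uses NumPy, and its function names are hypothetical. A virtual row with a single 1 in column j has the unit vector $e_j$ as its profile, so by the transition formula its principal coordinates are the standard coordinates of column j. The same holds for the virtual columns. Each active row is a weighted average of the column standard coordinates, so all active row points lie inside the envelope of the virtual rows. The angle is computed with `arctan2`, a quadrant-aware form of arctan(x / y).

```python
import numpy as np

def ca_with_supplements(N, k=2):
    """CA of N, plus virtual supplemental rows (a single 1 in each column)
    and columns (a single 1 in each row), projected by the transition formulas."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    A = U[:, :k] / np.sqrt(r)[:, None]       # row standard coordinates
    B = Vt.T[:, :k] / np.sqrt(c)[:, None]    # column standard coordinates
    F, G = A * sv[:k], B * sv[:k]            # active principal coordinates
    # A supplemental row is projected as (row profile) @ B; with a single 1
    # in column j the profile is the unit vector e_j, so it lands on B[j].
    sup_rows = np.eye(N.shape[1]) @ B
    sup_cols = np.eye(N.shape[0]) @ A
    return F, G, sup_rows, sup_cols

def polar(xy):
    """Angle between the vector (x, y) and the vertical axis, and distance
    to the origin; arctan2 is the quadrant-aware arctan(x / y)."""
    return np.arctan2(xy[:, 0], xy[:, 1]), np.hypot(xy[:, 0], xy[:, 1])

i, j = np.indices((15, 15))
table1 = (np.abs(i - j) <= 2).astype(float)      # the band table of Table 1
F, G, sup_rows, sup_cols = ca_with_supplements(table1)
theta, r = polar(F)
theta_env, r_env = polar(sup_rows)
print(np.round(np.column_stack([theta, r]), 3))
print(np.round(np.column_stack([theta_env, r_env]), 3))
```

The sign of the second axis is arbitrary. Flipping it so that the vertex of the arch lies on the positive vertical semi-axis keeps the angles continuous along the arch, because the discontinuity of `arctan2` then falls on the open side of the arch.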
The set of points with the transformed coordinates can be plotted on orthogonal axes, giving a redressed pattern. Strictly speaking, the pattern would be linear only if the curves were circles. Since we are dealing with parabolas, or in practice with some generic convex curve, one should rather scale the distance r by the distance from the centroid to the convex hull along the same direction. Thus, in (5), the relative distance of the point A to the centroid is the ratio $\overline{OA}/\overline{OB}$. In formulae, given a point A with coordinates $(x_A, y_A)$, if B, with coordinates $(x_B, y_B)$, is the intersection of the convex hull with the straight line connecting the origin O and A, formula (5) becomes

$$ \theta = \arctan\frac{x_A}{y_A}, \qquad r = \frac{\sqrt{x_A^2 + y_A^2}}{\sqrt{x_B^2 + y_B^2}} $$
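The relative distance $\overline{OA}/\overline{OB}$ can be sketched by intersecting the ray from O through A with the polyline that joins the virtual supplementary points in order, which is one way to read the linear interpolation mentioned above. This is a hypothetical NumPy sketch: the helper names `ray_hit` and `redress`, the numerical tolerance, and the `atan2(x, -y)` branch for θ are our assumptions, and points whose ray misses the polyline get `nan`.

```python
import numpy as np


def ray_hit(a, hull, tol=1e-12):
    """Point B where the ray from the origin O through a meets the polyline
    `hull` (consecutive vertices joined by segments), or None if it misses."""
    best = None
    for p, q in zip(hull[:-1], hull[1:]):
        d = q - p
        M = np.column_stack([a, -d])      # solve t*a = p + s*d for (t, s)
        if abs(np.linalg.det(M)) < tol:
            continue                      # ray parallel to this segment
        t, s = np.linalg.solve(M, p)
        if t > 0 and -tol <= s <= 1 + tol and (best is None or t < best):
            best = t                      # keep the nearest crossing
    return None if best is None else best * a


def redress(points, hull):
    """Transformed coordinates (theta, OA/OB) for each point A."""
    hull = np.asarray(hull, dtype=float)
    out = []
    for a in np.asarray(points, dtype=float):
        b = ray_hit(a, hull)
        theta = np.arctan2(a[0], -a[1])   # angle from the downward vertical (assumed branch)
        r = np.nan if b is None else np.hypot(*a) / np.hypot(*b)
        out.append((theta, r))
    return np.array(out)


# Illustrative closed square "hull" (first vertex repeated to close it).
square = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1], [-1, -1]], float)
res = redress([[0.5, 0.0], [0.5, 0.5], [0.0, -0.25]], square)  # r: 0.5, 0.5, 0.25
```

In an actual analysis the hull would be the open chain of virtual points, ordered by the column (or unit) that carries their single 1; for points inside the arch a ray from the centroid then crosses it once.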
Figure 15 - The pattern of relevés of Ellenberg’s grassland data on the
plane spanned by the first two factors of Correspondence Analysis.
Figure 16 - The pattern of relevés of Ellenberg’s
grassland data adjusted using the moving average.
Figure 17 - The pattern of relevés of Ellenberg’s
grassland data, adjusted through the second degree
polynomial regression.
This transformation to coordinates $(\theta, r)$, which is simple to implement, gives a nearly linearised pattern when applied to Figure 10 and Figure 11, as shown in Figure 13 and Figure 14 respectively. The representation makes it clear that the method is only approximate: although each band is still represented by a curved line, the adjustment seems acceptable for exploratory purposes. It must be observed that, toward the ends of the factor, the position of points is more confused, since the width of the pattern progressively narrows. This seems to be a limit that no procedure can overcome.
7. An example of application
As an example of the application of the polar coordinates method, we consider the Ellenberg grassland vegetation data table, taken from Müller-Dombois and Ellenberg (1974) and also used by Camiz (1994), where the whole table is reported. The pattern of the relevés on the plane of the first two factors extracted by CA is shown in Figure 15. The Guttman effect is evident, corresponding to a major ecological gradient. Applying the DECORANA adjustments to the table, based either on the moving average of the vertical coordinates or on the residuals of a regression on a second-degree polynomial, yields the patterns represented in Figure 16 and Figure 17, respectively. In both cases the adjustment clearly introduces an enormous bias: in particular, relevés 19 and 25 are set very far from each other and, more generally, most relations, distances above all, are strongly modified. Figure 18 shows the pattern obtained with the proposed method: the reciprocal position of points is indeed much closer to the original.
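For comparison, the two kinds of adjustment mentioned above can be sketched as follows. This is a simplified, hypothetical illustration and not a reimplementation of DECORANA (Hill, 1979), which detrends by segments of the first axis: one function replaces the second-axis scores by the residuals of a second-degree polynomial regression on the first-axis scores, the other subtracts a running mean of the second-axis scores taken in first-axis order. The function names and the window size are arbitrary assumptions.

```python
import numpy as np


def detrend_polynomial(x, y, degree=2):
    """Second-axis scores replaced by the residuals of a polynomial
    regression (degree 2 by default) on the first-axis scores."""
    coef = np.polyfit(x, y, degree)
    return y - np.polyval(coef, x)


def detrend_moving_average(x, y, window=5):
    """Subtract from each second-axis score the mean of the `window` scores
    nearest to it in first-axis order (the window is truncated at the ends)."""
    order = np.argsort(x)
    ys = np.asarray(y, dtype=float)[order]
    half = window // 2
    smooth = np.array([ys[max(0, i - half): i + half + 1].mean()
                       for i in range(len(ys))])
    out = np.empty(len(ys), dtype=float)
    out[order] = ys - smooth              # back to the original point order
    return out


x = np.linspace(-1, 1, 21)
flat = detrend_polynomial(x, 3 * x**2 - 1)  # an exact parabola is flattened to ~0
```

Both adjustments flatten the arch by construction, but they shift each point only along the vertical axis and independently of its neighbours' distances, which is consistent with the distortions visible in Figure 16 and Figure 17.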
Figure 18 - The pattern of relevés of Ellenberg’s grassland data, adjusted by the
Camiz and Polenta (2003) method.
8. Conclusions
It was shown that the Guttman effect, far from being a distortion due to the algorithm of factor analysis, is rather a pattern that is highly informative about a particular data table structure. Indeed, in two dimensions, it conveys two pieces of information: i) that the data are influenced by a major underlying factor that has a unimodal relation with the variables; and ii) the different ranges of the variables along the gradient. For this reason, removing it without a proper technique loses an important part of its meaning. Rather, its rationale (Benzécri, 1973-82; Baccini et al., 1994) can help in building a redressing technique that preserves as much as possible of the information represented on the factor plane.
In comparison with the Gauch and Hill (1980) detrending methods, the proposed redressing by polar coordinates seems to give much better results, since both orders are kept nearly untouched. In comparison with the Gaussian method, our method gives a second dimension, which is missing in Ihm and van Groenewoud (1975); nevertheless, the order given by the Gaussian model, not reported here, differs from ours, and the reason should be investigated further. Likewise, a comparison with the techniques of Johnson and Goodall (1980) and of Delicado and Aluja (2003) should be carried out. We expect that only the latter could give comparable, or perhaps better, results. Our method can be used only provided that introducing supplemental elements with a 1 in only one cell makes sense. A better adjustment, further reducing the curvilinear pattern, would be useful. Likewise, extending the technique so that it takes the further dimensions into account could be an interesting development of this method.
Acknowledgements
The program CANOCO, used to run the Gauch and Hill (1980) detrending, was kindly provided by Cajo ter Braak; the Gaussian model program was kindly provided by Peter Ihm and reviewed by Vanda Tulli. The polar coordinates method was implemented by Giorgia Polenta. Their contributions are gratefully acknowledged.
References
Austin, M.P. (1976a), «On non-Linear Species Response Models in Ordination», Vegetatio, vol. 33 (1): pp.
33-41.
Austin, M.P. (1976b), «Performance of four Ordination Techniques Assuming three Different non-Linear
Response Models», Vegetatio, vol. 33 (1): pp. 43-49.
Austin, M.P. (1980), «Searching for a Model for Use in Vegetation Analysis», Vegetatio, vol. 42: pp. 11-21.
Austin, M.P. (1987), «Models for the Analysis of Species’ Response to Environmental Gradients»,
Vegetatio, vol. 69: pp. 35-45.
Baccini, A., H. Caussinus, and A. de Falguerolles (1994), Diabolic Horseshoes, 9th International Workshop
on Statistical Modelling, Exeter, 11-15 July 1994.
Baxter, M.J. (1994), Exploratory Multivariate Analysis in Archaeology, Edinburgh, Edinburgh University
Press.
Benzécri, J.P. (ed.) (1973-82), L'Analyse des Données, 2 tomes, Paris, Dunod.
Camiz, S. (1991), «Reflections on Spaces and Relationships in Ecological Data Analysis: Effects, Problems,
and Possible Solutions», Coenoses, vol. 6 (1): pp. 3-13.
Camiz, S. (1994), «A Procedure for Structuring Vegetation Tables», Abstracta Botanica, vol. 18 (2): pp.
57-70.
Camiz, S., and G. Polenta (2003), «Effet Guttman: son interprétation et une nouvelle méthode de
redressement», in: Y. Dodge, G. Melfi (eds.) Méthodes et Perspectives en Classification - Actes du Xème
Congrès de la Société Francophone de Classification. Presses Académiques Neuchâtel: pp. 91-94.
Delicado, P. (2001), «Another Look at Principal Curves and Surfaces», Journal of Multivariate Analysis,
vol. 77: pp. 84-116.
Delicado, P., and T. Aluja (2003), «Principal Curves for Correcting the Horseshoe Effect in
Correspondence Analysis», International Conference on Correspondence Analysis and Related
Methods, (CARME 2003), Abstracts, Barcelona, Universidad Pompeu Fabra: p. 23.
Delicado, P., and M. Huerta (2003), «Principal Curves of Oriented Points: Theoretical and Computational
Improvements», Computational Statistics, vol. 18: pp. 293-315.
Gauch, H.G. jr., and G.B. Chase (1974), «Fitting the Gaussian Curve to Ecological Data», Ecology, vol. 55:
pp. 1377-1381.
Gauch, H.G. jr., G.B. Chase, and R.H. Whittaker (1974), «Ordination of Vegetation Samples by Gaussian
Species Distribution», Ecology, vol. 55: pp. 1382-1390.
Gauch, H.G. jr., and M.O. Hill (1980), «Detrended Correspondence Analysis: an Improved Ordination
Technique», Vegetatio, vol. 42: pp. 47-58.
Gauch, H.G. jr., and R.H. Whittaker (1972), «Comparison of Ordination Techniques», Ecology, vol. 53 (5):
pp. 868-875.
Gauch, H.G. jr., R.H. Whittaker, and T.R. Wentworth (1977), «A Comparative Study of Reciprocal
Averaging and other Ordination Techniques», Journal of Ecology, vol. 65: pp. 157-174.
Goodall, D.W., and R.W. Johnson (1982), «Non-Linear Ordination in Several Dimensions. A Maximum
Likelihood Approach», Vegetatio, vol. 48: pp. 197-208.
Goodall, D.W., and R.W. Johnson (1987), «Maximum Likelihood Ordination - Some Improvements and
Further Tests», Vegetatio, vol. 71: pp. 3-12.
Gower, J.C. (1966), «Some Distance Properties of Latent Root and Vector Methods used in Multivariate
Analysis», Biometrika, vol. 53: pp. 325-338.
Guttman, L. (1953), «A Note on Sir Cyril Burt's Factorial Analysis of Qualitative Data», British Journal
of Statistical Psychology, vol. 6: pp. 21-24.
Hastie, T., and W. Stuetzle (1989), «Principal Curves», Journal of the American Statistical Association,
vol. 84 (406): pp. 502-516.
Hill, M. O. (1979), DECORANA - a FORTRAN Program for Detrended Correspondence Analysis and
Reciprocal Averaging, Ithaca (N.Y.), Cornell University.
Ihm, P., and H. van Groenewoud (1975), «A Multivariate Ordering of Vegetation Data Based on Gaussian
Type Gradient Response Curves», Journal of Ecology, vol. 63: pp. 767-777.
Ihm, P., and H. van Groenewoud (1984), «Correspondence Analysis and Gaussian Ordination», in:
Chambers, J.M. et al. (eds.), Compstat Lectures: pp. 5-60.
Johnson, R.W. and D.W. Goodall (1980), «A Maximum Likelihood Approach to non-Linear Ordination»,
Vegetatio, vol. 41 (3): pp. 133-142.
Kégl, B., A. Krzyzak, T. Linder, and K. Zeger (2000), «Learning and Design of Principal Curves», IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22 (3): pp. 281-297.
Kenkel, N.C., and L. Orlóci (1986), «Applying Metric and Nonmetric Multidimensional Scaling to Ecological
Studies: some New Results», Ecology, vol. 67 (4): pp. 919-928.
Kessel, S.R., and R.H. Whittaker (1976), «Comparison of three Ordination Techniques», Vegetatio, vol.
32 (1): pp. 21-29.
Kruskal, J.B. (1964a), «Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric
Hypothesis», Psychometrika, vol. 29: pp. 1-27.
Kruskal, J.B. (1964b), «Nonmetric Multidimensional Scaling: a Numerical Method», Psychometrika, vol.
29: pp. 115-129.
Müller-Dombois, D., and H. Ellenberg (1974), Aims and Methods of Vegetation Ecology, New York, John
Wiley & Sons.
Noy-Meir, I. (1971), «Multivariate Analysis of the Semi-arid Vegetation in South-Eastern Australia: Nodal
Ordination by Component Analysis», Proceedings of the Ecological Society of Australia, vol. 6: pp.
159-193.
Orlóci, L. (1978), Multivariate Analysis in Vegetation Research, Den Haag, Junk.
Peet, R.K., R.G. Knox, J.S. Case, and R.B. Allen (1988), «Putting Things in Order: the Advantages of
Detrended Correspondence Analysis», American Naturalist, vol. 131 (6): pp. 924-937.
Prentice, I.C. (1980), «Vegetation Analysis and Order Invariant Gradient Models», Vegetatio, vol. 42: pp.
27-34.
ter Braak, C. J. F. (1988), CANOCO: a FORTRAN program for canonical community ordination by
[partial] [detrended] [canonical] correspondence analysis, principal component analysis and
redundancy analysis (version 2.1), Wageningen (Netherlands), GLW Ministerie van Landbouw en
Visserij.
Van Rijckevorsel, J. (1987), The Application of Fuzzy Coding and Horseshoes in Multiple Correspondence
Analysis, Leiden, DSWO Press.
Wartenberg, D., S. Ferson, and F.J. Rohlf (1987), «Putting Things in Order: a Critique of Detrended
Correspondence Analysis», American Naturalist, vol. 129 (3): pp. 435-448.
Watson, G.N. (1933), «Notes on Generating Functions of Polynomials: Hermite Polynomials», Journal of
the London Mathematical Society, vol. 8: pp. 194-199.