On the Empirical Performance of Non-Metric Multidimensional Scaling in Vegetation Studies
V. K. Salako (1), A. Adebanji (2) and R. Glèlè Kakaï (1)
(1) Faculty of Agronomic Sciences, University of Abomey-Calavi, 01 BP 526, Cotonou, Bénin
Email: gleleromain@yahoo.fr
(2) Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
Email: tinuadebanji@gmail.com
ABSTRACT
Non-metric multidimensional scaling (NMDS) is widely used as a routine ordination method in
vegetation studies. Its use in statistical software often requires choosing among several options
on which the accuracy of the results depends. This study focuses on the combined effect of sample
size, similarity/dissimilarity index, data standardization and structure of the data matrix
(abundance or binary) on NMDS efficiency, based on real data from the Lama Forest Reserve in
southern Bénin. Spearman's rank correlation coefficient and the s-stress were used as assessment
criteria. All four factors were found to influence the efficiency of NMDS, and standardization of
samples (plots) to equal totals gave the best results among the standardization procedures
considered. The Jaccard and Sorensen similarity/dissimilarity indexes performed equally well
whatever the nature of the matrix. With binary matrices, however, the Sokal and Michener
similarity index performed better. A quadratic relationship was noted between s-stress and sample
size. A lower optimal sample size was observed for binary matrices (75 plots) than for abundance
matrices (90 plots).
Keywords: Non-metric multidimensional scaling, efficiency, vegetation studies.
Mathematics Subject Classification: 91C15, 68U20
1. INTRODUCTION
In vegetation studies, ordination aims at sorting samples and/or species along a few axes which must
represent the main compositional gradients in the data set, using either abundance or
presence/absence data (Økland, 1996). It seeks a parsimonious representation of samples and/or
species in a space of low dimensionality. Parsimony in this context implies that distances between
samples and/or species in ordination space optimally represent their original dissimilarities (between
samples or species) in variable space, in some defined sense (Kenkel and Orloçi, 1986). Common
goals of ordination methods in vegetation studies are: description and recognition of vegetation
distribution patterns, identification of plant’s communities and examination of plants and communities
distribution in relation to environmental factors and gradients (Kenkel and Orloçi, 1986; Kent and
Ballard, 1988; Podani, 2006).
International Journal of Applied Mathematics and Statistics,
Int. J. Appl. Math. Stat.; Vol. 36; Issue No. 6; Year 2013,
ISSN 0973-1377 (Print), ISSN 0973-7545 (Online)
Copyright © 2013 by CESER Publications
www.ceser.in/ijamas.html
www.ceserp.com/cp-jour
www.ceserpublications.com
The evolution of successive techniques, each considered, at least by its originators, to be an
improvement on previous work, has resulted in complex and diverse methods. Instead of helping
users, this has often caused confusion and difficulty for students and even for more experienced
workers, particularly in choosing and evaluating methods (Kent and Ballard, 1988). Several papers
have stressed, however, that all ordination methods in current use are burdened with defects, such
as the potential appearance of artifactual or distorted axes (Kenkel and Orlóci, 1986; Minchin,
1987; Økland, 1990), and that none of them is robust under all circumstances (Kenkel and Orlóci,
1986).
Nonetheless, non-metric multidimensional scaling (NMS or NMDS) is undoubtedly the most widely
accepted and routinely used ordination technique that utilizes ordinal information (Podani,
2005). Previous studies have discussed its advantages and disadvantages, sometimes with
contrasting conclusions (Gauch et al., 1981; Kenkel and Orlóci, 1986; Digby and Kempton, 1987;
Clarke, 1993; Legendre and Legendre, 1998; Gordon, 1999; Podani, 2000, 2005, 2006).
Nevertheless, most authors agree that NMDS and its variants (e.g. local NMDS: Sibson, 1972;
Prentice, 1977; hybrid MDS, HMDS: Faith et al., 1987) represent a good alternative to metric
procedures such as principal components analysis and correspondence analysis.
Application of NMDS starts with the computation of the matrix of dissimilarities/similarities
among a set of items in a multidimensional space, which is the first step of most multivariate
analyses (McCune and Grace, 2002). This first step is pivotal to the performance of NMDS, and
information not captured at this stage cannot be expressed in the results. Several similarity
indexes are available in the literature (Podani, 2001; McCune and Grace, 2002; Chessel et al.,
2004; Choi, 2008). McCune and Grace (2002) recommend using the quantitative version of the
Sørensen coefficient with reference to the principle of the method, while Palm (2003) considered
the similarity index of Sokal and Michener, which takes into account both the co-presence and the
co-absence of items. In addition, similarity indexes lose their sensitivity at large environmental
distances, although some studies have shown that Sørensen and Jaccard distances are less affected
by this (McCune and Grace, 2002). As these indexes express similarity in different ways, they may
result in different ordinations. Standardization also affects results from metric and non-metric
scaling (Kenkel and Orlóci, 1986; Faith et al., 1987). Standardization is a kind of data
transformation which, ecologically, aims to make distance measures work better, reduce the effect
of total quantity (sample unit totals) so as to focus on relative quantities, equalize (or
otherwise alter) the relative importance of common and rare species, or emphasize informative
species at the expense of uninformative ones (McCune and Grace, 2002). Standardization should
therefore be of importance in ordination, mainly when calculating dissimilarities or similarities
(Faith et al., 1987).
As mentioned above, NMDS can be applied to either binary (presence-absence) or abundance data
(Økland, 1996; McCune and Grace, 2002). We hypothesize that the additional information provided by
abundance data results in more reliable ordination than presence-absence data. It is therefore
useful to investigate whether this holds for NMDS. Furthermore, sample size is a key part of
sampling design. Several studies have reported that an increase in sample size invariably
results in improved estimation efficiency (Braun-Blanquet, 1964; Eckblad, 1991; McCune and Lesica,
1992; Condit et al., 1996). The effect of sample size on the performance of NMDS constitutes an
important issue in vegetation studies.
The purpose of this study was to analyze the relative performance of NMDS in vegetation studies,
focusing on the effect of sample size, type of data (binary or abundance), similarity (or
dissimilarity) measure and data standardization. The study hypotheses were as follows: (i) an
increase in sample size improves the efficiency of NMDS; (ii) for a given sample size and type of
data, similarity indexes do not affect NMDS efficiency; (iii) abundance matrices result in more
accurate ordination; and (iv) data standardization improves the efficiency of NMDS.
2. METHODOLOGY
2.1. Data collection
The data used in this study were obtained from Bonou et al. (2009) and relate to the
identification of plant communities in the Lama Forest Reserve (LFR). They consist of a
presence-absence matrix of 31 species recorded in 100 plots of 0.15 ha. The LFR, under protection
since 1946, is located in southern Benin in the Dahomey Gap, between 6°55' and 7°00' latitude
North and 2°04' and 2°12' longitude East. It covers approximately 16,250 hectares. The original
vegetation was a dense semi-deciduous forest established on 4,777 ha composed of 1,900 ha of dense
forest, the remaining area consisting of fallows (Bonou et al., 2009). This matrix (denoted M) was
subjected to non-metric multidimensional scaling, which identified four plant communities: young
fallow, old fallow, land typical dense forest and degraded dense forest (Bonou et al., 2009). The
abundance data of species were also considered.
2.2. Simulation design
We considered abundance data matrices of different sizes submitted to NMDS. These matrices were
obtained using the bootstrap resampling method of Efron and Tibshirani (1993). The factors
considered in the simulation design were the nature of the data, the sample size, the similarity
index and the type of data standardization.
Four sample sizes (25, 50, 75 and 100 plots) were considered by truncating (or not) the original
data set. Moreover, two types of data matrix were considered: binary and abundance. The binary
matrices were derived from the abundance data matrix by replacing all non-zero values with 1. For
binary data, the Sokal and Michener (1958), Sorensen (1948) and Jaccard (1912) similarity indexes
were computed for two given plots (i and h) using the following formulas:
$$ \text{Sorensen similarity} = \frac{2a}{2a + b + c} $$
$$ \text{Jaccard similarity} = \frac{a}{a + b + c} $$
$$ \text{Sokal \& Michener} = \frac{a + d}{a + b + c + d} $$
In the above formulas, a = number of shared or common species between plots i and h; b = number
of species which are exclusive to plot i; c = number of species which are exclusive to plot h; d =
number of species absent both in plot i and h.
Dissimilarities were derived from similarities using the formula (Gower and Legendre, 1986):
$$ d_{ih} = 1 - s_{ih} $$
where $d_{ih}$ = dissimilarity between plots i and h and $s_{ih}$ = similarity between the two plots.
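The three binary indexes and the conversion to a dissimilarity can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names and the two six-species plots are hypothetical, and pairs of plots with no species at all (a + b + c = 0) are not handled.

```python
def contingency(plot_i, plot_h):
    """Return (a, b, c, d) for two presence-absence vectors of equal length."""
    if len(plot_i) != len(plot_h):
        raise ValueError("both plots must list the same species")
    pairs = list(zip(plot_i, plot_h))
    a = sum(1 for x, y in pairs if x and y)          # shared species
    b = sum(1 for x, y in pairs if x and not y)      # exclusive to plot i
    c = sum(1 for x, y in pairs if not x and y)      # exclusive to plot h
    d = sum(1 for x, y in pairs if not x and not y)  # absent from both plots
    return a, b, c, d


def sorensen(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def jaccard(a, b, c, d):
    return a / (a + b + c)


def sokal_michener(a, b, c, d):
    return (a + d) / (a + b + c + d)


def dissimilarity(similarity):
    """Dissimilarity taken as the complement of the similarity."""
    return 1 - similarity


# Two hypothetical plots described over six species
plot_i = [1, 1, 1, 0, 0, 0]
plot_h = [1, 1, 0, 1, 0, 0]
counts = contingency(plot_i, plot_h)
for index in (sorensen, jaccard, sokal_michener):
    s = index(*counts)
    print(index.__name__, round(s, 3), round(dissimilarity(s), 3))
```

Only the Sokal and Michener index uses d, so it is the only one of the three that changes when species absent from both plots are added to the matrix.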
For abundance data, two dissimilarity indexes were considered: the Sorensen and Jaccard
dissimilarity indexes. The Sorensen distance, also known as the Bray-Curtis coefficient, is the
proportion of the total abundance of the two plots that is not shared. For two plots i and h,
$D_{i,h}$ was computed as follows (Bray and Curtis, 1957):
$$ D_{i,h} = \frac{\sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj}} $$
where $a_{ij}$ and $a_{hj}$ are the abundances of species j in plots i and h respectively; p = total number of species.
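Jaccard dissimilarity is the proportion of the combined abundance that is not shared (Jaccard, 1901):
$$ JD_{i,h} = \frac{2 \sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj} + \sum_{j=1}^{p} |a_{ij} - a_{hj}|} $$
where $a_{ij}$ and $a_{hj}$ are defined as in $D_{i,h}$.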
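As a minimal sketch, the two abundance-based dissimilarities can be written as below. It assumes the usual forms of the coefficients (sum of absolute differences over the summed totals for Bray-Curtis, and twice that sum over the totals plus the sum for the quantitative Jaccard); the function names and the two plots are hypothetical.

```python
def bray_curtis(plot_i, plot_h):
    """Sorensen (Bray-Curtis) dissimilarity between two abundance vectors."""
    diff = sum(abs(x - y) for x, y in zip(plot_i, plot_h))
    total = sum(plot_i) + sum(plot_h)
    return diff / total


def jaccard_quantitative(plot_i, plot_h):
    """Quantitative Jaccard dissimilarity between two abundance vectors."""
    diff = sum(abs(x - y) for x, y in zip(plot_i, plot_h))
    total = sum(plot_i) + sum(plot_h)
    return 2 * diff / (total + diff)


# Hypothetical abundances of four species in two plots
plot_i = [10, 0, 5, 3]
plot_h = [6, 4, 5, 0]
print(bray_curtis(plot_i, plot_h))           # 11 / 33
print(jaccard_quantitative(plot_i, plot_h))  # 22 / 44
```

Both functions assume that at least one of the two plots is non-empty; otherwise the denominators are zero.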
Four standardization techniques were used. The first is species adjustment to equal maximum
abundances (SPM):
$$ b_{ij} = \frac{a_{ij}}{a_{\max j}} $$
where $b_{ij}$ is the standardized value of $a_{ij}$; $a_{ij}$ = abundance of species j in plot i; $a_{\max j}$ = maximum abundance of species j in the matrix.
The second technique is sample standardization to equal totals (SAT):
$$ b_{ij} = \frac{a_{ij}}{\sum_{j=1}^{p} a_{ij}} $$
where $b_{ij}$ and $a_{ij}$ are as above; p = total number of species in the matrix.
The last two techniques were the Bray-Curtis successive double standardization, i.e. SPM followed
by SAT (DBL1), and the inverse Bray-Curtis successive double standardization, i.e. SAT followed by SPM (DBL2).
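The four standardizations can be sketched as follows for a matrix stored as a list of plots (rows) by species (columns). The function names and the small matrix are hypothetical, and mapping an all-zero species or an empty plot to zeros is an assumption of this sketch, not something stated in the text.

```python
def spm(matrix):
    """Species adjustment to equal maxima: divide each column by its maximum."""
    col_max = [max(col) for col in zip(*matrix)]
    return [[x / m if m else 0.0 for x, m in zip(row, col_max)] for row in matrix]


def sat(matrix):
    """Sample standardization to equal totals: divide each row by its total."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


def dbl1(matrix):
    """Double standardization: SPM followed by SAT."""
    return sat(spm(matrix))


def dbl2(matrix):
    """Inverse double standardization: SAT followed by SPM."""
    return spm(sat(matrix))


# Hypothetical abundance matrix: 2 plots x 3 species
matrix = [[4, 0, 2],
          [2, 6, 2]]
print(spm(matrix))   # every species now peaks at 1
print(sat(matrix))   # every plot now sums to 1
```

The order of the two steps matters: after DBL1 the plot totals are equal but species maxima are not, whereas after DBL2 the species maxima are equal but plot totals are not.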
2.3. Data analysis
For abundance data, each of the 4 sample sizes was combined with the 5 standardization options
(the 4 techniques above plus no standardization) and each of the two dissimilarity indexes.
Forty combinations (4×5×2) were therefore examined. Because standardization did not apply to
binary data, each sample size was combined only with each of the 3 similarity indexes, i.e.
twelve (4×3) combinations for binary matrices. In total, fifty-two combinations of sample size,
dissimilarity index, type of data standardization and type of data matrix were analyzed.
For each combination, 500 replications were generated using the bootstrap technique.
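The bookkeeping of this design can be illustrated with a short Python sketch. The label strings, function names and toy matrix are hypothetical. Drawing plots with replacement is one plausible reading of the bootstrap step, since the text does not spell out how resampling and truncation were combined.

```python
import itertools
import random

SAMPLE_SIZES = [25, 50, 75, 100]
STANDARDIZATIONS = ["0", "SPM", "SAT", "DBL1", "DBL2"]
ABUNDANCE_INDEXES = ["Sorensen", "Jaccard"]
BINARY_INDEXES = ["Sorensen", "Jaccard", "Sokal-Michener"]

# 4 x 5 x 2 = 40 combinations for abundance data, 4 x 3 = 12 for binary data
abundance_combos = list(itertools.product(SAMPLE_SIZES, STANDARDIZATIONS, ABUNDANCE_INDEXES))
binary_combos = list(itertools.product(SAMPLE_SIZES, BINARY_INDEXES))


def bootstrap_plots(matrix, size, rng):
    """Draw `size` plots (rows) with replacement from the abundance matrix."""
    return [list(rng.choice(matrix)) for _ in range(size)]


def to_binary(matrix):
    """Replace every non-zero abundance with 1."""
    return [[1 if x else 0 for x in row] for row in matrix]


# Hypothetical 3-plot matrix standing in for the real 100 x 31 data set
toy = [[3, 0, 1], [0, 2, 2], [5, 1, 0]]
rng = random.Random(0)            # fixed seed so that the draw is repeatable
replicate = bootstrap_plots(toy, 5, rng)
print(len(abundance_combos) + len(binary_combos))
print(to_binary(replicate))
```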
The basic assumption of NMDS is that for a good ordination, there should be a rank-order relationship
between inter-sample dissimilarity and inter-sample distance in the ordination space (Fasham, 1977;
McCune and Grace, 2002). This implies that the more similar two samples are, the closer they should
be in the ordination space. Based on this assumption, the Spearman rank correlation (Rs) was used
as a criterion of efficiency. The Spearman rank correlation measures the monotonic relationship
between dissimilarities and ecological distances (computed as the Euclidean distance between
samples in the ecological space):
$$ R_s = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m(m^2 - 1)} $$
where $d_i$ = difference between the rank from dissimilarities and the rank from Euclidean
distances for observation i (i = 1 to m; m = total number of pairs of plots for which similarity
indexes were computed). In general Rs ranges from -1 to 1, but under the basic assumption of NMDS
it is expected to range from 0 to 1 in this study.
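A minimal Python sketch of the rank-difference form of Rs is given below. The helper names and the four example values are hypothetical. Tied values receive average ranks, and the closed-form expression is exact only when there are no ties.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result


def spearman_rs(dissimilarities, distances):
    """Rs = 1 - 6 * sum(d_i^2) / (m * (m^2 - 1)) over m pairs of plots."""
    m = len(dissimilarities)
    rank_dis = ranks(dissimilarities)
    rank_dist = ranks(distances)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_dis, rank_dist))
    return 1 - 6 * sum_d2 / (m * (m ** 2 - 1))


# Hypothetical values for four pairs of plots
dissimilarities = [0.1, 0.4, 0.3, 0.8]
ordination_distances = [1.0, 2.5, 3.0, 4.0]
print(spearman_rs(dissimilarities, ordination_distances))
```

In this example two of the four pairs swap ranks, so the sum of squared rank differences is 2 and Rs = 1 - 12/60.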
In addition to the Spearman rank correlation, s-stress value was used as criterion of efficiency. Stress
is the departure from monotonicity in the plot of distance in the original p-dimensional space
(dissimilarity) versus distance in the ordination space (k-dimensional space). The closer the points lie
to a monotonic line, the better the fit and the lower the stress (Kruskal and Carroll, 1969). S-stress is
the squared stress, normalized with the sum of 4th powers of the interpoint distances (Takane and
Young, 1977):
$$ \text{s-stress} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(d_{ij}^2 - \tilde{d}_{ij}^2\right)^2}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}^4} $$
where $d_{ij}$ is the interpoint distance in the reduced k-dimensional space and $\tilde{d}_{ij}$ is the adjusted
distance which satisfies the monotonicity constraint.
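The criterion described above (squared differences between squared distances and squared adjusted distances, normalized by the sum of fourth powers of the interpoint distances) can be sketched as below. The adjusted distances are taken as given, since the monotone regression that produces them is not shown. The `take_root` flag is a hypothetical addition for definitions that report the square root of this ratio; the text does not say whether a root was applied.

```python
import math


def s_stress(distances, adjusted, take_root=False):
    """Normalized squared stress over all pairs of plots.

    distances: interpoint distances in the ordination space, one per pair
    adjusted:  adjusted (monotone) distances for the same pairs
    take_root: hypothetical option, returns the square root of the ratio
    """
    numerator = sum((d ** 2 - t ** 2) ** 2 for d, t in zip(distances, adjusted))
    denominator = sum(d ** 4 for d in distances)
    ratio = numerator / denominator
    return math.sqrt(ratio) if take_root else ratio


# Hypothetical values for two pairs of plots
print(s_stress([1.0, 2.0], [1.0, 1.0]))                  # 9 / 17
print(s_stress([1.0, 2.0], [1.0, 1.0], take_root=True))  # sqrt(9 / 17)
```

A perfect monotone fit, where every adjusted distance equals the corresponding ordination distance, gives a value of 0 under either convention.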
For each combination of factors considered, the Spearman rank correlation and s-stress values were
computed using a set of scripts written in MATLAB (version R2006a).
Boxplots of the Spearman rank correlation and s-stress values for all combinations of dissimilarity
index (or similarity index in the case of presence-absence data) and type of standardization
were generated. The boxplots were inspected visually to select the best similarity index for
binary data and the best combination of dissimilarity index and standardization for abundance data,
i.e. those giving the highest Spearman rank correlation values and the lowest s-stress values. An
analysis of variance was performed in SAS 9.2 to test the effect of sample size on the efficiency
criteria (Spearman rank correlation and s-stress) for the best combinations of factors. When the
effect of sample size was significant, contrast analysis (Everitt, 2002) was performed to model the
relationship between sample size and stress values and to determine the optimal sample size. In
this study, two models were tested:
Linear: $$ \text{Criterion} = \beta_0 + \beta_1\,\text{Size} + \varepsilon $$
Quadratic: $$ \text{Criterion} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{Size}^2 + \varepsilon $$
In the above formulas, Criterion = Spearman rank correlation or s-stress; Size = sample size;
$\beta_0$ indicates the intercept, $\beta_1$ and $\beta_2$ the partial regression slopes, and
$\varepsilon$ the unexplained error associated with the model.
3. RESULTS
3.1. Efficiency of NMDS according to the factors considered
Case of abundance data
Results showed that, irrespective of sample size and type of standardization, the two dissimilarity
measures (Sorensen and Jaccard) performed equally (Figure 1). The dispersion and median values
of the Spearman rank correlation decreased as sample size increased and seemed to
stabilize from 75 plots.
Figure 1. Boxplots of Spearman rank correlation for each combination of dissimilarity index and type of
standardization, with one panel per sample size (25, 50, 75 and 100 plots): case of abundance data.
Legend: On the x-axis, the first letter of each label is the initial of the dissimilarity index (S = Sorensen; J = Jaccard); the
remaining characters give the type of standardization (0 = no standardization; SPM = species adjustment to equal
maximum abundances; SAT = standardization to equal totals; DBL1 = SPM followed by SAT; DBL2 = SAT followed
by SPM). Example: SSPM corresponds to the combination of the Sorensen index and SPM.
Unlike the dissimilarity index and the sample size, the type of standardization had a strong effect
on the Spearman rank correlation. For most of the combinations of factors considered,
standardization to equal totals (SAT) yielded higher Spearman rank correlation values (from 0.939)
than the others. It was followed, in order, by no standardization (0), the inverse
Bray-Curtis double standardization (DBL2), the Bray-Curtis double standardization (DBL1) and
species adjustment to equal maximum abundances (SPM; 0.893). Table 1 shows relatively high
variability for species (column) totals and relatively low variability for plot (row) totals, both
of which increased with sample size. Moreover, species richness and the Shannon diversity index
increased with sample size, in contrast to Pielou evenness.
The same trend was noted for s-stress values (Figure 2): regardless of sample size, the lowest
s-stress values were obtained with SAT. This standardization therefore performed better than the
others. Figures 1 and 2 clearly showed that the s-stress value, which ranged from 0.120 to 0.241,
decreased as the Spearman rank correlation value increased. A closer examination of Figure 2
indicates an increase in s-stress value with sample size. From these descriptive analyses, we
deduced that SAT was the best standardization and that the two dissimilarity indexes (Sorensen and
Jaccard) were indistinguishable in terms of both Spearman rank correlation and s-stress.
Figure 2. Boxplots of s-stress values for each combination of dissimilarity index and type of standardization,
with one panel per sample size (25, 50, 75 and 100 plots): case of abundance data.
Legend: On the x-axis, the first letter of each label is the initial of the dissimilarity index (S = Sorensen; J = Jaccard); the
remaining characters give the type of standardization (0 = no standardization; SPM = species adjustment to equal
maximum abundances; SAT = standardization to equal totals; DBL1 = SPM followed by SAT; DBL2 = SAT followed
by SPM). Example: SSPM corresponds to the combination of the Sorensen index and SPM.
Analysis of variance showed a significant difference only for s-stress, for each of the two
dissimilarity indexes, indicating a significant effect of sample size on s-stress values. The
quadratic model of the relationship between sample size and s-stress values was the most
significant and was therefore retained to determine the optimum sample size (Figure 3): 90 plots,
with an s-stress value of 0.167.
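How a sample size can be read off a quadratic model may be sketched as follows. This is not the contrast analysis run in SAS: it is an ordinary least-squares fit of Criterion = b0 + b1·Size + b2·Size², followed by the stationary point -b1/(2·b2) of the fitted curve. The function names are hypothetical, and the s-stress values are synthetic (generated from a known curve), not the values of the study.

```python
def fit_quadratic(sizes, values):
    """Least-squares estimates (b0, b1, b2) of values = b0 + b1*n + b2*n**2."""
    # Normal equations (X'X) b = X'y for the design columns 1, n, n^2
    powers = [sum(n ** k for n in sizes) for k in range(5)]
    lhs = [[powers[r + c] for c in range(3)] for r in range(3)]
    rhs = [sum(y * n ** r for n, y in zip(sizes, values)) for r in range(3)]
    aug = [row + [v] for row, v in zip(lhs, rhs)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(aug[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (aug[r][3] - known) / aug[r][r]
    return tuple(coef)


def stationary_size(b1, b2):
    """Sample size at which the fitted quadratic curve levels off."""
    return -b1 / (2 * b2)


# Synthetic mean s-stress values lying exactly on a curve that levels off at n = 80
sizes = [25, 50, 75, 100]
values = [0.08 + 0.0016 * n - 0.00001 * n ** 2 for n in sizes]
b0, b1, b2 = fit_quadratic(sizes, values)
print(stationary_size(b1, b2))
```

With a negative quadratic coefficient, the stationary point marks the sample size beyond which the fitted s-stress no longer increases.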
Table 1. Mean (m) and coefficient of variation (cv, %) of the diversity parameters and of the coefficients of
variation of row and column totals of the data matrices, for each sample size.
Sample size (plots)    25              50              75              100
                       m       cv      m       cv      m       cv      m       cv
Cv_rows (%)            44.6    20.1    44.6    14.3    44.9    11.7    45.1    10.1
Cv_columns (%)         141.8   12.4    160.9   9.8     170.3   8.2     176.3   7.3
S                      18.7    17.1    22.8    13.6    25.1    11.7    26.6    10.3
H                      3.1     4.9     3.2     3.4     3.2     2.7     3.2     2.5
Eq                     0.7     4.5     0.7     3.8     0.7     3.3     0.7     3.1
Cv_rows, Cv_columns = coefficients of variation of row (plot) and column (species) totals; S = species richness; H = Shannon diversity index; Eq = Pielou evenness.
Figure 3. Relationship between s-stress-values and sample size for Jaccard and Sorensen
dissimilarity indexes.
Case of binary data
Results from binary matrices showed a decrease in Spearman rank correlation values, from 0.921 (for
the Sorensen similarity index) to 0.907 (for the Jaccard index), as sample size increased
(Figure 4). As with abundance data, the dispersion around the median also decreased as
sample size increased. Furthermore, for a given sample size, the three similarity indexes examined
yielded approximately the same value. S-stress values, however, increased with sample size,
ranging from 0.10 (Sokal and Michener similarity index) to 0.17 (Sorensen or Jaccard index).
The Sokal and Michener similarity index thus yielded the lowest s-stress values whatever the sample
size; it was therefore considered the best and retained for further investigation.
[Figure 3: two panels of s-stress (y-axis, Sstress) against sample size n (x-axis). Jaccard index: fitted curve Sstress = -1E-05 n² + 0.001 n + 0.081, R² = 0.99. Sorensen index: fitted curve Sstress = -1E-05 n² + 0.001 n + 0.080, R² = 0.99.]
Figure 4. Boxplots of (a) Spearman rank correlation and (b) s-stress values for each combination of similarity
index and sample size: case of binary data.
Legend: On the x-axis, the first letters of each label are the initials of the similarity index (S = Sorensen; J = Jaccard;
SM = Sokal and Michener) and the number that follows is the sample size. Example: SM25 corresponds to the
combination of the Sokal and Michener similarity index and a sample size of 25 plots.
Analysis of variance performed to test the effect of sample size indicated a significant difference
only for s-stress values. Both the linear and the quadratic models were highly significant. The
relationship between s-stress values and sample size under the quadratic model was therefore
plotted (Figure 5) and indicated an optimum of 75 plots, with an s-stress value of 0.120.
Figure 5. Relationship between s-stress values and sample size for the Sokal and Michener similarity
index. Fitted curve: Sstress = -1E-05 n² + 0.001 n + 0.067 (R² = 0.99); x-axis: sample size (n); y-axis: Sstress.
4. DISCUSSION AND CONCLUSION
This study complements previous investigations into accurate and robust ways of analyzing
vegetation data. Consistent with Fasham (1977), Kenkel and Orlóci (1986), Faith et al.
(1987) and McCune and Grace (2002), the results showed that the type of standardization greatly
affects NMDS efficiency. Some standardizations improve ordinations whereas others do not.
Standardization to sample totals (SAT) proved to perform best. It thus appears that NMDS performs
better when plots have similar weight (number of trees or species). Species adjustment to equal
maximum abundance (SPM) often results in poor ordination. Applied alone, it was the least
successful standardization and is hence the least recommended. When combined with SAT, however,
SPM is better applied before SAT than after it.
These results contrast somewhat with those of Faith et al. (1987), who found SPM standardization to
perform better than SAT for some dissimilarity indexes such as the Canberra metric and Chi-squared.
This may suggest that the effect of standardization varies with the index under study, since many
dissimilarity coefficients have in-built standardization (Fasham, 1977; Faith et al., 1987).
The choice of dissimilarity/similarity index is also of particular interest when ordinating or
clustering. Podani (2006) stated that the most critical step in selecting the appropriate method
of ordination is the choice of a dissimilarity coefficient, which must be compatible with the
available data. The results of this study clearly showed that NMDS yielded the same result with
the quantitative version of either the Jaccard or the Bray-Curtis dissimilarity coefficient,
indicating that, despite the mathematical difference in their formulas, the two behave similarly.
It should be recalled that the calculation of stress depends only on the rank order of the
similarities and not on their magnitude. Thus, two different similarity coefficients produce the
same MDS ordination if they have the same rank order over all the sample pairs (Podani, 2005). As
a further check, the Spearman rank and Pearson linear correlations between the two coefficients
were computed on a part of the initial matrix M. The values obtained (1 and 0.997 respectively)
confirmed that the two indexes produce the same rank order over all the sample pairs, even though
the dissimilarity values themselves were very different. The same observations hold for binary
matrices. Here, however, the Sokal and Michener similarity index gave the best result. In addition
to the co-presence of species, which is common to all three indexes, this index takes into account
the co-absence of species when computing the similarity of a pair of plots (Palm, 2003). The Sokal
and Michener similarity index can therefore be recommended for use in NMDS when the data matrix is
binary. We thus conclude that similarity indexes do affect NMDS efficiency. However, since only
three indexes were examined in this study, other indexes may perform better than Sokal and
Michener. Indeed, Choi (2008) and Podani (2001) described respectively 76 and 16 indexes for binary
data, and 17 indexes for ratio-scale data.
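The identical rank order of the two abundance-based coefficients can also be checked algebraically. The short derivation below assumes the usual forms of the two coefficients, with N the sum of absolute abundance differences between plots i and h and T the sum of their total abundances (N and T are shorthand introduced here for illustration):

```latex
\[
D_{i,h} = \frac{N}{T}, \qquad
JD_{i,h} = \frac{2N}{T + N} = \frac{2\,(N/T)}{1 + N/T} = \frac{2\,D_{i,h}}{1 + D_{i,h}},
\]
\[
\frac{\mathrm{d}\,JD}{\mathrm{d}D} = \frac{2}{(1 + D)^2} > 0 \quad \text{for } 0 \le D \le 1 .
\]
```

Under these assumptions JD is a strictly increasing function of D, so both coefficients rank the pairs of plots identically (Spearman correlation of 1), while the relationship is not linear, which is consistent with a Pearson correlation slightly below 1.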
The stress value increased with the number of plots, which is consistent with the increase in the
number of objects being scaled, since stress can be viewed as a variance (McCune and Grace, 2002).
Indeed, a larger sample size means more objects to scale and therefore a higher stress value.
However, the lower coefficient of variation obtained as sample size increased suggests that the
larger the sample size, the more accurate the scaling.
Binary matrices were shown to be more efficient than the abundance matrices from which they were
derived. A presence-absence similarity measure may be better than a quantitative measure for
samples of high alpha and beta diversity in the presence of high levels of sampling noise (Kessel
and Whittaker, 1976). Binary matrices yielded high Spearman rank correlation values and low stress
values, probably because differences between pairs of objects are smaller with this type of matrix
than with abundance matrices. Indeed, two plots containing the same species will be viewed as very
close by presence-absence similarity indexes, whereas a slight difference in species abundance can
result in a large distinction with a quantitative (ratio-scale) dissimilarity coefficient. It is
also worth emphasizing that NMDS was originally developed for the analysis of matrices resulting
from experiments in which subjects are asked to make pairwise judgments of similarity or
preference (Schiffman et al., 1981). The increase in species richness S with sample size clearly
indicates that this parameter is closely related to the sample extent, as mentioned by McCune and
Grace (2002) and Stohlgren (2007).
The methodology implemented in this study used a two-dimensional space for the initial
configuration. Normally, the number of dimensions should be determined for each data matrix before
choosing the dimension of the initial configuration (Kruskal, 1964a, b). Indeed, determining the
number of dimensions to use in the ordination space is an important issue in NMDS. With simulated
data this is known a priori, but for real data a better method could be to use the dissimilarity
matrix to calculate the linkages of the minimum spanning tree (Gower and Ross, 1969). However, the
first few dimensions are sufficient to explain most of the variation (Podani, 2001). In addition,
Shepard (1962a) argued strongly for solutions in two dimensions, as these are more readily
interpretable.
5. REFERENCES
Bonou, W., Glèlè Kakaï, R., Assogbadjo, A.E., Fonton, H.N., Sinsin, B., 2009, Characterization of
Afzelia africana Sm. habitat in the Lama Forest reserve of Benin. Forest ecology and management
258, 1084–1092.
Braun-Blanquet, J., 1964, Pflanzensoziologie: Grundzüge der vegetationskunde, 3ed. Springer-
Verlag. Wien, 865 p.
Bray, J.R., Curtis, J.T., 1957, An ordination of the upland forest communities in southern Wisconsin.
Ecological Monographs 27, 325-349.
Chessel, D., Thioulouse, J., Dufour, A. B., 2004, Introduction à la classification hiérarchique. Fiche de
Biostatistique – Stage 7. http://pbil.univ-lyon1.fr/R/stage/stage7.pdf.
Choi, S.S., 2008, Correlation Analysis of Binary Similarity Measures and Dissimilarity Measures,
Doctorate dissertation, Pace University.
Clarke, K.R., 1993, Non-parametric multivariate analyses of changes in community structure.
Australian Journal of Ecology 18, 117-143.
Condit, R., Hubbell, S.P., Lafrankie, J.V., Sukumar, R., Manokaran, R., Foster, R.B., Ashton, P.S.,
1996, Species-area and species-individual relationships for tropical trees: a comparison of three
50-ha plots. Journal of Ecology 84, 549-562.
Digby, P.G.N., Kempton, R.A., 1987, Multivariate analysis of ecological communities. Chapman and
Hall, London, UK.
Eckblad, J.W., 1991, How many samples should be taken? Bioscience 41, 346-348.
Efron, B., Tibshirani, R.J., 1993, An introduction to the bootstrap. Chapman and Hall, New York, New
York, USA.
Everitt, B.S., 2002, Cambridge Dictionary of Statistics, CUP, ISBN 0-521-81099-x
Faith, D.P., Minchin, P.R., Belbin, L., 1987, Compositional dissimilarity as a robust measure of
ecological distance. Vegetatio, 69: 57-68.
Fasham, M.J.R., 1977, A comparison of non-metric multidimensional scaling, principal components
analysis and reciprocal averaging for the ordination of simulated coenoclines and coenoplanes.
Ecology 58, 551-561.
Gauch, H.G., Whittaker R.H., Singer, S.B., 1981, A comparative study of non-metric ordinations.
Journal of Ecology 69, 135-152.
Gordon, A.D., 1999, Classification. 2nd. ed. Chapman and Hall, London, UK.
Gower, J.C., Ross G.J.S., 1969, Minimum spanning trees and single linkage cluster analysis. Applied
Statistics. 18, 54-64.
Gower, J.C., Legendre P., 1986, Metric and Euclidean properties of Dissimilarity Coefficients. Journal
of Classification 3, 5-48.
Jaccard, P., 1901, Etude comparative de la distribution florale dans une portion des Alpes et des
Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 547-579.
Jaccard, P., 1912, The distribution of the flora in the alpine zone. New Phytologist 11, 35-50.
Kenkel, N.C., Orlóci, L., 1986, Applying Metric and Nonmetric Multidimensional Scaling to Ecological
Studies: Some New Results. Ecology, 67(4), 919-928.
Kent, M., Ballard, J., 1988, Trends and problems in the application of classification and
ordination methods in plant ecology. Vegetatio, 78, 109-124.
Kruskal, J.B., 1964a, Multidimensional scaling by optimising goodness of fit to a nonmetric
hypothesis. Psychometrika, 29, 1-27.
Kruskal, J.B., 1964b, Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29,
115-129.
Kruskal, J.B., Carroll, J.D., 1969, Geometrical models and badness of fit functions. Pp. 639-671 in
Krishnaiah, P.K., ed., Multivariate Analysis II. Proceedings of the 2nd International Symposium on
Multivariate. Wright State University, Dayton, Ohio, June 17-22. 1968. Academic Press, New York.
Legendre, P., Legendre, L., 1998, Numerical ecology. 2nd.ed. Elsevier, Amsterdam, NL.
McCune, B., Grace, B.J., 2002, Analysis of Ecological Communities. Oregon, USA. 300p.
McCune, B., Lesica, P., 1992, The trade-off between species capture and quantitative accuracy in
ecological inventory of lichens and bryophytes in forests in Montana. Bryologist, 95, 296-304.
Minchin, P.R., 1987, An evaluation of the relative robustness of techniques for ecological ordination.
Vegetatio, 69, 89-107.
Økland, R.H., 1990, Vegetation ecology: theory, methods and applications with reference to
Fennoscandia. Sommerifeltia Suppl 1, 1-233.
Økland, R.H., 1996, Are Ordination and Constrained Ordination Alternative or Complementary
Strategies in General Ecological Studies? Journal of Vegetation Science, 7(2), 289-292.
Palm, R., 2003, Notes de statistique et d'informatique. Le Positionnement multidimensionnel:
Principes et application. 33 p.
Podani, J., 2000, Introduction to the exploration of multi-variate biological data. Backhuys, Leiden, NL.
Podani, J., 2001, SYN-TAX 2000. Computer Programs for Data Analysis in Ecology and Systematic.
User’s Manual. Budapest, Hungary, 53p.
Podani, J., 2005, Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and
solutions. Journal of Vegetation Science, 16, 497-510.
Podani, J., 2006, Braun-Blanquet’s legacy and data analysis in vegetation science. Journal of
Vegetation Science 17(1), 113-117.
Prentice, I.C., 1977, Non-metric ordination methods in ecology. Journal of Ecology, 65, 85-94.
Schiffman, S.S., Reynolds, M.L., Young, F.W., 1981, Introduction to multidimensional scaling-theory,
methods, and applications. Academic Press, New York, New York, USA.
Shepard, R.N., 1962a, The analysis of proximities. Multidimensional scaling with an unknown
distance function. I. Psychometrika, 27, 125-140.
Sibson, R., 1972, Order invariant methods for data analysis. Journal of the Royal Statistical Society
(London), Series B 34, 311-349.
Sokal, R.R., Michener, C.D., 1958, A statistical method for evaluating systematic relationships.
University of Kansas Science Bulletin, 38, 1409-1438.
Sorensen, T.A., 1948, A method of establishing groups of equal amplitude in plant sociology based on
similarity of species content, and its application to analyses of the vegetation on Danish commons.
Biologiske Skrifter Kongelige Danske Videnskabernes Selskab, 5, 1-34.
Stohlgren, T.J., 2007, Measuring plant diversity. Lessons from the field. Oxford University Press.
New-York. 408p.
Takane, Y., Young, F.W., 1977, Non-metric individual differences multidimensional scaling: an
alternating least squares method with optimal scaling features. Psychometrika, 42(1), 7-67.
Van Laar, A., Akça, A., 2007, Forest mensuration. Springer, Dordrecht, 383 p.