On the empirical performance of non-metric multidimensional scaling in vegetation studies

V. K. Salako¹, A. Adebanji² and R. Glèlè Kakaï¹
¹ Faculty of Agronomic Sciences, University of Abomey-Calavi,
01 BP 526, Cotonou, Bénin
Email: gleleromain@yahoo.fr
² Department of Mathematics, Kwame Nkrumah
University of Science and Technology, Kumasi, Ghana
Email: tinuadebanji@gmail.com
ABSTRACT
Non-metric multidimensional scaling (NMDS) is widely used as a routine method for
ordination in vegetation studies. Its use in statistical software often requires the choice of
several options on which the accuracy of the results depends. This study focuses on the
combined effect of sample size, similarity/dissimilarity indexes, data standardization and
structure of the data matrix (abundance or binary) on NMDS efficiency, based on real data
from the Lama Forest Reserve in southern Bénin. The Spearman rank correlation
coefficient and the s-stress were used as assessment criteria. All four factors were
found to influence the efficiency of NMDS, and standardization of samples (plots) to
equal totals gave the best results among the standardization procedures considered. The
Jaccard and Sorensen similarity/dissimilarity indexes performed equally whatever the nature
of the matrix. However, with binary matrices, the Sokal and Michener similarity index performed
better. A quadratic relationship was noted between s-stress and sample size. A lower
optimal sample size (75 plots) was observed for binary matrices than for abundance
matrices (90 plots).
Keywords: Non-metric multidimensional scaling, efficiency, vegetation studies.
Mathematics Subject Classification: 91C15, 68U20
1. INTRODUCTION
In vegetation studies, ordination aims at sorting samples and/or species along a few axes that
represent the main compositional gradients in the data set, using either abundance or
presence/absence data (Økland, 1996). It seeks a parsimonious representation of samples and/or
species in a space of low dimensionality. Parsimony in this context implies that distances between
samples and/or species in ordination space optimally represent their original dissimilarities
in variable space, in some defined sense (Kenkel and Orlóci, 1986). Common
goals of ordination methods in vegetation studies are the description and recognition of vegetation
distribution patterns, the identification of plant communities and the examination of the
distribution of plants and communities in relation to environmental factors and gradients
(Kenkel and Orlóci, 1986; Kent and Ballard, 1988; Podani, 2006).
International Journal of Applied Mathematics and Statistics,
Int. J. Appl. Math. Stat.; Vol. 36; Issue No. 6; Year 2013,
ISSN 0973-1377 (Print), ISSN 0973-7545 (Online)
Copyright © 2013 by CESER Publications
www.ceser.in/ijamas.html
www.ceserp.com/cp-jour
www.ceserpublications.com
The evolution of successive techniques, each considered, at least by its originators, to be an
improvement on previous work, has resulted in complex and diverse methods. Instead of
helping users, this has often caused confusion and difficulties for students and even for more
experienced workers, particularly in the choice and evaluation of methods (Kent and Ballard,
1988). Several papers have stressed, however, that all ordination methods in current use are
burdened with defects such as the potential appearance of artifactual or distorted axes (Kenkel
and Orlóci, 1986; Minchin, 1987; Økland, 1990) and that none of them is robust under all
circumstances (Kenkel and Orlóci, 1986).
Nonetheless, non-metric multidimensional scaling (NMS or NMDS) is currently the most
widely accepted and routinely used ordination technique that utilizes ordinal information (Podani,
2005). Previous studies have discussed its advantages and disadvantages, sometimes with
contrasting conclusions (Kenkel and Orlóci, 1986; Gauch et al., 1981; Gordon, 1999; Digby and
Kempton, 1987; Clarke, 1993; Legendre and Legendre, 1998; Podani, 2000, 2005, 2006).
Nevertheless, most authors agree that NMDS and its variants (e.g. local NMDS, Sibson, 1972;
Prentice, 1977; hybrid MDS: HMDS, Faith et al., 1987) represent a good alternative to metric
procedures such as principal components analysis and correspondence analysis.
Application of NMDS starts with the computation of the matrix of dissimilarities/similarities among a
set of items in a multidimensional space, which is the first step of most multivariate analyses
(McCune and Grace, 2002). This first step is pivotal to the performance of NMDS, since information
not captured at this stage cannot be expressed in the results. Several similarity indexes are
available in the literature (Podani, 2001; McCune and Grace, 2002; Chessel et al.,
2004; Choi, 2008). McCune and Grace (2002) recommend using the quantitative version of the
Sørensen coefficient with reference to the principle of the method, while Palm (2003) considered the
similarity index of Sokal and Michener, which takes into account both the co-presence and the
co-absence of items. In addition, similarity indexes lose their sensitivity at large environmental
distances, although some studies have shown that the Sørensen and Jaccard distances are less
affected by this (McCune and Grace, 2002). As these indexes express similarity in different ways,
they may result in different ordinations. Standardization also affects results from metric and
non-metric scaling (Kenkel and Orlóci, 1986; Faith et al., 1987). Standardization is a kind of data
transformation which, ecologically, aims to make distance measures work better, to reduce the effect
of total quantity (sample unit totals) so as to focus on relative quantities, to equalize (or
otherwise alter) the relative importance of common and rare species, or to emphasize informative
species at the expense of uninformative ones (McCune and Grace, 2002). Standardization should
therefore be of importance in ordination, mainly when calculating dissimilarities or similarities
(Faith et al., 1987).
As mentioned above, NMDS can be applied to either binary (presence-absence) or abundance data
(Økland, 1996; McCune and Grace, 2002). We hypothesize that the additional information carried by
abundance data results in a more reliable ordination than presence-absence data.
It is therefore useful to investigate whether this holds for NMDS. Furthermore, sample size is a
key part of sampling design. Several studies have reported that an increase in sample size invariably
results in improved estimation efficiency (Braun-Blanquet, 1964; Eckblad, 1991; McCune and Lesica,
1992; Condit et al., 1996). The effect of sample size on the performance of NMDS constitutes an
important issue in vegetation studies.
The purpose of this study was to analyze the relative performance of NMDS in vegetation studies,
focusing on the effect of sample size, type of data (binary and abundance data), data
standardization and similarity (or dissimilarity) measures. The study hypotheses are as follows:
(i) an increase in sample size improves the efficiency of NMDS; (ii) for a given sample size and
type of data, similarity indexes do not affect NMDS efficiency; (iii) abundance matrices result in
more accurate ordinations; and (iv) data standardization improves the efficiency of NMDS.
2. METHODOLOGY
2.1. Data collection
Data used in this study were obtained from Bonou et al. (2009) and relate to the identification of
plant communities in the Lama Forest Reserve (LFR). The data consist of a presence-absence matrix
of 31 species recorded in 100 plots of 0.15 ha. The LFR, under protection since
1946, is located in southern Benin in the Dahomey Gap, between 6°55' and 7°00' latitude North and
2°04' and 2°12' longitude East. It covers approximately 16,250 hectares. The original vegetation was
a dense semi-deciduous forest established on 4,777 ha, of which 1,900 ha is dense forest, the
remaining area consisting of fallows (Bonou et al., 2009). The presence-absence matrix (denoted M)
was subjected to non-metric multidimensional scaling, which resulted in four plant communities:
young fallow, old fallow, land typical dense forest and degraded dense forest (Bonou et al., 2009).
The abundance data of species were also considered.
2.2. Simulation design
We considered abundance data matrices of different sizes submitted to NMDS. These
matrices were obtained using the bootstrap resampling method of Efron and Tibshirani (1993). The
factors considered in this simulation design are the nature of the data, the sample size, the
similarity index and the type of data standardization.
Four values of the sample size (25, 50, 75 and 100 plots) were considered by truncating (or not) the
original data set. Two types of data matrix were considered: binary and abundance
matrices. The binary matrices were derived from the abundance matrices by replacing all non-zero
values with 1. For binary data, the Sokal and Michener (1958), Sorensen (1948) and Jaccard (1912)
similarity indexes were computed for two given plots (i and h) using the following formulas:
\[ \text{Sorensen similarity} = \frac{2a}{2a + b + c} \]
\[ \text{Jaccard similarity} = \frac{a}{a + b + c} \]
\[ \text{Sokal \& Michener similarity} = \frac{a + d}{a + b + c + d} \]
In the above formulas, a = number of shared or common species between plots i and h; b = number
of species which are exclusive to plot i; c = number of species which are exclusive to plot h; d =
number of species absent both in plot i and h.
Dissimilarities were derived from similarities using the formula (Gower and Legendre, 1986):
\[ d_{ih} = 1 - s_{ih} \]
where $d_{ih}$ = dissimilarity between plots i and h and $s_{ih}$ = similarity between the two plots.
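The three binary indexes and their conversion to dissimilarities can be illustrated with a minimal Python sketch. The function names and the two example plots are hypothetical and are not taken from the study. The sketch assumes that at least one species is present in one of the two plots, since the Sorensen and Jaccard ratios are otherwise undefined.

```python
def binary_counts(plot_i, plot_h):
    """Return (a, b, c, d) for two presence-absence vectors of equal length."""
    if len(plot_i) != len(plot_h):
        raise ValueError("both plots must list the same species in the same order")
    pairs = list(zip(plot_i, plot_h))
    a = sum(1 for x, y in pairs if x and y)          # species shared by both plots
    b = sum(1 for x, y in pairs if x and not y)      # species exclusive to plot i
    c = sum(1 for x, y in pairs if y and not x)      # species exclusive to plot h
    d = sum(1 for x, y in pairs if not x and not y)  # species absent from both plots
    return a, b, c, d


def sorensen(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def jaccard(a, b, c, d):
    return a / (a + b + c)


def sokal_michener(a, b, c, d):
    return (a + d) / (a + b + c + d)


def to_dissimilarity(similarity):
    """d_ih = 1 - s_ih."""
    return 1 - similarity


# Hypothetical plots over six species (illustration only, not data from the study).
plot_i = [1, 1, 1, 0, 0, 0]
plot_h = [1, 1, 0, 1, 0, 0]
counts = binary_counts(plot_i, plot_h)
similarities = {
    "Sorensen": sorensen(*counts),
    "Jaccard": jaccard(*counts),
    "Sokal-Michener": sokal_michener(*counts),
}
dissimilarities = {name: to_dissimilarity(s) for name, s in similarities.items()}
```

For these two illustrative plots, a = 2, b = 1, c = 1 and d = 2, so the Sokal and Michener index counts the two jointly absent species as agreement, whereas the Jaccard and Sorensen indexes ignore them.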
For abundance data, two dissimilarity indexes were considered: the Sorensen and the Jaccard
dissimilarity indexes. The Sorensen distance, also known as the Bray-Curtis coefficient, is computed
by dividing the summed absolute differences in abundance between two plots by their total abundance.
For two plots i and h, $D_{i,h}$ was computed as follows (Bray and Curtis, 1957):
\[ D_{i,h} = \frac{\sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj}} \]
where $a_{ij}$ and $a_{hj}$ are the abundances of species j in plots i and h respectively; p = total number of species.
Jaccard dissimilarity is the proportion of the combined abundance that is not shared (Jaccard, 1901):
\[ JD_{i,h} = \frac{2 \sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj} + \sum_{j=1}^{p} |a_{ij} - a_{hj}|} \]
where $a_{ij}$ and $a_{hj}$ are defined as in $D_{i,h}$.
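The two quantitative dissimilarities can be sketched in a few lines of Python. This is an illustration of the formulas as written above, not the code used in the study; the function names and the two abundance vectors are hypothetical, and the sketch assumes that the two plots are not both empty.

```python
def bray_curtis(plot_i, plot_h):
    """Sorensen (Bray-Curtis) dissimilarity between two abundance vectors."""
    unshared = sum(abs(x - y) for x, y in zip(plot_i, plot_h))
    total = sum(plot_i) + sum(plot_h)
    return unshared / total


def jaccard_quantitative(plot_i, plot_h):
    """Quantitative Jaccard dissimilarity between two abundance vectors."""
    unshared = sum(abs(x - y) for x, y in zip(plot_i, plot_h))
    total = sum(plot_i) + sum(plot_h)
    return 2 * unshared / (total + unshared)


# Hypothetical abundances of four species in two plots (illustration only).
plot_i = [10, 0, 5, 3]
plot_h = [6, 4, 5, 0]
d_sorensen = bray_curtis(plot_i, plot_h)          # 11 / 33
d_jaccard = jaccard_quantitative(plot_i, plot_h)  # 22 / 44
```

Both functions share the same numerator term, the summed absolute differences in abundance; only the denominator differs.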
Four techniques of standardization have been used; the first one includes the species adjustment to
equal maximum abundances (SPM):
\[ b_{ij} = \frac{a_{ij}}{a_{\max j}} \]
$b_{ij}$ being the standardized value of $a_{ij}$; $a_{ij}$ = abundance of species j in plot i;
$a_{\max j}$ = maximum abundance of species j in the matrix.
The second technique considered is the samples standardization to equal totals (SAT):
\[ b_{ij} = \frac{a_{ij}}{\sum_{j=1}^{p} a_{ij}} \]
$b_{ij}$ and $a_{ij}$ are the same as above; p = total number of species in the matrix.
The last two techniques were the Bray-Curtis successive double standardization, i.e. SPM followed
by SAT (DBL1), and the inverse Bray-Curtis successive double standardization, i.e. SAT followed by
SPM (DBL2).
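The four standardizations can be sketched as follows, with plots as rows and species as columns. The function names follow the abbreviations used in the text; the small matrix is hypothetical. Returning 0 for a species that never occurs or for an empty plot is an assumption of this sketch, since the source does not say how such cases were handled.

```python
def spm(matrix):
    """Species adjustment to equal maxima: divide each column by its maximum."""
    column_max = [max(column) for column in zip(*matrix)]
    return [[(value / m if m else 0.0) for value, m in zip(row, column_max)]
            for row in matrix]


def sat(matrix):
    """Sample standardization to equal totals: divide each row by its total."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([(value / total if total else 0.0) for value in row])
    return result


def dbl1(matrix):
    """Bray-Curtis double standardization: SPM followed by SAT."""
    return sat(spm(matrix))


def dbl2(matrix):
    """Inverse double standardization: SAT followed by SPM."""
    return spm(sat(matrix))


# Hypothetical abundance matrix: 2 plots (rows) x 3 species (columns).
abundance = [[4, 0, 2],
             [2, 6, 2]]
by_species = spm(abundance)   # every column now has maximum 1
by_plot = sat(abundance)      # every row now sums to 1
```

The order of the two steps matters: for this matrix, `dbl1(abundance)` gives rows that sum to 1, whereas `dbl2(abundance)` gives columns whose maximum is 1, and the two results differ.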
2.3. Data analysis
For abundance data, each of the 4 sample sizes was combined with the 5 types of
standardization (the absence of standardization was also considered) and each of the two
dissimilarity indexes. Forty combinations (4×5×2) were therefore examined. Because standardization
does not apply to binary data, each sample size was combined only with each of the 3 similarity
indexes, giving twelve (4×3) combinations for binary matrices. In total, fifty-two combinations of
sample size, dissimilarity index, type of data standardization and type of data matrix were analyzed.
For each combination, 500 replications were generated using the bootstrap technique.
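The design can be enumerated with a short Python sketch, together with one possible form of the bootstrap step. Resampling whole plots (rows) with replacement is an assumption here, because the source does not detail the resampling scheme; all names and the toy matrix are hypothetical.

```python
import itertools
import random

SAMPLE_SIZES = [25, 50, 75, 100]
STANDARDIZATIONS = ["0", "SPM", "SAT", "DBL1", "DBL2"]
ABUNDANCE_INDEXES = ["Sorensen", "Jaccard"]
BINARY_INDEXES = ["Sorensen", "Jaccard", "Sokal-Michener"]

# 4 x 5 x 2 combinations for abundance data, 4 x 3 for binary data.
abundance_combos = list(itertools.product(SAMPLE_SIZES, STANDARDIZATIONS, ABUNDANCE_INDEXES))
binary_combos = list(itertools.product(SAMPLE_SIZES, BINARY_INDEXES))
total_combos = len(abundance_combos) + len(binary_combos)


def bootstrap_plots(matrix, size, rng):
    """Draw `size` plots (rows) with replacement from an abundance matrix."""
    return [list(rng.choice(matrix)) for _ in range(size)]


def to_binary(matrix):
    """Replace every non-zero abundance with 1."""
    return [[1 if value != 0 else 0 for value in row] for row in matrix]


# Toy abundance matrix (3 plots x 3 species), for illustration only.
toy = [[3, 0, 1], [0, 2, 2], [5, 1, 0]]
rng = random.Random(0)                 # fixed seed so the sketch is repeatable
replicate = bootstrap_plots(toy, 5, rng)
binary_replicate = to_binary(replicate)
```

In a full run, each of the combinations would be evaluated on 500 such replicates.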
The basic assumption of NMDS is that, for a good ordination, there should be a rank-order relationship
between inter-sample dissimilarity and inter-sample distance in the ordination space (Fasham, 1977;
McCune and Grace, 2002). This implies that the more similar two samples are, the closer they should
be in the ordination space. Based on this assumption, the Spearman rank correlation (Rs) was used
as a criterion of efficiency. The Spearman rank correlation measures the monotonic relationship
between dissimilarities and ecological distances (computed as the Euclidean distance between
samples in the ecological space):
\[ R_s = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m(m^2 - 1)} \]
where $d_i$ = difference between the rank from dissimilarities and the rank from Euclidean distances
for observation i (i = 1 to m; m = total number of pairs of plots for which similarity indexes were
computed). Rs always ranges from -1 to 1 but, according to the basic assumption of NMDS, it is
expected to range from 0 to 1 in this study.
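A minimal Python sketch of this criterion is given below. It applies the closed-form expression above, which is exact only when there are no tied ranks; tied values receive average ranks in the sketch, which is an assumption. The dissimilarities and the two-dimensional coordinates are hypothetical.

```python
import math
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result


def spearman_rs(x, y):
    """Rs = 1 - 6 * sum(d_i^2) / (m * (m^2 - 1))."""
    m = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (m * (m * m - 1))


def pairwise_euclidean(points):
    """Euclidean distances for all pairs of points, in combinations() order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


# Hypothetical example: 4 plots give m = 6 pairs, listed in the order
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
example_dissimilarities = [0.2, 0.5, 0.9, 0.4, 0.8, 0.3]
ordination = [(0, 0), (1, 0), (2, 1), (4, 1)]   # made-up 2-D coordinates
example_distances = pairwise_euclidean(ordination)
rs = spearman_rs(example_dissimilarities, example_distances)
```

In this example only two pairs exchange ranks, so the sum of squared rank differences is 2 and Rs = 1 - 12/210.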
In addition to the Spearman rank correlation, s-stress value was used as criterion of efficiency. Stress
is the departure from monotonicity in the plot of distance in the original p-dimensional space
(dissimilarity) versus distance in the ordination space (k-dimensional space). The closer the points lie
to a monotonic line, the better the fit and the lower the stress (Kruskal and Carroll, 1969). S-stress is
the squared stress, normalized with the sum of 4th powers of the interpoint distances (Takane and
Young, 1977):
\[ \text{s-stress} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}^2 - \tilde{d}_{ij}^2 \right)^2}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}^4} \]
where $d_{ij}$ is the interpoint distance in the reduced k-dimensional space and $\tilde{d}_{ij}$ is the adjusted
distance which satisfies the monotonicity constraint.
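The computation can be sketched in Python as follows. The source does not state how the adjusted distances were obtained; the sketch assumes a least-squares monotone regression of the distances on the rank order of the dissimilarities (pool-adjacent-violators), with ties left in their input order. Some implementations regress the squared distances instead, or report the square root of the ratio, so the values below are illustrative only.

```python
def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []                                  # each block is [sum, count]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while a block mean exceeds the mean of the next block.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def adjusted_distances(dissimilarities, distances):
    """Distances adjusted to be monotone in the dissimilarities."""
    order = sorted(range(len(distances)), key=lambda k: dissimilarities[k])
    fitted = monotone_fit([distances[k] for k in order])
    result = [0.0] * len(distances)
    for position, k in enumerate(order):
        result[k] = fitted[position]
    return result


def s_stress(distances, adjusted):
    """Sum of squared differences of squared distances over sum of 4th powers."""
    numerator = sum((d * d - t * t) ** 2 for d, t in zip(distances, adjusted))
    denominator = sum(d ** 4 for d in distances)
    return numerator / denominator


# Hypothetical values for the 6 pairs formed by 4 plots (illustration only).
example_dissimilarities = [0.2, 0.5, 0.9, 0.4, 0.8, 0.3]
example_distances = [1.0, 2.2, 4.1, 1.4, 3.2, 2.0]
adjusted = adjusted_distances(example_dissimilarities, example_distances)
value = s_stress(example_distances, adjusted)
```

Only the pairs with dissimilarities 0.3 and 0.4 violate monotonicity here (distances 2.0 and 1.4), so they are pooled to 1.7 and are the only contributors to the numerator.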
For each combination of factors considered, the Spearman rank correlation and s-stress values were
computed using a set of scripts written in MATLAB (version R2006a).
Boxplots of the Spearman rank correlation and s-stress values for all combinations of dissimilarity
indexes (or similarity indexes in the case of presence-absence data) and types of standardization
were generated. A visual analysis of the boxplots was carried out to select the best similarity
index for binary data and the best combination of dissimilarity index and standardization for
abundance data. This selection was based on the highest values of the Spearman rank correlation and
the lowest values of the s-stress. An analysis of variance was performed in SAS 9.2 to test the
effect of sample size on the efficiency criteria (Spearman rank correlation and s-stress) for the
best combinations of factors. When the effect of sample size was significant, contrast analysis
(Everitt, 2002) was performed to model the relationship between sample size and stress values and
to determine the optimal sample size. In this study, two models were tested:
Linear:
\[ \text{Criterion} = \beta_0 + \beta_1\,\text{Size} + \varepsilon \]
Quadratic:
\[ \text{Criterion} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{Size}^2 + \varepsilon \]
In the above formulas, Criterion = Spearman rank correlation or s-stress; Size = sample size.
$\beta_0$ indicates the intercept, $\beta_1$ and $\beta_2$ the partial regression slopes and
$\varepsilon$ the unexplained error associated with the model.
3. RESULTS
3.1. Efficiency of NMDS according to the factors considered
Case of abundance data
Results showed that, irrespective of sample size and type of standardization, the two dissimilarity
measures (Sorensen and Jaccard) performed equally (Figure 1). The dispersion and median values
of the Spearman rank correlation became lower as sample size increased and seemed to
stabilize from 75 plots.
Figure 1. Box plots of Spearman rank correlation for each combination of dissimilarity index and type of
standardization, for each sample size (panels: 25, 50, 75 and 100 plots): case of abundance data.
Legend: On the x-axis, the first letter of each label is the initial of the dissimilarity index (S = Sorensen;
J = Jaccard); the remaining characters denote the type of standardization (0 = no standardization;
SPM = species adjustment to equal maximum abundances; SAT = standardization to equal totals;
DBL1 = SPM followed by SAT; DBL2 = SAT followed by SPM). Example: SSPM corresponds to the
combination of the Sorensen index and SPM.
Unlike the dissimilarity index and the sample size, the type of standardization greatly affected
the Spearman rank correlation. For most of the combinations of factors considered,
standardization to equal totals (SAT) yielded higher Spearman rank correlation values (from 0.939)
than the others. It was followed, in order, by no standardization (0), the inverse
Bray-Curtis double standardization (DBL2), the Bray-Curtis double standardization (DBL1) and
species adjustment to equal maximum abundances (SPM; 0.893). Table 1 shows
relatively high variability for species (column) totals and relatively low variability for plot
(row) totals, both increasing with sample size. Moreover, species richness and the Shannon
diversity index increased with sample size, in contrast to Pielou evenness.
The same trend was noted for s-stress values (Figure 2): regardless of sample size, the lowest
s-stress values were obtained for SAT. This standardization therefore performed better than the
others. Figures 1 and 2 show that the s-stress value, which ranged from 0.120 to 0.241, decreased
as the Spearman rank correlation increased. A closer examination of Figure 2
reveals an increase in s-stress with sample size. From these descriptive analyses, we deduced
that SAT was the best standardization and that the two dissimilarity indexes (Sorensen and Jaccard)
were indistinguishable for both the Spearman rank correlation and the s-stress.
Figure 2. Boxplots of s-stress values for each combination of dissimilarity index, type of standardization
and sample size (panels: 25, 50, 75 and 100 plots): case of abundance data.
Legend: On the x-axis, the first letter of each label is the initial of the dissimilarity index (S = Sorensen;
J = Jaccard); the remaining characters denote the type of standardization (0 = no standardization;
SPM = species adjustment to equal maximum abundances; SAT = standardization to equal totals;
DBL1 = SPM followed by SAT; DBL2 = SAT followed by SPM). Example: SSPM corresponds to the
combination of the Sorensen index and SPM.
Results from the analysis of variance showed a significant difference only for s-stress, for each
of the two dissimilarity indexes, indicating a significant effect of sample size on s-stress values.
The quadratic model of the relationship between sample size and s-stress values was the most
significant and was therefore retained to determine the optimum sample size (Figure 3): 90 plots,
with an s-stress value of 0.167.
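How an optimum is read off a quadratic model can be sketched as follows: the fitted curve has a stationary point at Size = -β1 / (2 β2), where the s-stress stops increasing if β2 is negative. The Python sketch below fits the quadratic by ordinary least squares and locates that point. The s-stress values are made up for illustration, since the coefficients printed in the figures are too heavily rounded to reproduce the reported optima, and the function names are hypothetical.

```python
def fit_quadratic(sizes, values):
    """Least-squares estimates (b0, b1, b2) of values = b0 + b1*n + b2*n**2."""
    scale = float(max(sizes))                    # rescale n to (0, 1] for stability
    rows = [[1.0, n / scale, (n / scale) ** 2] for n in sizes]
    # Normal equations (X'X) c = X'y stored as an augmented 3 x 4 matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * y for r, y in zip(rows, values))]
           for i in range(3)]
    for col in range(3):                         # elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                          # back substitution
        c[i] = (aug[i][3] - sum(aug[i][k] * c[k] for k in range(i + 1, 3))) / aug[i][i]
    return c[0], c[1] / scale, c[2] / scale ** 2  # undo the rescaling


def stationary_size(b1, b2):
    """Sample size at which the fitted quadratic has zero slope."""
    return -b1 / (2 * b2)


sizes = [25, 50, 75, 100]
made_up_stress = [0.08375, 0.105, 0.11375, 0.11]  # hypothetical mean s-stress values
b0, b1, b2 = fit_quadratic(sizes, made_up_stress)
n_star = stationary_size(b1, b2)
stress_at_n_star = b0 + b1 * n_star + b2 * n_star ** 2
```

With these made-up values the curve is 0.05 + 0.0016 n - 0.00001 n², so the stationary point falls at n = 80 with a fitted s-stress of 0.114.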
Table 1. Mean (m) and coefficient of variation (cv, %) of diversity parameters and of the coefficients
of variation of row and column totals of the data matrices, for each sample size.
Sample size              25 plots        50 plots        75 plots       100 plots
Parameter               m      cv       m      cv       m      cv       m      cv
Cv_rows (%)          44.6    20.1    44.6    14.3    44.9    11.7    45.1    10.1
Cv_columns (%)      141.8    12.4   160.9     9.8   170.3     8.2   176.3     7.3
S                    18.7    17.1    22.8    13.6    25.1    11.7    26.6    10.3
H                     3.1     4.9     3.2     3.4     3.2     2.7     3.2     2.5
Eq                    0.7     4.5     0.7     3.8     0.7     3.3     0.7     3.1
S = species richness; H = Shannon diversity index; Eq = Pielou evenness.
Figure 3. Relationship between s-stress values and sample size (n) for the Jaccard and Sorensen
dissimilarity indexes. Fitted curves shown in the figure: Jaccard index,
Sstress = -1E-05 n² + 0.001 n + 0.081 (R² = 0.99); Sorensen index,
Sstress = -1E-05 n² + 0.001 n + 0.080 (R² = 0.99).
Case of binary data
Results from binary matrices showed a decrease in Spearman rank correlation values, from 0.921 (for
the Sorensen similarity index) to 0.907 (for the Jaccard index), as sample size increased
(Figure 4). As with abundance data, the dispersion around the median also decreased as
sample size increased. Furthermore, for a given sample size, the three similarity indexes examined
yielded approximately the same value. In contrast, s-stress values increased with sample size,
ranging from 0.10 (Sokal and Michener similarity index) to 0.17 (Sorensen or Jaccard index).
The Sokal and Michener similarity index thus yielded the lowest s-stress values whatever the sample
size; it was therefore considered the best and retained for further investigation.
Figure 4. Boxplots of (a) Spearman rank correlation and (b) s-stress values for each combination of
similarity index and sample size: case of binary data.
Legend: On the x-axis, the first letters of each label are the initials of the similarity index (S = Sorensen;
J = Jaccard; SM = Sokal and Michener) and the number that follows is the sample size. Example: SM25
corresponds to the combination of the Sokal and Michener similarity index and a sample size of 25 plots.
Analysis of variance performed to test the effect of sample size indicated a significant difference only
for s-stress values. Both the linear and the quadratic models were highly significant. The
relationship between s-stress values and sample size under the quadratic model was therefore plotted
(Figure 5) and indicated an optimum of 75 plots, with an s-stress value of 0.120.
Figure 5. Relationship between s-stress values and sample size (n) for the Sokal and Michener similarity
index. Fitted curve shown in the figure: Sstress = -1E-05 n² + 0.001 n + 0.067 (R² = 0.99).
4. DISCUSSION AND CONCLUSION
This study complements previous investigations on designing accurate and robust approaches to
vegetation data analysis. Consistent with Fasham (1977), Kenkel and Orlóci (1986), Faith et al.
(1987) and McCune and Grace (2002), the results showed that the type of standardization greatly
affects NMDS efficiency. Some standardizations improve ordinations while others do not.
Standardization to sample totals (SAT) performed best. It thus appears that NMDS performs
better when plots have similar weight (number of trees or species). Species adjustment to equal
maximum abundance (SPM) often resulted in poor ordinations. When applied alone, it
was the least successful standardization and is hence the least recommended. When used in
combination with SAT, however, it is preferable to apply SPM before rather than after SAT.
These results contrast somewhat with those of Faith et al. (1987), who found the SPM standardization
to perform better than SAT for some dissimilarity indexes such as the Canberra metric and
Chi-squared. This may suggest that the effect of standardization varies according to the index
under study, since many dissimilarity coefficients have in-built standardization (Fasham, 1977;
Faith et al., 1987).
The choice of dissimilarity/similarity index is also of particular interest when ordinating or clustering.
Podani (2006) stated that the most critical step in selecting the appropriate method of ordination is the
choice of a dissimilarity coefficient that is compatible with the available data. The results of this
study clearly showed that NMDS yielded the same result with the quantitative version of either the
Jaccard or the Bray-Curtis dissimilarity coefficient, indicating that, despite the difference in
their formulas, the two behave alike. It should be recalled that the calculation of stress depends only
on the rank order of the similarities and not on their magnitude. Thus, two different similarity
coefficients can produce the same MDS ordination if they have the same rank order over all
sample pairs (Podani, 2005). As a further check, the Spearman rank and Pearson linear correlations
between the two coefficients were computed on a part of the initial matrix
M. The values obtained (1 and 0.997 respectively) confirmed that the two indexes produce the
same rank order over all sample pairs, even though the dissimilarity values themselves differed
markedly. The same observations hold for binary matrices. Here, however, the Sokal and Michener
similarity index gave the best result. In addition to the co-presence of species, which all three
indexes take into account, this index also takes into account the co-absence of species when
computing the similarity of a pair of plots (Palm, 2003). The Sokal and Michener similarity index
can therefore be recommended for use in NMDS when the data matrix is binary. We thus conclude that
similarity indexes do affect NMDS efficiency. However, since only three indexes were examined in
this study, other indexes may perform better than Sokal and Michener. Choi (2008) and Podani (2001)
described respectively 76 and 16 indexes for binary data and 17 indexes for ratio-scale data.
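The identical rank order of the two quantitative coefficients follows directly from their formulas in Section 2.2. The short derivation below makes this explicit; the shorthand symbols T and A are introduced here for convenience and do not appear in the original notation.

```latex
\[
T = \sum_{j=1}^{p} (a_{ij} + a_{hj}), \qquad
A = \sum_{j=1}^{p} |a_{ij} - a_{hj}|, \qquad
D_{i,h} = \frac{A}{T}, \qquad
JD_{i,h} = \frac{2A}{T + A} = \frac{2\,D_{i,h}}{1 + D_{i,h}}
\]
\[
\frac{\mathrm{d}\,JD}{\mathrm{d}\,D} = \frac{2}{(1 + D)^2} > 0
\]
```

Because JD is a strictly increasing function of D, both coefficients rank all pairs of plots identically, which is consistent with a Spearman correlation of 1, while the nonlinearity of the relationship is consistent with a Pearson correlation slightly below 1. The same argument applies to the binary versions, since the Sorensen similarity equals 2J / (1 + J), where J = a / (a + b + c) is the Jaccard similarity.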
The stress value increased with the number of plots, which is consistent with the increase in the
number of objects being scaled, since stress can be viewed as a variance (McCune and Grace, 2002).
Indeed, an increase in sample size means more objects to scale and therefore a higher stress
value. However, the lower coefficient of variation obtained as sample size increased suggests
that the larger the sample size, the more accurate the scaling.
Binary matrices were shown to be more efficient than the abundance matrices from which they were
derived. A presence-absence similarity measure may perform better than a quantitative measure for
samples of high alpha and beta diversity in the presence of high levels of sampling noise (Kessel and
Whittaker, 1976). Binary matrices yielded high values of the Spearman rank correlation and low stress
values, probably because differences between pairs of objects are smaller with this type of matrix
than with abundance matrices. In fact, two plots containing the same species are viewed as
very close by presence-absence similarity indexes, whereas a slight difference in species
abundance can result in a large distinction with a quantitative (ratio-scale) dissimilarity
coefficient. It is also worth emphasizing that NMDS was originally developed for the analysis of
matrices resulting from experiments in which subjects are asked to make pairwise judgments of
similarity or preference (Schiffman et al., 1981). The increase in species richness S with sample
size clearly indicates that this parameter is closely related to the sample extent, as mentioned by
McCune and Grace (2002) and Stohlgren (2007).
The methodology implemented in this study used a two-dimensional space for the initial configuration.
Normally, the number of dimensions should be determined for each data matrix before choosing the
dimension of the initial configuration (Kruskal, 1964a, b). The determination of the number
of dimensions to use in the ordination space is indeed an important issue in NMDS. With simulated
data, this number is known a priori, but for real data a better method could be to use the
dissimilarity matrix to calculate the linkages of the minimum spanning tree (Gower and Ross, 1969).
However, the first few dimensions are sufficient to explain most of the variation (Podani, 2001).
In addition, Shepard (1962a) strongly argued for solutions in two dimensions, as these are more
readily interpretable.
5. REFERENCES
Bonou, W., Glèlè Kakaï, R., Assogbadjo, A.E., Fonton, H.N., Sinsin, B., 2009, Characterization of
Afzelia africana Sm. habitat in the Lama Forest reserve of Benin. Forest ecology and management
258, 1084–1092.
Braun-Blanquet, J., 1964, Pflanzensoziologie: Grundzüge der vegetationskunde, 3ed. Springer-
Verlag. Wien, 865 p.
Bray, J.R., Curtis, J.T., 1957, An ordination of the upland forest communities in southern Wisconsin.
Ecological Monographs 27, 325-349.
Chessel, D., Thioulouse, J., Dufour, A. B., 2004, Introduction à la classification hiérarchique. Fiche de
Biostatistique – Stage 7. http://pbil.univ-lyon1.fr/R/stage/stage7.pdf.
Choi, S.S., 2008, Correlation Analysis of Binary Similarity Measures and Dissimilarity Measures,
Doctorate dissertation, Pace University.
Clarke, K.R., 1993, Non-parametric multivariate analyses of changes in community structure.
Australian Journal of Ecology 18, 117-143.
Condit, R., Hubbell, S.P., Lafrankie, J.V., Sukumar, R., Manokaran, R., Foster, R.B., Ashton, P.S.,
1996, Species-area and species-individual relationships for tropical trees: a comparison of three
50-ha plots. Journal of Ecology 84, 549-562.
Digby, P.G.N., Kempton, R.A., 1987, Multivariate analysis of ecological communities. Chapman and
Hall, London, UK.
Eckblad, J.W., 1991, How many samples should be taken? Bioscience 41, 346-348.
Efron, B., Tibshirani, R.J., 1993, An introduction to the bootstrap. Chapman and Hall, New York, New
York, USA.
Everitt, B.S., 2002, Cambridge Dictionary of Statistics, CUP, ISBN 0-521-81099-x
Faith, D.P., Minchin, P.R., Belbin, L., 1987, Compositional dissimilarity as a robust measure of
ecological distance. Vegetatio, 69: 57-68.
Fasham, M.J.R., 1977, A comparison of non-metric multidimensional scaling, principal components
analysis and reciprocal averaging for the ordination of simulated coenoclines and coenoplanes.
Ecology 58, 551-561.
Gauch, H.G., Whittaker, R.H., Singer, S.B., 1981, A comparative study of non-metric ordinations.
Journal of Ecology 69, 135-152.
Gordon, A.D., 1999, Classification. 2nd ed. Chapman and Hall, London, UK.
Gower, J.C., Ross, G.J.S., 1969, Minimum spanning trees and single linkage cluster analysis. Applied
Statistics 18, 54-64.
Gower, J.C., Legendre, P., 1986, Metric and Euclidean properties of dissimilarity coefficients. Journal
of Classification 3, 5-48.
Jaccard, P., 1901, Etude comparative de la distribution florale dans une portion des Alpes et des
Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 547-579.
Jaccard, P., 1912, The distribution of the flora in the alpine zone. New Phytologist 11, 35-50.
Kenkel, N.C., Orlóci, L., 1986, Applying Metric and Nonmetric Multidimensional Scaling to Ecological
Studies: Some New Results. Ecology, 67(4), 919-928.
Kent, M., Ballard, J., 1988, Trends and problems in the application of classification and
ordination methods in plant ecology. Vegetatio, 78, 109-124.
Kruskal, J.B., 1964a, Multidimensional scaling by optimising goodness of fit to a nonmetric
hypothesis. Psychometrika, 29, 1-27.
Kruskal, J.B., 1964b, Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29,
115-129.
Kruskal, J.B., Carroll, J.D., 1969, Geometrical models and badness of fit functions. Pp. 639-671 in
Krishnaiah, P.R., ed., Multivariate Analysis II. Proceedings of the 2nd International Symposium on
Multivariate Analysis, Wright State University, Dayton, Ohio, June 17-22, 1968. Academic Press, New York.
Legendre, P., Legendre, L., 1998, Numerical ecology. 2nd ed. Elsevier, Amsterdam, NL.
McCune, B., Grace, J.B., 2002, Analysis of Ecological Communities. Oregon, USA. 300 p.
McCune, B., Lesica, P., 1992, The trade-off between species capture and quantitative accuracy in
ecological inventory of lichens and bryophytes in forests in Montana. Bryologist, 95, 296-304.
Minchin, P.R., 1987, An evaluation of the relative robustness of techniques for ecological ordination.
Vegetatio, 69, 89-107.
Økland, R.H., 1990, Vegetation ecology: theory, methods and applications with reference to
Fennoscandia. Sommerfeltia Supplement 1, 1-233.
Økland, R.H., 1996, Are Ordination and Constrained Ordination Alternative or Complementary
Strategies in General Ecological Studies? Journal of Vegetation Science, 7(2), 289-292.
Palm, R., 2003, Notes de statistique et d'informatique. Le Positionnement multidimensionnel:
Principes et application. 33 p.
Podani, J., 2000, Introduction to the exploration of multi-variate biological data. Backhuys, Leiden, NL.
Podani, J., 2001, SYN-TAX 2000. Computer Programs for Data Analysis in Ecology and Systematics.
User’s Manual. Budapest, Hungary, 53 p.
Podani, J., 2005, Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and
solutions. Journal of Vegetation Science, 16, 497-510.
Podani, J., 2006, Braun-Blanquet’s legacy and data analysis in vegetation science. Journal of
Vegetation Science 17(1), 113-117.
Prentice, I.C., 1977, Non-metric ordination methods in ecology. Journal of Ecology, 65, 85-94.
Schiffman, S.S., Reynolds, M.L., Young, F.W., 1981, Introduction to multidimensional scaling: theory,
methods, and applications. Academic Press, New York, New York, USA.
Shepard, R.N., 1962a, The analysis of proximities. Multidimensional scaling with an unknown
distance function. I. Psychometrika, 27, 125-140.
Sibson, R., 1972, Order invariant methods for data analysis. Journal of the Royal Statistical Society
(London), Series B 34, 311-349.
Sokal, R.R., Michener, C.D., 1958, A statistical method for evaluating systematic relationships.
University of Kansas Science Bulletin, 38, 1409-1438.
Sorensen, T.A., 1948, A method of establishing groups of equal amplitude in plant sociology based on
similarity of species content, and its application to analyses of the vegetation on Danish commons.
Biologiske Skrifter Kongelige Danske Videnskabernes Selskab, 5, 1-34.
Stohlgren, T.J., 2007, Measuring plant diversity. Lessons from the field. Oxford University Press,
New York. 408 p.
Takane, Y., Young, F.W., 1977, Non-metric individual differences multidimensional scaling: an
alternating least squares method with optimal scaling features. Psychometrika, 42(1), 7-67.
Van Laar, A., Akça, A., 2007, Forest mensuration. Springer, Dordrecht, 383 p.