NON-LINEAR SCALING TECHNIQUES FOR UNCOVERING
THE PERCEPTUAL DIMENSIONS OF TIMBRE
John Ashley Burgoyne and Stephen McAdams
Centre for Interdisciplinary Research in Music and Media Technology
Schulich School of Music of McGill University
555 Sherbrooke Street West
Montréal, Québec, Canada H3A 1E3
{ashley,smc}@music.mcgill.ca
ABSTRACT
Seeking to identify the constituent parts of the multidi-
mensional auditory attribute that musicians know as tim-
bre, music psychologists have made extensive use of mul-
tidimensional scaling (MDS), a statistical technique for
visualising the geometric spaces implied by perceived dis-
similarity. MDS is also well known in the machine learn-
ing community, where it is used as a basic technique for
dimensionality reduction. We adapt a popular non-linear
variant of MDS in machine learning, Isomap, for use in
analysing psychological data and re-analyse an earlier ex-
periment on human perception of timbre. Isomap is de-
signed to be better than linear MDS at maintaining the
local relationships of each data point with its neighbours,
and our results show that it can produce a more musically
intuitive timbre space without sacrificing correlation with
those aspects of timbre perception that have already been
discovered. Future work should explore the new timbral
dimensions uncovered by our algorithm.
1. INTRODUCTION
As any computer musician knows, timbre is one of the
most important compositional parameters, and yet it re-
mains one of the most under-theorised. Part of the reason for the relative lack of theory may be that, unlike pitch, timbre is a multi-dimensional auditory attribute, and it is difficult to draw general conclusions about timbre as a whole without first identifying its constituent parts. Nonetheless, there have been a number of
attempts to uncover the underlying dimensionality of tim-
bre space over the past few decades, most based on per-
ceptual experiments with synthesised and recorded tones.
Early experiments with synthetic tones identified spectral
centroid and attack time as primary components of tim-
bre, in addition, at times, to a third dimension that was
more difficult to interpret and dependent on the stimulus
set [4, 5]. Later studies with more sophisticated models
came to similar conclusions, and suggested that the third
component might be a measure of irregularity in the spec-
tral envelope [6, 7]. A recent confirmatory study has veri-
fied these interpretations [2].
All of these studies are based on a statistical technique
known as multidimensional scaling (MDS) [9]. The ba-
sic idea of MDS is to take the set of proximities between
all members of some set of data points, e.g., sample tim-
bres, and to model them as distances in a Euclidean space
of as few dimensions as possible. In the context of timbre,
these proximities are usually taken from psychological ex-
periments in which human subjects have rated their per-
ception of the (dis)similarity between timbre pairs. The
trouble with MDS in this context is that its classical form
was designed to interpret a single set of dissimilarities
among items, not the average over all subjects of an ex-
periment. The first robust solution to this problem was the
INDSCAL algorithm [3], which models a special weight
on each dimension for each subject in the experiment in
order to better fit model distances to the set of empiri-
cal dissimilarities. The more sophisticated CLASCAL al-
gorithm reduces the number of parameters in INDSCAL
by modelling weights not for individual subjects but for a
smaller number of aggregate subject groups, called latent
classes [11]. CLASCAL and its variants are the standard
techniques for analysing timbre spaces today.
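To make the underlying computation concrete, the classical (Torgerson) scaling step at the heart of MDS [9] can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in any of the studies cited here, and the function and variable names are our own.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: embed an n x n matrix of pairwise
    dissimilarities D as points in an n_dims-dimensional Euclidean space."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)         # B is symmetric
    order = np.argsort(eigvals)[::-1][:n_dims]   # largest eigenvalues first
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale             # n x n_dims coordinate matrix
```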
There is another problem with these techniques, how-
ever: being linear, they consider all distances estimated by
the human subjects to be equally reliable and of equal rel-
ative scale. Although some relatively straightforward ex-
tensions to MDS can treat the latter problem, e.g., CON-
SCAL [12], the former requires more aggressive modifica-
tions. One such modification, known as Isomap, replaces
large distances in the original distance matrices with so-
called geodesic distances along a hypothetical manifold
[8]. Previous papers at this conference have demonstrated
that Isomap and its relatives can uncover meaningful mu-
sical relationships that traditional linear MDS will always
miss [1].
This paper combines the CLASCAL and Isomap mod-
els to re-analyse the data from a major study of timbre [7].
Section 2 provides a more detailed explanation of these
algorithms and best practices for interpreting their results.
Section 3 presents the results of our new scaling and com-
pares them to the original study. Section 4 concludes with
suggestions for future applications of non-linear scaling to
the study of musical timbre.
2. CLASCAL AND ISOMAP
2.1. CLASCAL
Traditional MDS was designed to handle a single set of
pairwise proximities only. A number of models have been
presented to adapt MDS for multiple-subject experiments,
of which the most important for studying timbre has been
CLASCAL [11]. The CLASCAL model seeks to min-
imise the approximation error in the following equation:
\[
d_{ijk} \approx \left[ \sum_{r=1}^{R} w_{C(i),r} \, (x_{jr} - x_{kr})^2 \right]^{1/2} \tag{1}
\]
where $d_{ijk}$ is the dissimilarity rating that subject $i$ assigned to stimulus pair $(j, k)$, $R$ is the number of dimensions in the output set, $w_{C(i),r}$ is a special weight for the so-called latent class $C(i)$ to which CLASCAL has assigned subject $i$, and $x_{jr}$ and $x_{kr}$ are the $r$-th components of the output vectors for stimuli $j$ and $k$. Latent classes are meant to represent groups of subjects who pursue similar rating strategies. The number of latent classes used is a compromise between over-parametrisation, e.g., the INDSCAL model, which assigns each subject to its own class, and over-generalisation, e.g., ignoring differences between subjects by taking the average over all dissimilarity matrices. A Monte-Carlo likelihood-ratio technique is used to determine the optimal number of classes.
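A minimal sketch of the distance model in Eq. (1), with hypothetical argument names (x_j and x_k are stimulus coordinate vectors, w_class the weight vector of the subject's latent class), might look as follows; it illustrates the model only and is not the CLASCAL estimation procedure itself.

```python
import numpy as np

def clascal_distance(x_j, x_k, w_class):
    """Model distance of Eq. (1): a Euclidean distance in which dimension r
    is weighted by w_class[r], the weight of the subject's latent class."""
    return np.sqrt(np.sum(w_class * (x_j - x_k) ** 2))
```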
Another problem with traditional MDS is that it as-
sumes all of the variance in a data set can be explained by
dimensions common to all stimuli. This assumption does
not hold for timbres: many include instrument-specific
components such as the sound of the returning hopper in a
harpsichord. A more sophisticated version of CLASCAL
separates these components, known as specificities, using
the following model:
\[
d_{ijk} \approx \left[ \sum_{r=1}^{R} w_{C(i),r} \, (x_{jr} - x_{kr})^2 + v_{C(i)} \, (s_j + s_k) \right]^{1/2} \tag{2}
\]
where $s_j$ and $s_k$ are the specificities for stimuli $j$ and $k$, and $v_{C(i)}$ represents the weight subjects in class $C(i)$ give to specificities when distinguishing timbres [7, 10].
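The specificity terms extend the previous sketch in the obvious way; again the argument names are hypothetical, and the snippet illustrates Eq. (2) rather than any published implementation.

```python
import numpy as np

def clascal_distance_spec(x_j, x_k, s_j, s_k, w_class, v_class):
    """Model distance of Eq. (2): the weighted Euclidean distance of Eq. (1)
    plus a class-weighted contribution from the specificities s_j and s_k."""
    common = np.sum(w_class * (x_j - x_k) ** 2)
    return np.sqrt(common + v_class * (s_j + s_k))
```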
2.2. Isomap
Isomap arose as a solution to the problem of dimensional-
ity reduction for data sets like the famous ‘Swiss roll’ pic-
tured in Figure 1 [8]. Looking at the plot, it is obvious to
a human that the data are arranged on a two-dimensional
plane that has been coiled and presented in three dimen-
sions. This fact is not obvious to traditional MDS, which
strives to preserve every pairwise distance in the set, in-
cluding those between the ends of the roll and the inner or
outer loops. The ingenious solution in Isomap is to throw
away all pairwise distances in the set except those at the
local level, i.e., those in a small region immediately sur-
rounding each point in the data set. These regions can be
selected as a fixed number $k$ of the nearest neighbours to each point in the data set or as those points that fall within a sphere of fixed radius $\varepsilon$ around each point. The other distances are then recomputed using an all-pairs shortest-path algorithm, yielding an approximation of the so-called geodesic distances, or distances in the lower-dimensional form. After these approximate distances are computed, traditional MDS is applied.

(a) Embedded in 3-D (b) Unrolled in 2-D
Figure 1: The ‘Swiss roll’ data set. On the left, the data is presented in its original form. On the right, the data is presented as it should be unrolled for human interpretation. Traditional MDS can never arrive at this solution, however, because it seeks to preserve the distances between the ends of the roll and the inner/outer loops.
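The geodesic step can be sketched as follows, assuming SciPy for the shortest-path search; the $k$-nearest-neighbour variant is shown, and the parameter names are our own.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def isomap_geodesics(D, n_neighbors=5):
    """Replace large dissimilarities with geodesic estimates: keep only each
    point's n_neighbors smallest dissimilarities as graph edges and run an
    all-pairs shortest-path search (Dijkstra) over the resulting graph."""
    n = D.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        nearest = np.argsort(D[i])[1:n_neighbors + 1]   # skip the point itself
        for j in nearest:
            rows.append(i)
            cols.append(j)
            vals.append(D[i, j])
    knn_graph = csr_matrix((vals, (rows, cols)), shape=(n, n))
    # directed=False treats each neighbour link as a two-way edge.
    geodesics = shortest_path(knn_graph, method='D', directed=False)
    return geodesics  # entries are inf if the neighbourhood graph is disconnected

# The final Isomap step would embed these geodesic distances with classical
# MDS, e.g. classical_mds(isomap_geodesics(D), n_dims=3) using the sketch above.
```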
At first glance, the Swiss roll appears to be a funda-
mentally different problem than that of estimating tim-
bre spaces. There is little reason to believe that human
subjects would willfully twist their ratings of the simi-
larities between timbre pairs into more dimensions than
are already present. The larger message of Isomap, how-
ever, is that unless a space is perfectly linear, large dis-
tances in a scaling model can mask important structures
in the data. It seems prudent to check for such structures
in psychological data, and because Isomap is based on
classical MDS, unlike a number of other non-linear scaling
techniques, it lends itself naturally to combination with
CLASCAL. Each subject’s dissimilarity matrix is processed
according to the Isomap algorithm up to the final MDS
step. After this pre-processing is complete, the new dis-
similarity matrices are fed to CLASCAL as usual.
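A sketch of this pre-processing, reusing the isomap_geodesics() function above and assuming a hypothetical clascal_fit() routine for the subsequent CLASCAL analysis:

```python
def preprocess_for_clascal(subject_matrices, n_neighbors=5):
    """Apply the Isomap geodesic step (but not its final MDS step) to each
    subject's dissimilarity matrix; isomap_geodesics() is the sketch above."""
    return [isomap_geodesics(D, n_neighbors=n_neighbors) for D in subject_matrices]

# The transformed matrices would then be analysed with CLASCAL as usual, e.g.
# clascal_fit(preprocess_for_clascal(raw_matrices))  # hypothetical CLASCAL routine
```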
3. EXPERIMENTS AND RESULTS
We did not perform a new perceptual study for this paper,
but rather re-examined data from McAdams et al.’s 1995
study of 88 subjects [7]. We chose to examine the judge-
ments of the professional musicians in the study only, 24
in all, in order to simplify our analysis. The timbres used
in the study overlapped considerably with those used in
[6], which were a set of recorded, FM-synthesised tim-
bres designed to mimic traditional musical instruments.
In addition to 12 timbres from this set, McAdams et al.,
following the legacy of [5], also synthesised 6 hybrid tim-
bres, e.g., the oboleste, a combination of the perceptual
features of oboe and celesta sounds. Each subject had an
opportunity to rate the dissimilarity between all 153 pairs
of these 18 timbres.
                 Isomap (non-linear)
          Dim. 1   Dim. 2   Dim. 3
Rise      -0.87    -0.45    -0.72
S.C.       0.03    -0.79     0.07
Flux      -0.15     0.18     0.22

Linear    Rise     S.C.     Flux    Isomap 1   Isomap 2   Isomap 3
Dim. 1    -0.94    -0.00     0.02      0.92       0.42       0.82
Dim. 2     0.44    -0.14     0.10     -0.35       0.18      -0.67
Dim. 3     0.30     0.09     0.49     -0.53      -0.21       0.24
Dim. 4     0.26     0.91    -0.15     -0.15      -0.88      -0.16

Table 1: Correlation coefficients of significant dimensions in the linear and Isomap models against each other and log rise (attack) time, spectral centroid, and spectral flux. Values in bold are significant at p = 0.005; values in italic are significant at p = 0.1.
3.1. Dimensionality and class membership
As the authors of CLASCAL recommend, we used the
standard Bayesian information criterion (BIC) to estab-
lish the dimensionality of the MDS spaces and the special
Monte-Carlo technique mentioned above to determine the
number of latent subject classes. We found that the tra-
ditional linear model produces a four-dimensional space
when including specificities, which, surprisingly, is of higher
dimensionality than the space uncovered in [7] for a larger
set of subjects. The reason for this difference is likely that
only three latent classes are necessary to get the best fit for
our data as opposed to the five necessary for the larger set.
The Isomap model was able to reduce the space back to
three dimensions with the same number of latent classes,
although the class membership differs.
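For reference, the information criterion itself is simple to compute; the following sketch assumes the log-likelihood, parameter count, and observation count are available from the fitted model and is not tied to any particular CLASCAL implementation.

```python
import numpy as np

def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion: lower values indicate a better
    trade-off between fit and model complexity."""
    return -2.0 * log_likelihood + n_params * np.log(n_observations)

# Candidate CLASCAL solutions (different dimensionalities, with or without
# specificities) can be compared by computing bic() for each fitted model
# and retaining the solution with the smallest value.
```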
The lower right-hand corner of Table 1 presents the
correlation coefficients between the dimensions of the lin-
ear and Isomap models. The leading dimensions are very
strongly correlated (r=0.92). The second dimension of
the linear space, however, correlates best with the trail-
ing dimension of the non-linear space, and the trailing di-
mension of the linear space correlates strongly with the
second dimension of the non-linear space. Because the
dimensions are ordered according to their perceptual im-
portance, this last result is somewhat surprising: although [7] employs the same linear model as we do, our non-linear model yields a better match to their results. In a further
wrinkle, the leading dimension of the linear model corre-
lates better with the trailing dimension of the non-linear
model than the second does.
The table also presents correlations with McAdams et al.'s acoustic correlates of the dimensions of timbre space:
(log) rise, or attack, time, the spectral centroid, and spec-
tral flux, i.e., a measure of change in the spectral shape
throughout the duration of the stimulus. The first two
dimensions of the non-linear space correlate with attack
time and spectral centroid; in keeping with the relation-
ship between the dimensions of the linear and non-linear
models, it is the leading and trailing dimensions of the
linear model that exhibit similar correlations. Consistent
with the findings of [7], spectral flux correlates only weakly
with the model dimensions. The problematic trailing dimension of the non-linear space again exhibits an unexpected correlation, correlating with log rise time at a level of significance (p = 0.005) comparable to that of the leading dimension.

Instrument      Abbreviation       Lin.   N.-L.
French horn     hrn                0.83   2.75
Trumpet         tpt                0.47   1.56
Trombone        trn                2.01   1.80
Harp            hrp                1.29   0.87
Trumpar         tpr = tpt + gtr    1.93   1.73
Oboleste        ols = obo + cel    1.65   2.46
Vibraphone      vbs                1.74   3.43
Striano         sno = str + pno    1.99   1.78
Harpsichord     hcd                2.56   3.55
English horn    can                1.23   2.85
Bassoon         bsn                1.50   0.77
Clarinet        cnt                2.10   4.22
Vibrone         vbn = vbs + trn    2.12   2.36
Obochord        obc = obo + hcd    0.00   3.53
Guitar          gtr                1.53   2.47
Strings         stg                1.35   1.76
Piano           pno                2.20   2.71
Guitarnet       gtn                1.26   2.98

Table 2: Square roots of the specificity values for the linear and non-linear models. Names of hybrid timbres are printed in italic.
These coefficients confirm the commonly held result
that attack time is a dominating component of the percep-
tion of timbre and that spectral centroid is also a signifi-
cant component. Like previous studies, however, our results leave room for at least one further component that could explain the trailing dimension of our non-linear space or the second and third dimensions of our linear one.
Spectral flux, the explanatory power of which has already
been called into question [2], fares notably poorly.
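Correlations of this kind are straightforward to compute; the sketch below uses SciPy's Pearson correlation and assumes hypothetical arrays of model coordinates and acoustic predictors.

```python
import numpy as np
from scipy.stats import pearsonr

def correlate_dimensions(coords, predictors):
    """Correlate each model dimension (a column of coords) with each named
    acoustic predictor, e.g. log rise time, spectral centroid, spectral flux."""
    for r in range(coords.shape[1]):
        for name, values in predictors.items():
            corr, p = pearsonr(coords[:, r], values)
            print(f"dimension {r + 1} vs {name}: r = {corr:+.2f} (p = {p:.3f})")

# Example (hypothetical arrays):
# correlate_dimensions(isomap_coords, {'log rise time': rise,
#                                      'spectral centroid': sc,
#                                      'spectral flux': flux})
```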
3.2. Outliers and specificities
A somewhat surprising result emerged when examining
our data for outliers. Although no outliers were detected
in the untransformed data, an analysis of the latent class
assignments revealed one outlier after the transformation.
Fundamentally, the Isomap transformation emphasises the
effect of fine-grained distinctions between those timbres a
subject perceives as fairly similar and discounts the effect
of coarse-grained distinctions between timbres a subject
perceives as largely dissimilar. Thus, this outlying subject uses strategies comparable to those of other subjects when making coarse distinctions but a unique strategy when making fine
distinctions. It bears further study to determine exactly
how this strategy differs from the others.
Although the outlier situation and class membership
differ between the two models overall, they appear to have
one latent class in common, i.e., with the same member-
ship. Looking at the weights $w_{C(i),r}$ and $v_{C(i)}$, one can see
that the defining characteristic of this common group is
an emphasis on specificities when making timbre similar-
ity judgements. Given this result, one would expect that
the specificity values would be similar between the two
models, but this is not the case. Table 2 presents these val-
ues for both models. One can see that specificity values
are higher for the non-linear model in general and that the correlation is poor (r = 0.23).

[Figure 2 appears here.]
Figure 2: Two timbre spaces based on [7], one generated with linear MDS and the other generated with Isomap on averaged subject spaces: panel (a) shows the linear MDS space and panel (b) the Isomap space. The labelled axes correspond to attack time and spectral centroid; the unlabelled axes correspond to dimensions 2 and 3 in the respective CLASCAL spaces.
3.3. Timbre spaces
The principal advantage of the non-linear model is that it
can preserve spatial structure with fewer dimensions. Fig-
ure 2 presents the complete non-linear space and dimen-
sions 1, 2, and 4 of the linear space (so as to preserve the
dimensions that correlate with the non-linear model). The
third axes are left blank because it is difficult to interpret
the meaning of these dimensions. As one would expect
from the strong correlations in Table 1, the spatial groups
of the models are similar, although three-dimensional ro-
tation illustrates that the points in the non-linear space are
clustered more tightly.
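A plot in the spirit of Figure 2 can be produced with Matplotlib; the sketch below assumes an array of three-dimensional model coordinates and a list of instrument abbreviations, and is illustrative only.

```python
import matplotlib.pyplot as plt

def plot_timbre_space(coords, labels):
    """Scatter a three-dimensional timbre space and annotate each point
    with its instrument abbreviation, in the spirit of Figure 2."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2])
    for (x, y, z), label in zip(coords, labels):
        ax.text(x, y, z, label)
    ax.set_xlabel('Dimension 1')
    ax.set_ylabel('Dimension 2')
    ax.set_zlabel('Dimension 3')
    plt.show()
```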
4. SUMMARY AND FUTURE WORK
When applied to experiments on human timbre percep-
tion, Isomap appears to retain the most desirable aspects
of the global structure of linear models while tightening
the local structure of the resulting timbre space. Our re-
sults raise interesting questions about differing strategies
subjects use when distinguishing highly similar vs. highly
dissimilar timbres and suggest that at least one dimension of human timbre perception remains for the community to interpret.
5. REFERENCES
[1] J. A. Burgoyne and L. K. Saul, “Visualization of low-dimensional structure in tonal pitch space,” in Proc. Int. Comp. Mus. Conf., 2005, pp. 243–46.
[2] A. Caclin, S. McAdams, B. K. Smith, and S. Winsberg, “Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones,” J. Acoust. Soc. Am., vol. 118, no. 1, pp. 471–82, 2005.
[3] J. D. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of ‘Eckart-Young’ decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
[4] J. M. Grey, “Multidimensional perceptual scaling of musical timbre,” J. Acoust. Soc. Am., vol. 61, pp. 1270–77, 1977.
[5] J. M. Grey and J. W. Gordon, “Perceptual effects of spectral modifications on musical timbres,” J. Acoust. Soc. Am., vol. 63, no. 5, pp. 1493–1500, 1978.
[6] C. L. Krumhansl, “Why is musical timbre so hard to understand?” in Structure and Perception of Electroacoustic Sound and Music, ser. Excerpta Medica, S. Nielzen and O. Olsson, Eds. Amsterdam: Elsevier, 1989, no. 846.
[7] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff, “Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes,” Psych. Res., vol. 58, pp. 177–92, 1995.
[8] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–23, 22 December 2000.
[9] W. S. Torgerson, Theory and Methods of Scaling. New York: Wiley, 1958.
[10] S. Winsberg and J. D. Carroll, “A quasi-nonmetric method for multidimensional scaling via an extended Euclidean model,” Psychometrika, vol. 54, no. 2, pp. 217–29, 1989.
[11] S. Winsberg and G. De Soete, “A latent class approach to fitting the weighted Euclidean model, CLASCAL,” Psychometrika, vol. 58, no. 2, pp. 315–30, 1993.
[12] ——, “Multidimensional scaling with constrained dimensions: CONSCAL,” Br. J. Math. Stat. Psychol., vol. 50, pp. 55–72, 1997.