TSAM: A TOOL FOR ANALYZING, MODELING, AND
MAPPING THE TIMBRE OF SOUND SYNTHESIZERS
Faculty of Engineering and Information Sciences
University of Wollongong in Dubai
ABSTRACT
Synthesis algorithms often have a large number of adjust-
able parameters that determine the generated sound and
its resultant psychoacoustic features. The relationship
between parameters and timbre is important for end us-
ers, but it is generally unknown, complex, and difficult to
analytically derive. In this paper we introduce a strategy
for the analysis of the sonic response of synthesizers sub-
ject to the variation of an arbitrary set of parameters. We
use an extensive set of sound descriptors which are
ranked using a novel metric based on statistical analysis.
This enables the study of how changes to a synthesis pa-
rameter affect timbral descriptors, and provides a multi-
dimensional model for the mapping of the synthesis con-
trol through specific timbre spaces. The analysis, model-
ing and mapping are integrated in the Timbre Space Ana-
lyzer & Mapper (TSAM) tool, which enables further in-
vestigation into synthesis sonic response and on percep-
tually related sonic interactions.
1. INTRODUCTION
The timbre generated by a sound synthesis algorithm depends on the
values assigned to its variable parameters, which are typically user
configurable. Regardless of the synthesis method, the relationship
between control and perceptual features of the resultant sound is
generally weak and difficult to determine. Modern synthesis algorithms
present a wide timbre range and a high-dimensional control space. The
timbre, which is central in modern sonic arts, has high dimensionality
as well and a blurry scientific definition. For designers of sonic
interactive systems and of musical instruments, knowing the
parameter-to-timbre relationship supports the implementation of the
intended sonic response. For sound designers and performers this
knowledge eases the development of control intimacy. This insight can
also help in improving the expressivity of musical instruments by
reducing the control dimensionality while broadening the timbral
response. The heuristic estimation of the parameter-to-timbre causality
is workable, but it is subjective and inaccurate. This task is
challenging due to nonlinearities and correlations in the synthesis
process, especially when a large set of variable parameters is involved.
We address this issue by proposing a systematic and
generic method to analyze the timbre in relation to the
synthesis variables. The collected data is then processed
by computing a quality metric for each sound descriptor,
composed of four weighted components, each represent-
ing a specific statistical characteristic. Additionally, qual-
ity metrics for synthesis parameters are provided as well.
This information can be used in designing the mapping of
musical gestures to the synthesis control, providing a
tighter causal link with the timbral response of the system. The tool
we present here, the Timbre Space Analyzer & Mapper (TSAM), integrates
these functionalities and supports the implementation of few-to-many
lossless mappings through an intermediate timbre-related layer. After
analyzing the sonic response of the synthesizer, the tool computes a
reduced timbre-to-parameter model, which supports real-time interaction
with the sound synthesizer. In particular, we integrate an extension of
the modeling and mapping strategy we introduced in previous work,
highlighting the enhancement achieved when considering the quality
metric for selecting the descriptors for the mapping.
The TSAM is a flexible tool, exposing internal computation settings and
options on a Graphical User Interface (GUI), which supports a range of
applications and aims. The perceptual characteristics of synthesis
methods can be studied, characterized, and compared numerically or
graphically. The relationship between timbre, spectrum, and different
musical scales can be investigated. Different mapping approaches for
musical instruments can be explored and compared. The rest of this
paper is organized as follows. In Section 2 we describe the synthesis
analysis procedure and present the quality metric for descriptors and
parameters. Section 3 provides a summary of the timbre space mapping
strategy. The TSAM implementation is detailed in Section 4. Finally,
Section 5 concludes with discussion and future work.
2. TIMBRE RESPONSE ANALYSIS
Understanding the sonic variation resulting from tweaking parameters is
common when getting familiar with a sound synthesizer. Different users
may have distinct intents. Sound designers aim at synthesizer
configurations generating their desired sound, whereas performers and
instrument builders look for a mapping that yields sonic expressivity.
Synthesizers generally feature a large number of controllable
parameters, representing the synthesis algorithm variables. In analog
synthesizers, each parameter can theoretically assume an infinite
number of values, while in digital (or software) synthesizers each
parameter has more than 4 billion possible values if considering
single-precision implementations (32 bit). Synthesizers interfaced
using the MIDI protocol allow only up to 128 distinct values per
parameter (7 bit), regardless of the resolution of the internal
circuitry. However, with only three MIDI-controlled parameters we have
more than 2 million (2^21) different parameter permutations, or unique
synthesis states. This combinatorial explosion limits the feasibility
of a comprehensive analysis of all the timbres resulting from each of
these states.

Copyright: © 2016 Stefano Fasciani. This is an open-access article
distributed under the terms of the Creative Commons Attribution License
3.0 Unported, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are
credited.

Limiting the dimensionality of the parameter space allows coping with
the large number of synthesis states to analyze, leaving only a few
variable parameters and fixing the remaining ones to specific values.
In this case the timbre analysis is limited to a subset of the entire
parameter space, which is a scenario equivalent to users tweaking only
a few parameters of a synthesis configuration (or preset). To further
reduce the number of states to analyze we use the principle of spatial
locality: states close in the parameter space generate similar timbres.
Therefore we can sample the parameter space with a larger step size,
and possibly interpolate at a later stage. This principle is generally
true if we exclude synthesis algorithms featuring stochastic
components, and parameters with strong nonlinearities (e.g. binary
switches). Generally, the converse of this principle does not hold:
proximity in the timbre space does not necessarily imply a similar
parameter configuration. The TSAM itself can be used to verify these
principles. A further reduction can be achieved by limiting the
individual range of interest of each parameter.
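Given k variable synthesis parameters, the synthesis state space I (the
set of unique parameter permutations) is given by Equations (1)-(3),
where $s_j$ denotes the step size of the j-th parameter.

$$\mathbf{I} = [\mathbf{i}_1, \dots, \mathbf{i}_n]^T \tag{1}$$
$$\mathbf{i} = [i_1, \dots, i_k] \tag{2}$$
$$n = \prod_{j=1}^{k} \left( \frac{\max(i_j) - \min(i_j)}{s_j} + 1 \right) \tag{3}$$
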
Each synthesis state is represented with a vector i with
dimensionality k, as in Equation (2), while n, the number
of vectors in I, depends on the individual range and step
size of the k parameters, as in (3). I is the synthesis state
space we consider for the timbre analysis, presenting di-
mensionality k and cardinality n.
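
As an illustration of how the state space I grows with the ranges and step sizes, the following minimal Python sketch enumerates the states for a few parameters. The function names and the example ranges are hypothetical and are not taken from the TSAM; the sketch also assumes that the upper bound of each range is included when it falls on the sampling grid.

```python
from itertools import product


def parameter_values(lo, hi, step):
    """Values of one parameter from lo towards hi, spaced by step."""
    count = int((hi - lo) // step) + 1
    return [lo + j * step for j in range(count)]


def state_space(ranges):
    """Enumerate the synthesis state space.

    ranges: list of (min, max, step) tuples, one per variable parameter.
    Returns a list of n tuples, each with dimensionality k.
    """
    axes = [parameter_values(lo, hi, step) for lo, hi, step in ranges]
    return list(product(*axes))


# Three MIDI-controlled parameters at full resolution: 128**3 states.
full = 128 ** 3

# The same three parameters with restricted ranges and coarser steps
# (illustrative values only).
coarse = state_space([(0, 127, 16), (0, 127, 16), (32, 96, 8)])

print(full, len(coarse))
```

Restricting the range of the third parameter and sampling with larger steps reduces the number of states by several orders of magnitude, which is the trade-off discussed above.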
2.1 Descriptors Set and Computation
For each state i of the sound synthesizer we compute a set of audio
descriptors, which we indicate with d, representing the timbral
characteristics of the resulting synthetic sound. A large set of
low-level computational descriptors, including possible redundancies,
is essential for the detailed timbre analysis we require in this
context. A few higher-level timbre descriptors (e.g. brightness,
noisiness, coloration), often subjective and with language-dependent
semantics, are suitable to discriminate sounds with major timbral
differences, but in this context they fail to capture the subtle sonic
nuances determined by small variations of the synthesis parameters.
An a posteriori descriptor selection is possible considering the
quality metric we present in this paper. The method is independent of
the specific descriptor set. In the TSAM we use the CUIDADO feature set
implemented in the IRCAM descriptors object for Max/MSP. The set
includes the spectral and perceptual features listed in Table 1. It
includes 24 scalar and 7 vectorial descriptors, as specified in the
dimensionality column, resulting in a dimensionality q of d equal to
108, as in (4). Some of the scalar descriptors in the set are closely
related to traditional timbre labels (e.g. spectral centroid to
brightness).

$$\mathbf{d} = [d_1, \dots, d_q], \quad q = 108 \tag{4}$$

Signal Zero Crossing Rate
Perceptual Odd To Even Ratio
Perceptual Spectral Centroid
Perceptual Spectral Decrease
Perceptual Spectral Deviation
Perceptual Spectral Kurtosis
Perceptual Spectral Rolloff
Perceptual Spectral Skewness
Perceptual Spectral Slope
Perceptual Spectral Spread
Perceptual Spectral Variation
Relative Specific Loudness
Table 1. List of descriptors used in the TSAM.
The descriptors listed above are computed on a short temporal window,
typically in the range 2 ms to 200 ms. They provide an instantaneous
sonic representation, sufficient to characterize only strictly periodic
sounds. In synthesis states we may observe and hear low-rate timbre
variations, spanning beyond the largest temporal window we consider for
the descriptors. Hence an appropriate characterization of the timbre
requires computing and merging descriptors from multiple short time
windows. We propose two analysis modes, named ‘sustain’ and ‘envelope’.
In the first, given a synthesis state i, we compute m descriptor
vectors and combine them by taking their mean and optionally their
range, as in Equation (5), which doubles the dimensionality of the
descriptor set. The second approach simply concatenates the m
descriptor vectors into a single vector, as in Equation (6), increasing
the dimensionality by a factor of m.

$$\mathbf{d} = \left[ \mathrm{mean}(\mathbf{d}_1, \dots, \mathbf{d}_m),\ \max(\mathbf{d}_1, \dots, \mathbf{d}_m) - \min(\mathbf{d}_1, \dots, \mathbf{d}_m) \right] \tag{5}$$
$$\mathbf{d} = [\mathbf{d}_1, \dots, \mathbf{d}_m] \tag{6}$$

Considering the synthesis as an ergodic process, and the sound
generated as almost periodic, the first approach provides a sufficient
approximation of the timbre. When the synthesis produces dynamic
timbres, such as texture-like sounds, or when ADSR envelopes are
applied to amplitude and other parameters, the second approach is
preferred. However, even in the presence of ADSR envelopes we can still
use the first approach by analyzing only the sustain phase of the
synthesis, either because the attack, decay and release phases are
intentionally discarded, or because these do not change significantly
within the parameter space I we analyze.
The concatenation of short-term static descriptors to
analyze timbral dynamics is a simplification with respect
to the use of dynamic descriptors computed on longer
temporal windows. However this approach reduces the
time needed to execute the timbre analysis and allows
users to change the merging mode from ‘sustain’ to ‘en-
velope’ and vice versa without repeating the analysis.
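
The two merging modes can be sketched in a few lines of Python. This is only an illustration of the mean, range and concatenation operations described for Equations (5) and (6); the function names and the toy values are hypothetical, and plain lists stand in for the descriptor vectors.

```python
def merge_sustain(vectors, include_range=True):
    """Merge m descriptor vectors into per-descriptor mean (and range)."""
    m = len(vectors)
    columns = list(zip(*vectors))  # one tuple of m values per descriptor
    merged = [sum(col) / m for col in columns]
    if include_range:
        # Appending the range doubles the dimensionality.
        merged += [max(col) - min(col) for col in columns]
    return merged


def merge_envelope(vectors):
    """Concatenate the m descriptor vectors into one vector of size m * q."""
    return [value for vec in vectors for value in vec]


# m = 3 descriptor vectors with q = 2 descriptors each (made-up values).
frames = [[1.0, 10.0], [3.0, 10.0], [2.0, 13.0]]

print(merge_sustain(frames))
print(merge_envelope(frames))
```

Because both functions start from the same stored list of m vectors, switching between the two modes only requires re-running the merge, not the analysis.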
In the TSAM implementation, presented in Section 4,
the computation of the descriptors is completely automat-
ed. Users are only required to identify the k variable pa-
rameters of the synthesizer, their range, step size, number
of descriptor vectors per state 𝑚, and analysis mode. The
tool computes I and drives the synthesizer with one i at a
time, computing and storing 𝑚 vectors d. For analysis in
envelope mode, the tool also manages the triggering of
the synthesizer at every i. Users can further specify the
temporal unfolding of the analysis, selecting only a sub-
set of the ADSR envelope. Advanced options related to
the descriptor computation, such as window size, hop
size, sampling rate, are exposed as well.
2.2 Descriptor Quality Metric
The quality metric we compute for each descriptor is
aimed at capturing the four characteristics listed below.
• Noisiness: deviation of the descriptor from its
mean given a synthesis state i.
• Variance: spread of descriptor value across the syn-
thesis state space I.
• Independence: uniqueness of the descriptor varia-
tion pattern across the synthesis state space I.
• Correlation: coherence of the descriptor variation
with synthesis parameters across the synthesis state
space I.
Ideally, a descriptor representative of I should present low noisiness,
high variance, high independence, and high correlation. High noisiness
indicates that a particular descriptor, and the associated timbral
characteristic, also varies when the synthesis parameters are fixed,
and therefore any variance it shows across I may not be significant. A
descriptor with low variance reveals that the related timbral
characteristic does not change significantly when varying the synthesis
parameters. Descriptors varying with a similar trend are redundant, and
thus less significant, when computing a dimensionality-reduced timbre
space modeling I, whereas more independent descriptors carry a larger
amount of information. Descriptors can also be highly independent when
varying randomly across I. We address this by also including the
correlation between descriptor and parameters in the metric, as we
expect representative descriptors to change in accordance with one or
more synthesis parameters.
For each descriptor, we compute the noisiness $N_{x,\mathbf{i}}$ from
the m descriptor vectors in synthesis state i before these are merged
as per Equations (5) and (6). The subscript x is the index identifying
the descriptor across the set of q computed in the TSAM. For ‘sustain’
mode, we measure the deviation of the descriptor x in the state i using
the Relative Mean absolute Difference (RMD), as in Equation (7). The
RMD is a scale-invariant measure of statistical dispersion, hence it
allows the comparison of heterogeneous descriptors. For ‘envelope’
mode, $N_{x,\mathbf{i}}$ is estimated as the zero crossing rate, as in
Equation (8), of the forward second-order finite difference (a discrete
approximation of the second-order derivative) of the series of m
descriptors, as in (9). This represents the rate at which a descriptor
inverts its trend (from increasing to decreasing and vice versa) in the
analyzed envelope. Noisy descriptors invert their trends at higher
rates.
In Equations (7)-(9), $d_{x,j}$ represents the x-th descriptor in the
set of q, from the j-th vector d out of the m computed for each state
i. The indicator function $\mathbb{I}(\cdot)$ is equal to 1 if its
argument is true, 0 otherwise. $\Delta^2$ is the forward second-order
finite difference function. The overall noisiness of each descriptor
$N_x$ is computed by taking the average over the set of unique
synthesis states I we analyze.
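
A minimal Python sketch of the two noisiness estimates follows. It is not the TSAM code: the function names are hypothetical, the mean absolute difference is normalized over the m(m-1) ordered pairs of distinct values (one common convention, assumed here), and a sign change is counted when the product of two consecutive values is negative.

```python
def rmd(values):
    """Relative mean absolute difference of a series (scale invariant)."""
    m = len(values)
    mean = sum(values) / m
    if m < 2 or mean == 0:
        return 0.0
    total = sum(abs(a - b) for a in values for b in values)
    return total / (m * (m - 1)) / abs(mean)


def second_difference(values):
    """Forward second-order finite difference of a series."""
    return [values[j + 2] - 2 * values[j + 1] + values[j]
            for j in range(len(values) - 2)]


def zero_crossing_rate(series):
    """Fraction of consecutive pairs with opposite signs."""
    if len(series) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(series, series[1:]) if a * b < 0)
    return crossings / (len(series) - 1)


def state_noisiness(values, mode="sustain"):
    """Noisiness of one descriptor within one synthesis state."""
    if mode == "sustain":
        return rmd(values)
    return zero_crossing_rate(second_difference(values))


def overall_noisiness(per_state_values, mode="sustain"):
    """Average the per-state noisiness over all analyzed states."""
    scores = [state_noisiness(v, mode) for v in per_state_values]
    return sum(scores) / len(scores)
```

For example, a descriptor that is constant within a state has zero noisiness in ‘sustain’ mode, while a zigzag series yields the maximum rate in ‘envelope’ mode.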
Variance, independence, and correlation are computed across I, after
the m descriptors are merged as in (5)-(6). The same method is used for
both ‘sustain’ and ‘envelope’ modes. The variance $V_x$ is computed as
the RMD over the n synthesis states i. We use the same expression as in
(7), replacing m with n, but in this case $d_{x,j}$ is the x-th
descriptor in the set of q, from the j-th vector d out of the n we
compute across I.
We assume that descriptors are independent if poorly correlated,
therefore we compute $I_x$ by taking the complement of the averaged
absolute value of the correlation coefficient between the descriptor x
and the other q-1 descriptors over I, as in Equation (10). Both
positive and negative correlations indicate dependence, therefore we
take the absolute value of the correlation coefficient corr(·). We
subtract 1 from the summation to remove the correlation coefficient of
the descriptor with itself, when j=x. Finally, the correlation $C_x$
between descriptors and parameters is computed by taking the average
correlation coefficient between the x-th descriptor and the k variable
synthesis parameters, as in Equation (11).
In (10) and (11), $\mathbf{d}_{x,\mathbf{I}}$ represents the vector
containing the n values of the x-th descriptor computed over the
synthesis state space I, while $\mathbf{i}_{j,\mathbf{I}}$ represents
the vector containing the n values of the j-th synthesis parameter over
I. Note that according to (5) and (6) each descriptor may contribute
more than one component to each vector d. In particular, for ‘sustain’
mode we have two components per descriptor if the range is included in
the analysis, whereas for ‘envelope’ mode we have m components per
descriptor. Therefore we compute multiple $V_x$, $I_x$ and $C_x$ for
each x-th descriptor, and use their average in the quality metric we
introduce next.
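
The independence and correlation components can be sketched as below. The helper names are hypothetical; the Pearson coefficient is used as the correlation coefficient, the self-correlation is excluded by skipping j = x (equivalent to subtracting 1 from the full sum), and the absolute value is also taken for the descriptor-to-parameter correlation, which is an assumption motivated by the [0,1] range stated for this component.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def independence(x, descriptors):
    """Complement of the mean |corr| between descriptor x and the others.

    descriptors: list of q series, each with n values (one per state).
    """
    others = [d for j, d in enumerate(descriptors) if j != x]
    total = sum(abs(pearson(descriptors[x], d)) for d in others)
    return 1.0 - total / len(others)


def parameter_correlation(x, descriptors, parameters):
    """Mean |corr| between descriptor x and the k parameter series."""
    total = sum(abs(pearson(descriptors[x], p)) for p in parameters)
    return total / len(parameters)


# Toy data over n = 4 states: q = 3 descriptors, k = 1 parameter.
descriptors = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
parameters = [[0, 1, 2, 3]]
print(independence(0, descriptors), parameter_correlation(0, descriptors, parameters))
```

In this toy data the first two descriptors are perfectly correlated, so each lowers the independence of the other, while the alternating third descriptor is more independent but poorly correlated with the parameter.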
The quality metric $S_x$ of each descriptor is computed from the
individual noisiness, variance, independence, and correlation as in
Equation (12). The noisiness, being an undesirable feature, lowers the
value of $S_x$. The four components are combined using individual
weights w. The selection of the w values depends on the aim and context
of the timbre analysis, and also on individual preferences. For
instance, when analyzing a synthesizer configuration with a
texture-like timbre, we expect considerable sonic variation within each
synthesis state i, therefore the noisiness has no significance and its
weight should be close to zero. If the purpose of the analysis is the
sole study of the synthesizer timbre through the descriptors, their
independence has little relevance. Instead, when descriptors are used
for mapping purposes, as in Section 3, the independence has a higher
significance. In the TSAM, the default values of the weights are 0.33
for variance, independence and correlation, and 0.66 for noisiness.
Users can change these within the unit range. The four components of
the quality metric have different ranges. $I_x$ and $C_x$ lie in [0,1],
while $N_x$ and $V_x$ can be zero but do not have a theoretical
maximum. In the TSAM we include the option to normalize these to the
unit range, easing the balancing through individual weights. However,
when comparing the quality metrics $S_x$ across different synthesizers,
or between different state spaces I of the same synthesizer,
normalization should not be used. In the TSAM we also rank the k
synthesis parameters by their average correlation with the q
descriptors, computed as in (11) but replacing k with q and taking x as
the summation index. Furthermore, for each parameter the TSAM displays
the two descriptors with the highest and lowest associated correlation,
and vice versa.
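
The following Python sketch shows one way to combine the four components and rank the descriptors. The additive form with the noisiness term subtracted is an assumption consistent with the description of Equation (12), not a transcription of it; the function names are hypothetical, and the default weights are those stated above.

```python
DEFAULT_WEIGHTS = {"N": 0.66, "V": 0.33, "I": 0.33, "C": 0.33}


def normalize(values):
    """Rescale a list of values to the unit range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(noisiness, variance, independence, correlation,
                   weights=None, normalized=False):
    """Weighted score per descriptor; noisiness lowers the score (assumed form)."""
    w = dict(DEFAULT_WEIGHTS)
    if weights:
        w.update(weights)
    if normalized:
        # Only the unbounded components need rescaling.
        noisiness = normalize(noisiness)
        variance = normalize(variance)
    return [w["V"] * v + w["I"] * i + w["C"] * c - w["N"] * n
            for n, v, i, c in zip(noisiness, variance, independence, correlation)]


def rank(scores):
    """Descriptor indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda x: scores[x], reverse=True)


scores = quality_scores([0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0])
print(scores, rank(scores))
```

Setting the noisiness weight to zero, for instance with `weights={"N": 0.0}`, corresponds to the texture-like timbre case mentioned above.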
3. TIMBRE SPACE MODELING AND MAPPING
Audio descriptors have been extensively used for visualization,
measurement, classification, and recognition of sounds. Works proposing
the timbre as a control structure for sound synthesis or for
interactive sonic systems have recently proliferated. These allow for
explicit control of psychoacoustic characteristics of the generated
sound, hiding synthesis parameters from users, simplifying the user
interaction, facilitating the search for specific timbres, and
enhancing the expressivity of the system. Similar benefits are provided
by synthesis methods using a timbre representation derived from a prior
analysis stage of the target sound. A model relating parameters to the
sonic response of the sound synthesizer is necessary to implement
explicit timbre control. Our generic approach, introduced in previous
work and extended here, derives a model from the prior analysis stage,
and therefore
it is independent of the specific synthesis method and
The generative mapping is based on unsupervised machine learning
techniques, and it provides a low-dimensional and perceptually related
synthesis control. The mapping maximizes the breadth of the explorable
sonic space covered by the synthesis space I, and minimizes possible
timbre losses due to the reduced dimensionality of the control space
(i.e. few-to-many mapping). The timbre response analysis described in
the previous section returns a synthesis space I, with dimensionality
k, and a descriptor space D, with dimensionality q. Both spaces present
n entries, i and d respectively, which are pairwise associated,
representing a basic model relating parameters and timbre. Hence we can
explicitly express a timbre through the q descriptors (e.g. mapped on a
large bank of faders), find the closest entry in D, and drive the
synthesizer with the associated parameter set i. Such control is
affected by several drawbacks: the high dimensionality of the
timbre-based control, with q generally much greater than k; the lack of
accuracy due to the large parameter step size we use in the analysis
stage, as in (3); and the uneven distribution of the entries in the
timbre space D, unlike those in I, so that regions of D with low
density determine a poor system response.
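
The basic lookup described above, before any dimensionality reduction or interpolation, amounts to a nearest-neighbor search over the paired entries. A minimal Python sketch, with hypothetical names and toy data, could look like this:

```python
import math


def nearest_state(target, D, I):
    """Return the parameter vector paired with the entry of D closest to target.

    D: list of n descriptor vectors; I: list of n parameter vectors,
    pairwise associated with the entries of D.
    """
    best = min(range(len(D)), key=lambda j: math.dist(target, D[j]))
    return I[best]


# Toy model with n = 3 states, q = 2 descriptors, k = 1 parameter.
D = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
I = [(0,), (64,), (127,)]
print(nearest_state([0.9, 1.2], D, I))
```

The output can only take one of the n analyzed parameter vectors, which makes the coarse step size and the uneven density of D directly audible; these are the drawbacks listed above.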
The intrinsic dimensionality of D is usually much lower than q.
Generally the data of interest lies on an embedded nonlinear manifold
within the q-dimensional space. Therefore we reduce the dimensionality
of D, using Isomap, down to two or three dimensions, which are easy to
map to general-purpose controllers with low cognitive complexity. In
the TSAM users can explore the application of 34 different
dimensionality reduction methods.
Before reducing the dimensionality of D, we use the quality metric
$S_x$ to discard those descriptors with a low score. Particularly noisy
or poorly correlated descriptors may present a large variance that has
a significant impact on the dimensionality reduction stage, but this
variance is not representative of the parameter-to-timbre relationship
and it corrupts the timbre space mapping. The selection of descriptors
based on the quality metric yields improvements in accuracy and
usability over our previous approach. Alternatively, users can bypass
the dimensionality reduction stage, and explicitly specify the two or
three descriptors composing the low-dimensional timbre space we use for
the mapping to synthesis parameters.
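
As a small illustration of the descriptor selection step, the sketch below keeps only the best-scoring columns of D before any dimensionality reduction is applied. The function name, the fixed number of descriptors to keep, and the toy values are hypothetical; the TSAM lets users choose the subset themselves.

```python
def select_descriptors(D, scores, keep):
    """Keep the `keep` best-scoring descriptor columns of D.

    D: list of n descriptor vectors (rows), each with q values.
    scores: list of q quality scores, one per descriptor.
    Returns the kept column indices and the reduced matrix.
    """
    order = sorted(range(len(scores)), key=lambda x: scores[x], reverse=True)
    kept = sorted(order[:keep])  # preserve the original column order
    reduced = [[row[x] for x in kept] for row in D]
    return kept, reduced


D_full = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
kept, D_reduced = select_descriptors(D_full, scores=[0.2, 0.9, 0.5], keep=2)
print(kept, D_reduced)
```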
To address the issue of the possible unresponsiveness of the timbre
space due to the arbitrary distribution in D, we apply an iterative
algorithm based on the Voronoi tessellation, derived from prior work,
that redistributes the n entries d into a uniformly distributed square
or cube, while preserving the local neighborhood relationships
(homomorphic transformation). The inverse of this transformation
represents the required mapping to project a generic multidimensional
control space C onto the case-specific timbre space. Hence we use an
Artificial Neural Network (ANN) to learn a function m(·) approximating
the inverse of the redistribution process. We use m(·) to project the
generic multidimensional control vector c onto the dimensionally
reduced timbre space D*. The ANN includes a single hidden layer and
therefore can be trained efficiently using a non-iterative algorithm.
In Figure 1 we show an example of a highly clustered timbre space
reduced to three dimensions, and its transformation to a uniform cube.
The side arrows identify the two stages of the mapping computation. In
the TSAM we also provide an alternative mapping, skipping the ANN and
computing the synthesis parameters directly from the uniformly
distributed timbre space.
In the final stage of the mapping we compute the parameters to interact
with the sound synthesizer. We use d* to represent a descriptor vector
in the dimensionality-reduced timbre space D*. Driving the synthesis
with the parameters i associated with the d* closest to m(c) may lead
to discontinuities, which in turn may generate glitches in the sonic
output. These are due to the coarse parameter step size used in the
analysis stage, and to the relationship between parameters and sound
not being one-to-one. Two synthesis states i, far apart in the
synthesis state space I, may be associated with identical or similar
descriptor vectors d, hence close in D. The latter is an implicit
drawback of any method for controlling sound synthesis from any
representation of the generated signal.
Figure 1. Example of a timbre space reduced to three
dimensions, and related transformation to a uniform cube.
We address these issues by computing the synthesis parameters by
spatial interpolation, including only entries of D* from the
neighborhood of the current state i. The set of parameters driving the
synthesizer, $\hat{\mathbf{i}}$, is computed by Inverse Distance
Weighting (IDW) as in Equations (13) and (14), where ‖·‖ represents the
Euclidean distance.
In (13) and (14) N represents the total number of points considered in
the interpolation, and the $\mathbf{i}_j$ in (13) are those pairwise
associated with the $\mathbf{d}^*_j$ in (14). In the TSAM, instead of
using the N closest points $\mathbf{d}^*_j$ in D*, we select those
$\mathbf{d}^*_j$ that limit the maximum variation of $\hat{\mathbf{i}}$
between two consecutive iterations, that is the set of $\mathbf{d}^*_j$
associated with the $\mathbf{i}_j$ close to the current
$\hat{\mathbf{i}}$ (within a user-defined distance). In Figure 2 we
show an example of this selection of interpolation points, where the
green entries are the $\mathbf{d}^*_j$ related to the $\mathbf{i}_j$
close to the current $\hat{\mathbf{i}}$, which is in turn associated
with the yellow entry in the figure. The set of $\mathbf{d}^*_j$ used
for the IDW interpolation may include entries distant from m(c), but
these will contribute poorly in (13). In the IDW, p represents the
power parameter, which determines the influence of each point based on
its distance. This value should be larger than the dimensionality of
the reduced timbre space D*, and as p increases closer points receive
larger weights. In the TSAM, the maximum instantaneous distance of
$\hat{\mathbf{i}}$ and the interpolation power parameter p are among
the options exposed to users to tune the timbre mapping response in
real time. The TSAM provides interactive timbre space visualizations,
such as those in Figures 1 and 2.
Figure 2. Detail of a timbre space reduced to three di-
mensions. The green entries are those used in the interpo-
lation to compute the synthesis parameter, because close
to the yellow current entry in the synthesis state space.
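
The interpolation with neighborhood selection described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the TSAM implementation: the function and argument names are hypothetical, the weights are the standard inverse distance raised to the power p, and the fallback to the current parameters when no entry is selected is a choice made only for this sketch.

```python
import math


def idw_parameters(target, D_star, I, current, max_step, p=4.0):
    """Interpolate synthesis parameters by inverse distance weighting.

    target:   point m(c) in the reduced timbre space.
    D_star:   list of reduced descriptor vectors d*.
    I:        list of parameter vectors pairwise associated with D_star.
    current:  parameter vector currently driving the synthesizer.
    max_step: only states within this distance of `current` are used.
    p:        power parameter controlling the influence of distance.
    """
    numerator = [0.0] * len(current)
    denominator = 0.0
    for d, i in zip(D_star, I):
        if math.dist(i, current) > max_step:
            continue  # would cause too large a jump in parameter space
        distance = math.dist(target, d)
        if distance == 0.0:
            return list(i)  # target coincides with an analyzed entry
        weight = 1.0 / distance ** p
        numerator = [a + weight * b for a, b in zip(numerator, i)]
        denominator += weight
    if denominator == 0.0:
        return list(current)  # no entry selected: hold the current state
    return [a / denominator for a in numerator]


D_star = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [100.0, 100.0]]
print(idw_parameters([0.5, 0.5], D_star, I, current=[0.0, 0.0], max_step=20.0))
```

In this toy example the fourth entry is equally close to the target in the timbre space but far from the current state in the parameter space, so it is excluded and the output moves smoothly among the three nearby states.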
4. IMPLEMENTATION AND USAGE
The TSAM is open-source software implemented in Max/MSP using the FTM
extension, supported by a background engine written and compiled in
MATLAB. The analysis of the synthesis timbre, the real-time timbre
space mapping and the visualizations are computed in Max/MSP, whereas
the background engine computes the descriptor quality and the timbre
space mapping (dimensionality reduction, redistribution, ANN training),
taking as input the outcome of the analysis stage. The two components
of the system communicate via the Open Sound Control (OSC) protocol,
and large matrices are exchanged using files. The TSAM can host
software synthesizers developed using Steinberg’s Virtual Studio
Technology (VST). It acts as a wrapper for the VST synth, providing a
fully integrated environment. The TSAM allows full control of all
parameters for analysis and mapping purposes. It captures the
synthesized signal for descriptor computation and playback, and manages
the global state of the synthesizer when saving and restoring presets.
Figure 3 shows a screenshot of the main TSAM GUI. This exposes a large
number of options for further exploration of the mapping method we
propose, and also for customizing analysis, mapping computation,
real-time control, and visualization. Default settings are provided for
basic use.
Users can load a VST synth and select up to 10 variable parameters,
their range, analysis step size, and the number of vectors m per state
i. Advanced analysis options include digital signal processing settings
and analysis timing with respect to the synthesis triggering (note-on
and note-off messages). The TSAM estimates and shows the total analysis
time, and users may opt to increase the parameter step sizes, in (3),
when this is excessive. Thereafter the analysis is carried out
automatically. In Section 2 we discussed two analysis modes, ‘sustain’
and ‘envelope’. Besides the automatic mode, these can also be carried
out manually: users arbitrarily tune the synthesizer to a specific
state i, and request the descriptor analysis of the related sonic
response (both modes are supported). Furthermore, we included the
interactive ‘sustain’ analysis mode, where descriptor vectors d are
computed while users vary the MIDI-mapped synthesis parameters in real
time, dynamically generating a stream of i. The latter analysis mode
does not guarantee an identical number of descriptor vectors d per
state i, hence the noisiness component of the quality metric may be
inconsistent.
When the analysis stage is completed, users can request
the computation of the descriptor quality metric, which is
visualized in the TSAM as shown in Figure 4. In the de-
scriptors page, users can also specify the weights of
Equation (12), enable the normalization of its compo-
nents, find and rank the descriptors by highest score, ob-
serve the synthesis parameter ranking, and find the high-
est and lowest correlation between each parameter and
descriptor. Furthermore, users can specify which subset
of the 108 descriptors will be used for mapping purposes.
Options for the timbre space mapping computation in-
clude the dimensionality of the map, selection of the di-
mensionality reduction technique and the ANN activation
function. The mapping can be tuned at runtime using the
settings discussed in Section 3. The timbre analysis, qual-
ity metric, and mapping are saved into files that can be
individually recalled through the TSAM presets.
5. DISCUSSION AND FUTURE WORK
We presented a generic tool that integrates functionalities to study
and map the timbre of sound synthesizers. Preliminary studies
demonstrated that the adoption of large sets of descriptors, and their
selection based on the novel quality metric, improves the accuracy of
the timbre-based interaction. The TSAM can be used for the study of the
sonic response of synthesizers, for an explicit control of timbral
character, or for a reduction of the synthesis control space, exposing
only a few perceptually relevant control dimensions. Previous user
studies on a system with a similar mapping approach demonstrated that
synthesis parameters become transparent to users, who focus exclusively
on the timbral interaction. Future work includes user studies with the
TSAM to evaluate the effectiveness of the timbre-based mapping,
comparing it against traditional and alternative approaches to sound
synthesis interaction, in performance and sound design scenarios.
Moreover, we will investigate the relevance of different descriptor
categories for a more perceptually related sonic control.
Figure 3. TSAM main page, including options for analysis, mapping computation, real-time control, and visualization.
Figure 4. TSAM descriptor page, providing an insight into the timbre response and parameter relationship of the synth.
6. REFERENCES
 T. Wishart, On Sonic Art. Harwood Academic Pub-
 S. McAdams and A. Bregman, “Hearing musical
streams,” Comput. Music J., vol. 3, no. 4, pp. 26–43,
 J. C. Risset and D. Wessel, “Exploration of timbre
by analysis and synthesis,” Psychol. Music, pp. 113–
 S. Fels, “Intimacy and embodiment: implications for
art and technology,” in Proc. of the 2000 ACM
workshops on Multimedia, 2000, pp. 13–16.
 E. R. Miranda and M. M. Wanderley, New digital
musical instruments: control and interaction beyond
the keyboard. A-R Editions, Inc., 2006.
 D. Arfib, J. M. Couturier, L. Kessous, and V. Ver-
faille, “Strategies of mapping between gesture data
and synthesis model parameters using perceptual
spaces,” Organ. Sound, vol. 7, no. 2, pp. 127–144,
 S. Fasciani, “Interactive Computation of Timbre
Spaces for Sound Synthesis Control,” in Proc. of the
2nd Int. Symposium on Sound and Interactivity, Sin-
 W. A. Sethares, Tuning, Timbre, Spectrum, Scale.
Springer Science & Business Media, 2005.
 S. Fasciani and L. Wyse, “Adapting general purpose
interfaces to synthesis engines using unsupervised
dimensionality reduction techniques and inverse
mapping from features to parameters,” in Proc. of
the 2012 Int. Computer Music Conf., Ljubljana, Slo-
 A. Zacharakis, K. Pastiadis, and J. D. Reiss, “An
Interlanguage Unification of Musical Timbre,” Mu-
sic Percept. Interdiscip. J., vol. 32, no. 4, pp. 394–
412, Apr. 2015.
 G. Peeters, “A Large Set of Audio Features for
Sound Description (Similarity and Classification) in
the Cuidado Project,” IRCAM, 2004.
 D. Wessel, “Timbre space as a musical control struc-
ture,” Comput. Music J., vol. 3, no. 2, pp. 45–52,
 A. Lazier and P. R. Cook, “Mosievius: feature driv-
en interactive audio mosaicing,” in Proc. of the 7th
Int. Conf. on Digital Audio Effects, Napoli, Italy,
 M. Puckette, “Low-dimensional parameter mapping
using spectral envelopes,” in Proc. of the 2004 Int.
Computer Music Conf., Miami, US, 2004.
 C. Nicol, S. A. Brewster, and P. D. Gray, “Designing
Sound: Towards a System for Designing Audio In-
terfaces using Timbre Spaces.,” in Proc. of the 10th
Int. Conf. on Auditory Display, Sydney, Australia,
 D. Schwarz, G. Beller, B. Verbrugghe, and S. Britton,
“Real-time corpus-based concatenative synthesis
with CataRT,” in Proc. of the 9th Int. Conf. on
Digital Audio Effects, Montreal, Canada, 2006, pp.
 M. Hoffman and P. R. Cook, “Feature-based synthe-
sis: Mapping acoustic and perceptual features onto
synthesis parameters,” in Proc. of the 2006 Int.
Computer Music Conf., New Orleans, US, 2006.
 N. Schnell, M. A. S. Cifuentes, and J. P. Lambert,
“First steps in relaxed real-time typo-morphological
audio analysis/synthesis,” in Proc. of the 7th
Sound and Music Computing Int. Conf., Barcelona,
 T. Grill, “Constructing high-level perceptual audio
descriptors for textural sounds,” in Proc. of the 9th
Sound and Music Computing Int. Conf., Copenha-
gen, Denmark, 2012.
 A. Seago, “A New Interaction Strategy for Musical
Timbre Design,” in Music and Human-Computer In-
teraction, S. Holland, K. Wilkie, P. Mulholland, and
A. Seago, Eds. Springer, 2013, pp. 153–169.
 A. Pošćić and G. Kreković, “Controlling a sound
synthesizer using timbral attributes,” in Proc. of the
10th Sound and Music Computing Int. Conf., Stock-
holm, Sweden, 2013.
 N. Klügel, T. Becker, and G. Groh, “Designing
Sound Collaboratively Perceptually Motivated Au-
dio Synthesis,” in Proc. of the 14th Int. Conf. on
New Interfaces for Musical Expression, London,
United Kingdom, 2014, pp. 327–330.
 S. Ferguson, “Using Audio Feature Extraction for
Interactive Feature-Based Sonification of Sound,” in
Proc. of the 21st Int. Conf. on Auditory Display
(ICAD 2015), Graz, Austria, 2015.
 S. Stasis, R. Stables, and J. Hockman, “A Model For
Adaptive Reduced-Dimensionality Equalisation,” in
Proc. of the 18th Int. Conf. on Digital Audio Effects
(DAFx-15), Trondheim, Norway, 2015.
 X. Serra and J. Smith, “Spectral Modeling Synthesis:
A Sound Analysis/Synthesis System Based on a De-
terministic Plus Stochastic Decomposition,” Com-
put. Music J., vol. 14, no. 4, pp. 12–24, 1990.
 T. Jehan and B. Schoner, “An audio-driven percep-
tually meaningful timbre synthesizer,” in Proc. of
the 2001 Int. Computer Music Conf., Havana, Cuba,
 L. J. P. Van Der Maaten, E. O. Postma, and H. J.
Van Den Herik, “Dimensionality reduction: a com-
parative review,” Tilburg University Technical Re-
 H. Nguyen, J. Burkardt, M. Gunzburger, L. Ju, and
Y. Saka, “Constrained CVT meshes and a compari-
son of triangular mesh generators,” Comput. Geom.,
vol. 42, no. 1, pp. 1–19, Jan. 2009.
 G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme
learning machine: Theory and applications,” Neuro-
computing, vol. 70, no. 1–3, pp. 489–501, Dec.
 N. Schnell, R. Borghesi, D. Schwarz, F. Bevilacqua,
and R. Muller, “FTM - Complex Data Structure for
Max,” in Proc. of the 2005 Int. Computer Music
Conf., Barcelona, Spain, 2005.
 S. Fasciani, “Voice-controlled interface for digital
musical instruments,” Ph.D. Thesis, National Uni-
versity of Singapore, Singapore, 2014.