Proceedings of the 17th Linux Audio Conference (LAC-19), CCRMA, Stanford University, USA, March 23–26, 2019
Henrik von Coler
Audio Communication Group
TU Berlin
Abstract

This paper presents a real-time additive sound synthesis application with individual outputs for each partial and noise component. The synthesizer is programmed in C++, relying on the JACK API for audio connectivity, with an OSC interface for control input. These features allow the individual spatialization of the partials and the noise, referred to as spectro-spatial synthesis, in connection with an OSC-capable spatial rendering software. Additive synthesis is performed in the time domain, using partial trajectories previously extracted from instrument recordings. Noise is synthesized using Bark band energy trajectories. The sinusoidal data set for the synthesis is generated in advance from a custom violin sample library. Spatialization is realized using established rendering software implementations on a dedicated server. Pure Data is used for processing control streams from an expressive musical interface and distributing them to the synthesizer and the renderer.
1. Introduction

1.1. Sinusoidal Modeling
Additive synthesis is among the oldest digital sound creation methods and was the foundation of early experiments by Max Mathews at Bell Labs. It allows the generation of sounds rich in timbre by superimposing single sinusoidal components, referred to as partials, either in the time or the frequency domain. Based on the Fourier principle, any quasi-periodic signal y(t) can be expressed as a sum of N_part sinusoids with varying amplitudes a_n(t) and frequencies ω_n(t) and an individual phase offset φ_n:

y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(\omega_n(t) t + \varphi_n)    (1)
In harmonic cases, which is the case for the majority of musical instrument sounds, the partial frequencies can be approximated as integer multiples of f_0:

y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(2 \pi n f_0(t) t + \varphi_n)    (2)
Although relative phase fluctuations are important for the perception [1], the original phase can be ignored in many cases, which is of benefit for manipulations of the modeled sound:

y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(2 \pi n f_0(t) t)    (3)
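As an illustration, Eq. (3) maps directly onto a time-domain rendering loop. The following is a minimal, generic sketch with per-buffer constant parameters; it is not taken from the presented application:

#include <cmath>
#include <cstddef>
#include <vector>

// Minimal time-domain realization of Eq. (3): all partials are integer
// multiples of f0, the running phase is accumulated across buffers.
// This is a generic sketch, not code from the presented application.
struct AdditiveSynth {
    double fs;                   // sampling rate in Hz
    std::vector<double> phase;   // running phase per partial, in radians

    AdditiveSynth(std::size_t nPartials, double sampleRate)
        : fs(sampleRate), phase(nPartials, 0.0) {}

    // amp[n] holds the support value of a_n for this buffer.
    void process(const std::vector<double>& amp, double f0,
                 float* out, std::size_t frames) {
        const double twoPi = 6.283185307179586;
        for (std::size_t i = 0; i < frames; ++i)
            out[i] = 0.0f;
        for (std::size_t n = 0; n < phase.size(); ++n) {
            const double inc = twoPi * (n + 1) * f0 / fs;
            double ph = phase[n];
            for (std::size_t i = 0; i < frames; ++i) {
                out[i] += static_cast<float>(amp[n] * std::sin(ph));
                ph += inc;
            }
            phase[n] = std::fmod(ph, twoPi); // keep the phase bounded
        }
    }
};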
Based on this theory, an algorithm for speech synthesis was proposed by McAulay and Quatieri [2]. For musical sound synthesis, the algorithm was extended with a noise component [3], resulting in the sinusoids+noise model. The signal is then modeled as the sum of the deterministic part x_det and the stochastic part x_stoch, also referred to as the residual:

x = x_det + x_stoch    (4)
Modeling of the residual can, for example, be performed by approximating the spectral envelope using linear predictive coding [3] or a filter bank based on Bark frequencies [4]. The phase of the stochastic signal is, in theory, random and thus need not be modeled. However, residuals are usually not completely random, since they still contain information from the removed harmonic content.
In order to fully model the sounds of arbitrary musical instruments, a transient component x_trans is included in the full signal model [4]. This component captures plucking sounds and other percussive elements:

x = x_det + x_stoch + x_trans    (5)
Since the work presented in this paper focuses on the violin played with legato techniques, the transient component can be neglected without impairing the perceived quality of a re-synthesis.
1.2. Spectral Spatialization
In electronic and electroacoustic music, the term spectral spatialization refers to the individual treatment of a sound's frequency components for a distribution on sound reproduction systems [5]. Timbral sound qualities can thus be linked to the spatial image of the sound, even for pre-existing or fixed sound material. In the case of spectro-spatial synthesis, this process is integrated at the synthesis level, for example in additive approaches. This is not yet a common feature in available synthesizers, but several research projects have investigated the possibilities of such approaches, with applications in musical sound processing, sound design, virtual acoustics and psychoacoustics.
Topper et al. [6] apply additive synthesis of basic waveforms (square wave, sawtooth), physical modeling and sub-band decomposition in a multichannel panning system with real-time, prerecorded and graphic control. Their system is implemented in Max/MSP and RTcmix, running on both Mac and PC/Linux hardware with a total of eight audio channels.
Verron et al. [7] use the sinusoids+noise model for the spectral spatialization of environmental sounds. Each component can be synthesized at an individual position in space on Ambisonics and binaural systems. Deterministic and stochastic components are composed and added together in the frequency domain and subsequently spatially encoded with a filter bank. The control of the synthesis process depends on the nature of the environmental sounds [8].
In the context of electroacoustic music, James [9] expands Dennis Smalley's concept of spectromorphology to the idea of spatiomorphology. Timbre Spatialization is achieved using terrain surfaces and by mapping these to spatio-spectral distributions. Max/MSP is used for computing the contribution of spectral content to individual speakers with distance-based amplitude panning (DBAP) and Ambisonic equivalent panning (AEP) methods.

Figure 1: Partial amplitude trajectories of a violin sound

Figure 2: Partial frequency trajectories of a violin sound
Spectral spatialization can also be used to synthesize dynamic directivity patterns of musical instruments in virtual acoustic environments. Since directivity in combination with movement has a significant influence on an instrument's sound, this can increase the plausibility of the simulation. Warusfel and Misdariis [10] use a tower with three cubes, each containing multiple speakers, to spatialize frequency bands of an input signal for the simulation of radiation patterns.
1.3. The Presented Application
The presented application incorporates different synthesis modes, of which only the so-called deterministic mode is the subject of this paper. In this basic mode, precalculated parameter trajectories, as presented in Sec. 2, are used for a manipulable resynthesis of the original instrument sounds.
Figure 3: Unwrapped partial phases of a violin sound

Figure 4: Bark band energy trajectories of a violin sound

The software architecture is designed to allow the use of additive synthesis, or more generally of sinusoidal modeling, on sound field synthesis systems and other reproduction setups. This is achieved by providing individual outputs for all partials and noise bands in an application implemented as a JACK client, described in Sec. 3. Using JACK allows the connection of all individual synthesizer output channels to a JACK-capable renderer, such as the SoundScape Renderer (SSR) [11], Panoramix [12] or the HOA library [13]. Since each partial becomes a single virtual sound source in combination with these rendering applications, the spatial distribution of the synthesis can be modulated in real time. Pure Data [14] is used to receive control data from gestural interfaces or to play back predefined trajectories, generating control streams for both the synthesizer and the spatialization renderer. A direct linkage between timbre and spatialization is thus created, which is considered essential for a meaningful spectro-spatial synthesis.
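For illustration, the per-partial output structure can be sketched with the JACK C API as follows; the client and port names are assumptions, not those of the actual code base:

#include <jack/jack.h>
#include <cstdio>
#include <string>
#include <vector>

// Generic sketch: one JACK output port per partial, so that a renderer such
// as the SSR can treat each partial as a virtual sound source. Client and
// port names are assumptions, not taken from the presented code base.
int main() {
    jack_client_t* client =
        jack_client_open("spectro_spatial_synth", JackNullOption, nullptr);
    if (client == nullptr) {
        std::fprintf(stderr, "Could not connect to the JACK server.\n");
        return 1;
    }
    const int nPartials = 80;
    std::vector<jack_port_t*> ports;
    for (int n = 0; n < nPartials; ++n) {
        const std::string name = "partial_" + std::to_string(n + 1);
        ports.push_back(jack_port_register(client, name.c_str(),
                                           JACK_DEFAULT_AUDIO_TYPE,
                                           JackPortIsOutput, 0));
    }
    // A process callback would be registered with jack_set_process_callback()
    // before activating the client and entering the application's main loop.
    jack_activate(client);
    jack_client_close(client);
    return 0;
}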
The TU-Note Violin Sample Library [15], [16] is used as the audio content for generating the sinusoidal model. Designed in the style of classic sample libraries, this data set contains single sounds of a violin at different pitches and intensities, recorded at an audio sampling rate of 96 kHz with 24 bit resolution.
Figure 5: Sequence diagram for the JACK callback function, involving the objects JackClient, VoiceManager, SingleVoice, Sinusoid and ResidualSynth

Analysis and modeling are performed beforehand in Matlab, using monophonic pitch tracking and subsequent extraction of the partial trajectories by peak picking in the spectrogram. YIN [17] and SWIPE [18] are used as monophonic pitch tracking algorithms. Based on the f0 trajectories, partial tracking is performed on the STFT, applying a hop size of 256 samples (2.7 ms) and a window size of 4096 samples, zero-padded to 8192 samples. Quadratic interpolation (QIFFT), as presented by Smith and Serra [19], is applied for peak parameter estimation of up to 80 partials. Due to the sampling frequency, the full number of partials is only analyzable up to the note D5 (576.65 Hz), since all 80 partials must remain below the Nyquist frequency of 48 kHz.
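For reference, QIFFT refines a spectral peak at bin k by fitting a parabola to the log magnitudes α, β, γ of bins k-1, k, k+1. This is the standard formulation following [19], repeated here for convenience, not quoted from the paper:

p = \frac{1}{2} \frac{\alpha - \gamma}{\alpha - 2\beta + \gamma}, \qquad
\hat{f} = (k + p) \frac{f_s}{N_{FFT}}, \qquad
\hat{a}_{dB} = \beta - \frac{(\alpha - \gamma) p}{4}

where p ∈ [-1/2, 1/2] is the fractional bin offset of the interpolated peak.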
By subtracting the deterministic part from the complete sound in the time domain, the residual signal is obtained. The residual is then filtered using a Bark scale filter bank with second-order Chebyshev bandpasses, and the temporal energy trajectories are calculated for the resulting 24 band-limited signals. At this point, a large amount of information is removed from the residual signal. Due to the shortcomings of the time domain subtraction method, the residual still contains information from the deterministic component. By averaging the energy over the Bark bands, this relation is eliminated.
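The band-wise energy trajectories can be computed frame by frame. The following generic sketch shows the principle for one band-limited residual signal; the frame length is an assumption, and the original analysis is implemented in Matlab:

#include <cmath>
#include <cstddef>
#include <vector>

// Generic sketch: frame-wise RMS energy of one band-limited residual signal,
// yielding one energy trajectory per Bark band. The frame length is an
// assumption; the actual analysis is performed in Matlab.
std::vector<double> bandEnergyTrajectory(const std::vector<float>& band,
                                         std::size_t frameLen) {
    std::vector<double> trajectory;
    for (std::size_t start = 0; start + frameLen <= band.size(); start += frameLen) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + frameLen; ++i)
            sum += static_cast<double>(band[i]) * band[i];
        trajectory.push_back(std::sqrt(sum / frameLen)); // RMS of the frame
    }
    return trajectory;
}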
The results of the analysis stage are the trajectories of the partial amplitudes, as shown in Figure 1, the trajectories of the partial frequencies and phases, as shown in Figure 2 and Figure 3, respectively, as well as the trajectories of the Bark band energies, illustrated in Figure 4. The resulting data is exported to an individual YAML file for each sound, which can be read by the synthesis system.
3.1. Libraries
The synthesis application is designed as a standalone Linux command line program. The main functionality of the synthesis system relies on the JACK API for audio connectivity and on liblo, respectively its C++ wrapper, for receiving control signals. libyaml-cpp is used for reading the data of the modeled sounds and the relevant configuration files. libsndfile, for reading the original sound files, and libfftw are included but not relevant for the aspects presented in this paper. Frequency domain synthesis and sample playback are partially implemented but not used at this point.
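Loading the model data with libyaml-cpp could look as follows; note that the file name and the keys are hypothetical, since the actual schema of the exported files is not specified in this paper:

#include <yaml-cpp/yaml.h>
#include <iostream>
#include <vector>

// Generic sketch of reading model data with libyaml-cpp. The file name and
// the keys ("partials", "amplitudes") are hypothetical examples.
int main() {
    YAML::Node model = YAML::LoadFile("sound_a4_mf.yml");
    std::vector<std::vector<double>> amplitudeTrajectories;
    for (const auto& partial : model["partials"])
        amplitudeTrajectories.push_back(
            partial["amplitudes"].as<std::vector<double>>());
    std::cout << "Loaded " << amplitudeTrajectories.size() << " partials\n";
    return 0;
}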
3.2. Algorithm
Both the sinusoidal and the noise component are synthesized in the time domain, using a non-overlapping method. For the sinusoidal component, either the built-in sin() function of the cmath library or a custom lookup table can be selected. The choice does not significantly affect the overall performance. The filter bank for the noise synthesis consists of 24 second-order Chebyshev bandpass filters with fixed coefficients, calculated before runtime. The amplitude of each frequency band is driven by the previously analyzed energy trajectories.
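A possible implementation of the selectable lookup table is sketched below; the table size and the linear interpolation are assumptions, not details from the presented code:

#include <array>
#include <cmath>
#include <cstddef>

// Generic sketch: sine lookup table with linear interpolation, as one
// possible realization of the selectable table-based oscillator.
class SineTable {
    static constexpr std::size_t N = 4096;
    static constexpr double twoPi = 6.283185307179586;
    std::array<float, N + 1> table{}; // one guard point for interpolation
public:
    SineTable() {
        for (std::size_t i = 0; i <= N; ++i)
            table[i] = static_cast<float>(std::sin(twoPi * i / N));
    }
    // phase in radians, arbitrary range
    float operator()(double phase) const {
        double pos = phase / twoPi * N;
        pos -= std::floor(pos / N) * N;      // wrap into [0, N)
        const std::size_t idx = static_cast<std::size_t>(pos);
        const double frac = pos - idx;
        return static_cast<float>(table[idx] + frac * (table[idx + 1] - table[idx]));
    }
};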
During synthesis, the algorithm reads a new set of support points from the model data for each audio buffer and increments the position within the played note. Figure 5 shows a sequence diagram for the deterministic synthesis algorithm, starting at the JACK callback function, which is executed for each buffer of the JACK audio server. Since the synth is designed to enable polyphonic playing, the voice manager object handles incoming OSC messages in the function update_voices() to activate or deactivate single voices.
Figure 6: Combination of synthesizer and renderer on separate machines, using Pure Data for synth configuration and parameter parsing. Control input (MIDI, OSC, ...) feeds the synthesizer, which sends one channel per partial to the spatial renderer; the renderer provides the audio output with one channel per speaker.
For the synthesis of mostly monophonic, excitation-continuous instruments like the violin, the polyphony merely handles the overlapping of released notes. Subsequently, the voice manager loops over all active voices in the function getNextFrame_TD(), first setting the new control parameters for each voice.
In cycle_start_deterministic(), support points for all partials' parameters are picked at the relevant voice's playback position. These support points are then linearly interpolated over the buffer length in set_interpolator().
Finally, in getNextBlock_TD(), each single voice generates the output for all sinusoids and all noise bands in two separate vectorizable loops, adding both to the output buffer.
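The per-buffer interpolation described above reduces to a simple ramp per parameter, as in the following generic sketch:

#include <cstddef>

// Generic sketch: per-buffer linear interpolation between support points.
// setTarget() is called once per audio buffer with the next support point,
// next() once per sample, producing a linear ramp over the buffer length.
struct Interpolator {
    double current = 0.0;
    double step = 0.0;

    void setTarget(double target, std::size_t bufferSize) {
        step = (target - current) / static_cast<double>(bufferSize);
    }
    double next() {
        current += step;
        return current;
    }
};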
3.3. Runtime Environment and Periphery
The runtime system for the synthesis starts a JACK server with a 48 kHz sampling rate, a buffer size of 128 samples and 2 periods per buffer. This results in a latency of 5.3 ms for the audio playback, which is within the limits for this synthesis approach. On an Intel(R) Core(TM) i7-5500U CPU @ 2.40 GHz with disabled speed-stepping and an RME Fireface UFX, the JACK server shows an average load of approximately 20 %.
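The stated playback latency follows directly from the buffer configuration:

t_{lat} = \frac{n_{periods} \cdot n_{frames}}{f_s} = \frac{2 \cdot 128}{48\,000\ \mathrm{Hz}} \approx 5.3\ \mathrm{ms}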
The interaction of the involved software components is visualized in Figure 6. For reasons of performance and increased flexibility in the studio, two separate machines are used for synthesis and spatialization. Connectivity between the systems is realized with MADI or Dante, using individual channels for the 80 partials and 24 noise bands.
3.4. Control
Figure 7: Spatialization scene in a 2D setup with 30 partials and their …
The control data for the partial positions in the rendering software is not generated in the synthesis system at this point but managed externally. This offers more flexibility for testing different mappings at this stage of development. A Pure Data patch is used to receive incoming control messages, either via OSC or MIDI, and to distribute them to the synthesizer and the spatialization software. For live performance, the patch receives continuous control streams for pitch and intensity from an improved version of the interface presented by von Coler et al. [20] and visualizes the sensor data. Pitch and intensity are forwarded directly to the synth. Additionally, data from several force sensitive resistors (FSRs) and a 9 degrees-of-freedom IMU, which can be used for controlling the spatialization, is sent to the patch.
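Control messages of this kind can be emitted with the liblo C API as sketched below; the OSC address patterns, host names and ports are hypothetical and not taken from the project:

#include <lo/lo.h>

// Generic sketch of emitting control messages with the liblo C API.
// Addresses, paths and ports are hypothetical examples.
int main() {
    lo_address synth = lo_address_new("localhost", "9000");
    lo_address renderer = lo_address_new("renderer.local", "9001");

    lo_send(synth, "/synth/pitch", "f", 440.0f);         // continuous pitch
    lo_send(synth, "/synth/intensity", "f", 0.7f);       // continuous intensity
    lo_send(renderer, "/source/1/azimuth", "f", 45.0f);  // per-source position

    lo_address_free(synth);
    lo_address_free(renderer);
    return 0;
}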
Figure 7 shows an example of a simple spatialization mapping on a 2D system. The absolute orientation of the IMU is used to control the general direction φ of the partial flock. A second parameter S, derived from the intensity and additional sensor data, controls the spread of the partials around this angle, depending on the partial index.
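The exact mapping formula is not given in this paper; one plausible realization, shown here purely as an assumption, fans the partials symmetrically around φ with spread S:

#include <vector>

// Hypothetical mapping sketch: distribute nPartials virtual sources around a
// common direction phi (in degrees), fanning them from phi - S to phi + S
// depending on the partial index. This is not the paper's actual formula.
std::vector<double> partialAzimuths(double phi, double S, int nPartials) {
    std::vector<double> azimuth(nPartials);
    for (int n = 0; n < nPartials; ++n) {
        const double norm = (nPartials > 1)
            ? static_cast<double>(n) / (nPartials - 1)   // 0 .. 1 over the index
            : 0.5;
        azimuth[n] = phi + S * (2.0 * norm - 1.0);       // phi - S .. phi + S
    }
    return azimuth;
}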
4. Conclusion

After significantly improving the performance of the synthesis system, the application can now be used with the full 80 partials and 24 Bark bands as individual outputs. Recent tests in combination with different spatial rendering software packages and different loudspeaker setups show promising results. However, the dynamic spatialization of such a number of virtual sound sources and the resulting traffic of OSC messages is demanding for the runtime system. Using separate machines for synthesis and rendering reduces the individual load. The number of rendering inputs can also be reduced without limiting the perceived quality of the spatialization, since multiple partials may share one virtual sound source.
Next steps include the empirical investigation of mappings from controller sensors to both spectral and spatial sound properties. This includes user experiments to evaluate different mapping and control paradigms, as well as perceptual measurements of the synthesis results.
Acknowledgments

Thanks to Benjamin Wiemann for contributions to the project in its early stage and to Robin Gareus for the help in restructuring the code and thereby improving the performance.
References

[1] T. H. Andersen and K. Jensen, "Importance and Representation of Phase in the Sinusoidal Model", J. Audio Eng. Soc., vol. 52, no. 11, pp. 1157–1169, 2004.
[2] R. McAulay and T. Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, no. 4, pp. 744–754, 1986.
[3] X. Serra and J. Smith, "Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition", Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.
[4] S. N. Levine and J. O. Smith, "A Sines+Transients+Noise Audio Representation for Data Compression and Time/Pitch Scale Modifications", in Proceedings of the 105th Audio Engineering Society Convention, San Francisco, CA, 1998.
[5] D. Kim-Boyle, "Spectral Spatialization – An Overview", in Proceedings of the International Computer Music Conference, Belfast, UK, 2008.
[6] D. Topper, M. Burtner, and S. Serafin, "Spatio-Operational Spectral (SOS) Synthesis", in Proceedings of the International Computer Music Conference (ICMC), Singapore, 2003.
[7] C. Verron, M. Aramaki, R. Kronland-Martinet, and G. Pallone, "Spatialized Additive Synthesis of Environmental Sounds", in Audio Engineering Society Convention 125, Audio Engineering Society, 2008.
[8] C. Verron, G. Pallone, M. Aramaki, and R. Kronland-Martinet, "Controlling a Spatialized Environmental Sound Synthesizer", in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, 2009, pp. 321–324.
[9] S. James, "Spectromorphology and Spatiomorphology of Sound Shapes: Audio-rate AEP and DBAP Panning of Spectra", in Proceedings of the International Computer Music Conference, 2015.
[10] O. Warusfel and N. Misdariis, "Directivity Synthesis with a 3D Array of Loudspeakers: Application for Stage Performance", in Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, 2001, pp. 1–5.
[11] J. Ahrens, M. Geier, and S. Spors, "The SoundScape Renderer: A Unified Spatial Audio Reproduction Framework for Arbitrary Rendering Methods", in Audio Engineering Society Convention 124, Audio Engineering Society, 2008.
[12] T. Carpentier, "Panoramix: 3D Mixing and Post-production Workstation", in Proceedings of the International Computer Music Conference (ICMC), 2016.
[13] A. Sèdes, P. Guillot, and E. Paris, "The HOA Library, Review and Prospects", in Proceedings of the Joint International Computer Music Conference | Sound and Music Computing Conference, 2014, pp. 855–860.
[14] M. S. Puckette, "Pure Data", in Proceedings of the International Computer Music Conference (ICMC), Thessaloniki, Greece, 1997.
[15] H. von Coler, J. Margraf, and P. Schuladen, TU-Note Violin Sample Library, TU Berlin, 2018. DOI: 10.14279/
[16] H. von Coler, "TU-Note Violin Sample Library – A Database of Violin Sounds with Segmentation Ground Truth", in Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18), Aveiro, Portugal, 2018.
[17] A. de Cheveigné and H. Kawahara, "YIN, a Fundamental Frequency Estimator for Speech and Music", The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 2002.
[18] A. Camacho, "SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music", PhD thesis, Gainesville, FL, USA, 2007.
[19] J. O. Smith and X. Serra, "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation", Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Tech. Rep., 2005.
[20] H. von Coler, G. Treindl, H. Egermann, and S. Weinzierl, "Development and Evaluation of an Interface with Four-Finger Pitch Selection", in Audio Engineering Society Convention 142, Audio Engineering Society, 2017.