PAPERS
Methods for Multiple Wavetable Synthesis of Musical Instrument Tones*

ANDREW HORNER, JAMES BEAUCHAMP, AES Fellow, AND LIPPOLD HAKEN**

University of Illinois at Urbana-Champaign, IL 61801, USA
Spectrum matching of musical instrument tones is a fundamental problem in computer
music. Two methods are presented for determining near-optimal parameters for the
synthesis of harmonic musical instrument or voice sounds using the addition of several
fixed wavetables with time-varying weights. The overall objective is to find wavetable
spectra and associated amplitude envelopes which together provide a close fit to an
original time-varying spectrum. Techniques used for determining the wavetable spectra
include a genetic algorithm (GA) and principal components analysis (PCA). In one
study a GA was used to select spectra from the original signal at various time points.
In another study PCA was used to obtain a set of orthogonal basis spectra for the
wavetables. In both cases, least-squares solution is utilized to determine the associated
amplitude envelopes. Both methods provide solutions which converge gracefully to
the original as the number of tables is increased, but three to five wavetables frequently
yield a good replica of the original sound. For the three instruments we analyzed, a
trumpet, a guitar, and a tenor voice, the GA method seemed to offer the best results,
especially when fewer than four wavetables were used. Comparative results using the
methods are discussed and illustrated.
0 INTRODUCTION

Matching synthesis of musical instrument tones is a fundamental problem in computer music. Generally, for a particular synthesis model, matching begins with a time-variant spectral analysis of the original sound. Next, the model synthesis parameters which produce a "best fit" to the analysis data are determined. Finally, resynthesis of the sound is performed using the matched parameters. These steps are shown in Fig. 1.

Synthesis models generally fall into one of three categories: 1) time-variant filter synthesis, 2) nonlinear distortion synthesis, and 3) fixed-waveform additive synthesis. Previous matching methods are briefly reviewed with respect to these categories and are illustrated in Fig. 2.

Time-variant filter synthesis includes linear predictive coding (LPC) [1], a method of finding time-varying digital filter parameters for matching sounds. Traditionally the input is either a pulse train waveform or white noise, but there is some debate about what is the "best" input signal. LPC has generally been applied more successfully to speech sounds than to musical tones. It has been very successful for ordinary speech and "singing speech" [2], and works well with variable fundamental frequency, since pitch detection is usually part of the analysis technique. The method can be persuaded to converge to perfection if a sufficient number of filter stages is included.

Finding parameters which enable nonlinear synthesis methods to match acoustic sounds is inherently difficult due to the complex spectral evolution characteristics inherent with these methods. Frequency modulation (FM) and nonlinear processing (waveshaping) are two synthesis techniques which fall into this category. Estimation of the FM parameters of an acoustic sound has been an elusive problem, despite some attempts made in this direction [3]-[5]. Recently the authors of this paper showed how a genetic algorithm (GA) can be successfully applied to matching a single-modulator, multiple-carrier FM model [6]. The GA was used to select fixed values of the modulation indexes and carrier-to-modulator frequency ratios, while the amplitude of each carrier was determined by least-squares solution.

* Manuscript received 1992 July 29; revised 1993 February 15.
** A. Horner and L. Haken are with the CERL Sound Group; A. Horner and J. Beauchamp are with the Computer Music Project.

336 J. Audio Eng. Soc., Vol. 41, No. 5, 1993 May
Spectral centroid matching [3] for nonlinear processing synthesis leads to a relatively simple method for achieving approximations of some acoustic sounds. This technique works best for spectra which are well characterized by a principal spectrum whose shape is modified according to a well-defined time-varying spectral centroid.

Multiple wavetable synthesis, the subject of this paper, is based on a sum of fixed waveforms or periodic basis functions with time-varying weights. Each waveform can be expressed as a fixed weighted sum of several harmonic sine waves. If the sets of harmonics for the various waveforms are disjoint, the method is termed "group additive synthesis" [7]. More generally, Stapleton and Bass presented a statistical method based on the Karhunen-Loève (KL) transform to determine periodic time-domain basis functions (waveforms) as well as amplitude and phase-control functions to optimally fit acoustic tones [8]. Their optimization was based directly on the time signal, so that spectral analysis was not necessary. The same basis functions were used for several different instruments. Two drawbacks of this method are its computationally expensive matching procedure and the phase cancellation problems which potentially arise when waveform amplitudes vary from their designated values.

Waveform or spectrum interpolation synthesis [9] assumes that a signal may be divided into a series of "target" waveforms. Synthesis proceeds by gradually fading (or interpolating) from one target to the next. (Interpolation between waveforms and interpolating between corresponding spectra are the same only if the phases of the corresponding harmonics of the two spectra are the same.) This method might be thought of as the opposite extreme of group additive synthesis, in that spectra are disjoint in time rather than in frequency. Serra et al. give a method based on linear regression whereby target waveforms are selected based on the assumption of linear ramp interpolation between spectra.

This paper presents two general matching methods for selecting additive wavetable spectra, one based on a GA [1], [11], the other on principal components analysis (PCA) [12]. GAs have been applied to a wide array of problem domains from stack filter design [13] to computer-assisted composition [14]. The GA-based spectrum matching methods presented in this paper find parameters which can be used to perform traditional wavetable synthesis. Our PCA-based technique is related to that of Stapleton and Bass; however, our basis functions are determined in the frequency rather than in the time domain. The PCA approach has been used in various speech applications [15], [16]. For both of our methods, the time-varying weights are determined by least-squares or direct matrix solution.

Fig. 1. Wavetable matching analysis/synthesis overview.
Fig. 2. Matching and synthesis models.

1 WAVETABLE SYNTHESIS OVERVIEW

Wavetable or fixed-waveform synthesis is an efficient technique for the generation of a particular periodic waveform. Prior to synthesis, one cycle of the waveform is stored in a table. The spectrum of the waveform can be an arbitrary harmonic spectrum, which is specified by the amplitude values of its harmonics. The table entries are given by

    table_i = Σ_{k=1}^{Nhars} a_k sin(2π i k / table_length + φ_k)    (1)

where 1 ≤ i ≤ table_length, and a_k and φ_k are the
amplitude and phase of the kth partial, and Nhars is the number of harmonics needed to represent the signal. The phases φ_k are generally not audibly important, and are often simply set to 0 or arbitrary values. The spectrum produced by a particular set of a_k values will be referred to as the wavetable's associated basis spectrum.

To generate samples during synthesis, table lookup is performed for the desired number of samples. Initially the table is indexed at its first entry. Subsequent lookups increment the index by the sample increment and read the sample at the new index point. The sample increment is given by

    sample increment = f1 * table_length / sampling rate    (2)

where f1 is the desired fundamental frequency of the sound. Note that f1 can be fixed or time varying. Fig. 3 shows the standard symbolic notation for a simple wavetable instrument.

The sample increment will generally not be an exact integer. The table index value may be truncated or rounded, or, alternatively, interpolation may be used to improve lookup accuracy. Signal-to-noise considerations generally determine the approach used [17], [18].

Further control can be gained by using multiple weighted wavetables in the synthesis model, as shown in Fig. 4. The time-varying weights on the tables allow them to be cross-faded and generally mixed with one another in various ways. Note that the phases of the corresponding harmonics of multiple wavetables must be the same to avoid inadvertent phase cancellation.

The principal advantage of wavetable synthesis is its efficiency. For each wavetable sample, the main steps are to compute the sample increment (and then only if the fundamental is time varying), perform the waveform table lookup, look up the weights from the envelope tables, and postmultiply the waveforms by the table weights. In terms of storage, only one period of each waveform is needed plus its associated table weights, a relatively inexpensive requirement.

A disadvantage of the technique stems from the fact that each wavetable produces a static spectrum, while real sounds produce dynamic spectra. For an arbitrary small set of wavetables, most time-varying spectra cannot be approximated very closely by a linear combination of these wavetables, even if their weights are time varying. Thus the basis spectra must be chosen carefully and their weights appropriately manipulated when synthesizing dynamic spectra.

Fig. 3. Simple wavetable model.
Fig. 4. Wavetable synthesis model.
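The table-filling step of Eq. (1) and the weighted multiple-table lookup of Eq. (2) and Fig. 4 can be sketched in a few lines of numpy. This is our own illustrative code, not the authors' implementation: it assumes zero phases, truncating (non-interpolating) lookup, and a fixed fundamental, all choices the text mentions as options; every function and parameter name here is ours.

```python
import numpy as np

def build_wavetable(amps, phases=None, table_length=1024):
    """Fill one table from harmonic amplitudes a_k, Eq. (1).
    Phases default to zero, as the text notes they are rarely audible.
    (Indexing is 0-based here rather than the paper's 1-based i.)"""
    nhars = len(amps)
    if phases is None:
        phases = np.zeros(nhars)
    i = np.arange(table_length)
    table = np.zeros(table_length)
    for k in range(1, nhars + 1):
        table += amps[k - 1] * np.sin(2 * np.pi * i * k / table_length
                                      + phases[k - 1])
    return table

def synthesize(tables, weights, f1, sample_rate=44100.0):
    """Sum of weighted table lookups (Fig. 4) with a truncating index.
    tables: list of wavetables sharing one length;
    weights: (Ntabs, Nsamples) array of envelope samples;
    f1: fixed fundamental frequency in Hz."""
    table_length = len(tables[0])
    nsamples = weights.shape[1]
    incr = f1 * table_length / sample_rate        # sample increment, Eq. (2)
    idx = (np.arange(nsamples) * incr) % table_length
    trunc = idx.astype(int)                       # truncated lookup
    out = np.zeros(nsamples)
    for table, w in zip(tables, weights):
        out += w * table[trunc]
    return out
```

With a single table holding one harmonic and a constant unit weight, the output is simply a sampled sine at f1, which makes the routine easy to sanity-check.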
2 SHORT-TIME SPECTRUM ANALYSIS

Matching of a wavetable-synthesized signal to an original musical signal is facilitated by working in the frequency domain and matching the sound spectra of the original and synthesized signals. The basic assumption we make about any original signal is that it can be represented by a sum of sine waves with time-varying amplitudes and frequencies:

    y(t) = Σ_{k=1}^{Nhars} b_k(t) sin( 2π ∫_0^t f_k(t) dt + θ_k )    (3)

where b_k(t) and f_k(t) are the time-varying amplitude and frequency of the kth harmonic in the original signal. Note that in our matching procedure we will ignore the starting phases θ_k since they have no audible effect in most music synthesis situations.

A further restriction that we use in this paper is that the sound is harmonic, so that

    f_k(t) = k f_1(t) .    (4)

Not all instruments are periodic or quasi-periodic. Thus we would not expect techniques based on such a harmonic restriction to fare as well with less periodic sounds, such as low piano tones, than with other instruments.

We use two different techniques to analyze a sound in order to estimate its harmonic amplitudes and fundamental frequency. Our usual method is a fixed filter bank approach where bandpass filters are centered on the harmonics of an "analysis frequency" which approximates the mean of f_1(t) [3], [19]. In this method the filter outputs give real and imaginary parts, which are converted into amplitudes and phases by the right-triangle solution. The fundamental frequency is then computed from the derivatives (or finite differences) of the phases. However, this method fails if the f_1(t) frequency deviations are so great that upper harmonic frequencies swing on the edges or outside the ranges of the filters, since the harmonic amplitudes would then be seriously in error. Therefore for cases where substantial vibrato or portamento occur, we use an extension of the McAulay-Quatieri (MQ) analysis technique [20], which is capable of tracking severe changes of pitch. By itself, the MQ method computes amplitudes and frequencies on the basis of finding peaks in a spectrum computed by using a fixed-length fast Fourier transform. However, since this method is not inherently restricted to harmonics, harmonic frequencies must be "interpreted" from the data for the peaks. We do this by first estimating the fundamental frequency as a function of time from the MQ spectral data and then by sorting these data into appropriate harmonic bins [21].

3 WAVETABLE SPECTRAL MATCHING

By determining a set of basis spectra and associated amplitude envelopes whose sum best matches the original time-variant spectrum, we attempt to reconstruct the sound using an elaboration of traditional wavetable synthesis. The matching procedure consists of two steps whereby the basis spectra and amplitude envelopes are determined. The principal contribution of this paper is a method for efficient determination of the basis spectra and their envelopes.

As the first step, the user specifies the number of basis spectra to be used in making the match, and the basis spectra are then determined. Two methods for determining basis spectra are given in Section 4.

The second step is to determine the optimum amplitude envelope (time-varying weight) for each table by straightforward matrix solution. Using the already determined basis spectra and the sequence of discrete-time spectra of the original sound, we form a system of linear equations represented by the matrix equation

    A W ≈ B .    (5)

As Fig. 5 shows, the matrix A contains the wavetable basis spectra stored as a series of columns, with one column for each spectrum; the matrix W contains the unknown amplitude weights, corresponding to time samples of the (as yet undetermined) envelopes for each time frame of the analysis, arranged in a series of columns; and the matrix B contains successive frames of the original discrete-time spectra. This system of equations is of the form

    Σ_{j=1}^{Ntabs} a_{k,j} w_{j,r} ≈ b_{k,r}    (6)

    | a_{1,1}     ...  a_{1,Ntabs}     |   | w_{1,1}     ...  w_{1,Nframes}     |     | b_{1,1}     ...  b_{1,Nframes}     |
    | a_{2,1}     ...  a_{2,Ntabs}     |   | w_{2,1}     ...  w_{2,Nframes}     |  ≈  | b_{2,1}     ...  b_{2,Nframes}     |
    | ...                              |   | ...                                |     | ...                                |
    | a_{Nhars,1} ...  a_{Nhars,Ntabs} |   | w_{Ntabs,1} ...  w_{Ntabs,Nframes} |     | b_{Nhars,1} ...  b_{Nhars,Nframes} |

Fig. 5. Matrix representation of Eq. (5).
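The matrix B of Eq. (5) can be estimated from a recording in many ways. The following is a much-simplified stand-in for the fixed filter bank described in Section 2: it measures each harmonic of an assumed constant analysis frequency with a Hann-windowed FFT per frame, and it does not attempt the phase-based fundamental tracking or MQ extension the paper uses. Function and parameter names are ours.

```python
import numpy as np

def harmonic_amplitudes(signal, f1, sample_rate, nhars,
                        frame_len=1024, hop=256):
    """Estimate b_k(t_i): the magnitude at each harmonic of an assumed
    fixed analysis frequency f1, one column per analysis frame.
    Returns an (Nhars, Nframes) array, i.e., the matrix B of Eq. (5)."""
    window = np.hanning(frame_len)
    # FFT bin nearest each harmonic k * f1 (assumes harmonics stay put)
    bins = np.round(np.arange(1, nhars + 1)
                    * f1 * frame_len / sample_rate).astype(int)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = np.fft.rfft(window * signal[start:start + frame_len])
        # 2|X[bin]| / sum(window) recovers the sinusoid amplitude
        frames.append(2.0 * np.abs(spec[bins]) / window.sum())
    return np.array(frames).T
```

For a steady sine of amplitude 0.5 at f1, the first row of the result hovers near 0.5 while the other harmonic rows stay near zero, which is the behavior the matching step relies on.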
for 1 ≤ k ≤ Nhars and 1 ≤ r ≤ Nframes. In this equation a_{k,j} is the time-fixed amplitude of the kth harmonic due to the jth basis spectrum, w_{j,r} is the envelope weight for the jth basis spectrum at the rth time frame, and b_{k,r} is the amplitude of the kth harmonic of the analysis spectrum at the rth time frame. Note that the duration of the sound under analysis is t_dur = Δt · Nframes, where Δt is the duration of a single frame. If the number of basis spectra Ntabs is equal to the number of harmonics Nhars of the original sound and assuming that the basis spectra are independent, Eqs. (5) and (6) can be solved exactly by direct matrix solution, so a perfect solution, analogous to ordinary sine-wave additive synthesis, results. But what we want is a reduced set of basis spectra. For this case we can determine a best solution in the least-squares sense. This is tantamount to determining the {w_{j,r}} that minimize the squared error

    Σ_{k=1}^{Nhars} ( Σ_{j=1}^{Ntabs} a_{k,j} w_{j,r} − b_{k,r} )²    (7)

for each time frame r. Note that each time frame is independent and could be solved without consideration of other time points. However, a more efficient solution results when time frames are considered as a group. Indeed, there exist very efficient algorithms for a direct least-squares solution of Eq. (5), such as solution by the use of the normal equations [22]. Thus for a given set of basis spectra, the computation of their amplitude envelopes is a straightforward process.

Specifically, we must solve for W in the symmetric linear system

    A^T A W = A^T B    (8)

known as the normal equations, where A^T is the transpose of the matrix A. In order to solve for the weights W, A^T A must be nonsingular, and this is true only if the columns of A are linearly independent, that is, the basis spectra are linearly independent of one another. This is a reasonable requirement, since we will naturally wish to find a minimal set of dissimilar basis spectra which efficiently spans the spectral space.

In unusual cases the normal equations will have problems due to inaccuracies resulting from finite precision arithmetic. For instance, information can be lost in forming the normal equations matrix A^T A as well as the right-hand side matrix A^T B. In these situations, orthogonalization methods, such as QR factorization [22], can be used instead of the normal equations, since they do not amplify error. However, the improved accuracy afforded by these methods is accompanied by increased computational expense. In practice, when only a few wavetables are used, solution by the normal equations will generally suffice. However, if inaccuracies manifest themselves in the form of unusually large weights, orthogonalization methods should be considered [23].

Back to the first step: how should we determine the basis spectra? Section 4 presents two approaches for solving this problem. One uses a GA to determine the basis spectra. The other is based on PCA.

3.1 The Relative Error Measure

The relative error is used to measure the quality of the match between a candidate synthetic signal and the original signal. In the case of the GA approach, the relative error is used as a fitness measure to guide the search for a good solution. We define the relative error as

    ε = (1/Nframes) Σ_{i=1}^{Nframes} [ Σ_{k=1}^{Nhars} (b_k(t_i) − b'_k(t_i))² / Σ_{k=1}^{Nhars} b_k²(t_i) ]^{1/2}    (9)

where the t_i are particular selected time values within the duration of the sound being matched, Nframes is the number of time values selected, b_k(t) is the kth harmonic amplitude of the original signal, Nhars is the number of harmonics, and

    b'_k(t) = Σ_{j=1}^{Ntabs} w_j(t) a_{k,j}    (10)

is the time-varying amplitude of the kth harmonic of the synthesized signal. Obviously we would expect that the lower the value of this error measure, the better the perceptual match. Our experience so far is that this is generally but not always true. However, lacking a formula which is a good predictor of subjective preference, this is what we are using for the time being.

The computational cost of computing Eq. (9) for a candidate solution is reduced considerably by restricting the time average to a limited number of representative spectra from the sound being matched, rather than using all of the analysis frames. Judicious choices of spectra from the original time-variant spectrum are important for achieving a reasonable wavetable approximation. For example, spectra in the attack portion of a sound are very good choices, since the attack is a perceptually critical and a fast-changing portion of the tone [24]-[26].

In fact, some matches were found to be perceptually better when only a few spectra were used in the error calculation instead of utilizing all of the analysis spectra available. This somewhat surprising result makes sense when one considers that the discrete spectra which most dominate the time average come from the comparatively long sustain portion of the tone. Perceptually important spectra occurring in the brief attack are simply overwhelmed by these spectra. We needed a method to avoid this problem. After considerable experimentation we arrived at a spectrum selection procedure based on picking 50% of our representatives from the "attack"
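The envelope solution of Eq. (8), the error measure of Eqs. (9) and (10), and the attack/remainder frame-picking heuristic just described might look as follows in numpy. This is a sketch under our own naming, not the paper's code; the normal-equations route is shown because the text recommends it for small numbers of tables, with `np.linalg.lstsq` standing in for the orthogonalization alternative when conditioning is poor.

```python
import numpy as np

def solve_envelopes(A, B):
    """Weights W for A W ≈ B via the normal equations
    A^T A W = A^T B, Eq. (8). A must have independent columns;
    for ill-conditioned A use np.linalg.lstsq instead, playing the
    role of the orthogonalization methods the text mentions."""
    return np.linalg.solve(A.T @ A, A.T @ B)

def relative_error(B, A, W, frames=None):
    """Average relative error of Eq. (9), optionally restricted to a
    subset of representative frame indexes."""
    if frames is not None:
        B = B[:, frames]
        W = W[:, frames]
    diff = B - A @ W                       # b_k(t_i) - b'_k(t_i), Eq. (10)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=0)
                                 / np.sum(B ** 2, axis=0))))

def select_frames(B, nselect):
    """Pick representative frames: half equally spaced in time before
    the peak-rms frame (the "attack"), half over the remainder,
    following the 50/50 rule described in the text."""
    rms = np.sqrt(np.sum(B ** 2, axis=0))  # frame rms from harmonic amps
    peak = int(np.argmax(rms))
    n_attack = nselect // 2
    attack = np.linspace(0, max(peak - 1, 0), n_attack).astype(int)
    rest = np.linspace(peak, B.shape[1] - 1,
                       nselect - n_attack).astype(int)
    return np.concatenate([attack, rest])
```

When B lies exactly in the column space of A, the solved W drives the relative error to numerical zero, which is the Ntabs = Nhars limiting case the text describes.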
portion of the tone (defined as the portion before the peak rms occurs), and the other 50% from the remainder of the tone. The specific method which was most successful simply picked the spectra from equally spaced time points in these two regions, as depicted in Fig. 6(a). An alternative approach entailed picking spectra equally spaced in amplitude rather than time, as shown in Fig. 6(b). However, this latter approach fails if the attack and decay are too short. Fig. 6(c) is an example of such a case. The first approach appears to be more robust under these conditions.

Fig. 6. (a) Selected spectra equally spaced in time on both sides of peak. (b) Selected spectra equally spaced in amplitude on both sides of peak. (c) Problematic case for picking spectra equally spaced in amplitude.

4 METHODS FOR DETERMINING BASIS SPECTRA

As depicted in Fig. 7, there are a number of possible approaches for generating basis spectra. In general either one may select basis spectra from the set of short-time spectra found by Fourier analysis of the original sound, or one may generate spectra based on a suitable algorithm.

Fig. 7. Hierarchy of basis-spectra generation methods (select from analysis spectra, or generate arbitrary spectra).

4.1 GA-Based Selection

As mentioned earlier, we could, without applying any initial criteria on the types of candidate spectra to be considered, simply let the GA routine search for the best relative harmonic amplitudes for each basis spectrum. However, an obvious problem with allowing each harmonic to take on any value between 0 and 1 is that with most sounds, the higher harmonics tend to have relatively small amplitudes. Thus a large portion of the candidate solution space would yield very poor matches. This could be remedied by setting the upper bound for the search range of each harmonic's relative amplitude to be that harmonic's maximum over the duration of the tone. Even so, another, more general, disadvantage with this scheme is that it intrinsically gives rise to a large number of variables, thus defining a huge search space with a relatively small number of usable solutions. The dimension of this space is equal to the product of the number of basis spectra to be determined and the number of harmonics for each spectrum. However, we can speed up the search considerably if we can find methods which reduce the number of variables to only one or two for each basis spectrum.

One simple but very effective approach is to use a GA to pick basis spectra from the sound's own set of discrete-time spectra. This method bears some resemblance to spectral interpolation [9], where spectra picked from different parts of the original sound are simply cross-faded from one to the next as time progresses. In this case, only one parameter per basis spectrum is needed, an index corresponding to the time of the chosen analysis spectrum. Initially the GA considers a population of randomly chosen indexes corresponding to particular basis spectra. Subsequently the GA mixes and matches these choices in an attempt to determine a set of basis spectra which work well over the course of the tone. We have found this GA-index technique to be the most successful we have tested so far, in that the computation time to determine good parameter values is quite low, and the results give the lowest average relative errors as well as the best subjective results.

Another method, which we explore in a companion paper [6], uses FM basis spectra. Fixed FM spectra are very special cases of wavetable spectra. If a single modulator modulates a carrier and its modulation index is held constant, a static spectrum results. A harmonic spectrum results whenever the carrier-to-modulator frequency ratio is an integer. Using a single modulator and several carriers, each having a different index and carrier-to-modulator ratio, we get a set of basis spectra (one for each carrier) as in the preceding cases. Thus each basis spectrum is characterized by two variables. The GA-FM technique will not be pursued here, but interested readers are invited to consult the companion paper.

4.2 Principal-Components-Based Matching

Basis spectra can be determined by statistical factor analysis procedures, and PCA is one such procedure which offers an elegant solution to the wavetable matching problem. This method has the advantage that the derived basis spectra will be optimal in a statistical sense: they capture the maximum variance of the analyzed tone. Moreover, the basis spectra found are guaranteed to be orthogonal to one another (that is, any one of them may not be expanded as a weighted sum of the others). Finally the PCA method ensures that, for a given number of fixed basis spectra, the time-averaged mean-square error between the original and the matched spectra will be minimal. On the other hand, these basis spectra may have a rather artificial relation to the original tone, since generally no basis spectrum will resemble any of the actual analysis spectra. In any case, the PCA technique offers an interesting alternative to the GA-based techniques, which, in general, we have found to be more successful.

PCA determination of the basis spectra consists of three steps, as illustrated in Fig. 8. Recall that the original analysis spectra are contained in the matrix B of Fig. 5. In the first step we form the covariance matrix C from the original tone's short-time spectra in the matrix B. Each entry of C can be found using the equation

    c_{k1,k2} = (1/Nframes) Σ_{i=1}^{Nframes} (b_{k1,i} − b̄_{k1}) (b_{k2,i} − b̄_{k2})    (11)
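The GA-index idea of Section 4.1 can be sketched as a toy search in which each chromosome is a tuple of frame indexes and fitness is the relative error after least-squares envelope fitting. The paper does not specify the GA's operator details, so the ranking selection, one-point crossover, mutation rate, and population sizes below are all illustrative choices of ours, not the authors' settings.

```python
import numpy as np

def ga_index_match(B, ntabs, pop_size=30, generations=40, seed=0):
    """Toy GA-index search over an (Nhars, Nframes) spectrum matrix B.
    Each chromosome holds ntabs frame indexes whose analysis spectra
    serve as candidate basis spectra; fitness is the relative error of
    Eq. (9) with least-squares weights. Returns (best indexes, error)."""
    rng = np.random.default_rng(seed)
    nframes = B.shape[1]

    def error(idx):
        A = B[:, idx]                               # candidate basis spectra
        W, *_ = np.linalg.lstsq(A, B, rcond=None)   # stable envelope solve
        diff = B - A @ W
        return np.mean(np.sqrt(np.sum(diff ** 2, axis=0)
                               / np.sum(B ** 2, axis=0)))

    pop = [rng.choice(nframes, ntabs, replace=False)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = sorted(pop, key=error)[: pop_size // 2]   # keep better half
        while len(pop) < pop_size:
            p1, p2 = rng.integers(0, pop_size // 2, 2)  # two parents
            cut = rng.integers(1, ntabs) if ntabs > 1 else 0
            child = np.concatenate([pop[p1][:cut], pop[p2][cut:]])
            if rng.random() < 0.3:                      # point mutation
                child[rng.integers(ntabs)] = rng.integers(nframes)
            pop.append(child)
    best = min(pop, key=error)
    return best, float(error(best))
```

If B is built from a small number of underlying spectra, any chromosome whose selected columns span that space already achieves near-zero error, so the search converges almost immediately in such synthetic cases.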
where k_, k2 = 1, 2 ..... Nhars, and waves. In most matches tried to date, four or five basis
spectra seem to be adequate for achieving excellent
1 Nrramcs
bk -- Nframes _ bk'i (12) simulation. Even fewer basis spectra can be used if
only reasonable approximations are desired. The GA-
i=1
index approach generally fares better on these lower
with bk,i = bk(ti), order matches.
The covariance matrix contains the variance of each
harmonic's amplitude on its respective diagonal. The 5.1 GA-Index Matching Results
nondiagonal elements are the covariances between the As mentioned earlier, we have found the GA to be
respective harmonics, which is simply the mean of most practical when it is used to choose basis spectra
their cross products as given in Eq. (11). from the set of analysis spectra. The matches performed
Next an eigenanalysis is performed on the resulting to date have been based on the sounds of a trumpet, a
covariance matrix C. Mathematically we are looking tenor voice, and a guitar. The trumpet was only remotely
for eigenvectors xj which satisfy approximated when one basis spectrum was used. Of
course, the use of more tables gave much better matches.
C xj = )_jxj (13) Surprisingly a one-basis-spectrum-match to the tenor
voice did sound quite close to the original, while with
for a scalar hj. The resulting eigenvectors are the basis two basis spectra, the match was almost perceptually
spectra we seek. However, these eigenvectors must be indistinguishable from the original. The decay portion
sorted according to their corresponding eigenvalues of the guitar tone was quite easy to capture, but its
before placement.in the columns of the matrix A, as attack was more elusive, regardless of the number of
defined in Eq. (5). Sorting should be done such that tables. This was probably due to the guitar tone requiring
the eigenvector that has the largest eigenvalue is placed a very large number (80 or so in this case) of upper
in the first column of A, and the others are placed ac- harmonics to adequately represent its attack transient.
cording to decreasing eigenvalues. This ordering ensures Even so, matches using a relatively small number of
that the principal components which contribute most wavetables clearly sounded "guitarlike." Overall, the
to the approximation are selected when we use. less results for the three instruments paralleled those found
than the total number of basis spectra Nhars. in our FM matching study, except for the one-basis-
After the basis spectra are determined, the complete spectrum case for the tenor voice, where the GA-index
set of weights may be solved for by Gaussian elimi- result was notably superior to the corresponding one-
nation. Alter.natively, least squares may be used to carrier FM match.
determine only those weights associated with the most important basis spectra, our usual case. Since the PCA-generated basis spectra are orthogonal to one another, the results of the two methods are identical.

5 MATCHING RESULTS

In the case of four or more basis spectra the GA-index and PCA approaches to determining basis spectra result in perceptually similar matches. In both cases, if the number of basis spectra equals the number of harmonics of the tone, an exact match can be made through ordinary additive synthesis of harmonic sine waves.

Fig. 8. PCA procedure for determining basis spectra: find the covariance matrix C of the original analyzed spectra; find the eigenvectors of C; sort the eigenvectors by their eigenvalues and place them in the matrix A.

Figs. 9-14 show amplitude-versus-time plots for the second and fourth harmonics of the trumpet, tenor, and guitar. Amplitude envelopes for the original tones are displayed along with one-, three-, and five-basis-spectra approximations to the original. With the trumpet, higher order matches were required to capture the shape of the original envelopes, especially on upper partials, such as the fourth harmonic.

The tenor's amplitude vibrato (tremolo) is a distinguishing characteristic, and it shows up very clearly in Fig. 12. The single-table match basically ignores this tremolo, while the three- and five-table matches model it quite well. Given the crudeness of the single-table match, it is rather remarkable that the resynthesis sounds as good as it does. This suggests that the tremolo of the tenor tone is only a by-product of the tenor's wide vibrato and is of secondary perceptual importance. The importance of frequency vibrato for voice sounds was also emphasized by Chowning in his FM model for soprano voice synthesis [27]. In a two-table GA match to the tenor, the tremolo was captured by periodically cross-fading between the two tables. This result is similar to that used in a previous vocal analysis/synthesis study [21].

The guitar is characterized by a very bright attack followed by a rather simple decay. The decay envelopes are modeled quite well by a single-table match, as shown in Figs. 13 and 14. However, as mentioned, the impulsive attack is much more difficult to mimic. Note
J. Audio Eng. Soc., Vol. 41, No. 5, 1993 May

HORNER ET AL. PAPERS
the spike which occurs near time zero in the original tone's fourth harmonic amplitude shown in Fig. 14. Only the five-table match manages a similar spike, and even its level is lower than the original.

Fig. 15 illustrates the average relative error [defined by Eq. (9)], plotted against the number of basis spectra used to match various tones. The graph shows that as the number of basis spectra approaches the number of
Fig. 9. Second-harmonic amplitude envelope of trumpet: Original and 1-, 3-, and 5-table GA-index matches. Duration is 2.4 s.
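The PCA procedure summarized in Fig. 8, and the equivalence between least-squares weights and projections noted at the start of this section, can be sketched in a few lines of NumPy. This is an illustrative reconstruction on synthetic data, not the authors' code; all names and dimensions are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
spectra = rng.random((200, 16))     # 200 analysis frames x 16 harmonic amplitudes

# Fig. 8: covariance matrix C of the original analyzed spectra
C = np.cov(spectra, rowvar=False)

# Eigenvectors of C, sorted by descending eigenvalue, placed in matrix A
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order]

# Keep only the most important basis spectra (here, three wavetables)
basis = A[:, :3]

# Because the PCA basis spectra are orthogonal to one another, solving the
# least-squares system and projecting onto the basis give identical weights.
frame = spectra[50] - spectra.mean(axis=0)
w_lstsq = np.linalg.lstsq(basis, frame, rcond=None)[0]
w_proj = basis.T @ frame
```

A quick check confirms `np.allclose(w_lstsq, w_proj)`, which is exactly the identity claimed above for orthogonal basis spectra.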
Fig. 10. Fourth-harmonic amplitude envelope of trumpet: Original and 1-, 3-, and 5-table GA-index matches. Duration is
2.4 s.
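The matches plotted in Figs. 9-14 come from summing a few fixed wavetables under time-varying weights. A minimal sketch of that synthesis model follows; the variable names and dimensions are ours for illustration, not taken from the authors' implementation.

```python
import numpy as np

N = 256                                 # samples in one wavetable period
rng = np.random.default_rng(1)

def make_wavetable(harm_amps):
    """One fixed wavetable: a sum of harmonic sine partials."""
    t = np.arange(N) / N
    return sum(a * np.sin(2 * np.pi * (h + 1) * t)
               for h, a in enumerate(harm_amps))

# Three fixed wavetables built from three harmonic amplitude spectra
tables = np.array([make_wavetable(rng.random(8)) for _ in range(3)])

# One weight envelope per table, sampled once per analysis frame
n_frames = 100
weights = rng.random((3, n_frames))

# Resynthesis: each frame's waveform is the weighted sum of the fixed tables
out = weights.T @ tables                # shape (n_frames, N)
```

Because the model is linear, the amplitude envelope of any single harmonic of the output is the same weighted sum applied to that harmonic's amplitude in each table, which is what the envelope plots in these figures show.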
PAPERS SYNTHESIS OF MUSICAL INSTRUMENT TONES
harmonics, the error does indeed tend to zero. These curves should not be used to compare the relative quality of matches for different original sounds, however. Our error measure should not be construed as an absolute measure of subjective quality, but it is usually indicative of how matches to a particular sound compare with one another. Even for the same sound, there is no guarantee that the result of minimizing the least-squares
Fig. 11. Second-harmonic amplitude envelope of tenor voice: Original and 1-, 3-, and 5-table GA-index matches. Duration is 3.9 s.
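In the GA-index method, described earlier in the paper, each candidate solution is a set of analysis-frame indices, and its fitness is the least-squares error left after fitting weight envelopes for those frames' spectra. The GA machinery itself is not reproduced here; the sketch below only illustrates the fitness evaluation at its core, on synthetic data and with names of our own choosing.

```python
import numpy as np

def match_error(spectra, frame_indices):
    """Total least-squares error when the spectra at the chosen analysis
    frames are used as wavetables to match every frame of the tone."""
    basis = spectra[list(frame_indices)].T          # harmonics x tables
    weights, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    residual = spectra.T - basis @ weights
    return float(np.sum(residual ** 2))

rng = np.random.default_rng(2)
spectra = rng.random((50, 10))                      # 50 frames x 10 harmonics

# A GA would evolve populations of such index sets, keeping the fittest.
# Note that adding a table can never increase the least-squares error.
err2 = match_error(spectra, (3, 20))
err3 = match_error(spectra, (3, 20, 40))
```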
Fig. 12. Fourth-harmonic amplitude envelope of tenor voice: Original and 1-, 3-, and 5-table GA-index matches. Duration is 3.9 s.
error will give the best perceptual match.

For comparative purposes we include Fig. 16, which illustrates the average relative error that would occur using sine-wave additive synthesis with a restricted number of harmonics. The harmonics used are not necessarily the lowest, but rather are those which contribute the most to reducing the error. Comparison with Fig. 15 reveals that the matching error generally converges
Fig. 13. Second-harmonic amplitude envelope of guitar: Original and 1-, 3-, and 5-table GA-index matches. Duration is 8 s.
Fig. 14. Fourth-harmonic amplitude envelope of guitar: Original and 1-, 3-, and 5-table GA-index matches. Duration is 8 s.
much more rapidly with wavetable matching synthesis than with sine-wave additive synthesis. The relatively fast convergence for the guitar tone shown in Fig. 16 is due to the fact that there are only a handful of harmonics of significant amplitude during its long decay.

The optimized parameters for a three-table match to the trumpet sound are given in Fig. 17. It shows plots of the basis spectra and amplitude envelopes wi(t) for this match. We can get a feel for what is happening in the match by examining these parameters. Note that during the initial attack of the tone, table 1, then table 3, and finally table 2 fade in successively. This corresponds to the brightening of the spectrum during the attack. Table 2 then dominates the tone's sustain portion. This table is in fact drawn from spectra in the middle portion of the tone. About halfway through the tone, the brightness begins to decrease, and we see a corresponding mixing of the tables. During this section, table 1's weight is negative, indicating that the lower harmonics are being partially canceled to offset the addition of both tables 2 and 3. Eventually table 3 emerges as the dominant basis spectrum. As the brightness of the tone wanes, table 3 cross-fades with table 1, the reverse of the opening trend.

The three tables were drawn from analysis frames 493, 1336, and 1575 (times 0.69, 1.88, and 2.22 s, respectively). At each of these time points, while the weight for the corresponding table is finite, the weights for the other tables are zero, since the least-squares procedure forced a perfect fit with the appropriate source table at each of these points. Fig. 18 plots error versus time for this three-table trumpet match. Note that the
Fig. 15. Convergence of average relative error with increasing numbers of wavetables with the GA-index method.
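The convergence behavior shown in Fig. 15 can be reproduced in miniature. The sketch below uses one plausible form of an average relative error; Eq. (9) itself is defined earlier in the paper, and the data here are synthetic stand-ins, not the analyzed tones.

```python
import numpy as np

def avg_relative_error(spectra, basis):
    """Mean over frames of ||s(t) - s_hat(t)|| / ||s(t)||."""
    weights, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    approx = (basis @ weights).T
    err = np.linalg.norm(spectra - approx, axis=1)
    return float(np.mean(err / np.linalg.norm(spectra, axis=1)))

rng = np.random.default_rng(3)
spectra = rng.random((80, 12))          # 80 analysis frames x 12 harmonics

# Nested sets of tables: the error shrinks monotonically as tables are
# added, mirroring the graceful convergence seen in Fig. 15.
errors = [avg_relative_error(spectra, spectra[:k].T) for k in (1, 3, 5)]
```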
Fig. 16. Convergence of average relative error using sine-wave additive synthesis.
error goes to zero at the time points at which the basis spectra occur in the original sound, reflecting perfect matches at these points. Larger values of errors tend to occur during the transient attack and decay portions of the tone, where the spectral changes are more diverse. This example is suggestive of how basis spectra typically interact in their final mix.

5.2 PCA-Based Matching Results

Principal-components-based matching results are perceptually similar in character to those found in genetic matching, though they suffer from problems inherent in the underlying statistical approach. For our PCA trumpet simulations, this was primarily manifested as an excess of brightness in the release of the synthetic tone. Fig. 19 shows the relative error-versus-time curve for a PCA-based trumpet match where three wavetables were used. Fig. 19 should be compared to Fig. 18, which is for the corresponding GA match. Note that the relative error never goes to zero in the PCA match, since none of the basis spectra ever exactly matches a particular short-time spectrum of the original tone. Also note that the error is consistently low in the sustain portion of the tone, but is much higher during the attack and decay. Generally PCA decomposition of a sound will suffer from relatively large errors during the low-
Fig. 17. Basis spectra and amplitude envelopes of a 3-table GA-index match for the trumpet. Arrows indicate times for which
only one of the basis spectra is employed in the mix.
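The exact fits marked by the arrows (weight one for the source table and zero for the others at the frames the tables came from) follow directly from the least-squares solution. A small demonstration on synthetic spectra; the names and frame choices are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
spectra = rng.random((100, 10))         # 100 analysis frames x 10 harmonics
picks = [10, 40, 80]                    # frames the three wavetables are drawn from
basis = spectra[picks].T                # harmonics x tables

# Least-squares weights at one of the source frames
w, *_ = np.linalg.lstsq(basis, spectra[40], rcond=None)
```

Since the spectrum at frame 40 is itself the second column of the basis, the solution is exact: weight 1 for that table and 0 for the others, so the matching error vanishes at that frame.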
amplitude sections of the sound. This is due to the fact that most of the spectral variance will be caused by the highest amplitude portions of the tone. Thus the matching accuracy during lower amplitude sections will generally be sacrificed in order to match the higher amplitude sections better. While this problem might be obviated by using a logarithmic amplitude measure, we would then lose the linear additive synthesis feature which we require. Another possibility, which we have not checked out, would be to use more spectra from the low-amplitude portions of a sound than from the high-amplitude portions in our PCA analysis.

A statistical artifact also occurred in the case of the tenor voice. Fig. 20 illustrates the problem. We see that the first principal component with its time-varying weight tracks the tremolo of the tone. Examining the second basis spectrum and its amplitude envelope in isolation, we see that this component tracks the tone's rms amplitude, which is rather flat. Normally we might expect the first principal component to track this. However, for the tenor the variance of the tone's tremolo is several times greater than that of the tone's average amplitude. This fact leads PCA to decompose the tone with primary emphasis on its tremolo. Nevertheless, combining the first two principal components yields a shape very similar to that desired.

If we resynthesize the tenor tone with only the first principal component, the tone seems to turn on and off with each period of the tremolo. Though the tenor voice is clearly heard in the background of these modulated bursts, this is not what one hopes for in a good sounding match. However, if the second principal component is added, the match is suddenly very convincing, since the resulting sound now has both the correct tremolo and the correct overall spectral shape. This result is perceptually similar to that found by GA matching. In that case, however, the two basis spectra were cross-faded to emulate the periodically alternating spectrum of the tenor, rather than as an oscillation on top of a baseline spectrum.

Figs. 21-26 illustrate amplitude-versus-time plots for the second and fourth harmonics of the trumpet,
Fig. 18. Relative error versus time for a 3-table GA-index match for the trumpet. Arrows indicate times for which only one
of the basis spectra is employed in the mix.
Fig. 19. Relative error versus time for 3-table trumpet match using PCA.
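The error-versus-time curves of Figs. 18 and 19 are per-frame relative errors. A sketch of their computation, again on synthetic data with illustrative names, shows why the GA-index curve dips to zero while a PCA curve generally does not:

```python
import numpy as np

def relative_error_vs_time(spectra, basis):
    """Per-frame relative error ||s(t) - s_hat(t)|| / ||s(t)||."""
    weights, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    approx = (basis @ weights).T
    return (np.linalg.norm(spectra - approx, axis=1)
            / np.linalg.norm(spectra, axis=1))

rng = np.random.default_rng(5)
spectra = rng.random((60, 8))           # 60 analysis frames x 8 harmonics
picks = [5, 30, 55]                     # GA-index tables are actual frames

errs = relative_error_vs_time(spectra, spectra[picks].T)
```

Since each GA-index table is an actual analysis frame, the error curve touches zero exactly at those frames; PCA basis spectra are statistical averages, so their error curve stays strictly positive.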
tenor, and guitar PCA matches. Amplitude envelopes for the original tones are displayed along with one-, three-, and five-basis-spectra approximations to the original. With the trumpet, these approximations are almost identical to those found by the GA-index method. The tenor's three- and five-table matches are also very similar to the GA result, while the single-table PCA match suffers from the isolated tremolo problem noted.
Fig. 20. Fourth-harmonic amplitude envelopes for the tenor voice: Original, first and second principal components individually,
and first and second principal components combined.
Fig. 21. Second-harmonic amplitude envelope of trumpet: Original and 1-, 3-, and 5-table PCA matches. Duration is 2.4 s.
Fig. 24. Fourth-harmonic amplitude envelope of tenor voice: Original and 1-, 3-, and 5-table PCA matches. Duration is 3.9 s.

Fig. 25. Second-harmonic amplitude envelope of guitar: Original and 1-, 3-, and 5-table PCA matches. Duration is 8 s.

harmonic shown in Fig. 21. However, the first table actually controls a broad spectrum of harmonics. The other tables, which, in general, have much smaller weights, sculpt the first basis spectra into the proper form. Aside from the negative components, the primary difference between the PCA basis spectra and the GA-selected basis spectra in Fig. 17 is the presence of upper harmonics in all the PCA tables. Even with cancellation, significant energy is bound to be left unchecked in the higher frequencies during the low-bandwidth spectral decay of the tone. This accounts for the excess brightness in the synthesized decay, due to the low- versus high-amplitude problem noted, which is characteristic of the PCA trumpet matches.

Fig. 28 shows the average relative error [defined in Eq. (9)] plotted against the number of principal component spectra used to match various tones. As in Fig. 15, when the number of basis spectra approaches the number of harmonics, the error tends to zero. However, the nature of the convergence is different. In general, the GA-index method leaves less error in its one- and two-table matches. With four or more tables, the errors are generally similar, although on average the GA method still wins. This suggests that for one or two wavetables the GA method is clearly superior. Moreover, the GA-index method shows more consistent improvement than the PCA method as the number of basis spectra is increased.

In conclusion, though statistically optimal, PCA should by no means be regarded as the best matching method. So far, results found by genetic selection of the analysis spectra are perceptually and numerically superior for relatively small numbers of wavetables. However, results, such as that found by PCA for the two-basis-spectra tenor match, may be useful in terms of decomposing the tone into components for further analysis or modification. Thus genetic and principal-components-based matching techniques offer different perspectives on the matching analysis of sounds.

6 CONCLUSIONS

We have explored two techniques for determining basis spectra and amplitude envelopes for resynthesizing tones via multiple fixed wavetable synthesis. Breaking down the matching processes into efficient, robust subprocedures was central to the success of both the GA-index and the PCA-based techniques. For four or more basis spectra, the GA-index and PCA methods gave similar results, but on average the GA-index results were markedly better. For less than four basis spectra, the GA-index approach was clearly superior. In the future we expect that these matching methods will be used to facilitate applications such as data reduction, data stretching, and synthesis by rule.

7 ACKNOWLEDGMENT

This material is based on work supported by the CERL Sound Group and the Computer Music Project at the University of Illinois at Urbana-Champaign. The work was facilitated by NeXT computers in the Computer Music Project at the School of Music of the UIUC and Symbolic Sound Corporation's Kyma workstation. The authors wish to thank the members of the CERL Sound Group, whose input and feedback have been invaluable in this work. These include Kurt Hebel, Carla Scaletti, Bill Walker, Kelly Fitz, and Richard Baraniuk. Thanks are also due to Lydia Ayers, Chris Gennaula, Camille Goudeseune, Chris Kriese, and Michael Hammond of the Computer Music Project for conversations related to this work.
Fig. 26. Fourth-harmonic amplitude envelope of guitar: Original and 1-, 3-, and 5-table PCA matches. Duration is 8 s.
Fig. 27. Basis spectra and amplitude weight envelopes for a 3-table trumpet PCA match.

Fig. 28. Convergence of average relative error with increasing numbers of principal components.

8 REFERENCES

[1] B. Atal and S. Hanauer, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," J. Acoust. Soc. Am., vol. 50, pp. 637-655 (1971).
[2] C. Dodge, "In Celebration: The Composition and Its Realization in Synthetic Speech," in C. Roads, Ed., Composers and the Computer (A-R Editions, Inc., Madison, WI, 1985), pp. 47-74.
[3] J. W. Beauchamp, "Synthesis by Spectral Amplitude and 'Brightness' Matching of Analyzed Musical Instrument Tones," J. Audio Eng. Soc., vol. 30, pp. 396-406 (1982 June).
[4] R. Payne, "A Microcomputer Based Analysis/Resynthesis Scheme for Processing Sampled Sounds Using FM," in Proc. 1987 Int. Computer Music Conf. (Int. Computer Music Assn., San Francisco, CA, 1987), pp. 282-289.
[5] N. Delprat, P. Guillemain, and R. Kronland-Martinet, "Parameter Estimation for Non-Linear Resynthesis Methods with the Help of a Time-Frequency Analysis of Natural Sounds," in Proc. 1990 Int. Computer Music Conf. (Int. Computer Music Assn., San Francisco, CA, 1990), pp. 88-90.
[6] A. Horner, J. Beauchamp, and L. Haken, "FM Matching Synthesis with Genetic Algorithms," Computer Music J., to be published, vol. 17 (1993).
[7] P. Kleczkowski, "Group Additive Synthesis," Computer Music J., vol. 13, no. 1, pp. 12-20 (1989).
[8] J. Stapleton and S. Bass, "Synthesis of Musical Tones Based on the Karhunen-Loève Transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, pp. 305-319 (1988).
[9] M.-H. Serra, D. Rubine, and R. Dannenberg, "Analysis and Synthesis of Tones by Spectral Interpolation," J. Audio Eng. Soc., vol. 38, pp. 111-128 (1990 Mar.).
[10] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, MA, 1989).
[11] J. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, 1975).
[12] G. Dunteman, Principal Components Analysis (Sage Publ., Newbury Park, CA, 1989).
[13] C. Chu, "A Genetic Algorithm Approach to the Configuration of Stack Filters," in Proc. 3rd Int. Conf. on Genetic Algorithms and Their Applications (Morgan Kaufmann, San Mateo, CA, 1989), pp. 112-120.
[14] A. Horner and D. Goldberg, "Genetic Algorithms and Computer-Assisted Music Composition," in Proc. 1991 Int. Computer Music Conf. (Int. Computer Music Assn., San Francisco, CA, 1991), pp. 479-482.
[15] J. Stautner, "Analysis and Synthesis of Music Using the Auditory Transform," master's thesis, Dept. of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA (1983).
[16] S. Zahorian and M. Rothenberg, "Principal-Components Analysis for Low Redundancy Encoding of Speech Spectra," J. Acoust. Soc. Am., vol. 69, pp. 832-845 (1981).
[17] W. Hartmann, "Digital Waveform Generation by Fractional Addressing," J. Acoust. Soc. Am., vol. 82, pp. 1883-1891 (1987).
[18] F. R. Moore, "Table Lookup Noise for Sinusoidal Digital Oscillators," Computer Music J., vol. 1, no. 2, pp. 26-29 (1977).
[19] J. B. Allen, "Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, pp. 235-238 (1977).
[20] R. McAulay and T. Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 744-754 (1986).
[21] R. Maher and J. Beauchamp, "An Investigation of Vocal Vibrato for Synthesis," Appl. Acoust., vol. 30, pp. 219-245 (1990).
[22] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes (Cambridge University Press, Cambridge, UK, 1989).
[23] G. Golub and C. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, MD, 1983).
[24] M. Clark, Jr., D. Luce, R. Abrams, H. Schlossberg, and J. Rome, "Preliminary Experiments on the Aural Significance of Parts of Tones of Orchestral Instruments and on Choral Tones," J. Audio Eng. Soc., vol. 11, pp. 45-54 (1963).
[25] K. Berger, "Some Factors in the Recognition of Timbre," J. Acoust. Soc. Am., vol. 41, pp. 793-806 (1963).
[26] J. Grey and J. Moorer, "Perceptual Evaluations of Synthesized Musical Instrument Tones," J. Acoust. Soc. Am., vol. 62, pp. 454-462 (1977).
[27] J. Chowning, "Computer Synthesis of the Singing Voice," in Sound Generation in Winds, Strings, Computers (Royal Swedish Academy of Music, Stockholm, Sweden, 1980).
THE AUTHORS

A. Horner, J. Beauchamp, L. Haken

Andrew Horner was born in San Rafael, CA, in 1964. He received a B.M. degree in music from Boston University and an M.S. in computer science from the University of Tennessee, Knoxville. He is currently finishing his Ph.D. studies in computer science at the University of Illinois at Champaign-Urbana. At the University of Illinois, he is a researcher for the School of Music's Computer Music Project, the CERL Sound Group, and the Center for Complex Systems Research. He has also been supported by the Illinois Genetic Algorithm Laboratory. His primary research interests are in applying computational evolution to sound computation and computer-assisted composition.

James Beauchamp was born in Detroit in 1937. He received B.S. and M.S. degrees in electrical engineering from the University of Michigan during 1960-61 and a Ph.D. in electrical engineering from the University of Illinois at Urbana-Champaign in 1965. During 1962-65 he developed analog synthesizer equipment for the electronic music studio at UIUC under sponsorship of the Magnavox Company. He joined the UIUC electrical and computer engineering faculty in 1965 and began work on time-variant spectrum analysis/synthesis of musical sounds. During 1968-69 he was a research associate at Stanford University's Artificial Intelligence Project working on problems in speech recognition. Since 1969 he has held a joint faculty appointment in music and electrical and computer engineering at UIUC. In 1988 he was a visiting scholar at Stanford's Center for Computer Research in Music and Acoustics. Dr. Beauchamp teaches courses at UIUC in musical acoustics, electronic music technology, audio, and computer music in both the School of Music and the Department of Electrical and Computer Engineering. Since 1984 he has directed areas of musical timbre characterization based on time-variant spectra of musical instrument tones, nonlinear/filter synthesis, and musical pitch detection. He is a member of the Acoustical Society of America, a fellow of the Audio Engineering Society, and a member of the International Computer Music Association and its board of directors.

Lippold Haken was born in Munich, West Germany, and has lived mostly in central Illinois. He received a B.S. degree in 1982, an M.S. degree in 1984, and a Ph.D. in 1989 in Electrical and Computer Engineering from the University of Illinois. He is an Assistant Professor of Electrical and Computer Engineering at the University of Illinois, with research interests in audio signal processing, computer architecture, and user interface hardware and software. He is leader of the CERL Sound Group, and has developed real-time audio signal processing hardware and software, focusing primarily on time-frequency analysis and synthesis of musical instruments. He is coauthor of a sophisticated music notation editor, Lime. He is also leader of the hardware design group for the Zephyr, a high-speed mainframe computer built at the University of Illinois Computer-based Education Research Laboratory. The Zephyr provides centralized program execution and data-keeping for thousands of simultaneous users on NovaNET, a real-time nationwide network used for computer-based instruction. Currently Dr. Haken is teaching a new project-oriented course that introduces the major areas of electrical and computer engineering to freshmen.
... It has been first introduced in [4] by German company PPG and later used in many popular Japanese synthesizers from the 1980s. Notwithstanding its simplicity, the technique offers the possibility of creating evolving and morphing sounds [5], [6], making it quite useful in practice. Although being relatively simple in its implementation, some studies and tutorials can be found in the scientific literature, e.g. ...
... As anticipated in the Section II-B, low-order interpolation schemes, such as zero-order hold and linear interpolation, can introduce significant imaging artifacts when ρ < 1. However, a wide selection of strategies can be employed to alleviate this problem, such as low-pass filtering the output to attenuate frequency components above ρf s /2, offline resampling the wavetable signal to higher sample rates, and/or multiple wavetable synthesis techniques [6]. Therefore, in the rest of this paper we will ignore this issue. ...
... III-F, this is due to the summation in Eq. (21), whose number of terms (i max − i min ) increases with playback speed and number of samples in the wavetable. In practice, this problem can be overcome by applying a method that is common to wavetable synthesizers, i.e., the use of multiple wavetables and split points [6]. The method consists in generating multiple wavetables, e.g., one per octave, by means of high quality offline resampling of the original waveform. ...
Article
Full-text available
In the last decades many efficient methods have been proposed to generate waveforms with reduced aliasing, proving this field quite mature. However, the introduction of Antiderivative Antialiasing (AA) methods for the reduction of aliasing in nonlinear discrete-time processing, can shed a new light on bandlimited oscillators and provide a general method for dealing with aliasing in arbitrary waveforms generation. In this work we will first bridge the gap between AA methods and bandlimited waveform generation, and then show the flexibility of the recently introduced AA-IIR method for dealing with both classical synthesizer waveforms and arbitrary wavetables. We will show how AA methods can be used for periodic and non-periodic waveform generation, and provide an innovative AA-IIR method able to compute alias-reduced version of any waveform with arbitrary antialiasing filter order, without the effort of computing ad hoc analytical expressions. Antialiasing performance is compared with other well known methods, showing the effectiveness of the approach.
... Recent parameter matching methods for multiple wavetable synthesis have used a simple relative spectral error formula to measure how accurately the synthetic spectrum matches an original spectrum [1]. It is supposed that the smaller the spectral error, the better the match, but this is not always true. ...
... Multiple wavetable synthesis [1] is an efficient synthesis technique based on the addition of a number of fixed waveforms with time-varying weights. Matching synthesis starts with a time-varying spectral analysis of the original sound. ...
Article
Recent parameter matching methods for multiple wavetable synthesis have used a simple relative spectral error formula to measure how accurately the synthetic spectrum matches an original spectrum. It is supposed that the smaller the spectral error, the better the match, but this is not always true. A modified error formula is described, which takes into account the masking characteristics of our auditory system, as an improved measure of the perceived quality of the matched spectrum. Selected instrument tones have been matched using both error formulas and resynthesized. Listening test results show that wavetable matching using the perceptual error formula slightly outperforms ordinary matching, especially for instrument tones that have several masked partials.
... Transform codecs are hardly ever used for encoding soundbank audio data. In order to obtain a sound of a required pitch, decimation and interpolation algorithms are used, as in the case of sample rate conversion (Horner et al., 1993). Furthermore, in order to keep the synthesis complexity low, the interpolation filter quality of a samplebased synthesizer deteriorates proportionally to the current polyphony (Horner et al., 1993). ...
... In order to obtain a sound of a required pitch, decimation and interpolation algorithms are used, as in the case of sample rate conversion (Horner et al., 1993). Furthermore, in order to keep the synthesis complexity low, the interpolation filter quality of a samplebased synthesizer deteriorates proportionally to the current polyphony (Horner et al., 1993). This can result in noisy artefacts accompanying dense fragments of a synthesized music. ...
Article
This paper reviews parametric audio coders and discusses novel technologies introduced in a low-complexity, low-power consumption audio decoder and music synthesizer platform developed by the authors. The decoder uses parametric coding scheme based on the MPEG-4 Parametric Audio standard. In order to keep the complexity low, most of the processing is performed in the parametric domain. This parametric processing includes pitch and tempo shifting, volume adjustment, selection of psychoacoustically relevant components for synthesis and stereo image creation. The decoder allows for good quality 44.1 kHz stereo audio streaming at 24 kbps. The synthesizer matches the audio quality of industry-standard sample-based synthesizers while using a twenty times smaller memory footprint soundbank. The presented decoder/synthesizer is designed for low-power mobile platforms and supports music streaming, ringtone synthesis, gaming and remixing applications.
... A storm, for example, may need to wax and wane from rage to calm as the story unfolds. A range of techniques, including physical modelling (Cook, 1997; Smith, 1992), acoustic modelling (Arfib, 1979; Horner, Beauchamp, & Haken, 1993; Serra, 1997; Wyse, 2004), and sample-based techniques, can be used to provide flexible, interactive, and, when appropriate, realistic sounds under the real-time control of a storyteller. ...
Article
The traditional practice of oral storytelling has particular characteristics that make it amenable to extension with interactive electroacoustic sound. Recent developments in mobile device and sound generation technologies also lend themselves to the particular practices of the traditional art form. This paper establishes a context for interactive sound design in a domain that has been little explored, in order to create an agenda for future research. The goal is to identify the opportunities and constraints for sound particularly suited to live storytelling, and to identify criteria for evaluating interaction designs. The storytelling domain addressed includes not only particular instances of telling, but also the variability of stories between tellings and tellers, as well as the mechanisms by which stories are passed between tellers. The outcome of the research will be a computer-based platform providing storytellers with the ability to create auditory scenes, sonic elements, and vocal transformations that are controllable in real time in order to support the telling, retelling, and sharing of stories.
Chapter
The possibility of generating sounds from computers has had an extraordinary impact on the way in which the musical message is created and enjoyed. In 1962, Mathews et al. concluded their article "Musical Sounds from Digital Computers" [1] by stating that.
Book
The book "Architektura informacji istotą projektu" ("Information Architecture as the Essence of a Project") is a multifaceted voice in the discussion of the problems, principles, solutions, and research of a field that is currently developing very intensively, both in its practical applications and in theory. Its contents were prepared by practitioners and theoreticians: information architecture specialists, researchers, and students. Its subject matter covers issues related to teaching in this field, and it also takes up specific solutions used in building digital resources within the framework of their information architecture. Inseparably connected with information architecture are topics concerning the users of online resources and services and the design of their content and functionality around the needs of their audiences; this subject matter is also presented in an interesting way in this work. Information architecture and its visualization are tightly intertwined issues, and these connections are reflected in the book's texts on information visualization and on practical analyses of specific projects and services. (Prof. dr hab. Ewa Głowacka, Uniwersytet Mikołaja Kopernika w Toruniu)
Thesis
This research examines the contemporary integration, in composition and performance, of the technological and artistic tools that represent the state of the art in real-time interaction between instrumental production, digital sound production, and the production of spatio-temporal forms in the listening space. In particular, it studies how this integration can in turn constitute a new modality of composition in which a writing of sound, a writing of time, and a writing of space, all informed by technology, merge coherently. Computing paradigms for managing time and interaction, tools for process synchronization, the analysis of sound and gesture streams, parameter control derived from the instrumental sound, research on instrumental timbre and its digital descriptors, and performer-computer interaction form the key elements of this research. The main idea of this work centers on real-time interaction with advanced computer systems, within particularly virtuosic writing with specific aspects of temporal and spatial construction, this hybrid situation in turn influencing the nature of the writing itself. The various themes of this exploration, such as the writing of sound, time, and space, are the starting point for developing, according to the nature of the various productions envisaged, possible links with other artistic disciplines.
Chapter
Introduction; Background; Quantization of Audio Signals; Traditional Conversion Methods; Advanced Conversion Techniques; Transmission and Storage of Digital Audio Data; Digital Audio Signal Processing: Tools and Applications; Summary; References
Article
Replicating musical instruments is a classic problem in computer music. A systematic collection of instrument designs for each of the main synthesis methods has long been the El Dorado of the computer music community. Here is what James Moorer, the pioneering computer music researcher at Stanford University and later director of the audio project at Lucasfilm, had to say about it (Roads 1982):
Article
Historically, frequency modulation (FM) synthesis has required trial and error to create emulations of natural sounds. This article presents a genetic-algorithm-based technique which determines optimized parameters for reconstruction through FM synthesis of a sound having harmonic partials. We obtain the best results by using invariant modulation indices and a multiple carrier formant FM synthesis model. We present our results along with extensions and generalizations of the technique.
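The single-carrier core of FM synthesis referred to above can be sketched in a few lines (illustrative names only; the article's multiple-carrier formant model with invariant modulation indices is more elaborate):

```python
import math

def fm_tone(fc, fm, index, dur, sr=8000):
    """Single-carrier FM: y(n) = sin(2*pi*fc*n/sr + I*sin(2*pi*fm*n/sr)),
    where fc is the carrier frequency, fm the modulator frequency, and
    I the modulation index controlling sideband strength. With I = 0
    this reduces to a plain sine at the carrier frequency."""
    return [math.sin(2.0 * math.pi * fc * n / sr
                     + index * math.sin(2.0 * math.pi * fm * n / sr))
            for n in range(int(dur * sr))]
```

A genetic algorithm of the kind the article describes would search over (fc, fm, index) combinations to minimize the spectral error against an analyzed target tone.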
Article
Where conditions, such as other machinery noise or limited space, prohibit noise measurements, the sound power must be calculated from the surface vibration velocity measured by accelerometers. A system was developed which could acquire, average, and store thousands of spectrum levels by interfacing a 1/3-octave-band filter set with a microcomputer. This system was applied to measurements of fuel injection pump noise, sound power of machine tools and presses, and evaluation of hearing protectors.
Article
The principal-components statistical procedure for data reduction is used to efficiently encode speech power spectra by exploiting the correlations of power spectral amplitudes at various frequencies. Although this data-reduction procedure has been used in several previous studies, little attempt was made to optimize the methods for spectral selection and coding through the use of intelligibility testing. In the present study, principal-components basis vectors were computed from the continuous speech of several male and female speakers using various nonlinear spectral amplitude scales. Speech was synthesized using a combined linear predictive (LP) and principal-components vocoder. Of the amplitude scales investigated for use with a principal-components analysis of speech spectra, logarithmic amplitude coding of non-normalized spectra emerged as a slight favorite. Speech synthesized from four principal components was found to be about 80% intelligible using a form of the Diagnostic Rhyme Test for rhyming word pairs and about 95% intelligible for words within a sentence context. Speech synthesized from spectral principal components compared favorably in intelligibility and quality with speech synthesized from a control LP vocoder with the same number of parameters.
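The principal-components encode/decode cycle described in this abstract can be sketched with an SVD. This is a minimal illustration on a matrix of spectrum frames; the function and variable names are assumptions, not taken from the study:

```python
import numpy as np

def pca_basis(spectra, k):
    """k principal-component basis vectors of a (frames x bins)
    matrix of spectra, via SVD of the mean-centered data."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def encode(frame, mean, basis):
    """Project one spectrum frame onto the basis -> k weights."""
    return basis @ (np.asarray(frame, dtype=float) - mean)

def decode(weights, mean, basis):
    """Reconstruct an approximate spectrum frame from its k weights."""
    return mean + basis.T @ weights
```

Data reduction comes from transmitting only the k weights per frame instead of the full spectrum; in the study above, four components already gave largely intelligible speech.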
Article
An analysis-based synthesis technique for the computer generation of musical instrument tones was perceptually evaluated in terms of the discriminability of 16 original and resynthesized tones taken from a wide class of orchestral instruments having quasi-harmonic series. The analysis technique used was the heterodyne filter, which produced a set of intermediate data for additive synthesis, consisting of time-varying amplitude and frequency functions for the set of partials of each tone. Three successive levels of data reduction were applied to this intermediate data, producing types of simplified signals that were also compared with the original and resynthesized tones. The results of this study, in which all combinations of signals were compared, demonstrated the perceptual closeness of the original and directly resynthesized tones. An orderly relationship was found between the form of data reduction incurred by the signals and their relative discriminability, measured by a modified AAAB discrimination procedure. Direct judgments of the relative sizes of the differences between the tones agreed with these results; multidimensional scaling of the latter data provided a visual display of the relationships among the different forms of tones and pointed out the importance of certain small details existing in the attack segments of tones. Of the three forms of simplification attempted with the tones, the most successful was a line-segment approximation to the time-varying amplitude and frequency functions for the partials. The pronounced success of this modification strongly suggests that many of the microfluctuations usually found in the analyzed physical attributes of musical instrument tones have little perceptual significance.
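The line-segment approximation found most successful in this study can be illustrated by keeping an envelope's values at chosen breakpoints and interpolating linearly between them (a hypothetical sketch; the breakpoint-selection strategy itself is not specified here):

```python
def line_segment_approx(env, breakpoints):
    """Replace an envelope by straight lines between its values at
    the given (sorted) breakpoint indices -- the kind of data
    reduction applied to partial amplitude/frequency functions."""
    out = [float(v) for v in env]
    for a, b in zip(breakpoints, breakpoints[1:]):
        for i in range(a, b + 1):
            frac = (i - a) / (b - a)
            out[i] = (1.0 - frac) * env[a] + frac * env[b]
    return out
```

Microfluctuations between breakpoints are discarded, which is exactly the simplification the study found to be largely imperceptible.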