Proc. of the 18th Int. Conference on Digital Audio Effects (DAFx-15), Trondheim, Norway, Nov 30 - Dec 3, 2015
Craig J. Webb
Acoustics and Audio Group
University of Edinburgh
Edinburgh, UK
Stefan Bilbao
Acoustics and Audio Group
University of Edinburgh
Edinburgh, UK
One goal of physical modelling synthesis is the creation of new
virtual instruments. Modular approaches, whereby a set of basic
primitive elements can be connected to form a more complex in-
strument have a long history in audio synthesis. This paper exam-
ines such modular methods using finite difference schemes, within
the constraints of real-time audio systems. Focusing on consumer
hardware and the application of parallel programming techniques
for CPU processors, useable combinations of 1D and 2D objects
are demonstrated. These can form the basis for a modular synthe-
sis environment that is implemented in a standard plug-in architec-
ture such as an Audio Unit, and controllable via a MIDI keyboard.
Optimisation techniques such as vectorization and multi-threading
are examined in order to maximise the performance of these com-
putationally demanding systems.
A principal objective of physical modelling synthesis is to emulate existing acoustic or analog electronic instruments. However, another wide-ranging goal is to create entirely new instruments that do not necessarily have existing real-world counterparts, yet which generate sounds of a natural character. If the goal is the latter, then one often-used approach is to provide the user (the instrument designer) with a collection of primitive or canonical objects, as well as a means of connecting them, so as to allow for the generation of relatively complex sound-producing objects. Such a modular approach of course has a long history in audio synthesis, and deeper roots in the world of analog electronic synthesizers. In terms of physical modelling synthesis, modularity underlies various synthesis methodologies, including mass-spring networks [1, 2] and modal synthesis [3], as well as scattering-based structures.
A different approach, also allowing for modular design, in-
volves the use of direct spatiotemporal integrators such as finite
difference time domain methods. Such methods have various ben-
efits, such as minimal pre-computation and memory usage, and
also provable stability conditions [5]—which are extremely im-
portant under the very general design conditions that a musician
will almost certainly demand. Such methods can be used to cre-
ate large-scale models of instruments that can be coupled inside
a 3D virtual space [6]. However, the simulation of such complex systems is computationally very intensive, to the extent that even with high performance computing, sound output cannot be produced near the real-time threshold. That said, simulation of certain systems is now coming within range of real-time. Simpler 1D and 2D linear objects have manageable levels of computation, and can form the basis of a modular system when using multiple connected objects. Nonlinear behaviour can be introduced through the connection elements, leading to a wide range of possible sonic outputs. The purpose of this study is to demonstrate the audio programming techniques used to develop a usable real-time system which runs on consumer hardware (i.e., a laptop or basic desktop machine), implemented in a standard plug-in architecture such as an Audio Unit [7].

This work was supported by the European Research Council, under grant number StG-2011-279068-NESS.
The remaining sections of this paper are as follows: Section
2 presents model equations for stiff strings and plates, and for the
nonlinear connections between these objects, as well as a general
description of the numerical update equations in the run-time loop.
Section 3 examines the maximum size of the most expensive ele-
ment, the linear plate, that can be computed in one second at a
sample rate of 44.1kHz, and the various optimisations that can be
used for CPU computation. Section 4 applies the same testing,
but within a working Audio Unit plug-in. Finally, Section 5 de-
tails the possibilities and issues involved in designing full modular
systems using multiple objects, whilst Section 6 gives concluding
remarks. A number of audio examples can be found at this web-
A complete modular percussion synthesis environment, based on
user-specified interconnections among a set of bars and plates, has
been described in detail in [8]. In that case, the environment was
implemented in the Matlab language, and performance was a long
way from being real-time, in all but the simplest configurations. In
this section, the model’s basic operation is briefly summarized.
2.1. Components
The current system is composed of a collection of stiff strings/bars
and rectangular plates, vibrating, in isolation, under linear condi-
tions. For such an object in isolation, and subject to an excitation
force, the dynamics are described by the following equation:
$$\mathcal{L} u + g_e f_e = 0 \tag{1}$$
Here, $u(x, t)$ represents the transverse displacement of the object, as a function of time $t$, and at spatial coordinate $x \in \mathcal{D}$. If the object is a stiff string, the spatial domain is a one-dimensional interval $\mathcal{D} = [0, L]$, for some length $L$, and for a plate, the domain
is a two-dimensional rectangular region $\mathcal{D} = [0, L_x] \times [0, L_y]$, for side lengths $L_x$ and $L_y$. See Figure 1.
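In the case of a stiff string, the operator $\mathcal{L} = \mathcal{L}_s$ is defined as
$$\mathcal{L}_s = \rho A \partial_t^2 - T \partial_x^2 + EI \partial_x^4 + 2\rho A \sigma_0 \partial_t - 2\rho A \sigma_1 \partial_t \partial_x^2 \tag{2}$$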
Here, $\rho$ is the material density, in kg/m³, $A$ is cross-sectional area, in m², $T$ is tension, in N, $E$ is Young's modulus, in Pa, $I$ is the area moment of inertia, in m⁴, and $\sigma_0$ and $\sigma_1$, both non-negative, are parameters allowing for some control over frequency-dependent loss. $\partial_t$ and $\partial_x$ represent partial differentiation with respect to time $t$ and the spatial coordinate $x$, respectively. The stiff string equation must be complemented by two boundary conditions at each end of the domain $\mathcal{D}$. In the current environment, these may be chosen as clamped, simply supported or free, separately at each end of the string. Note that under low tension, the system describes the behaviour of a stiff bar.
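In the case of a plate, the operator $\mathcal{L} = \mathcal{L}_p$ is defined as
$$\mathcal{L}_p = \rho H \partial_t^2 + \frac{EH^3}{12(1-\nu^2)} \Delta^2 + 2\rho H \sigma_0 \partial_t - 2\rho H \sigma_1 \partial_t \Delta \tag{3}$$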
Here, $\rho$, $E$, $\sigma_0$ and $\sigma_1$ are as before, $H$ is thickness, in m, and $\nu$ is Poisson's ratio. $\Delta$ is the Laplacian operator in two spatial dimensions. The plate equation must be complemented by two boundary conditions at each edge of the domain $\mathcal{D}$. Tensioning effects (to simulate a membrane) have not been included, as computational costs become too large for real-time under most conditions, although it is not difficult to introduce an extra term in (3) allowing for such effects.
In either case, the term $g_e f_e$ represents an excitation. Here, $g_e = g_e(x)$ is a spatial distribution representing the region over which the excitation is applied, and $f_e(t)$ is an externally applied force. For typical striking or plucking gestures, $g_e(x)$ is highly localised (and perhaps idealised to a Dirac delta function). For a strike or pluck, $f_e$ is also usually localised in time. For example, the function $f_e$ defined by
$$f_e(t) = \frac{f_0}{2}\left[1 - \cos\left(\frac{q\pi (t - t_0)}{T}\right)\right], \qquad t_0 \le t \le t_0 + T \tag{4}$$
and which is zero otherwise, is a good approximation to the force history of a strike, occurring at time $t = t_0$, of duration $T$ seconds, and of maximum force $f_0$, when $q = 2$, and of a pluck when $q = 1$.
Figure 1: Basic stiff string or bar (left) and plate (right) elements.
2.2. Connecting elements
Consider now two components $a$ and $b$ with transverse displacements $u_a$ and $u_b$, under a single connection:
$$\mathcal{L}_a u_a + g_a f_c = 0, \qquad \mathcal{L}_b u_b - g_b f_c = 0 \tag{5}$$
Here, $\mathcal{L}_a$ and $\mathcal{L}_b$ are linear operators of the types given in (2) and (3), each defined by an appropriate set of material and geometric parameters; the components can be of either stiff string or plate type. $g_a$ and $g_b$ are again distributions, of dimension appropriate to the particular component, selecting a region of interaction for the connection element. $f_c(t)$ is the connection force, in N; it acts in an equal and opposite sense on the two objects (see Figure 2).
Figure 2: Connection between stiff string element and plate element.
Until the form of the connection has been specified, the system is not complete. To this end, define an interaction distance $\eta_c(t)$, averaged over the connection region, by
$$\eta_c(t) = \int_{\mathcal{D}_a} g_a u_a \, dV_a - \int_{\mathcal{D}_b} g_b u_b \, dV_b \tag{6}$$
where $\mathcal{D}_a$ and $\mathcal{D}_b$ are the domains of definition of the two components, and where $dV_a$ and $dV_b$ are differential elements of appropriate dimension for the two components. The connection force $f_c(t)$ can be related to the interaction distance $\eta_c(t)$ by
$$f_c = K_1 \eta_c + K_3 \eta_c^3 + R \frac{d\eta_c}{dt} \tag{7}$$
for connection parameters $K_1$, $K_3$ and $R$, all assumed non-negative.
Such a connection may be thought of as a combination of a linear
spring, a cubic nonlinear spring, and a linear damper. Many other
choices are of course possible, but such a configuration already al-
lows for a wide variety of connection types and, furthermore, in
the numerical case, leads to an efficient implementation in which
energy-based stability guarantees are available, see [8]. Such nu-
merical stability guarantees are particularly important in the case
of arbitrary nonlinear modular networks in the hands of a com-
poser/designer/musician. Once a single excitation and a single
connection element have been described, it is not a large step to
describe a network composed of a multitude of objects, linked by
an arbitrary number of connections, and under an arbitrary number
of excitations, as in Figure 3.
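As a hedged illustration of the connection law (7), a linear spring $K_1$, a cubic spring $K_3$ and a damper $R$, the C sketch below assumes the idealised case in which $g_a$ and $g_b$ are Dirac deltas at single grid points, so the interaction distance reduces to the displacement of $a$ minus that of $b$ (an assumed sign convention). The velocity term uses a simple backward difference purely for illustration; this is not the semi-implicit discretisation of [8]. All names are hypothetical.

```c
/* Interaction distance for pointwise (Dirac delta) distributions:
   displacement of object a at its connection point minus that of b. */
double interaction_distance(const double *ua, int ia, const double *ub, int ib)
{
    return ua[ia] - ub[ib];
}

/* Connection force: linear spring K1, cubic spring K3, linear damper R.
   deta_dt is the rate of change of the interaction distance. */
double connection_force(double eta, double deta_dt,
                        double K1, double K3, double R)
{
    return K1 * eta + K3 * eta * eta * eta + R * deta_dt;
}

/* Evaluation from two successive values of eta with time step Ts, using a
   backward difference for d(eta)/dt (illustration only, not energy-stable). */
double connection_force_explicit(double eta_now, double eta_prev, double Ts,
                                 double K1, double K3, double R)
{
    return connection_force(eta_now, (eta_now - eta_prev) / Ts, K1, K3, R);
}
```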
2.3. Finite Difference Schemes and State Space Update Form
In a finite difference implementation, each object is represented by
a set of values defined over a grid—1D in the case of a stiff string,
and 2D in the case of a plate. The complete mechanics of the con-
struction of finite difference schemes for such objects is provided
in [8], and is far too involved to be re-presented here, particularly
as this paper is concerned mainly with real-time implementation.
It is useful nonetheless to briefly describe the vector-matrix update
equations for a complete system.
Figure 3: Interconnected network of plate and stiff string elements.
Consider an object of either stiff string or plate type in iso-
lation, subject to a single excitation, and for which the dynamics
obey (1). Any such object may be approximated by values over a
grid—in 1D, for a stiff string, or 2D for a plate (Figure 4). Suppose
now that the column vector $\mathbf{u}^n$ represents the concatenation of all values for all system components at time step $n$; here, time steps are separated by $T_s$ seconds, where the sample rate $F_s$ is defined as $F_s = 1/T_s$. The defining equations are of second order in the time variable $t$, and in the discrete case, two-step recursions are minimal in terms of memory usage, and are a natural choice.
Figure 4: Grids for plate and stiff string elements.
The following recursion, operating at an audio sample rate,
may be derived from the combination of external excitations, as
given in (1) and connections, as per (5), for a system constructed of
multiple components and connections, using standard procedures:
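$$\mathbf{u}^{n+1} = \mathbf{B}\mathbf{u}^n + \mathbf{C}\mathbf{u}^{n-1} - \mathbf{G}_e \mathbf{f}_e^n - \mathbf{G}_c \mathbf{f}_c^n \tag{8}$$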
Here the total state size (that of $\mathbf{u}^n$) is $N$, the total number of connections is $N_c$, and the total number of separate excitations is $N_e$. The constant matrices $\mathbf{B}$ and $\mathbf{C}$ are square and $N \times N$, sparse, and incorporate the effects of boundary conditions in the various components. In particular, they may be derived directly from discretisation of the individual defining operators $\mathcal{L}_s$ and $\mathcal{L}_p$ for the various components.
The update (8), including only the terms involving the matrices $\mathbf{B}$ and $\mathbf{C}$, is a simulation of a set of isolated, unforced components. The column vector $\mathbf{f}_e^n$ is of size $N_e \times 1$, and consists of the external forcing signals, which could in principle be of any form, but in the current system are constrained to be sampled from the pluck/strike waveforms given in (4). The constant matrix $\mathbf{G}_e$ is $N \times N_e$, consisting of columns representing the spatial distributions of the various excitations, sampled over the constituent grids. Finally, $\mathbf{f}_c^n$ is an $N_c \times 1$ vector consisting of the connection forces, and, as in the case of the excitations, $\mathbf{G}_c$ is an $N \times N_c$ constant matrix, the columns of which contain the spatial connection distributions.
In the absence of the connection forces, update (8) is a complete recursion for a set of isolated, externally forced objects. The connection forces $\mathbf{f}_c^n$, however, are related nonlinearly to the yet-to-be-determined state $\mathbf{u}^{n+1}$. A semi-implicit choice of discretisation, as discussed in [8], is useful, in that it leads to numerical stability conditions, and also to the simple expression:
$$\mathbf{A}^n \mathbf{f}_c^n = \mathbf{b}^n \tag{9}$$
where $\mathbf{A}^n = \mathbf{A}(\mathbf{u}^n, \mathbf{u}^{n-1})$ and $\mathbf{b}^n = \mathbf{b}(\mathbf{u}^n, \mathbf{u}^{n-1})$. Although $\mathbf{A}^n$ and $\mathbf{b}^n$ depend nonlinearly on the state, (9) is linear in $\mathbf{f}_c^n$ and may be solved through an inversion of the $N_c \times N_c$ matrix $\mathbf{A}^n$ (which is positive definite by construction); indeed, if the connection spatial distribution functions are non-overlapping, then $\mathbf{A}^n$ is diagonal, and the solution to (9) may be computed using $N_c$ divisions. Once $\mathbf{f}_c^n$ has been determined, it may then be used to advance the solution in (8).
The computational complexity of the individual objects, at an au-
dio rate of 44.1 kHz, can be ranked as follows:
1. Connection: 20 floating-point calculations/time step.
2. Bar / String: 10² to 10³ floating-point calculations/time step.
3. Linear Plate: 10³ to 10⁵ floating-point calculations/time step.
Such operation counts, for the stiff string and plate, follow from
the required grid resolution, which is itself dependent on the sam-
ple rate. This section details initial testing of the most expensive
element, the linear plate, outside of a real-time audio system. As a
basic ‘benchmark’ we can assess the largest size of plate that can
be computed within one second of wall clock time, at 44.1kHz. In
the absence of any real-time audio overheads, this would be the
outer limit of achievable computation; the amount of computation
that can be performed in an actual audio plug-in is somewhat less
than this, as is shown in Section 4.
As the main objective of this study is to evaluate what is achiev-
able on recent consumer hardware, all testing is performed on an
Intel Ivy Bridge Core i7 3770S processor. This has four physical
cores, a base clock rate of 3.1GHz, turbo clock rate of 3.9GHz,
and a maximum memory bandwidth of 25.6 GB/s. Preliminary
testing on a newer Haswell Core i7 showed similar results, with
a small improvement of around 5%. The testing performed here
is restricted to double precision floating-point—see Section 6 for
further discussion on this point. The purpose of this section is to
analyse the options for implementing the plate object in C code,
and the various optimisation strategies available to the programmer.
Updating the state of each point on the plate requires a weighted
sum of thirteen neighbouring points from the previous time step
(see Figure 5), as well as five neighbouring points from two time steps past. The grid also includes a boundary layer and a halo layer of non-updated ghost points. A clamped boundary condition is used for this testing section.
Figure 5: 13 point update stencil, boundary layer and halo.
In terms of high level design, there are two principal methods
for implementing such a scheme: in vector-matrix form, or as an
‘unrolled’ update equation applied to individual grid points.
3.1. Vector-matrix form
The vector-matrix update form is given by (8), accompanied by
the connection force calculation (9). The matrices $\mathbf{B}$ and $\mathbf{C}$ are sparse and (nearly) block Toeplitz. Figure 6 shows the sparsity pattern of the matrix $\mathbf{B}$ corresponding to a single isolated plate.
The vectors hold the 2D state data, which is decomposed linearly
using either a row or column major approach. Implementing this
form in C code is straightforward [9], requiring only a matrix by
vector multiplication, and a function for vector addition, to be ap-
plied at each time step of the simulation. The CSR (compressed
sparse row) format was used as the sparse matrix container.
Figure 6: Sparsity pattern of coefficient matrix B for a single isolated plate.
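The core of the vector-matrix form is a sparse matrix-vector product. Below is a minimal C sketch of the linear part of (8), $\mathbf{B}\mathbf{u}^n + \mathbf{C}\mathbf{u}^{n-1}$, with both matrices stored in CSR format. The three-array layout is the standard CSR representation, but the struct and function names are illustrative and not taken from [9].

```c
#include <stddef.h>

/* Compressed sparse row matrix: the nonzeros of row r are
   val[row_ptr[r]] .. val[row_ptr[r+1] - 1], with columns in col_idx. */
typedef struct {
    size_t n_rows;
    const size_t *row_ptr;   /* length n_rows + 1 */
    const size_t *col_idx;   /* length nnz */
    const double *val;       /* length nnz */
} csr_matrix;

/* y += M x */
static void csr_matvec_add(const csr_matrix *M, const double *x, double *y)
{
    for (size_t r = 0; r < M->n_rows; r++) {
        double sum = 0.0;
        for (size_t k = M->row_ptr[r]; k < M->row_ptr[r + 1]; k++)
            sum += M->val[k] * x[M->col_idx[k]];
        y[r] += sum;
    }
}

/* Linear, unforced part of the update: u_next = B u + C u_prev. */
void linear_update(const csr_matrix *B, const csr_matrix *C,
                   const double *u, const double *u_prev, double *u_next)
{
    for (size_t r = 0; r < B->n_rows; r++)
        u_next[r] = 0.0;
    csr_matvec_add(B, u, u_next);
    csr_matvec_add(C, u_prev, u_next);
}
```

Each time step then also reads every stored coefficient and column index, which is the extra data movement the unrolled form avoids.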
3.2. Unrolled update equation
Whilst the vector-matrix form is highly concise (and useful for prototyping in systems such as Matlab), it is clearly somewhat wasteful, as it requires storing and reading a large number of constant coefficients. Applying the update equation to individual grid points alleviates this, as factorized scalar coefficients are all that is required. Figure 7 shows a basic implementation of this, inside the time domain loop. A small optimisation can be achieved by pre-defining offset constants such as 2*width, which are written out in full here for clarity.
Note that whilst it is possible to use a non-factorized version equivalent to the matrix operations, it is not optimal here (there is no fused multiply-add instruction in Ivy Bridge).
// Time loop
for (n = 0; n < Nf; n++) {
  // Loop over state data in 2D, as we have boundaries
  // (bounds assume two layers of boundary/halo points on each side)
  for (Y = 2; Y < height-2; Y++) {
    for (X = 2; X < width-2; X++) {
      // Calc linear index, from row-major decomposition
      cp = Y*width+X;
      // Calculate update equation
      u[cp] = B1*(u1[cp-1] + u1[cp+1]
                 + u1[cp-width] + u1[cp+width])
            + B2*(u1[cp-2] + u1[cp+2]
                 + u1[cp-2*width] + u1[cp+2*width])
            + B3*(u1[cp+width-1] + u1[cp+width+1]
                 + u1[cp-width-1] + u1[cp-width+1])
            + B4*u1[cp]
            + C1*(u2[cp-1] + u2[cp+1]
                 + u2[cp-width] + u2[cp+width])
            + C2*u2[cp];
    }
  }
  // Read output
  out[n] = u[out_location];
  // Swap pointers
  dummy_ptr = u2;
  u2 = u1;
  u1 = u;
  u = dummy_ptr;
}
Figure 7: Standard C code time loop and state update.
This basic implementation in a single thread is unlikely to make the most of the potential of the processor, even when using compiler optimisation. To do that, further manual optimisation is required, in the form of vector instructions and multi-threading.
3.3. AVX vector instructions
AVX instructions make use of vector registers and execution units that are capable of computing arithmetic operations over multiple data simultaneously. For example, rather than computing a single multiplication with two input operands, an AVX instruction will perform a number of multiplications using vector operands, giving a vector result. On Ivy Bridge and Haswell/Broadwell architectures these are 256 bit registers, and so can operate on four double precision data elements at a time (the later AVX-512 extension widens these to 512 bits, but it is not available on all processors of a given generation). Clearly the ability to perform basic arithmetic operations on multiple data at the same time is well suited to finite difference schemes, where each updated element is independent in a given time step. Figure 8 shows the vector implementation of the update scheme, using nested intrinsic functions to compute the stages of the update.
Note that the loop over the state data is now one-dimensional,
and is incremented by the vector size. Reducing down to a single
FOR loop allows the SIMD vector operations to be performed in
a continuous manner over the entire state (save for any additional
overs at the end). This does require corrections at the boundaries,
but these are at a minimal cost, especially when alternative bound-
ary conditions are applied. Further manual unrolling of the loop
(i.e. to a size of eight using two sets of updates applied consecu-
tively) did not provide a further speedup.
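// Time loop
for (n = 0; n < Nf; n++) {
  // Loop over state data, one AVX vector (4 doubles) at a time;
  // start/end delimit the updated points (needs <immintrin.h>)
  for (cp = start; cp + 4 <= end; cp += 4) {
    // Load up state data into separate vectors. Neighbour offsets are
    // not 32-byte aligned, so unaligned loads are used throughout
    __m256d u1m2   = _mm256_loadu_pd(&u1[cp-2]);
    __m256d u1m1   = _mm256_loadu_pd(&u1[cp-1]);
    __m256d u1p1   = _mm256_loadu_pd(&u1[cp+1]);
    __m256d u1p2   = _mm256_loadu_pd(&u1[cp+2]);
    __m256d u1mw   = _mm256_loadu_pd(&u1[cp-width]);
    __m256d u1pw   = _mm256_loadu_pd(&u1[cp+width]);
    __m256d u1m2w  = _mm256_loadu_pd(&u1[cp-2*width]);
    __m256d u1p2w  = _mm256_loadu_pd(&u1[cp+2*width]);
    __m256d u1pwm1 = _mm256_loadu_pd(&u1[cp+width-1]);
    __m256d u1pwp1 = _mm256_loadu_pd(&u1[cp+width+1]);
    __m256d u1mwm1 = _mm256_loadu_pd(&u1[cp-width-1]);
    __m256d u1mwp1 = _mm256_loadu_pd(&u1[cp-width+1]);
    __m256d u1cp   = _mm256_loadu_pd(&u1[cp]);
    __m256d u2m1   = _mm256_loadu_pd(&u2[cp-1]);
    __m256d u2p1   = _mm256_loadu_pd(&u2[cp+1]);
    __m256d u2mw   = _mm256_loadu_pd(&u2[cp-width]);
    __m256d u2pw   = _mm256_loadu_pd(&u2[cp+width]);
    __m256d u2cp   = _mm256_loadu_pd(&u2[cp]);
    // Perform update equation in stages (B1v..C2v hold the coefficients
    // broadcast to all four lanes, e.g. with _mm256_set1_pd)
    __m256d s1 = _mm256_mul_pd(B1v,_mm256_add_pd(_mm256_add_pd(
        u1m1, u1p1),_mm256_add_pd(u1mw,u1pw)));
    __m256d s2 = _mm256_mul_pd(B2v,_mm256_add_pd(_mm256_add_pd(
        u1m2, u1p2),_mm256_add_pd(u1m2w,u1p2w)));
    __m256d s3 = _mm256_mul_pd(B3v,_mm256_add_pd(_mm256_add_pd(
        u1pwm1, u1pwp1),_mm256_add_pd(u1mwm1,u1mwp1)));
    __m256d s4 = _mm256_mul_pd(B4v, u1cp);
    __m256d s5 = _mm256_mul_pd(C1v,_mm256_add_pd(_mm256_add_pd(
        u2m1, u2p1),_mm256_add_pd(u2mw, u2pw)));
    __m256d s6 = _mm256_mul_pd(C2v, u2cp);
    __m256d s7 = _mm256_add_pd(_mm256_add_pd(_mm256_add_pd(s1,
        s2), _mm256_add_pd(s3, s4)), _mm256_add_pd(s5, s6));
    // Store result
    _mm256_storeu_pd(&u[cp], s7);
  }
  // Deal with overs at the end of the state size, and
  // correct boundaries that were over-written
  // Read output
  out[n] = u[out_location];
  // Swap pointers
  dummy_ptr = u2; u2 = u1; u1 = u; u = dummy_ptr;
}
Figure 8: AVX vectorization time loop.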
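The loop structure described above (a single loop over the flattened state, stepping by the vector width, a scalar loop for the leftover "overs", and a pass that restores boundary points the sweep overwrote) can be sketched portably in C. In this sketch a hypothetical 5-point stencil with a one-point clamped border stands in for the 13-point plate update, and the inner 4-wide block stands in for the intrinsics of Figure 8; it illustrates the control flow only and makes no performance claim.

```c
#include <stddef.h>

#define VEC 4  /* doubles per 256-bit AVX register */

/* Hypothetical 5-point stencil standing in for the 13-point plate update. */
static double stencil(const double *u1, const double *u2, size_t cp,
                      size_t width, double b1, double b4, double c2)
{
    return b1 * (u1[cp - 1] + u1[cp + 1] + u1[cp - width] + u1[cp + width])
         + b4 * u1[cp] + c2 * u2[cp];
}

/* One time step on a width x height grid (both >= 3) whose outer ring is a
   clamped (zero) boundary. The sweep is a single 1D loop from the first to
   the last interior point, so it also writes the left/right edge points in
   between; those are reset afterwards. */
void update_flat(double *u, const double *u1, const double *u2,
                 size_t width, size_t height, double b1, double b4, double c2)
{
    size_t start = width + 1;               /* point (x=1, y=1)          */
    size_t end = (height - 1) * width - 1;  /* one past (x=w-2, y=h-2)   */
    size_t cp = start;

    for (; cp + VEC <= end; cp += VEC)      /* vector-width blocks       */
        for (size_t k = 0; k < VEC; k++)    /* would be AVX intrinsics   */
            u[cp + k] = stencil(u1, u2, cp + k, width, b1, b4, c2);
    for (; cp < end; cp++)                  /* leftover "overs"          */
        u[cp] = stencil(u1, u2, cp, width, b1, b4, c2);

    for (size_t y = 1; y + 1 < height; y++) {  /* boundary correction    */
        u[y * width] = 0.0;
        u[y * width + width - 1] = 0.0;
    }
}
```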
3.4. Multi-threading
Whilst vector extensions provide a highly effective optimisation,
the system is still only making use of a single core of the proces-
sor. Even the consumer-based i7 has four cores available, and so
a further optimisation is to explore the use of multiple threads that
can parallelize the operation over these cores. Again, the finite dif-
ference scheme is well suited for this, as the state can simply be
partitioned into a suitable number of parts, with each thread operat-
ing over an independent section of data at each time step. This can
be combined with AVX instructions as used above. Whilst frame-
works such as OpenMP can be used to implement multi-threading
using compiler directives, this section considers the manual use of
POSIX threads (Pthreads).
There are, however, some design issues that need to be consid-
ered. The primary concern is how to issue threads that operate over
spatial elements of the grid, but also take into account the updates
over time. There are two possible approaches. First, one could
create the threads at the beginning of each time step, perform the
state update, and then destroy them. This would then be repeated
each time step, but has the benefit of not requiring any additional
thread barrier synchronisation. However, the overheads involved
in the approach mean that it only becomes viable when large-scale
arrays are used [10]. At the scale of the plates possible in real-time,
this approach does not yield any performance benefits.
In order to hide the latencies in issuing threads, we need to
create them not at each time step, but only once over many time
steps. In the context of real-time audio, creating them at the start
of an audio buffer (i.e. every 256 time steps) works well. Figure 9
shows the detail of the time loop.
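// Time loop
for (int n = 0; n < Nf; n += buffer_size) {
  // Create threads
  for (i = 0; i < NUM_THREADS; i++) {
    td[i].n = n;
    td[i].u = u;
    td[i].u1 = u1;
    td[i].u2 = u2;
    pthread_create(&threads[i], NULL, updateScheme, &td[i]);
  }
  // Destroy threads
  for (i = 0; i < NUM_THREADS; i++) {
    pthread_join(threads[i], NULL);
  }
  // If updateScheme rotates only its own copies of u, u1 and u2 (once per
  // time step), apply the same number of rotations (buffer_size mod 3)
  // here so that the next buffer starts from the most recent state
  for (int k = 0; k < buffer_size % 3; k++) {
    dummy_ptr = u2; u2 = u1; u1 = u; u = dummy_ptr;
  }
}
Figure 9: Time loop for multi-threading using POSIX.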
Note that the time loop is now incremented by the buffer size,
and the current time step integer is passed into the thread kernel
in order to correctly read the output into an array. The coefficients
are loaded into the td struct prior to the time loop. As the threads
are now operating over time, a barrier is required inside the kernel
to ensure that the threads are synchronised prior to performing the
pointer swap. The outline of the kernel function is shown in Figure 12. Initial testing revealed that two threads provide only a small increase over a single thread, that eight threads actually perform worse (despite hyper-threading exposing eight logical cores), and that four threads give the optimal configuration here.
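The kernel structure described above can be sketched as follows. This is a hedged, self-contained illustration, not the kernel of Figure 12: a toy 1D wave equation (three-point stencil, squared Courant number `lambda2`) stands in for the plate, the state is split into contiguous slices, each thread rotates its own copies of the three state pointers after a barrier, and thread 0 records the output. Because macOS, the Audio Unit platform, does not provide `pthread_barrier_t`, a small mutex/condition-variable barrier is used instead. All names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Reusable barrier built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int count, waiting;
    unsigned long cycle;
} barrier_t;

static void barrier_init(barrier_t *b, int count)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = count; b->waiting = 0; b->cycle = 0;
}

static void barrier_wait(barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned long cycle = b->cycle;
    if (++b->waiting == b->count) {        /* last arrival releases all */
        b->waiting = 0;
        b->cycle++;
        pthread_cond_broadcast(&b->cond);
    } else {
        while (cycle == b->cycle)          /* guards spurious wakeups */
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

typedef struct {
    double *u, *u1, *u2, *out;
    size_t lo, hi;          /* this thread's slice [lo, hi) of the state */
    int nsteps, id;
    double lambda2;         /* squared Courant number of the toy scheme */
    size_t out_loc;
    barrier_t *bar;
} thread_data;

static void *update_kernel(void *arg)
{
    thread_data *td = arg;
    double *u = td->u, *u1 = td->u1, *u2 = td->u2, *tmp;
    for (int n = 0; n < td->nsteps; n++) {
        for (size_t i = td->lo; i < td->hi; i++)
            u[i] = 2.0 * (1.0 - td->lambda2) * u1[i]
                 + td->lambda2 * (u1[i - 1] + u1[i + 1]) - u2[i];
        barrier_wait(td->bar);             /* all slices written first */
        if (td->id == 0)
            td->out[n] = u[td->out_loc];
        tmp = u2; u2 = u1; u1 = u; u = tmp;  /* local pointer rotation */
    }
    return NULL;
}

/* Runs nsteps on N >= 3 points with 1..8 threads, created once for the
   whole run; out receives nsteps samples. Returns 0 on success. */
int simulate(int nthreads, size_t N, int nsteps, double *out)
{
    pthread_t th[8];
    thread_data td[8];
    barrier_t bar;
    double *buf = calloc(3 * N, sizeof *buf);
    if (buf == NULL || nthreads < 1 || nthreads > 8) {
        free(buf);
        return -1;
    }
    double *u = buf, *u1 = buf + N, *u2 = buf + 2 * N;
    u1[N / 2] = u2[N / 2] = 1.0;           /* displaced, at rest */
    barrier_init(&bar, nthreads);
    size_t chunk = (N - 2 + nthreads - 1) / nthreads;
    for (int i = 0; i < nthreads; i++) {
        size_t lo = 1 + (size_t)i * chunk, hi = lo + chunk;
        if (lo > N - 1) lo = N - 1;
        if (hi > N - 1) hi = N - 1;        /* end points stay at zero */
        td[i] = (thread_data){u, u1, u2, out, lo, hi, nsteps, i, 0.5, N / 3, &bar};
        pthread_create(&th[i], NULL, update_kernel, &td[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);
    pthread_mutex_destroy(&bar.lock);
    pthread_cond_destroy(&bar.cond);
    free(buf);
    return 0;
}
```

Since every point is computed by the same arithmetic regardless of which thread owns it, the output is identical for any thread count, which gives a simple correctness check for the partitioning and synchronisation.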
3.5. Summary
Testing was performed using a square plate at 44.1kHz, with the
size varied such that one second of clock time was required to com-
pute 44,100 time steps. Timing functions from <sys/time.h>
were used, employing ‘wall clock’ functions that give accurate
measurement even when using multi-threading. All codes were
compiled using LLVM, with -O3 optimisation (-O0 was also tested
as a comparison). Table 1 shows the results for each implementation.
The first result to note is the difference between the matrix form, which achieved 41 x 41, and the standard C code, which achieved 80 x 80: nearly four times the state size. Clearly the extra data movement is a cost, but the compiler is also more likely to aggressively optimise the unrolled version. However, the standard C code is still using only a fraction of the possible CPU performance. The AVX instructions provide a 2X speedup, well below a linear 4X, but there are overheads in loading the data points into the vector registers. From there, applying multiple threads achieves a further 35% increase in performance. Again, this is significantly below a linear increase over the single thread version, partly due to the decrease in clock frequency when running over multiple cores, and also due to latencies involved in thread synchronisation at each time step. This is typical of multi-threaded codes, where best performance is achieved when parallelising over much larger arrays [10].

Code version (-O3 unless otherwise stated) | Plate size (grid points) | Total state size (grid points)
--- | --- | ---
CSR matrix | 41 x 41 | 1,681
C with -O0 | 44 x 44 | 1,936
C | 80 x 80 | 6,400
C with AVX | 112 x 112 | 12,544
C with AVX & 4 Pthreads | 130 x 130 | 16,900

Table 1: Maximum plate sizes for one second computation time, running for 44,100 time steps.
The algorithm is clearly memory bandwidth limited, and so it
is useful to assess the memory throughput efficiency of the implementation. Taking one read and one write per grid point per time step (assuming all other reads will be from cache), this can be calculated as 130 x 130 grid points x 2 x 44,100 time steps x 8 bytes = 11.9 GB/s. This is roughly half of the 25.6 GB/s theoretical maximum, but there do not seem to be any obvious further avenues for optimisation. Note the importance of applying vectorization and multi-threading: without them, the basic C code with -O3 achieves only 38% of the maximum performance, measured by total state size.
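The throughput estimate and the 38% figure can be checked directly. The worked arithmetic below assumes decimal gigabytes (10⁹ bytes) and reads the 38% as a ratio of total state sizes from Table 1.

$$
\begin{aligned}
130 \times 130 \times 2 \times 44100 \times 8 \ \text{bytes/s} &= 11{,}924{,}640{,}000 \ \text{bytes/s} \approx 11.9 \ \text{GB/s},\\
\frac{11.9}{25.6} &\approx 0.47,\\
\frac{80 \times 80}{130 \times 130} = \frac{6{,}400}{16{,}900} &\approx 0.38.
\end{aligned}
$$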
Audio Units are Apple’s native plug-in architecture for enhancing digital audio hosts such as Logic Pro or Ableton Live. From a programming perspective, they are executable code within a standard
API defined by the Core Audio system [11]. The plug-ins can be
created either by using a development framework such as JUCE
[12], or by directly sub-classing the Core Audio SDK. The latter
approach is used here (the AU v2 API), to minimise any overhead
involved with the use of additional structures.
The purpose of this section is to determine the maximum size
of plate that can be computed from within an actual working plug-
in. The basic unrolled code, as well as the AVX and Pthread codes
were all tested, with the size of the plate varied to determine the
largest size achievable just prior to audio dropouts. Within the
real-time system, audio dropouts are caused by the buffer not being
written completely, resulting in ‘clicks’ at the output stream. Test-
ing was performed using the AU Lab host application, which gives
visual warnings when buffer under-runs occur. Dropouts typically start at around 90% CPU load. A buffer size of 512 samples was
used, although the load appeared constant across sizes from 128 to
2048 samples. Table 2 shows the resulting plate sizes.
The multi-threaded and vectorized code still produces the best result, but at a reduced margin over the single thread AVX version. However, compared to the testing in the previous section, it achieves only 50% of the maximum state size. As previously mentioned, this is largely an issue of data size, as smaller arrays benefit less from multi-threading. The single thread AVX code achieved 65% of the state size reached outside of the real-time system, with a size of 90 x 90 compared to 112 x 112. The non-vectorized code performs at 68%, so in general terms we can say that, for the single thread case, the largest model computable within the real-time plug-in is one whose C code computes one second of audio in just under 0.7 seconds outside the plug-in.

Code version (-O3 unless otherwise stated) | Plate size (grid points) | Total state size (grid points)
--- | --- | ---
C | 66 x 66 | 4,356
C with AVX | 90 x 90 | 8,100
C with AVX & 4 Pthreads | 92 x 92 | 8,464

Table 2: Maximum plate size for an Audio Unit instrument plug-in, running at 44.1kHz.
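For reference, the percentages above follow from the total state sizes in Tables 1 and 2 (multi-threaded, single thread AVX and plain C, respectively). Since the per-point cost of the update is roughly constant, computation time scales roughly linearly with state size, which is the basis for the 0.7 second rule of thumb:

$$
\frac{8{,}464}{16{,}900} \approx 0.50, \qquad \frac{8{,}100}{12{,}544} \approx 0.65, \qquad \frac{4{,}356}{6{,}400} \approx 0.68.
$$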
Having established the maximum performance for a single plate
within an Audio Unit plug-in, this section examines the potential
for real-time modular systems containing multiple object types.
Creating a usable system that is capable of producing a wide sonic
range requires the combination of multiple stiff strings, connec-
tions and plates. Clearly only one full-scale plate will be possible,
but there is scope for a large number of additional 1D objects.
Very stiff strings (or bars) can be tuned to a fundamental frequency over a range of around two octaves, from C3 to C5. This
results in very small state arrays of around twenty elements each.
Low stiffness strings require more state, varying from around 30
elements at C5 to around 250 elements for a low C2. Both can
benefit from the use of AVX instructions, which result in 2X per-
formance gains. At these sizes, a large number of such objects can
be computed, depending on the size of the plate that is used.
Bars can also be used as resonators, when using a low stiffness
value, and here the state size can increase to up to a thousand ele-
ments. A plate of dimensions comparable to a plate reverberation
device (steel, 1.5m x 1.3m and of thickness 2mm) requires a state
size of approximately 7000. Testing showed that this still leaves
around 20% of the CPU to compute 1D objects and connections.
This is sufficient to run a system such as that shown in Figure 10.
[Figure 10: diagram of Metal Plate, Tuned Strings and Nonlinear Connections, annotated with the control parameters output positions, input/output positions, loss parameters, loss T60, angular frequency and connection positions.]
Figure 10: Tuned string and plate system, with control parameters.
There are a number of issues involved in actually implement-
ing such a system in a real-time plug-in. The first is the question
of parameter updating. Whilst some of the parameters, such as the output position and the angular frequency of the connections, can be varied during real-time computation, others, such as the density of the plate, cannot, as the underlying grid representation will necessarily be altered. The parameters therefore need to be grouped into ‘state’ and ‘real-time’ controls, and an appropriate system of applying state control changes is required.
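One possible arrangement of the two parameter groups is sketched below in C, under the assumption that all setters and `begin_buffer` are called from the audio thread (a cross-thread version would need atomics or a lock-free queue). Real-time controls are written straight into the running model, whereas state controls are staged and applied only at a buffer boundary, where the grid, coefficients and state arrays would be rebuilt. The struct and function names, and the choice of parameters in each group, are hypothetical.

```c
#include <stdbool.h>

/* Parameters that change the grid (hypothetical selection). */
typedef struct { double density, thickness; } state_params;
/* Parameters that can change mid-run without touching the grid. */
typedef struct { double out_x, out_y; } realtime_params;

typedef struct {
    state_params state, pending;
    realtime_params rt;
    bool rebuild_pending;
    int rebuild_count;      /* for illustration and testing only */
} plate_controls;

/* Real-time controls take effect at the next time step. */
void set_realtime(plate_controls *p, realtime_params rt)
{
    p->rt = rt;
}

/* State controls are staged; repeated changes within a buffer coalesce. */
void request_state_change(plate_controls *p, state_params s)
{
    p->pending = s;
    p->rebuild_pending = true;
}

/* Called at the start of each audio buffer, before the time loop. */
void begin_buffer(plate_controls *p)
{
    if (!p->rebuild_pending)
        return;
    p->state = p->pending;
    /* Here the grid size, update coefficients and state arrays would be
       recomputed from p->state and the state reset (not shown). */
    p->rebuild_pending = false;
    p->rebuild_count++;
}
```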
The second major issue is how to deal with the large number of
parameters that arise. A system consisting of, e.g., 20 stiff string
objects, requires 200 parameter settings. This is problematic in
terms of user interface design, and requires some level of group-
ing to be manageable in the first instance. The ability to work with
hundreds of parameters may only be possible with a fully-realised
visual interface, where individual objects could be manipulated us-
ing graphical representations.
There is also the question of the usable parameter space: understanding the combinations and limits of control parameters that lead to usable output. Experimentation in this regard is certainly easier inside a real-time environment. Despite these issues,
initial prototype Audio Units have been created, such as systems
of pitched bars connected to resonator bars, and tuned strings con-
nected to a large-scale plate. These are playable from a standard
MIDI keyboard, and make use of control change signals to vary
the real-time parameters. Audio examples can be found at this
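The prototypes map MIDI control change messages onto real-time parameters. A minimal sketch of such a mapping follows; the linear curve for positions and the exponential curve for frequency-like parameters (such as a connection's angular frequency) are assumptions for illustration, as are the function names.

```cpp
#include <cassert>
#include <cmath>

// Map a 7-bit MIDI control change value (0..127) linearly onto [lo, hi],
// e.g. an output position along a string in [0, 1].
double ccToLinear(int cc, double lo, double hi) {
    assert(cc >= 0 && cc <= 127);
    return lo + (hi - lo) * (cc / 127.0);
}

// Map a CC value exponentially onto [lo, hi] (lo > 0), so that equal controller
// steps give equal frequency ratios rather than equal frequency differences.
double ccToExponential(int cc, double lo, double hi) {
    assert(cc >= 0 && cc <= 127 && lo > 0.0 && hi > lo);
    return lo * std::pow(hi / lo, cc / 127.0);
}
```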
These prototypes were created using a two-stage process. First,
individual C++ objects were created that define a stiff string, a
plate, and a connection. These used common interface methods
in order to facilitate the second stage, which is to create an instrument model. Here, a number of objects were instantiated from the primitive elements, and a further interface was constructed to allow the instrument to be easily tied into the Audio Unit SDK. Figure
11 shows the class diagram of the string and plate instrument used
to create the plug-in.
Figure 11: Class diagram of the string and plate instrument plug-
in, showing common interface elements.
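A minimal C++ sketch of this two-stage structure follows: a common interface implemented by each primitive object, and an instrument class that owns the objects and exposes a single per-buffer render call for the plug-in wrapper. The class and method names are hypothetical, the connection update is only marked by a comment, and Oscillator is a toy object standing in for a real string or plate.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical common interface for the primitive objects (string, plate, ...).
class FDObject {
public:
    virtual ~FDObject() {}
    virtual void computeStep() = 0;         // advance the scheme by one time step
    virtual double readOutput() const = 0;  // value at the object's output position
    virtual void swapStates() = 0;          // rotate the time levels
};

// Toy object used only to exercise the interface: a lossless oscillator
// u = c*u1 - u2, standing in for a real string or plate.
class Oscillator : public FDObject {
public:
    explicit Oscillator(double c) : c_(c), u_(0.0), u1_(1.0), u2_(1.0) {}
    void computeStep() override { u_ = c_ * u1_ - u2_; }
    double readOutput() const override { return u_; }
    void swapStates() override { u2_ = u1_; u1_ = u_; }
private:
    double c_, u_, u1_, u2_;
};

// Instrument model: owns the objects and exposes one per-buffer call,
// the kind of entry point a plug-in render callback could invoke.
class Instrument {
public:
    void add(std::unique_ptr<FDObject> obj) { objects_.push_back(std::move(obj)); }

    void renderBuffer(float* out, std::size_t frames) {
        assert(out != nullptr || frames == 0);
        for (std::size_t n = 0; n < frames; ++n) {
            for (auto& obj : objects_) obj->computeStep();
            // (connection forces between objects would be applied here)
            double sum = 0.0;
            for (auto& obj : objects_) {
                sum += obj->readOutput();
                obj->swapStates();
            }
            out[n] = static_cast<float>(sum);
        }
    }

private:
    std::vector<std::unique_ptr<FDObject>> objects_;
};
```

The plug-in's render callback would forward its output buffer and frame count to renderBuffer; the Audio Unit SDK calls themselves are not shown.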
This paper has demonstrated the use of CPU optimisation tech-
niques to compute coupled 2D and 1D finite difference schemes in
real-time, as well as implementation techniques within real-time
plug-ins such as Audio Units. Whilst AVX instructions provide
a 2X speedup at double precision floating-point, the use of multi-
threading is more complex. For the size of plates that can be com-
puted in real-time, the use of multiple threads within a single ob-
ject does not produce significant performance benefits. A more ef-
ficient approach may be to apply threads to separate objects, such
as a thread for a plate, and another operating over the 1D objects
and connections.
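The per-object threading suggested above can be sketched as follows, assuming C++11 threads in place of POSIX threads. A worker thread updates the 'plate' while the calling thread updates the 'strings' and then applies the connection; two barriers per time step ensure that the connection reads only completed states, and that neither object advances before the connection has been applied. The objects are toy oscillators and the connection is a linear spring, both stand-ins; Barrier, Osc and runPerObjectThreads are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier (C++11), playing the role of pthread_barrier_wait.
class Barrier {
public:
    explicit Barrier(int count) : count_(count), waiting_(0), generation_(0) {}
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned gen = generation_;
        if (++waiting_ == count_) {  // last thread to arrive releases the others
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_, waiting_;
    unsigned generation_;
};

// Toy stand-in for an object's state: lossless oscillator u = c*u1 - u2.
struct Osc {
    double c, u, u1, u2;
    void step() { u = c * u1 - u2; }
    void swap() { u2 = u1; u1 = u; }
};

// Hypothetical per-object split: one thread per object, synchronised each time step.
std::vector<double> runPerObjectThreads(Osc plate, Osc string, double kc, int steps) {
    assert(steps >= 0);
    std::vector<double> out(static_cast<std::size_t>(steps));
    Barrier barrier(2);
    std::thread plateThread([&] {
        for (int n = 0; n < steps; ++n) {
            plate.step();
            barrier.wait();  // (1) both objects updated
            barrier.wait();  // (2) connection applied
            plate.swap();
        }
    });
    for (int n = 0; n < steps; ++n) {
        string.step();
        barrier.wait();                              // (1)
        const double f = kc * (string.u - plate.u);  // linear spring in place of the nonlinear connection
        string.u -= f;
        plate.u += f;
        out[n] = plate.u;
        barrier.wait();                              // (2)
        string.swap();
    }
    plateThread.join();
    return out;
}
```

Two barriers per sample carry a real synchronisation cost, so a split of this kind would likely only pay off when each object's update is large enough to amortise it.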
A further avenue for experimentation is the use of single pre-
cision floating-point. Although this may result in artefacts in the
system due to the nonlinear elements, it would widen the possi-
bilities for computation on consumer hardware. Vector extensions
can operate over twice the number of single precision data, which
could easily provide performance benefits on the CPU. Also, sin-
gle precision would allow the use of consumer level GPUs. There
have already been some experiments in this regard [13] [14], but
with a simpler system consisting of a single 2D wave equation
solver. The use of a desktop machine such as Apple’s recent Mac
Pro has interesting possibilities as it contains two GPUs, both ca-
pable of double precision calculations. This would require the use
of the OpenCL language, and could also be used to explore multi-
core CPU operation.
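One low-cost way to experiment with single precision is to write each scheme once, templated on the floating-point type, and compare the two builds. The sketch below assumes a lossless 1D wave equation as a stand-in for the paper's objects; being linear, it cannot reveal the nonlinear artefacts mentioned above and only gauges rounding drift. avxLanes restates the register arithmetic: a 256-bit register holds 4 doubles or 8 floats. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elements per 256-bit AVX register: 4 doubles or 8 floats.
template <typename Real>
constexpr std::size_t avxLanes() { return 256 / (8 * sizeof(Real)); }

// Hypothetical: one scheme written once and instantiated at either precision.
// Lossless 1D wave equation with fixed ends and an initial displacement at the centre.
template <typename Real>
std::vector<Real> simulate(int N, Real l2, int steps, int outIdx) {
    assert(N > 2 && outIdx > 0 && outIdx < N && l2 > Real(0) && l2 <= Real(1));
    std::vector<Real> u(N + 1, Real(0)), u1(N + 1, Real(0)), u2(N + 1, Real(0)), out;
    u1[N / 2] = u2[N / 2] = Real(1);  // displaced centre, zero initial velocity
    for (int n = 0; n < steps; ++n) {
        for (int l = 1; l < N; ++l)
            u[l] = Real(2) * (Real(1) - l2) * u1[l] + l2 * (u1[l - 1] + u1[l + 1]) - u2[l];
        out.push_back(u[outIdx]);
        u2.swap(u1);  // u2 <- previous u1
        u1.swap(u);   // u1 <- new state; u reuses the oldest buffer
    }
    return out;
}
```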
[1] C. Cadoz, A. Luciani, and J.-L. Florens, “Cordis-anima: A
modeling and simulation system for sound and image synthesis,” Computer Music Journal, vol. 17, no. 1, pp. 19–29, 1993.
[2] J.-L. Florens and C. Cadoz, “The physical model: Modeling and simulating the instrument universe,” in Representations of Musical Signals, G. De Poli, A. Piccialli, and C. Roads, Eds., pp. 227–268. MIT Press, Cambridge, Massachusetts, 1991.
[3] D. Morrison and J.-M. Adrien, “Mosaic: A framework for
modal synthesis,” Computer Music Journal, vol. 17, no. 1,
pp. 45–46, 1993.
[4] F. Pedersini, A. Sarti, S. Tubaro, and R. Zattoni, “Towards
the automatic synthesis of nonlinear wave digital models for
musical acoustics,” in Proceedings of EUSIPCO-98, Ninth
European Signal Processing Conference, Rhodes, Greece,
1998, vol. 4, pp. 2361–2364.
[5] S. Bilbao, Numerical Sound Synthesis: Finite Difference
Schemes and Simulation, Wiley, 2009.
[6] S. Bilbao, B. Hamilton, A. Torin, C.J. Webb, P. Graham,
A. Gray, K. Kavoussanakis, and J. Perry, “Large Scale
Physical Modeling Sound Synthesis,” in Proceedings of the
Stockholm Music Acoustics Conference, Stockholm, Swe-
den, 2013, pp. 593–600.
[7] Apple Incorporated, “Audio Unit Programming Guide,” [Online document], July 2014. [Cited: 7th June 2015.]
[8] S. Bilbao, “A modular percussive synthesis environment,” in
Proceedings of the 12th International Conference on Digital
Audio Effects, Como, Italy, 2009.
[9] G. Golub and C. Van Loan, Matrix computations (3rd ed.),
Johns Hopkins University Press, Baltimore, MD, USA, 1996.
[10] C. Webb, Parallel computation techniques for virtual acous-
tics and physical modelling synthesis, Ph.D. thesis, Univer-
sity of Edinburgh, 2014.
[11] W. Pirkle, Designing Software Synthesizer Plug-Ins in C++,
Focal Press, 2015.
[12] M. Robinson, Getting started with JUCE, Packt Publishing, 2013.
[13] B. Hsu and M. Sosnick, “Realtime GPU audio: Finite difference-based sound synthesis using graphics processors,” ACM Queue, vol. 11, no. 4, May 2013.
[14] M. Sosnick and B. Hsu, “Implementing a finite difference-based real-time sound synthesizer using GPUs,” in Proceedings of the International Conference on New Interfaces for Musical Expression, Oslo, Norway, 2011.
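void *updateScheme(void *targ){
    // Calculate state array partition points for thread
    int psize = 1 + ((Nx+1)*(Ny+1) - 1)/NUM_THREADS;
    // Load coefficients into vector objects
    __m256d B1v = _mm256_set1_pd(B1);
    // (B2v and the remaining coefficients and set-up are elided in the figure)
    // Loop over buffer size
    for (int n = 0; n < buffer_size; n++){
        // Loop over state data, one vector of 4 doubles at a time
        for (cp = start; cp < vecend+1; cp += vector_size){
            // Load up state data into separate vectors. The offset addresses
            // cannot all be 32-byte aligned, so unaligned loads are used.
            u1m2 = _mm256_loadu_pd(&u1[cp-2]);
            u1m1 = _mm256_loadu_pd(&u1[cp-1]);
            // (remaining neighbour loads elided in the figure)
            // Perform update equation in stages
            s1 = _mm256_mul_pd(B1v, _mm256_add_pd(_mm256_add_pd(u1m1, u1p1),
                                                  _mm256_add_pd(u1mw, u1pw)));
            s2 = _mm256_mul_pd(B2v, _mm256_add_pd(_mm256_add_pd(u1m2, u1p2),
                                                  _mm256_add_pd(u1m2w, u1p2w)));
            // (further stages up to s7 elided in the figure)
            // Store result
            _mm256_store_pd(&u[cp], s7);
        }
        // Deal with overs (points left after the last full vector)
        // Barrier: wait until all threads have completed their partitions
        pthread_barrier_wait(&barrier);
        // Read output after the barrier, so the output point is complete
        // even when it lies in another thread's partition
        if (tid==0) md->out[...] = u[md->out_location];
        // Swap pointers
        dummy_ptr = u2; u2 = u1; u1 = u; u = dummy_ptr;
    }
    return NULL;
}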
Figure 12: POSIX thread kernel function.
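Figure 12 elides the handling of the 'overs'. The loop shape it implies, full vector-width chunks followed by a scalar remainder, is sketched below in portable C++ without intrinsics. This is an illustrative stand-in, assuming a 1D wave equation stencil rather than the plate's stencil and a chunk width of 4 doubles (the width of an __m256d); updateChunked and kVec are hypothetical names, and chunkEnd here is an exclusive bound.

```cpp
#include <cassert>

constexpr int kVec = 4;  // doubles per 256-bit register, as with __m256d

// One time step of the 1D wave equation
//   u[i] = 2(1 - l2) u1[i] + l2 (u1[i-1] + u1[i+1]) - u2[i]
// over points [start, end): full chunks of kVec points first (the part an
// AVX loop would handle), then the leftover "overs" one at a time.
void updateChunked(double* u, const double* u1, const double* u2,
                   int start, int end, double l2) {
    assert(start >= 1 && end >= start);  // caller guarantees u1[end] exists
    const int chunkEnd = start + ((end - start) / kVec) * kVec;
    int i = start;
    for (; i < chunkEnd; i += kVec) {
        for (int v = 0; v < kVec; ++v) {  // fixed-width body, a candidate for auto-vectorisation
            const int j = i + v;
            u[j] = 2.0 * (1.0 - l2) * u1[j] + l2 * (u1[j - 1] + u1[j + 1]) - u2[j];
        }
    }
    for (; i < end; ++i)  // overs: fewer than kVec points remain
        u[i] = 2.0 * (1.0 - l2) * u1[i] + l2 * (u1[i - 1] + u1[i + 1]) - u2[i];
}
```

With 21 points, for example, the chunked loop covers 20 and the remainder loop finishes the last one.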