Conference Paper
A Vacuum-Tube Guitar Amplifier Model Using a
Recurrent Neural Network
John Covert and David L. Livingston
Department of Electrical and Computer Engineering
Virginia Military Institute
Lexington, Virginia, USA
Abstract: Rock and blues guitar players prefer the use of
vacuum-tube amplifiers due to the harmonic structures
developed when the amplifiers are overdriven. The disadvantages
of vacuum tubes compared against solid-state implementations,
such as power consumption, reliability, cost, etc., are far
outweighed by the desirable sound characteristics of the
overdriven vacuum-tube amplifier. There are many approaches
to modeling vacuum-tube amplifier behaviors in solid-state
implementations. These include a variety of both analog and
digital techniques, some of which are judged to be good
approximations to the tube sound.
In this paper we present early results of experiments in using a
neural network to model the distortion produced by an
overdriven vacuum-tube amplifier. Our approach is to use
artificial neural networks of the recurrent variety, specifically a
Nonlinear AutoRegressive eXogenous (NARX) network, to
capture the nonlinear, dynamic characteristics of vacuum-tube
amplifiers. NARX networks of various sizes have been trained on
data sets consisting of samples of both sinusoidal and raw electric
guitar signals and the amplified output of those signals applied to
a tube-based amplifier driven at various levels of saturation.
Models are evaluated using both quantitative (e.g., RMS error)
and qualitative (listening tests) assessment methods on data sets
that were not used in the network training. Listening tests,
which we consider the most important evaluation method at
this point in the work, are indicative of the potential for success
in modeling a vacuum-tube amplifier using a recurrent
neural network.
Keywords: vacuum-tube amplifiers; recurrent neural networks
I. INTRODUCTION
Since the invention of electronic audio amplification, much
effort has been directed toward designing electronic systems that
faithfully amplify audio signals for the purposes of conveying
information or providing entertainment. Biasing techniques and
the use of negative feedback were developed for improving the
linear response of electronic amplifiers in an effort to minimize
harmonic and intermodulation distortion. But an interesting
thing happened along the path of audio development: guitars
were electrified. To be heard in big bands with many
instruments, the guitar's sound had to be amplified [1].
Electric guitars went on to become the musical instruments of
choice for various musical styles such as country, blues, and
particularly rock and roll. In the early periods of these
musical styles, the state of the art of electronic amplification
was the vacuum-tube amplifier. It was soon discovered,
particularly by early blues and rock musicians, that when
overdriven, vacuum-tube amplifiers produced harmonic
distortion which was musically pleasing [2]. Rock musicians
routinely used their amplifiers in an overdriven mode and even
modified the circuits to force the amps into saturation to obtain
the desired harmonic distortion.
With the invention of the transistor, the technology of
choice for the implementation of audio amplifiers shifted from
vacuum tubes to solid-state devices. Solid-state amplifiers have
a significant number of advantages over vacuum-tube
amplifiers including lower cost, energy efficiency, smaller size,
lighter weight, reliability, etc. However, despite the
disadvantages of tube-based amplifiers, modern electric guitar
players still prefer them over solid-state amplifiers for one
reason: the distortion produced by overdriven vacuum-tube
amplifiers is significantly different from that produced by
overdriven solid-state amplifiers. Solid-state distortion is often
characterized as sounding harsher than tube-based distortion,
and is attributed to the difference in the way the amplifier
transitions from the linear part of its characteristic to the
saturation region.
Amplifier and effects designers have devoted much time to
reproducing the vacuum-tube sound in solid-state form, as
reflected by the number of patents and resulting products that
attempt to do so. Analog approaches
range from fairly simple “stomp boxes” such as the Ibanez
Tube Screamer® [3] that uses an operational amplifier with
back-to-back limiting diodes, to FET-based analog circuit
emulations [4], to sophisticated electronic systems composed of
many stages of linear and nonlinear elements [5]. Guitar
amplifier manufacturers have incorporated digital signal
processing approaches that can be found in “modeling
amplifiers” [6].
The pursuit of the vacuum-tube model in solid-state
form, particularly as a DSP implementation, has become an
academic exercise with a corresponding increase in scholarly
papers addressing the topic.
Sponsored by Virginia Military Institute Grants in Aid of Research.
Fig. 1. Soft clipping characteristic vs. hard clipping.
Pakarinen and Yeh [7] have produced an extensive review
of the work towards modeling
tube amplifiers. Yeh [8] has devised a technique for modeling a
tube amplifier by developing a nonlinear filter using state-space
methods for systems of nonlinear differential equations.
Since the distortion from vacuum-tube amplifiers is due to
the nonlinear characteristics of the vacuum-tubes (and possibly
the saturation characteristics of the output transformer),
techniques for modeling nonlinear behaviors need to be the
focus for developing a successful model of the overall system.
One such technique is the use of multilayer feedforward
neural networks, which have been shown to be capable of
approximating almost any function of interest (those that are
Borel measurable) given a sufficient number of hidden-layer
neurons [9].
Published research into the application of neural networks
to the problem of modeling guitar amplifiers and effects is
scant. Mendoza [10] used multilayer feedforward networks to
learn the static nonlinearities of an Ibanez Tube-Screamer®
applied to a guitar signal. DeBoer and Stanley [11] filed a
patent on a technique for evolving recurrent neural networks to
implement various guitar effects.
Our approach to the problem of modeling a vacuum-tube
guitar amplifier is the use of a recurrent neural network
implementation of the Nonlinear AutoRegressive eXogenous
(NARX) model [12][13]. The use of the NARX model for
exactly this application was suggested in lecture notes by
Visell [14]. We present preliminary, but promising, results of
our investigations into using a NARX network to model a
vacuum-tube amplifier.
The remainder of the paper is organized as follows: Section
II covers how harmonic distortion generated by vacuum-tube
amplifiers differs from the distortion produced by solid-state amplifier
saturation. The properties of recurrent neural networks and
their use in modeling nonlinear behaviors are also discussed.
The details concerning the construction and training of neural
network models with guitar signals are examined in Section III.
The results of recent experiments are presented and examined
in Section IV and conclusions and future work are detailed in
Section V.
II. BACKGROUND
A. Harmonic Distortion
The characteristics of the harmonic distortion produced by
a vacuum-tube amplifier versus a solid-state amplifier are the
principal reason the tube amp is favored by guitarists. The
differences in the characteristics are primarily due to how the
amplifiers are driven into saturation. Solid-state amplifiers tend
to clip signals abruptly, called hard clipping, whereas tube
amplifiers produce a gradual saturation, called soft clipping,
as shown in Fig. 1. For symmetric clipping, odd harmonics are
produced. Hard clipping results in more energy in the higher
harmonics than soft clipping, leading to a sound which is often
described as harsh. Fig. 2 compares the spectra of a sinusoid
which is soft clipped (in blue) versus hard clipped (in red).
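This contrast in harmonic content can be illustrated numerically. The sketch below is our own illustration, not code from the paper: it clips a unit sinusoid with a smooth tanh curve versus an abrupt limiter and estimates harmonic magnitudes by correlating with sinusoids at multiples of the fundamental. The drive and limit values are arbitrary assumptions.

```python
import math

def soft_clip(x, drive=4.0):
    # Gradual (tube-like) saturation: smooth transition into clipping.
    return math.tanh(drive * x)

def hard_clip(x, drive=4.0, limit=0.8):
    # Abrupt (solid-state-like) limiting: sharp corners at the clip level.
    return max(-limit, min(limit, drive * x))

def harmonic_level(clip, k, n=4096):
    """Magnitude of the k-th harmonic of clip(sin t), estimated by
    correlating one period with cos(kt) and sin(kt)."""
    re = im = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        y = clip(math.sin(t))
        re += y * math.cos(k * t)
        im += y * math.sin(k * t)
    return 2.0 * math.hypot(re, im) / n

# Both curves are symmetric, so only odd harmonics appear; hard clipping
# keeps relatively more energy in the high odd harmonics.
for k in (3, 5, 7, 9):
    soft = harmonic_level(soft_clip, k) / harmonic_level(soft_clip, 1)
    hard = harmonic_level(hard_clip, k) / harmonic_level(hard_clip, 1)
    print(f"harmonic {k}: soft {soft:.4f}, hard {hard:.4f}")
```

Printing the ratios shows the hard-clipped spectrum falling off more slowly toward the high odd harmonics, which matches the "harsh" characterization above.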
Another contributor to preferred sound is the occurrence of
even harmonics in the distorted signal. The first two even
harmonics lie one and two octaves above the fundamental
frequency and are musically pleasing. Even harmonics are produced when
clipping is asymmetric, which occurs when the bias point or Q-
point is not centered on the load-line between the
saturation/cutoff levels. In vacuum-tube amplifiers, the
introduction of even harmonics is a dynamic process.
Capacitances in the biasing circuits of vacuum-tube stages
contribute to Q-point sensitivity to changes in the signal
envelope; i.e., the Q-point moves on the load-line in response
to signal strength [7]. Thus, the strength of the even harmonic
content is a function of the signal envelope. It is this response
which dictates the need for a dynamic model rather than a
simple static model of the nonlinearity in the vacuum-tube amplifier.
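A static snapshot of this effect can still show where the even harmonics come from. The sketch below is an illustration of ours, not the paper's: the clip limits and gain are arbitrary values, and a fixed asymmetric curve only approximates one instant of the envelope-dependent Q-point behavior described above.

```python
import math

def asymmetric_clip(x, pos_limit=0.9, neg_limit=0.4, gain=2.0):
    """Static sketch of an off-center Q-point: the negative half of the
    waveform saturates earlier than the positive half. The limits and
    gain are illustrative values only."""
    return max(-neg_limit, min(pos_limit, gain * x))

def harmonic_level(k, n=4096):
    # k-th harmonic magnitude of asymmetric_clip(sin t), by correlation.
    re = im = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        y = asymmetric_clip(math.sin(t))
        re += y * math.cos(k * t)
        im += y * math.sin(k * t)
    return 2.0 * math.hypot(re, im) / n

# Asymmetric clipping introduces even harmonics; the 2nd harmonic is
# one octave above the fundamental.
for k in (1, 2, 3, 4):
    print(f"harmonic {k}: {harmonic_level(k):.4f}")
```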
B. Recurrent Neural Networks
To be able to emulate dynamic behavior, a neural network
must have state. State is implemented in digital form via
feedback and delay, resulting in a network designated as
recurrent. A fully connected recurrent network, i.e., one with
connections from all neurons to all neurons, is computationally
powerful, but difficult to train and may have copious local
minima [15]. One particular type of recurrent neural network
with limited connectivity, the NARX network, compares well
in its computational abilities to a fully connected network and
is trainable [13]. Back and Tsoi [16] treated the NARX
network as a nonlinear IIR filter and demonstrated its use for
learning time series. Essentially, the NARX network is a
multilayer, feedforward network with a tapped delay line of
input units and output units serving as inputs. In functional
form,

    y_k = f(y_{k-1}, y_{k-2}, ..., y_{k-m}, x_k, x_{k-1}, ..., x_{k-n}),    (1)

where y_k is the output, x_k is the input, y_{k-i} are delayed outputs,
x_{k-i} are delayed inputs, and f is a nonlinear function composed
of a multilayer, feedforward neural network. The indices m and
n are the numbers of delayed versions of the outputs and
inputs, respectively.
Fig. 2. Harmonic content of soft vs. hard clipping.
The structure of the NARX network used in this work is
essentially a two-level, feedforward network with a variable
number of sigmoidal neurons in the hidden layer and a linear
neuron in the output layer. The input and variable numbers of
delayed inputs, and delayed outputs feed all hidden-layer
neurons. The structure is displayed in Fig. 3.
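The data flow of this structure can be sketched as follows. This is our free-running (parallel-mode) illustration with random placeholder weights, not the authors' C# implementation, and training (backpropagation/RPROP) is omitted.

```python
import math
import random

class NARXNet:
    """Sketch of the NARX structure described above: one layer of
    sigmoidal (here tanh) hidden neurons and one linear output neuron,
    fed by the current input plus tapped delay lines of past inputs and
    past outputs. Weights are random placeholders for illustration."""

    def __init__(self, n_in_delays=10, n_out_delays=10, n_hidden=50, seed=0):
        rng = random.Random(seed)
        self.n, self.m = n_in_delays, n_out_delays
        width = 1 + n_in_delays + n_out_delays      # x_k, x_{k-1..k-n}, y_{k-1..k-m}
        self.w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(width + 1)]
                         for _ in range(n_hidden)]  # +1 for the bias weight
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    def run(self, signal):
        x_taps = [0.0] * (self.n + 1)               # current sample + n delays
        y_taps = [0.0] * self.m                     # m delayed outputs, fed back
        out = []
        for sample in signal:
            x_taps = [sample] + x_taps[:-1]
            feats = x_taps + y_taps
            hidden = [math.tanh(w[0] + sum(wi * fi for wi, fi in zip(w[1:], feats)))
                      for w in self.w_hidden]
            y = self.w_out[0] + sum(wi * hi for wi, hi in zip(self.w_out[1:], hidden))
            y_taps = [y] + y_taps[:-1]
            out.append(y)
        return out
```

Under the (assumed) reading that a label such as "50-10-10" in the figures means input delays, output delays, and hidden units, `NARXNet(n_in_delays=50, n_out_delays=10, n_hidden=10)` would mirror that configuration.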
III. MODEL CONSTRUCTION AND TRAINING
A. Training Data Acquisition System
Since the objective is to model a vacuum-tube amplifier,
training and testing data have to be generated for input to the
NARX network training system. The training data consist of
non-distorted or “raw” input signals synchronized to the
resulting output signals of the vacuum-tube amplifier. The
“input” and “target” signals are digitized and stored in files for
application to the NARX network. The set-up, displayed in Fig.
4, consists of a signal source (function generator or guitar)
applied directly to a digitizer and to the vacuum-tube amplifier.
The output of the amplifier is connected to a second channel of
the digitizer and the resulting digital signals are acquired and
stored in files on a computer. To drive a tube amplifier into
saturation, the gain is often “turned to eleven,” i.e., adjusted to
the maximum value which can produce an audio signal of a
substantial volume level. To prevent hearing damage, a passive
speaker load simulator was built and a monitor amplifier was
added to get aural feedback at a reasonable level.
The specifications on the data acquisition set-up are as
follows: The signal is provided by either a function generator at
frequencies in the range of 100 Hz to 500 Hz or an electric
guitar. The vacuum-tube amplifier is a 4 W Vox AC4TV model
with its tone control set to maximum bandwidth and its volume
set at various levels. The passive load is a resistor, inductor,
capacitor circuit model of a speaker with a nominal impedance
of 16 Ω. Audio signals are monitored using a 10 W Fender
Frontman. To digitize the signals, a Roland Quad Capture USB
2.0 audio interface is used with word size of 24 bits and sample
rate of 96 kSa/s. Signals are captured and recorded using the
recording software Audacity and are
converted to 24-bit, 96 kSa/s stereo WAV files with the raw
signal on one stereo channel and the tube-amp signal on the other.
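Reading such a file back into an input/target pair is straightforward with the standard library. The function name and the left=raw/right=amp channel assignment below are assumptions for illustration; only the file format (24-bit stereo WAV) comes from the paper.

```python
import wave

def load_training_pair(path):
    """Split one of the recorded stereo WAV files (24-bit, 96 kSa/s; raw
    guitar on one channel, tube-amp output on the other) into two lists
    of floats in [-1, 1)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 3  # stereo, 24-bit
        frames = w.readframes(w.getnframes())
    raw, amp = [], []
    for i in range(0, len(frames), 6):              # 6 bytes per stereo frame
        raw.append(int.from_bytes(frames[i:i + 3], "little", signed=True) / 2 ** 23)
        amp.append(int.from_bytes(frames[i + 3:i + 6], "little", signed=True) / 2 ** 23)
    return raw, amp
```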
B. NARX Training Software
A program was written in C# to train, test, and implement
NARX networks of various sizes. The program is specific to
networks composed of varying numbers of input and output
delays, and hidden units with one linear neuron in the output
layer executing a weighted sum of the hidden units. Standard
backpropagation with momentum and decaying learning rate
parameters was initially used to train the resulting feedforward
network. This method, however, suffered from extremely slow
learning. A faster method, RPROP [17], is now being used in
place of standard backpropagation. Networks are trained in
batch mode due to the sequential nature of the signals and the
requirements of RPROP. Since they constitute part of the input
to the hidden units, delayed outputs can either be those
computed by the network or the target outputs. The network
training software uses the latter to help accelerate learning.
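The two ways of filling the output delay taps can be sketched as a driver around an abstract feedforward core `f`. The helper below is our illustration, not the authors' C# code; `f` stands in for the trained network.

```python
def narx_sequence(f, x, targets=None, m=2, n=2):
    """Run a NARX core f over an input sequence x.

    targets given  -> teacher-forced (series-parallel) mode: the output
                      delay taps are filled from the measured amplifier
                      signal, as the training software does to speed up
                      learning.
    targets absent -> free-running (parallel) mode, used for testing: the
                      network's own past outputs are fed back, so any
                      error propagates in time.
    f takes (past_x, past_y) and returns one output sample."""
    y = []
    src = targets if targets is not None else y
    for k in range(len(x)):
        past_y = [src[k - i] if k - i >= 0 else 0.0 for i in range(1, m + 1)]
        past_x = [x[k - i] if k - i >= 0 else 0.0 for i in range(0, n + 1)]
        y.append(f(past_x, past_y))
    return y
```

With a core that leans heavily on its own feedback, the two modes diverge quickly, which is exactly the train/test discrepancy discussed in the results below.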
To execute a trial, the network structure is configured and
training/testing files are submitted. Once training begins, the
number of iterations, the RMS error and the maximum error are
displayed. Network details and errors are saved periodically,
allowing for stopping and restarting training and for error trend
analysis. Testing signals can be loaded and evaluated for error
and audio files can be generated to be used for listening tests.
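The RPROP scheme adopted for training can be sketched per weight as follows. This is a simplified variant without weight backtracking (close to iRprop-), with the usual default parameters, and is not the authors' implementation.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One batch update of RPROP: only the sign of each gradient is used,
    and every weight keeps its own adaptive step size. Returns updated
    (weights, gradients-to-remember, step sizes)."""
    new_w, new_g, new_s = [], [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:                      # same sign: speed up
            s = min(s * eta_plus, step_max)
            w -= s if g > 0 else -s
        elif g * pg < 0:                    # sign flipped: overshoot, back off
            s = max(s * eta_minus, step_min)
            g = 0.0                         # skip this update
        elif g != 0:                        # fresh direction: plain step
            w -= s if g > 0 else -s
        new_w.append(w)
        new_g.append(g)
        new_s.append(s)
    return new_w, new_g, new_s
```

In batch mode the gradients would be accumulated over the whole training sequence before each call, consistent with the batch requirement noted above.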
IV. RESULTS
To test the feasibility of modeling a tube amplifier with the
NARX network, a set of training/testing signals were produced
as described in the previous section. The signals were
submitted to the NARX network program at various numbers
of input delays, output delays, and hidden units. With the latest
version of the training software, all configurations produced
similar results. As with many gradient descent learning
problems, the error tends to decrease in a hyperbolic fashion
with occasional abrupt steps; once the “knee” is crossed,
learning tends to progress very slowly.
The results of five configurations trained on a slightly
distorted guitar signal and an overdriven guitar signal at
100,000 iterations are shown in Table 1. The % RMS errors are
seen to be low; however, this is misleading. The % RMS errors
measured during training are based on using the training data to
fill the output delay units, rather than the output from the
network. Testing is performed using the network outputs only,
propagating error in time and resulting in significantly worse
performance. This is shown in the table by the errors in
parentheses. The network models of the slightly distorted
Fig. 3. NARX structure.
Fig. 4. Training and testing data acquisition set-up (signal chain: amp, load, digitizer, computer).
guitar signal performed better than those of the overdriven
guitar. This can be explained by noting that the operating point
of the amplifier for the slightly distorted guitar is such that the
amplifier is not being driven far into the saturation region; i.e.,
the amplifier is close to linear and therefore the NARX net is
only having to learn to be close to linear. Figs. 5 and 6 show
snippets of waveforms for the best cases of the slightly
distorted and overdriven guitar models, respectively.
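The % RMS figures above can be reproduced with a metric along these lines. Normalizing by the target's RMS level is our assumption; the paper does not state the exact normalization behind its % RMS numbers.

```python
import math

def pct_rms_error(target, output):
    """Percent RMS error between the tube-amp target and the model
    output, normalized by the target's RMS level (an assumed reading
    of the paper's '% RMS error')."""
    n = len(target)
    err = math.sqrt(sum((t - o) ** 2 for t, o in zip(target, output)) / n)
    ref = math.sqrt(sum(t ** 2 for t in target) / n)
    return 100.0 * err / ref
```

Under this reading, values above 100% (as in the overdriven test column of Table 1) simply mean the free-running error signal carries more energy than the target itself.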
The poorer than expected performance of the NARX
network can be attributed to a number of factors. The first is
insufficient training. Further training may improve the results,
but the improvements may be marginal as indicated by the
hyperbolic nature of the error trend. A second factor is the
combinatorial explosion of parameters, i.e., the numbers of input
delays, output delays, and hidden units. Larger numbers of
hidden units and delays may significantly improve
performance, but will take much longer to train and ultimately
make implementation difficult. And, last, when using gradient
descent for learning (of which backpropagation and RPROP
are examples), there is always the danger of getting trapped in
local minima.
V. CONCLUSIONS AND FUTURE WORK
The results of preliminary experiments have demonstrated
the feasibility of modeling vacuum-tube amplifiers with
recurrent neural networks. However, they also indicate a need
for much more investigation. The easiest, but not necessarily
productive, next phases are to extend the number of training
steps and increase the numbers of delays and hidden units.
There is an indication that this will improve the results as the
largest networks in the experiments performed best in the test
mode. Training rates will have to be increased substantially as
larger networks take more time per iteration. This might be
achieved by parallelizing the process and further tailoring the
training algorithm to the structure of the NARX network.
Other possible lines of research to follow are to add another
layer of hidden units to the network or to use a different
feedforward model such as a radial basis function network.
Since a vacuum-tube amplifier is generally a collection of
subsystems, partitioning the amplifier into linear and nonlinear
sections could possibly aid the learning process by
decomposing or modularizing the neural network.
REFERENCES
[1] “The Invention of the Electric Guitar.” Internet:
invention.htm [December 1, 2012].
[2] Eric Barbour, “The cool sound of tubes.” IEEE Spectrum, August 1998,
pp. 24-35.
[3] Dave Hunter, “Ibanez,” Guitar Effects Pedals, the Practical Handbook.
San Francisco, California: Backbeat Books, 2004, pp. 68-71.
[4] S. Tokodoro, “Signal amplifier circuit using a field-effect transistor
having current unsaturated triode vacuum tube characteristics.” U.S.
Patent 4 000 474, December 28, 1976.
[5] Eric Pritchard, “Semiconductor emulation of tube amplifiers.” U.S.
Patent 4 994 084, February 19, 1991.
[6] Michel Doidic et al., “Tube modeling programmable digital guitar
amplification system.” U.S. Patent 5 789 689, August 4, 1998.
Table 1. RMS error for NARX configurations after 100,000 iterations.
Training errors are listed with testing errors (network outputs fed back) in parentheses.

Slightly Distorted Guitar Signal:
  0.561 % (43.6 %)
  0.393 % (44.3 %)
  0.388 % (35.6 %)
  0.314 % (40.8 %)
  0.788 % (64.2 %)
  1.35 %  (23.7 %)

Overdriven Guitar Signal:
  0.530 % (221 %)
  0.463 % (268 %)
  0.515 % (154 %)
  0.449 % (155 %)
  0.464 % (100 %)
  0.521 % (110 %)
Fig. 5. 50-10-10 network; blue: raw guitar, green: slightly distorted guitar,
and red: output of NARX network.
Fig. 6. 50-10-10 network; blue: raw guitar, green: overdriven guitar, and
red: output of NARX network.
[7] Jyri Pakarinen and David Yeh, “A review of digital techniques for
modeling vacuum-tube guitar amplifiers.” Computer Music Journal,
vol. 33, no. 2, pp. 85-100, Summer 2009.
[8] David Yeh, “Automated physical modeling of nonlinear audio circuits
for real-time audio effects, part II: BJT and vacuum tube examples.”
IEEE Transactions on Audio, Speech, and Language Processing, vol.
20, no. 4, May 2012, pp. 1207-1216.
[9] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward
networks are universal approximators." Neural Networks, vol. 2, 1989,
pp. 359-366.
[10] D. Mendoza, “Emulating electric guitar effects with neural networks.”
Masters Thesis, Universitat Pompeu Fabra, Barcelona, 2005.
[11] Scott DeBoer and Kenneth Stanley, “Systems and methods for inducing
effects in a signal.” U.S. Patent US 2009/0022331, July 16, 2007.
[12] I. Leontaritis and S. Billings, “Input-output parametric models for
nonlinear systems, Part I: Deterministic nonlinear systems.” International
Journal of Control, vol. 41, 1985, pp. 303-328.
[13] H. Siegelmann, B. Horne, and C. L. Giles, “Computational capabilities of
recurrent NARX neural networks.” IEEE Transactions on Systems, Man,
and Cybernetics, Part B: Cybernetics, vol. 27, no. 2, 1997, pp. 208-
[14] Yon Visell, “Meeting 18: Temporal Processing in Neural Networks.”
Internet: [July
[15] M. Bianchini, M. Gori, and M. Maggini, “On the problem of local
minima in recurrent neural networks.” IEEE Transactions on Neural
Networks, vol. 5, no. 2, March 1994, pp. 167-177.
[16] A. Back and A. Tsoi, “FIR and IIR synapses, a new neural network
architecture for time series modelling.” Neural Computation, vol. 3,
issue 3, Fall 1991, pp. 375-385.
[17] M. Riedmiller, “A direct adaptive method for faster backpropagation
learning: the RPROP algorithm,” Conference proceedings of IEEE
International Conference on Neural Networks 1993, vol. 1, 1993,
... The first article presenting the use of recurrent neural networks for vacuum tube amplifier modelling dates back to 2013 where a Nonlinear AutoRegressive eXogenous (NARX) network was applied to the task in [30]. A NARX network is similar to a traditional RNN (i.e., a fully-connected network incorporating recurrence) but with limited connectivity to remedy the training problems usually present in RNN which are linked to vanishing and exploding gradients due to their recursive nature. ...
... The training data used for the NARX network of [30] was comprised of both signals from a function generator (with frequencies in the range [100 Hz, 500 Hz]) and an electric guitar fed to a vacuum tube amplifier, a 4W Vox AC4TV. All training data was recorded at a sampling frequency of 96 kHz and saved to 24-bit stereo wav files with one channel containing the raw input signal and the other containing the tube amplifier signal. ...
... It was used in one of the first articles presenting recurrent networks for amplifier simulation [31] as well as in all of the gray-box methods presented previously, including a normalized variant used in [41] in order to stabilize the initial training of the network. [30]: 14) and the Normalized Root Mean Squared Error (NRMSE) used in [35]: ...
Full-text available
Vacuum tube amplifiers present sonic characteristics frequently coveted by musicians, that are often due to the distinct nonlinearities of their circuits, and accurately modelling such effects can be a challenging task. A recent rise in machine learning methods has lead to the ubiquity of neural networks in all fields of study including virtual analog modelling. This has lead to the appearance of a variety of architectures tailored to this task. This article aims to provide an overview of the current state of the research in neural emulation of analog distortion circuits by first presenting preceding methods in the field and then focusing on a complete review of the deep learning landscape that has appeared in recent years, detailing each subclass of available architectures. This is done in order to bring to light future possible avenues of work in this field.
... Thus far, applications of neural networks have focused mostly on modeling vacuum-tube amplifiers [15]- [18] and distortion circuits [7], [8], [10], [19], with some demonstrating the ability to run in real-time on CPU. In contrast, dynamic range compressors [20] pose a greater challenging in the modeling task due to their time-dependant nonlinearities, and have so far seen less attention. ...
... While there have been a number of works that apply neural networks for the audio effect modeling task, many of these works consider only a single configuration of the device, i.e. they optimize g without conditioning at only a single value of φ [15], [17], [24]. Other approaches consider multiple parameterizations, but during training use only a subset of input signal types x, e.g. ...
... These can be divided into three categories, recurrent neural networks (RNNs), and their variants (LSTM, GRU, vanilla RNN), temporal convolutional networks (TCNs), also known as the feedfoward WaveNet, and architectures that combine both elements. Simple RNNs have been shown to be effective in modeling nonlinear effects like those produced from vacuum-tube amplifiers and guitar distortion effects, often within perceptual tolerances [7], [10], [15]- [17]. These formulations process the signal on a sample-by-sample [7] The input at each time step is a vector of 3 elements: the current input sample, along with the conditioning parameters for the limit and peak reduction controls. ...
Full-text available
Deep learning approaches have demonstrated success in the task of modeling analog audio effects such as distortion and overdrive. Nevertheless, challenges remain in modeling more complex effects, such as dynamic range compressors, along with their variable parameters. Previous methods are computationally complex, and noncausal, prohibiting real-time operation, which is critical for use in audio production contexts. They additionally utilize large training datasets, which are time-intensive to generate. In this work, we demonstrate that shallower temporal convolutional networks (TCNs) that exploit very large dilation factors for significant receptive field can achieve state-of-the-art performance, while remaining efficient. Not only are these models found to be perceptually similar to the original effect, they achieve a 4x speedup, enabling real-time operation on CPU, and can be trained using only 1% of the data from previous methods.
... A nonlinear system cannot be modeled using classic frequency response analysis, since this assumes linear and time-invariant systems; for this reason, different mathematical methods have been proposed for addressing this task over time [14]. More recently, deep learning [5] methods have been explored, showing increasingly good results [2][3][4][8][9][10]16]. This paper is organized as follows: Sect. 2 explores two techniques for audio distortion modeling, both based on convolution. ...
... For this reason, only models that take raw audio as input were considered in this work. Recurrent neural networks (RNNs) are a common way to approach the task of generating data with a definite temporal structure, and many works on audio effects have been using this technique [2,15,16]; an extensive evaluation of these methods is behind the scope of this paper. Moreover, WaveNet demonstrated that it is possible to achieve significant results by using only convolutional layers [12]. ...
Most music production nowadays is carried out using software tools: for this reason, the market demands faithful audio effect simulations. Traditional methods for modeling nonlinear systems are effect-specific or labor-intensive; however, recent works yielded promising results by black-box simulation of these effects using neural networks. This work aims to explore two models of distortion effects based on autoencoders: one makes use of fully-connected layers only, and the other employs convolutional layers. Both models were trained using clean sounds as input and distorted sounds as target, thus, the learning method was not self-supervised, as it is mostly the case when dealing with autoencoders. The networks were then tested with visual inspection of the output spectrograms, as well as with an informal listening test, and performed well in reconstructing the distorted signal spectra, however a fair amount of noise was also introduced.
... -Neural Networks: In 2013, [Covert and Livingston, 2013] introduce a simple feedforward neural network model to emulate a tube amplifier. Despite the inaccurate results (100% of root mean square error), it can be considered as a first attempt to emulate tube amplifiers by neural networks. ...
... Only few researches have used neural networks for the emulation of tube amplifiers [Covert and Livingston, 2013;Embrechts, 2018c,b, 2019;Damskägg et al., 2019], even if similar models have been widely applied in the machine learning field for other types of tasks. To understand their capabilities and their limitations, it is important to trace some fundamental contributions in the machine learning field (see Appendix A.2). Artificial Neural Networks (ANN) (in opposition with biological neural networks) often simply called neural networks are the core of deep learning. ...
Full-text available
Nonlinear systems identification and modeling is a central topic in many engineering areas since most real world devices may exhibit a nonlinear behavior. This thesis is devoted to the emulation of the nonlinear devices present in a guitar signal chain. The emulation aims to replace the hardware elements of the guitar signal chain in order to reduce its cost, its size, its weight and to increase its versatility. The challenge consists in enabling an accurate nonlinear emulation of the guitar signal chain while keeping the execution time of the model under the real time constraint. To do so, we have developed two methods. The first method developed in this thesis is based on a subclass of the Volterra series where only static nonlinearities are considered: the polynomial parallel cascade of Hammerstein models. The resulting method is called the Hammerstein Kernels Identification by Sine Sweep method (HKISS). According to the tests carried out in this thesis and to the results obtained, the method enables an accurate emulation of nonlinear audio devices unless if the system to model is too far from an ideal Hammerstein one. The second method, based on neural networks, better generalizes to guitar signals and is well adapted to the emulation of guitar signal chain (e.g., tube and transistor amplifiers). We developed and compared eight models using different performance indexes including listening tests. The accuracy obtained depends on the tested audio device and on the selected model but we have shown that the probability for a listener to be able to hear a difference between the target and the prediction could be less than 1%. This method could still be improved by training the neural networks with an objective function that better corresponds to the objective of this audio application, i.e., minimizing the audible difference between the target and the prediction. 
Finally, it is shown that these two methods enable an accurate emulation of a guitar signal chain while keeping a fast execution time which is required for real-time audio applications.
... This motivates black-box models that enable emulations using only measurements from the device. Recently, deep learning approaches have seen success in modelling a range of effects [11][12][13][14]. These approaches often take the form of either recurrent or convolutional networks operating in the time domain [15][16][17][18]. ...
Deep learning approaches for black-box modelling of audio effects have shown promise; however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression. To address this, we propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones, an approach that enables learnable adaptation of the intermediate activations. We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
... The use of neural networks and deep learning has recently become popular in audio processing research [1][2][3][4][5]. The first neural models for audio effects processing imitated time-invariant linear and nonlinear filtering [6][7][8]. Deep neural models are the newest phase in the black-box modeling of audio devices, which had previously relied on nonlinear system identification methods, such as Volterra series [9][10][11][12] or the Wiener-Hammerstein model [13]. The present paper focuses on neural modeling of time-variant effects, which requires a different approach and poses a challenge during training, as the target behavior varies over time. ...
This article further explores a previously proposed gray-box neural network approach to modeling LFO (low-frequency oscillator) modulated time-varying audio effects. The network inputs are both the unprocessed audio and LFO signal. This allows the LFO to be freely controlled after model training. This paper introduces an improved process for accurately measuring the frequency response of a time-varying system over time, which is used to annotate the neural network training data with the LFO of the effect being modeled. Accuracy is improved by using a frequency domain synthesized chirp signal and using shorter and more closely spaced chirps. A digital flanger effect is used to test the accuracy of the method and neural network models of two guitar effects pedals, a phaser and flanger, were created. The improvement in the system measurement method is reflected in the accuracy of the resulting models, which significantly outperform previously reported results. When modeling a phaser and flanger pedal, error-to-signal ratios of 0.2% and 0.3% were achieved, respectively. Previous work suggests errors of this size are often inaudible. The model architecture can run in real time on a modern computer while using relatively little processing power.
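As a rough illustration of the gray-box idea of feeding the LFO in alongside the audio, the toy effect below is a hand-written flanger standing in for the trained network; the delay range, depth, and interpolation scheme are assumptions, not the article's model:

```python
import numpy as np

def flanger(x, lfo, max_delay=32, depth=0.7):
    """Delay line modulated by an externally supplied LFO in [0, 1].
    Because the LFO is an input rather than baked in, it can be changed
    freely after the fact, mirroring the gray-box training setup."""
    y = np.empty_like(x)
    for n in range(len(x)):
        d = lfo[n] * (max_delay - 1)      # fractional delay in samples
        i, frac = int(d), d - int(d)
        a = x[n - i] if n - i >= 0 else 0.0
        b = x[n - i - 1] if n - i - 1 >= 0 else 0.0
        # Linear interpolation between the two neighbouring samples.
        y[n] = x[n] + depth * ((1 - frac) * a + frac * b)
    return y

n = np.arange(2048)
x = np.sin(2 * np.pi * 220 * n / 44100)
lfo = 0.5 * (1 + np.sin(2 * np.pi * 0.5 * n / 44100))  # 0.5 Hz LFO
y = flanger(x, lfo)
```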
... Various deep learning approaches have already been proposed for the task of modeling audio effects [12][13][14][15][16][17]. While previous approaches have focused on training a single model for each effect, we believe our work is the first to consider building a model that emulates a series connection of effects and their parameters, jointly. ...
Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weight sharing, as well as with a sum/difference stereo loss function. The proposed model can be trained with a limited number of examples, is permutation invariant with respect to the input ordering, and places no limit on the number of input sources. Furthermore, it produces human-readable mixing parameters, allowing users to manually adjust or refine the generated mix. Results from a perceptual evaluation involving audio engineers indicate that our approach generates mixes that outperform baseline approaches. To the best of our knowledge, this work demonstrates the first approach in learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters.
... Early works with neural networks included the use of recurrent neural networks (RNNs), which use their internal memory to compute the current output. However, the results of the first studies were unsatisfactory and exhibited relatively high error values [131,132]. ...
Digital systems are gaining more and more popularity in today's music industry. Musicians and producers use digital systems because of their advantages over analog electronics. They require less physical space, are cheaper to produce, and are not prone to aging circuit components or temperature variations. Furthermore, they always produce the same output signal for a given input sequence. However, musicians like vintage equipment. Old guitar amplifiers and legendary recording equipment are sold at very high prices. Therefore, it is desirable to create digital models of analog music electronics which can be used in modern digital environments. This work presents an approach for recreating nonlinear audio circuits using system identification techniques. Measurements of the input and output signals of the analog reference device are used to adjust a digital model, treating the reference device as a 'black box'. With this technique the schematic of the reference device does not need to be known, and no circuit elements have to be measured to recreate the analog device. An appropriate block-based model is chosen, depending on the type of reference system. Then the parameters of the digital model are adjusted with an optimization method according to the measured input and output signals. The performance of the optimized digital model is evaluated with objective scores and listening tests. Two types of nonlinear reference systems are examined in this work. The first type comprises dynamic range compressors such as the 'MXR Dynacomp', the 'Aguilar TLC', and the 'UREI 1176LN'. A block-based model describing a generic dynamic range compression system is chosen and an automated routine is developed to adjust it. The adapted digital models are evaluated with objective scores, and a listening test is performed for the UREI 1176LN studio compressor. The second type comprises distortion systems such as amplifiers for electric guitars.
This work presents novel modeling approaches for different kinds of distortion systems, from basic distortion circuits found in distortion pedals for guitars to (vintage) guitar amplifiers like the 'Marshall JCM900' or the 'Fender Bassman'. The linear blocks of the digital model are measured and used directly in the model, while the nonlinear blocks are adapted with parameter optimization methods such as the Levenberg-Marquardt method. The quality of the adjusted models is evaluated with objective scores and listening tests. The adjusted digital models give convincing results and can be implemented as real-time digital versions of their analog counterparts. This enables the musician to save a snapshot of a certain sound and recall it anytime with a digital system such as a VST plug-in or a program on dedicated hardware.
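The generic block-based compressor structure described above can be sketched as an envelope detector feeding a static gain computer; all parameter names and values here are assumptions for illustration, not the thesis's adapted models:

```python
import numpy as np

def compress(x, threshold=0.5, ratio=4.0, attack=0.01, release=0.1, fs=44100):
    """Feed-forward dynamic range compressor sketch:
    envelope follower -> static gain computer -> per-sample gain."""
    a_att = np.exp(-1.0 / (attack * fs))    # attack smoothing coefficient
    a_rel = np.exp(-1.0 / (release * fs))   # release smoothing coefficient
    env, y = 0.0, np.empty_like(x)
    for n, s in enumerate(np.abs(x)):
        coeff = a_att if s > env else a_rel
        env = coeff * env + (1 - coeff) * s  # envelope follower
        if env > threshold:                  # static gain computer
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        y[n] = x[n] * gain
    return y

# Quiet passage passes unchanged; the loud passage is attenuated.
x = np.concatenate([0.2 * np.ones(2000), 0.9 * np.ones(2000)])
y = compress(x)
```

In the block-based identification setting, the free parameters (threshold, ratio, attack, release) of such a structure would be fitted to the measured reference device rather than chosen by hand.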
This work proposes the use of artificial neural networks in the task of emulating a tube preamplifier for electric guitar. Excitation signals are studied based on system identification theory. To overcome the vanishing gradient problem of simple recurrent neural networks trained with the back-propagation through time algorithm, two solutions are proposed: changing the training technique to the extended Kalman filter, and using another network topology, the long short-term memory network. A test is developed concerning the structure definition of the networks, which is applied to one of the recurrent neural networks. A tube preamplifier is designed and simulated with circuit simulation software using the triode vacuum-tube models available in the literature, and a prototype is built. Based on the data from the prototype and the simulations, the preamplifier is identified by the three networks and the results are compared quantitatively and qualitatively, showing that the long short-term memory network obtains better results than the simpler recurrent neural networks, and that using the extended Kalman filter as the training algorithm does not reduce the error but only increases the training speed.
A system for inducing an effect in a raw audio signal comprises a computing device for receiving a first audio signal and a second audio signal from a signal source, and the second audio signal comprises the first audio signal induced with an effect. The system further comprises logic that parameterizes the effect in the second audio signal into an artificial neural network (ANN).
This is the second part of a two-part paper that presents a procedural approach to derive nonlinear filters from schematics of audio circuits for the purpose of digitally emulating musical effects circuits in real-time. This work presents the results of applying this physics-based technique to two audio preamplifier circuits. The approach extends a thread of research that uses variable transformation and offline solution of the global nonlinear system. The solution is approximated with multidimensional linear interpolation during runtime to avoid uncertainties in convergence. The methods are evaluated here experimentally against a reference SPICE circuit simulation. The circuits studied here are the bipolar junction transistor (BJT) common emitter amplifier, and the triode preamplifier. The results suggest the use of function approximation to represent the solved system nonlinearity of the K-method and invite future work along these lines.
Although solid-state technology overwhelmingly dominates today's world of electronics, vacuum tubes are holding out in small but vibrant areas. Here, the author describes how music applications are one of the last remaining domains dominated by vacuum tubes, and how the devices flourish and even innovate in this field.
Recursive input-output models for non-linear multivariate discrete-time systems are derived, and sufficient conditions for their existence are defined. The paper is divided into two parts. The first part introduces and defines concepts such as Nerode realization, multistructural forms and results from differential geometry which are then used to derive a recursive input-output model for multivariable deterministic non-linear systems. The second part introduces several examples, compares the derived model with other representations and extends the results to create prediction error or innovation input-output models for non-linear stochastic systems. These latter models are the generalization of the multivariable ARMAX models for linear systems and are referred to as NARMAX, or Non-linear AutoRegressive Moving Average models with eXogenous inputs.
A new neural network architecture involving local-feedforward global-feedforward and/or local-recurrent global-feedforward structures is proposed. A learning rule minimizing a mean square error criterion is derived. The performance of this algorithm on the local-recurrent global-feedforward architecture is compared with that of a local-feedforward global-feedforward architecture. It is shown that the local-recurrent global-feedforward model performs better than the local-feedforward global-feedforward model.
This paper rigorously establishes that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.
The guitar tube amplifier consists of a preamplifier, a tone-control circuit, a power amplifier, and a transformer that couples to the loudspeaker load, forming a circuit in which the operating point, in terms of current through the tube device, is usually set by a resistor connecting the cathode terminal to ground. One approach to modeling the linear systems in guitar circuits is black-box system identification, which views the system as an abstract linear system and determines coefficients replicating it. The other approach is a white-box approach that derives a discretized frequency-response transfer function for the system based upon knowledge of its linear, constant-coefficient differential equations. The filters employed are ad hoc nonlinear filters based upon the circuit signal path, analytical approaches, and nonlinear filters derived from solving circuit equations using numerical methods. The analytical methods employed for analyzing nonlinearity with memory are based upon Volterra series theory and can be used to implement nonlinear audio effects.
Recently, fully connected recurrent neural networks have been proven to be computationally rich: at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous inputs (NARX models), and are therefore called NARX networks. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by y(t) = Psi(u(t-n_u), ..., u(t-1), u(t), y(t-n_y), ..., y(t-1)), where u(t) and y(t) represent the input and output of the network at time t, n_u and n_y are the input and output orders, and the function Psi is the mapping performed by a multilayer perceptron. We constructively prove that NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks and thus Turing machines. We conclude that in theory one can use NARX models rather than conventional recurrent networks without any computational loss, even though their feedback is limited. Furthermore, these results raise the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent, and what restrictions on feedback limit computational power.
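A minimal sketch of the NARX structure defined above, with Psi realized as a one-hidden-layer perceptron; the tap counts, layer size, and untrained random weights are assumptions for illustration only:

```python
import numpy as np

class NARXNet:
    """y(t) = Psi(u(t-n_u), ..., u(t), y(t-n_y), ..., y(t-1)),
    where Psi is a one-hidden-layer MLP and the only feedback path
    is from the network's own output taps."""

    def __init__(self, n_u=4, n_y=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        n_in = (n_u + 1) + n_y            # tapped inputs plus fed-back outputs
        self.n_u, self.n_y = n_u, n_y
        self.W1 = rng.normal(0, 0.5, (n_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.5, hidden)
        self.b2 = 0.0

    def psi(self, taps):
        # One-hidden-layer MLP with tanh squashing units.
        return np.tanh(taps @ self.W1 + self.b1) @ self.W2 + self.b2

    def run(self, u):
        # Free-run the network: past outputs are its own predictions.
        y = np.zeros(len(u))
        for t in range(len(u)):
            u_taps = [u[t - k] if t - k >= 0 else 0.0
                      for k in range(self.n_u + 1)]
            y_taps = [y[t - k] if t - k >= 0 else 0.0
                      for k in range(1, self.n_y + 1)]
            y[t] = self.psi(np.array(u_taps + y_taps))
        return y

net = NARXNet()
out = net.run(np.sin(2 * np.pi * np.arange(64) / 16))  # untrained: shape check
print(out.shape)  # (64,)
```

In practice the weights would be trained (e.g., by backpropagation through time) on input/output recordings of the device being modeled, as in the amplifier-modeling work this listing accompanies.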
Many researchers have recently focused their efforts on devising efficient algorithms, mainly based on optimization schemes, for learning the weights of recurrent neural networks. As in the case of feedforward networks, however, these learning algorithms may get stuck in local minima during gradient descent, thus discovering sub-optimal solutions. This paper analyses the problem of optimal learning in recurrent networks by proposing conditions that guarantee local minima free error surfaces. An example is given that also shows the constructive role of the proposed theory in designing networks suitable for solving a given task. Moreover, a formal relationship between recurrent and static feedforward networks is established such that the examples of local minima for feedforward networks already known in the literature can be associated with analogous ones in recurrent networks.