A Vacuum-Tube Guitar Amplifier Model Using a
Recurrent Neural Network
John Covert and David L. Livingston
Department of Electrical and Computer Engineering
Virginia Military Institute
Lexington, Virginia, USA
livingstondl@vmi.edu
Abstract—Rock and blues guitar players prefer the use of
vacuum-tube amplifiers due to the harmonic structures
developed when the amplifiers are overdriven. The disadvantages
of vacuum tubes compared with solid-state implementations, such
as power consumption, reliability, and cost, are far outweighed
by the desirable sound characteristics of the overdriven
vacuum-tube amplifier. There are many approaches
to modeling vacuum-tube amplifier behaviors in solid-state
implementations. These include a variety of both analog and
digital techniques, some of which are judged to be good
approximations to the tube sound.
In this paper we present early results of experiments in using a
neural network to model the distortion produced by an
overdriven vacuum-tube amplifier. Our approach is to use
artificial neural networks of the recurrent variety, specifically a
Nonlinear AutoRegressive eXogenous (NARX) network, to
capture the nonlinear, dynamic characteristics of vacuum-tube
amplifiers. NARX networks of various sizes have been trained on
data sets consisting of samples of both sinusoidal and raw electric
guitar signals and the amplified output of those signals applied to
a tube-based amplifier driven at various levels of saturation.
Models are evaluated using both quantitative (e.g., RMS error)
and qualitative (listening tests) assessment methods on data sets
that were not used in the network training. Listening tests,
which we consider the most important evaluation method, indicate
at this point in the work the potential for success in modeling
a vacuum-tube amplifier with a recurrent neural network.
Keywords—vacuum-tube amplifiers; recurrent neural networks
I. INTRODUCTION
Since the invention of electronic audio amplification, much
effort has been directed toward designing electronic systems that
faithfully amplify audio signals for the purposes of conveying
information or providing entertainment. Biasing techniques and
the use of negative feedback were developed for improving the
linear response of electronic amplifiers in an effort to minimize
harmonic and intermodulation distortion. But an interesting
thing happened along the path of audio development: guitars
were electrified. To be used, and therefore heard, in big bands
with many instruments, the guitar's sound had to be amplified [1].
Electric guitars went on to become the musical instruments of
choice for various musical styles such as country, blues, and
particularly rock and roll. In the early periods of these
musical styles, the state of the art of electronic amplification
was the vacuum-tube amplifier. It was soon discovered,
particularly by early blues and rock musicians, that when
overdriven, vacuum-tube amplifiers produced harmonic
distortion which was musically pleasing [2]. Rock musicians
routinely used their amplifiers in an overdriven mode and even
modified the circuits to force the amps into saturation to obtain
the desired harmonic distortion.
With the invention of the transistor, the technology of
choice for the implementation of audio amplifiers shifted from
vacuum tubes to solid-state devices. Solid-state amplifiers have
a significant number of advantages over vacuum-tube
amplifiers including lower cost, energy efficiency, smaller size,
lighter weight, reliability, etc. However, despite the
disadvantages of tube-based amplifiers, modern electric guitar
players still prefer them over solid-state amplifiers for one
reason: the distortion produced by overdriven vacuum-tube
amplifiers is significantly different from that produced by
overdriven solid-state amplifiers. Solid-state distortion is often
characterized as sounding harsher than tube-based distortion,
a difference attributed to the way the amplifier transitions
from the linear part of its characteristic to the saturation
region.
Amplifier and effects designers have devoted much time to
reproducing the vacuum-tube sound in solid-state form, an effort
reflected in the number of patents and resulting products that
attempt to do so. Analog approaches
range from fairly simple “stomp boxes” such as the Ibanez
Tube Screamer® [3] that uses an operational amplifier with
back-to-back limiting diodes, to FET-based analog circuit
emulations [4], to sophisticated electronic systems composed of
many stages of linear and nonlinear elements [5]. Guitar
amplifier manufacturers have incorporated digital signal
processing approaches that can be found in “modeling
amplifiers” [6].
The pursuit of the vacuum-tube model in solid-state
form—particularly as a DSP implementation—has become an
academic exercise with a corresponding increase in scholarly
papers addressing the topic. Pakarinen and Yeh [7] have
produced an extensive review of the work toward modeling
tube amplifiers. Yeh [8] has devised a technique for modeling a
tube amplifier by developing a nonlinear filter using state-space
methods for systems of nonlinear differential equations.

Sponsored by Virginia Military Institute Grants in Aid of Research.

Fig. 1. Soft clipping characteristic vs. hard clipping.
Since the distortion from vacuum-tube amplifiers is due to
the nonlinear characteristics of the vacuum tubes (and possibly
the saturation characteristics of the output transformer),
techniques for modeling nonlinear behaviors need to be the
focus for developing a successful model of the overall system.
One such technique is the use of multilayer feedforward
neural networks which have been shown to be capable of
approximating almost any function of interest—those that are
Borel measurable—given a sufficient number of hidden-layer
neurons [9].
Published research into the application of neural networks
to the problem of modeling guitar amplifiers and effects is
scant. Mendoza [10] used multilayer feedforward networks to
learn the static nonlinearities of an Ibanez Tube-Screamer®
applied to a guitar signal. DeBoer and Stanley [11] filed a
patent on a technique for evolving recurrent neural networks to
implement various guitar effects.
Our approach to the problem of modeling a vacuum-tube
guitar amplifier is the use of a recurrent neural network
implementation of the Nonlinear AutoRegressive eXogenous
(NARX) model [12][13]. The use of the NARX model for
exactly this application was suggested in lecture notes by
Visell [14]. We present preliminary, but promising, results of
our investigations into using a NARX network to model a
vacuum-tube amplifier.
The remainder of the paper is organized as follows: Section
II covers how harmonic distortion generated by vacuum-tube
amplifiers differs from distortion arising from solid-state
amplifier saturation. The properties of recurrent neural networks and
their use in modeling nonlinear behaviors are also discussed.
The details concerning the construction and training of neural
network models with guitar signals are examined in Section III.
The results of recent experiments are presented and examined
in Section IV and conclusions and future work are detailed in
Section V.
II. BACKGROUND
A. Harmonic Distortion
The characteristics of the harmonic distortion produced by
a vacuum-tube amplifier versus a solid-state amplifier are the
principal reason the tube amp is favored by guitarists. The
differences in the characteristics are primarily due to how the
amplifiers are driven into saturation. Solid-state amplifiers tend
to clip signals abruptly (hard clipping), whereas tube
amplifiers produce a gradual saturation (soft clipping),
as shown in Fig. 1. For symmetric clipping, odd harmonics are
produced. Hard clipping results in more energy in the higher
harmonics than soft clipping, leading to a sound which is often
described as harsh. Fig. 2 compares the spectrum of a soft-clipped
sinusoid (blue) with that of a hard-clipped sinusoid (red).
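To make the comparison concrete, the following sketch (ours, not from the paper) drives a sinusoid into a soft clipper and a hard clipper and reads off the harmonic levels. The tanh curve is assumed as the soft-clipping characteristic, since the paper does not specify one; the drive level and test frequency are likewise illustrative.

```python
# Compare harmonic spectra of soft (tanh) vs. hard clipping of a sinusoid.
# Assumptions: tanh stands in for tube-style soft clipping; drive = 4x.
import numpy as np

fs = 96000          # sample rate matching the paper's set-up (96 kSa/s)
f0 = 220.0          # test tone, within the paper's 100-500 Hz range
t = np.arange(fs) / fs
x = 4.0 * np.sin(2 * np.pi * f0 * t)   # overdriven sinusoid

soft = np.tanh(x)                      # gradual saturation (soft clipping)
hard = np.clip(x, -1.0, 1.0)           # abrupt saturation (hard clipping)

def harmonic_levels(y, n_harmonics=9):
    """Magnitudes at f0, 2*f0, ... from the FFT of a one-second frame,
    so FFT bin k corresponds to k Hz."""
    spectrum = np.abs(np.fft.rfft(y)) / len(y)
    return [spectrum[int(k * f0)] for k in range(1, n_harmonics + 1)]

for name, y in [("soft", soft), ("hard", hard)]:
    print(name, ["%.4f" % v for v in harmonic_levels(y)])
```

Both clippers here are symmetric, so the even-harmonic bins come out near zero, while the hard clipper retains noticeably more energy in the high odd harmonics, consistent with the harsher sound described above.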
Another contributor to the preferred sound is the occurrence of
even harmonics in the distorted signal. The first two even
harmonics lie one and two octaves above the fundamental frequency
and are musically pleasing. Even harmonics are produced when
clipping is asymmetric, which occurs when the bias point or Q-
point is not centered on the load-line between the
saturation/cutoff levels. In vacuum-tube amplifiers, the
introduction of even harmonics is a dynamic process.
Capacitances in the biasing circuits of vacuum-tube stages
contribute to Q-point sensitivity to changes in the signal
envelope; i.e., the Q-point moves on the load-line in response
to signal strength [7]. Thus, the strength of the even harmonic
content is a function of the signal envelope. It is this response
which dictates the need for a dynamic model rather than a
simple static model of the nonlinearity in the vacuum-tube
amplifier.
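As a small illustration of the asymmetry argument (again our sketch, with tanh standing in for the tube's soft saturation and the bias values chosen arbitrarily), shifting the operating point before the clipper makes even harmonics appear:

```python
# Asymmetric clipping via a shifted operating point produces even harmonics.
# Assumptions: tanh soft clipper; static bias offsets 0.0 and 0.5.
import numpy as np

fs, f0 = 96000, 220.0
t = np.arange(fs) / fs
x = 2.0 * np.sin(2 * np.pi * f0 * t)

for bias in (0.0, 0.5):                    # centered vs. shifted Q-point
    y = np.tanh(x + bias)
    y -= y.mean()                          # coupling capacitors block DC
    spectrum = np.abs(np.fft.rfft(y)) / len(y)
    h2, h4 = spectrum[int(2 * f0)], spectrum[int(4 * f0)]
    print(f"bias = {bias}: 2nd harmonic {h2:.4f}, 4th harmonic {h4:.4f}")
```

With zero bias the even harmonics vanish; with the offset they do not. A static offset only hints at the real behavior: in a tube stage the Q-point moves with the signal envelope, which is exactly why a dynamic model is needed.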
B. Recurrent Neural Networks
To be able to emulate dynamic behavior, a neural network
must have state. State is implemented in digital form via
feedback and delay, resulting in a network designated as
recurrent. A fully connected recurrent network, i.e., one with
connections from every neuron to every neuron, is computationally
powerful but difficult to train and may have copious local
minima [15]. One particular type of recurrent neural network
with limited connectivity, the NARX network, compares well
in its computational abilities to a fully connected network and
is trainable [13]. Back and Tsoi [16] treated the NARX
network as a nonlinear IIR filter and demonstrated its use for
learning time series. Essentially, the NARX network is a
multilayer, feedforward network whose inputs are tapped delay
lines of the input signal and of the fed-back output. In functional
form:
y_k = f(y_{k-1}, y_{k-2}, ..., y_{k-m}, x_k, x_{k-1}, ..., x_{k-n}),    (1)

where y_k is the output, x_k is the input, the y_{k-i} are delayed
outputs, the x_{k-i} are delayed inputs, and f is a nonlinear function
implemented by a multilayer, feedforward neural network. The indices m
and n are the numbers of delayed versions of the outputs and inputs,
respectively.

Fig. 2. Harmonic content of soft vs. hard clipping.
The structure of the NARX network used in this work is
essentially a two-layer, feedforward network with a variable
number of sigmoidal neurons in the hidden layer and a linear
neuron in the output layer. The input, a variable number of
delayed inputs, and the delayed outputs all feed every hidden-layer
neuron. The structure is displayed in Fig. 3.
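As a concrete reading of this structure, the sketch below is our Python approximation of Eq. (1) with one sigmoid hidden layer and a linear output neuron; the authors' implementation is in C# and is not reproduced here, and the class and parameter names are our own. The output delay line can be filled with the network's own outputs (free-running) or with target samples (teacher forcing, as the training software in Section III does).

```python
# Minimal NARX forward pass: tapped delays of input and fed-back output
# drive a sigmoid hidden layer and a single linear output neuron.
import numpy as np

class NarxNet:
    def __init__(self, n_in_delays, n_out_delays, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        n_inputs = (n_in_delays + 1) + n_out_delays  # x_k..x_{k-n}, y_{k-1}..y_{k-m}
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b_h = np.zeros(n_hidden)
        self.w_o = rng.normal(0.0, 0.1, n_hidden)    # linear output weights
        self.b_o = 0.0
        self.n_in_delays, self.n_out_delays = n_in_delays, n_out_delays

    def step(self, x_taps, y_taps):
        """One sample of Eq. (1): y_k = f(y_{k-1..k-m}, x_{k..k-n})."""
        z = np.concatenate([x_taps, y_taps])
        h = 1.0 / (1.0 + np.exp(-(self.W_h @ z + self.b_h)))  # sigmoid layer
        return float(self.w_o @ h + self.b_o)                 # linear neuron

    def run(self, x, teacher=None):
        """Free-running if teacher is None; otherwise teacher-forced
        (target samples fill the output delay line)."""
        n, m = self.n_in_delays, self.n_out_delays
        y = np.zeros(len(x))
        for k in range(len(x)):
            x_taps = np.array([x[k - i] if k - i >= 0 else 0.0
                               for i in range(n + 1)])
            src = teacher if teacher is not None else y
            y_taps = np.array([src[k - i] if k - i >= 0 else 0.0
                               for i in range(1, m + 1)])
            y[k] = self.step(x_taps, y_taps)
        return y
```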
III. EXPERIMENTAL SET-UP
A. Training Data Acquisition System
Since the objective is to model a vacuum-tube amplifier,
training and testing data have to be generated for input to the
NARX network training system. The training data consist of
non-distorted or “raw” input signals synchronized to the
resulting output signals of the vacuum-tube amplifier. The
“input” and “target” signals are digitized and stored in files for
application to the NARX network. The set-up, displayed in Fig.
4, consists of a signal source—function generator or guitar—
applied directly to a digitizer and to the vacuum-tube amplifier.
The output of the amplifier is connected to a second channel of
the digitizer and the resulting digital signals are acquired and
stored in files on a computer. To drive a tube amplifier into
saturation, the gain is often “turned to eleven,” i.e., adjusted
to its maximum value, which produces an audio signal at a
substantial volume level. To prevent hearing damage, a passive
speaker load simulator was built, and a monitor amplifier was
added to provide aural feedback at a reasonable level.
The specifications of the data acquisition set-up are as
follows: The signal is provided by either a function generator at
frequencies in the range of 100 Hz to 500 Hz or an electric
guitar. The vacuum-tube amplifier is a 4 W Vox AC4TV model
with its tone control set to maximum bandwidth and its volume
set at various levels. The passive load is a resistor-inductor-capacitor
(RLC) circuit model of a speaker with a nominal impedance
of 16 Ω. Audio signals are monitored using a 10 W Fender
Frontman amplifier. To digitize the signals, a Roland Quad-Capture
USB 2.0 audio interface is used with a word size of 24 bits and a
sample rate of 96 kSa/s. Signals are captured and recorded using
the recording software Audacity and are converted to 24-bit,
96 kSa/s stereo WAV files with the raw signal on one stereo
channel and the tube-amp signal on the other.
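The paper does not give the component values of the passive load, so the following sketch is purely illustrative: it evaluates a conventional voice-coil-plus-resonance RLC speaker model, with hypothetical values chosen to sit near a 16 Ω nominal impedance, across part of the audio band.

```python
# Illustrative RLC speaker-load model: series voice-coil resistance and
# inductance plus a parallel RLC section for the mechanical resonance.
# All component values below are hypothetical, not the authors'.
import numpy as np

def speaker_impedance(f, Re=13.0, Le=0.8e-3, Rp=60.0, Lp=30e-3, Cp=300e-6):
    """Z(f) = Re + jwLe + (Rp || jwLp || 1/(jwCp))."""
    w = 2 * np.pi * f
    y_par = 1 / Rp + 1 / (1j * w * Lp) + 1j * w * Cp   # parallel admittance
    return Re + 1j * w * Le + 1 / y_par

for f in (50, 100, 440, 1000, 5000):
    print(f"{f:5d} Hz: |Z| = {abs(speaker_impedance(f)):6.1f} ohm")
```

With these values the impedance peaks near the low-frequency resonance and settles close to the nominal 16 Ω through the midrange, which is the qualitative behavior a dummy load must mimic so the tube amplifier sees a realistic load while the speaker stays silent.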
B. NARX Training Software
A program was written in C# to train, test, and implement
NARX networks of various sizes. The program is specific to
networks composed of varying numbers of input delays, output
delays, and hidden units, with one linear neuron in the output
layer computing a weighted sum of the hidden-unit outputs. Standard
backpropagation with momentum and decaying learning rate
parameters was initially used to train the resulting feedforward
network. This method, however, suffered from extremely slow
learning. A faster method, RPROP [17], is now being used in
place of standard backpropagation. Networks are trained in
batch mode due to the sequential nature of the signals and the
requirements of RPROP. Since they constitute part of the input
to the hidden units, delayed outputs can either be those
computed by the network or the target outputs. The network
training software uses the latter to help accelerate learning.
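For reference, here is a minimal sketch of the per-weight RPROP update from [17], in the simplified variant without weight backtracking; the constants are the values suggested in the original paper, and the code is our illustration rather than the authors' C# implementation.

```python
# RPROP: per-weight step sizes adapt from the SIGN of successive
# full-batch gradients; the gradient magnitude is never used.
import numpy as np

class Rprop:
    def __init__(self, shape, step0=0.1, step_min=1e-6, step_max=50.0):
        self.step = np.full(shape, step0)   # one step size per weight
        self.prev = np.zeros(shape)         # previous batch gradient
        self.step_min, self.step_max = step_min, step_max

    def update(self, w, grad):
        """Apply one batch update to weights w in place."""
        s = self.prev * grad                # >0: same sign, <0: sign flip
        self.step = np.where(s > 0, np.minimum(self.step * 1.2, self.step_max),
                    np.where(s < 0, np.maximum(self.step * 0.5, self.step_min),
                             self.step))
        grad = np.where(s < 0, 0.0, grad)   # skip a step after a sign flip
        w -= np.sign(grad) * self.step      # move by step size, not magnitude
        self.prev = grad
```

Because the adaptation depends on the sign of the full-batch gradient, batch-mode training falls out naturally, as noted above.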
To execute a trial, the network structure is configured and
training/testing files are submitted. Once training begins, the
number of iterations, the RMS error and the maximum error are
displayed. Network details and errors are saved periodically,
allowing for stopping and restarting training and for error trend
analysis. Testing signals can be loaded and evaluated for error
and audio files can be generated to be used for listening tests.
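The two error figures reported in Section IV can be summarized in a short sketch. This is our formulation: the paper does not spell out its % RMS error formula, so the definition below (RMS error relative to the RMS level of the target) is an assumption.

```python
# Teacher-forced vs. free-running evaluation of a trained NARX model.
import numpy as np

def pct_rms_error(y_model, y_target):
    """% RMS error relative to the RMS level of the target signal
    (one plausible definition; assumed, not taken from the paper)."""
    return 100.0 * np.sqrt(np.mean((y_model - y_target) ** 2)
                           / np.mean(y_target ** 2))

# With a NARX model like the NarxNet sketch in Section II:
#   teacher-forced error, as measured during training:
#     pct_rms_error(net.run(x_test, teacher=y_test), y_test)
#   free-running error, as reported in parentheses in Table 1:
#     pct_rms_error(net.run(x_test), y_test)
```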
IV. INITIAL EXPERIMENTAL RESULTS
To test the feasibility of modeling a tube amplifier with the
NARX network, a set of training/testing signals was produced
as described in the previous section. The signals were
submitted to the NARX network program with various numbers
of input delays, output delays, and hidden units. With the latest
version of the training software, all configurations produced
similar results. As with many gradient descent learning
problems, the error tends to decrease in a hyperbolic fashion
with occasional abrupt steps; once the “knee” is crossed,
learning tends to progress very slowly.
The results of six configurations trained on a slightly
distorted guitar signal and an overdriven guitar signal at
100,000 iterations are shown in Table 1. The % RMS errors are
seen to be low; however, this is misleading. The % RMS errors
measured during training are based on using the target outputs to
fill the output delay units, rather than the outputs computed by
the network. Testing is performed using the network outputs only,
which propagates errors in time and results in significantly worse
performance. This is shown in the table by the errors in
parentheses.

Fig. 3. NARX structure: the input x_k and tapped delay lines of past
inputs (x_{k-1}, x_{k-2}, ...) and fed-back outputs (y_{k-1}, y_{k-2}, ...)
drive a layer of sigmoid hidden neurons and a single linear output
neuron producing y_k.

Fig. 4. Training and testing data acquisition set-up: the signal source
feeds the digitizer and the tube amp; the tube amp drives the load; the
monitor amp provides aural feedback; the computer stores the signals.

The network models of the slightly distorted guitar signal performed
better than those of the overdriven guitar. This can be explained by
noting that the operating point of the amplifier for the slightly
distorted guitar signal does not drive the amplifier far into the
saturation region; i.e., the amplifier is close to linear, and therefore
the NARX network only has to learn a nearly linear mapping. Figs. 5 and
6 show snippets of waveforms for the best cases of the slightly
distorted and overdriven guitar models, respectively.
The poorer-than-expected performance of the NARX
network can be attributed to a number of factors. The first is
insufficient training. Further training may improve the results,
but the improvements may be marginal, as indicated by the
hyperbolic nature of the error trend. A second factor is the
combinatorial explosion of parameter choices, i.e., the numbers
of input delays, output delays, and hidden units. Larger numbers
of hidden units and delays may significantly improve
performance, but will take much longer to train and will
ultimately make implementation difficult. Last, when using
gradient-descent methods for learning, of which backpropagation
and RPROP are examples, there is always the danger of getting
trapped in local minima.
V. CONCLUSIONS
The results of preliminary experiments have demonstrated
the feasibility of modeling vacuum-tube amplifiers with
recurrent neural networks. However, they also indicate a need
for much more investigation. The easiest, but not necessarily
productive, next phases are to extend the number of training
steps and increase the numbers of delays and hidden units.
There is an indication that this will improve the results as the
largest networks in the experiments performed best in the test
mode. Training rates will have to be increased substantially as
larger networks take more time per iteration. This might be
achieved by parallelizing the process and further tailoring the
training algorithm to the structure of the NARX network.
Other possible lines of research to follow are to add another
layer of hidden units to the network or to use a different
feedforward model such as a radial basis function network.
Since a vacuum-tube amplifier is generally a collection of
subsystems, partitioning the amplifier into linear and nonlinear
sections could possibly aid the learning process by
decomposing or modularizing the neural network.
REFERENCES
[1] “The Invention of the Electric Guitar.” Internet:
http://invention.smithsonian.org/centerpieces/electricguitar/invention.htm
[December 1, 2012].
[2] Eric Barbour, “The cool sound of tubes.” IEEE Spectrum, August 1998,
pp. 24-35.
[3] Dave Hunter, “Ibanez,” Guitar Effects Pedals, the Practical Handbook.
San Francisco, California: Backbeat Books, 2004, pp. 68-71.
[4] S. Tokodoro, “Signal amplifier circuit using a field-effect transistor
having current unsaturated triode vacuum tube characteristics.” U.S.
Patent 4 000 474, December 28, 1976.
[5] Eric Pritchard, “Semiconductor emulation of tube amplifiers.” U.S.
Patent 4 994 084, February 19, 1991.
[6] Michel Doidic et al., “Tube modeling programmable digital guitar
amplification system.” U.S. Patent 5 789 689, August 4, 1998.
Input Delays   Output Delays   Hidden Units   RMS Error

Slightly Distorted Guitar Signal
     5               5               5         0.561 % (43.6 %)
     5              10               5         0.393 % (44.3 %)
     5               5              10         0.388 % (35.6 %)
     5              10              10         0.314 % (40.8 %)
    10              10              10         0.788 % (64.2 %)
    50              10              10         1.35 %  (23.7 %)

Overdriven Guitar Signal
     5               5               5         0.530 % (221 %)
     5              10               5         0.463 % (268 %)
     5               5              10         0.515 % (154 %)
     5              10              10         0.449 % (155 %)
    10              10              10         0.464 % (100 %)
    50              10              10         0.521 % (110 %)

Table 1. RMS error for NARX configurations after 100,000 iterations;
errors in parentheses are measured in free-running test mode.
Fig. 5. 50-10-10 network; blue: raw guitar, green: slightly distorted
guitar, and red: output of NARX network.

Fig. 6. 50-10-10 network; blue: raw guitar, green: overdriven guitar,
and red: output of NARX network.
[7] Jyri Pakarinen and David Yeh, “A review of digital techniques for
modeling vacuum-tube guitar amplifiers.” Computer Music Journal,
vol. 33, no. 2, pp. 85-100, Summer 2009.
[8] David Yeh, “Automated physical modeling of nonlinear audio circuits
for real-time audio effects—part II: BJT and vacuum tube examples.”
IEEE Transactions on Audio, Speech, and Language Processing, vol.
20, no. 4, May 2012, pp. 1207-1216.
[9] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward
networks are universal approximators." Neural Networks, vol. 2, 1989,
pp. 359-366.
[10] D. Mendoza, “Emulating electric guitar effects with neural networks.”
Master's thesis, Universitat Pompeu Fabra, Barcelona, 2005.
[11] Scott DeBoer and Kenneth Stanley, “Systems and methods for inducing
effects in a signal.” U.S. Patent Application US 2009/0022331, filed July 16, 2007.
[12] I. Leontaritis and S. Billings, “Input–output parametric models for
nonlinear systems Part I: Deterministic nonlinear systems.” International
Journal of Control, vol. 41, 1985, pp. 303-328.
[13] H. Siegelmann, B. Horne, and C.L. Giles, “Computational capabilities of
recurrent NARX neural networks.” IEEE Transactions on Systems, Man,
and Cybernetics, Part B: Cybernetics, vol. 27, issue 2, 1997, pp. 208-215.
[14] Yon Visell, “Meeting 18: Temporal Processing in Neural Networks.”
Internet: http://www.cim.mcgill.ca/~yon/ai/lectures/lec18.pdf [July 2012].
[15] M. Bianchini, M. Gori, and M. Maggini, “On the problem of local
minima in recurrent neural networks.” IEEE Transactions on Neural
Networks, vol. 5, issue 2, March 1994, pp. 167-177.
[16] A. Back and A. Tsoi, “FIR and IIR synapses, a new neural network
architecture for time series modelling.” Neural Computation, vol. 3,
issue 3, Fall 1991, pp. 375-385.
[17] M. Riedmiller and H. Braun, “A direct adaptive method for faster
backpropagation learning: the RPROP algorithm,” in Proceedings of the
IEEE International Conference on Neural Networks, vol. 1, 1993,
pp. 586-591.