Neural Computation in Stylometry I: An Application to the Works of Shakespeare and Fletcher

ROBERT A. J. MATTHEWS, Oxford, UK
THOMAS V. N. MERRIAM, Basingstoke, UK

Correspondence: Robert Matthews, 50 Norreys Road, Cumnor, Oxford OX2 9PT, UK.
Abstract

We consider the stylometric uses of a pattern recognition technique inspired by neurological research known as neural computation. This involves the training of so-called neural networks to classify data even in the presence of noise and non-linear interactions within data sets. We provide an introduction to this technique, and show how to tailor it to the needs of stylometry. Specifically, we show how to construct so-called multi-layer perceptron neural networks to investigate questions surrounding purported works of Shakespeare and Fletcher. The Double Falsehood and The London Prodigal are found to have strongly Fletcherian characteristics, Henry VIII strongly Shakespearian characteristics, and The Two Noble Kinsmen characteristics suggestive of collaboration.
1. Introduction
Stylometry attempts to capture quantitatively the essence of an individual's use of language. To do this, researchers have proposed a wide variety of linguistic parameters (e.g. rare word frequencies or ratios of common word usage) which are claimed to enable differences between individual writing styles to be quantitatively determined.

Critics of stylometry rightly point out that despite its mathematical approach the technique can never give incontrovertible results. However, there can be little doubt that the case in favour of attributing a particular work to a specific author is strengthened if a wide variety of independent stylometric tests point to a similar conclusion. The development of a new stylometric technique is thus always of importance, in that it can add to the weight of evidence in support of a specific hypothesis.

To be a useful addition to stylometry, a new technique should be theoretically well-founded, of measurable reliability, and of wide applicability.
In this paper, we introduce a technique that meets all these criteria. Based on ideas drawn from studies of the brain, this so-called neural computation approach forms a bridge between the method by which literary scholars reach their qualitative judgements, and the quantitative techniques used by stylometrists.

Like a human scholar, the technique uses exposure to many examples of a problem to acquire expertise in solving it. Unlike a human scholar, however, the neural computation technique gives repeatable results of measurable reliability. Furthermore, the technique is theoretically well-founded. It can be shown that neural networks are capable of approximating any practically useful function to arbitrary accuracy (see, for example, Hecht-Nielsen, 1990, p. 131). Furthermore, ways of finding such networks have their origins in well-established concepts drawn from the theory of statistical pattern recognition and non-linear regression; indeed, neural computation can be thought of in more prosaic terms as a non-linear regression technique.

In addition, neural networks are known to cope well with both noisy data and non-linear correlations between data, confounding effects that have long dogged stylometric research.

With such attributes, neural computation would seem to constitute a promising new stylometric method. In this paper, we show how to construct a stylometric neural network, and then apply it to the investigation of the works of Shakespeare and his contemporary John Fletcher (1579-1625).
2. Background to Neural Computation
Despite the substantial computational power now available to conventional computers, the brain of an infant can still outperform even the fastest supercomputers at certain tasks. A prime example is that of recognizing a face in a crowd: conventional computing techniques have proved disappointing in such tasks.

This has led to interest in so-called neural computing, which is an attempt to imitate computationally the essentials of neurological activity in the brain. The idea is that problems such as pattern recognition may be better solved by mimicking a system known to be good at such tasks.

Neural computation typically (but not necessarily) involves programming a conventional computer to behave as if it consisted of arrangements of simple interconnected processing units—'neurons'—each one of which is linked to its neighbours by couplings of various strengths, known as 'connection weights'. It is now known that even a relatively crude representation of the collective behaviour of real neurons enables a number of difficult computational problems to be tackled.

To do this, the network of neurons has to be trained to respond to a stimulus in the appropriate way. This requires the application of a 'learning algorithm' enabling the weights to converge to give a network producing acceptable solutions. Thereafter, each time the network receives a specific input, it will produce an output consistent with the data on which it has been trained.

Research into such 'neural computation' began in the 1940s, but it was not until the mid-1980s and the publication of Parallel Distributed Processing (Rumelhart and McClelland, 1986) that the current interest in the field was kindled. This followed the authors' demonstration that a type of learning algorithm known as back propagation (or simply 'backprop') enabled neural networks to solve highly non-linear problems that had defeated simple networks (Minsky and Papert, 1969). The backprop algorithm, which had in fact been previously discovered by several researchers, has since been used to produce neural networks capable of solving an astonishing variety of prediction and classification problems, from credit risk assessment to speech recognition, many of which have proved all but intractable by conventional computational techniques (see, for example, Anderson and Rosenfeld, 1989; Refenes et al., 1993).

The backprop algorithm is typically used in conjunction with a specific arrangement of neurons known as the multi-layer perceptron (MLP; see Fig. 1). This consists of an input layer of neurons, a so-called hidden layer, and an output layer. Multi-layer perceptrons are currently the most widely used form of neural network. They have proved capable of performing classification and prediction even in the presence of considerable non-linearity and noise in the raw data. It is for these reasons that we decided to investigate the specific use of MLPs as a new stylometric discrimination technique.
3. Building a Stylometric MLP
For our purposes, we require an MLP that can take a set of m stylometric discriminators for a given sample of the works of one of two authors, X and Y, and then classify the input as the work of either X or Y. This implies that the MLP will consist of an input layer of m neurons—one for each stylometric discriminator used to differentiate between the two authors—a hidden layer of n neurons, and an output layer of two neurons, corresponding to the two authors.

Fig. 1 Topology of a stylometric multi-layer perceptron for classifying works of two authors using five discriminators.
Training such an MLP requires the backprop algorithm, whose derivation is given in Chapter 7 of Parallel Distributed Processing (Rumelhart and McClelland, 1986). We then use the following protocol to train the MLP (a code sketch follows the list):
(a) Prepare k training vectors. These consist of m real numbers representing the discriminators, while the output data consists of the author ID.
(b) Set up the weights of the neural network with small random values.
(c) Calculate the output that results when the input training vector is applied to this initial network arrangement.
(d) Calculate the (vector) difference between what the network actually produces and the desired result; this constitutes an error vector for this input and output vector.
(e) Adjust the weights and thresholds of the network using the backprop algorithm to reduce the error.
(f) Repeat with the next input training vector, and continue down the training set until the network becomes acceptably reliable.
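By way of illustration, the protocol can be sketched in a few lines of Python. This is not the authors' implementation (the work reported here used the NetBuilder package; see Acknowledgements): scikit-learn's MLPClassifier stands in for it, with its fit() routine bundling steps (b)-(f), and random numbers standing in for real discriminator data.

```python
# A minimal sketch of the training protocol, assuming scikit-learn as a
# stand-in for the authors' NetBuilder software. The discriminator
# values below are random placeholders, not real word-ratio data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
m, k = 5, 100                      # 5 discriminators, 100 training vectors

# (a) k training vectors of m real numbers, each labelled with an author ID
X = rng.normal(size=(k, m))        # placeholder discriminator ratios
y = np.repeat([0, 1], k // 2)      # placeholder author IDs: 0 = author X, 1 = author Y

# (b)-(f): small random initial weights, forward pass, error vector,
# backprop weight updates, repeated over the training set
net = MLPClassifier(hidden_layer_sizes=(3,),  # the paper's 5-3-2 topology
                    activation='logistic',    # sigmoidal squashing function
                    solver='sgd', learning_rate_init=0.1,
                    max_iter=2000, random_state=0)
net.fit(X, y)
print('training-set accuracy:', net.score(X, y))
```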
We now consider the practical aspects of this protocol.

3.1 The Training Vectors
These consist of the m discriminators with the power to differentiate between author X and author Y, together with an author ID label.

In general, the larger m becomes, the stronger the discrimination. However, a limit on the number of discriminators that can be used is set by the availability of text of reliable provenance on which training can be based. If an MLP has too many inputs relative to the number of training vectors, it will lose its ability to generalize to new data; essentially, there are too many unknowns for the data to support. To combat this, experience shows (D. Bounds, 1993, private communication) that the total number of training vectors used, k, should be at least ten times the sum of the number of inputs and outputs. These training vectors should, moreover, be drawn equally from the works of the two authors, be suitably representative, and be derived from reasonable lengths of text.
The use of many discriminators thus raises the number of training vectors required. However, one can only extract more training vectors from a given amount of reliable training text by taking smaller and smaller samples, and these will be increasingly subject to statistical noise.

Given these various constraints, we concluded that a useful stylometric MLP should consist of five input neurons, giving reasonable discriminatory power, and two outputs; this then leads to a requirement for at least 10 x (5 + 2) = 70 training vectors, roughly half of which come from each of the two authors. This number of training vectors allows the stylometric discriminator data to be based on reasonable samples of 1,000 words drawn from the core canons of many authors.
3.2 Training the MLP
The first step in the training process is the so-called forward pass, in which an input vector is applied to the input neurons, and their output is passed via a set of initially random weights to neurons in the hidden layer. Suppose the discriminators applied to the input layer form the vector (i_1, i_2, i_3, i_4, i_5). Then for each hidden layer neuron h_j we form the sum

    S(h_j) = Σ_{m=0}^{5} w_{mj} i_m    (1)

where w_{mj} is the weight connecting input neuron m to hidden layer neuron j. The summation runs from 0 to 5, with w_{0j} the so-called biassing weight which performs a role similar to that of a threshold (Rumelhart and McClelland, 1986, p. 329). It can be trained just like the other weights, with i_0 simply being considered to have the fixed value +1.

The output from h_j is then obtained by applying a so-called squashing function to S, typically sigmoidal in form, so that

    Ω(h_j) = 1 / {1 + exp[-S(h_j)]}    (2)

These are then used as the inputs to the output layer, with a similar summing and squashing procedure giving S(o_1) and S(o_2) for the two output neurons. The corresponding outputs Ω(o_1) and Ω(o_2) constitute the final output of the MLP. Classification is then achieved on the basis of which of these two outputs is the larger.

The error vector, e, between the desired output and that produced by the network during training is used to modify the weights according to the backprop algorithm. The training is repeated down the training set until the initially random weights converge to the set of values giving an acceptable accuracy of classification. Thereafter the MLP simply uses (1) and (2) to calculate output vectors from given input vectors using the weights w_{mj}, etc., at their converged values.
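For readers who prefer code to algebra, equations (1) and (2) translate almost line for line into the following sketch (NumPy, with random numbers standing in for the converged weights):

```python
# Forward pass through a 5-3-2 MLP, transcribing equations (1) and (2).
# The weights are random stand-ins for trained, converged values.
import numpy as np

def squash(s):
    # Equation (2): sigmoidal squashing function
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, w_hidden, w_out):
    # Prepend the fixed input i_0 = +1 so that column 0 of each weight
    # matrix acts as the biassing weight of equation (1)
    x = np.concatenate(([1.0], x))
    h = squash(w_hidden @ x)            # hidden-layer outputs Omega(h_j)
    h = np.concatenate(([1.0], h))      # bias unit for the output layer
    return squash(w_out @ h)            # outputs Omega(o_1), Omega(o_2)

rng = np.random.default_rng(1)
w_hidden = rng.normal(size=(3, 6))      # 3 hidden neurons x (1 bias + 5 inputs)
w_out = rng.normal(size=(2, 4))         # 2 outputs x (1 bias + 3 hidden)
omega = forward(rng.normal(size=5), w_hidden, w_out)
print('verdict: output neuron', int(omega.argmax()))  # larger output wins
```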
3.3 The Completion of Training
During training, the classification error falls until it reaches a stable value. In practice two criteria are used to dictate when an MLP can be considered 'trained'. Typically, the set of k input vectors is split into a training set and a cross-validation set. The former is used to train the network while the latter is held in reserve to gauge performance.

Left to train over many cycles, MLPs often learn to classify the training set with complete accuracy. However, this does not imply that the MLP will perform well when exposed to data it has never seen before. This inability to generalize to new data is known as 'overtraining'.

The exact cause of overtraining is still unclear (see, for example, Hecht-Nielsen, 1990, p. 116), but it has obvious symptoms: as training continues, classification of the training vectors continues to improve, while that of the cross-validation vectors starts to degrade.
The solution is to halt training when the MLP performs to an acceptable standard on both training and cross-validation vectors. Selecting an appropriate standard is thus a balance between the need to produce useful results and the avoidance of overtraining. Obviously, a 50% success rate in classifying data between two equally likely alternatives is no better than coin-tossing. However, achieving 100% accuracy in both training and cross-validation is usually prevented by the overtraining phenomenon.
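Under the scikit-learn stand-in sketched in Section 3, an approximation of this stopping rule is available directly; note that the built-in validation split below is our assumption, not the authors' own train/cross-validation protocol:

```python
# Halting on held-out performance, using scikit-learn's built-in early
# stopping as an approximation of the rule described above.
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(hidden_layer_sizes=(3,), activation='logistic',
                    solver='sgd', learning_rate_init=0.1,
                    early_stopping=True,       # hold out a validation set
                    validation_fraction=0.3,   # e.g. ~30 of 100 vectors reserved
                    n_iter_no_change=10,       # stop once validation score stalls
                    random_state=0)
# net.fit(X, y) as before; training now halts when the validation score
# stops improving, before the overtraining regime sets in.
```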
We now describe our solution of this and other practical issues surrounding the construction of a stylometric MLP capable of discriminating between Shakespeare and Fletcher.
4. Construction of the Shakespeare-Fletcher MLP
4.1 Choice of Discriminants
The inputs of the MLP are the m discriminators we choose as being capable of differentiating between the works of Shakespeare and those of Fletcher. The discriminators should, in addition, show reasonable stability across the corpus of an author's work (at least that made up by works of one genre, such as plays), and ideally maintain their reliability when works are broken down into smaller units, such as individual acts. This latter feature is particularly desirable in an MLP designed to investigate supposed collaborations within a single work.

Both Merriam (1992) and Horton (1987) have studied the choice of discriminators meeting such criteria in considerable detail, and we investigated the use of five discriminants based on their work as inputs for two Shakespeare-Fletcher neural networks.
The Merriam-based set of m = 5 discriminators were the following ratios: did/(did+do); no/T-10; no/(no+not); to the/to; upon/(on+upon). Here T-10 is Taylor's ten function words (but, by, for, no, not, so, that, the, to, with) (Taylor, 1987).

The set of five discriminators based on the work of Horton consists of ratios formed by dividing the total numbers of words in a sample by the number of occurrences of the following five function words: are; in; no; of; the. All contractions involving these function words (e.g. i' th') have been expanded to maximize the word counts.
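For concreteness, both discriminator sets reduce to simple word counts. The following sketch assumes a naively tokenized, lower-cased word list; the exact counting rules (tokenization, the 'to the' collocation, the T-10 denominator) are our reading of the definitions above rather than the authors' published code:

```python
# Hypothetical computation of the Merriam and Horton discriminators
# from a tokenized 1,000-word sample. Counting rules are assumptions:
# the paper does not spell out its tokenization.
from collections import Counter

T10 = ('but', 'by', 'for', 'no', 'not', 'so', 'that', 'the', 'to', 'with')

def merriam_ratios(words):
    c = Counter(words)
    pairs = Counter(zip(words, words[1:]))      # adjacent word pairs
    return [
        c['did'] / (c['did'] + c['do']),        # did/(did+do)
        c['no'] / sum(c[w] for w in T10),       # no/T-10
        c['no'] / (c['no'] + c['not']),         # no/(no+not)
        pairs[('to', 'the')] / c['to'],         # "to the"/to
        c['upon'] / (c['on'] + c['upon']),      # upon/(on+upon)
    ]

def horton_ratios(words):
    c = Counter(words)
    # total words in the sample divided by each function-word count
    return [len(words) / c[w] for w in ('are', 'in', 'no', 'of', 'the')]

# Usage, for a real sample in which every counted word occurs:
# words = open('sample.txt').read().lower().split()
# vector = merriam_ratios(words) + [author_id]
```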
4.2 Formation of Training and Cross-validation Data Sets
For each set of five discriminators, we formed training sets of k = 100 vectors (fifty each for Shakespeare and Fletcher), with each vector taking the following form:

(ratio 1; ratio 2; ratio 3; ratio 4; ratio 5; author ID)

For training purposes, each ratio was computed by word counts on 1,000-word samples from works of undisputed origin for each author. For Shakespeare these were taken to be the core canon plays The Winter's Tale, Richard III, Love's Labour's Lost, A Midsummer Night's Dream, 1 Henry IV, Henry V, Julius Caesar, As You Like It, Twelfth Night and Antony and Cleopatra. For Fletcher, we took as core canon The Chances, The Womans Prize, Bonduca, The Island Princess, The Loyal Subject and Demetrius and Enanthe. For all these, the source used for our word counts was the machine-readable texts produced by the Oxford University Computing Service.

Once the five sets of 100 ratios were extracted for each discriminator, each set was normalized to give zero mean and unit standard deviation to ensure that each discriminator contributes equally in the training process.
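This is the standard z-score transform, sketched below with a placeholder array standing in for the real 100 x 5 table of ratios:

```python
# Normalize each discriminator column to zero mean and unit standard
# deviation. `ratios` is a random placeholder for the real 100 x 5 array.
import numpy as np

ratios = np.random.default_rng(2).normal(loc=0.5, scale=0.1, size=(100, 5))
normalized = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(6))
```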
4.3 Training Criteria
The training vectors thus derived were then used to produce two MLPs: one capable of differentiating between Shakespeare and Fletcher using the five Merriam discriminators, the other using those of Horton.

After some experimentation, it emerged that we could reasonably expect cross-validation accuracies of at least 90% without running into overtraining problems. Thus the first of our criteria for the completion of training was that the MLP be capable of classifying the cross-validation vectors with an accuracy of at least 90%.

The other criterion was set by the requirement that the MLP be unbiassed in its discrimination process; in other words, that it was no more likely to misclassify works of Fletcher as Shakespearian than it was to do the reverse. Thus, the second of our training criteria was that misclassified vectors be approximately equally divided between the two classifications.

These criteria were then used to find a suitable size for the hidden layer. Too few hidden units fail to capture all the features in the data, while too many lead to a failure to generalize; in tests, we found that three hidden units were sufficient to give cross-validation results meeting our criteria. We then fixed our topology for the stylometric MLP at five inputs, three hidden units, and two outputs.

Both the Merriam and Horton MLPs were found to successfully meet the training criteria after twenty or so presentations of the complete 100-vector training set. The Merriam-based network (henceforth MNN) achieved a cross-validation accuracy of 90%, with the 10% misclassified being split into 6% Shakespeare classified as Fletcher, and 4% Fletcher classified as Shakespeare. The Horton-based network (henceforth HNN) achieved 96% cross-validation accuracy, with both modes of misclassification lying at 2%.
4.4 Testing and Performance Appraisal
Having been trained, both MNN and HNN were tested by being asked to classify core canon works of Shakespeare and Fletcher that neither network had seen during training. This constitutes a test of the power of each network to generalize to new data.

In the first test, each network was asked to classify ten complete plays, eight from the core canon of Shakespeare (All's Well that Ends Well, Comedy of Errors, Coriolanus, King John, Much Ado about Nothing, The Merchant of Venice, Richard II, and Romeo and Juliet) and two from that of Fletcher (Valentinian and Monsieur Thomas).
In addition to giving the simple (bipolar) classification of 'Shakespeare' or 'Fletcher', as dictated by the larger of the two output signal strengths, each network also provided a measure of the degree to which it considered each work to belong to one class or another. We call this the Shakespearian Characteristics Measure (SCM); it is defined as

    SCM = Ω_S / (Ω_S + Ω_F)    (3)

where Ω_S and Ω_F are the values of the outputs from the Shakespeare and Fletcher neurons, respectively. Thus the stronger the Shakespeare neuron output relative to the Fletcher neuron output, the higher the SCM. Strongly Fletcherian classifications, on the other hand, give SCM closer to zero, and those on the borderline (Ω_S = Ω_F) give SCM = 0.5. The value of the SCM lies in the greater insight it provides into a particular classification result.
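Equation (3) transcribes directly into code:

```python
# Equation (3): the Shakespearian Characteristics Measure, computed
# from the two output-neuron activations.
def scm(omega_s, omega_f):
    return omega_s / (omega_s + omega_f)

print(scm(0.9, 0.3))   # strongly Shakespearian output: SCM = 0.75
print(scm(0.4, 0.4))   # borderline case: SCM = 0.5
```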
The results obtained from the Merriam and Horton MLPs applied to entire core canon plays of both dramatists are shown in Table 1.
Table 1 Multi-layer perceptron results for core canon Shakespeare and Fletcher

Play           Merriam SCM   Merriam Verdict   Horton SCM   Horton Verdict
Shakespeare
  ADO          0.75          Shakespeare       0.71         Shakespeare
  AWW          0.74          Shakespeare       0.92         Shakespeare
  CE           0.90          Shakespeare       0.91         Shakespeare
  COR          0.84          Shakespeare       0.98         Shakespeare
  KJ           0.76          Shakespeare       0.91         Shakespeare
  MV           0.67          Shakespeare       0.97         Shakespeare
  RII          0.81          Shakespeare       0.92         Shakespeare
  ROM          0.80          Shakespeare       0.87         Shakespeare
Fletcher
  VAL          0.46          Fletcher          0.30         Fletcher
  MTH          0.32          Fletcher          0.29         Fletcher
As can be seen, both MNN and HNN gave the correct overall classification to all ten complete plays. The two networks also gave SCMs of similar numerical value, despite being based on different sets of discriminators: the correlation coefficient between the SCMs produced by the two MLPs is 0.894.

The statistical significance of the overall classification results can be judged by using the binomial distribution to calculate the probability P(S) of obtaining at least S successes in T trials simply by chance, given two equally likely outcomes. In our case, we have T = 10 and S = 10, so that P(10) = 9.8 x 10^-4; the correct classification of ten entire plays by both MNN and HNN is thus highly significant (P < 0.001).

The significance of the correlation of SCMs can be assessed using the Student t-test, which for r = 0.894 and eight degrees of freedom gives t = 5.653, corresponding to P < 0.001.
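Both figures are easily reproduced; a quick check with SciPy (the small discrepancy in t reflects rounding of r):

```python
# Reproducing the two significance calculations quoted above.
from scipy import stats

# Binomial: probability of at least 10 successes in 10 fair trials
print(stats.binom.sf(9, 10, 0.5))        # 9.765625e-04, i.e. ~9.8 x 10^-4

# Student t-test on the correlation r = 0.894 with 8 degrees of freedom
r, df = 0.894, 8
t = r * df**0.5 / (1 - r**2)**0.5
print(t, 2 * stats.t.sf(t, df))          # t ~ 5.64, two-tailed P < 0.001
```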
These impressive results highlight an important feature of stylometric MLPs: although each network was trained to give 90% cross-validation accuracy, this figure can be improved upon when the networks are applied to entire plays. This reflects the fact that discriminator values derived from entire plays are less noisy than those derived from acts.
We would, however, expect the performance of the MLPs to be somewhat less impressive when they are applied to individual acts, whose stylometric properties will be rather more noisy. To investigate this degradation in performance, we used MNN and HNN to classify individual acts of two plays from the core canon of each playwright. For Shakespeare, we took the acts from The Tempest and The Merry Wives of Windsor, while for Fletcher we took acts from Valentinian and Monsieur Thomas.

The Merriam-based network was found to misclassify Acts 2 and 4 of The Tempest, and Acts 1 and 3 of The Merry Wives of Windsor, together with Acts 2 and 5 of Valentinian, and Act 4 of Monsieur Thomas, an overall success rate of 65%. As the probability of obtaining thirteen or more correct classifications by chance alone is 0.13, MNN's success is of only marginal significance. The Horton-based network did considerably better, however, successfully classifying all but Acts 3 and 4 of The Tempest and Act 5 of Valentinian, a success rate of 85%; the results are shown in Table 2.

Although, as expected, both MNN and HNN were less successful when applied to acts rather than entire plays, the success rate of HNN was still very highly significant (P < 0.001). We thus conclude that both MNN and HNN are effective in discriminating authorship of entire plays, while HNN also remains effective down at the level of individual acts.
5. Using the Networks on Disputed Works
Having investigated the relative powers of MNN and HNN to classify successfully both entire plays and individual acts, we applied each network to four works of particular interest: The Double Falsehood, The London Prodigal, Henry VIII, and The Two Noble Kinsmen.

All four plays have at some time been linked to Shakespeare and Fletcher. Although the anonymous The Double Falsehood has been associated with the Shakespeare apocrypha, this play is now generally thought to be an adaptation of the now-lost The History of Cardenio, itself a collaboration between Shakespeare and Fletcher (Taylor, 1987). The London Prodigal is also anonymous and part of the Shakespeare apocrypha, but evidence supporting authorship by Fletcher has recently emerged from both stylometry (Merriam, 1992, Chapters 10 and 11) and socio-linguistic analysis (Hope, 1990).

Finally, interest in Henry VIII and The Two Noble Kinsmen stems from the fact that both have long been considered to be the product of collaboration between Shakespeare and Fletcher (Hart, 1934; Maxwell, 1962; Shoenbaum, 1967; Proudfoot, 1970).

Given this background, we applied both MNN and HNN to all four plays in their entirety, and then investigated the question of collaboration by applying HNN alone to individual acts of Henry VIII and The Two Noble Kinsmen. This produced the results shown in Table 3.
6. Analysis of Results
As Table 3 shows, both MNN and HNN agree that The Double Falsehood taken as an entire play is predominantly Fletcherian in style.
Table 2 Horton MLP results for core canon acts

Play                       Act    Horton SCM   Horton Verdict
Shakespeare
Merry Wives of Windsor     I      0.88         Shakespeare
                           II     0.74         Shakespeare
                           III    0.87         Shakespeare
                           IV     0.77         Shakespeare
                           V      0.93         Shakespeare
The Tempest                I      0.91         Shakespeare
                           II     0.56         Shakespeare
                           III    0.31*        (Fletcher)
                           IV     0.37*        (Fletcher)
                           V      0.86         Shakespeare
Fletcher
Monsieur Thomas            I      0.29         Fletcher
                           II     0.30         Fletcher
                           III    0.29         Fletcher
                           IV     0.29         Fletcher
                           V      0.29         Fletcher
Valentinian                I      0.30         Fletcher
                           II     0.30         Fletcher
                           III    0.29         Fletcher
                           IV     0.31         Fletcher
                           V      0.88*        (Shakespeare)

* Denotes apparent misclassification
Table 3 Merriam and Horton MLP results for disputed plays

Entire plays
Play                   Merriam SCM   Merriam Verdict   Horton SCM   Horton Verdict
Double Falsehood       0.40          Fletcher          0.37         Fletcher
London Prodigal        0.31          Fletcher          0.30         Fletcher
Henry VIII             0.84          Shakespeare       0.94         Shakespeare
Two Noble Kinsmen      0.78          Shakespeare       0.65         Shakespeare

Plays by acts (Horton network only)
Play                   Act    Horton SCM   Horton Verdict
Double Falsehood       I      0.66         Shakespeare
                       II     0.87         Shakespeare
                       III    0.29         Fletcher
                       IV     0.73         Shakespeare
                       V      0.29         Fletcher
London Prodigal        I      0.89         Shakespeare
                       II     0.29         Fletcher
                       III    0.34         Fletcher
                       IV     0.28         Fletcher
                       V      0.30         Fletcher
Henry VIII             I      0.98         Shakespeare
                       II     0.85         Shakespeare
                       III    0.97         Shakespeare
                       IV     1.00         Shakespeare
                       V      0.57         Shakespeare
Two Noble Kinsmen      I      0.93         Shakespeare
                       II     0.30         Fletcher
                       III    0.32         Fletcher
                       IV     0.60         Shakespeare
                       V      0.91         Shakespeare
Given this agreement of two different MLPs, and the more robust nature of results obtained when the MLPs are applied to entire plays, this finding appears to add evidential weight to the view that, despite being the product of an eighteenth-century adaptation, The Double Falsehood has considerable Fletcherian characteristics, agreeing with contemporary scholarship summed up by Metz (1989).

The SCMs for The Double Falsehood produced by both MNN and HNN are, however, somewhat higher than the ~0.3 value found by both MLPs for canon Fletcher works. This raises the possibility that the SCM is reflecting a Shakespearian influence on the play at the level of individual acts.

This possibility gains support from the application of HNN to individual acts of The Double Falsehood: we find three of the five acts have SCMs suggestive of a predominantly Shakespearian influence. Given the greater statistical noise in the discriminators at the level of acts, less weight should be attached to these attributions, but they remain suggestive, none the less.
Similar remarks apply to the MLP findings with The London Prodigal: we find an overall Fletcherian attribution, but with some Shakespearian influence, especially in Act I. The results for Henry VIII taken as an entire play using both MNN and HNN indicate that it is predominantly Shakespearian, a view that has long had its advocates (Foakes, 1957; Bevington, 1980). The SCM for the entire play is high, and even at the level of acts, all the attributions are to Shakespeare. However, collaboration is not entirely ruled out: the relatively low SCM value for Act V suggests a strongly Fletcherian contribution to this part of Henry VIII, a view supported by Hoy (1956).

The results from both MNN and HNN for The Two Noble Kinsmen taken as an entire play also support an overall Shakespearian attribution, but the relatively low SCMs confirm current scholarly opinion of considerable collaboration between the two dramatists. The Horton-based network applied to individual acts provides more detailed information on this, attributing Acts I and V to Shakespeare, and Acts II and III to Fletcher. It also gives a relatively borderline SCM for Act IV, hinting at a considerable Fletcherian contribution to this act; all these assessments are in broad agreement with those of Proudfoot (1970) and Hoy (1956).
7. Conclusions
In this paper, we have set out the principles and practicalities of applying neural computation to stylometry. Multi-layer perceptron neural networks have two major advantages as a stylometric technique. First, experience gained by researchers in neural computation over a wide range of applications shows that MLPs are able to classify data even in the presence of considerable statistical noise. In addition, they are essentially non-linear classifiers, and can thus deal with interactions between stylometric discriminators, a feature denied traditional linear methods.

We have shown that after being trained using data drawn from 1,000-word samples taken from core canon works of Shakespeare and Fletcher, MLPs will successfully recognize known works of Fletcher and Shakespeare they have not encountered before.

In particular the MLPs were found to give excellent classification results when applied to entire plays, whose discriminator data are less subject to statistical noise. Furthermore, through the use of SCMs, they proved capable of reflecting authorship influence at the level of individual acts.

More specifically, when applied to disputed works the MLPs gave new evidential weight to the views of scholars concerning the authorship of four plays: The Double Falsehood, The London Prodigal, Henry VIII and The Two Noble Kinsmen. In the case of The London Prodigal, the evidence may now be sufficient to challenge the common assumption that, at 26, Fletcher was insufficiently mature to write such a play.

We believe that these results show that neural networks are a useful addition to current stylometric techniques. We cannot, however, overemphasize that, like any quantitative stylometric method, neural networks do not give incontrovertible classifications. Their true importance lies in their potential to provide an additional and independent source of evidential weight upon which literary scholars can draw.

We are ourselves now undertaking further research using MLP neural networks, and plan to report the results in due course (Merriam and Matthews, 1993).
Acknowledgements

It is a pleasure to thank Professor David Bounds of Aston University and Paul Gregory and Dr Les Ray of Recognition Research for their interest and advice, and for giving us access to their excellent NetBuilder software, without which this research may well have foundered. We also thank Dr Chris Bishop of AEA Technology and Dr Jason Kingdon of University College London for valuable discussions, and the anonymous referees whose constructive comments resulted in many improvements.
References

Anderson, J. A. and Rosenfeld, E. (eds) (1989). Neurocomputing: Foundations of Research, 4th printing. MIT Press, Cambridge.
Bevington, D. (ed.) (1980). The Complete Works. Scott, Foresman, Glenview.
Foakes, R. A. (ed.) (1957). King Henry VIII in The Arden Shakespeare. Methuen, London.
Hart, A. (1934). Shakespeare and the Vocabulary of The Two Noble Kinsmen. Melbourne University Press, Melbourne.
Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley, Reading.
Hope, J. (1990). Applied Historical Linguistics: Socio-historical Linguistic Evidence for the Authorship of Renaissance Plays, Transactions of the Philological Society, 88.2: 201-26.
Horton, T. B. (1987). Doctoral thesis, University of Edinburgh.
Hoy, C. (1956). The Shares of Fletcher and His Collaborators in the Beaumont and Fletcher Canon (VII), Studies in Bibliography, 15: 129-46.
Maxwell, J. C. (ed.) (1962). King Henry VIII. Cambridge University Press, Cambridge.
Merriam, T. V. N. (1992). Doctoral thesis, University of London.
Merriam, T. V. N. and Matthews, R. A. J. (1993). Neural Computation in Stylometry II: An Application to the Works of Shakespeare and Marlowe, Literary and Linguistic Computing (submitted).
Metz, G. H. (ed.) (1989). Sources of Four Plays Ascribed to Shakespeare. University of Missouri Press, Columbia.
Minsky, M. and Papert, S. (1969). Perceptrons. MIT Press, Cambridge.
Proudfoot, G. R. (ed.) (1970). The Two Noble Kinsmen. Edward Arnold, London.
Refenes, A. N., Azema-Barac, M., Chen, L., and Karoussos, S. A. (1993). Currency Exchange Rate Prediction and Neural Network Design Strategies, Neural Computing & Applications, 1.1: 46-58.
Rumelhart, D. E. and McClelland, J. L. (eds) (1986). Parallel Distributed Processing (I). MIT Press, Cambridge.
Shoenbaum, S. (ed.) (1967). The Famous History of the Life of King Henry the Eighth. The New American Library, New York.
Taylor, G. (1987). The Canon and Chronology of Shakespeare's Plays, in William Shakespeare: A Textual Companion. Clarendon Press, Oxford.
Appendix

To encourage the greater use of neural networks in stylometry, the authors will happily provide .EXE files containing fully trained MLPs based on the Merriam and Horton discriminators to anyone sending a blank IBM-compatible 3.5" disk and return postage.