The Tesserae Project: intertextual
analysis of Latin poetry
Neil Coffee, Jean-Pierre Koenig, Shakthi Poornima,
Christopher W. Forstall, Roelant Ossewaarde and
Sarah L. Jacobson
The University at Buffalo, The State University of New York, USA
Abstract
Tesserae is a web-based tool for automatically detecting allusions in Latin poetry.
Although still in the start-up phase, it already is capable of identifying significant
numbers of known allusions, as well as similar numbers of allusions previously
unnoticed by scholars. In this article, we use the tool to examine allusions to
Vergil’s Aeneid in the first book of Lucan’s Civil War. Approximately 3,000 lin-
guistic parallels returned by the program were compared with a list of known
allusions drawn from commentaries. Each was examined individually and graded
for its literary significance, in order to benchmark the program’s performance.
All allusions from the program and commentaries were then pooled in order to
examine broad patterns in Lucan’s allusive techniques which were largely
unapproachable without digital methods. Although Lucan draws relatively con-
stantly from Vergil’s generic language in order to maintain the epic idiom, this
baseline is punctuated by clusters of pointed allusions, in which Lucan frequently
subverts Vergil’s original meaning. These clusters not only attend the most sig-
nificant characters and events but also play a role in structuring scene transitions.
Work is under way to incorporate the ability to match on word meaning, phrase
context, and metrical and phonological features into future versions of the
program.
1 Introduction
The study of allusion has grown to become a core
interest of classical—particularly Latin—literary
studies over the past several decades. Beyond
simply documenting instances of textual reuse,
scholars such as Conte (1986), Hinds (1998), and
Edmunds (2001) have enlarged the scope in which
allusion is understood to create meaning and pre-
sented several theoretical models for how allusion is
both written and read.
A number of recent digital humanities projects
have examined various aspects of text reuse.
Bamman and Crane (2008) presented a model for
identifying allusions based on multiple parameters
and detailed their methods for measuring two texts’
similarity in words, word order, and syntax. Horton
et al. (2010) created an algorithm for detecting text
reuse in French and other languages, based solely on
string similarity, which they have released under an
open source licence. Büchler et al. (2010) examined
larger scale patterns of text reuse in the treatment of
Plato by later Greek authors. Tesserae draws on
these and other projects for models, yet distin-
guishes itself as an integrated effort to develop allu-
sion detection software, undertake detailed case
studies, and bring the understanding of allusion to
a non-specialist audience.
Correspondence:
Neil Coffee, Department
of Classics, 338 MFAC,
University at Buffalo,
Buffalo, NY 14261-0026,
USA.
E-mail:
ncoffee@buffalo.edu
Literary and Linguistic Computing © The Author 2012. Published by Oxford University Press on behalf of ALLC.
All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqs033
Literary and Linguistic Computing Advance Access published July 20, 2012
Users of Tesserae’s web-based interface select two
texts from simple drop-down lists (Fig. 1). A list of
parallel phrases is then returned (Fig. 2); this may
be downloaded as an XML document or a list of
comma separated values. The current version of the
program is already online and freely accessible
(http://tesserae.caset.buffalo.edu) and has received
positive feedback from practising scholars of Latin
allusion, including writers of textual commentaries
who customarily note allusions.
In the remainder of this article, we present some
preliminary results from our application of the cur-
rent version of the search tool to a case study of
the Roman poet Lucan. Lucan was a poet of the
time of Nero and left unfinished at his death an
8,000-line epic on the subject of Rome’s civil war
known as the Bellum Civile (BC). In writing such an
epic, it would have been impossible for Lucan to
avoid comparison with the figure of Vergil,
approximately 100 years his senior, whose monu-
mental work, the Aeneid, had already become a clas-
sic. Lucan’s relationship with his predecessor is far
from simple: at times he relies on and reinforces
Vergil’s authority; at times he draws out ambiguity
and paradox latent in Vergil’s work; and at times he
deliberately opposes Vergil’s artistic and ideological
programs.
We formulated five questions to frame our
analysis:
(1) How often does Lucan refer to the Aeneid?
(2) What kinds of reference does he make?
(3) Where in the Aeneid does he turn most often,
and for what kinds of references?
(4) How are these references distributed within
Lucan’s text?
(5) How do these results change our present
understanding of the relationship between
the BC and the Aeneid?
Fig. 1 Tesserae user interface.
2 Method
Tesserae considers two passages from different
poems to constitute a parallel if they share two or
more words. Results reported here combine the
output of two successive versions of the program,
an earlier one in which a passage was any six
consecutive words, and a later one which divided
the text into grammatical phrases based on editorial
punctuation. Word order and syntax were not con-
sidered. Word identity was judged not only by
the word’s form in the text but also by its dictionary
headword. We used the Archimedes Morphology
Service of the Max Planck Institute for the
History of Science (http://archimedes.mpiwg-
berlin.mpg.de/arch/doc/xml-rpc.html) to retrieve
headword information for our texts. Texts them-
selves were drawn from the Latin Library (http://
thelatinlibrary.com) and the Perseus Project
(http://www.perseus.tufts.edu).
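The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Tesserae implementation: the headword table `TOY_HEADWORDS` is a hypothetical stand-in for the Archimedes Morphology Service, the function names are invented for the example, and no stop-word filtering is applied because the article does not describe any. The two test lines are BC 1.676 and Aeneid 4.666, which the article quotes in Section 3.3.

```python
import re
from itertools import product

# Hypothetical toy headword table. The article retrieved headwords from the
# Archimedes Morphology Service; that service is not called here.
TOY_HEADWORDS = {
    "urbem": "urbs",
    "attonitam": "attonitus",
    "rapitur": "rapio",
    "concussam": "concutio",
    "bacchatur": "bacchor",
}


def tokens(text):
    """Lower-case the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def headword(token):
    """Return the dictionary headword, falling back to the surface form."""
    return TOY_HEADWORDS.get(token, token)


def windows(text, size=6):
    """Earlier program version: every run of `size` consecutive words."""
    words = tokens(text)
    if len(words) <= size:
        return [words]
    return [words[i:i + size] for i in range(len(words) - size + 1)]


def phrases(text):
    """Later program version: split at editorial punctuation."""
    units = (tokens(part) for part in re.split(r"[.,;:!?]", text))
    return [unit for unit in units if unit]


def parallels(source_units, target_units, minimum=2):
    """Pair up units that share at least `minimum` headwords.

    Word order and syntax are ignored, as in the article.
    """
    hits = []
    for a, b in product(source_units, target_units):
        shared = {headword(w) for w in a} & {headword(w) for w in b}
        if len(shared) >= minimum:
            hits.append((" ".join(a), " ".join(b), sorted(shared)))
    return hits


lucan = "attonitam rapitur matrona per urbem"    # BC 1.676
vergil = "concussam bacchatur Fama per urbem"    # Aeneid 4.666
result = parallels(phrases(lucan), phrases(vergil))
print(result)
```

Comparing every unit with every other unit is quadratic in the number of units; an inverted index from headword to unit would be the usual refinement for full-length texts.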
To explore the contact between Lucan and
Vergil, we examined a list of parallels between the
BC and the Aeneid. We concentrated our attention
on BC Book 1 (695 lines), considering parallels
found anywhere in the entirety of the Aeneid
(9,896 lines). We ran Tesserae on these texts, then
compared the results with a list of parallels collated
from four modern commentaries: Heitland and
Haskins (1887), Thompson and Bruère (1968),
Viansino (1995), and Roche (2009).
Each parallel identified either by the program or
by the commentators was examined individually
and given a type number between 1 and 5 according
to its literary significance. Although this was neces-
sarily a subjective procedure, we formulated a gen-
eral set of criteria for our classification (Table 1).
Fig. 2 Tesserae results.
The principal distinction was between meaningful
(Types 3–5) and not meaningful (Types 1 and 2)
parallels. This distinction follows the argument of
Thomas (1986, p. 117) that references either are or
are not ‘susceptible to interpretation or meaningful’.
The set of meaningful parallels was further
divided into those that simply reused distinctive
language, and those that in doing so created new
literary significance. Conte (1986, p. 31) proposed
that an earlier work could provide either a ‘code
model’ or an ‘exemplary model’ for a later one. In
the first case, the model as a whole defines the idiom
in which the later text speaks. In the second case, the
referring author directs the reader’s attention to a
particular moment in the earlier work. This distinc-
tion separates our Type 3 from Types 4 and 5. The
final distinction, between Types 4 and 5, less and
more significant allusions, was the most subjective.
Other schemas are possible, but ours proved useful
for broadly categorizing parallels to analyze the
large-scale questions posed above, to which we
now turn.
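The nested distinctions in this schema can be written down as a small lookup, which may help when tallying graded parallels. The sketch below only encodes what the text states (Types 3–5 meaningful, Types 4 and 5 interpretable, Type 3 corresponding to Conte's 'code model' and Types 4 and 5 to his 'exemplary model'); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParallelType:
    number: int
    meaningful: bool     # Types 3-5
    interpretable: bool  # Types 4 and 5


def describe(type_number: int) -> ParallelType:
    """Map a grade from 1 to 5 onto the article's two main distinctions."""
    if not 1 <= type_number <= 5:
        raise ValueError("type number must be between 1 and 5")
    return ParallelType(type_number, type_number >= 3, type_number >= 4)


def conte_model(type_number: int) -> Optional[str]:
    """Conte's category for meaningful parallels; None for Types 1 and 2."""
    graded = describe(type_number)
    if not graded.meaningful:
        return None
    return "exemplary model" if graded.interpretable else "code model"


for n in range(1, 6):
    print(n, describe(n), conte_model(n))
```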
3 Results
3.1 Numbers of parallels
The automated search returned a list of 3,100
parallels across all types, while the combined efforts
of the four commentaries produced 419 parallels of
Types 2–5. A comparison of results by type is given
in Table 2. The number of Type 3–5 parallels
returned by the program was comparable to the
work of the commentators, but the program re-
ported vastly more of Types 1 and 2 than did the
commentaries. These results show that, with manual
examination of the program’s output to filter out
false positives, our automated search can already
identify a significant portion of the parallels most
interesting to literary scholars. Comparing the
program to individual commentaries, we see
that for interpretable allusions (Types 4 and 5), it
reports 93 (57 of Type 4 and 36 of Type 5 in Table 2),
more than Viansino’s 48 but still fewer than
Roche’s 151.
These numbers tell only half the story, however.
Although Tesserae returned numbers of valuable
parallels at similar rates to the commentators, the
parallels themselves were often different from those
found by the commentators. Only half of the
interpretable allusions detected by Tesserae were
listed in the commentaries (Fig. 3). Thus, although
Tesserae returned only 25% of the commentators’
allusions, it also increased the total number of
allusions found by 25%.

Table 1 The schema used to grade parallels reported by Tesserae and the commentaries
Type 5 (meaningful; interpretable; more significant): high formal similarity to analogous context.
Type 4 (meaningful; interpretable; less significant): moderate formal similarity to analogous context, or high formal similarity in moderately analogous context.
Type 3 (meaningful; not interpretable): high/moderate formal similarity to very common phrase or words, or high/moderate formal similarity to no analogous context, or moderate formal similarity to moderate/highly analogous context.
Type 2 (not meaningful): very common words in very common phrase, or words too distant to form a phrase.
Type 1 (not meaningful): error in discovery algorithm, words should not have matched.

Table 2 All parallels reported by Tesserae and four commentaries, by type
Type    Tesserae   Commentaries                                     Total
                   All    Roche   Viansino   T and B   H and H
1       486        0      0       0          0         0            486
2       2,241      55     50      8          1         1            2,289
3       280        192    168     33         13        6            425
4       57         79     66      18         12        3            115
5       36         93     85      30         14        4            103
Total   3,100      419    369     89         40        14           3,418
The commentaries used were Roche (2009), Viansino (1995),
Thompson and Bruère (1968) and Heitland and Haskins
(1887). In adding columns, each unique parallel is only counted
once; combined totals may be less than the sum of individual
values.
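The overlap figures can be rechecked from the Type 4 and Type 5 rows of Table 2. The short calculation below is a sketch that assumes, as the table note states, that the Total column counts each unique parallel once, so that the number of parallels found by both Tesserae and the commentaries follows by inclusion-exclusion. The variable names are illustrative.

```python
# Type 4 and Type 5 counts copied from Table 2.
tesserae = {4: 57, 5: 36}
commentaries = {4: 79, 5: 93}   # 'All' commentaries column
pooled = {4: 115, 5: 103}       # 'Total' column: unique parallels only

tesserae_n = sum(tesserae.values())        # 93
commentary_n = sum(commentaries.values())  # 172
pooled_n = sum(pooled.values())            # 218

# Inclusion-exclusion: parallels reported by both sources.
shared = tesserae_n + commentary_n - pooled_n  # 47
# Parallels reported by Tesserae alone.
new = tesserae_n - shared                      # 46

print(f"shared / Tesserae:     {shared / tesserae_n:.2f}")    # about half
print(f"shared / commentaries: {shared / commentary_n:.2f}")  # about a quarter
print(f"new / commentaries:    {new / commentary_n:.2f}")     # about a quarter
```

On these counts, 47 of Tesserae's 93 interpretable parallels also appear in the commentaries, which is roughly 27% of the commentators' 172, and the 46 found only by Tesserae add roughly 27% to that figure, in line with the rounded 25% given in the text.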
3.2 Parallels by type
The most obvious difference between our auto-
mated search and the commentaries was the
number of less meaningful parallels returned.
Among the commentaries, there is already a trend
in this direction, with Roche (2009) surpassing his
predecessors in the number of Type 2 and 3 parallels
reported. Unlike the other commentators, Roche
examined only Book 1 of Lucan’s poem, effectively
concentrating his efforts. He also used digital
searches along with more traditional philological
tools. These methods enabled Roche to look
beyond the exemplary model allusions most familiar
to Latinists and begin to represent the level of code
model reference which underwrites Lucan’s posture
as an epic poet. Tesserae expands this perspective
considerably.
In what proportions does Lucan use the various
types of parallels? Combining results from Tesserae
and the commentators, we start to get a compre-
hensive picture of the author’s practice. The data
presented in Table 2 suggest that in BC 1, Lucan
relies on Vergil’s generic epic language about twice
as frequently as he alludes to specific passages in the
Aeneid.
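The "about twice" estimate follows from the Total column of Table 2, taking Type 3 as the generic (code model) references and Types 4 and 5 together as allusions to specific passages. The symbols $N_3$, $N_4$, and $N_5$ are introduced here only for the pooled counts of each type:

```latex
\[
\frac{N_{3}}{N_{4} + N_{5}} = \frac{425}{115 + 103} = \frac{425}{218} \approx 1.95
\]
```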
3.3 Parallels by location in source text
Lucan does not draw evenly from all books of the
Aeneid. Figure 4 shows the distribution of all parallels
in the Aeneid, by type. Although Lucan draws
relatively evenly on all books of the Aeneid for Type
3 parallels, he clearly favors certain books for Types
4 and 5. His most meaningful allusions are drawn
above all from Aeneid 2, followed by Books 4, 11,
and 3.
It is natural that, in presenting the destruction of
Rome as the major theme of the Roman civil war,
Lucan should draw upon Vergil’s portrayal of the
fall of Troy in Aeneid Book 2. Aeneid 11 describes
hard fighting and internal conflict in the Latin as-
sembly and is also thematically apropos. The choice
of Aeneid 4, the story of Dido, is less obvious.
Although Lucan uses material from this book for
several purposes, a significant complex of allusions
borrows notions of madness and ill rumor from the
Dido story to suggest ill-starred similarities between
Carthage and Rome. Thus, BC 1.676, attonitam
rapitur matrona per urbem (The [prophetic]
matron is swept through the awestruck city),
draws on Aeneid 4.666, concussam bacchatur Fama
per urbem (Rumor runs riot through the stunned
city) to suggest that Romans of the civil war
period were as mad and rumor driven as Dido
and her Carthaginians.
The wealth of allusions to the less-studied Book 3
of the Aeneid gives further clues to Lucan’s unique
reading of Vergil. One significant strand of Lucan’s
use of this book involves reversing its optimistic
prophecies of a new land for the Trojans in order
to suggest the woeful future in store for the Romans.
Thus, in a parallel identified only by Tesserae, Vergil
uses the image of Sicily’s separation from Italy at the
straits of Messina to foretell Aeneas’ successful jour-
ney to found Rome (Aeneid 3.418), an image Lucan
recalls and reverses when he depicts Sicily rejoined
to Italy in an eruption of Mt Aetna as a portent of
the coming war (BC 1.547).
Fig. 3 Types 4–5 parallels reported by Tesserae and four commentators. Tesserae returned a significant number of matches unremarked by commentators.
3.4 Parallels by location in the referring text
We combined the automated results with those
collated from commentaries to ask what large-scale
patterns could be seen in Lucan’s use of allusion
within his own poem. Figure 5 shows Type
3–5 parallels by location in BC 1. Again, the
baseline of code model references is relatively con-
stant, punctuated by clusters of more significant
allusions.
Lucan clusters significant references throughout
the opening and closing sections, and in establishing
the principal characters: at the outset of Lucan’s
text, where he sets out his theme and the artistic
program for the work; in the opening descriptions
of Caesar and Pompey, the principal belligerents;
and in the prophecy of the matrona, which closes
the book. In contrast, at the heart of Lucan’s praise
of Nero (Lines 39–59), we find a pause in references
to the Aeneid. Here, Lucan forgoes an obvious
opportunity to ennoble Nero by association with
the grandeur of the epic tradition, and instead
creates a prosaic tone that flattens what should be
the culmination of his praise.
Consideration of large-scale patterns also reveals
how Lucan uses allusion to structure his narrative.
He shows a tendency to cluster references at the
beginning and ending of sections. More specifically,
he often closes a section with a Vergilian allusion,
capped by his own pithy or moralizing statement.
The next scene then opens with a fresh allusion to
anchor and authorize it in the Vergilian tradition.
Thus, in the transition from Rome’s decline to
Caesar’s delay at the Rubicon (BC 1.178–205),
Lucan describes the prevalence of bribery (1.178)
using language from Vergil’s depiction of sinners
in the underworld (Aeneid 6.622). He closes the
scene with his own vision of avidity leading to war
(1.182), before opening his section on Caesar’s march
with several new references to the Aeneid. Lucan
draws on Vergil’s authority to bring density of
meaning to his transitions, yet he reserves the crucial
end of the section to finish with his own master
strokes.
Fig. 4 All Types 3–5 parallels by book in the Aeneid. [Bar chart: x-axis, Aeneid book number (1–12); y-axis, number of parallels (0–60); series for Types 3, 4, and 5.]
3.5 Lucan’s BC 1 and the Aeneid
Do the results of automated allusion detection
change our understanding of Lucan’s relationship
to the Aeneid? A full answer to this question will
require analysis of the remaining books of Lucan’s
epic, but our results provide some initial responses.
In existing scholarship, Lucan’s references to the
Aeneid have generally been taken as oppositional,
subverting the imagery and language of Rome’s
founding to suggest that the construction of
empire inevitably becomes a corrupt enterprise.
Our study supports this picture, but also adds im-
portant detail. The constancy of Type 3 parallels
shows to what degree Lucan relied on Vergil even
for the basic idiom of epic. At the same time, Lucan
uses allusions to frame scenes, uses clusters of allu-
sions to different themes within Vergil’s poem, and
shifts markedly from allusions to the Aeneid in favor
of allusions to other works in his praise of Nero.
These gestures all represent distinctive patterns in
Lucan’s large-scale use of meaningful allusions for
artistic effect.
Fig. 5 All Types 3–5 parallels by line in the BC, Book 1. [Plot: x-axis, BC line number (0–700); y-axis, parallel type (3–5). Sections marked along the line axis: proem and apostrophe to Rome; praise of Nero; causes of war; description of Caesar and Pompey; Rome’s moral decline; Caesar at the Rubicon; panic at Ariminum; speech of Curio; speech of Caesar; speech of Laelius; list of Gallic tribes unguarded; evacuation of Rome; prodigies; purification of Rome; Figulus’ astrology; matrona’s prophecy.]
4 Future Work
The process of evaluating each of the 3,000 results
collected by the Tesserae program and commentators
has created a benchmark set of parallels, including
positive and negative examples, for training and test-
ing future algorithms. It has also given us insight into
which new feature sets would allow us to capture the
greatest number of allusions currently missed by the
program. Among these are the ability to match syno-
nyms and the ability to match paragraph-level con-
text, both of which seem to turn on semantics.
Although sound-based allusions were not prevalent
among the current test set, other examples have con-
vinced us that the ability to match on character-level
similarities and metrical shape would bring in add-
itional high-grade allusions. As such feature sensitiv-
ity is incorporated, automatic detection of allusion,
and of style and theme generally, will increasingly
come to replicate the results of traditional scholar-
ship and open up further new perspectives on literary
meaning and artistry.
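As an illustration of what matching on character-level similarity could look like, the sketch below scores two lines by the overlap of their character trigrams (Jaccard similarity). This is one common measure chosen for the example, not a method the article commits to; the function names and the choice of trigrams are assumptions.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams over letters and spaces, lower-cased."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a, b, n=3):
    """Shared n-grams divided by all distinct n-grams in the two texts."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


lucan = "attonitam rapitur matrona per urbem"    # BC 1.676
vergil = "concussam bacchatur Fama per urbem"    # Aeneid 4.666
score = jaccard(lucan, vergil)
print(f"{score:.2f}")
```

A score of this kind rewards shared endings and sound patterns as well as shared words, so in practice it would need a threshold tuned against the graded benchmark set.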
Funding
This work was supported by funding from the
Digital Humanities Initiative at Buffalo for its
Textual Analysis Working Group; and from the
University at Buffalo’s Department of Classics.
References
Bamman, D. and Crane, G. (2008). The Logic and
Discovery of Textual Allusion, Proceedings of the Second
Workshop on Language Technology for Cultural Heritage
Data (LaTeCH 2008). Marrakesh, Morocco.
Büchler, M., Geßner, A., Eckart, T., and Heyer, G.
(2010). Unsupervised detection and visualisation of
textual reuse on ancient Greek texts. Journal of the
Chicago Colloquium on Digital Humanities and
Computer Science, 1(2).
Conte, G. B. (1986). The Rhetoric of Imitation: Genre and
Poetic Memory in Virgil and Other Latin Poets. Ithaca,
NY: Cornell University Press.
Edmunds, L. (2001). Intertextuality and the Reading of
Roman Poetry. Baltimore, MD: Johns Hopkins
University Press.
Heitland, W. E. and Haskins, C. E. (1887). M. Annaei
Lucani Pharsalia. London: G. Bell.
Hinds, S. (1998). Allusion and Intertext: The Dynamics of
Appropriation in Roman Poetry. New York: Cambridge
University Press.
Horton, R., Olsen, M., and Roe, G. (2010). Something
borrowed: sequence alignment and the identification
of similar passages in large text collections. Digital
Studies / Le champ numérique, 2(1).
Roche, P. (2009). Lucan: De Bello Civili: Book 1. Oxford:
Oxford University Press.
Thomas, R. F. (1986). Virgil’s Georgics and the art of
reference. Harvard Studies in Classical Philology, 90:
171–98.
Thompson, L. and Bruère, R. T. (1968). Lucan’s use of
Vergilian reminiscence. Classical Philology, 63: 1–21.
Viansino, G. (1995). Marco Annaeo Lucano: La Guerra
Civile Volume 1 Libri I-V. Milan: Mondadori.