The Tesserae Project: intertextual analysis of Latin poetry
Neil Coffee, Jean-Pierre Koenig, Shakthi Poornima,
Christopher W. Forstall, Roelant Ossewaarde and
Sarah L. Jacobson
The University at Buffalo, The State University of New York, USA
Abstract
Tesserae is a web-based tool for automatically detecting allusions in Latin poetry. Although still in the start-up phase, it is already capable of identifying significant numbers of known allusions, as well as similar numbers of allusions previously unnoticed by scholars. In this article, we use the tool to examine allusions to Vergil’s Aeneid in the first book of Lucan’s Civil War. Approximately 3,000 linguistic parallels returned by the program were compared with a list of known allusions drawn from commentaries. Each was examined individually and graded for its literary significance in order to benchmark the program’s performance. All allusions from the program and commentaries were then pooled to examine broad patterns in Lucan’s allusive technique that were largely unapproachable without digital methods. Although Lucan draws relatively constantly on Vergil’s generic language to maintain the epic idiom, this baseline is punctuated by clusters of pointed allusions in which Lucan frequently subverts Vergil’s original meaning. These clusters not only attend the most significant characters and events but also help structure scene transitions. Work is under way to add the ability to match on word meaning, phrase context, and metrical and phonological features to future versions of the program.
1 Introduction
The study of allusion has grown to become a core
interest of classical—particularly Latin—literary
studies over the past several decades. Beyond
simply documenting instances of textual reuse,
scholars such as Conte (1986), Hinds (1998), and
Edmunds (2001) have enlarged the scope in which
allusion is understood to create meaning and pre-
sented several theoretical models for how allusion is
both written and read.
A number of recent digital humanities projects
have examined various aspects of text reuse.
Bamman and Crane (2008) presented a model for identifying allusions based on multiple parameters and detailed their methods for measuring the similarity of two texts in words, word order, and syntax. Horton et al. (2010) created an algorithm for detecting text reuse in French and other languages, based solely on string similarity, which they have released under an open-source licence. Büchler et al. (2010) examined larger-scale patterns of text reuse in the treatment of Plato by later Greek authors. Tesserae draws on these and other projects for models, yet distinguishes itself as an integrated effort to develop allusion-detection software, undertake detailed case studies, and bring the understanding of allusion to a non-specialist audience.
Correspondence: Neil Coffee, Department of Classics, 338 MFAC, University at Buffalo, Buffalo, NY 14261-0026, USA. E-mail: ncoffee@buffalo.edu
Literary and Linguistic Computing © The Author 2012. Published by Oxford University Press on behalf of ALLC. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqs033
Literary and Linguistic Computing Advance Access published July 20, 2012
Users of Tesserae’s web-based interface select two
texts from simple drop-down lists (Fig. 1). A list of
parallel phrases is then returned (Fig. 2); this may
be downloaded as an XML document or a list of
comma-separated values. The current version of the
program is already online and freely accessible
(http://tesserae.caset.buffalo.edu) and has received
positive feedback from practising scholars of Latin
allusion, including writers of textual commentaries
who customarily note allusions.
In the remainder of this article, we present some
preliminary results from our application of the cur-
rent version of the search tool to a case study of
the Roman poet Lucan. Lucan, a poet of the time of Nero, left unfinished at his death an 8,000-line epic on Rome’s civil war known as the Bellum Civile (BC). In writing such an
epic, it would have been impossible for Lucan to
avoid comparison with the figure of Vergil,
approximately 100 years his senior, whose monu-
mental work, the Aeneid, had already become a clas-
sic. Lucan’s relationship with his predecessor is far
from simple: at times he relies on and reinforces
Vergil’s authority; at times he draws out ambiguity
and paradox latent in Vergil’s work; and at times he
deliberately opposes Vergil’s artistic and ideological
programs.
We formulated five questions to frame our
analysis:
(1) How often does Lucan refer to the Aeneid?
(2) What kinds of reference does he make?
(3) Where in the Aeneid does he turn most often,
and for what kinds of references?
(4) How are these references distributed within
Lucan’s text?
(5) How do these results change our present
understanding of the relationship between
the BC and the Aeneid?
Fig. 1 Tesserae user interface.
2 Method
Tesserae considers two passages from different poems to constitute a parallel if they share two or more words. Results reported here combine the output of two successive versions of the program: an earlier one in which a passage was any six consecutive words, and a later one that divided the text into grammatical phrases based on editorial punctuation. Word order and syntax were not considered. Word identity was judged not only by the word’s form in the text but also by its dictionary headword. We used the Archimedes Morphology Service of the Max Planck Institute for the History of Science (http://archimedes.mpiwg-berlin.mpg.de/arch/doc/xml-rpc.html) to retrieve headword information for our texts. The texts themselves were drawn from the Latin Library (http://thelatinlibrary.com) and the Perseus Project (http://www.perseus.tufts.edu).
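The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Tesserae source code. The lemma table `LEMMAS` is a hypothetical stand-in for a morphological service such as Archimedes, and the helper names (`window_units`, `phrase_units`, `find_parallels`) are our own. Units are either six-word windows (the earlier scheme) or punctuation-delimited phrases (the later one). Two units count as a parallel when they share at least two headwords, regardless of word order.

```python
import re

# Hypothetical lemma table standing in for a morphological service such as
# Archimedes; forms not listed fall back to themselves.
LEMMAS = {"urbem": "urbs", "urbe": "urbs", "urbis": "urbs",
          "rapitur": "rapio", "bacchatur": "bacchor"}


def tokenize(text):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())


def lemma(token):
    """Headword for a token, or the token itself if it is unknown."""
    return LEMMAS.get(token, token)


def window_units(text, size=6):
    """Earlier scheme: every run of `size` consecutive words is a unit."""
    tokens = tokenize(text)
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def phrase_units(text):
    """Later scheme: units are the stretches between punctuation marks."""
    pieces = (tokenize(p) for p in re.split(r"[.,;:!?]", text))
    return [p for p in pieces if p]


def find_parallels(source_units, target_units, min_shared=2):
    """Return (i, j, shared) for unit pairs sharing >= min_shared headwords.
    Word order and syntax are ignored."""
    results = []
    for i, src in enumerate(source_units):
        src_heads = {lemma(w) for w in src}
        for j, tgt in enumerate(target_units):
            shared = src_heads & {lemma(w) for w in tgt}
            if len(shared) >= min_shared:
                results.append((i, j, shared))
    return results


lucan = "attonitam rapitur matrona per urbem"   # BC 1.676
vergil = "concussam bacchatur Fama per urbem"   # Aeneid 4.666
print(find_parallels(phrase_units(lucan), phrase_units(vergil)))
# one parallel, unit 0 against unit 0, sharing the headwords 'per' and 'urbs'
```

Because `lemma` falls back to the surface form, words missing from the table can still match on form. A real lemmatizer must also handle ambiguous forms that map to several headwords, which this sketch ignores.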
To explore the contact between Lucan and
Vergil, we examined a list of parallels between the
BC and the Aeneid. We concentrated our attention
on BC Book 1 (695 lines), considering parallels
found anywhere in the entirety of the Aeneid
(9,896 lines). We ran Tesserae on these texts, then
compared the results with a list of parallels collated
from four modern commentaries: Heitland and Haskins (1887), Thompson and Bruère (1968), Viansino (1995), and Roche (2009).
Fig. 2 Tesserae results.
Each parallel identified either by the program or by the commentators was examined individually and given a type number between 1 and 5 according to its literary significance. Although this was necessarily a subjective procedure, we formulated a general set of criteria for our classification (Table 1). The principal distinction was between meaningful (Types 3–5) and not meaningful (Types 1 and 2) parallels. This distinction follows the argument of Thomas (1986, p. 117) that references either are or are not ‘susceptible to interpretation or meaningful’. The set of meaningful parallels was further
divided into those that simply reused distinctive
language, and those that in doing so created new
literary significance. Conte (1986, p. 31) proposed
that an earlier work could provide either a ‘code
model’ or an ‘exemplary model’ for a later one. In
the first case, the model as a whole defines the idiom
in which the later text speaks. In the second case, the
referring author directs the reader’s attention to a
particular moment in the earlier work. This distinction separates our Type 3 from Types 4 and 5. The final distinction, between Types 4 and 5 (less and more significant allusions), was the most subjective.
Other schemas are possible, but ours proved useful
for broadly categorizing parallels to analyze the
large-scale questions posed above, to which we
now turn.
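The grading scheme in Table 1 works as a short decision tree. Thomas’s distinction comes first, then Conte’s, then the subjective split between Types 4 and 5. A hypothetical Python encoding makes the nesting explicit. The function names `classify` and `is_meaningful` and the label strings are ours, not the project’s.

```python
def classify(type_number):
    """Map a grade from Table 1 to the chain of distinctions behind it."""
    if type_number not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown type: {type_number}")
    if type_number == 1:
        return ("not meaningful", "false match (algorithm error)")
    if type_number == 2:
        return ("not meaningful", "very common or too distant words")
    if type_number == 3:
        # Conte's 'code model': reuse of the epic idiom, no pointed reference
        return ("meaningful", "not interpretable (code model)")
    # Types 4 and 5: Conte's 'exemplary model', split by significance
    significance = "more significant" if type_number == 5 else "less significant"
    return ("meaningful", "interpretable (exemplary model)", significance)


def is_meaningful(type_number):
    """Thomas's distinction: Types 3-5 versus Types 1-2."""
    return classify(type_number)[0] == "meaningful"


grades = [1, 2, 2, 3, 4, 5]
print(sum(is_meaningful(g) for g in grades))  # 3
```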
3 Results
3.1 Numbers of parallels
The automated search returned a list of 3,100 parallels across all types, while the combined efforts of the four commentaries produced 419 parallels of Types 2–5. A comparison of results by type is given in Table 2. The number of Type 3–5 parallels returned by the program (373) was comparable to the number found by the commentators (364), but the program reported vastly more of Types 1 and 2 than did the commentaries. These results show that, with manual examination of the program’s output to filter out false positives, our automated search can already identify a significant portion of the parallels most interesting to literary scholars. Comparing the program to individual commentaries, we see that for interpretable allusions (Types 4 and 5), it reports 93 to Viansino’s 48, but still fewer than Roche’s 151.
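The Types 4–5 counts per source can be checked directly against Table 2 by summing the Type 4 and Type 5 rows for each source. This is a minimal sketch: the dictionaries simply transcribe those two rows, and the variable names are ours.

```python
# Types 4 and 5 rows of Table 2, per source (Tesserae and each commentary).
TYPE_4 = {"Tesserae": 57, "Roche": 66, "Viansino": 18,
          "Thompson and Bruère": 12, "Heitland and Haskins": 3}
TYPE_5 = {"Tesserae": 36, "Roche": 85, "Viansino": 30,
          "Thompson and Bruère": 14, "Heitland and Haskins": 4}

# Interpretable allusions = Type 4 + Type 5 for each source.
interpretable = {src: TYPE_4[src] + TYPE_5[src] for src in TYPE_4}

for src, n in sorted(interpretable.items(), key=lambda kv: -kv[1]):
    print(f"{src}: {n}")
# Roche: 151
# Tesserae: 93
# Viansino: 48
# Thompson and Bruère: 26
# Heitland and Haskins: 7
```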
These numbers tell only half the story, however. Although Tesserae returned valuable parallels at rates similar to the commentators’, the parallels themselves were often different from those found by the commentators. Only half of the interpretable allusions detected by Tesserae were listed in the commentaries (Fig. 3). Thus, although Tesserae returned only about 25% of the commentators’ allusions, it also increased the total number of allusions found by about 25%.
Table 1 The schema used to grade parallels reported by Tesserae and the commentaries
Type 5 (meaningful; interpretable; more significant): high formal similarity to analogous context
Type 4 (meaningful; interpretable; less significant): moderate formal similarity to analogous context, or high formal similarity in moderately analogous context
Type 3 (meaningful; not interpretable): high/moderate formal similarity to very common phrase or words, or high/moderate formal similarity to no analogous context, or moderate formal similarity to moderate/highly analogous context
Type 2 (not meaningful): very common words in very common phrase, or words too distant to form a phrase
Type 1 (not meaningful): error in discovery algorithm; words should not have matched
Table 2 All parallels reported by Tesserae and four commentaries, by type
Type   Tesserae  Comm. (all)  Roche  Viansino  T and B  H and H  Total
1           486            0      0         0        0        0    486
2         2,241           55     50         8        1        1  2,289
3           280          192    168        33       13        6    425
4            57           79     66        18       12        3    115
5            36           93     85        30       14        4    103
Total     3,100          419    369        89       40       14  3,418
Comm. (all) = parallels reported by any of the four commentaries used: Roche (2009), Viansino (1995), Thompson and Bruère (1968) (T and B), and Heitland and Haskins (1887) (H and H). In adding columns, each unique parallel is counted only once, so combined totals may be less than the sum of individual values.
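Because the Total column of Table 2 counts each unique parallel once, the number of Types 4–5 allusions found by both Tesserae and the commentaries follows from inclusion–exclusion. The sketch below checks the proportions quoted in the text. The function name `overlap_stats` and its keys are ours, and the inputs are the Types 4–5 figures from Table 2.

```python
def overlap_stats(found_a, found_b, union):
    """Inclusion-exclusion: items found by both = |A| + |B| - |A or B|."""
    both = found_a + found_b - union
    return {
        "both": both,
        "share_of_a_in_b": both / found_a,
        "share_of_b_recovered_by_a": both / found_b,
        "gain_over_b": (found_a - both) / found_b,
    }


# Types 4-5 counts from Table 2: Tesserae, all commentaries, combined total.
stats = overlap_stats(found_a=57 + 36, found_b=79 + 93, union=115 + 103)
for key, value in stats.items():
    print(key, round(value, 2))
# both 47
# share_of_a_in_b 0.51
# share_of_b_recovered_by_a 0.27
# gain_over_b 0.27
```

The exact figures, about 51% and 27%, correspond to the rounded “half” and “25%” used in the prose.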
3.2 Parallels by type
The most obvious difference between our auto-
mated search and the commentaries was the
number of less meaningful parallels returned.
Among the commentaries, there is already a trend
in this direction, with Roche (2009) surpassing his
predecessors in the number of Type 2 and 3 parallels
reported. Unlike the other commentators, Roche
examined only Book 1 of Lucan’s poem, effectively
concentrating his efforts. He also used digital
searches along with more traditional philological
tools. These methods enabled Roche to look
beyond the exemplary model allusions most familiar
to Latinists and begin to represent the level of code
model reference which underwrites Lucan’s posture
as an epic poet. Tesserae expands this perspective
considerably.
In what proportions does Lucan use the various
types of parallels? Combining results from Tesserae
and the commentators, we start to get a compre-
hensive picture of the author’s practice. The data
presented in Table 2 suggest that in BC 1, Lucan
relies on Vergil’s generic epic language about twice
as frequently as he alludes to specific passages in the
Aeneid.
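As a worked check on the “about twice” estimate, let N_k denote the number of unique Type k parallels in the Total column of Table 2 (notation ours). Divide the Type 3 count (generic, code-model language) by the combined Type 4 and 5 counts (specific, exemplary allusions):

```latex
\frac{N_3}{N_4 + N_5} = \frac{425}{115 + 103} = \frac{425}{218} \approx 1.95
```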
3.3 Parallels by location in source text
Lucan does not draw evenly from all books of the
Aeneid. Figure 4 shows the distribution of all Type 3–5 parallels in the Aeneid, by type. Although Lucan draws
relatively evenly on all books of the Aeneid for Type
3 parallels, he clearly favors certain books for Types
4 and 5. His most meaningful allusions are drawn
above all from Aeneid 2, followed by Books 4, 11,
and 3.
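The tallies behind Figure 4 amount to grouping each parallel by the book of its Aeneid source and by type. This sketch assumes parallels are stored as hypothetical `(book, line, type)` tuples. The function names are ours, and the sample records are made up for illustration. They are not the study’s data.

```python
from collections import Counter, defaultdict


def tally_by_book(parallels):
    """Count Types 3-5 parallels per Aeneid book, split by type.

    Each parallel is a (book, line, type) tuple locating its Aeneid source.
    """
    table = defaultdict(Counter)
    for book, _line, ptype in parallels:
        if 3 <= ptype <= 5:
            table[book][ptype] += 1
    return table


def rank_books(table, types=(4, 5)):
    """Books ordered by how many parallels of `types` they supply."""
    totals = {book: sum(counts[t] for t in types) for book, counts in table.items()}
    return sorted(totals, key=lambda book: (-totals[book], book))


# Made-up records for illustration only; these are not the study's data.
sample = [(2, 100, 5), (2, 250, 4), (4, 300, 4), (3, 50, 5), (7, 10, 3), (1, 5, 2)]
table = tally_by_book(sample)
print(rank_books(table))  # [2, 3, 4, 7]
```

Type 1 and 2 records are skipped, matching the figure’s restriction to Types 3–5. Ties are broken by book number.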
It is natural that, in presenting the destruction of
Rome as the major theme of the Roman civil war,
Lucan should draw upon Vergil’s portrayal of the
fall of Troy in Aeneid Book 2. Aeneid 11 describes
hard fighting and internal conflict in the Latin assembly and is also thematically apropos. The choice of Aeneid 4, the story of Dido, is less obvious. Although Lucan uses material from this book for several purposes, a significant complex of allusions borrows notions of madness and ill rumor from the
Dido story to suggest ill-starred similarities between
Carthage and Rome. Thus, BC 1.676, attonitam
rapitur matrona per urbem (The [prophetic]
matron is swept through the awestruck city),
draws on Aeneid 4.666, concussam bacchatur Fama
per urbem (Rumor runs riot through the stunned
city) to suggest that Romans of the civil war
period were as mad and rumor driven as Dido
and her Carthaginians.
The wealth of allusions to the less-studied Book 3
of the Aeneid gives further clues to Lucan’s unique
reading of Vergil. One significant strand of Lucan’s
use of this book involves reversing its optimistic
prophecies of a new land for the Trojans in order
to suggest the woeful future in store for the Romans.
Thus, in a parallel identified only by Tesserae, Vergil
uses the image of Sicily’s separation from Italy at the
straits of Messina to foretell Aeneas’ successful journey to found Rome (Aeneid 3.418), an image Lucan
recalls and reverses when he depicts Sicily rejoined
to Italy in an eruption of Mt Aetna as a portent of
the coming war (BC 1.547).
3.4 Parallels by location in the referring text
We combined the automated results with those collated from commentaries to ask what large-scale patterns could be seen in Lucan’s use of allusion within his own poem. Figure 5 shows Type 3–5 parallels by location in BC 1. Again, the baseline of code model references is relatively constant, punctuated by clusters of more significant allusions.
Fig. 3 Types 4–5 parallels reported by Tesserae and four commentators. Tesserae returned a significant number of matches unremarked by commentators.
Lucan clusters significant references throughout
the opening and closing sections, and in establishing
the principal characters: at the outset of Lucan’s
text, where he sets out his theme and the artistic
program for the work; in the opening descriptions
of Caesar and Pompey, the principal belligerents;
and in the prophecy of the matrona, which closes
the book. In contrast, at the heart of Lucan’s praise
of Nero (Lines 39–59), we find a pause in references
to the Aeneid. Here, Lucan forgoes an obvious
opportunity to ennoble Nero by association with
the grandeur of the epic tradition, and instead
creates a prosaic tone that flattens what should be
the culmination of his praise.
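The pattern of a steady baseline punctuated by clusters can be made operational with a simple gap rule. This is a minimal sketch that assumes the input is a list of BC line numbers for Types 4–5 allusions. The function name and the thresholds `max_gap` and `min_size` are our own choices, not parameters used in the study. The sample positions are illustrative only.

```python
def find_clusters(lines, max_gap=5, min_size=3):
    """Group sorted line numbers into runs whose consecutive gaps are at most
    `max_gap` lines; report runs with at least `min_size` members as
    (first_line, last_line, count)."""
    clusters, current = [], []
    for n in sorted(lines):
        if current and n - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(n)
    if len(current) >= min_size:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Illustrative positions only, not the study's data.
sample = [1, 3, 4, 7, 40, 120, 122, 125, 300]
print(find_clusters(sample))  # [(1, 7, 4), (120, 125, 3)]
```

A stretch with no reported clusters, like the praise of Nero described above, shows up as a long gap between runs.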
Consideration of large-scale patterns also reveals
how Lucan uses allusion to structure his narrative.
He shows a tendency to cluster references at the
beginning and ending of sections. More specifically,
he often closes a section with a Vergilian allusion,
capped by his own pithy or moralizing statement.
The next scene then opens with a fresh allusion to
anchor and authorize it in the Vergilian tradition.
Thus, in the transition from Rome’s decline to Caesar’s delay at the Rubicon (BC 1.178–205), Lucan describes the prevalence of bribery (1.178) using language from Vergil’s depiction of sinners in the underworld (Aeneid 6.622). He closes the scene with his own vision of avidity leading to war (1.182), before opening his section on Caesar’s march with several new references to the Aeneid. Lucan draws on Vergil’s authority to bring density of meaning to his transitions, yet he reserves the crucial end of the section to finish with his own master strokes.
Fig. 4 All Types 3–5 parallels by book in the Aeneid (x-axis: Aeneid book number, 1–12; y-axis: number of parallels; Types 3, 4, and 5 shown separately).
3.5 Lucan’s BC 1 and the Aeneid
Do the results of automated allusion detection
change our understanding of Lucan’s relationship
to the Aeneid? A full answer to this question will
require analysis of the remaining books of Lucan’s
epic, but our results provide some initial responses.
In existing scholarship, Lucan’s references to the
Aeneid have generally been taken as oppositional,
subverting the imagery and language of Rome’s
founding to suggest that the construction of
empire inevitably becomes a corrupt enterprise.
Our study supports this picture but also adds important detail. The constancy of Type 3 parallels shows to what degree Lucan relied on Vergil even for the basic idiom of epic. At the same time, Lucan uses allusions to frame scenes, uses clusters of allusions to different themes within Vergil’s poem, and
shifts markedly from allusions to the Aeneid in favor
of allusions to other works in his praise of Nero.
These gestures all represent distinctive patterns in
Lucan’s large-scale use of meaningful allusions for
artistic effect.
Fig. 5 All Types 3–5 parallels by line in the BC, Book 1 (x-axis: BC line number, 0–700; y-axis: parallel type, 3–5). Sections marked along the line axis: proem and apostrophe to Rome; praise of Nero; causes of war; description of Caesar and Pompey; Rome’s moral decline; Caesar at the Rubicon; panic at Ariminum; speech of Curio; speech of Caesar; speech of Laelius; list of Gallic tribes unguarded; evacuation of Rome; prodigies; purification of Rome; Figulus’ astrology; matrona’s prophecy.
4 Future Work
The process of evaluating each of the roughly 3,400 results collected by the Tesserae program and the commentators has created a benchmark set of parallels, including positive and negative examples, for training and testing future algorithms. It has also given us insight into which new feature sets would allow us to capture the greatest number of allusions currently missed by the program. Among these are the ability to match synonyms and the ability to match paragraph-level context, both of which seem to turn on semantics. Although sound-based allusions were not prevalent in the current test set, other examples have convinced us that the ability to match on character-level similarities and metrical shape would bring in additional high-grade allusions. As such feature sensitivity is incorporated, automatic detection of allusion, and of style and theme generally, will increasingly come to replicate the results of traditional scholarship and open up further new perspectives on literary meaning and artistry.
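One way to picture the proposed synonym and character-level features is as extensions of the same two-word threshold used in the current matching rule. The sketch below is purely illustrative. The synonym table is hypothetical and the function names are ours. Character-bigram overlap is only a crude proxy for phonological similarity, not a method the project has described.

```python
# Hypothetical synonym classes keyed by headword; a real system might derive
# them from a dictionary, a thesaurus, or distributional word vectors.
SYNONYMS = {"ensis": "SWORD", "gladius": "SWORD", "ferrum": "SWORD",
            "urbs": "CITY", "moenia": "CITY"}


def concept(headword):
    """Synonym class for a headword, or the headword itself."""
    return SYNONYMS.get(headword, headword)


def is_semantic_parallel(unit_a, unit_b, min_shared=2):
    """Same two-word threshold as before, applied to synonym classes."""
    shared = {concept(w) for w in unit_a} & {concept(w) for w in unit_b}
    return len(shared) >= min_shared


def char_bigrams(word):
    """Set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def sound_similarity(w1, w2):
    """Jaccard overlap of character bigrams: a crude stand-in for sound."""
    a, b = char_bigrams(w1), char_bigrams(w2)
    return len(a & b) / len(a | b) if a | b else 0.0


# Headword-level units: no shared headwords, but two shared concepts.
print(is_semantic_parallel(["ensis", "urbs", "cado"], ["gladius", "moenia", "ruo"]))  # True
print(sound_similarity("urbem", "urbe"))  # 0.75
```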
Funding
This work was supported by funding from the
Digital Humanities Initiative at Buffalo for its
Textual Analysis Working Group; and from the
University at Buffalo’s Department of Classics.
References
Bamman, D. and Crane, G. (2008). The logic and discovery of textual allusion. In Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). Marrakesh, Morocco.
Büchler, M., Geßner, A., Eckart, T., and Heyer, G. (2010). Unsupervised detection and visualisation of textual reuse on ancient Greek texts. Journal of the Chicago Colloquium on Digital Humanities and Computer Science, 1(2).
Conte, G. B. (1986). The Rhetoric of Imitation: Genre and Poetic Memory in Virgil and Other Latin Poets. Ithaca, NY: Cornell University Press.
Edmunds, L. (2001). Intertextuality and the Reading of Roman Poetry. Baltimore, MD: Johns Hopkins University Press.
Heitland, W. E. and Haskins, C. E. (1887). M. Annaei Lucani Pharsalia. London: G. Bell.
Hinds, S. (1998). Allusion and Intertext: The Dynamics of Appropriation in Roman Poetry. New York: Cambridge University Press.
Horton, R., Olsen, M., and Roe, G. (2010). Something borrowed: sequence alignment and the identification of similar passages in large text collections. Digital Studies / Le champ numérique, 2(1).
Roche, P. (2009). Lucan: De Bello Civili: Book 1. Oxford: Oxford University Press.
Thomas, R. F. (1986). Virgil’s Georgics and the art of reference. Harvard Studies in Classical Philology, 90: 171–98.
Thompson, L. and Bruère, R. T. (1968). Lucan’s use of Vergilian reminiscence. Classical Philology, 63: 1–21.
Viansino, G. (1995). Marco Annaeo Lucano: La Guerra Civile, Volume 1, Libri I–V. Milan: Mondadori.
N. Coffee et al.
8of 8Literary and Linguistic Computing, 2012
by guest on July 28, 2012http://llc.oxfordjournals.org/Downloaded from
... Traditionally, the identification of such parallels has largely relied on scholars' close reading. However, recent years have seen the development of statistical NLP tools -driven especially by the Tesserae project (Coffee et al., 2012; at the forefront of this movement -that are able to automatically uncover a considerable number of textual parallels. These approaches, however, typically rely on string-level parallels and are grounded in carefully designed rules and scoring functions. ...
... Detecting Intertextual Allusions. Initiated in 2008, the Tesserae project (Coffee et al., 2012; has been instrumental in advancing the automatic detection of intertextuality in Latin and Greek texts. Their open-source tools have seen numerous enhancements and refinements over the years. ...
... Their open-source tools have seen numerous enhancements and refinements over the years. 2 Existing research has explored matching words or stems (Coffee et al., 2012) as well as methods that focus on semantics (Scheirer et al., 2014). Additionally, techniques that combine both lexical and semantic elements have been examined, where semantic understanding is established through word embeddings (Manjavacas et al., 2019) or via the (Ancient Greek) WordNet (Bizzoni et al., 2014). ...
Preprint
Intertextual allusions hold a pivotal role in Classical Philology, with Latin authors frequently referencing Ancient Greek texts. Until now, the automatic identification of these intertextual references has been constrained to monolingual approaches, seeking parallels solely within Latin or Greek texts. In this study, we introduce SPhilBERTa, a trilingual Sentence-RoBERTa model tailored for Classical Philology, which excels at cross-lingual semantic comprehension and identification of identical sentences across Ancient Greek, Latin, and English. We generate new training data by automatically translating English texts into Ancient Greek. Further, we present a case study, demonstrating SPhilBERTa's capability to facilitate automated detection of intertextual parallels. Our models and resources are available at https://github.com/Heidelberg-NLP/ancient-language-models.
... 3 https://tesserae.caset.buffalo.edu/. See also Coffee et al. (2013). 4 https://github.com/ARTFL-Project/text-pair/. ...
Article
Full-text available
Text reuse, encompassing direct citations, paraphrases and allusions, represents a key aspect of intertextuality – a concept central to literary theory since the 1960s. This paper highlights how computational methods, particularly automatic text-reuse detection, can illuminate the complex system of intertextual exchange that informs 18th-century literary culture, focusing on significant works like the Encyclopédie and Voltaire's correspondence. By employing advanced techniques such as sequence alignment and social network analysis, we uncover hidden patterns of influence, citation strategies and the subtle interplay between originality and imitation in Enlightenment literature. The paper also considers the implications of these findings for modern understandings of authorship, originality and textuality, drawing connections to contemporary digital humanities practices. The paper ultimately aims to recontextualise the Enlightenment as a period of intense intertextual productivity, where the reuse of texts was not merely a scholarly exercise but a dynamic and essential component of literary creation.
... 11 https://www.etrap.eu/research/tracer/. See alsoBüchler et al. (2014) andFranzini et al. (2019). 12 https://github.com/tesserae/tesserae/. See alsoCoffee et al. (2013). 13 https://github.com/dasmiq/passim/. ...
Article
Full-text available
The European Research Council's ModERN project (Modelling Enlightenment: reassembling networks of modernity through data-driven research) is a pioneering five-year research initiative. This programme seeks to redefine the conventional understanding of 18th-century literary history by employing advanced data-modelling and analysis techniques. By developing a comprehensive corpus of 18th-century French texts and leveraging a range of data-science methodologies such as text-reuse detection and network analysis, the project aims to uncover novel research avenues and provide fresh insights into early-modern French print culture and its intertextual dynamics.In this report, we discuss some theoretical points underlying our research; we explain the choices made in constructing our corpus and their implications; and we present some case studies to show the potential of our research and the most prudent methodologies to adopt.
... In Coffee et al. (2012), the authors specifically sought to detect textual allusions by identifying shared words between two texts. They found this method was able to identify previously uncatalogued passages that may contain allusions. ...
... In Coffee et al. (2012), the authors specifically sought to detect textual allusions by identifying shared words between two texts. They found this method was able to identify previously uncatalogued passages that may contain allusions. ...
Preprint
Full-text available
This study explores the potential of large language models (LLMs) for identifying and examining intertextual relationships within biblical, Koine Greek texts. By evaluating the performance of LLMs on various intertextuality scenarios the study demonstrates that these models can detect direct quotations, allusions, and echoes between texts. The LLM's ability to generate novel intertextual observations and connections highlights its potential to uncover new insights. However, the model also struggles with long query passages and the inclusion of false intertextual dependences, emphasizing the importance of expert evaluation. The expert-in-the-loop methodology presented offers a scalable approach for intertextual research into the complex web of intertextuality within and beyond the biblical corpus.
... Therefore, we undertook the task of pre-detecting text reuse within the corpus in advance and incorporated the results into the platform. Unlike platforms that provide online services for text reuse browsing (Sturgeon, 2019) (Ctext) or retrieval (Coffee et al., 2012a) (Tesserae 11 ), Evol is equipped with hierarchical and multi-perspective tools. Within this module, users can select a collection of literature based on their interests. ...
Article
Full-text available
Quantitative cultural studies have witnessed a surge with the rapid development of computer technology in recent years. Since ancient literature constitutes a long-time-span repository for human culture, with quantitative methods and ancient texts, scholars can study the genesis and progression of human history and society across historical epochs from digital perspectives. Nevertheless, traditional humanities scholars often lack the requisite technical skills, creating a demand for interactive platforms. This paper introduces the Evol platform—an online tool designed for the quantitative analysis of ancient literature. Equipped with various analysis functions and visualization tools, the Evol platform allows users to quantify literary documents through intuitive online interaction. Using this platform, we investigated three cases of cultural evolution in ancient Chinese history: (1) the changing attitude of the government towards nomadic ethnic groups; (2) the formulation and propagation of an allusion phrase related to the Battle of Muye; (3) the influence of the Book of Changes across diverse cultural domains. By showcasing cases across diverse semantic units and topics, Evol demonstrates its potential in providing efficient and low-cost experimental tools catering to the realms of culturomics, history, and philology.
... The concept of text alignment for the purpose of comparison of similar texts has been present in Digital Humanities for a long time. Examples of it include the Versioning Machine [Schreibman et al., 2003], the Tesserae project for aligning Latin poetry [Coffee et al., 2013], innovative visualizations for medieval French poetry [Jänicke and Wrisley, 2017], or recently the Reception Reader for browsing text reuse in a corpus of early modern English publications [Rosson et al., 2023]. The quantitative study of intertextuality is an emerging research subject that builds upon automatic discovery of similarities in large collections of text or other media (see Forstall and Scheirer [2019] for a general introduction). ...
Article
Full-text available
The digitization of large archival collections of oral folk poetry in Finland and Estonia has opened possibilities for large-scale quantitative studies of intertextuality. As an initial methodological step in this direction, I present a method for pairwise line-by-line comparison of poems using the weighted sequence alignment algorithm (a.k.a. ‘weighted edit distance’). The main contribution of the paper is a novel description of the algorithm in terms of matrix operations, which allows for much faster alignment of a poem against the entire corpus by utilizing modern numeric libraries and GPU capabilities. This way we are able to compute pairwise alignment scores between all pairs from among a corpus of over 280,000 poems. The resulting table of over 40 million pairwise poem similarities can be used in various ways to study the oral tradition. Some starting points for such research are sketched in the latter part of the article.
Article
This article sets out some of the challenges that emerged during the early phases of the Modern project, a five-year research program funded by the ERC (European Research Council) that takes a new, data-driven approach to studying the literary history of the Enlightenment. Starting from a large corpus of early modern French texts, the authors detail the various steps involved in building intertextual networks from the output of text-reuse algorithms. From harmonizing the corpus and its metadata to training a neural network to filter out "noisy" passages, the article proposes a pragmatic processing pipeline for similar projects working on large collections of digitized texts, while highlighting both the promises and the perils of large-scale literary research.
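The step from text-reuse output to an intertextual network can be sketched minimally as follows. The input format, (source_author, target_author, n_words) tuples with the source assumed to be the earlier text, and the `min_words` length threshold are hypothetical; the threshold is a crude stand-in for the trained neural filter the abstract mentions, not a reproduction of it.

```python
from collections import defaultdict


def build_reuse_network(alignments, min_words=10):
    """Aggregate passage-level reuse into author-to-author edge weights.

    `alignments` holds (source_author, target_author, n_words) tuples
    (a hypothetical format). Passages shorter than `min_words` are dropped
    as likely noise, and an author reusing their own text is ignored.
    """
    edges = defaultdict(int)
    for source, target, n_words in alignments:
        if n_words >= min_words and source != target:
            edges[(source, target)] += 1
    return dict(edges)


def most_reused(edges):
    """Source authors ranked by total outgoing reuse, ties broken by name."""
    out_weight = defaultdict(int)
    for (source, _target), weight in edges.items():
        out_weight[source] += weight
    return sorted(out_weight, key=lambda a: (-out_weight[a], a))
```

Edge weights count reuse passages between each ordered pair of authors; ranking authors by outgoing weight gives a first, rough view of which sources were most heavily recycled.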
Article
The following article describes a simple technique to identify lexically similar passages in large collections of text using sequence alignment algorithms. Primarily used in the field of bioinformatics to identify similar segments of DNA in genome research, sequence alignment has also been employed in many other domains, from plagiarism detection to image processing. While we have applied this approach to a wide variety of text collections, we focus our discussion here on the identification of similar passages in the famous 18th-century 'Encyclopédie' of Denis Diderot and Jean d’Alembert. Reference works, such as encyclopedias and dictionaries, are generally expected to “reuse” or “borrow” passages from many sources, and Diderot and d’Alembert’s 'Encyclopédie' was no exception. The 'Encyclopédie' drew on an immense variety of source material, both French and non-French, and many, if not most, of its borrowings are not sufficiently identified (by our modern standards of citation) or are only partially acknowledged in passing. The systematic identification of recycled passages can thus give us a clear indication of the sources the 'philosophes' were exploiting, and show how far the intertextual relations that accompanied the work's composition and subsequent reception can be explored. In the end, we hope this approach to “Encyclopedic intertextuality” using sequence alignment can broaden the discussion concerning the relationship of Enlightenment thought to previous intellectual traditions, as well as its reuse in the centuries that followed.
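Sequence alignment of the kind borrowed from bioinformatics can be illustrated with the Smith-Waterman local alignment algorithm applied to word tokens rather than DNA bases. This is a minimal textbook sketch; the scoring values (match +2, mismatch -1, gap -1) are assumptions, and the authors' own tool and parameters are not described here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best-scoring local alignment of two token lists.

    Returns (score, (a_start, a_end), (b_start, b_end)) with half-open
    spans, so a[a_start:a_end] is the aligned stretch of `a`.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]  # local-alignment score matrix
    best, best_pos = 0, (0, 0)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Scores are floored at 0, so an alignment can restart anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + pair_score(i, j),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair_score(i, j):
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


a = "the cat sat on the mat today".split()
b = "yesterday a cat sat on the mat".split()
score, (a0, a1), (b0, b1) = smith_waterman(a, b)
print(score, a[a0:a1])  # the shared run "cat sat on the mat"
```

Local rather than global alignment suits this task because a borrowed passage is usually a small part of two otherwise unrelated articles. Since the cost grows with the product of the two texts' lengths, some cheaper candidate filtering would likely be needed before aligning an entire encyclopedia against its sources.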
Article
Preface
List of abbreviations
1. Reflexivity: allusion and self-annotation
2. Interpretability: beyond philological fundamentalism
3. Diachrony: literary history and its narratives
4. Repetition and change
5. Tradition and self-fashioning
Bibliography
Index
Article
We describe here a method for discovering imitative textual allusions in a large collection of Classical Latin poetry. In translating the logic of literary allusion into computational terms, we include not only traditional information retrieval (IR) variables such as token similarity and n-grams, but also a comparison of syntactic structure. This provides a more robust search method for Classical languages, since it accommodates their relatively free word order and rich inflection, and it has the potential to improve fuzzy string searching in other languages as well.
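The abstract does not give the authors' scoring function. As a hedged illustration of why structural comparison helps with free word order, the sketch below scores two passages by combining overlap of lemmas with overlap of syntactic head-dependent pairs. It assumes the passages have already been lemmatized and parsed by some external tool (not shown), it omits n-gram matching, and the equal weights and the name `allusion_score` are arbitrary choices.

```python
def jaccard(x, y):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(x & y) / len(x | y) if x | y else 0.0


def allusion_score(p, q, w_lex=0.5, w_syn=0.5):
    """Order-independent similarity of two parsed passages.

    Each passage is a dict with 'lemmas' (a set of lemma strings) and
    'deps' (a set of (head_lemma, dependent_lemma) pairs from a parse).
    Comparing sets rather than word sequences ignores word order, and
    comparing lemmas rather than surface forms ignores inflection.
    """
    lexical = jaccard(p["lemmas"], q["lemmas"])
    syntactic = jaccard(p["deps"], q["deps"])
    return w_lex * lexical + w_syn * syntactic


# Hand-written, simplified analyses for illustration only.
p = {"lemmas": {"arma", "vir", "cano"},
     "deps": {("cano", "arma"), ("cano", "vir")}}
q = {"lemmas": {"arma", "cano", "bellum"},
     "deps": {("cano", "arma"), ("cano", "bellum")}}
print(allusion_score(p, q))
```

Here the two passages share half their lemmas and one of three dependency pairs, so the score falls between the lexical and syntactic overlaps; a reordered or differently inflected version of the same phrase would score the same as the original.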
Article
This thesis represents the first full-scale, English commentary on the opening book of Lucan's epic poem, De Bello Ciuili, in sixty-five years. Its fundamental purpose is to explain the language and content of the Latin text of the book. The subject matter of the thesis beyond the introduction is naturally dependent upon the content of each individual line under consideration, but the following questions may help establish some of the larger issues I have prioritised throughout my response to the Latin text of book one. These questions may be variously relevant to an episode within book one of De Bello Ciuili, or else a sentence, a line, a word, a metrical issue, or a combination of these. How does it help locate the text within the genre of epic? What does it contribute to the overall meaning of the poem? What does it contribute to our understanding of epic narrative technique? What does it contribute to our understanding of Lucan's poetic usage and technique? How does it interact with the rest of the poem (i.e. what are the structural or intratextual markers advertised and what do they contribute to the meaning of the passage under consideration or the structure of the book or poem as a whole)? How does it interact with its (especially epic) models (i.e. what intertextual markers are at work and how does the invocation of earlier models affect the meaning of the passage under consideration)? How does it behave in relation to what we know of the norms espoused by Classical literary criticism? What are the programmatic issues, themes, and images explored or established by book one? Thesis dated 20 October 2005; Ph.D., University of Otago, 2006. Includes bibliographical references.