Figure 1 - uploaded by Jean-Pierre Koenig
Source publication
This paper describes a new digital approach to intertextual study involving
the creation of a free online tool for the automatic detection of parallel
phrases. A test comparison of Vergil’s Aeneid and Lucan’s Civil War shows that
the tool can identify a substantial number of meaningful intertexts, both previously
recorded and unrecorded. Analysis o...
Context in source publication
Context 1
... our ranking system to Tesserae parallels and those of the commentators produced the results depicted in Table 2. The table indicates that automatic discovery with manual examination of results can reveal substantial numbers of the parallels most significant to literary interpretation. Tesserae search added 279 meaningful parallels to the commentators' 364, an increase of 43%, for a total of 643 BC 1-Aeneid parallels detected (as seen in the type 5-3 totals). For interpretable (types 5-4) parallels, Tesserae word (43) and lemma (69) searches separately returned results comparable to those of Viansino (48), but fewer than half those of Roche (151), though the combined Tesserae total excluding duplicates (93) comes closer. Tesserae returns fewer meaningful results because it currently lacks sensitivity to features such as semantics and syntax. For the moment, the automatic search available on the Tesserae website can serve as a check on and complement to traditional methods. More significantly, it can identify parallels unrecognized by commentators in substantial numbers, as illustrated in Figure 1. The circle on the left represents the total of all type 5 and 4 parallels found by all commentators combined (172). The circle on the right represents all such parallels (93) found by both Tesserae search methods (exact word and lemma). The numbers within show, from left to right, parallels found only by commentators (125), by both Tesserae and commentators (47), and only by Tesserae (46). Although the Tesserae search captured only a quarter of the high interest type 5-4 parallels found by the commentators, this recovery rate nevertheless shows the capacity of automatic search to replicate traditional methods. Equally compelling is the fact that Tesserae search returned a high proportion of previously undiscovered parallels. This result suggests that the systematic application of fixed criteria can detect parallels traditional criticism may tend to overlook.
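The overlap counts reported for Figure 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration built from the numbers quoted in the passage; the variable names are assumptions made for this example and do not come from the source publication.

```python
# Counts of type 5-4 (interpretable) parallels, as quoted in the passage.
commentators_only = 125  # found only by commentators
both = 47                # found by both commentators and Tesserae
tesserae_only = 46       # found only by Tesserae

# Totals for the two circles of the diagram and for their union.
commentators_total = commentators_only + both
tesserae_total = both + tesserae_only
union_total = commentators_only + both + tesserae_only

# Share of the commentators' parallels that Tesserae also recovered.
recovery_rate = both / commentators_total
# Share of Tesserae's parallels that no commentator had recorded.
novel_share = tesserae_only / tesserae_total

# Meaningful (type 5-3) parallels, also quoted in the passage.
tesserae_added = 279
commentators_meaningful = 364
meaningful_total = tesserae_added + commentators_meaningful
share_of_total = tesserae_added / meaningful_total

print(commentators_total, tesserae_total, union_total)
print(f"recovered by Tesserae: {recovery_rate:.1%}")
print(f"new among Tesserae results: {novel_share:.1%}")
print(f"Tesserae additions as share of all meaningful parallels: {share_of_total:.1%}")
```

The circle totals come out to the 172 and 93 given in the text, and the recovery rate is consistent with the "quarter" mentioned there. The 43% figure corresponds to the 279 added parallels taken as a share of the combined total of 643.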
We can explore this last question in greater depth by considering which parts of a source text are referred to by the target text, or, in the case of our sample, how often intertexts from each book of the Aeneid appear in BC 1. Figure 2 gives the number of interpretable intertexts between BC 1 and individual books of the Aeneid, as found by each of three sources: combined Tesserae word and lemma search, Viansino, and Roche. The y-axis gives the number of parallels found, while the x-axis shows in which book of the Aeneid the parallel occurs. Each source finds more parallels to the first six books of the Aeneid than to the last six. Roche finds more parallels overall. The distribution of Roche's parallels in the Aeneid varies more than those found by Tesserae and Viansino, as indicated by the large gap between the highest and lowest numbers of parallels Roche finds (28 to Aeneid book 2 vs. 5 each to Aeneid books 5 and 10). [39] Where all sources find consistent variation in intertextual connections with different books of the Aeneid, this likely represents a real difference in Lucan's practice. Just as Servius devotes a much greater portion of his Aeneid commentary to the first half of Vergil's epic than the second, Lucan may have had greater interest in, or use for, the first half of the Aeneid in composing BC 1. Both poet and commentator may have responded in a similar way to the work itself or been influenced by an inherited interpretive emphasis on Aeneid 1-6. We explore these possibilities further below when we combine results from all sources to obtain an overall picture of Lucan's intertextual ...
[39] By the ordinary measure of standard deviation, Roche's parallels fluctuate three times as much as those of Tesserae or Viansino. Roche has a mean number of high value parallels per Aeneid book of 12.6, with a standard deviation of 6.7. For Tesserae the values are respectively 7.8 and 2.6, and Viansino 4 and 2.4.
Similar publications
Tesserae is a web-based tool for automatically detecting allusions in Latin poetry. Although still in the start-up phase,
it already is capable of identifying significant numbers of known allusions, as well as similar numbers of allusions previously
unnoticed by scholars. In this article, we use the tool to examine allusions to Vergil’s Aeneid in t...
Citations
... As a manifestation of intertextual relationships, text reuse serves as quantitative evidence for various cultural studies themed on the similarity (Sturgeon, 2018b; Burns et al., 2021), influence (Büchler et al., 2013; Forstall et al., 2014), and evolution (Hartberg and Wilson, 2017; Duan et al., 2023) of literary works. The feasibility of this text analysis approach has been validated across different languages, including Latin (Coffee et al., 2012b), French (Ganascia et al., 2014), English (Smith et al., 2013), and ancient Chinese (Sturgeon, 2018a). The Evol platform employed the text reuse technique to effectively quantify and visually represent the instances of text reuse, thereby facilitating the identification and exploration of potential cultural phenomena. ...
Quantitative cultural studies have witnessed a surge with the rapid development of computer technology in recent years. Since ancient literature constitutes a long-time-span repository for human culture, with quantitative methods and ancient texts, scholars can study the genesis and progression of human history and society across historical epochs from digital perspectives. Nevertheless, traditional humanities scholars often lack the requisite technical skills, creating a demand for interactive platforms. This paper introduces the Evol platform—an online tool designed for the quantitative analysis of ancient literature. Equipped with various analysis functions and visualization tools, the Evol platform allows users to quantify literary documents through intuitive online interaction. Using this platform, we investigated three cases of cultural evolution in ancient Chinese history: (1) the changing attitude of the government towards nomadic ethnic groups; (2) the formulation and propagation of an allusion phrase related to the Battle of Muye; (3) the influence of the Book of Changes across diverse cultural domains. By showcasing cases across diverse semantic units and topics, Evol demonstrates its potential in providing efficient and low-cost experimental tools catering to the realms of culturomics, history, and philology.
... Nowadays, an opportunity arises to develop digital approaches to intertextual study. Coffee et al. (2012) at the University at Buffalo created a free online tool for the automatic detection of parallel phrases. The Tesserae tool can recognize numerous previously recorded and unrecorded intertexts. ...
Intertextuality is a productive means to enrich a literary text and establish a unique dialogue with the reader. The present study aims to identify and look into intertextual elements in Neil Gaiman's American Gods that reflect American culture and history and immerse the reader into the author's fictional world. The study expounds intertextual elements in Neil Gaiman's American Gods to allude to typical cultural and historical phenomena and engage with the reader by fully immersing them in a fictional world.
... Various natural language processing (NLP) methods have been applied to the intertextuality modelling of ancient literature. Previous methods for the automatic detection of text-level intertextuality aimed to discover similar phrases or sequences through lexical matching (Lee 2007; Coffee et al. 2012a; Coffee et al. 2012b; Ganascia et al. 2014; Forstall et al. 2015), an approach that is insufficient and rigid in semantic modelling. Non-literal features such as synonymy (Büchler et al. 2014; Moritz et al. 2016) and rhythm (Neidorf et al. 2019) also imply intertextuality, yet they require language-specific design. ...
Being recognized among the cradles of human civilization, ancient China nurtured the longest continuous academic traditions and humanistic spirits, which continue to impact today’s society. With an unprecedented large-scale corpus spanning 3000 years, this paper presents a quantitative analysis of cultural evolution in ancient China. Millions of intertextual associations are identified and modelled with a hierarchical framework via deep neural network and graph computation, thus allowing us to answer three progressive questions quantitatively: (1) What is the interaction between individual scholars and philosophical schools? (2) What are the vicissitudes of schools in ancient Chinese history? (3) How did ancient China develop a cross-cultural exchange with an externally introduced religion such as Buddhism? The results suggest that the proposed hierarchical framework for intertextuality modelling can provide sound suggestions for large-scale quantitative studies of ancient literature. An online platform is developed for custom data analysis within this corpus, which encourages researchers and enthusiasts to gain insight into this work. This interdisciplinary study inspires the re-understanding of ancient Chinese culture from a digital humanities perspective and prompts the collaboration between humanities and computer science.
... Harnessing existing computational methods in the search for similarities between texts, the project will create a computing interface that will allow us to view and navigate through N-grams in the rabbinic and Christian texts. Such methods of "distant reading," using the terminology employed in digital humanities, have focused on identifying phrases or letter sequences in multiple locations that are exactly identical, or nearly so (Coffee et al., 2012; Shmidman et al., 2018). ...
The development of the two religions: Christianity and Judaism, is a topic of much debate. Whereas Judaism and Christianity are known as separate religions, in fact, these two religions developed side by side. While earlier researchers conceptualized a “parting-of-the-ways,” after which the two religions evolved independently, new studies reveal a multi-layered set of interactions throughout the first several centuries CE. Until recently, this question was explored with the limited source material and limited tools to analyze it. While working on a limited set of data, from a specific corpus, this project offers a new set of methodological tools, borrowed from computer sciences, that could ultimately serve for understanding the connections between Jews and Christians in late antiquity. We generated models of inter-religious Christian–Jewish networks that demonstrate the scope, nature, and advantages of network analysis for revealing the complex intertwined evolution of the two religions. The Jewish corpora chosen for this research are rabbinic writings from late antique Babylonia and Palestine. Christian texts range from the first through sixth centuries CE. Instead of representing interactions between people or places, as is typically done with social networks, we model literary interactions that, in our view, indicate historical connections between religious communities. This novel approach allows us to visually represent sets of temporal–spatial–contextual relationships, which evolved over hundreds of years, in single snapshots. It also reveals new insights about the relationships between the two communities. For example, we find that rabbinic sources exhibit a largely polemical approach towards earlier Christian traditions but a non-polemical attitude towards later ones. Moreover, network analysis suggests a temporal–spatial familiarity correlation. 
Namely, Jewish sources are familiar with early, eastern Christian sources and with both Eastern and Western Christian sources in later periods. The application of network analysis makes it possible to identify the most influential texts—that is, the key “nodes”—testifying to the importance of certain traditions for both religious communities. Finally, the network approach is a tool for pointing scholarly research in new directions, which only reveals itself as a result of this type of mapping. In other words, the network not only describes the known data, but it is itself a way to enlarge the network and lead us down new and exciting paths that are currently unknown.
... At the specific intersection of Classics and NLP, Latin has been the subject of several dependency treebanks (Bamman and Crane, 2006; Haug and Jøhndal, 2008; Passarotti and Dell'Orletta, 2010) and other lexico-semantic resources (Mambrini et al., 2020; Short, 2020), and is the focus of much work on individual components of the NLP pipeline, including lemmatizers, part-of-speech taggers, and morphological analyzers, among others (for overviews, see McGillivray (2014) and Burns (2019)). This work on corpus creation and annotation as well as the development of NLP tools has enabled literary-critical work on problems relevant to historical-language texts, including uncovering instances of intertextuality in Classical texts (Coffee et al., 2012; Moritz et al., 2016; Coffee, 2018) and stylometric research on genre and authorship (Dexter et al., 2017; Chaudhuri et al., 2019; Köntges, 2020; Storey and Mimno, 2020). ...
We present Latin BERT, a contextual language model for the Latin language, trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century. In a series of case studies, we illustrate the affordances of this language-specific model both for work in natural language processing for Latin and in using computational methods for traditional scholarship: we show that Latin BERT achieves a new state of the art for part-of-speech tagging on all three Universal Dependency datasets for Latin and can be used for predicting missing text (including critical emendations); we create a new dataset for assessing word sense disambiguation for Latin and demonstrate that Latin BERT outperforms static word embeddings; and we show that it can be used for semantically-informed search by querying contextual nearest neighbors. We publicly release trained models to help drive future work in this space.
... Another major difference to text comparison, in which texts are only marked as similar if they contain some identical words (see, e. g. [10,12,13,14]), is that paraphrase extraction can only suggest text passages to the humanities scholar that may be a paraphrase of the search query. In fact, there are cases where humanities scholars disagree on whether or not a text passage, however found, is a paraphrase of a particular original text. ...
... With a large vocabulary, this cache would be several terabytes in size. The alternative that we have implemented, and which has been similarly proposed in [3], is not to precompute, but only to cache the distances that are calculated during an actually requested search, as long as the designated storage is not full. Thus, distances once calculated need not be re-calculated in future searches. ...
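The caching strategy described in this excerpt (compute a distance only when a search asks for it, store it while space remains, and reuse it afterwards) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited paper's implementation: the class `BoundedDistanceCache`, the `max_entries` limit, and the placeholder metric `toy_distance` are all hypothetical names introduced for this example, and the distance is assumed to be symmetric.

```python
from typing import Callable, Dict, Tuple


class BoundedDistanceCache:
    """Lazily caches word-pair distances and stops storing once full."""

    def __init__(self, distance: Callable[[str, str], float], max_entries: int) -> None:
        self._distance = distance
        self._max_entries = max_entries  # stands in for the "designated storage"
        self._store: Dict[Tuple[str, str], float] = {}
        self.computations = 0  # number of times the real distance was evaluated

    def get(self, a: str, b: str) -> float:
        # Normalize the key so (a, b) and (b, a) share one entry (symmetry assumed).
        key = (a, b) if a <= b else (b, a)
        if key in self._store:
            return self._store[key]
        self.computations += 1
        value = self._distance(a, b)
        # Store the result only while the cache has room; nothing is evicted.
        if len(self._store) < self._max_entries:
            self._store[key] = value
        return value


def toy_distance(a: str, b: str) -> float:
    # Hypothetical placeholder metric: absolute difference in word length.
    return float(abs(len(a) - len(b)))


cache = BoundedDistanceCache(toy_distance, max_entries=2)
first = cache.get("logos", "psyche")
second = cache.get("psyche", "logos")  # served from the cache
print(first, second, cache.computations)
```

In a real system the placeholder metric would be replaced by a distance between word embeddings, and the capacity would be set by the available memory rather than by an entry count.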
(A shorter version of the paper appeared in German in the final report of the Digital Plato project, which was funded by the Volkswagen Foundation from 2016 to 2019 [35], [28].)
In this paper, we present a method for paraphrase extraction in Ancient Greek that can be applied to huge text corpora in interactive humanities applications. Since lexical databases and POS tagging are either unavailable or do not achieve sufficient accuracy for ancient languages, our approach is based on pure word embeddings and the word mover’s distance (WMD) [20]. We show how to adapt the WMD approach to paraphrase searching such that the expensive WMD computation has to be performed for only a small fraction of the text segments contained in the corpus. Formally, the time complexity will be reduced from $\mathcal{O}(N \cdot K^{3} \cdot \log K)$ to $\mathcal{O}(N + K^{3} \cdot \log K)$, compared to the brute-force approach which computes the WMD between each text segment of the corpus and the search query. N is the length of the corpus and K the size of its vocabulary. The method, which searches not only for paraphrases of the same length as the search query but also for paraphrases of varying lengths, was evaluated on the Thesaurus Linguae Graecae® (TLG®) [25]. The TLG consists of about $75 \cdot 10^{6}$ Greek words. We searched the whole TLG for paraphrases for given passages of Plato. The experimental results show that our method and the brute-force approach, with only very few exceptions, propose the same text passages in the TLG as possible paraphrases. The computation times of our method are in a range that allows its application in interactive systems and let the humanities scholars work productively and smoothly.
... Users can choose from over 700 Latin texts, which comprise almost all of the surviving corpus of classical Latin. The texts were originally digitized by the Perseus Digital Library and further developed by the Tesserae Project (Crane, 1996;Coffee et al., 2012). Texts can be selected by author, text, or book (roughly the ancient equivalent of a chapter). ...
... The existence of a book called "The Bellum Civile and Latin Love Elegy" would certainly appear in the bibliography of this study if it existed. [Coffee et al, 2012] remarks that traditional scholarly methods have avoided these kinds of comprehensive treatments of intertextuality because of the massive scholarly labor involved. Software is now available, however, to greatly reduce the procedural difficulty to which Coffee refers. ...
... In recent years, researchers at Tesserae have published a series of papers testing the assumptions of traditional Latin literary criticism against their algorithmic model ([Coffee et al, 2012; Coffee et al, 2013; Forstall et al, 2015]). These papers have used the first book of Lucan's Bellum Civile as their target text and Virgil's Aeneid as their source text, evaluating the results of the automated tool against philological commentaries by assigning them, following [Thomas, 1986], values of "meaningful" and "not meaningful," as well as "interpretable" and "not interpretable." ...
... The Tesserae publications have confirmed the traditional scholarly view that Lucan's poetic diction draws significantly on Virgil. That said, this research has consistently pointed the way towards wider applicability of algorithmically based methods for the study of intertextuality: [Coffee, 2012] suggests that systematic collection and measurement of textual similarities using a tool like Tesserae can build an "intertextual 'fingerprint'," that can be used to make meaningful comparisons between the poetic practices of different authors. Important work on testing Tesserae search results is also being done by [Bernstein, 2013; Gervais, 2014; Bernstein, Gervais and Lin, 2015], who have concentrated on the platform's "macrophilological applications," that is ways in which the complete collection of search results for a given genre, author, or work can be used to draw conclusions, not about specific intertexts, but rather about larger patterns of intertextuality. ...
Most intertextuality in classical poetry is unmarked, that is, it lacks objective signposts to make readers aware of the presence of references to existing texts. Intergeneric relationships can pose a particular problem as scholarship has long privileged intertextual relationships between works of the same genre. This paper treats the influence of Latin love elegy on Lucan’s epic poem, Bellum Civile, by looking at two features of unmarked intertextuality: frequency and distribution. I use the Tesserae project to generate a dataset of potential intertexts between Lucan’s epic and the elegies of Tibullus, Propertius, and Ovid, which are then aggregated and mapped in Lucan’s text. This study draws two conclusions: 1. measurement of intertextual frequency shows that the elegists contribute fewer intertexts than, for example, another epic poem (Virgil’s Aeneid), though far more than the scholarly record on elegiac influence in Lucan would suggest; and 2. mapping the distribution of intertexts confirms previous scholarship on the influence of elegy on the Bellum Civile by showing concentrations of matches, for example, in Pompey and Cornelia’s meeting before Pharsalus (5.722-815) or during the affair between Caesar and Cleopatra (10.53-106). By looking at both frequency and proportion, we can demonstrate systematically the generic enrichment of Lucan’s Bellum Civile with respect to Latin love elegy.
... The facility of both tools to capture morphological variants results in an improved signal-to-noise ratio in comparison to sequence alignment's intentionally more indiscriminate method. The great success of Tesserae, for instance, was demonstrated in a 2012 study that reported a systematic tracing of the reuse of Vergil's Aeneid in Lucan's epic Pharsalia (Coffee et al., 2012). ...
This paper describes the Quantitative Criticism Lab, a collaborative
initiative between classicists, quantitative biologists, and computer
scientists to apply ideas and methods drawn from the sciences to the study of
literature. A core goal of the project is the use of computational biology,
natural language processing, and machine learning techniques to investigate
authorial style, intertextuality, and related phenomena of literary
significance. As a case study in our approach, here we review the use of
sequence alignment, a common technique in genomics and computational
linguistics, to detect intertextuality in Latin literature. Sequence alignment
is distinguished by its ability to find inexact verbal similarities, which
makes it ideal for identifying phonetic echoes in large corpora of Latin texts.
Although especially suited to Latin, sequence alignment in principle can be
extended to many other languages.
... Such new avenues for specific intertextual interpretation are the typical results of Tesserae searches. Previous examples of comparable results can be found in a study of verbal reuse of Vergil's Aeneid by the epic poet Lucan [Coffee et al. 2012]. Coffee et al. hand-ranked all Tesserae results from a comparison of Lucan Bellum Civile 1 (target) and Vergil's Aeneid (source) on a 5-point scale of interpretive significance. ...
This paper presents a quantitative picture of the interactions between poets in the Latin hexameter tradition. The freely available Tesserae website (tesserae.caset.buffalo.edu) automatically searches pairs of texts in a corpus of over 300 works of Latin literature in order to identify instances where short passages share two or more repeated lexemes. We use Tesserae to survey relative rates of text reuse in 24 Latin hexameter works written from the 1st century BCE to the 6th century CE. We compare the quantitative information about text reuse provided by Tesserae to the scholarly tradition of qualitative discussion of allusion by Latinists.
The detection and interpretation of allusion currently represent the dominant mode of study of Latin poetry. [1] The typical goal of intertextual study is to describe how links between texts affect the meaning of both the specific passages that contain them and the poems as a whole. Although intertextual associations may be signalled in many different ways (including similarity of action, character, or theme), verbal repetition, or text reuse, is the best studied and often the strongest type of signal. Philological commentaries, copiously detailed collections of information on individual books of Latin epic poems, have been the traditional means for Latin poetry scholars to collect and present interpretations based on studies of text reuse. An example from Parkes' recent commentary on the fourth book of Statius' Thebaid demonstrates the practice of translating the evidence of verbal repetition into interpretation:
[Statius, Thebaid 4.260] audaci Martis percussus amore ["struck by a bold desire for warfare" [2]]: … The collocation percussus amore ["struck by a desire"] is not uncommon (compare e.g. Verg. G. 2.476, Hor. Epod. 11.2 amore percussum, and Nem. Cyn. 99) but Statius may be specifically recalling the ephebe Euryalus' reaction to Nisus' planned expedition at Verg. A. 9.197: magno laudum percussus amore ["struck by a great desire for glory"]…. Like Parthenopaeus, Euryalus is eager to brave danger for the chance of glory (A. 9.205-6), with similarly fatal results. [Parkes 2012, 164]
This exemplary note builds its interpretation on the evidence of the repetition of two key lexemes, the verb percutio ("I strike") and the noun amor ("desire"). [3] The co-occurrence of these lexemes in the Statian passage signifies for most readers a link to the passage from Vergil. The discovery of such verbal links has been facilitated in recent years by digital tools such as the freely available Tesserae web interface (tesserae.caset.buffalo.edu), a search program developed by Neil Coffee and a team at the University at Buffalo. Tesserae allows users to search pairs of texts (an earlier "source" text paired with a later "target" text) in a corpus of over 300 poetic and prose works, in order to discover every instance where short passages (either lines of verse or grammatical periods) share two or more repeated lexemes. Thus, a Tesserae search that pairs the Thebaid with the Aeneid permits the user to discover the allusion discussed by Parkes by identifying the repetition of the lexemes percutio and amor. The Tesserae scoring system signals the potential interpretive significance of the match by assigning it a high score, 8 out of approximately 11. [4] In addition, Tesserae identifies a second potential match (score = 7) between Thebaid 4.260 and another passage from the Aeneid:
Statius, Thebaid 4.260 prosilit audaci Martis percussus amore ("Parthenopaeus leapt up, struck by a bold desire for warfare").
Vergil, Aeneid 7.550 accendamque animos insani Martis amore ("I'll inflame their minds with a desire for mad warfare").
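The matching criterion described above, two or more shared lexemes between a target line and a source line, can be illustrated with a short sketch. It is not the Tesserae implementation: the lemma table `LEMMAS` is a hypothetical hand-made lookup covering only the three verses quoted here, the function names are invented for the example, and the Tesserae scoring step is not reproduced.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy lemma table for the quoted verses only (an assumption for
# this example; a real system would use a full Latin lemmatizer).
LEMMAS: Dict[str, str] = {
    "prosilit": "prosilio", "audaci": "audax", "martis": "mars",
    "percussus": "percutio", "amore": "amor", "magno": "magnus",
    "laudum": "laus", "accendamque": "accendo", "animos": "animus",
    "insani": "insanus",
}


def lexemes(line: str) -> Set[str]:
    """Map each word form in a line to its lemma (falling back to the form)."""
    return {LEMMAS.get(word, word) for word in line.lower().split()}


def shared_lexeme_matches(
    target: Dict[str, str], source: Dict[str, str], min_shared: int = 2
) -> List[Tuple[str, str, List[str]]]:
    """Return (target ref, source ref, shared lemmas) for qualifying line pairs."""
    matches = []
    for (t_ref, t_line), (s_ref, s_line) in product(target.items(), source.items()):
        shared = lexemes(t_line) & lexemes(s_line)
        if len(shared) >= min_shared:
            matches.append((t_ref, s_ref, sorted(shared)))
    return matches


target = {"Theb. 4.260": "prosilit audaci Martis percussus amore"}
source = {
    "Aen. 9.197": "magno laudum percussus amore",
    "Aen. 7.550": "accendamque animos insani Martis amore",
}

matches = shared_lexeme_matches(target, source)
for t_ref, s_ref, shared in matches:
    print(t_ref, "~", s_ref, "via", ", ".join(shared))
```

With this toy table, both Vergilian lines qualify as matches for Thebaid 4.260: one through percutio and amor, the other through Mars and amor. Ranking the two pairs against each other would require a scoring function, which this sketch leaves out.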