Modelling Enlightenment: reassembling
intertextual networks through data-driven
research (ModERN)
Dario Maria Nicolosi, Sorbonne Université, dario.nicolosi.92@gmail.com
Glenn Roe, Sorbonne Université, glenn.roe@sorbonne-universite.fr
Nicolosi D. M. and Roe G. 2024. ‘Modelling Enlightenment: reassembling intertextual networks through data-driven
research (ModERN)’. In: Digital Enlightenment Studies 2, 62–84.
DOI: 10.61147/des.22
Digital Enlightenment Studies is a peer-reviewed open access journal. © 2024 The Author(s). This is an open-access article distributed
under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author and source are credited. See
http://creativecommons.org/licenses/by/4.0/.
OPEN ACCESS
The European Research Council’s ModERN project (Modelling Enlightenment: reassembling networks of
modernity through data-driven research) is a pioneering five-year research initiative. This programme
seeks to redefine the conventional understanding of 18th-century literary history by employing
advanced data-modelling and analysis techniques. By developing a comprehensive corpus of
18th-century French texts and leveraging a range of data-science methodologies such as text-reuse
detection and network analysis, the project aims to uncover novel research avenues and provide fresh
insights into early-modern French print culture and its intertextual dynamics.
In this report, we discuss some theoretical points underlying our research; we explain the choices
made in constructing our corpus and their implications; and we present some case studies to show the
potential of our research and the most prudent methodologies to adopt.
Keywords: intertextuality, network analysis, text reuse, SNA
1. Introduction and scope
The European Research Council project ModERN (Modelling Enlightenment: reassembling networks
of modernity through data-driven research) is a five-year research programme that aims to challenge
received notions of 18th-century literary history through large-scale data modelling and analysis. By
creating a new, extensive corpus of 18th-century French texts and mobilising various data-science
methods such as text-reuse detection and network analysis, we intend to open up new lines of research in
early-modern French print culture and its intertextual connections.
Intellectually, our project aims to align itself with previous work that has highlighted the highly
interconnected nature of 18th-century culture, analysing the social networks of intellectual figures (e.g.
Brockliss 2002), the correspondence networks between members of the Republic of Letters (Comsa et al.
2016; Edmondson and Edelstein 2019) or focusing on the diffusion and reception of literary and philosophical
works (Burrows 2018; Burrows 2020; Darnton 1982; Darnton 2021). However, while these important and
stimulating studies have emphasised the circulation of people and ideas that contributed to defining the
main ideological axes and the European scope of the Enlightenment, they analyse these exchanges from a
purely material perspective (letters, books, sales records, etc.), reducing the textual content of these objects
to data points. Our project will instead employ new techniques for large-scale text analysis to identify and
analyse conceptual and intertextual networks over an unprecedented collection of 18th-century texts. As
we know, 18th-century authors demonstrated great agility in their appropriation and reappropriation of
both ancient and modern sources, relying on the shared cultural knowledge of their readers to identify
these borrowings (Edelstein, Morrissey and Roe 2013). Today, however, given that most of these references
remain hidden to contemporary readers, the identification of these intertextual relationships can provide
new insights into the reciprocal influences, models and authorities that shaped the evolution of various
ideological, political and aesthetic discourses of the period.
At its core, the project seeks to understand how the modern constellation of Enlightenment authors
we have inherited today came into being; to uncover the cultural and ideological processes by which these
writers, and by extension the texts and concepts they helped disseminate, became so indissolubly linked to
the Enlightenment as an idea, while others – mostly forgotten today – were gradually excluded from these
same assemblages. In order to re-establish these lost voices, or, to put it another way, to reassemble these lost
networks, we need to drastically expand the corpus of texts on which we have traditionally drawn to understand
the French Enlightenment and its reception.1 Thankfully, this process of expansion is already underway, as the
18th century has benefited greatly, perhaps more than any other historical period, from the past two decades of
digital transformation and the subsequent rise of the digital humanities (Burrows and Roe 2020; Paige 2021).
1 For further insights into the implications of the large-scale analysis of ‘non-canonical’ texts, on their heuristic potential as well
as the risks behind such broad analyses, see Moretti (2017).
Digital projects, databases and collections in 18th-century studies are now reaching a point of maturity
in their elaboration, as well as a critical mass in number, such that we can now begin to think in terms of
the literary or cultural systems, rather than individual works or authors, that inhabit our growing digital
archives, at least in the francophone context. The main ModERN corpora have thus been built including not
only canonical works and printed books that have been progressively digitised over the past two decades, but
also more ephemeral texts such as private correspondence, pamphlets, newspapers and journals: the breadth
of these collections provides a unique opportunity to trace a much broader range of intertextual practices
than has traditionally been conceivable (Kristeva 1969; Barthes 1984; Genette 1992). By identifying and
scrutinising a large swathe of these sorts of intertextual practices – from borrowings, citations, mentions
and references to paraphrases and allusions, etc. – we gain a deeper understanding of the intricate web of
influences that shaped literary and philosophical works, and we enrich our comprehension of the
historical context in which these texts were produced.
Which 18th-century texts were most frequently ‘cited’, in what form and why? Which authors prove to
be the most ‘influential’ – their words resonating and circulating the most within other texts of the same
period? What verses, maxims and concepts seemed to gain (or lose) popularity over the course of the long
18th century? Do specific communities emerge, centred around an authority or a foundational text, or are
textual exchanges rather to be found transversally, cutting across numerous literary and cultural fields?
Similar reflections can be made regarding the reception of Antiquity. As we know, Greek and Latin literature
form the cultural and educational foundations of the time, and will, starting with the Querelle des Anciens
et des Modernes, eventually become an instrument for discussing and proposing new philosophical and
aesthetic theories (Grell 1995; Norman 2011). Which ancient authors are most frequently cited, and in what
context? In the original language or in translation? While the influence of the most important authors (e.g.
Aristotle, Cicero, Plutarch) is well known, is it possible to unearth ‘secondary’ figures whose reception is
nevertheless crucial to understanding strategies of reusing Antiquity during the Enlightenment?
From a technical and methodological standpoint, we ground our understanding of these intertextual
exchanges as a specific instance and application of network modelling and analysis methodologies,
particularly social network analysis (SNA) for literary-historical studies. Over the past decade or so, several
research groups have begun to exploit the potential of applying the heuristic tools offered by SNA, developed
mainly in the social and data science fields, to humanities projects.2 But what are the implications behind
applying this model to a phenomenon like intertextuality? What kind of information can be extracted, and
how is it influenced by our choices of formalisation and representation? In general, it is always important
to remember that any modelling attempt is, by definition, constructed with inherent biases related to the
dataset in question (the extent and quality of the sources it is derived from), the tools used to create it
2 On the ‘network turn’ in the humanities, see Ahnert et al. (2021). On the use of SNA in 18th-century studies, see Edmondson
and Edelstein (2019).
(computational techniques and their performance), and the specific research objectives (which questions
are posed, and how the results are interpreted). However, these potential obstacles remain surmountable
if it is clear to the researcher that each model is more of a heuristic tool for analysis than a repository of
immutable truth: models serve less to find definitive answers than to propose new questions. The ability to
engage with an unprecedented amount of data offers specialists a new way to look at phenomena, in our
case, intertextuality and influence in the 18th century.
We are condent that such large-scale projects are worthwhile, and that avenues of research opened up
by such programmes can lead to new knowledge and, eventually, new historical paradigms (Moretti 2008;
McCarty 2018). We will thus present the corpus construction and alignment methodologies that inform our
models, and then conclude with some practical use-cases for exploring and analysing intertextual networks
at scale.
2. Corpus
Thanks to institutional agreements with the University of Chicago, Gale Primary Sources, the University
of Oxford and the Bibliothèque nationale de France (BnF), our primary corpora are drawn from four main
sources: transcribed and curated data holdings from the ARTFL Project and the Voltaire Foundation;3 Paul
Fièvre’s Théâtre Classique database (transcriptions) of French theatre;4 digitised texts in French (via OCR)
taken from the Gale Eighteenth Century Collections Online (ECCO) and The Goldsmiths’-Kress Library of
Economic Literature;5 and texts drawn from the Gallica digital library housed at the BnF (OCR).6 Overall,
our ingestion policy included the following criteria: digitised texts in French published roughly between
1685 and 1800 and, in the case of multiple editions of the same text, the earliest edition available across
collections. Preference was given to transcribed versions of texts regardless of publication date.
Our main corpus of texts is thus the result of an amalgamation of several independent collections
derived from distinct digitisation campaigns, the combination of which led to a series of inevitable problems
that we had to address. From the form and content of the metadata to the text formats and TEI-XML file
structures, each collection was unique and followed no real standard. Being conceived at different times and
in response to specific research objectives, each digitisation project adopts its own encoding and classification
logic, which introduces significant variability into the combined metadata of any large-scale corpus.
3 ARTFL holdings include relevant texts from the ARTFL-Frantext database as well as other open-access and subscription-
based collections, including the Bibliothèque bleue de Troyes, the ARTFL Encyclopédie and Dictionnaires d’autrefois, see
https://art-project.uchicago.edu/; in collaboration with ARTFL, the Voltaire Foundation has developed the TOUT Voltaire
dataset, which was made available to our project, see https://www.voltaire.ox.ac.uk/voltaire-lab/tout-voltaire/.
4 See https://www.theatre-classique.fr/.
5 See https://www.gale.com/c/eighteenth-century-collections-online-part-i and
https://www.gale.com/c/making-of-the-modern-world-part-i.
6 See https://gallica.bnf.fr/.
This variability comes also from the editorial, literary and historical specificity of each corpus, which
influences the choices of the researchers who compiled and encoded our corpora. For example, the issue
of paratextual elements takes on greater significance depending on the historical or linguistic nature of the
corpus analysis one seeks to enact, raising the question of whether they should be included or excluded
from digital editions. The ARTFL-Frantext database, for instance, has removed all non-authorial elements
from its texts, in an effort to better represent the linguistic context in which they were produced.7 Other
collections, such as ECCO, reproduce texts in their entirety, including any and all paratextual elements
whether originating from the author or not.
These tensions are indicative of larger debates around digitisation protocols and, more specifically,
digital scholarly editions and the TEI-XML encoding standard: how much to encode, and at what levels
of granularity, are decisions often made by previous editors whose justifications may no longer be legible
at the time of corpus construction, leading to editorial artefacts that can skew results downstream if one
is not careful. And yet, the first (and perhaps only) necessary condition behind the construction of an
analytical model is that the data within it are consistent with each other: even if every choice and selection is
justifiable in itself, a model is valid only if its elements are homogeneous and functional to the type of
analysis envisioned.
Given the diversity of text-encoding options, it thus becomes necessary to seek automated or semi-
automated methods to harmonise corpus metadata, and in particular titles and author names. The approach
we adopted combines three digital methods that assess the similarity between strings of characters. In a
first instance, we deployed two well-established methods developed in natural language processing (NLP) –
Levenshtein distance and cosine similarity – in order to score the lexical similarity of author labels.8 The
results were fairly promising, allowing us to ascertain that ‘CARMONTELLE, Louis Carrogis, dit Louis de
Carmontelle (1717–1806)’ and ‘CARMONTELLE, Louis Carrogis de (1717–1806)’ were indeed the same
author. This may seem intuitive to the human eye, and indeed it is, but for the computer they are two
very distinct strings that inhabit the same XML field and are therefore treated as separate entities. More
uncertain cases – author names that were too lexically dissimilar to be caught, such as ‘Anne-Claude-Philippe
de Tubières, comte de Caylus’ and ‘Comte de Caylus (1692–1765)’ – required the use of more
direct techniques, such as comparing the two longest words in a string or systematically removing all dates
before comparison. Thanks to these approaches, we were able to disambiguate our author names, which were
then standardised across our corpus. The same semi-automated standardisation process was employed for
the titles of texts, grouping the volumes of the same work under a single label, often distinguished by the
mention of the volume number.
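The comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the character-trigram representation used for cosine similarity, the 0.85 threshold and the exact form of the 'two longest words' fallback (both longest words of the shorter label must occur in the longer one) are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def clean_label(label: str) -> str:
    """Lowercase a label and drop parenthesised dates such as '(1717–1806)'."""
    without_dates = re.sub(r"\([^)]*\d[^)]*\)", " ", label)
    return re.sub(r"\s+", " ", without_dates).strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to a 0..1 similarity score."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine between character n-gram count vectors (n=3 is an assumption)."""
    def profile(text: str) -> Counter:
        return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))

    vec_a, vec_b = profile(a), profile(b)
    dot = sum(count * vec_b[gram] for gram, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def words(text: str) -> list:
    """Alphabetic words only, so digits and punctuation are ignored."""
    return re.findall(r"[^\W\d_]+", text)


def longest_words(text: str, k: int = 2) -> list:
    """The k longest distinct words, ties broken alphabetically."""
    return sorted(set(words(text)), key=lambda w: (-len(w), w))[:k]


def same_author(label_a: str, label_b: str, threshold: float = 0.85) -> bool:
    """Candidate match only; a specialist should still review the result."""
    a, b = clean_label(label_a), clean_label(label_b)
    if max(levenshtein_similarity(a, b), cosine_similarity(a, b)) >= threshold:
        return True
    # Fallback (hypothetical variant of the 'two longest words' comparison).
    shorter, longer = sorted((a, b), key=len)
    key_words = longest_words(shorter)
    return bool(key_words) and all(w in set(words(longer)) for w in key_words)
```

With these settings, `same_author` accepts both the Carmontelle and the Caylus pairs quoted above while keeping 'Pierre Corneille' and 'Thomas Corneille' apart. Any such threshold remains a judgment call, which is one reason the process stays semi-automated.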
7 ARTFL-Frantext, like its French counterpart Frantext, was originally constructed by lexicographers compiling the Trésor de la
langue française dictionary in the 1970s. See https://artfl-project.uchicago.edu/content/artfl-frantext.
8 On these two measures of textual ‘similarity’, see Buscaldi et al. (2020).
Here we encountered the thorny issue of likely duplicates, i.e. those texts with similar, but not identical,
titles that in fact represent two (or more) versions of the same text. Given the analytical model we have
chosen – modelling intertextual exchanges using graphs and SNA tools – the
presence of a duplicate text can significantly alter the results. Two or more copies of the same text, identical
or very similar, will tend to generate a very high number of co-occurrences among themselves (or just one,
but concerning the entire text), strongly influencing the use of quantitative metrics derived from graph
analysis; and the same happens for every single text that cites (or is cited by) this same multiple source,
creating double, triple, etc., intertextual links. For the reasons previously mentioned, simple comparison of
metadata is not effective, given the wide range of variables in the indication of authors, titles, dates, etc. The
solution we found consists of exploiting precisely these two characteristics of duplicates: if the automatically
detected co-occurrence is extremely long, or if it coincides with the near entirety of a document, it is highly
likely that it is a duplicate; the same applies if two texts present an anomalous number of co-occurrences,
which require inspection to explain.
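The two signals just described (one very long co-occurrence, or an unusually high number of them) can be sketched as a simple screening function. This is only an illustration: the input format, the function name `flag_likely_duplicates`, and the thresholds (80 per cent coverage of the shorter document, 50 alignments per pair) are assumptions for the example rather than values used by the project.

```python
from collections import defaultdict


def flag_likely_duplicates(alignments, doc_lengths,
                           coverage_threshold=0.8, count_threshold=50):
    """Return {(doc_a, doc_b): [reasons]} for pairs that need inspection.

    alignments  -- iterable of (source_id, target_id, passage_length_in_words)
    doc_lengths -- {doc_id: document_length_in_words}
    """
    longest = defaultdict(int)   # longest shared passage per document pair
    counts = defaultdict(int)    # number of shared passages per document pair
    for source, target, n_words in alignments:
        pair = tuple(sorted((source, target)))  # direction is irrelevant here
        counts[pair] += 1
        longest[pair] = max(longest[pair], n_words)

    flagged = {}
    for pair, n_alignments in counts.items():
        shorter = min(doc_lengths[pair[0]], doc_lengths[pair[1]])
        coverage = longest[pair] / shorter if shorter else 0.0
        reasons = []
        if coverage >= coverage_threshold:
            reasons.append("coverage")  # one passage spans nearly a whole text
        if n_alignments >= count_threshold:
            reasons.append("count")     # anomalous number of co-occurrences
        if reasons:
            flagged[pair] = reasons
    return flagged
```

The function only produces candidates. Anthologies and encyclopaedic collections would be flagged too, so each pair still has to be checked by hand.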
This last case gives us cause to recall and insist on two key points. First, any automatic process must
always be conducted in a supervised manner (hence our use of the term ‘semi-automated’): ambiguous
cases are always numerous, given that the literary, editorial, and cultural reality of any period is always more
complex and elusive than what can be strictly formalised. In the case of metadata, cases of homonymy can
occur: often, different works that address the same theme may have similar titles (Essai sur la poésie
épique, Essai sur les intérêts du commerce maritime...). In the case of duplicate detection, some editorial
forms, such as anthologies or encyclopaedic collections, exist precisely because they serve as repositories of
long intertextual excerpts, which quantitative analysis alone would tend to indicate as duplicates. Therefore,
each algorithmic intervention must be evaluated by a domain specialist.
Secondly, there are always margins of error, independent of the researcher’s intentions, which cannot be
avoided because they are inherent to the nature of the data. For example, our digitised texts are fundamentally
different in terms of the underlying quality of the textual data, transcribed to near 100 per cent accuracy
in some cases, while others are the result of an automatic OCR process whose accuracy can vary greatly
depending on the source and digitisation campaign. Automatic correction often introduces more errors than
it corrects, and as a result, most of these texts retain high levels of OCR errors that can affect the performance
of even the most robust text-reuse detection system. But, beyond the incompleteness of results (a common
problem in both ‘traditional’ and digital research), it is the introduction of a non-homogeneous dimension
(corrected texts versus OCR) that can skew data analysis: the co-occurrences of a corrected text will be much
more easily retrievable than those of an OCR-generated text, and this can create significant variations in
the results. However, this should not be considered an impediment to proceeding, but rather a warning to
exercise caution across the entire data-processing pipeline. Understanding one’s dataset and its limitations
is a first, necessary, step to ensuring that downstream tasks and results are not influenced by outside factors.
The most prevalent authors in our corpus, i.e. those with 30 or more titles attributed to them, can be
seen in Table 1.
A brief look at this list may raise some doubts, which need to be discussed, even if merely to draw
some general methodological observations. At first glance, there seems to be a rather significant over-
representation of Voltaire in our corpus: due to the editorial decisions of the Voltaire Foundation at the
University of Oxford, the source for all of our Voltaire files, single poems, some just a few lines long, are
considered individual works, alongside more substantial texts, such as the Essai sur les mœurs and Candide.
In this case as well, therefore, this bias must be taken into account during interpretation: each individual
poem must always be contextualised by taking into account the actual editorial history of the composition
and its dissemination.
Figure1 Distribution of text genre as percentage of total ModERN corpus.
With the above observations in mind, we began the corpus construction phase of our project: once
duplicates had been removed and the metadata standardised, we were left with a main research corpus of
13,133 documents (mainly books) totalling over 511 million words. Of these 13,133 documents,
3,385 came from curated or transcribed sources while the remaining 9,748 were the result of automatic
OCR. A rough distribution of text genres in the corpus can be seen in Figure 1. These classifications
were applied by the project team using a simplified version of the duc de La Vallière’s 18th-century
classification scheme for his private library, one of the most extensive in late Enlightenment France (see
Van Praet 1783).
Also of note is the strong presence of classical authors, those that were constantly edited and re-edited in
the 17th and 18th centuries. For the most part, these are translations that form a coherent sub-corpus of 527
texts, identified by our research team and mostly taken from Google Books as EPUB files which were then
cleaned, corrected and transformed into TEI-XML files. The logic behind including these texts, even if they
were not present in the initial corpora, is clear: the importance of the classical world in 18th-century culture
(from the Querelle des Anciens et des Modernes to the French Revolution, passing through neoclassicism and
the literary ‘retour à l’antique’ in the 1750s–1760s) is well known, and excluding them from our research
Author name Number of works
Voltaire (1694–1778) 1057
Carmontelle (1717–1806) 76
Cicero (106–43 bce) 70
Bernard de Fontenelle (1657–1757) 69
Horace (65–8 bce) 58
Plutarch (c.46–c.120) 56
Denis Diderot (1713–1784) 56
Florent Carton Dancourt (1661–1725) 51
Henri-Louis Duhamel Du Monceau (1700–1782) 47
Jean-Jacques Rousseau (1712–1778) 44
Honoré-Gabriel Riquetti comte de Mirabeau (1749–1791) 44
Pierre de Marivaux (1688–1763) 41
Pierre Corneille (1606–1684) 39
Jacques Necker (1732–1804) 39
Étienne Clavière (1735–1793) 39
Louis-Sébastien Mercier (1740–1814) 38
Louis Petit de Bachaumont (1690–1771) 37
Claude-Louis-Michel de Sacy (1746–1794) 37
Molière (1622–1673) 36
Jean-Antoine-Nicolas de Caritat marquis de Condorcet (1743–1794) 35
Antoine François Prévost (1697–1763) 34
Olympe de Gouges (1748–1793) 34
Charles-Simon Favart (1710–1792) 33
Thomas Corneille (1625–1709) 32
Tacitus (c.55–c.120) 32
Pierre-Samuel Dupont de Nemours (1739–1817) 32
Nicolas-Edme Rétif de La Bretonne (1734–1806) 30
Table 1 Authors with 30 or more texts in the ModERN corpus
would have meant losing a large quantity of citations and references that contributed to shaping the political
and aesthetic thought of the period.
Finally, we decided to include various canonical and indispensable texts composed before the 18th
century. In this case as well, we must consider the specifics of our project: if, for example, Montaigne or
Pascal were absent from our corpus, an incredibly high number of references to their texts, crucial for the
18th century, would be untraceable for us, and our representation of intertextual patterns would be severely
compromised. In fact, it would not only be a serious lack of information but a structural problem of our
network: if two texts cite Montaigne without his work being present, they would appear as citing each
other when, in reality, they are both independently referring to a previous work. Thus, while the intertextual
networks we aim to produce will be bounded chronologically between 1685 and 1800 as beginning- and
end-dates, the corpus of texts used to generate the reuses must necessarily include works that fall outside
these somewhat arbitrary markers. If earlier texts were available in our base collections, then we tried to
include as many of them as possible.9
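The structural problem described above can be made concrete with a toy example. The sketch below assumes, purely for illustration, that a shared passage is attributed to its earliest dated witness in the corpus; the function name `attribute_edges`, the placeholder documents 'Text X' and 'Text Y' and their dates are all hypothetical, and this simple rule is not presented as the project's attribution method.

```python
def attribute_edges(witnesses):
    """Link the earliest witness of a shared passage to every later one.

    witnesses -- list of (document_id, year) pairs containing the passage.
    Returns directed edges (source, borrower).
    """
    if not witnesses:
        return []
    ordered = sorted(witnesses, key=lambda w: (w[1], w[0]))
    earliest = ordered[0][0]
    return [(earliest, doc) for doc, _ in ordered[1:]]


# Corpus that includes the earlier source: both later texts point back to it.
with_source = [("Montaigne, Essais", 1580), ("Text X", 1750), ("Text Y", 1770)]

# Same passage, but the earlier source is missing from the corpus:
# the later text now appears to borrow from the other 18th-century text.
without_source = [("Text X", 1750), ("Text Y", 1770)]

edges_with = attribute_edges(with_source)
edges_without = attribute_edges(without_source)
```

Here `edges_with` contains two edges leaving Montaigne, whereas `edges_without` contains a single edge from 'Text X' to 'Text Y'. That spurious edge is the kind of link the inclusion of pre-1685 texts is meant to prevent.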
Having established our corpus and outlined the reasons and choices that underlie its creation, and after
considering its limitations and their implications in formulating our working hypotheses for interpreting
our results, we were ready to proceed to the identification of intertextual connections within this corpus and
then to explore some possible research avenues and various types of analyses that can be marshalled even
with preliminary results.
3. Alignment and use-cases
Today, multiple soware applications are available for identifying text reuses in various datasets. Among
the freely available tools for extracting textual reuses in large corpora, we considered those that use
programming languages such as R (R textreuse package10), Java (TRACER11), PHP/Perl (Tesserae12) and
Python (Passim;13 BL AST,14 a tool designed for DNA sequence analysis; and Text-PAIR15). Although these
tools oer similar functionalities, we ultimately opted for Text-PAIR as it was specically designed to meet
the needs of literary-historical research, scales well to large corpora and can be compiled as part of the
9 For English publications the choice of potentially interesting early modern texts that predate the 18th century is much easier:
one can simply leverage the texts made available by the EEBO (Early English Books Online) project:
https://proquest.libguides.com/eebopqp. No such resource yet exists for publications in French.
10 https://github.com/ropensci/textreuse/. See also Li and Mullen (2020).
11 https://www.etrap.eu/research/tracer/. See also Büchler et al. (2014) and Franzini et al. (2019).
12 https://github.com/tesserae/tesserae/. See also Coffee et al. (2013).
13 https://github.com/dasmiq/passim/. See also Romanello and Hengchen (2021).
14 Basic Local Alignment Search Tool: https://blast.ncbi.nlm.nih.gov/Blast.cgi. See also Vesanto et al. (2017) and Salmi et al.
(2020).
15 Pairwise Alignment for Intertextual Relations: https://github.com/ARTFL-Project/text-pair/. See also Olsen, Horton and Roe
(2011).
PhiloLogic search and retrieval corpus analysis system.16 PhiloLogic creates full-text indices of corpora,
leveraging metadata and other textual elements from TEI-XML files, and organises them into a database
that can be easily queried. PhiloLogic word indices then form the basis on which Text-PAIR
runs its sequence alignment matching algorithm. Additionally, Text-PAIR is easy to configure and relatively
fast, which allowed us to experiment with several key matching parameters and pre-processing options,
including lemmatisation and stemming.17 Finally, Text-PAIR is particularly well suited for extracting noisy
reuses – i.e. those that because of OCR or other factors may include broken or highly dissimilar sequences.
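To give a rough intuition of how pre-processing and sequence matching interact, the sketch below normalises two passages and looks for shared word trigrams. It is a generic illustration of n-gram (shingle) matching and makes no claim about how Text-PAIR is implemented; the function names, the accent-stripping step and the choice of n = 3 are assumptions for the example.

```python
import re
import unicodedata


def normalise(text: str) -> list:
    """Lowercase, strip accents and punctuation, and return a list of words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Only plain a-z survives, so ligatures such as 'œ' would split a word.
    return re.findall(r"[a-z]+", unaccented)


def shingles(tokens: list, n: int = 3) -> set:
    """All overlapping word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_shingles(passage_a: str, passage_b: str, n: int = 3) -> set:
    """Word n-grams that two passages have in common after normalisation."""
    return shingles(normalise(passage_a), n) & shingles(normalise(passage_b), n)


common = shared_shingles(
    "Il faut cultiver notre jardin.",
    "Candide répondit : il faut cultiver notre jardin",
)
```

Stricter normalisation (lemmatisation, stemming, stop-word removal) makes such matching more tolerant of OCR and spelling variation, at the price of more accidental matches.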
Once our main corpus was built as a PhiloLogic instance, and after having settled on our text pre-
processing and matching parameters, we compared the entire corpus to itself using Text-PAIR. This initial
pass generated almost two million potential text reuses – i.e. similar passages that co-occur in at least
two different texts. While impressive, these results should be taken with an appropriate measure of salt,
as they include many ‘noisy’ alignments, that is, passages that are indeed similar but that do not
necessarily constitute a ‘reuse’ in its fullest sense: formulaic expressions, legal boilerplate, publishing and
print privileges, commonplaces, and so on. We are actively developing a semi-automatic alignment filter
designed to eliminate much of this ‘noise’, although many cases will require human intervention at a finer-
grained level of filtering. Based on initial estimates, around 80 per cent of the identified alignments will
likely be eliminated as ‘noise’, leaving us with roughly 200,000 to evaluate further. We also plan to compare
our main corpus with several secondary corpora, including dictionaries, private correspondences, printed
pamphlets and the 18th-century press.18 In the meantime, we were eager to demonstrate the utility of our
approach and present some preliminary results and possible use-cases leveraging these filtered alignments.
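A first-pass noise filter of the kind mentioned above might combine two simple signals: known formulaic wording, and passages that recur across many different documents. The sketch below is a hypothetical illustration only, since the project's filter is still under development and not described in detail; the marker phrases, the function name `split_noise` and the `max_documents` threshold are all assumptions.

```python
from collections import defaultdict

# Hypothetical markers of print privileges and approbations.
BOILERPLATE_MARKERS = ("privilège du roi", "avec approbation")


def split_noise(alignments, max_documents=5):
    """Split alignments into (kept, noise).

    alignments -- list of (passage_text, source_id, target_id)
    A passage counts as noise if it contains a boilerplate marker or if the
    same passage is shared by more than max_documents distinct documents.
    """
    documents_per_passage = defaultdict(set)
    for passage, source, target in alignments:
        documents_per_passage[passage.lower()].update((source, target))

    kept, noise = [], []
    for alignment in alignments:
        text = alignment[0].lower()
        formulaic = any(marker in text for marker in BOILERPLATE_MARKERS)
        widespread = len(documents_per_passage[text]) > max_documents
        (noise if formulaic or widespread else kept).append(alignment)
    return kept, noise
```

A filter like this can only pre-sort the alignments. A commonplace shared by many texts may itself be of interest, which is why the finer-grained decisions are left to human review.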
4. Plagiarism
The potential of such large-scale text reuse data is manifold, and the type of studies that can be conducted
with it extremely varied. For instance, it is possible to identify obvious examples of plagiarism, cases in
which an author takes advantage of their source’s relative obscurity to appropriate their text with impunity.
Similarly, one can find references that are difficult to trace for contemporary researchers, drawn from works
which are not considered canonical today but that circulated at the time and participated in the shared
literary culture. The results of an alignment may therefore represent a real surprise for a researcher, who can
(re)discover intertextual connections that would otherwise remain invisible.
16 https://github.com/ARTFL-Project/PhiloLogic4/. See also Tharsen and Gladstone (2020). Since 2015, Clovis Gladstone,
associate director at the University of Chicago’s ARTFL project, has been the lead developer of both the PhiloLogic and Text-
PAIR codebases. We are grateful for his invaluable support of the ModERN project.
17 For a discussion of our text-matching parameters and experimentation, see Fedchenko, Nicolosi and Roe (2024).
18 Our thanks to the ARTFL project for its collection of dictionaries
(https://artfl-project.uchicago.edu/content/dictionnaires-dautrefois), Electronic Enlightenment for its correspondences (https://www.e-enlightenment.com), the Newberry Library for
its collection of French pamphlets (https://www.newberry.org/collection/research-guide/french-pamphlets) and the BnF
DataLab for the 18th-century press (https://www.bnf.fr/fr/bnf-datalab).
To take one example, we came across an unexpected case of plagiarism involving the Greek poet Sappho,
a gure whose reception in the 18th century is somewhat problematic, and whose texts were included in the
sub-corpus of classical translations mentioned above. While the scandal of Sappho’s homosexual relationships
was largely mitigated in the 18th century, leading to a more pathos-driven portrayal of her character in
numerous rewritings that depict her as an unlucky lover, old and exiled, her erotic and passionate dimension
remained ever-present. So much so that Mercier, in his utopian work L’An 2440, included her among the
ancient texts that were unanimously burned as harmful. As Joan DeJean states, ‘the eighteenth century
may have continuously rearmed Sappho’s heterosexuality because the fear of her sapphism had not been
eradicated’ (1989, p.118). Her fragments were thus, on the one hand, excluded from the canon of classical
authors, and on the other, frequently translated into French and Latin, oen in anthologies of Greek poets
(Sappho 1681; Sappho 1712; Sappho 1758; Sappho 1781). ese translations, which lack deep philological
rigour, signicantly altered the original text to promote this new characterisation, making the new versions
scarcely recognisable compared to the originals (DeJean 1989, pp.116–67). Sappho is therefore published,
but read with suspicion, or simply misread; like the poet herself, her texts are subject to misinterpretations
and rewritings. Sappho’s case is therefore exemplary in understanding how, without the use of automatic
textual comparison systems, it would be impossible to nd all the traces of her literary dissemination.
Given this context, we discovered that in 1788 her poems were the subject of an almost comical
‘appropriation’ in the poetry section of the provincial Journal du Hainaut et du Cambrésis: a certain M. Parent
de Saint-Amand published not just one, but at least three poems by Sappho (‘À ma bouteille’, ‘Ma mort,
À Éléonore’, this last rechristened ‘La Discrétion à Mlle de C.L.’), explicitly presenting himself as the author, with
small changes to the texts that in no way justify this claim (see Parent de Saint-Amand 1788; Parent de Saint-Amand
1789a; Parent de Saint-Amand 1789b). In Figure 2 we see the text of Sappho compared to the Journal
in the Text-PAIR web interface, and in Figure 3, the original text published in the Journal. Due to OCR errors
present in the two compared texts, Text-PAIR only identified the part highlighted in red and marked the minor
differences in green. However, the researcher can easily observe that the entire poem has been fully copied,
with only the modification of the title and the change of addressee of the poem (Éléonore becomes Constance).
This is one of the main advantages of Text-PAIR, which allows for the identification of textual reuse, even
partial, when the texts contain errors or are fragmentary. Clearly, in this case, this is plagiarism no matter how
one looks at it.
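To make the idea of partial matching concrete, the following is a minimal sketch of n-gram overlap between two passages. It is not Text-PAIR's actual algorithm; the function names, the choice of word trigrams and the two sample passages are all assumptions made for illustration (the passages are invented and are not the text of Sappho or of the Journal).

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(source, target, n=3):
    """N-grams present in both passages: a crude signal of possible reuse."""
    return word_ngrams(source, n) & word_ngrams(target, n)


# Invented placeholder passages. The target copies the source but renames
# the addressee and contains one OCR-like error ("lovc" for "love").
source = "sweet eleonore keep the secret of our love and tell it to no one"
target = "sweet constance keep the secret of our lovc and tell it to no one"

matches = shared_ngrams(source, target)
print(len(matches), "shared trigrams out of", len(word_ngrams(source)))
```

Even with a changed name and a corrupted word, more than half of the trigrams survive, which is why a match can still be flagged for the researcher to inspect by eye.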
In its exceptionality, this case raises several key points:
First, we must not forget the nature of the texts we are analysing: the 18th century had a very special
relationship with Antiquity, both of proximity and acclimatisation (theories of artistic perfection,
syncretism and the principle of the belle infidèle, for example). This allowed for great liberty in the reuse of
ancient texts, which could be cut, transformed and distorted for any aesthetic purpose (Zuber 1968; Grell
1995, pp.307–24). A project such as ours can thus not only detect extreme cases of plagiarism such as the
one above, but also identify co-occurrences and rewritings that, given the extreme nonchalance with which
sources were often reused in the 18th century, would often otherwise have remained hidden.
Figure 2 The text of Sappho as it appears in Text-PAIR.
Figure 3 M. Parent’s plagiarism as it appears in the Journal du Hainaut et du Cambrésis.
Second, and most importantly, the detection of intertextual links is highly dependent on the ‘culture’ of
the analyser: just as the readers of the Journal may not have recognised Sappho’s texts, similarly, we researchers
today would have a difficult time identifying such references if not for their automatic detection across a large and
heterogeneous corpus that includes both classical translations and issues of obscure provincial newspapers. But
the importance of these small, almost serendipitous discoveries is significant, as they open up unpredictable and
stimulating fields of research: who was this M. Parent? Why did he choose Sappho? How transparent was this
plagiarism for the reader of the time? etc. Or it can equally serve as a starting point for new more general research
questions: is it possible to find networks of dissemination of classical texts in the provinces, where the processes
of cultural diffusion were different from those in the capital? Despite moral controversies, how extensive was the
dissemination of Sappho’s texts in the 18th century? etc. It is precisely these sorts of questions, and their potential
answers, that we hope will emerge once the project’s data is analysed and released to the public.
5. Quantitative analysis – uncovering the influence of individual authors
Aside from identifying and uncovering direct (or indirect) examples of text reuse, our project seeks more
generally to understand the notion of authorial or textual ‘influence’ in the 18th century. Confronted
with many lesser-known figures whose reception is ambiguous, this type of analysis is often difficult. It is
undeniable, for instance, that Cicero was a key reference for 18th-century culture, both in terms of oratory
and moral reflexions, but understanding the impact of less prominent figures, or those whose biographies
might significantly affect contemporary judgment, is altogether more complex. Again, it comes down to a
question of scale: Cicero is everywhere, and the possible variations in references to his works tend to lose
their significance; Catullus much less so, and each individual reference takes on much greater importance,
making the discovery of multiple examples particularly valuable.
Take Julius Caesar as one such example: his reception in the 18th century is highly ambiguous, both as
a historical figure and as a writer (Mercier and Bièvre-Perrin 2024). These two aspects are often connected:
for instance, in Rollin’s Traité des études, an important pedagogical text of the time, Caesar is simultaneously
praised for his style as a historian (Grell 1995, pp.100–106) and condemned for his arrogance and for his
political coup that undermined the institutions of the Roman Republic (Bedon 1985). Praised for his military
achievements and for civilising Gaul (Grell 1995, pp.1113–19), Caesar is also highly criticised, on one hand
politically, given his status as a ‘tyrant of usurpation’ and, on the other, as an historian, whose works are
often considered devoid of concrete details (Poignault 1985). We need only think about Voltaire’s equivocal
treatment of Caesar in Rome sauvée, where the character is both an alternative to Cicero’s passivity and one
of the first possible accomplices in Catiline’s thirst for power (Nicolosi 2024), or the different nuances his
image takes in revolutionary speeches (Parent 2022). How should one assess the period’s interest in such an
ambiguous figure? In this case, traditional exegesis can be enriched by the data that a project like ours can
provide, confirming and nuancing existing interpretations.
The simplest method for assessing the influence of authors is to quantify and evaluate their presence in
the texts of the time, either as subjects of theoretical works or as protagonists in literary texts, or to the extent
that their works and words are cited or reused for their exemplarity or appropriateness. However, while
the history of Julius Caesar is evidently the subject of countless comments and analyses (in our corpus, the
expression ‘Jules César’ occurs 620 times), the case is different when examining his ‘active’ presence in the
period’s imagination as a ‘speaking’ subject or ‘agent’, and thus, a more direct definition of influence in terms
of symbolic impact or direct reuse of his statements.
We can start by evaluating the number and type of plays that are dedicated to Caesar, presented in
Table 2. Based on Brenner’s catalogue of all 18th-century plays (1947), we notice that Caesar is relatively
marginalised compared to other Roman historical figures (Laplace 1985): out of eight plays about him, two
are translations of Shakespeare, where Caesar is little seen; two others give him larger roles but were not
performed on Parisian or institutional stages; and the rest are primarily about Caesar’s death, where the
protagonists are actually Brutus and Cassius. His appearances in works on other subjects are rare: besides the
aforementioned Rome sauvée, we could also mention Caton d’Utique (1715) by François-Michel-Chrétien
Deschamps. Clearly, we have here a historical character who is much talked about for the greatness of his
deeds, but whom an 18th-century public seemingly does not want to ‘see’ or ‘hear’.
Author | Title | Genre | Acts | Year (and place if known) of first performance or publication
M.-A. Barbier | La Mort de César | tragedy | 5 | 1710 (Paris)
Banières | La Mort de Jules César | tragedy | 5 | 1728 (Toulouse)
Voltaire | La Mort de César | tragedy | 3 | 1735 (Paris)
Abbé Saulx | La Mort de César | tragedy | unknown | 1737 (Reims, Collège des Bons-Enfants)
P.-A. de LaPlace | Jules César (translation of Shakespeare) | tragedy | 5 | 1746 (published in Le Théâtre anglais)
J.-B.-C. Delisle de Sales | César ou les deux vestales | play | 1 | 1774 (at the residence of the prince d’Hénin)
P.-P.-F. LeTourneur | Jules César (translation of Shakespeare) | tragedy | 5 | 1776 (published in Shakespeare traduit de l’anglais)
Anonymous | L’Héroïsme sénonais ou le siège de Sens sous Jules César | drama | 3 | 1781
Table 2 Table of all 18th-century plays dedicated to Julius Caesar.
But what about the presence of Caesar’s words in other texts? The extracted reuse data from our
project would seem to confirm the hypothesis we have just formulated. Caesar’s two main works are the
Commentarii de Bello Gallico, and the Commentarii de Bello Civili, which we consider first by finding
quotations directly in Latin (the texts in the original language are easily found online). It is immediately
evident that most references to these two historiographical treatises are extracted from the De Bello Gallico
(57 co-occurrences), rather than the De Bello Civili (five co-occurrences), which as the history of an
insurrection was clearly less popular in the context of French absolutism. Quantitatively, the quotations
are not particularly numerous, and often appear in works by non-French authors and military texts, or
those that take an interest in pre-Roman Gaul (Table 3). All told, Caesar seems to be used mainly as
documentary support for historical or proto-ethnological enquiries, and rarely taken up or commented
on for his rhetoric and sentences.
Author and nationality | Title and date of publication | Number of quotes
F.-R. Pommereul (French) | Recherches sur l’origine de l’esclavage religieux politique du peuple, en France (1783) | 12
C. Guischardt (French) | Mémoires critiques et historiques sur plusieurs points d’antiquités militaires (1774) | 8
G. Stuart (British) | Dissertation historique sur l’ancienne constitution des Germains, Saxons et habitants de la Grande-Bretagne (1794) | 6
J.-R. Sinner (Swiss) | Voyage historique et littéraire dans la Suisse occidentale (1781) | 5
R. Wallace (British) | Essai sur la différence du nombre des hommes dans les temps anciens et modernes (1754) | 4
J.-B. de Mirabaud (French) | Le Monde, son origine, et son antiquité (1751) | 2
H. Gautier (French) | Traité des ponts (1728) | 2
T. Shaw (British) | Voyages dans plusieurs provinces de la Barbarie et du Levant: contenant des observations géographiques, physiques, philologiques… (1743) | 2
Table 3 Table of works that most frequently cite Julius Caesar in Latin
Using our sub-corpus of translations, we find a similar set of practices of reuse concerning Caesar’s works
(Caesar 1678; Caesar 1763; Caesar 1785; Caesar 1786): most of the references appear, again, in military texts,
confirming how Caesar was appreciated in the 18th century as a brilliant general rather than as a politician;
his historical treatises are again used mainly to extract information about ancient Gaul (Table 4).
Interestingly, our alignments also unearthed a maxim attributed to Madame Des Houillères that
recurs frequently in the various translations of Caesar: ‘Nul n’est content de sa fortune, ni mécontent de
son esprit’. The significant presence of this maxim, which has become proverbial, in many peritexts of
Caesar’s translations suggests a negative perception of this character, of the leader who, out of his personal
affirmation, overthrew the legitimate, albeit republican, state. Finally, our data confirm and corroborate what
we had empirically perceived when looking at the theatrical output of the century: Caesar’s ‘voice’ remained
largely unheard in the 18th century, which preferred to discuss his exploits (mainly through Plutarch and
historians of the Imperial period) than to use directly the expressions of a controversial figure in the context
of monarchical France.
Here as above, the ability to compare a large number of texts allows us to confirm a hypothesis that
would be otherwise difficult to prove in absolute terms – remaining, as these so often do, at the level of a
‘hunch’ (in this case, correct). Certainly, due to the nature of our data, it is possible that for technical reasons
(OCR texts with many errors), some of Caesar’s quotations may remain unidentified. But given the very low
number of quotations found in both French and Latin in comparison to other Roman authors, the nature of
the texts reusing the Roman general’s treatises, and the massive presence of Madame Des Houillères’ couplet,
there seems to be little doubt as to Caesar’s scarce presence within the 18th-century literary field. As with
mixed-mode methods in the social sciences, quantitative analysis, applied to narrow or inherently complex
cases, becomes a solid ally of qualitative hypotheses and research.
6. The heuristic potential of networks
Finally, what about network analysis? How can it serve literary studies? At our current stage of research,
the amount of data and its complexity make it difficult to obtain reliable results. The various obstacles
that have appeared, and that we intend to overcome, concern, for example, the difficulties in classifying
co-occurrences: when are they significant? When do they represent a true reuse and not simply a repetition
of common or formulaic language with no intertextual value? While these questions remain very much
open, we are nonetheless encouraged by some of our preliminary results, which confirm known premises of
18th-century literature while hinting at the broader potential of the project as a whole.
Author | Title and year of publication | Number of quotes
J.-B. Dubos | Histoire critique de l’établissement de la monarchie françoise dans les Gaules (1734) | 11
J. Pagès | Manuscrits de Pagès, marchand d’Amiens, écrits à la fin du 17e et au commencement du 18e siècle, sur Amiens et la Picardie (1820) | 10
A.-F. Boureau-Deslandes | Essai sur la marine des anciens (1768) | 10
C. Guischardt | Mémoires critiques et historiques sur plusieurs points d’antiquités militaires (1774) | 8
M. de Saxe | Les Rêveries dédiées à Messieurs les officiers généraux par Mr. de Bonneville (1757) | 3
D. Lescallier | Vocabulaire des termes de marine anglois et françois (1777) | 3
Anonymous | Un bon François de l’ordre des patriciens, aux bons François de l’ordre des plébéiens (1789) | 2
Table 4 Table of works that most frequently cite Julius Caesar translated in French
Let us take, for example, one of the typical dichotomies of the theatrical and literary world of the
18th century, the subject of countless specialist debates: who is the most important point of reference
for Enlightenment dramatists, Corneille or Racine? And how did these two giants of French classicism
come to influence 18th-century playwriting? The parallel, already posed at the end of the 17th century
(Mortgat-Longuet 2003) and continued in the 18th century (Goldzink 2003), accompanies the entire
history of French literature. Many concessions would need to be made, but in general, anyone who has
dealt with the history of 18th-century theatre would likely answer Racine, insofar as his use of the pathetic
and the spectacular (e.g. Athalie) informs the main aesthetic developments of the century (Perchellet
2004a; Perchellet 2004b; Viala and Tunstall 2015, p.274). Can our current data confirm this first, intuitive
hypothesis? And if so, how? To answer these questions, we need to first take into account our use of graph
metrics for understanding network ‘influence’.
Our dataset of textual reuses allows us to generate graphs in which each text or author (i.e. the totality
of texts attributed to the same author) represents a node, and in which the links between two points indicate
an intertextual exchange that has taken place between two texts (or between the entire production of both
connected authors). Each generated graph will thus have characteristics that can be analysed mathematically,
and which give us information on the function and weight that each node (and thus each text or author)
assumes within the system of intertextual exchanges.
e rst metric that can be analysed is the degree of a node, i.e. the number of exchanges in which it is
a protagonist, either as a quoting subject or as a quoted object. Using this degree measure, it is possible to
generate a simple relative ranking by number of intertextual interactions. In our case, and with the current
data at our disposal, Corneille appears in 14th position, while Racine comes in 11th. us, in the absolute,
Racine appears in more exchanges than Corneille. A rst conrmation of our hypothesis, but from which
no conclusion can be drawn: beyond this purely quantitative measure, it is the quality and importance of
these links in the general context of the intertextual network that can give us more and better indications.
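As an illustration only, the degree ranking described above can be sketched in a few lines. The list of reuse pairs, the single-letter author labels and the function name are hypothetical placeholders, not project data, and we assume here that every recorded reuse counts once for both the quoting and the quoted party.

```python
from collections import Counter

# Hypothetical reuse records: (reusing author, reused author).
reuses = [
    ("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("A", "E"),
]


def degree_ranking(edges):
    """Rank nodes by degree, i.e. exchanges as quoting or quoted party."""
    degree = Counter()
    for citing, cited in edges:
        degree[citing] += 1
        degree[cited] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


ranking = degree_ranking(reuses)
for position, (author, degree) in enumerate(ranking, start=1):
    print(position, author, degree)
```

A ranking of this kind says nothing about who does the quoting, which is the limitation the measures discussed next try to address.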
Another measure that can be taken into consideration is PageRank, a measure for directed graphs
which depends on the number and quality of links to a node (Liu et al. 2017; Labatut and Bost 2019). The
underlying hypothesis behind this measure is that the most important nodes are likely to receive more links
from other important nodes. An author has a high PageRank if a large number of authors reuse their text
and these authors are themselves often reused and, therefore, considered important in the system. In other
words, if an author who is ‘widely read’ quotes me, my ‘importance’ and the possibility of other people
reading me increase. There is more chance of readers ‘stumbling across’ my text if Voltaire reuses me, than if
20 minor and little-read authors do. Now, if we take the ranking by PageRank of the authors in our project,
Corneille is in sixth place, while Racine is in seventh: we could therefore deduce that Corneille is less cited
in the absolute, but cited by more ‘important’ authors, and that therefore his impact on the literary world of
the 18th century is slightly stronger than that of Racine. But it is possible to refine this result even further.
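The intuition behind PageRank can be sketched with a simple power iteration. This is a generic textbook formulation rather than the project's own implementation; the damping factor of 0.85, the number of iterations, the node labels and the assumption that each link points from the reusing author to the reused author are all choices made for this example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a list of (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes that quote nobody is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical network: X is reused once, by the much-reused author H;
# Y is reused twice, but only by authors whom nobody reuses.
edges = [(f"m{i}", "H") for i in range(1, 6)]
edges += [("H", "X"), ("n1", "Y"), ("n2", "Y")]

rank = pagerank(edges)
print(round(rank["X"], 3), round(rank["Y"], 3))
```

In this toy network Y has the higher degree but X the higher PageRank, the same kind of reversal observed between Racine and Corneille above.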
Another fundamental measure in network analysis is betweenness centrality, which calculates the
importance of a node with respect to its position in the graph (Labatut and Bost 2019; Grandjean and Jacomy
2019). More specifically, it calculates the number of times a node is on the shortest paths between two other
nodes; the more central a node is and therefore the more it can act as a bridge between other nodes in the
network, the higher its betweenness centrality. The more peripheral and isolated a node is from the rest of the
nodes in the system, the lower its betweenness will be and the lower its impact on potential paths. Now, this
measure emphasises the identification of paths, as in the case of information transmission and distribution
flows (e.g. of people, energy, goods): if to get from point A to point C, the fastest route goes through point
B, then B appears as a central node in the distribution of a resource, and will have a high betweenness – it
becomes a hub. An airport of a large city allows for the connection of many airports of smaller cities, not
otherwise connected to each other: its betweenness and importance in the transportation network are very
high, enabling the passage of travellers between disconnected places. In an intertextual network, however, the
relationships between texts/points do not describe a flow of information or represent a path between various
points. If text B cites text A, and is itself cited by text C, the representation of this interaction (A → B → C)
seems to suggest that B is a bridge between A and C; but in reality, it makes little sense to say that A and C
are connected thanks to B, for numerous reasons (what B cites from A is not necessarily what C cites from B;
and if the citation is the same, it is impossible to establish whether C cited A through B, or whether C directly
cited A). More generally, even though the direction of the arrows may suggest a path from A to C, in reality,
what is represented is only the relationship between A and B, and that between B and C.
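To show what the measure counts, here is a minimal sketch of unnormalised betweenness centrality on a directed, unweighted graph, following the standard algorithm of Brandes (2001). The function name and the two toy graphs (the A → B → C chain from the text and a hub-and-spoke airport network) are illustrative assumptions, not project data or code.

```python
from collections import deque


def betweenness(edges):
    """Unnormalised betweenness for a directed, unweighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    successors = {node: [] for node in nodes}
    for source, target in set(edges):  # ignore duplicate links
        successors[source].append(target)
    score = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search counting shortest paths from `start`.
        paths = {node: 0 for node in nodes}
        dist = {node: -1 for node in nodes}
        preds = {node: [] for node in nodes}
        paths[start], dist[start] = 1, 0
        order, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != start:
                score[w] += delta[w]
    return score


chain = betweenness([("A", "B"), ("B", "C")])
spokes = ["S1", "S2", "S3"]
airports = betweenness([(s, "HUB") for s in spokes] + [("HUB", s) for s in spokes])
print(chain["B"], airports["HUB"])
```

In the chain, B scores 1 because it lies on the only path from A to C; in the airport network the hub lies on all six routes between the smaller airports. The computation is identical in both cases, which is why the interpretation of the score, rather than the score itself, has to change for an intertextual graph.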
In our network, betweenness does not indicate the importance of a text, but rather its ability to be involved
in intertextual exchanges with different groups of texts or literary communities. A high betweenness implies
that the node is connected to many nodes of the network which would be disconnected otherwise: a text that
cites or is cited in politics, economics, theatre, religion, etc. will come into contact with very different parts of our
intertextual graph, even in isolation. Conversely, a low betweenness implies that the node is poorly connected,
or connected to a homogeneous group of texts that tend to quote each other, without connections with other
areas of the network. Typical cases of low betweenness are found in religious texts or legal documents, which
are very present in texts of the same nature, but rarely found in texts dealing with other subjects.
Our network’s betweenness centrality ranking finds Racine in sixth position and Corneille 196th. The
latter’s position in the network is thus much more marginal than that of the former. The explanations for
this large gap, even in the face of similar degree and PageRank measures, may be multiple: Racine seems
to act as a ‘bridge’ between different groups in our network, both as a citing author (think of his links with
the religious world of Port-Royal) and as a quoted author (his verses become part of popular culture, and
quoted in contexts that do not concern theatrical dramaturgy). On the contrary, Corneille remains a highly
cited author (his degree measure is very high), but by homogeneous groups, or by a few individual authors
in an intense manner. Voltaire’s Commentaires sur Corneille (1764), as well as other poetic texts (La Harpe’s
Lycée, 1739, for example), quote his works extensively, and we thus understand why Corneille’s PageRank is
so high, but these texts all belong to the world of Belles-lettres alone, and links with other groups and areas
of the network remain weak.
While it therefore remains difficult (and probably pointless) to answer definitively the question we have
posed regarding the importance of Corneille or Racine in 18th-century culture, the possibility of enriching
our knowledge through the integration of graph metrics extracted from network analysis has allowed us
to imagine new and different research hypotheses. The two authors seem to participate, in a qualitatively
different way, in the network of transmission of intertextual material: Racine seems to be a more transversal
author, and his verses are extracted from their context and reused in different spheres; Corneille remains a
very important literary reference point, but with a few exceptions, his words (poetic, but also theoretical,
e.g. his Trois discours sur le poème dramatique) resonate mainly in literary circles. Even as we await more
conclusive and extensive data, this simple insight opens up new avenues of research: which categories of
texts quote Racine the most? How do his verses – extracts from tragic texts, and decontextualised – manage
to take on different valences and become meaningful? How does Corneille become a point of reference – by
adherence or contrast – to the dramatic poetics of the 18th century?
7. Conclusion: next steps
In this brief account, we have described the basic workings of our project, and presented some possible
lines of research that it can help to identify and enrich. The analyses of the plagiarism of Sappho and the
dissemination of Caesar’s texts represent an initial demonstration of the advantages of applying digital
methods in the discovery of intertextual connections, which would otherwise be either unrecoverable or
too numerous for traditional close-reading approaches. As such, each piece of data becomes the basis for
returning to the sources, leading to new interpretations that either reaffirm or challenge common critical
assumptions. On the other hand, the Corneille/Racine comparison shows how the use of SNA paradigms
and metrics in literature and its internal intertextual links is both possible and potentially capable of refining
common exegesis, offering new evidence for established theories or uncovering new patterns that only large-scale
distant-reading analyses can reveal (Underwood 2019).
Clearly, we are still at an early stage of our project, and much work remains to be done to make our
results meaningful. Our goal is to create and define profiles for each text/author node in our network, created
on a mathematical basis in relation to the various metrics presented (and others belonging to the field of
graph studies – e.g. closeness centrality, clustering).19 For example, since ours is an oriented graph (the links
connecting the works have a direction, there is a source and a target of each citation), it is possible to define
an author-text node as an Authority (high number of texts citing it) or as an Observer (high number of
texts cited); the same goes for the category of Mediator, applicable to a node whose betweenness centrality
and PageRank measures are both high. Other possible categories will emerge through the combination of
the various metrics we intend to calculate. We also envision analysing the context of each alignment, i.e.
defining by topic modelling or sentiment analysis the ‘intention’ behind each quotation/reuse. For example,
while Corneille is quoted at length and commented on in Voltaire’s Commentaires, the criticisms that the
philosophe levels against the 17th-century playwright certainly contain a different set of value judgements
than those found in La Harpe’s Éloge de Racine (1772).
19 See Labatut and Bost (2019); Grandjean and Jacomy (2019).
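By way of illustration, the Authority, Observer and Mediator profiles mentioned above could be assigned from precomputed metrics along the following lines. This is a sketch under stated assumptions: the threshold values, the metric names and the example node are all hypothetical, and what counts as a ‘high’ value remains to be defined by the project.

```python
# Hypothetical cut-off values above which a metric counts as "high".
THRESHOLDS = {
    "in_degree": 10,      # number of texts citing the node
    "out_degree": 10,     # number of texts the node cites
    "betweenness": 0.05,
    "pagerank": 0.01,
}


def classify_node(metrics, thresholds=THRESHOLDS):
    """Return the profile labels that apply to one text/author node."""
    labels = []
    if metrics["in_degree"] >= thresholds["in_degree"]:
        labels.append("Authority")
    if metrics["out_degree"] >= thresholds["out_degree"]:
        labels.append("Observer")
    if (metrics["betweenness"] >= thresholds["betweenness"]
            and metrics["pagerank"] >= thresholds["pagerank"]):
        labels.append("Mediator")
    return labels


# Invented metric values for a single node.
example = {"in_degree": 25, "out_degree": 3, "betweenness": 0.08, "pagerank": 0.02}
labels = classify_node(example)
print(labels)
```

A node may carry several labels at once, or none, which leaves room for the further categories expected to emerge from other combinations of metrics.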
These future perspectives notwithstanding, our general hope is that our large-scale treatment of
intertextual links in the 18th century will allow us to verify, in a novel way, some of the most
widespread literary hypotheses, and to offer the entire scholarly community the tools to conduct such
research themselves. Once organised in a database, all our data will be available online and can be interrogated in,
we hope, an intuitive manner, allowing any researcher to verify their own hypotheses on the circulation and
diffusion of texts and authors in the 18th-century French literary field.
References
Ahnert R., Ahnert S., Coleman C. and Weingart S. 2021. The Network Turn. Changing Perspectives in the Humanities.
Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108866804.
Barron A., Huang J., Spang R. and DeDeo S. 2018. ‘Individuals, institutions, and innovation in the debates of the
French Revolution’. In: Proceedings of the National Academy of Sciences 115:18, 4607–12.
https://doi.org/10.1073/pnas.1717729115.
Barthes R. 1984. Le Bruissement de la langue. Paris: Seuil.
Bedon R. 1985. ‘César dans le Traité des études de Charles Rollin’. In: Chevallier R. (ed.) Présence de César. Paris: Les
Belles Lettres, 275–85.
Brenner C. 1947. A Bibliographical List of Plays in the French Language 1700–1789. Berkeley CA: Edwards Brothers.
Brockliss L.W.B. 2002. Calvet’s web: Enlightenment and the Republic of Letters in Eighteenth-century France. Oxford:
Oxford University Press. https://doi.org/10.1093/oso/9780199247486.001.0001.
Büchler M., Burns P., Müller M., Franzini E. and Franzini G. 2014. ‘Towards a historical text re-use detection’. In:
Biemann C. and Mehler A. (eds) Text Mining. Theory and Applications of Natural Language Processing. Cham: Springer,
221–38. https://doi.org/10.1007/978-3-319-12655-5_11.
Burrows S. 2018. The French Book Trade in Enlightenment Europe. London: Bloomsbury Academic.
Burrows S. 2020. ‘The FBTEE revolution: mapping the Ancien Régime book trade and the future of historical bibliometric
research’. In: Burrows S. and Roe G. (eds) Digitizing Enlightenment: Digital Humanities and the Transformation of
Eighteenth-Century Studies. Oxford University Studies in the Enlightenment. Liverpool: Liverpool University Press,
167–94.
Burrows S. and Roe G. (eds) 2020. Digitizing Enlightenment: Digital Humanities and the Transformation of Eighteenth-
Century Studies. Oxford University Studies in the Enlightenment. Liverpool: Liverpool University Press.
Buscaldi D., Felhi G., Ghoul D., Le Roux J., Lejeune G. and Zhang X. 2020. ‘Calcul de similarité entre phrases: quelles
mesures et quels descripteurs?’. In: Cardon R., Grabar N., Grouin C. and Hamon T. (eds) Actes de la 6e conférence conjointe
Journées d’études sur la parole (JEP, 33e édition), Traitement automatique des langues naturelles (TALN, 27e édition),
Rencontre des étudiants chercheurs en informatique pour le traitement automatique des langues (RÉCITAL, 22e édition).
Atelier Défi Fouille de Textes. Nancy: ATALA and AFCP, 14–25. https://aclanthology.org/2020.jeptalnrecital-de.2.
Caesar J. 1678. Les Commentaires de César, de la traduction de N.Perrot, sieur d’Ablancourt. Édition nouvelle revue et
corrigée. Perrot d’Ablancourt N. (trans.). Amsterdam: A.Wolfgang.
Caesar J. 1763. Les Commentaires de César […]. Nouvelle édition augmentée de notes historiques et géographiques, et
d’une carte nouvelle de la Gaule et du plan d’Alise, par M.Danville. Perrot d’Ablancourt N. and Le Mascrier J.-B. (trans.).
Amsterdam: Arkstee & Merkus.
Caesar J. 1785. Commentaires de César, avec des notes historiques, critiques et militaires. Turpin de Crispé L. (trans.).
Montargis: C.Lequatre and Paris: C.-G.Leclerc.
Caesar J. 1786. La Guerre de Jules César dans les Gaules. De Précis (trans.). Paris: Imprimerie royale.
Coee N., Koenig J.-P., Poornima S., Forstall C., Ossewaarde R. and Jacobson S. 2013. ‘e Tesserae Project: intertextual
analysis of Latin poetry’. In: Literary and Linguistic Computing 28:2, 221–28. https://doi.org/10.1093/llc/fqs033.
Comsa M. T., Conroy M., Edelstein D., Edmondson C. S. and Willan C. 2016. ‘e French Enlightenment network’.
In: e Journal of Modern History 88:3, 495–534. https://doi.org/10.1086/687927.
Darnton R. 1982. e Literary Underground of the Old Regime. Cambridge MA: Harvard University Press.
Darnton R. 2021. Pirating and Publishing: e Book Trade in the Age of Enlightenment. Oxford: Oxford University
Press.
DeJean J. 1989. Fictions of Sappho, 1546–1937. Chicago IL: University of Chicago Press.
Edelstein D., Morrissey R. and Roe G. 2013. ‘To quote or not to quote: citation strategies in the Encyclopédie’. In:
Journal of the History of Ideas 74:2, 213–36. https://www.jstor.org/stable/43291299.
Edmondson C. and Edelstein D. (eds) 2019. Networks of Enlightenment: Digital Approaches to the Republic of Letters.
Oxford University Studies in the Enlightenment. Liverpool: Liverpool University Press.
Fedchenko V., Nicolosi D.M. and Roe G. 2024. ‘À la recherche des réseaux intertextuels: défis de la recherche littéraire
à grande échelle’. In: Humanités numériques 9. https://doi.org/10.4000/11wmw.
Franzini G., Passarotti M., Moritz M. and Büchler M. 2019. ‘Using and evaluating TRACER for an Index fontium
computatus of the Summa contra Gentiles of Thomas Aquinas’. Fifth Italian Conference on Computational Linguistics
(CLiC-it 2018), Turin: Zenodo. https://doi.org/10.5281/zenodo.3362130.
Genette G. 1992. Palimpsestes: la littérature au second degré. Paris: Seuil.
Goldzink J. 2003. ‘Le torrent et la rivière’. In: Declercq G. and Rosellini M. (eds) Jean Racine, 1699–1999. Paris: Presses
Universitaires de France, 719–28.
Grandjean M. and Jacomy M. 2019. ‘Translating networks: assessing correspondence between network visualisation
and analytics’. Digital Humanities 2019. Utrecht: HALSHS. https://shs.hal.science/halshs-02179024.
Grell C. 1995. Le Dix-huitième siècle et l’antiquité en France: 1680–1789. SVEC 330–31. Oxford: Voltaire Foundation.
Hamzehei A., Jiang S., Koutra D., Wong R. and Chen F. 2017. ‘Topic-based social influence measurement for social
networks’. In: Australasian Journal of Information Systems 21. https://doi.org/10.3127/ajis.v21i0.1552.
Kristeva J. 1969. Sēmeiōtikē. Recherches pour une sémanalyse. Paris: Seuil.
Labatut V. and Bost X. 2019. ‘Extraction and analysis of fictional character networks: a survey’. In: ACM Computing
Surveys 52:5, 1–40. https://doi.org/10.1145/3344548.
Laplace R. 1985. ‘Le personnage de César à la Comédie-Française’. In: Chevallier R. (ed.) Présence de César. Paris: Les
Belles Lettres, 293–304.
Li Y. and Mullen L. 2020. textreuse: Detect Text Reuse and Document Similarity. https://docs.ropensci.org/textreuse.
Liu Q., Xiang B., Jing Yuan N., Chen E., Xiong H., Zheng Y. and Yang Y. 2017. ‘An influence propagation view of
PageRank’. In: ACM Transactions on Knowledge Discovery from Data 11:3, 1–30. https://doi.org/10.1145/3046941.
McCarty W. 2018. ‘Modeling the actual, simulating the possible’. In: Flanders J. and Jannidis F. (eds) The Shape of Data
in Digital Humanities. London: Routledge, 264–84.
Mercier C. and Bièvre-Perrin F. (eds) 2024. Jules César, construction d’une image de l’Antiquité à nos jours. Besançon:
Presses universitaires de Franche-Comté.
Moretti F. 2008. Graphes, cartes et arbres. Modèles abstraits pour une autre histoire de la littérature. Paris: Les Prairies
ordinaires.
Moretti F. (ed.) 2017. Canon/Archive. Studies in Quantitative Formalism from the Stanford Literary Lab. New York: n+1
Foundation.
Mortgat-Longuet E. 2003. ‘Aux origines du parallèle Corneille-Racine: une question de temps’. In: Declercq G. and
Rosellini M. (eds) Jean Racine, 1699–1999. Paris: Presses Universitaires de France, 703–17.
Most G.W. 2008. ‘Réexions de Sappho. Rabau S. and de Gandt M. (trans.). In: Fabula-LhT5. https://doi.org/10.58282/lht.832.
Nicolosi D.M. 2024. ‘La valeur symbolique de l’espace scénique dans les tragédies romaines et grecques de Voltaire’. In:
Revue Voltaire 22, 121–36.
Norman L.F. 2011. e Shock of the Ancient. Literature and History in Early Modern France. Chicago IL: University of
Chicago Press.
Olsen M., Horton R. and Roe G. 2011. ‘Something borrowed: sequence alignment and the identification of similar
passages in large text collections’. In: Digital Studies/Le Champ numérique 2:1. https://doi.org/10.16995/dscn.258.
Paige N. 2021. Technologies of the Novel: Quantitative Data and the Evolution of Literary Systems. Cambridge: Cambridge
University Press. https://doi.org/10.1017/9781108890861.
Parent H. 2022. Modernes Cicéron. La romanité des orateurs révolutionnaires et de l’Empire (1789–1807). Paris: Classiques
Garnier.
Parent de Saint-Amand [given name unknown] 1788. ‘À ma bouteille’. In: Journal du Hainaut et du Cambrésis, par
M.le Cher de Limoges, membre de plusieurs académies 45, 378.
Parent de Saint-Amand [given name unknown] 1789a. ‘Ma mort’. In: Journal du Hainaut et du Cambrésis, par M.le
Cher de Limoges, membre de plusieurs académies 4, 35–36.
Parent de Saint-Amand [given name unknown] 1789b. ‘La Discrétion à Mlle de C.L.’. In: Journal du Hainaut et du
Cambrésis, par M.le Cher de Limoges, membre de plusieurs académies 7, 60.
Perchellet J.-P. 2004a. L’Héritage classique: la tragédie classique entre 1680 et 1814. Paris: Honoré Champion.
Perchellet J.-P. 2004b. ‘Corneille et ses publics au XVIIIe siècle’. In: Dix-septième siècle 225, 549–57.
https://doi.org/10.3917/dss.044.0549.
Poignault R. 1985. ‘Napoléon Ier et Napoléon III lecteurs de Jules César’. In: Chevallier R. (ed.) Présence de César. Paris:
Les Belles Lettres, 329–45.
Romanello M. and Hengchen S. 2021. ‘Detecting text reuse with Passim’. In: Programming Historian.
https://doi.org/10.46430/phen0092.
Salmi H., Paju P., Rantala H., Nivala A., Vesanto A. and Ginter F. 2020. ‘The Reuse of texts in Finnish newspapers
and journals, 1771–1920: a digital humanities perspective’. In: Historical Methods: A Journal of Quantitative and
Interdisciplinary History 54:1, 14–28. https://doi.org/10.1080/01615440.2020.1803166.
Sappho 1681. Les Poésies d’Anacréon et de Sapho, traduites de grec en françois, avec des remarques. Dacier A. (trans.).
Paris: D.ierry & C.Barbin.
Sappho 1712. Les Odes d’Anacréon et de Sapho en vers françois, par le poète sans fard. Gacon F. (trans.). Rotterdam:
Fritsch & Böhm.
Sappho 1758. Anacréon, Sapho, Moschus, Bion, Tyrthée, etc., traduits en vers français. Poinsinet de Sivry L. (trans.).
Nancy: P.Antoine.
Sappho 1781. Poésies de Sapho, suivies de différentes poésies dans le même genre. Billardon de Sauvigny E.-L. (trans.).
London: [n.pub.].
Tharsen J. and Gladstone C. 2020. ‘Using Philologic for digital textual and intertextual analyses of the Twenty-Four
Chinese Histories 二十四史’. In: Journal of Chinese History 中國歷史學刊 4:2, 558–63.
https://doi.org/10.1017/jch.2020.27.
Underwood T. 2019. Distant Horizons, Digital Evidence and Literary Change. Chicago IL: University of Chicago Press.
Van Praet J.B.B. 1783. Catalogue des livres de la bibliothèque de feu M.le duc de LaVallière. 4vol., Paris: Guillaume De
Bure. https://catalogue.bnf.fr/ark:/12148/cb365379105.
Vesanto A., Nivala A., Rantala H., Salakoski T., Salmi H. and Ginter F. 2017. ‘Applying BLAST to text reuse detection
in Finnish newspapers and journals, 1771–1910’. In: Bouma G. and Adesam Y. (eds) Proceedings of the NoDaLiDa
2017 Workshop on Processing Historical Language. Gothenburg: Linköping University Electronic Press, 54–58.
https://aclanthology.org/W17-0510.
Viala A. and Tunstall K. 2015. L’Âge classique et les Lumières. In: Viala A. 2014–2017. Une histoire brève de la littérature
française. Paris: Presses universitaires de France.
Zuber R. 1968. Les Belles infidèles et la formation du goût classique. Paris: A.Colin.