Article

Phylogenetics of artificial manuscripts


Abstract

Biological evolution has parallels with the development of natural languages, man-made artifacts, and manuscript texts. As a result, phylogenetic methods developed for evolutionary biology are increasingly being used in linguistics, anthropology, archaeology, and textual criticism. Despite this popularity, there have been few critical tests of their suitability. Here, we apply phylogenetic methods to artificial manuscripts with a known true phylogeny, produced by modern 'scribes'. Although the survival of ancestral forms and multiple descendants from a single ancestor are probably much more common in manuscript evolution than biological evolution, we were able to reconstruct most of the true phylogeny. This is important because phylogenetic methods are influencing the production of critical editions of major written works. We also show that the variation in rates of change at different locations in the text follows a gamma distribution, as is often the case in DNA sequences.
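
The gamma model of rate variation mentioned in the abstract can be illustrated with a small simulation. This is only a sketch, not the authors' analysis: it assumes the convention, common in phylogenetics, of fixing the mean rate at 1 so that a single shape parameter alpha controls how unevenly changes are spread over text locations. The function name `sample_site_rates` and the alpha values are hypothetical.

```python
import random
import statistics


def sample_site_rates(n_sites, alpha, seed=0):
    """Draw one relative rate of change per text location.

    Rates follow a gamma distribution with shape `alpha` and scale
    1/alpha, so the mean is 1 and the variance is 1/alpha (assumed
    convention, not taken from the article).
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


# Small alpha: a few locations change often and most change rarely.
# Large alpha: all locations change at similar rates.
for alpha in (0.5, 5.0):
    rates = sample_site_rates(10_000, alpha)
    print(alpha, round(statistics.mean(rates), 2), round(statistics.variance(rates), 2))
```

Under this parameterization a low alpha corresponds to strong heterogeneity between locations, which is the situation usually reported for DNA sequences.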


... Stemma generation is the process of determining the most likely tree, with manuscripts represented by nodes in the (usually directed acyclic) graph and edges representing copy processes or chains of them; see Figure 1. Since the late 1950s, computational methods have been applied to stemmatological tasks (Ellison, 1957), and in the last decade they have been evaluated against benchmark datasets, or artificial traditions (Baret et al., 2004; Spencer et al., 2004a; Roos and Heikkilä, 2009; Hoenen, 2015a). These datasets were generated by first giving one text (the root) to volunteers to be hand-copied (or dictated). ...
... For evaluation, we use the three most widely used artificial datasets, Parzival (PRZ, English), Notre Besoin (NB, French) and Heinrichi (HR, Finnish) (Baret et al., 2004; Spencer et al., 2004a; Roos and Heikkilä, 2009), all in their entirety. A fourth and a fifth (Hoenen, 2015a) are not in focus. ...
... There are confusions for which a single modality can be held responsible. When Spencer et al. (2004a) mention the example of <cl> and <d>, it is unlikely that the reason for the confusion lies in any modality other than vision. Thus, modelling each modality separately and summing them, apart from having neurological correlates, is not unreasonable, but the presented approach is surely just a first step in investigating a complex and data-sparse object. ...
Article
Full-text available
Stemma generation can be understood as a task where an original manuscript M gets copied and the copies, due to the manual mode of copying, vary from each other and from M. Copies M1, ..., Mk which survive historical loss serve as input to a mapping process estimating a directed acyclic graph (tree) which is the most likely representation of their copy history. One can first tokenize and align the texts of M1, ..., Mk and then produce a pairwise distance matrix between them. From this, one can finally derive a tree with various methods, for instance Neighbor-Joining (NJ) (Saitou and Nei, 1987). For computing those matrices, previous research has applied unweighted approaches to token similarity (implicitly interpreting each token pair as a binary observation: identical or different), see Mooney et al. (2003). The effects of weighting have since been investigated, and Spencer et al. (2004b) found them to be small in their (not necessarily all) scenario(s). The present approach goes beyond the token level and, instead of a binary comparison, uses a distance model based on psycholinguistically derived distance matrices of letters in three modalities: vision, audition and motorics. Results indicate that this type of weighting has positive effects on stemma generation.
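
The unweighted step described above (aligned tokens compared as identical or different, then collected into a pairwise distance matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' code: the witness texts are invented, `-` is assumed to mark an alignment gap, gapped positions are simply skipped, and the helper names are hypothetical. Tree building (for instance Neighbor-Joining) would take the resulting matrix as input and is not shown.

```python
from itertools import combinations


def binary_distance(a, b, gap="-"):
    """Fraction of aligned token positions at which two witnesses differ.

    Positions where either witness has a gap are skipped (an assumption
    made for this sketch; other treatments of gaps are possible).
    """
    compared = differing = 0
    for tok_a, tok_b in zip(a, b):
        if tok_a == gap or tok_b == gap:
            continue
        compared += 1
        differing += tok_a != tok_b
    return differing / compared if compared else 0.0


def distance_matrix(witnesses):
    """Return sorted witness names and a symmetric dict of pairwise distances."""
    names = sorted(witnesses)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[x, y] = dist[y, x] = binary_distance(witnesses[x], witnesses[y])
    return names, dist


# Hypothetical, already tokenized and aligned witnesses.
witnesses = {
    "M1": ["the", "knight", "rode", "out", "at", "dawn"],
    "M2": ["the", "knight", "rode", "forth", "at", "dawn"],
    "M3": ["the", "knyght", "rode", "forth", "-", "dawn"],
}
names, dist = distance_matrix(witnesses)
for x, y in combinations(names, 2):
    print(x, y, round(dist[x, y], 3))
```

A weighted variant of the kind the abstract proposes would replace the identical-or-different test in `binary_distance` with a graded distance between the two tokens.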
... It is our second empirical comparison between parsimony and 3ta. The protocol of their generation is described in Spencer et al. (2004). A text used as an ancestral sequence was copied by volunteer scribes. ...
... The dataset on manuscript characters favours MP, which reaches its highest efficiency in this study (92.7%). 3ta shows slightly less efficiency (91.6%), because the true tree contains polytomies (Spencer et al., 2004). Given that our results are synthesized by a strict consensus, the fact that MP retains more optimal trees than 3ta is advantageous here. ...
... They provide a way to compare analyses by varying a single parameter. Empirical examples can also be found in the literature of known genealogies (Leitner et al., 1996;Spencer et al., 2004), while our study has used both approaches. ...
Article
Full-text available
Simulation-based and experimental studies are crucial to produce factual arguments to solve theoretical and methodological debates in phylogenetics. However, despite the large number of works that tested the relative efficiency of phylogenetic methods with various evolutionary models, the capacity of methods to manage various sources of error and homoplasy has almost never been studied. By applying ordered and unordered methods to datasets with iterative addition of errors in the ordering scheme, we show that unordered coding in parsimony is not a more cautious option. A second debate concerns how to handle reversals, especially when they are regarded as possible synapomorphies. By comparing analyses of reversible and irreversible characters, we show empirically that three-taxon analysis (3ta) manages reversals better than parsimony. For Brownian motion data, we highlight that 3ta is also more efficient than parsimony in managing random errors, which might result from taphonomic problems or any homoplasy-generating events that do not follow the dichotomy reversal/convergence, such as lateral gene transfer. We show parsimony to be more efficient with numerous character states (more than four), and 3ta to be more efficient with binary characters, both methods being equally efficient with four states per character. We finally compare methods using two empirical cases of known evolution.
... For stemmatology, some specific features of the units of interest (mainly manuscripts and the texts they hold) constitute a difference to each of its sister sciences, which is why developing a DL approach here could still be worthwhile. In this paper, after a brief recapitulation of related literature, DL approaches for stemmatology will be discussed in general before presenting an experiment where DL machine translation technology is being used for phylogenetic placement which is evaluated on a so-called artificial dataset, that is a dataset where a text has been manually copied and recopied and then aligned and digitized later on (Spencer et al., 2004). ...
... For evaluation, we use the Parzival dataset (Spencer et al., 2004), features of which are given in table 1. There are some more such traditions (Baret et al., 2004; Roos and Heikkilä, 2009; Hoenen, 2015a), but many of them have either cycles or consist of various independent trees, which makes Parzival the most applicable dataset of this kind. ...
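
For readers unfamiliar with how parsimony scores a tree under unordered coding, the counting step can be sketched with Fitch's algorithm. This is a generic textbook illustration, not the procedure or software used in the study above; it does not cover ordered characters or three-taxon analysis. The nested-tuple tree encoding, the function names and the toy matrix are all hypothetical.

```python
def fitch(tree, states):
    """Fitch pass for one unordered character on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns the candidate state set at this node and the minimum number of
    state changes needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, matrix):
    """Sum the Fitch change counts over all characters (matrix columns)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical binary character matrix: taxon -> one state per character.
matrix = {"A": "0011", "B": "0010", "C": "1100", "D": "1101"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_length(tree_1, matrix), parsimony_length(tree_2, matrix))
```

The tree requiring fewer changes is preferred under maximum parsimony; a real analysis searches over many candidate trees rather than comparing two by hand.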
Preprint
Full-text available
Stemmatology is a subfield of philology where one approach to understanding the copy history of textual variants of a text (witnesses of a tradition) is to generate an evolutionary tree. Computational methods are partly shared between the sister discipline of phylogenetics and stemmatology. In 2022, a survey paper in Nature Communications found that Deep Learning (DL), which otherwise has brought about major improvements in many fields (Krohn et al., 2020), has had only minor successes in phylogenetics and that "it is difficult to conceive of an end-to-end DL model to directly estimate phylogenetic trees from raw data in the near future" (Sapoval et al., 2022, p. 8). In stemmatology, there is to date no known DL approach at all. In this paper, we present a new DL approach to placement of manuscripts on a stemma and demonstrate its potential. This could be extended to phylogenetics, where the universal code of DNA might be an even better prerequisite for the method, which uses sequence-to-sequence neural networks to retrieve tree distances.
... In particular, using traditional historical comparative methods, sign researchers have been unable to distinguish the results of tree-compatible evolutionary processes-that is, patterns of similarity reflecting inheritance in a vertical ancestor-descendant relationship-from tree-incompatible processes, such as borrowing and convergence. As a consequence of these methodological challenges, comparative studies of SLs at times conflate vertical and horizontal relationships in forming SL families [20,41] or forgo historical interpretations of their results [21]. ...
... Another source of tree-incompatible signal is the inclusion of putative "ancestors" in the form of historical MAs, as well as their direct or distant "descendants". Spencer et al. [41] showed that the distance-based Neighbor-Nets (NNets; [42]), which were designed to counter the problem of signal incompatibility, outperform tree inferences when it comes to correctly depicting ancestor-descendant relationships. ...
Preprint
While the evolution of spoken languages is well understood and has been studied using traditional historical comparative methods as well as newer computational phylogenetic methods, evolutionary processes resulting in the diversity of contemporary sign languages are poorly understood, and scholars have been largely unsuccessful in grouping sign languages into monophyletic language families. To date, no published studies have attempted to use language data to infer relationships amongst sign languages on a large scale. Here, we report the results of a phylogenetic analysis of 40 contemporary and 36 historical sign language manual alphabets coded for morphological similarity. Our results support grouping sign languages in the sample into six main European lineages, with three larger groups of Austrian, British, and French origin, as well as three smaller groups centering around Russian, Spanish, and Swedish. The British and Swedish lineages support current knowledge of relationships amongst sign languages based on extra-linguistic historical sources. With respect to other lineages, our results diverge from current hypotheses by indicating (i) independent evolution of Austrian, French, and Spanish from Spanish sources; (ii) an internal Danish subgroup within the Austrian lineage; and (iii) evolution of Russian from Austrian sources.
... The attempt to reconstruct phylogenetic relationships within a set of related text documents is a challenge of interest not only in our current digital era. For instance, historians have been studying this problem in handwritten medieval manuscripts [10][11][12], Babbage was very concerned about errors in logarithmic tables [13], and The Book of Soyga, which surrounds an interesting part of John Dee's mystical interests [14], had multiple slightly different versions. Textual criticism and stemmatology have been playing an important role in the reconstruction of original manuscript texts from a set of different copies derived from them. ...
... In another category, some works in stemmatic analysis compare the evolution of manuscripts to the evolution and mutations in DNA sequences, and the relationship between a manuscript and its extant versions is also represented by means of a phylogeny tree [10, 17, 27]. In a similar trend, Spencer et al. [12] and Roos and Heikkilä [15] evaluate different methods for reconstructing the phylogenetics of manuscript copies created artificially, and whose true phylogeny is known. However, these approaches reconstruct unrooted trees (or are manually rooted), and focus is given to the identification of groups of copies that are closer to each other. ...
Article
Full-text available
Over the history of mankind, textual records change. Sometimes due to mistakes during transcription, sometimes on purpose, as a way to rewrite facts and reinterpret history. There are several classical cases, such as the logarithmic tables, and the transmission of antique and medieval scholarship. Today, text documents are largely edited and redistributed on the Web. Articles on news portals and collaborative platforms (such as Wikipedia), source code, posts on social networks, and even scientific publications or literary works are some examples in which textual content can be subject to changes in an evolutionary process. In this scenario, given a set of near-duplicate documents, it is worthwhile to find which one is the original and the history of changes that created the whole set. Such functionality would have immediate applications in news tracking services, detection of plagiarism, textual criticism, and copyright enforcement, for instance. However, this is not an easy task, as textual features pointing to the documents’ evolutionary direction may not be evident and are often dataset dependent. Moreover, side information, such as time stamps, is neither always available nor always reliable. In this paper, we propose a framework for reliably reconstructing text phylogeny trees, and seamlessly exploring new approaches on a wide range of scenarios of text reuse. We employ distinct combinations of dissimilarity measures and reconstruction strategies within the proposed framework and evaluate each approach with extensive experiments, including a set of artificial near-duplicate documents with known phylogeny and documents collected from Wikipedia, whose modifications were made by Internet users. We also present results from qualitative experiments in two different applications: text plagiarism and reconstruction of evolutionary trees for manuscripts (stemmatology).
... The reconstruction of the copy history of manuscript texts is largely similar to that of DNA, which is why phylogenetic approaches have been adopted (Robinson and O'Hara, 1996; Robinson et al., 1998; van Reenen et al., 1996; van Reenen et al., 2004; Spencer et al., 2004; Roos and Heikkilä, 2009; Roelli and Bachmann, 2010; Andrews and Macé, 2013). However, the main goal of the philological work on ancient manuscripts is not the reconstruction of the copy history but compiling an edition of a historical text. ...
... An artificial tradition is a fully digitized set of manuscripts which have been produced through manual copying in recent times whilst recording the true copy history/genealogical relationships. Three of these corpora have been published to date: Parzival by Spencer et al. (2004), Notre Besoin by Baret et al. (2004), and Heinrichi by Roos and Heikkilä (2009). They are provided in a fully word-aligned tabular version by the authors, so that collation no longer needs to be performed. ...
... Kleinlogel, 2007; Robins, 2007), but that did little to change the situation. Recent commentary and application of unrooted trees in stemmatics (Baret, Macé, & Robinson, 2006; Spencer et al., 2004a; Windram et al., 2008) continue to derive their views from the phylogenetic concept (Bryant & Moulton, 2002, 2004). As a consequence, unrooted trees were, in a way, re-seeded into stemmatics. ...
... Since then, their application has been increasing (e.g., Barbrook, Howe, Blake, & Robinson, 1998; Eagleton & Spencer, 2006; Macé et al., 2004; Mooney et al., 2001; Phillips-Rodriguez et al., 2010; Robinson & O'Hara, 1996; Salemans, 2000; Stolz, 2003; Spencer et al., 2004a, b; Windram et al., 2008), although not without opposition (e.g., Hanna, 2000). Phylogenetic methods have even been experimentally applied to a set of artificial, human-copied, manuscripts, with promising results (Baret et al., 2006; Roos & Heikkilä, 2009; Spencer et al., 2004a). The subject is now seen as part of the broader field of phylomemetics, i.e., the application of phylogenetic methodology to non-biological entities which undergo replication with incorporation of changes which are more or less accurately transmitted to subsequent generations (Howe & Windram, 2011). ...
Article
Full-text available
We report on the only known case of independent discovery of unrooted trees in a historical science outside of biological systematics. The method of textual criticism (ecdotics, i.e., the building of text-version genealogies) created by French philologist Henri Quentin (1872–1935) proposes the use of a type of branching scheme equivalent to unrooted trees in phylogenetics. Because Quentin's method has never become the prevailing paradigm in philology, his insight into unrooted trees has not been noticed in previous studies comparing philology and phylogenetics. In fact, the modern use of unrooted trees in philology is seen as imported from phylogenetics. Quentin's procedure starts by building an unrooted tree (‘chain’) expressing the network of text versions (taxa) based on ‘variants’ (equivalent to unpolarized character states). Such undirected scheme is then rooted on the basis of extrinsic temporal information, thus resulting in a complete (rooted) hypothesis of relationships. Quentin asserts that the building of an unrooted tree precedes the determination of its orientation (rooting) and that the two procedures reflect distinct levels of structural organization, relying on different assumptions. Henri Quentin fully grasped the implications of time-reversible properties of unrooted trees and associated characters, in striking prescience of the same concepts developed in phylogenetics some 45 years later. The two versions of unrooted trees were developed entirely independently of each other and such convergence is testimony to the formal efficiency of approaching historical reconstruction in unrooted and rooted dimensions. © The Trustees of the Natural History Museum, London 2016. All Rights Reserved.
... Computer-assisted stemmatological analysis was first used in 1977 (Platnick and Cameron), but it became a relatively common tool in the 1990s (O'Hara and Robinson, 1992, 1993; Robinson and O'Hara, 1996; Robinson, 1991, 1997, 2000a,b, 2001; Salemans, 1996; Spencer et al., 2003a,b, 2004). It has been used with artificial traditions (Spencer et al., 2004a; Baret et al., 2006; Roos and Heikkilä, 2009) ...ing (Evert, 1996) and the coherence method developed by Gerd Mink (2004). More recently, software specifically designed to work with texts has been developed. ...
... A possible objection to this type of representation, for example, is that we might know with certainty that three or more witnesses are descended from another one. We could know this from extratextual evidence, as in the case of the Svipdagsmál manuscripts (Robinson, 1991; O'Hara and Robinson, 1993) or of the artificial textual traditions (Spencer et al., 2004a; Baret et al., 2006; Roos and Heikkilä, 2009) ...
Article
Full-text available
For some time, scholars have been using computer-assisted methods to produce graphic representations of the relationships between witnesses within a textual tradition. The use of methods originally developed by evolutionary biologists has been called into question on account of the perceived lack of identity between two different disciplines. This view arises from a misunderstanding about how the methods work in relation to texts and how the resulting stemmata should be interpreted. This article refines textual critical terminology, particularly the distinction between textual traditions and manuscript traditions, in the context of the use of computer-assisted stemmatological methods to further our understanding of how these fit within the wider theoretical framework of textual criticism and scholarly editing, and makes explicit the way in which stemmata produced by using evolutionary biology software should be read.
... The principle of copying with incorporation of changes underlies both phylogenetic and stemmatic analyses and, over the last 15 years or so, several projects have applied computer programs from phylogenetic analysis to studies of the transmission histories of textual traditions. Collaborations with literary scholars have resulted in phylogenetic analyses of a range of textual traditions, such as Chaucer's Canterbury Tales, the German legend Parzival, Dante's political philosophy treatise Monarchia, manuscripts of the Finnish legend of St. Henry and the poetry of the English 17th-century poet, Robert Herrick. Analyses of artificially prepared 'traditions' of known copying history have indicated both the accuracy and some limitations of the approach and have driven the development of new algorithms. The methodology has also been shown to be applicable outside textual criticism and evolutionary biology, having been very successfully applied to studies in the evolution of languages, oral folk traditions and cultural artefacts (e.g. the transmission of the patterns of Turkmen carpets). ...
... His pieces (nos. 16-21) are preceded by 15 compositions by his older contemporaries, William Byrd (1-8) and John Bull (9-15). Gibbons' Prelude in G, the last item in the volume, proved to be an exceptionally popular piece and it survives in 16 extant sources from the 17th and early-18th centuries (see Table 1). ...
Article
Textual scholars studying the transmission history of literary texts increasingly make use of ‘phylogenetic’ computer programs from evolutionary biology, which are conventionally used for inferring the evolutionary relationships among organisms from DNA sequence data. However, very little use has been made of phylogenetic methods in studying musical traditions. We have tested the use of the methods in analysing the transmission history of 16 extant sources of the Prelude in G by Orlando Gibbons. Variations in features such as pitch, rhythm and note pattern were recorded as a ‘Nexus file’, which was analysed using the phylogenetic methods of Maximum Parsimony and NeighborNet. Statistical confidence was tested using bootstrapping. The Maximum Parsimony analysis placed the sources into four groups with strong statistical support and the NeighborNet analysis gave similar results, while indicating a linkage between members of two of the major groups. Separate analyses of passages of running semiquavers and the chordal accompaniment showed that the latter was responsible for most of the phylogenetic structure, consistent with traditional scholarship. The analysis also showed a more fundamental division into two groups, with one containing mostly early to mid-17th century sources and the other containing only more recent ones. The study shows that phylogenetic methods can be used to infer robust conclusions on the transmission history of this tradition that are consistent with conventional scholarship. These novel methods are likely to be of general applicability as a tool for music scholars.
... Computational textual restoration has previously involved either (i) domain experts using error-detection algorithms to discover a limited number of real errors (Graziosi et al., 2023), or (ii) broadly evaluating error detection algorithms using datasets of artificially generated errors (Spencer et al., 2004; Roos and Heikkilä, 2009; Hoenen, 2015). In contrast, we introduce the first error detection dataset composed of real errors. ...
... From these, a tree-type graph is obtained for the tradition (the set of manuscripts that collectively preserve a given work), and this graph is called a stemma codicum (Fig. 1, C). This is of course strongly reminiscent of the way phylogenetic trees or cladograms are obtained in biology, based on shared characteristics between observed or inferred species, and it is possible to turn a stemma into a binary phylogram (Fig. 1, D). Methods from cladistics and phylogenetics have sometimes been directly applied to texts, with results that remain debated [5,24,44,32,22]. ...
Preprint
Our knowledge of past cultures relies considerably on written material. For centuries, texts have been copied, altered, then transmitted or lost - eventually, from surviving documents, philologists attempt to reconstruct text phylogenies ("stemmata"), and past written cultures. Nonetheless, fundamental questions on the extent of losses, representativeness of surviving artefacts, and the dynamics of text genealogies have remained open since the earliest days of philology. To address these, we radically rethink the study of text transmission through a complexity science approach, integrating stochastic modelling, computer simulations, and data analysis, in a parsimonious mindset akin to statistical physics and evolutionary biology. Thus, we design models that are simple and general, while accounting for diachrony and other key aspects of the dynamical process underlying text phylogenies, such as the extinction of entire branches or trees. On the well-known case study of Medieval French chivalric literature, we find that up to 60% of texts and 99% of manuscripts were lost (consistent with recent synchronic "biodiversity" analyses). We also settle a hundred-year-old controversy on the bifidity of stemmata. Further, our null model suggests that pure chance ("drift") is not the only mechanism at play, and we provide a theoretical and empirical framework for future investigation.
... Computational textual restoration has previously involved either (i) domain experts using error-detection algorithms to discover a limited number of real errors, or (ii) broadly evaluating error detection algorithms using datasets of artificially generated errors (Spencer et al., 2004; Roos and Heikkilä, 2009; Hoenen, 2015). In contrast, we introduce the first error detection dataset composed of real errors. ...
Preprint
Full-text available
As premodern texts are passed down over centuries, errors inevitably accrue. These errors can be challenging to identify, as some have survived undetected for so long precisely because they are so elusive. While prior work has evaluated error detection methods on artificially-generated errors, we introduce the first dataset of real errors in premodern Greek, enabling the evaluation of error detection methods on errors that genuinely accumulated at some stage in the centuries-long copying process. To create this dataset, we use metrics derived from BERT conditionals to sample 1,000 words more likely to contain errors, which are then annotated and labeled by a domain expert as errors or not. We then propose and evaluate new error detection methods and find that our discriminator-based detector outperforms all other methods, improving the true positive rate for classifying real errors by 5%. We additionally observe that scribal errors are more difficult to detect than print or digitization errors. Our dataset enables the evaluation of error detection methods on real errors in premodern texts for the first time, providing a benchmark for developing more effective error detection algorithms to assist scholars in restoring premodern works.
... In an extended version of this research (currently ongoing by the author), the HTR-produced transcriptions are tested further as data input in experiments on hierarchical clustering of manuscripts. The aim is to produce a classification system by which all instances of a text can be traced back to their ancestors through a series of branching points, much like the phylogenetic method in biology, but with DNA sequences replaced by manuscripts [Macé and Baret, 2004; Spencer et al., 2004]. Considering a big data scenario, hierarchical clustering was preferred over a stemmatical analysis for its speed and simplicity. ...
Article
Full-text available
HTR (Handwritten Text Recognition) technologies have progressed enough to offer high-accuracy results in recognising handwritten documents, even on a synchronous level. Despite the state-of-the-art algorithms and software, historical documents (especially those written in Greek) remain a real-world challenge for researchers. A large number of unedited or under-edited works of Greek Literature (ancient or Byzantine, especially the latter) exist to this day due to the complexity of producing critical editions. To critically edit a literary text, scholars need to pinpoint text variations on several manuscripts, which requires fully (or at least partially) transcribed manuscripts. For a large manuscript tradition (i.e., a large number of manuscripts transmitting the same work), such a process can be a painstaking and time-consuming project. To that end, HTR algorithms that train AI models can significantly assist, even when not resulting in entirely accurate transcriptions. Deep learning models, though, require a quantum of data to be effective. This, in turn, intensifies the same problem: big (transcribed) data require heavy loads of manual transcriptions as training sets. In the absence of such transcriptions, this study experiments with training sets of various sizes to determine the minimum amount of manual transcription needed to produce usable results. HTR models are trained through the Transkribus platform on manuscripts from multiple works of a single Byzantine author, John Chrysostom. By gradually reducing the number of manually transcribed texts and by training mixed models from multiple manuscripts, economic transcriptions of large bodies of manuscripts (in the hundreds) can be achieved. Results of these experiments show that if the right combination of manuscripts is selected, and with the transfer-learning tools provided by Transkribus, the required training sets can be reduced by up to 80%. 
Certain peculiarities of Greek manuscripts, which lead to easy automated cleaning of resulting transcriptions, could further improve these results. The ultimate goal of these experiments is to produce a transcription with the minimum required accuracy (and therefore the minimum manual input) for text clustering. If we can accurately assess HTR learning and outcomes, we may find that less data could be enough. This case study proposes a solution for researching/editing authors and works that were popular enough to survive in hundreds (if not thousands) of manuscripts and therefore cannot feasibly be evaluated by humans.
... The F sources group closely with the London Autograph and show a typical exemplar/copy grouping in which, for each pair of sources (firstly the London Autograph/P 416, and then P 416/Go.S.312), the exemplar has the shorter branch, while the copy, which carries the variants of the exemplar plus its own variants, has a longer branch. The lengths of the lines in the network are a measure of the number of differences (i.e. distance) between sources. ...
Article
Full-text available
J. S. Bach’s Well-Tempered Clavier II is well known for the complexity of its source situation, and its vast array of variant readings. The current article uses techniques of phylogenetic analysis, developed in the biological sciences, to deepen our understanding of the complex relationships between the primary sources. The computer algorithm NeighborNet is used to analyse data comprising the textual variants for the Prelude in A, bwv888 and the Prelude in C, bwv870. The resultant grouping of sources reflects the differences in revision practice between the two preludes. While Bach saw little need to revise the Prelude in A, the Prelude in C underwent a process of continued revision that can be discerned in the results of the phylogenetic analysis. The analyses also highlight the uncertain relationship of the manuscript DD70 with the other sources of the Prelude in C and the implications for 18th-century performance practice.
... Moretti 2005), and the genealogy of manuscripts (e.g. Spencer et al. 2004), but they are beyond the scope of this paper. ...
... Independently of its justification, the use of tools originally developed for a specialized field and later implemented in another has not been without criticism (Robins 2007, Hanna 2000, Cartlidge 2001, Alexanderson 2018). For this reason, various experiments were carried out with artificial textual traditions (Spencer et al. 2004; Roos and Heikkilä 2009) and, when these were still not considered enough, a response to the criticisms was published (Howe, Connolly, and Windram 2012) followed by further theoretical explanations (Bordalejo 2016). Those interested in the use of phylogenetic analysis or other stemmatological tools would acquire reasonable knowledge from the texts mentioned above. ...
Preprint
Full-text available
This article describes computer-assisted methods for the analysis of textual variation within large textual traditions. It focuses on the conversion of the XML apparatus into NEXUS, a file type commonly used in bioinformatics. Phylogenetic methods are described with particular emphasis on maximum parsimony, the preferred approach for our research. The article provides details on the reasons for favouring maximum parsimony, as well as explaining our choice of settings for PAUP. It gives examples of how to use VBase, our variant database, to query the data and gain a better understanding of the phylogenetic trees. The relationship between the apparatus and the stemma is explained. After demonstrating the vast number of decisions taken during the analysis, the article concludes that as much as computers facilitate our work and help us expand our understanding, the role of the editor continues to be fundamental in the making of editions.
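The abstract above mentions converting a collation into NEXUS for phylogenetic software. As a rough illustration only, the sketch below writes a NEXUS DATA block from a small coded variant matrix. The input format (a dictionary mapping witness sigla to strings of coded readings), the function name `to_nexus`, and the sigla are assumptions made for this example; the article's actual XML apparatus and conversion procedure are not described in the excerpt.

```python
def to_nexus(matrix, missing="?"):
    """Serialise {siglum: coded readings} as a NEXUS DATA block.

    Hypothetical helper: each string holds one symbol per variant
    location, and `missing` marks lacunae. Sigla are assumed to contain
    no spaces or punctuation, so no quoting is applied.
    """
    names = list(matrix)
    lengths = {len(row) for row in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all witnesses must have the same number of characters")
    nchar = lengths.pop()
    # Collect the state symbols actually used, excluding the missing marker.
    symbols = "".join(sorted({c for row in matrix.values() for c in row if c != missing}))
    width = max(len(name) for name in names)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(names)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD MISSING={missing} SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    lines += [f"    {name.ljust(width)}  {matrix[name]}" for name in names]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Toy data: three invented witnesses, four variant locations.
example = {"Hg": "0101", "El": "01?1", "Cp": "1100"}
nexus_text = to_nexus(example)
print(nexus_text)
```

A real conversion would also have to decide how to code each variant location (binary or multistate) and how to treat lacunae, which is where many of the editorial decisions mentioned in the abstract arise.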
... To accomplish that, the specific idea of phylogeny, where inferred evolutionary relationships are expressed in a tree-like structure, must come into play. Drawing specifically from ideas in evolutionary biology, Spencer et al. demonstrated that by using computational phylogenetic methods, it is possible to identify the relationship among a set of handwritten medieval manuscripts [12]. In the field of manuscript studies, texts exhibit similar behaviors of mutations as in genes in biology, with diverging patterns that represent key differences in the texts. ...
Article
Full-text available
The ease with which one can edit and redistribute digital documents on the Internet is one of modernity’s great achievements, but it also leads to some vexing problems. With growing academic interest in the study of the evolution of digital writing on the one hand, and the rise of disinformation on the other, the problem of identifying the relationship between texts with similar content is becoming more important. Traditional vector space representations of texts have made progress in solving this problem when it is cast as a reconstruction task that organizes related texts into a tree expressing relationships – this is dubbed text phylogeny in the information forensics literature. However, as new text representation methods have been successfully applied to many other text analysis problems, it is worth investigating if they too are useful in text phylogeny tree reconstruction. In this work, we explore the use of word embeddings as a text representation method, with the aim of trying to improve the accuracy of reconstructed phylogeny trees for real-world data and compare it with other widely used text representation methods. We evaluate the performance on established benchmarks for this task: a synthetic dataset and data collected from Wikipedia. We also apply our framework to a new dataset of fan fiction based on some famous fairy tales. Experimental results show that word embeddings are competitive with other feature sets for the published benchmarks, and are highly effective for creative writing.
... In the context of this study, we use the three previously most used artificial datasets, called Parzival PRZ (English), Notre Besoin NB (French) and Heinrichi HR (Finnish) (Baret et al., 2004; Spencer et al., 2004; Roos and Heikkilä, 2009). PRZ has 21 manuscripts and the alignment has 855 lines, NB features 13 manuscripts of 1035 lines and HR 64 manuscripts of 1208 words. ...
Conference Paper
Full-text available
Corpora of manuscripts of the same ancient text often preserve many variants. This is because, over long copy chains, errors and editorial changes have been repeatedly made and reverted, with the effect that most often no two variant texts of the same so-called textual tradition have exactly the same text. In order to save the time needed to read all of the versions and to enable discourse and unambiguous referencing, philologists have, since the beginnings of the age of print, embarked on providing one single textual representation of this variety. In computational terms, one tries to retrieve or compose the base text which is most likely the latest common ancestor (archetype) of all observed variants using stemmata, that is, trees depicting the copy history (manuscripts = nodes, copy processes = edges). Recently, they have been computed and evaluated automatically (Roos and Heikkilä, 2009). Likewise, automatic archetype reconstruction has been introduced lately (Hoenen, 2015b; Koppel et al., 2016). A synthesis of both stemma generation and archetype reconstruction has not yet been achieved. This paper therefore presents an approach where, through iterative clustering, a stemma and an archetype text are reconstructed bottom-up.
... Quantitative analysis of linguistic markers to determine the timing and evolution of manuscripts has been increasingly employed over the last 20 years (Barbrook et al. 1998; Spencer et al. 2004; Eagleton and Spencer 2006; Howe and Windram 2011). More recently, these analyses have been extended to investigate the cultural legacies of folk tale records, some of which likely originated before the emergence of written records (da Silva and Tehrani 2016). ...
Article
Full-text available
Human settlement into new regions is typically accompanied by waves of animal extinctions, yet we have limited understanding of how human communities perceived and responded to such ecological crises. The first megafaunal extinctions in New Zealand began just 700 years ago, in contrast to the deep time of continental extinctions. Consequently, indigenous Māori oral tradition includes ancestral sayings that explicitly refer to extinct species. Our linguistic analysis of these sayings shows a strong bias towards critical food species such as moa, and emphasizes that Māori closely observed the fauna and environment. Temporal changes in form and content demonstrate that Māori recognized the loss of important animal resources, and that this loss reverberated culturally centuries later. The data provide evidence that extinction of keystone fauna was important for shaping ecological and social thought in Māori society, and suggest a similar role in other early societies that lived through megafaunal extinction events.
... 9 Arbres were usually used as hypothetical units of argumentation for outlining general scenarios of copying and proliferation in philological discourse, see for instance Castellani (1957). However, recently, they have gained actuality through artificial traditions, that is, complete copied sets with known ground truth (Spencer et al., 2004;Baret et al., 2006;Roos and Heikkilä, 2009;Hoenen, 2015), where arbres are used for evaluation, comparing them to computationally reconstructed stemmata. ...
... Tehrani and Collard 2002), manuscripts (e.g. Spencer et al. 2004), languages (e.g. Gray and Atkinson 2003;Kandler et al. 2010), ancient games (e.g. de Voogt et al. 2013;de Voogt et al. 2015), monuments (Cochrane 2015;Neiman 1997) and many other domains. ...
Article
Full-text available
Migrations have occurred across the history of the genus Homo and while the movement of pre-modern humans over the globe is typically understood in terms of shifting resource distributions and climate change, that is in ecological terms, the movement of anatomically modern, and specifically Holocene, populations is often explained by human desire to discover new lands, escape despotic leaders, forge trade relationships and other culture-specific intentions. This is a problematic approach to the archaeological and behavioural explanation of human migration. Here an evolutionary and ecological framework is developed to explain various movement behaviours and this framework is applied to the movement of human groups from the inter-visible islands around New Guinea to the widely dispersed archipelagos of the southwest Pacific about 1000 BC. Labelled the Lapita Migration, this movement is explained as a selection-driven range expansion. The development of evolutionary and ecological theory to explain human movement facilitates empirical testing of alternative hypotheses and links different histories of human movement through shared explanatory mechanisms. Full text link: http://rdcu.be/uh7h
... Altogether, we tried to keep the phylogenetic analyses as simple and straightforward as possible: we used a matrix of unsorted and unweighted characters to infer neighbournet splits graphs (Bryant & Moulton, 2002) based on simple (Hamming) pairwise distances. Neighbour-nets are designed to better handle incompatible signals, and are more sensitive with respect to actual ancestor-descendant relationships than are dichotomous trees (Spencer et al., 2004;Denk & Grimm, 2009). The distance between two tips in a neighbour-net reflects the actual distance value, which is not necessarily the case in dichotomous trees (Bryant & Moulton, 2004;Huson & Bryant, 2006). ...
Article
Full-text available
The Osmundales (Royal Fern order) originated in the late Paleozoic and is the most ancient surviving lineage of leptosporangiate ferns. In contrast to its low diversity today (less than 20 species in six genera), it has the richest fossil record of any extant group of ferns. The structurally preserved trunks and rhizomes alone are referable to more than 100 fossil species that are classified in up to 20 genera, four subfamilies, and two families. This diverse fossil record constitutes an exceptional source of information on the evolutionary history of the group from the Permian to the present. However, inconsistent terminology, varying formats of description, and the general lack of a uniform taxonomic concept renders this wealth of information poorly accessible. To this end, we provide a comprehensive review of the diversity of structural features of osmundalean axes under a standardized, descriptive terminology. A novel morphological character matrix with 45 anatomical characters scored for 15 extant species and for 114 fossil operational units (species or specimens) is analysed using networks in order to establish systematic relationships among fossil and extant Osmundales rooted in axis anatomy. The results lead us to propose an evolutionary classification for fossil Osmundales and a revised, standardized taxonomy for all taxa down to the rank of (sub)genus. We introduce several nomenclatural novelties: (1) a new subfamily Itopsidemoideae (Guaireaceae) is established to contain Itopsidema, Donwelliacaulis, and Tiania; (2) the thamnopteroid genera Zalesskya, Iegosigopteris, and Petcheropteris are all considered synonymous with Thamnopteris; (3) 12 species of Millerocaulis and Ashicaulis are assigned to modern genera (tribe Osmundeae); (4) the hitherto enigmatic Aurealcaulis is identified as an extinct subgenus of Plenasium; and (5) the poorly known Osmundites tuhajkulensis is assigned to Millerocaulis. 
In addition, we consider Millerocaulis stipabonettiorum a possible member of Palaeosmunda and Millerocaulis estipularis as probably constituting the earliest representative of the (Todea-)Leptopteris lineage (subtribe Todeinae) of modern Osmundoideae.
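The citing excerpt above describes inferring neighbour-nets from "simple (Hamming) pairwise distances" over a character matrix. The sketch below shows one way such distances could be computed before being passed to network software. The toy matrix, the function names, the normalisation by the number of compared characters, and the choice to skip characters scored as missing ("?") are all assumptions for illustration, not the cited authors' procedure.

```python
from itertools import combinations


def hamming_distance(a, b, missing="?"):
    """Proportion of compared characters at which two rows differ.

    Positions where either row has the `missing` symbol are skipped
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(matrix):
    """Symmetric pairwise distances for {taxon: character string}."""
    names = sorted(matrix)
    dist = {(name, name): 0.0 for name in names}
    for p, q in combinations(names, 2):
        d = hamming_distance(matrix[p], matrix[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist


# Toy matrix: three hypothetical taxa scored for five unordered characters.
toy = {"A": "00110", "B": "00111", "C": "11?10"}
distances = distance_matrix(toy)
for (p, q), d in sorted(distances.items()):
    if p < q:
        print(p, q, round(d, 3))
```

Because the characters are treated as unordered and unweighted, every mismatch counts equally, which matches the "unsorted and unweighted" description in the excerpt.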
... Key to the application of this theory in archaeology is the article by Fraser Neiman, who formulated a quantitative methodology that made it possible to study the evolutionary processes of culture on the basis of the archaeological record (Neiman 1995). Since then, a large number of theoretical, methodological, and empirical archaeological and anthropological studies have appeared within this theoretical framework, with very interesting results (Bentley et al. 2004; 2007, Hahn and Bentley 2003, Henrich 2001, Herzog et al. 2004, Jordan and Shennan 2003, Lipo 2001, Lipo et al. 1997, Mesoudi 2011, Mesoudi et al. 2006, O'Brien et al. 2001, O'Brien and Lyman 2000, Powell 2009, Shennan and Wilkinson 2001, Shennan 2002, Spencer et al. 2004, Tehrani et al. 2010). ...
Article
Full-text available
Archaeological culture still persists as a basic analytical and interpretative concept in Serbian archaeology despite criticism. This paper presents a formal view of archaeological cultures and explores the epistemological implications of this formalization. Formal analysis of archaeological culture is achieved through logical and quantitative explication of the traditional definition of archaeological cultures. The main result of the formal analysis is that there are real patterns of formal variability of material culture that may or may not correspond to traditional archaeological cultures. These patterns are real only in the analytical sense – they are real for given input data and scale of analysis. Unlike the traditional approach, where these patterns are equated with archaeological cultures that are further interpreted in essentialist terms or as quasi-organic entities such as ethnic groups, it is claimed here that discovered patterns are only the starting point: the empirical situation that needs to be accounted for in anthropological and historical terms. This paper shows how patterns that are traditionally identified as archaeological cultures can arise as a consequence of an entire range of processes, that is, different social and historical realities. The main conclusion is that the traditional concept of archaeological culture is useful neither as an analytical nor as an interpretative tool, for two reasons: 1) traditional cultures are subjectively defined entities with no theoretical justification for the criteria used in their definition and 2) the empirical pattern cannot be an explanation in itself because it is the thing that needs to be explained. Cultural evolutionary (transmission) theory is proposed as a general framework for defining and interpreting patterns of formal variability of material culture in time and space.
... 101). With this linguistic metaphor in mind, Gatherer and others have suggested that discovering meaningful sequences in biological texts is like cryptology, with geneticists working as "biomolecular cryptologists" [19][20], like Jean-Francois Champollion seeking out the sounds, words, and meanings of Egyptian hieroglyphics [21]. In biology the units would be "nucleotides, codons, motifs, domains, exons, genes, genomes, etc... up to cells and organisms" (John Sanford, personal communication in email dated July 29, 2011). ...
Chapter
Full-text available
[Here is the abstract of the published version that appeared in 2013. The uploaded full text, however, is from a draft written prior to reviewing by Sanford and others that was completed on January 11, 2011. I like my original paper better than the version that was published in 2013 by World Scientific after being stripped of much of the substance and most of the mathematics.] The goal of this paper is to define pragmatic information with a view toward measuring it. Here, pragmatic information means the content of valid signs (the key that unlocks language acquisition by babies and human communication through language), and also the content that enables biological “codes” in genetics, embryology, and immunology to work. In such systems, the inter-related layers appear to be ranked as in a hierarchy. Sounds are outranked by syllables, in turn outranked by words, and so on. In DNA, nucleotide pairs are outranked by codons, which are outranked by genes, and so on. As signs of lower rank combine to form signs of any higher rank, combinatorial “explosions” occur. With each increase in rank, the number of possible combinations grows exponentially, but the constraints on valid strings and, thus, their pragmatic value, sharpen their focus. As a result, with each explosive increase in the number of possible combinations the relative proportion of meaningful ones diminishes. Consequently, random processes of forming strings or changing them must tend increasingly toward meaningless (invalid and nonviable) strings. The consequent outcome of random mutations is mortality of individuals and, in deep time, an increasing number of disorders, diseases, and the eventual extinction of populations. Read More: http://www.worldscientific.com/doi/abs/10.1142/9789814508728_0003
... We rely exclusively on network methods as implemented in SplitsTree v. 4.13.1 [131] to draw phylogenetic conclusions based on the morphological matrix (see [58, 132–135]): (1) a neighbour-net [136,137] based on mean inter-taxon distances, and (2) bipartition networks to visualize support (Bayesian-inferred posterior probabilities, PP; non-parametric bootstrapping, BS) for alternative phylogenetic relationships [58,138,139]. BS support was established under three commonly used optimality criteria using 10,000 bootstrap replicates: (1) Least-squares via the BioNJ algorithm (BS_NJ; [140]); (2) Maximum parsimony (BS_MP) using PAUP* [141,142] ...
Article
Full-text available
Background: The classification of royal ferns (Osmundaceae) has long remained controversial. Recent molecular phylogenies indicate that Osmunda is paraphyletic and needs to be separated into Osmundastrum and Osmunda s.str. Here, however, we describe an exquisitely preserved Jurassic Osmunda rhizome (O. pulchella sp. nov.) that combines diagnostic features of both Osmundastrum and Osmunda, calling molecular evidence for paraphyly into question. We assembled a new morphological matrix based on rhizome anatomy, and used network analyses to establish phylogenetic relationships between fossil and extant members of modern Osmundaceae. We re-analysed the original molecular data to evaluate root-placement support. Finally, we integrated morphological and molecular data-sets using the evolutionary placement algorithm. Results: Osmunda pulchella and five additional Jurassic rhizome species show anatomical character suites intermediate between Osmundastrum and Osmunda. Molecular evidence for paraphyly is ambiguous: a previously unrecognized signal from spacer sequences favours an alternative root placement that would resolve Osmunda s.l. as monophyletic. Our evolutionary placement analysis identifies fossil species as probable ancestral members of modern genera and subgenera, which accords with recent evidence from Bayesian dating. Conclusions: Osmunda pulchella is likely a precursor of the Osmundastrum lineage. The recently proposed root placement in Osmundaceae—based solely on molecular data—stems from possibly misinformative outgroup signals in rbcL and atpA genes. We conclude that the seemingly conflicting evidence from morphological, anatomical, molecular, and palaeontological data can instead be elegantly reconciled under the assumption that Osmunda is indeed monophyletic. Keywords: Calcification, Evolutionary placement, Fern evolution, Organelle preservation, Osmundales, Osmundastrum, Outgroup, Paraphyly, Permineralization, Phylogenetic networks
... weigh variant readings (Andrews and Macé, 2013): what is indeed the effect of not distinguishing orthographic and linguistic variations from variants and errors? The answer is not obvious, as contrasting opinions among scholars demonstrate (Salemans 2000;Spencer et al., 2004). ...
Article
This book provides an up-to-date, coherent and comprehensive treatment of digital scholarly editing, organized according to the typical timeline and workflow of the preparation of an edition: from the choice of the object to edit, the editorial work, post-production and publication, the use of the published edition, to long-term issues and the ultimate significance of the published work. The author also examines from a theoretical and methodological point of view the issues and problems that emerge during these stages with the application of computational techniques and methods. Building on previous publications on the topic, the book discusses the most significant developments in digital textual scholarship, claiming that the alterations in traditional editorial practices necessitated by the use of computers impose radical changes in the way we think and manage texts, documents, editions and the public. It is of interest not only to scholarly editors, but to all involved in publishing and readership in a digital environment in the humanities.
... The ideal way to determine whether phylogenetic methods perform well on cultural data is to compare the results of analyses to cultural traits with known histories. Spencer et al. (2004) have made a start along these lines. They created an artificial manuscript tradition by having 20 volunteer "scribes" copy different versions of a manuscript (some copied the original, others copied scribal copies). ...
Article
The present experiment examined the effects of instructions transmitted across more than two individuals on a two-response sequence. An undergraduate (participant) was exposed to a contingency of continuous reinforcement of touching two of eight squares in a specified sequence (i.e., touching first the upper-left square then the bottom-left square) presented on a computer touch screen. Then the participant was asked to describe how to obtain the reinforcers. The first participant’s descriptions were presented to the next participant as instructions, prior to their exposure to the same contingency. In this way, verbal descriptions generated by each participant were transmitted from 1 participant to the next among 36 participants. Rates and percentages of the two-response sequence for the last 20 participants were higher than those for participants who were exposed to the contingency with no instructions (no instruction participants) and those who received descriptions generated by the no instruction participants. These results extend the generality of the effects of transmitted instructions on human responding, obtained from a multiple fixed-ratio differential-reinforcement-of-low-rate schedule in a previous experiment, to a continuous reinforcement schedule of a two-response sequence. Furthermore, they isolate the effects of instructions transmitted across more than two individuals from those transmitted within dyads.
Chapter
Full-text available
This volume presents the state of the art in digital scholarly editing. Drawing together the work of established and emerging researchers, it gives pause at a crucial moment in the history of technology in order to offer a sustained reflection on the practices involved in producing, editing and reading digital scholarly editions—and the theories that underpin them. The unrelenting progress of computer technology has changed the nature of textual scholarship at the most fundamental level: the way editors and scholars work, the tools they use to do such work and the research questions they attempt to answer have all been affected. Each of the essays in Digital Scholarly Editing approaches these changes with a different methodological consideration in mind. Together, they make a compelling case for re-evaluating the foundation of the discipline—one that tests its assertions against manuscripts and printed works from across literary history, and the globe. The sheer breadth of Digital Scholarly Editing, along with its successful integration of theory and practice, help redefine a rapidly-changing field, as its firm grounding and future-looking ambit ensure the work will be an indispensable starting point for further scholarship. This collection is essential reading for editors, scholars, students and readers who are invested in the future of textual scholarship and the digital humanities.
Preprint
Full-text available
How did written works evolve, disappear or survive down through the ages? In this paper, we propose a unified, formal framework for two fundamental questions in the study of the transmission of texts: how much was lost or preserved from all works of the past, and why do their genealogies (their "phylogenetic trees") present the very peculiar shapes that we observe or, more precisely, reconstruct? We argue here that these questions share similarities to those encountered in evolutionary biology, and can be described in terms of "genetic" drift and "natural" selection. Through agent-based models, we show that properties observed by philologists since the 1800s can be simulated and compared with data gathered for ancient and medieval texts across Europe, in order to obtain plausible estimates of the number of works and manuscripts that existed and were lost.
Article
Full-text available
This article describes computer-assisted methods for the analysis of textual variation within large textual traditions. It focuses on the conversion of the XML apparatus into NEXUS, a file type commonly used in bioinformatics. Phylogenetic methods are described with particular emphasis on maximum parsimony, the preferred approach for our research. The article provides details on the reasons for favouring maximum parsimony, as well as explaining our choice of settings for PAUP. It gives examples of how to use VBase, our variant database, to query the data and gain a better understanding of the phylogenetic trees. The relationship between the apparatus and the stemma is explained. After demonstrating the vast number of decisions taken during the analysis, the article concludes that as much as computers facilitate our work and help us expand our understanding, the role of the editor continues to be fundamental in the making of editions.
Article
The corpus of myths of the "Devouring Calabash" is here enriched and completed, as is that of flood myths, which are not rare in Africa, contrary to a widespread opinion. Phylomemetic methods show that the distribution of African Devourer myths is best explained by assuming the existence of two groups. Their areal distribution suggests that a Eurasian version was introduced into East Africa and then spread there, enriched by a strictly African development in which an anthropomorphic Devourer, once killed, is burned but is reborn in the form of a monstrous calabash growing from its ashes; this fruit then also begins to devour everyone, so that the Devourer thus "reincarnated" must be defeated a second time. This new variant would have spread westward and southward, giving rise to tales in which only the calabash appears, and these new versions would have become particularly established in the west of the continent, enriched by a new motif: one in which the vanquisher of this plant Devourer is no longer a human but an animal, and more particularly a ram.
Article
Phylogenetic trees or networks representing cultural evolution are typically built using methods from biology that use similarities and differences in cultural traits to infer the historical relationships between the populations that produced them. While these methods have yielded important insights, researchers continue to debate the extent to which cultural phylogenies are tree-like or reticulated due to high levels of horizontal transmission. In this study, we propose a novel method for phylogenetic reconstruction using dynamic community detection that focuses not on the cultural traits themselves (e.g., musical features), but the people creating them (musicians). We used data from 1,498,483 collaborative relationships between electronic music artists to construct a cultural phylogeny based on observed population structure. The results suggest that, although vertical transmission appears to be dominant, the potential for horizontal transmission (indexed by between-population linkage) is relatively high and populations never become fully isolated from one another. In addition, we found evidence that electronic music diversity has increased between 1975 and 1999. The method used in this study is available as a new R package called DynCommPhylo. Future studies should apply this method to other cultural systems such as academic publishing and film, as well as biological systems where high resolution reproductive data is available, and develop formal inferential models to assess how levels of reticulation in evolution vary across domains.
Article
Full-text available
One problem in the study of children's games is the reconstruction of their history and of their ancestral forms. Here I propose borrowing tools from evolutionary biology to do so. The method is applied to a well-dated corpus of a famous English rhyme: The Grenadier.
Article
Full-text available
Ribosomal RNAs are complex structures that presumably evolved by tRNA accretions. Statistical properties of tRNA secondary structures correlate with genetic code integration orders of their cognate amino acids. Ribosomal RNA secondary structures resemble those of tRNAs with recent cognates. Hence, rRNAs presumably evolved from ancestral tRNAs. Here, analyses compare secondary structure subcomponents of small ribosomal RNA subunits with secondary structures of theoretical minimal RNA rings, presumed proto-tRNAs. Two independent methods determined different accretion orders of rRNA structural subelements: (a) classical comparative homology and phylogenetic reconstruction, and (b) a structural hypothesis assuming an inverted onion ring growth where the three-dimensional ribosome's core is most ancient and peripheral elements most recent. Comparisons between (a) and (b) accretions orders with RNA ring secondary structure scales show that recent rRNA subelements are: 1. more like RNA rings with recent cognates, indicating ongoing coevolution between tRNA and rRNA secondary structures; 2. less similar to theoretical minimal RNA rings with ancient cognates. Our method fits (a) and (b) in all examined organisms, more with (a) than (b). Results stress the need to integrate independent methods. Theoretical minimal RNA rings are potential evolutionary references for any sequence-based evolutionary analyses, independent of the focal data from that study.
Article
Full-text available
Accretions of tRNAs presumably formed the large complex ribosomal RNA structures. Similarities of tRNA secondary structures with rRNA secondary structures increase with the integration order of their cognate amino acid in the genetic code, indicating tRNA evolution towards rRNA-like structures. Here analyses rank secondary structure subelements of three large ribosomal RNAs (Prokaryota: Archaea: Thermus thermophilus; Bacteria: Escherichia coli; Eukaryota: Saccharomyces cerevisiae) in relation to their similarities with secondary structures formed by presumed proto-tRNAs, represented by 25 theoretical minimal RNA rings. These ranks are compared to those derived from two independent methods (ranks provide a relative evolutionary age to the rRNA substructure), (a) cladistic phylogenetic analyses and (b) 3D-crystallography where core subelements are presumed ancient and peripheral ones recent. Comparisons of rRNA secondary structure subelements with RNA ring secondary structures show congruence between ranks deduced by this method and both (a) and (b) (more with (a) than (b)), especially for RNA rings with predicted ancient cognate amino acid. Reconstruction of accretion histories of large rRNAs will gain from adequately integrating information from independent methods. Theoretical minimal RNA rings, sequences deterministically designed in silico according to specific coding constraints, might produce adequate scales for prebiotic and early life molecular evolution.
Article
Full-text available
The evolution of spoken languages has been studied since the mid-nineteenth century using traditional historical comparative methods and, more recently, computational phylogenetic methods. By contrast, evolutionary processes resulting in the diversity of contemporary sign languages (SLs) have received much less attention, and scholars have been largely unsuccessful in grouping SLs into monophyletic language families using traditional methods. To date, no published studies have attempted to use language data to infer relationships among SLs on a large scale. Here, we report the results of a phylogenetic analysis of 40 contemporary and 36 historical SL manual alphabets coded for morphological similarity. Our results support grouping SLs in the sample into six main European lineages, with three larger groups of Austrian, British and French origin, as well as three smaller groups centring around Russian, Spanish and Swedish. The British and Swedish lineages support current knowledge of relationships among SLs based on extra-linguistic historical sources. With respect to other lineages, our results diverge from current hypotheses by indicating (i) independent evolution of Austrian, French and Spanish from Spanish sources; (ii) an internal Danish subgroup within the Austrian lineage; and (iii) evolution of Russian from Austrian sources.
Article
Full-text available
Abstract: The corpus of the myths of the "Devouring Calabash" is here enriched and complemented, as well as that of the diluvial myths, not uncommon in Africa contrary to a widespread opinion. Phylomemetic methods show that the distribution of the African versions of the Devourer is best explained by assuming the existence of two groups, and areal studies suggest that a Eurasian version would have been introduced in East Africa and that it would have spread by enriching itself with the strictly African development according to which an anthropomorphic Devourer is killed and burned, but resurrects in the form of a monstrous calabash growing on the ashes; then this fruit also begins to devour everyone, and it is therefore necessary to kill anew the Devourer thus "reincarnated". This new variant would have spread to the West and South, giving birth to stories in which there is only the calabash, and these new versions would have been more particularly implanted in the West and enriched by a new motif: the one according to which the winner over this vegetal Devourer is no longer a human, but an animal, and more particularly a ram.
Chapter
Full-text available
The method presented here relies upon text-genealogical principles inspired by the Lachmannian or neo-Lachmannian tradition, and attempts to computerise them, following and extending the procedure first proposed by E. Poole in the 1970s. More than the application of computerised methods to philology, this method seeks to extend philology through the aid of the computer. It favours interaction between philologist and computer, and requires the former’s critical judgement at some points. After a careful selection of variant locations, needed to eliminate contamination and polygenesis (the two major factors that could impede the elaboration of a stemma), we proceed to produce a stemma that is, at least at first, a simplification. The method is applied to two traditions: the artificial Parzival and the Bestiaires d’Amors by Richart de Fournival. The results obtained on the second are close to the hypotheses of C. Segre and J. Holmberg / G. B. Speroni.
Article
Full-text available
Over the past several decades, archaeologists, anthropologists, linguists, and others who study cultural phenomena have begun to appreciate that methods developed to reconstruct the evolutionary, or phylogenetic, relationships among biological taxa can be used to create cultural sequences based on heritable continuity. One method in particular is cladistics, which creates hypothetical statements of relatedness, rendered as trees, based on the model and parameters used. To date, cladistics has been used to create phylogenetic orderings of a wide variety of cultural phenomena, including basketry and other textiles, ceramic vessels, stone projectile points, languages, folk tales, manuscripts, residence patterns, and political organization. Here we lay out the basic method of cladistics and show how it has formed the basis for long-term studies of the colonization of eastern North America during the Early Paleoindian period (ca. 13,300–11,900 calendar years before the present).
Statement of Significance: Archaeologists have long used changes in artifact form to measure the passage of time, the supposition being that if the changes are ordered correctly, a historical sequence of forms is created. This is correct, but oftentimes what archaeologists really want to know is which thing produced another thing as opposed to simply preceding it. This is an evolutionary sequence. Over the past several decades, not only archaeologists but also anthropologists, linguists, and others who study cultural phenomena have begun to use a suite of methods that were developed to reconstruct the evolutionary, or phylogenetic, relationships among biological taxa, one of which is cladistics. This marks a return to the questions on which the founding of much of anthropology rests: the writing of cultural lineages. This return is important to the growth and continued health of archaeology and anthropology because a reconstructed phylogeny helps guide interpretation of the evolution of traits in that it generates hypotheses about the lineages in which those traits arose and under what circumstances.
Data availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are contained within the paper.
Book
In fourteen thoughtful essays this book reports and reflects on the many changes that a digital workflow brings to the world of original texts and textual scholarship, and the effect on scholarly communication practices. The spread of digital technology across philology, linguistics and literary studies suggests that text scholarship is taking on a more laboratory-like image. The ability to sort, quantify, reproduce and report text through computation would seem to facilitate the exploration of text as another type of quantitative scientific data. However, developing this potential also highlights text analysis and text interpretation as two increasingly separated sub-tasks in the study of texts. The implied dual nature of interpretation as the traditional, valued mode of scholarly text comparison, combined with an increasingly widespread reliance on digital text analysis as scientific mode of inquiry raises the question as to whether the reflexive concepts that are central to interpretation – individualism, subjectivity – are affected by the anonymised, normative assumptions implied by formal categorisations of text as digital data.
Article
In this paper we study the manuscript tradition of Petrus Alfonsi’s Dialogus contra Iudaeos, written around AD 1110. This text was widely disseminated in the Middle Ages, especially during the century after its composition; there are over sixty complete manuscripts known. In order to group them we calculate a distance matrix from standardised text strings transcribed from the manuscripts. From this, tree graphs can be generated easily and quickly with the aid of software developed for biological phylogeny. The resulting tree graph can be iteratively improved by modifying the distance matrix using a number of methods, partly fully algorithmic, partly relying on philological decisions. We are thus able to divide the tradition into some ten main groups.
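As a rough illustration of the distance-matrix step described in this abstract, the sketch below computes pairwise distances between transcribed text strings. It is a minimal sketch under stated assumptions, not the authors' procedure: the distance measure (word-level edit distance divided by the length of the longer text), the function names `edit_distance` and `distance_matrix`, and the three toy witnesses are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(
    witnesses: Dict[str, str],
) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Pairwise normalised word-level distances (an assumed measure)."""
    names = sorted(witnesses)
    tokens = {n: witnesses[n].lower().split() for n in names}
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        longest = max(len(tokens[n]), len(tokens[m]), 1)
        d = edit_distance(tokens[n], tokens[m]) / longest
        matrix[n][m] = matrix[m][n] = d
    return names, matrix


# Toy witnesses, invented purely for illustration.
toy = {
    "A": "the king rode to the city",
    "B": "the king rode to the town",
    "C": "a king went to the town",
}
names, matrix = distance_matrix(toy)
for n in names:
    print(n, [round(matrix[n][m], 3) for m in names])
```

A matrix of this kind can then be passed to any distance-based tree-building program; which program and which normalisation suit a given tradition is a philological decision that the sketch does not address.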
Article
The role of human philological judgement in textual criticism, and particularly in stemmatics, has been at times hotly debated and in computational stemmatology tends to be carefully circumscribed. In this context philological judgement is deployed to distinguish ‘significant’ from ‘insignificant’ textual variation—that is, to select those variants that are more or less likely to betray information about the exemplar from which a given text was copied. This article reports on an experiment performed to assess the accuracy of human philological judgement on a set of three artificial traditions, using tools for stemma analysis developed for a prior project and available to the public as the Stemmaweb online service. We show that for most of the artificial traditions, human judgement was not significantly better than random selection for choosing the variant readings that fit the stemma in a text-genealogical pattern, and we discuss some of the implications of these findings.
Article
Full-text available
In this essay, we review the methods of computer-assisted stemmatic analysis available to the Canterbury Tales Project. Our belief that these techniques will permit us to arrive at a more exact reconstruction of the history of the Canterbury Tales than could Manly and Rickert (1940) is vital to our decision to undertake this work. There are two major strands to these techniques. The first, cladistic analysis, is used to gain a rapid overview of the broad relations among the manuscripts. The second, database analysis, is used to refine conclusions about the exact relationships of particular manuscripts and groups, on the basis of scrutiny of individual variants and their distribution. In addition to discussion of these techniques, we briefly report here the results of our testing of these tools on the Wife of Bath’s Prologue manuscripts, among other materials.
Article
Full-text available
This article presents the results of the application of computer software programs developed for evolutionary biology to manuscript stemmatics. In a test case comparing manual stemmatics methodologies with the computer software when applied to analysis of the Middle English poem, «Kings of England» by John Lydgate, the researchers found that the computer programs performed well, delivering results comparable to those arrived at through manual stemmatic analysis.
Article
Full-text available
As a discipline, phylogenetics is becoming transformed by a flood of molecular data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a number of outstanding issues in evolutionary biology, including the analysis of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.
Article
Full-text available
We consider the problem of reconstructing a stemma for five extant manuscripts of the Niðrstigningar Saga. The method proposed uses a simple stochastic model to check the plausibility of different suggested trees. The model is based on presence/absence recordings for readings in the documents. It is shown that the model can reproduce the numerical variation of the recordings to within reasonable accuracy. The model shows how the particular tree-structure is reflected in the frequency distribution of the ‘profiles’. It also gives estimates for the rate of loss of readings. Furthermore it is suggested how to infer the existence of lost documents from the estimated loss probabilities and how to estimate the size of lost texts. We also briefly discuss how different assumptions on the rate of ‘creation’ (the amount of readings added by a particular scribe) will force the investigator to accept different stemmas.
Article
Full-text available
Chaucer's Canterbury Tales consists of loosely-connected stories, appearing in many different orders in extant manuscripts. Differences in order result from rearrangements by scribes during copying, and may reveal relationships among manuscripts. Identifying these relationships is analogous to determining evolutionary relationships among organisms from the order of genes on a genome. We use gene order analysis to construct a stemma for the Canterbury Tales. This stemma shows relationships predicted by earlier scholars, reveals new relationships, and shares features with a word variation stemma. Our results support the idea that there was no established order when the first manuscripts were written.
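To make the gene-order analogy concrete, the sketch below computes a breakpoint distance between two orderings of the same set of items. This is one common gene-order measure, and the abstract does not say which measure the authors used, so the choice is an assumption. The function names and the placeholder labels `T1` to `T5` are hypothetical and do not correspond to actual tale orders in any manuscript.

```python
from typing import FrozenSet, Sequence, Set


def adjacencies(order: Sequence[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of items that sit next to each other."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of adjacencies in `a` that are absent from `b`."""
    same_items = set(a) == set(b) and len(a) == len(b) == len(set(a))
    if not same_items:
        raise ValueError("both orders must contain the same items exactly once")
    return len(adjacencies(a) - adjacencies(b))


# Placeholder labels, not real tale orders.
order_x = ["T1", "T2", "T3", "T4", "T5"]
order_y = ["T1", "T3", "T2", "T4", "T5"]  # T2 and T3 swapped
print(breakpoint_distance(order_x, order_y))
```

Pairwise distances of this kind could feed a distance-based tree method. Real manuscripts often omit or split sections, which this unsigned, equal-content version does not handle.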
Article
Full-text available
MOTIVATION: Real evolutionary data often contain a number of different and sometimes conflicting phylogenetic signals, and thus do not always clearly support a unique tree. To address this problem, Bandelt and Dress (Adv. Math., 92, 47–105, 1992) developed the method of split decomposition. For ideal data, this method gives rise to a tree, whereas less ideal data are represented by a tree-like network that may indicate evidence for different and conflicting phylogenies. RESULTS: SplitsTree is an interactive program, for analyzing and visualizing evolutionary data, that implements this approach. It also supports a number of distance transformations, the computation of parsimony splits, spectral analysis and bootstrapping.
Article
Full-text available
Biological techniques can be used to reconstruct the stemmata of text traditions. Here, we describe methods for assessing the reliability of the results. We use compatibility matrices to detect sections of the text with different patterns of transmission. By constructing stemmata from subsets of increasing size, we estimate the minimum amount of data needed to produce a reliable stemma. We use consistency indices to assess the overall reliability of the stemma and the level of support that individual variants give to the stemma. Bootstrap analyses allow us to reject features of the stemma that result from only a few variants. We apply these techniques to the stemma for the Miller's Tale in Chaucer's Canterbury Tales.
Article
Full-text available
Although methods of phylogenetic estimation are used routinely in comparative biology, direct tests of these methods are hampered by the lack of known phylogenies. Here a system based on serial propagation of bacteriophage T7 in the presence of a mutagen was used to create the first completely known phylogeny. Restriction-site maps of the terminal lineages were used to infer the evolutionary history of the experimental lines for comparison to the known history and actual ancestors. The five methods used to reconstruct branching pattern all predicted the correct topology but varied in their predictions of branch lengths; one method also predicts ancestral restriction maps and was found to be greater than 98 percent accurate.
Chapter
Stemmatology is the discipline that attempts to reconstruct the transmission of a text on the basis of relations between the various surviving manuscripts. The object of this volume is the evaluation of the most recent methods and techniques in the field of stemmatology, as well as the development of new ones. The book is largely interdisciplinary in character: it contains contributions from scholars from classical, historical, biblical, medieval and modern language studies, as well as from mathematical and computer scientists and biologists. The contributions in the book have been divided into two sections. The first section deals with various stemmatological methods and techniques. The second section focuses more specifically on the various problems concerning textual variation. An earlier volume on Studies in Stemmatology was published in 1996 and made the then-current state of the art in stemmatology accessible to a broad audience. That first volume was very well received by stemmatologists and also gave an impulse to new research, as several articles in the current volume clearly illustrate. Both volumes are of interest to scholars in (historical) linguistics, literary studies, Bible studies, classical studies, medieval studies, and history.
Article
Frequently, letters, words, and sentences are used in undergraduate textbooks and the popular press as an analogy for the coding, transfer, and corruption of information in DNA. We discuss here how the converse can be exploited, by using programs designed for biological analysis of sequence evolution to uncover the relationships between different manuscript versions of a text. We point out similarities between the evolution of DNA and the evolution of texts.
Article
The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
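The character-resampling scheme described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: `bootstrap_weights` draws characters with replacement and returns how many times each was drawn (the "weights" mentioned above), and `group_found` is a hypothetical stand-in for the real step of building a tree from the weighted characters and checking whether a group appears. Here it simply assumes perfectly compatible characters, so the group appears whenever at least one of its three defining characters is drawn.

```python
import random
from collections import Counter
from typing import Callable, List


def bootstrap_weights(n_characters: int, rng: random.Random) -> List[int]:
    """Sample characters with replacement; return the draw count per character."""
    draws = Counter(rng.randrange(n_characters) for _ in range(n_characters))
    return [draws[i] for i in range(n_characters)]


def bootstrap_support(
    n_characters: int,
    n_replicates: int,
    group_found: Callable[[List[int]], bool],
    seed: int = 0,
) -> float:
    """Fraction of bootstrap replicates in which the group is recovered."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(n_replicates)
        if group_found(bootstrap_weights(n_characters, rng))
    )
    return hits / n_replicates


def group_found(weights: List[int]) -> bool:
    # Hypothetical stand-in for tree building: characters 0, 1 and 2 define
    # the group, and all characters are assumed perfectly compatible.
    return any(weights[i] > 0 for i in (0, 1, 2))


support = bootstrap_support(n_characters=100, n_replicates=2000,
                            group_found=group_found)
print(support)
```

Under this simplified model, the chance that none of the three defining characters is drawn is about e^-3, roughly 0.05, so the estimated support should land near 0.95. That is consistent with the abstract's remark about groups defined by three or more characters.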
Article
1. George Kane, Piers Plowman: the A version (London: The Athlone Press, 1960). 2. A. E. Housman (ed.), M. Annaei Lucani Belli Ciuilis Libri Decem (Oxford: Basil Blackwell, 1926). 3. Adrian C. Barbrook, Norman Blake and Peter Robinson, 'The phylogeny of The Canterbury Tales', Nature, 394 (27 August 1998), p. 839. 4. John M. Manly and Edith Rickert, The Text of the Canterbury Tales (Chicago and London: University of Chicago Press, 1940). 5. M&R II, p. 8. 6. M&R II, pp. 195ff. 7. Daniel J. Ransom, Charles Moorman et al., A Variorum Edition of the Works of Geoffrey Chaucer, Vol. 2, The Canterbury Tales: The General Prologue (Norman and London: University of Oklahoma Press, 1993). 8. Beverly Kennedy, 'The variant passages in the Wife of Bath's Prologue and the textual transmission of The Canterbury Tales: the "Great Tradition" revisited', in Lesley Smith and Jane H. M. Taylor (eds.), Women, the Book and the Worldly (Cambridge: D. S. Brewer, 1995). 9. M&R II, p. 193.
Article
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
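One clustering stage of neighbor-joining can be sketched as below. The sketch uses the Q-criterion formulation found in later textbook presentations, which is an assumption about notation rather than a transcription of the paper. The function name `join_step`, the nested-dictionary layout, and the five-taxon distance table are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Dist = Dict[str, Dict[str, float]]


def join_step(
    names: List[str], dist: Dist
) -> Tuple[Tuple[str, str], Tuple[float, float], List[str], Dist]:
    """Join the pair minimising Q; return pair, branch lengths, new taxa, new distances."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}

    def q(pair: Tuple[str, str]) -> float:
        a, b = pair
        return (n - 2) * dist[a][b] - totals[a] - totals[b]

    a, b = min(combinations(names, 2), key=q)
    # Branch lengths from the joined taxa to the new internal node.
    len_a = dist[a][b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    node = f"({a},{b})"
    rest = [x for x in names if x not in (a, b)]
    new_dist: Dist = {x: {y: dist[x][y] for y in rest if y != x} for x in rest}
    new_dist[node] = {}
    for x in rest:
        d = (dist[a][x] + dist[b][x] - dist[a][b]) / 2
        new_dist[node][x] = d
        new_dist[x][node] = d
    return (a, b), (len_a, len_b), rest + [node], new_dist


# Illustrative additive distances for five taxa (not data from the paper).
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: {y: rows[i][j] for j, y in enumerate(names) if i != j}
        for i, x in enumerate(names)}
pair, lengths, new_names, new_dist = join_step(names, dist)
print(pair, lengths, new_dist["(a,b)"])
```

Repeating `join_step` on the returned names and distances until three taxa remain yields the full unrooted tree. Ties in Q are broken here by input order, which is an arbitrary choice.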
Article
Comparative studies have become both more frequent and more important as a means for understanding the biology, behaviour and evolution of mammals. Primates have complex social relationships and diverse ecologies, and represent a large species radiation. This book draws together a wide range of experts from fields as diverse as reproductive biology and foraging energetics to place recent field research into a synthetic perspective. The chapters tackle controversial issues in primate biology and behaviour, including the role of brain expansion and infanticide in the evolution of primate behavioural strategies. The book also presents an overview of comparative methodologies as applied to recent primate research which will provide new approaches to comparative research. It will be of particular interest to primatologists, behavioural ecologists and those interested in the evolution of human social behaviour.
Article
The application of Darwinian theory to archaeological phenomena has always been a difficult concept. In its most modern form, this approach has only gained currency since the 1980s. Perhaps the greatest hurdle to incorporating scientific evolutionism into archaeology is the necessary development of more than a rudimentary understanding of Darwinian evolution itself. Failure to recognize the conflict of anthropological terms such as "adaptation" and "fitness" with standard biological usage is fatal to any attempt to apply scientific evolutionism to the material record. Even more problematic are the outdated notions that human culture has allowed us to escape the effects of selection, that culture evolves, and that it does so in a progressive manner. This volume assembles what might be considered the benchmark articles in evolutionary archaeology: articles that show how to apply scientific evolutionism to the study of variation in the archaeological record. It delineates an approach to the past in which artifacts are viewed as parts of human phenotypes and thus are subject to selection in the same manner as any somatic feature. Evolutionary Archaeology: Theory and Application is aimed at archaeologists who want to understand the basics of evolutionary archaeology and who wish to do so from the beginning.
Book
We studied sequence variation in 16S rDNA in 204 individuals from 37 populations of the land snail Candidula unifasciata (Poiret 1801) across the core species range in France, Switzerland, and Germany. Phylogeographic, nested clade, and coalescence analyses were used to elucidate the species' evolutionary history. The study revealed the presence of two major evolutionary lineages that evolved in separate refuges in southeast France as a result of previous fragmentation during the Pleistocene. Applying a recent extension of the nested clade analysis (Templeton 2001), we inferred that range expansions along river valleys in independent corridors to the north led eventually to a secondary contact zone of the major clades around the Geneva Basin. There is evidence supporting the idea that the formation of the secondary contact zone and the colonization of Germany might be postglacial events. The phylogeographic history inferred for C. unifasciata differs from general biogeographic patterns of postglacial colonization previously identified for other taxa, and it might represent a common model for species with restricted dispersal.
Article
The originals of ancient literary works gave rise to copies. These manuscripts were often copied in turn; often, too, manuscripts eventually perished. The editor of an ancient work must consider the relations among the manuscripts, extant and lost, which transmitted the text, and the structure of such manuscript populations in general. For example, most family-trees reconstructed by editors for the manuscripts of ancient works show exactly two main branches; is this to be expected, or due to some flaw in methods of reconstruction? Such questions are approached by modelling the evolving manuscript population through a birth-and-death process, with illustrative data for Greek and Latin literature.
Article
Until printing was invented, texts were copied by hand. The probability with which changes were introduced during copying was affected by the kind of text and society. We cannot usually estimate the probability of change directly. Instead, we develop an indirect method. We derive a relationship between the number of manuscripts in the tradition and the mean number of copies separating a randomly chosen pair of manuscripts. Given the rate at which the proportion of words that are different increases with the mean number of copies separating two manuscripts, we can then estimate the probability of change. We illustrate our method with an analysis of Lydgate's medieval poem The Kings of England.
Article
This article presents the results of a stemmatic analysis of the fifty-eight fifteenth-century witnesses to The Wife of Bath's Prologue. This analysis is based on the transcripts and collations of these witnesses published on my CD-ROM of The Wife of Bath's Prologue, and uses the techniques outlined in my article (with Robert O'Hara) on computer-assisted stemmatic analysis published in the first volume of the Canterbury Tales Project Occasional Papers (Robinson and O'Hara 1993; Robinson 1996). The aim of the Canterbury Tales Project is to determine, as thoroughly as we can, the textual history of The Canterbury Tales. The rationale of the Project is twofold. First, the computer methods now at our disposal, for discovering, storing, sorting, and filtering all the information in all eighty-eight witnesses to the text of the Tales, may enable us to travel further towards this goal than previously possible. Secondly, the magic of hypertext and the spaciousness of computer publication, whether on CD-ROM or network, may permit us to provide other scholars with the most complete and convenient access to all the materials (transcripts, manuscript images, collations, databases of spellings, descriptions of manuscripts). The published CD-ROM represents our first attempt at the second part of this rationale. This article, offering the stemmatic analysis of the four percent of the text of all the witnesses to the whole Canterbury Tales contained in The Wife of Bath's Prologue, represents our first substantial endeavour towards our overall aim: the reconstruction of the textual history of all the witnesses to the whole Tales.
Article
Geoffrey Chaucer's The Canterbury Tales survives in about 80 different manuscript versions. We have used the techniques of evolutionary biology to produce what is, in effect, a phylogenetic tree showing the relationships between 58 extant fifteenth-century manuscripts of "The Wife of Bath's Prologue" from The Canterbury Tales. We found that many of the manuscripts fall into separate groups sharing distinct ancestors.
Article
Over the past two decades, many of the major controversies in historical linguistics have centred on language classification. Some of these controversies have been concentrated within linguistics, as in the methodological opposition of multilateral comparison to the traditional Comparative Method. Others have crossed discipline boundaries, with the question of whether correlations can be established between language families, archaeological cultures and genetic populations. At the same time, increasing emphasis on language contact has challenged the family tree as a model of linguistic relatedness. This paper argues that we must quantify language classification, to allow objective evaluation of alternative methods within linguistics, and of proposed cross–disciplinary correlations; and that a first step in this quantification is represented by the ‘borrowing’ of computational tools from biology.
Article
Darwinian evolution can be defined minimally as “any net directional change or any cumulative change in the characteristics of … populations over many generations—in other words, descent with modification” [1] (p. 5). In archeology the population comprises artifacts, which are conceived of as phenotypic [2–4]. Extension of the human phenotype to include ceramic vessels, projectile points, and the like is based on the notion that artifacts are material expressions of behavior, which itself is phenotypic. Archeology's unique claim within the natural sciences is its access to past phenotypic characters. Thus, historical questions are the most obvious ones archeologists can ask, although admittedly this is hardly a strong warrant for asking them. But if the issue is evolution, then historical questions must be asked. Posing and answering historical questions is the goal of evolutionary archeology [5].
Article
Cladistics, a method used to create a nested series of taxa based on homologous characters shared only by two or more taxa and their immediate common ancestor, offers a means of reconstructing artifact lineages that reflect heritable continuity as opposed to simple historical continuity. Although cladistically derived trees are only hypotheses about phylogeny, they are superior both to trees created through phenetics, which employs characters without regard as to whether they are analogous or homologous, and to trees created by using undifferentiated homologous characters. To date, cladistics is an unused approach to constructing archaeological phylogenies but one that holds considerable potential for resolving some of archaeology's historical problems. For example, it has long been noted that the southeastern United States exhibits the greatest diversity in fluted-point forms in North America—an observation that prompted Mason (1962) to propose that fluted points originated in the Southeast and then spread to other areas. However, because of a paucity of such points from well-dated contexts in the Southeast, it is difficult to ascertain chronological, let alone phylogenetic, relations among the various forms. Evolutionary trees derived from cladistical analysis are testable hypotheses about those phylogenetic relations.
Article
The debate on the evolution of culture has focused on two processes in particular, phylogenesis and ethnogenesis. Recently, it has been suggested that the latter has probably always been more significant than the former. This proposal was assessed by applying cladistic methods of phylogenetic reconstruction to a data set comprising decorative characters from textiles produced by Turkmen tribes since the 18th century. The analyses focused on two periods in Turkmen history: the era in which most Turkmen practised nomadic pastoralism and were organised according to indigenous structures of affiliation and leadership; and the period following their defeat by Tsarist Russia in 1881, which is associated with the sedentarisation of nomadic Turkmen and their increasing dependence on the market. The results indicate that phylogenesis was the dominant process in the evolution of Turkmen carpet designs prior to the annexation of their territories, accounting for c.70% of the resemblances among the woven assemblages. The analyses also show that phylogenesis was the dominant process after 1881, although ethnogenesis accounted for an additional 10% of the resemblances among the assemblages. These results do not support the proposition that ethnogenesis has always been a more significant process in cultural evolution than phylogenesis.
Conference Paper
We introduce NeighborNet, a network construction and data representation method that combines aspects of neighbor joining (NJ) and SplitsTree. Like NJ, NeighborNet uses agglomeration: taxa are combined into progressively larger and larger overlapping clusters. Like SplitsTree, NeighborNet constructs networks rather than trees, and so can be used to represent multiple phylogenetic hypotheses simultaneously, or to detect complex evolutionary processes like recombination, lateral transfer and hybridization. NeighborNet tends to produce networks that are substantially more resolved than those made with SplitsTree. The method is efficient (O(n^3) time) and is well suited for the preliminary analyses of complex phylogenetic data. We report results of three case studies: one based on mitochondrial gene order data from early branching eukaryotes, another based on nuclear sequence data from New Zealand alpine buttercups (Ranunculi), and a third on poorly corrected synthetic data.
Article
In this paper, we describe and illustrate a tool for analyzing and visualizing sequence and distance data, called the splits-graph. The construction of this graph is based upon the split-decomposition technique, which is a procedure to decompose a given metric defined on a finite set in a canonical way into a sum of simpler metrics. In a way, this technique is comparable to Fourier analysis, which also decomposes a given object under consideration (that is, a periodic signal) into a sum of simpler such objects, in a canonical way. The splits-graph and the theory behind it have been developed mainly in Bielefeld over the last 5 years. The procedure for producing splits-graphs implemented in the SplitsTree program is also described, and it is available from the authors.
Article
The tree model and the wave model of language evolution are united into one geometric network model. A related approach has previously been used for reconstructing DNA evolution, and we now present a network approach for diagnostic word lists of closely related languages. When applied to 17 Alpine Romance languages, the resulting evolutionary network reproduces known linguistic relationships and also fairly accurately reflects the geographic location of each language.
Article
To reconstruct a stemma or do any other kind of statistical analysis of a text tradition, one needs accurate data on the variants occurring at each location in each witness. These data are usually obtained from computer collation programs. Existing programs either collate every witness against a base text or divide all texts up into segments as long as the longest variant phrase at each point. These methods do not give ideal data for stemma reconstruction. We describe a better collation algorithm (progressive multiple alignment) that collates all witnesses word by word without a base text, adding groups of witnesses one at a time, starting with the most closely related pair.
Article
The practice of phylogenetic systematics frequently includes the assumption that cladogenesis occurs by a series of bifurcations. Consequently, a phylogenetic tree that includes one or more polytomous nodes is generally viewed as unresolved. However, while some polytomies surely represent a failure of resolution, others may be real or the best resolution that can be achieved. Therefore, polytomies should be considered as phylogenetic hypotheses in the same way as bifurcating topologies.
Article
Many stemmatological methods require estimates of pairwise distances between manuscripts, where distance is some measure of the number of changes that have occurred during copying along the path linking the two manuscripts. If a pair of manuscripts are separated from their common ancestor by more than one copy, more than one change may have occurred at some locations in the text, and the observed distance between two manuscripts may underestimate the actual number of changes. We derive a simple estimate of the actual number of changes given the observed number of changes, using a mathematical model for copying errors. This estimate is little affected by the size of the lexicon, the average rate at which copying errors are made, and the number of words for which a given word might be mistaken. Variation in error rates among scribes has no effect, and variation in error rates among words is probably unimportant. We recommend the routine use of this formula. However, variation in error rates among locations in the text can strongly affect the relationships between observed and actual distances. Such variation might easily arise in poetry because of the constraints of rhyme. Two priorities for future work are testing the underlying model for copying errors, and determining patterns of variation in error rates among locations.
Article
Karl Lachmann's edition from 1833 still provides the basis for Parzival scholarship. Although the text has subsequently been revised in parts, a fundamentally new edition considering all extant manuscripts is required. Computer technology offers means for tackling this task in an effective and reliable manner. A critical electronic edition will give access to the manuscript material, which may be published stage by stage, corresponding to different sections of the text. Such an edition will allow users to consult a base text, electronically linked to an apparatus of variants, to manuscript transcriptions, and to facsimiles. Browsing among these components, readers will experience the extent to which the Parzival romance was open to textual variance in the course of its transmission (an aspect stressed by theories of the so-called ‘New Philology’). Furthermore, new stemmatological methods borrowed from evolutionary biology (phylogeny) will provide insight into manuscript groupings that may reflect early textual versions that relate to the semi-oral status of vernacular literary culture. Thus, an electronic edition will be the essential prerequisite of any new Parzival book edition. But it also constitutes an edition in its own right, revealing the discursive and visual richness of medieval text traditions and involving the readers in the editorial process.
Article
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
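One clustering stage of the kind described above can be sketched as follows. This uses the widely taught Q-criterion form of neighbor joining, which is an assumption about formulation and is not quoted from the article; the function name `nj_step` and the five-taxon distance matrix are hypothetical illustrations.

```python
def nj_step(dist):
    """Perform one neighbor-joining selection step on a symmetric distance matrix.

    dist is a list of lists with zeros on the diagonal and at least 3 taxa.
    Returns ((i, j), len_i, len_j): the pair of taxa to join and the branch
    lengths from each of them to the new internal node.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]

    # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j; the pair with the smallest Q
    # is the one whose joining minimizes the total branch length estimate.
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q

    i, j = best_pair
    # Branch lengths from the joined taxa to the new internal node.
    len_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return best_pair, len_i, len_j


if __name__ == "__main__":
    # Hypothetical distances among five taxa, for illustration only.
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_step(example))
```

A full reconstruction would repeat this step, replacing the joined pair with the new node and recomputing distances, until only three nodes remain. Note that the pair chosen here (taxa 0 and 1) is not the pair with the smallest raw distance (taxa 3 and 4), which is what distinguishes the criterion from simple nearest-pair clustering.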
Article
The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. This method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests, and gives a rough indication of the error of the estimate of the tree.