Fig 6 - uploaded by Mark Eisen
Attribution of Shakespeare plays. We attribute the 28 plays in Table I and the additional 10 plays in Table VI. All plays are attributed to Shakespeare. Marlowe's distance to a play is highly dependent on whether the analyzed play is a history play or not, emphasizing the impact of genre in attribution.  

Source publication
Article
Full-text available
Function word adjacency networks (WANs) are used to study the authorship of plays from the Early Modern English period. In these networks, nodes are function words and directed edges between two nodes represent the likelihood of ordered co-appearance of the two words. For every analyzed play a WAN is constructed and these are aggregated to generate...
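
The network construction described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function-word list, the window size, and the decay factor below are all assumptions chosen for illustration, and sentence boundaries are ignored. Each directed edge accumulates weight whenever one function word follows another within the window, and each node's outgoing weights are then normalized so they can be read as likelihoods of ordered co-appearance.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny function-word list (an assumption,
# not the list used in the source publication).
FUNCTION_WORDS = {"the", "and", "of", "to", "in", "a", "that", "with"}


def build_wan(text, window=5, decay=0.75):
    """Return row-normalized directed edge weights between function words.

    Edge (u, v) gains decay**(d - 1) each time v appears d tokens after u,
    for 1 <= d <= window. Both parameters are illustrative assumptions.
    """
    # Crude tokenization: split on whitespace, trim punctuation, lowercase.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    weights = defaultdict(lambda: defaultdict(float))
    for i, source in enumerate(tokens):
        if source not in FUNCTION_WORDS:
            continue
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            target = tokens[i + d]
            if target in FUNCTION_WORDS:
                weights[source][target] += decay ** (d - 1)
    # Normalize each node's outgoing weights so that they sum to 1.
    wan = {}
    for source, targets in weights.items():
        total = sum(targets.values())
        wan[source] = {t: w / total for t, w in targets.items()}
    return wan


if __name__ == "__main__":
    sample = "The quality of mercy is not strained, and that in the course of justice"
    for source, targets in build_wan(sample).items():
        print(source, targets)
```

Networks built this way for several texts by one author could be averaged into a profile and compared against the network of a disputed text; the choice of comparison measure is left open here.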

Contexts in source publication

Context 1
... Fig. 6 we present the attribution of 38 plays believed to have been written by Shakespeare, 30 of which are attributed solely to Shakespeare in [26]. Note that 2 of the 30 sole authored plays, 2 Henry VI and 3 Henry VI are not included in Shakespeare's profile in Table I because they have a strong history of disputed authorship ...
Context 2
... ranked atypically high for this play, second behind Shakespeare. Both Shakespeare and Marlowe have been proposed as candidates for Taming of a Shrew [50], in the former case as a possibly early draft of Taming of the Shrew. While our analysis points to Shakespeare as a more likely candidate, observe that the attribution of Taming of the Shrew in Fig. 6 ranks Marlowe as the worst candidate, indicating that much more of his style is evident in the early ...
Context 3
... act and scene analysis of Shakespeare and Fletcher's other collaboration, Henry VIII, is displayed in Fig. 12. Recall that, when attributing the full play, Shakespeare was the top candidate while Fletcher was in fact ranked fourth, thus revealing no evidence of collaboration; see Fig. 6 or Fig. 9. We see similar results in Fig. 12, in which Shakespeare is assigned every act. Fletcher, again, is ranked poorly in every act. A scene-by-scene analysis between Shakespeare and Fletcher, however, does reveal Fletcher to be a stronger candidate than Shakespeare in several individual scenes. In fact, the scene breakdown we ...
Context 4
... suggested by the results in Fig. 6, the three parts of Henry VI have been considered as possible collaborations between Shakespeare and Marlowe [14], though others such as Greene and Peele have also been suggested. The attribution of the acts of 1 Henry VI, displayed in Fig. 16, suggests that Act 1 could have been written by someone other than Shakespeare. It is here attributed evenly between Shakespeare and Jonson with Marlowe the next preferred candidate. Although Jonson is generally not considered a candidate for this play, it may suggest a similar author we do not profile. The rest of the play is assigned ...

Citations

... Mathematical techniques are sometimes used to resolve debates over authors of some historical texts. Eisen, Segarra, Egan and Ribeiro (2018) investigated authorship attribution and evaluation by applying different stylometric techniques, starting from the 19th century, where manual counting of the stylistic features were carried out, to the beginning of the 20th century where different stylistic features such as the use of rare words, sentence lengths, frequency of function words and richness of vocabulary are tackled by computer-based techniques. Stylometric analysis is normally carried out for authorship attribution. ...
... One of the stylometric techniques stimulated by advances in computer is Word Adjacency Networks (WANs). Eisen et al. (2018) used this technique as nodes and edges, which present information on the use of two function words in a single sentence; each WAN is presented as a chain that displays transition of two function words. Eisen et al. conclude that more attribution accuracy is detected by utilizing WANs, rather than by the usual frequency-based techniques. ...
... According to Eisen et al. (2018), an author's style is identified by measuring his/her use of function words. In line with the analysis of the author's use of function words, word adjacency networks (WANs) are constructed where function words are taken as nodes and edges which include information on how two function words are used within the same sentence or phrase. ...
... The stylistic features AAA uses rely mainly on the frequency of usage of function words. Eisen et al. have developed a new technique based on function word adjacency networks (WANs) "with function words as nodes, and edges containing information regarding the use of two function words within a certain distance" (Eisen et al., 2018). To examine the use of such WANs R Stylo features Rolling Delta and Rolling Classify may be used, as does Hartmut Ilsemann in both his recent studies about the authorship of the Parnassus Plays (Ilsemann, 2018) and Thomas Kyd's Cornelia (Ilsemann, 2019). ...
Preprint
Full-text available
Automatic Authorship Attribution (AAA) is the result of applying tools and techniques from Digital Humanities to authorship attribution studies. Through a quantitative and statistical approach this discipline can draw further conclusions about renowned authorship issues which traditional critics have been dealing with for centuries, opening a new door to style comparison. The aim of this paper is to prove the potential of these tools and techniques by testing the authorship of five comedies traditionally attributed to Spanish playwright Tirso de Molina (1579-1648): La ninfa del cielo, El burlador de Sevilla, Tan largo me lo fiais, La mujer por fuerza and El condenado por desconfiado. To accomplish this purpose some experiments concerning clustering analysis by Stylo package from R and four distance measures are carried out on a corpus built with plays by Tirso, Andres de Claramonte (c. 1560-1626), Antonio Mira de Amescua (1577-1644) and Luis Velez de Guevara (1579-1644). The results obtained point to the denial of all the attributions to Tirso except for the case of La mujer por fuerza.
... The stylistic features AAA uses rely mainly on the frequency of usage of function words. Eisen et al. have developed a new technique based on function word adjacency networks (WANs) "with function words as nodes, and edges containing information regarding the use of two function words within a certain distance" (Eisen et al., 2018). To examine the use of such WANs R Stylo features Rolling Delta and Rolling Classify may be used, as does Hartmut Ilsemann in both his recent studies about the authorship of the Parnassus Plays (Ilsemann, 2018) and Thomas Kyd's Cornelia (Ilsemann, 2019). ...
Chapter
Full-text available
Automatic Authorship Attribution (AAA) is the result of applying tools and techniques from Digital Humanities to authorship attribution studies. Through a quantitative and statistical approach this discipline can draw further conclusions about renowned authorship issues which traditional critics have been dealing with for centuries, opening a new door to style comparison. The aim of this paper is to prove the potential of these tools and techniques by testing the authorship of five comedies traditionally attributed to Spanish playwright Tirso de Molina (1579-1648): La ninfa del cielo, El burlador de Sevilla, Tan largo me lo fiáis, La mujer por fuerza and El condenado por desconfiado. To accomplish this purpose some experiments concerning clustering analysis by Stylo package from R and four distance measures are carried out on a corpus built with plays by Tirso, Andrés de Claramonte (1560-1626), Antonio Mira de Amescua (1577-1644) and Luis Vélez de Guevara (1579-1644). The results obtained point to the denial of all the attributions to Tirso except for the case of La mujer por fuerza.
... Our approach is thus neither a bag-of-words nor a word-sequence approach. Word-sequence approaches, specifically function-word-adjacency networks (WANs), have been used in authorship attribution [47,48] and gender classification [48] (for a detailed description of WAN, see [49]). ...
Article
Full-text available
Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and the distribution of part-of-speech-tags in windows of texts. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that the Approximate Entropy values can better differentiate canonical from non-canonical texts compared with Shannon Entropy, which is not true for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of local frequencies. We conclude that canonical fictional texts exhibit a higher degree of (sequential) unpredictability compared with non-canonical texts, corresponding to the popular assumption that they are more ‘demanding’ and ‘richer’. In using Approximate Entropy, we propose a new method for text classification in the context of computational textual aesthetics.
... For the unlabelled case (no additional actions), such models are already in use for stylometric analysis, e.g. sequence of characters (Khmelev and Tweedie, 2008), sequence of words (Sanderson and Günter, 2006) and sequence of function words (Eisen et al., 2018). Usually, this is done by averaging values of a transition matrix given by a text and comparing different texts with these results. ...
Preprint
Full-text available
The syntactic behaviour of texts can highly vary depending on their contexts (e.g. author, genre, etc.). From the standpoint of stylometry, it can be helpful to objectively measure this behaviour. In this paper, we discuss how coalgebras are used to formalise the notion of behaviour by embedding syntactic features of a given text into probabilistic transition systems. By introducing the behavioural distance, we are then able to quantitatively measure differences between points in these systems and thus, comparing features of different texts. Furthermore, the behavioural distance of points can be approximated by a polynomial-time algorithm.
... These linguistic markers are relatively distinct from contextual factors, such as topic, and occur sufficiently frequently across different writings by the same author. New methods, such as Word Adjacency Networks (Eisen et al., 2016), continue to further our capacity to identify the author signal on the basis of lexical and co-lexical patterns. ...
Article
Aphra Behn’s dramatic outputs are recognized for their diversity and responsiveness to trends in Restoration drama. A stylometric approach is used to investigate the linguistic dimension of Behn’s dramatic style, with a particular focus on evidence of chronological change. Quantitative analysis (most frequent words, function words, zeta) suggests that Behn’s drama falls into three periods. A qualitative analysis indicates that the periodization may reflect a change in the construction of Behn’s dramatic worlds, from an abstract psychological focus to a more grounded, interactive and social representation. The study considers the problematic dating of Behn’s tragi-comedy The Young King. Although critical opinion holds that this play was the first that Behn wrote (i.e. pre-1670), the stylometric analysis suggests that Behn heavily revised, or, indeed, penned, the drama in the mid-to-late 1670s, mid-way through her writing career. The paper demonstrates the potential for stylochronometric techniques to complement other linguistic approaches to style, and enhance our understanding of how literary writing evolves.
... The first manual quantitative analysis occurred in the late 1880s by Thomas C Mendenhall (1887) who used word length distributions from the works of Bacon, Marlowe, and Shakespeare to identify the authorship of Shakespeare's plays. Stylometry has been used extensively to determine the authorship of many undocumented playwright collaborations from the Elizabethan period, including Shakespeare (Segarra et al., 2017). Below we summarize some analytical techniques, but for a more comprehensive overview of stylometry and its classification techniques see Neal et al. (2017) and Aljumily (2015). ...
... They also use methods based on the Information Theoretic measure Jensen-Shannon divergence (JSD), and unsupervised graph partitioning clustering algorithms (Arefin et al., 2014). There are other techniques used in this period of Shakespearean analysis, including simple function words (Matthews and Merriam, 1993; Merriam and Matthews, 1994) and word adjacency networks (WANs) (Segarra et al., 2017), or looking at rare and unique phrases (Swaim, 2017). However, the most relevant to the RPAS technique used in this paper are the ones based on personality. ...
Article
Full-text available
Little is known of the private life of William Shakespeare, but he is famous for his collection of plays and poems, even though many of the works attributed to him were published anonymously. Determining the identity of Shakespeare has fascinated scholars for 400 years, and four significant figures in English literary history have been suggested as likely alternatives to Shakespeare for some disputed works: Bacon, de Vere, Stanley, and Marlowe. A myriad of computational and statistical tools and techniques have been used to determine the true authorship of his works. Many of these techniques rely on basic statistical correlations, word counts, collocated word groups, or keyword density, but no one method has been decided on. We suggest that an alternative technique that uses word semantics to draw on personality can provide an accurate profile of a person. To test this claim, we analyse the works of Shakespeare, Christopher Marlowe, and Elizabeth Cary. We use Word Accumulation Curves, Hierarchical Clustering overlays, Principal Component Analysis, and Linear Discriminant Analysis techniques in combination with RPAS, a multi-faceted text analysis approach that draws on a writer's personality, or self to identify subtle characteristics within a person's writing style. Here we find that RPAS can separate the known authored works of Shakespeare from Marlowe and Cary. Further, it separates their contested works, works suspected of being written by others. While few authorship identification techniques identify self from the way a person writes, we demonstrate that these stylistic characteristics are as applicable 400 years ago as they are today and have the potential to be used within cyberspace for law enforcement purposes.
... Stylometry has been extensively used to determine the authorship of the undocumented collaborations of the playwrights from the Elizabethan period, including Shakespeare (Segarra, Eisen, Egan, & Ribeiro, 2017). There appears to be dissension among authorship attribution scholars about an agreed method (Rudman, 2016), but the most successful and robust methods are based on low-level information, such as character n-grams or auxiliary word (function word, stop words such as articles and prepositions) frequencies (Stamatatos, 2009). ...
... There appears to be dissension among authorship attribution scholars about an agreed method (Rudman, 2016), but the most successful and robust methods are based on low-level information, such as character n-grams or auxiliary word (function word, stop words such as articles and prepositions) frequencies (Stamatatos, 2009). The premier works in evaluating authorship include MacDonald P. Jackson, Brian Vickers, and Hugh Craig and Arthur Kinney (Segarra et al., 2017). Jackson (2006) uses common low-frequency word phrases, repetition of phrases, collocation, and images to link word groups to other works. ...
... Vickers (2011) uses a tri-gram, or n-gram, approach, while Hirsch and Craig (2014) use function word frequency and other methods, including ones based on word probabilities and the Information Theoretic measure, Jensen-Shannon divergence (JSD) and unsupervised graph partitioning clustering algorithms (Arefin, Vimieiro, Riveros, Craig, & Moscato, 2014). However, there are other techniques used in this period of Shakespearean analysis, including simple function words (Matthews & Merriam, 1993; Merriam & Matthews, 1994) and word adjacency networks (WANs) (Segarra et al., 2017). However, the Meaning Extracting Method (MEM) from the field of psychology to extract themes from commonly used adjectives and describe a person from their personality, or self, is very different (Chung & Pennebaker, 2008). ...
Article
Full-text available
Using data containing stylometric markers for depression and Alzheimer’s disease, the 45 novels of Iris Murdoch and P.D. James are examined to see if a signature of an individual, their personality, changes over time due to life events and natural ageing. We use variants of the critical slowing down 1-lag autocorrelation and coefficient of skewness techniques with a multivariate identity measure, RPAS to visualize these changes. We find that life events such as depression, anxiety, and Alzheimer’s disease might be identified outside of natural ageing through a tipping point phenomenon. We believe these techniques might be a useful self-help tool to aid in the signalling of depressive episodes, such as averting suicide, and the early identification of Alzheimer’s disease, or for law enforcement personnel monitoring terrorists on watch lists.
... Stylometric analysis, the quantitative analysis of a text's linguistic features has been extensively used to determine the authorship of the undocumented collaborations of the playwrights from the Elizabethan period, including Shakespeare [9]. There appears dissension among leading Shakespearean authorship attribution scholars about an agreed method [10], but the most successful and robust methods are based on low-level information such as character n-grams or auxiliary words (function word, stop words such as articles and prepositions) frequencies [11]. ...
... There appears dissension among leading Shakespearean authorship attribution scholars about an agreed method [10], but the most successful and robust methods are based on low-level information such as character n-grams or auxiliary words (function word, stop words such as articles and prepositions) frequencies [11]. The premier work in evaluating authorship in the 16th to mid-17th centuries includes MacDonald P. Jackson, Brian Vickers, and Hugh Craig and Arthur Kinney [9]. Jackson [12] uses common low-frequency word phrases, repetition of phrases, collocation, and images to link word groups to other works. ...
... partitioning clustering algorithms [15]. However, there are other techniques used in this period of Shakespearean analysis, including simple function words [16,17] and word adjacency networks (WANs) [9]. However, the meaning-extracting method (MEM) from the field of psychology to extract themes from commonly used adjectives and describe a person from their personality, or self is very different [18,19]. ...
Article
Full-text available
In 1598-99 printer, William Jaggard, named Shakespeare as the sole author of The Passionate Pilgrim even though Jaggard chose a number of non-Shakespearian poems in the volume. Using a neurolinguistics approach to authorship identification, a four-feature technique, RPAS, is used to convert the 21 poems in The Passionate Pilgrim into a multi-dimensional vector. Three complementary analytical techniques are applied to cluster the data and reduce single technique bias before an alternate method, seriation, is used to measure the distances between clusters and test the strength of the connections. The multivariate techniques are found to be robust and able to allocate nine of the 12 unknown poems to Shakespeare. The authorship of one of the Barnfield poems is questioned, and analysis highlights that others are collaborations or works of yet to be acknowledged poets. It is possible that as many as 15 poems were Shakespeare’s and at least five poets were not acknowledged.
Article
We begin with an admission of error and an apology to Pervez Rizvi. Looking again at his software, we concede that, contrary to what we asserted in our essay, his software does indeed implement the distinctive feature of our ‘formula 7’, which is that of processing only those ‘transitions’ that are non-zero in all the candidates’ profiles when testing multiple candidates for authorship. We overlooked the bit of his code that does this because, unlike our code that he builds upon, Rizvi’s code is undocumented and it adds this functionality using a logical construction that computer scientists are taught to avoid. Specifically, Rizvi uses a 200-character long ‘if’ statement governing seven distinct Boolean expressions. Much virtue in ‘if’. But it works, and Rizvi is right that without it his replications would not even come close to producing meaningful results. We apologize to Rizvi for our error arising from overlooking that line of his code. We reject the rest of Rizvi’s complaints.