Article

Relative contributions of Shakespeare and Fletcher in Henry VIII: An analysis based on most frequent words and most frequent rhythmic patterns

Authors:
Petr Plecháč

Abstract

The versified play Henry VIII is nowadays widely recognized to be a collaborative work not written solely by William Shakespeare. We employ combined analysis of vocabulary and versification together with machine learning techniques to determine which other authors took part in the writing of the play and what were their relative contributions. Unlike most previous studies, we go beyond the attribution of particular scenes and use the rolling attribution approach to determine the probabilities of authorship of pieces of texts, without respecting the scene boundaries. Our results highly support the canonical division of the play between William Shakespeare and John Fletcher proposed by James Spedding, but also bring new evidence supporting the modifications proposed later by Thomas Merriam.
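The rolling attribution approach can be illustrated with a minimal sketch (this is not the authors' code; the feature inventories, window size, and step size are illustrative assumptions): a classifier trained on undisputed Shakespeare and Fletcher samples is slid over overlapping windows of verse lines, producing an authorship probability per window rather than per scene.

```python
# Illustrative sketch of rolling attribution (not the paper's actual pipeline).
# Assumptions: each line is a dict with its word tokens and a stress pattern;
# window/step sizes are arbitrary examples.
import numpy as np
from sklearn.svm import SVC

def window_features(lines, mfw_vocab, rhythm_vocab):
    """Relative frequencies of most frequent words and rhythmic patterns."""
    words = [w for line in lines for w in line["words"]]
    patterns = [line["stress_pattern"] for line in lines]
    wf = np.array([words.count(w) for w in mfw_vocab], dtype=float) / max(len(words), 1)
    rf = np.array([patterns.count(p) for p in rhythm_vocab], dtype=float) / max(len(patterns), 1)
    return np.concatenate([wf, rf])

def rolling_attribution(play_lines, clf, mfw_vocab, rhythm_vocab, window=100, step=5):
    """Slide a window of verse lines over the play, ignoring scene boundaries,
    and return the classifier's probability that each window is Shakespeare's."""
    probs = []
    for start in range(0, len(play_lines) - window + 1, step):
        x = window_features(play_lines[start:start + window], mfw_vocab, rhythm_vocab)
        probs.append(clf.predict_proba([x])[0, 1])
    return probs

# Usage (hypothetical data): train on undisputed samples, then roll over Henry VIII.
# clf = SVC(kernel="linear", probability=True).fit(X_train, y_train)
# curve = rolling_attribution(henry_viii_lines, clf, MFW, RHYTHM)
```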


... More importantly, through optimized algorithms and models, machines may uncover secrets in ancient books that are difficult for humans to perceive. For example, Plecháč et al. [5] analyzed vocabulary and versification with machine learning techniques to determine which authors, in addition to William Shakespeare, participated in the writing of Henry VIII. Clanuwat et al. [6] made use of deep learning technology to transcribe Kuzushiji, an ancient cursive script that was uniformly used in pre-modern Japan but can no longer be read by most Japanese people, into modern Japanese, thus obtaining a key to open the door to understanding ancient Japan. ...
... As mentioned in section 2.6, character recognition through one-shot verification amounts to continuously comparing the character image to be recognized with every character in the alphabet for similarity. Finally, the one with the highest similarity is selected as the recognition result, which is formalized in formula (5). ...
Preprint
Full-text available
Although the Tibetan language is widely used, its intelligent application is seriously lagging behind. Most studies on character recognition have almost ignored minority languages like Tibetan. For the purpose of recognizing Tibetan characters, a convolutional architecture named TwinNet is proposed in this work. Specifically, two parallel convolutional sub-networks sharing the same parameters were first carefully designed and connected using an energy function, thus achieving Tibetan character recognition via a one-shot verification task based on a similarity metric. Second, a fuzzy c-means clustering module based on statistical laws of Tibetan characters was integrated into the TwinNet pipeline, greatly reducing the search space at the recognition stage. Third, a Tibetan similar character dataset (TSCD) was constructed after a substantial amount of mining and analysis work, providing data support for training supervised models. The results of the binary classification experiment demonstrate that TwinNet can differentiate similar and dissimilar image pairs with a recall of 0.92, a precision of 0.89, an accuracy of 0.90, an F1 score of 0.90, and a Kappa coefficient of 0.82. Furthermore, the consistency between the output of TwinNet and the subjective evaluation by Tibetan-speaking volunteers is also evaluated. The experimental results support the idea that TwinNet's similarity evaluation of Tibetan characters is highly consistent with that of humans, with SROCC, PLCC, and RMSE values of 0.68, 0.84, and 0.04, respectively. The results of character recognition experiments based on the one-shot verification task show that the improved model with the integrated fuzzy c-means clustering module achieves an average accuracy of 92% for regular characters and 80% for severely deformed characters.
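The recognition step described in the citation context above (formula (5) in the cited preprint) reduces to an argmax over similarity scores between the query glyph and one reference image per alphabet character. A hedged sketch, with a generic `embed` function standing in for the trained sub-network:

```python
# Minimal sketch of recognition via one-shot verification: the unknown glyph is
# compared against one reference image per alphabet character, and the most
# similar reference wins. `embed` stands in for a trained network (assumption).
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recognize(query_image, alphabet_refs, embed):
    """alphabet_refs: dict mapping character label -> reference image array."""
    q = embed(query_image)
    scores = {label: cosine_similarity(q, embed(ref))
              for label, ref in alphabet_refs.items()}
    # formula (5) in spirit: argmax over the similarity scores
    return max(scores, key=scores.get)
```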
... Their most natural use is in studies focused on poetry; nevertheless, they have also been employed in authorship analysis of prose texts. In particular, some researchers have studied the application of accent, or stress, to AId problems in English [8]. In the work by Corbara et al. [4], the documents are encoded as sequences of long and short syllables, from which the relevant features are extracted and used for AA in Latin prose texts, with promising results. ...
... Finally, we also compare the results obtained with the aforementioned features with the results obtained by a method trained on the original text (hence, potentially mining topic-related patterns). To this aim, we employ the pretrained transformer named 'BETO-cased' from the Huggingface library, with the learning rate set to 10⁻⁶ and the other hyper-parameters set as default. We fine-tune the model for 50 epochs on the training set. ...
Chapter
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, via different approaches of text distortion used to actively mask the underlying topic. We feed these feature sets to a SVM learner, and show that they lead to results that are comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when potentially learning from topical information.
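The transformer baseline mentioned in the citation context above (BETO-cased fine-tuned on the original text with a learning rate of 10⁻⁶ for 50 epochs) can be approximated with the Hugging Face Trainer API. This is a sketch under assumptions: the model identifier, the toy dataset, the label count, and the batch size are illustrative; only the learning rate and the number of epochs come from the excerpt.

```python
# Sketch of fine-tuning a Spanish BERT ("BETO") for author classification.
# Assumptions: model id, toy data, label count, and batch size are placeholders;
# the learning rate (1e-6) and 50 epochs follow the excerpt above.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "dccuchile/bert-base-spanish-wwm-cased"  # a commonly used BETO checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Toy corpus: two speeches by two hypothetical authors (labels 0 and 1).
ds = Dataset.from_dict({"text": ["primer discurso de ejemplo",
                                 "segundo discurso de ejemplo"],
                        "label": [0, 1]})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                padding="max_length", max_length=64),
            batched=True)

args = TrainingArguments(output_dir="beto-authorship",
                         learning_rate=1e-6,
                         num_train_epochs=50,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=ds).train()
```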
... Often, these features are the most frequent words from the analyzed literary texts (they tend to be function words: "a", "the", "on", etc.), to which various measures of similarity (e.g., Euclidean distance) are applied. The most common goal of computational stylistics is attributing the authorship of texts where it is disputed, such as the authorship of Molière's plays (Cafiero and Camps, 2019), the Nobel Prize-winning novel And Quiet Flows the Don (Iosifyan and Vlasov, 2020), or Shakespeare and Fletcher's play Henry VIII (Plecháč, 2021). Thanks to numerous systematic comparisons of various approaches to computational stylometry, we now have a fairly good idea of which procedures and textual features are the most effective, depending on the goal of stylometric analysis, the language of the texts, or their genre (Evert et al., 2017; Neal et al., 2017; Plecháč et al., 2018). ...
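The word-frequency representation described in the excerpt above (relative frequencies of the most frequent, mostly function, words compared with a simple distance such as the Euclidean one) can be sketched on toy data:

```python
# Toy sketch: most-frequent-word profiles and the Euclidean distance between them.
from collections import Counter
import math

def mfw_profile(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[w] / n for w in vocab]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

corpus = {"A": "the cat sat on the mat and the dog sat too".split(),
          "B": "a cat and a dog ran on a hill".split()}
all_tokens = [t for toks in corpus.values() for t in toks]
vocab = [w for w, _ in Counter(all_tokens).most_common(5)]  # the 5 most frequent words
profiles = {k: mfw_profile(v, vocab) for k, v in corpus.items()}
print(vocab, euclidean(profiles["A"], profiles["B"]))
```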
Article
Full-text available
What are the best methods of capturing thematic similarity between literary texts? Knowing the answer to this question would be useful for automatic clustering of book genres, or any other thematic grouping. This paper compares a variety of algorithms for unsupervised learning of thematic similarities between texts, which we call “computational thematics”. These algorithms belong to three steps of analysis: text pre-processing, extraction of text features, and measuring distances between the lists of features. Each of these steps includes a variety of options. We test all the possible combinations of these options. Every combination of algorithms is given a task to cluster a corpus of books belonging to four pre-tagged genres of fiction. This clustering is then validated against the “ground truth” genre labels. Such comparison of algorithms allows us to learn the best and the worst combinations for computational thematic analysis. To illustrate the difference between the best and the worst methods, we then cluster 5000 random novels from the HathiTrust corpus of fiction.
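The three-step pipeline this abstract describes (pre-processing, feature extraction, and distance/clustering, validated against ground-truth genre labels) can be sketched with off-the-shelf scikit-learn components; the particular choices below (TF-IDF features, k-means, Adjusted Rand Index) are illustrative assumptions, not the combinations tested in the paper.

```python
# Illustrative "computational thematics" pipeline: vectorize, cluster, and
# validate against ground-truth genre labels. All component choices are
# assumptions made for this sketch.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score

books = ["a detective searched the dark alley for clues to the crime",
         "the inspector questioned the suspect about the stolen jewels",
         "the starship crossed the nebula toward a distant planet",
         "the aliens landed on the planet and greeted the astonished crew"]
true_genres = [0, 0, 1, 1]  # toy ground-truth labels (crime vs. sci-fi)

X = TfidfVectorizer(stop_words="english").fit_transform(books)  # pre-processing + features
predicted = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("ARI vs. ground truth:", adjusted_rand_score(true_genres, predicted))
```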
... Authorship attribution or stylometry is meant to identify the author of a text by studying its linguistic properties. While it has found many applications in the humanities [22][23][24], it recently went on to be used in courts of justice [25,26], or in journalistic investigations [27,28]. The computational complexity of the task is rarely a problem, but the interpretation of the results can take some time: most AI-based analyses rely on easily interpretable methods [29], to be sure that the attribution is made on a set of linguistically relevant features, rather than on topics evoked in the documents, typographic variations, etc. ...
... This calls to mind Bruno Latour's very apt wordplay in French, about the données ("data") never being actually données ("given") but rather always having to be obtenues ("obtained") (Latour, 1993: 188). The most desirable approach by far, though not the most frequent, is reusing a corpus and annotations that have already been compiled by others in the event that such a resource exists, is available for reuse, and has been suitably configured for the research questions at hand. One is clearly more likely to find such reusable resources when working on a prominent, widely studied author like Shakespeare (e.g., Arefin et al., 2014; Eisen et al., 2017; Plecháč, 2021) or Molière after others have already spent a considerable amount of time and energy on creating a very high-quality corpus, sometimes forgoing other hermeneutical goals of their own. ...
Chapter
Full-text available
The progressive digitization of texts, be they literary or not, has had a remarkable impact on the way we access them, making it possible to obtain help from computers towards the analysis of literary works. Treating text as data allows researchers to test existing hypotheses and, sometimes, ask new questions. And yet, what might appear like a scholarly revolution is actually the natural continuation of former efforts. From card files to spreadsheets to deep learning, quantitative approaches are not a disruption of scholarly practices, particularly in the exploration of poetic texts, which typically rely on a highly regulated, and thus readily measurable, material. Whatever the complexity of the method used, viewing texts through this concentrating lens, this restricted gaze, forms a camera obscura within which lines of regularity may appear that we hadn’t necessarily thought of beforehand, enriching and furthering the scholarly examination of literary productions.
... For the above, we chose to measure style based on the variation in the frequency of use of the most common function words, while using John Burrows' Delta algorithm for measuring and comparing the different styles. Our decision was based on the fact that these two methods have already been proven to give reliable results, even for small data sets [21,22]. At the same time, we had to compile collections of real texts of different types to be used as the input to our system. ...
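Burrows' Delta, mentioned in the excerpt above, standardizes each word's relative frequency across the corpus and then averages the absolute differences of these z-scores between two texts; a compact sketch on made-up frequencies:

```python
# Compact sketch of Burrows' Delta over most-frequent-word frequencies.
import numpy as np

def burrows_delta(freqs, i, j):
    """freqs: texts x words matrix of relative frequencies.
    Returns the mean absolute difference of corpus-wide z-scores for texts i, j."""
    mu = freqs.mean(axis=0)
    sigma = freqs.std(axis=0) + 1e-12   # avoid division by zero
    z = (freqs - mu) / sigma
    return np.mean(np.abs(z[i] - z[j]))

# Rows = texts (e.g., disputed passage, candidate author samples); columns = MFWs.
freqs = np.array([[0.050, 0.031, 0.022],
                  [0.048, 0.030, 0.025],
                  [0.061, 0.020, 0.015]])
print(burrows_delta(freqs, 0, 1), burrows_delta(freqs, 0, 2))
```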
Article
Full-text available
Stylometry is a well-known field, aiming to identify the author of a text, based only on the way she/he writes. Despite its obvious advantages in several areas, such as in historical research or for copyright purposes, it may also yield privacy and personal data protection issues if it is used in specific contexts, without the users being aware of it. It is, therefore, of importance to assess the potential use of stylometry methods, as well as the implications of their use for online privacy protection. This paper aims to present, through relevant experiments, the possibility of the automated identification of a person using stylometry. The ultimate goal is to analyse the risks regarding privacy and personal data protection stemming from the use of stylometric techniques to evaluate the effectiveness of a specific stylometric identification system, as well as to examine whether proper anonymisation techniques can be applied so as to ensure that the identity of an author of a text (e.g., a user in an anonymous social network) remains hidden, even if stylometric methods are to be applied for possible re-identification.
... However, for projects that work with larger text corpora, close reading and extensive manual annotation are neither practical nor affordable. While the speech processing community explores end-to-end methods to detect and control the overall personal and emotional aspects of speech, including fine-grained features like pitch, tone, speech rate, cadence, and accent (Valle et al., 2020), applied linguists and digital humanists still rely on rule-based tools (Plecháč, 2020;Anttila and Heuser, 2016;Kraxenberger and Menninghaus, 2016), some with limited generality (Navarro-Colorado, 2018;Navarro et al., 2016), or without proper evaluation (Bobenhausen, 2011). Other approaches to computational prosody make use of lexical resources with stress annotation, such as the CMU dictionary (Hopkins and Kiela, 2017;Ghazvininejad et al., 2016), are based on words in prose rather than syllables in poetry (Talman et al., 2019;Nenkova et al., 2007), are in need of an aligned audio signal (Rosenberg, 2010;Rösiger and Riester, 2015), or only model narrow domains such as iambic pentameter (Greene et al., 2010;Hopkins and Kiela, 2017;Lau et al., 2018) or Middle High German (Estes and Hench, 2016). ...
... The case of TNK is closely linked to that of another play which was also supposedly co-authored by Shakespeare and Fletcher-The Famous History of the Life of King Henry the Eight. I have discussed the authorship of that work elsewhere (Plecháč 2020). Here I follow the design of that study and apply the same models to classify passages from TNK. ...
Book
https://versologie.cz/versification-authorship

Contemporary stylometry uses different methods to figure out a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint it tends to ignore: versification. Using poetic corpora in three different languages (Czech, German and Spanish), this book asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. It then tests its findings on two real-life unsolved literary mysteries. In the first, we distinguish the parts of the verse play The Two Noble Kinsmen written by William Shakespeare from those by his co-author, John Fletcher. In the second, we seek to solve a case of suspected forgery. How authentic was a group of poems first published as the work of the 19th-century Russian author Gavriil Stepanovich Batenkov?
... Possibly not now, but certainly in the future, people can be recognized by text, similar to how a neural network identified the specific scenes in Henry VIII which were not written by William Shakespeare. The digital footprints of many users can be collected from many sources and compared for similarity. ...
Article
Full-text available
Chatbots are artificial communication systems becoming increasingly popular, and not all of their security questions are clearly solved. People use chatbots for assistance in shopping, bank communication, meal delivery, healthcare, cars, and many other actions. However, this brings an additional security risk and creates serious security challenges which have to be handled. Understanding the underlying problems requires defining the crucial steps in the techniques used to design chatbots related to security. There are many factors increasing security threats and vulnerabilities. All of them are comprehensively studied, and security practices to decrease security weaknesses are presented. Modern chatbots are no longer rule-based models; they employ modern natural language and machine learning techniques. Such techniques learn from a conversation, which can contain personal information. The paper discusses circumstances under which such data can be used and how chatbots treat them. Many chatbots operate on social/messaging platforms, which have their own terms and conditions about data. The paper aims to present a comprehensive study of security aspects in communication with chatbots. The article could open a discussion and highlight the problems of storing and using data obtained from user-chatbot communication, and propose some standards to protect the user.
... However, for projects that work with larger text corpora, close reading and extensive manual annotation are neither practical nor affordable. While the speech processing community explores end-to-end methods to detect and control the overall personal and emotional aspects of speech, including fine-grained features like pitch, tone, speech rate, cadence, and accent (Valle et al., 2020), applied linguists and digital humanists still rely on rule-based tools (Plecháč, 2020;Anttila and Heuser, 2016;Kraxenberger and Menninghaus, 2016), some with limited generality (Navarro-Colorado, 2018), or without proper evaluation (Bobenhausen, 2011). Other approaches to computational prosody are based on words in prose rather than syllables in poetry (Talman et al., 2019;Nenkova et al., 2007), rely on lexical resources with stress annotation such as the CMU dictionary (Hopkins and Kiela, 2017;Ghazvininejad et al., 2016), are in need of an aligned audio signal (Rosenberg, 2010;Rösiger and Riester, 2015), or model only narrow domains such as iambic pentameter (Greene et al., 2010;Hopkins and Kiela, 2017;Lau et al., 2018) or Middle High German (Estes and Hench, 2016). ...
Preprint
Full-text available
A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable meta-data and ground-truth annotation. Poetry corpora do exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis. We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, particularly beneficial task relations illustrate the inter-dependence of poetic features. A model learns foot boundaries better when jointly predicting syllable stress, aesthetic emotions and verse measures benefit from each other, and we find that caesuras are quite dependent on syntax and also integral to shaping the overall measure of the line.
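The BiLSTM-CRF architecture named in this abstract can be sketched as follows. This is a generic illustration rather than the authors' model: it relies on the third-party pytorch-crf package, and the vocabulary size, dimensions, and toy tensors are assumptions.

```python
# Minimal BiLSTM-CRF over syllable embeddings for stress tagging (a sketch of the
# general architecture, not the paper's model). Requires the third-party
# `pytorch-crf` package; sizes and the toy tensors below are assumptions.
import torch
import torch.nn as nn
from torchcrf import CRF

class SyllableTagger(nn.Module):
    def __init__(self, n_syllables=5000, n_tags=2, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_syllables, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def forward(self, syllable_ids, tags=None):
        emissions = self.proj(self.lstm(self.emb(syllable_ids))[0])
        if tags is not None:                  # training: negative log-likelihood
            return -self.crf(emissions, tags)
        return self.crf.decode(emissions)     # inference: best tag sequence

model = SyllableTagger()
ids = torch.randint(0, 5000, (1, 10))         # one verse line, 10 syllables
gold = torch.randint(0, 2, (1, 10))           # 0 = unstressed, 1 = stressed
loss = model(ids, gold)
pred = model(ids)
```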
Article
Full-text available
Automated author identification is one of the important fields in forensic linguistics. In this study, the effectiveness of systemic functional grammar (Halliday and Matthiessen, 2014) features in Persian authorship attribution was compared with that of function words. First, a corpus composed of documents written by seven contemporary Iranian authors was collected. Second, a list of function words was extracted from the corpus. Moreover, conjunction, modality, and comment adjunct system networks were applied to form a lexicon using linguistic resources. Then, the relative frequency of function words, in addition to the systemic functional features, was calculated in each document. A multilayer perceptron classifier, a type of neural network, was used for the learning phase, which resulted in a desirable accuracy in the evaluation phase. The results of the study showed that the function-word method is superior to the systemic functional approach alone in Persian author identification; however, simultaneous use of the two methods increases the effectiveness compared with either alone.
Thesis
This compilation thesis takes a top-down perspective on the representation of different groups of people in Czech news press over three decades. The starting point is that human equality is a global prerequisite for a democratic world, according to the United Nations Sustainable Development Goals. The research questions for the thesis concern how positively or negatively different groups of people are represented, and how often the different groups appear compared to each other. The thesis contributes results based on a language other than English, which represents a valuable contribution to the field. Theoretically, focus is on the premise that language is a tool for gaining and maintaining power, and a way of expressing power relations (Reisigl & Wodak 2016; Fairclough 2015). An important theoretical focus is the phenomenon of linguistic othering (Fidler 2016), which here means letting a group of people stand out from a certain news description by emphasising some of their characteristics. They then form what is also called an outgroup, as opposed to the ingroup that the writer is assumed to be part of (van Dijk 1987). The findings of this thesis provide insights into how news media can influence our perceptions of, for example, different nationalities or professions, linked to their socio-economic status, and by extension how these perceptions can influence our attitudes and behaviours towards these groups. Methodologically, the thesis uses corpus-based discourse analysis. Empirically, the research is based on the Czech National Corpus (www.korpus.cz). From this corpus, 32 million observations are extracted of when positive and negative adjectives, classified according to a subjectivity lexicon, appear in the news press together with nouns for different kinds of groups of people, such as gendered words like “woman” or “man”, occupations like “maid” or “miner” and nationalities like “Somali” or “Dane”. When adjectives are closer to nouns, or even next to them, they are given more weight than when they are more distant (Cvrček 2014). With such large amounts of data, a top-down or bird’s eye view is the most reasonable, but some detailed analyses are also included. Study I focuses on the representation of nationalities and countries, classified by the World Bank into groups according to their gross national income, and their co-occurrence with the positive and negative adjectives. Results: the nationalities in the different income groups are represented in a descending order; the higher the national income, the more positive the representation. Furthermore, discourses related to the so-called war on terror, as well as the security of different nations, emerge as a result of the analysis. Study II focuses on two groups, a focus group of Arabs and Muslims and a reference group of the other nationalities and countries. The focus group is a very heterogeneous group of people and countries that is often portrayed in the context of conflict (Baker et al., 2013, pp. 2 and 32). Results: Arabs and Muslims are consistently represented as an out-group, which over time affects how the people who read these news media view them. Study III contains two sub-studies, based on an intersectional analysis of modern Czech news reporting; in one sub-study the analysis focuses on professional roles, and in the other on different nouns for women and men. 
Results: Those with lower socio-economic status and fewer supervisory roles in their work are less likely to appear in news coverage, but when they do appear, it is not always with more negative representations. Regarding gender, men are more often portrayed than women, and women are more often represented by evaluative adjectives than men. In addition, women’s positive representations are based to a greater extent on their appearance and feelings, while men’s representations are based on their importance and competence. Overall, the results confirm quantitatively, with an empirical material covering almost the entire print news reporting in the Czech Republic since democratisation, that hypotheses that have been theoretically proposed, as well as confirmed, for other countries, turn out to be true for Czech news reporting. There are systematic differences in the way that some groups of people are significantly more often represented in the media than others, and that some groups are systematically represented more favourably than others. It also shows that these imbalances are clearly linked to factors such as nationality, occupation and gender. Permanent link to university repository: https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-227535 .
Chapter
In this chapter, we will learn more about how text is processed automatically. Natural language processing refers to the automated processing (including generation) of speech and text. We will look at some common natural language processing applications and explain common methods from the field of natural language processing. We will deepen our machine learning knowledge and introduce and understand the advantages and disadvantages of deep learning and neural networks. Finally, we will understand how words from human language can be represented as mathematical vectors and why this is beneficial for machine learning.
Article
A series of social media posts on 4chan and then 8chan, signed under the pseudonym ‘Q’, started a movement known as QAnon, which led some of its most radical supporters to violent and illegal actions. To identify the person(s) behind Q, we evaluate the coincidence between the linguistic properties of the texts written by Q and those written by a list of suspects provided by journalistic investigation. To identify the authors of these posts, serious challenges have to be addressed. The ‘Q drops’ are very short texts, written in a way that constitutes a sort of literary genre in itself, with very peculiar features of style. These texts might have been written by different authors, whose other writings are often hard to find. After an online ethnography of the movement, necessary to collect enough material written by these thirteen potential authors, we use supervised machine learning to build stylistic profiles for each of them. We then performed a ‘rolling analysis’, looking repeatedly through a moving window for parts of Q’s writings matching our profiles. We conclude that two different individuals, Paul F. and Ron W., are the closest match to Q’s linguistic signature, and they could have successively written Q’s texts. These potential authors are not high-ranking personalities from the US administration, but rather social media activists.
Article
Full-text available
This article delves into the literary canon, a concept shaped by social biases and influenced by successive receptions. The canonization process is a multifaceted phenomenon, emerging from the intricate interplay of sociological, economic, and political factors. Our objective is to detect the underlying textual dynamics that grant certain works exceptional longevity while jeopardizing the transmission of the majority. Drawing on various criteria, we present an operational framework for defining the French literary canon, centered on its contemporary reception and emphasizing the role of institutions, particularly schools, in its formation. Leveraging natural language processing and machine learning techniques, we unveil an intrinsic norm inherent to the literary canon. Through statistical modeling, we achieve predictive outcomes with accuracy ranging from 70% to 74%, contingent on the chosen scale of canonicity. We believe that these findings detect what Charles Altieri calls a “cultural grammar”, referring to the idea that canonical works in literature serve as foundational texts that shape the norms, values, and conventions of a particular cultural tradition. We posit that this linguistic norm arises from biased latent selection mechanisms linked to the role of the educational system in the canon-formation process.
Chapter
One branch of important digital humanities research focuses on the study of poetry and verse, leveraging large corpora to reveal patterns and trends. However, this work is limited by currently available poetry corpora, which are restricted to few languages and consist mainly of works by well-known classic poets. In this paper, we develop a new large-scale poetry collection, EEBO-verse (code and dataset are available at https://github.com/taineleau/ebbo-verse), by automatically identifying the poems in a large Early Modern books collection, Early English Books Online (EEBO). Instead of training text-based classifiers to sub-select the 3.5% of EEBO that actually consists of poetry, we develop an image-based classifier that can operate directly on page scans, removing the need to perform OCR, which, in this domain, is often unreliable. We leverage large visual document encoders (DiT and BEiT), which are pretrained on general-domain document images, by fine-tuning them on an in-domain annotated subset of EEBO. In experiments, we find that an appropriately trained image-only classifier performs as well as or better than text-based poetry classifiers on human-transcribed text, and far surpasses the performance of text-based classifiers on OCR output. Keywords: historical document classification, dataset, poetry, pretraining
Preprint
Full-text available
A series of social media posts signed under the pseudonym "Q" started a movement known as QAnon, which led some of its most radical supporters to violent and illegal actions. To identify the person(s) behind Q, we evaluate the coincidence between the linguistic properties of the texts written by Q and those written by a list of suspects provided by journalistic investigation. To identify the authors of these posts, serious challenges have to be addressed. The "Q drops" are very short texts, written in a way that constitutes a sort of literary genre in itself, with very peculiar features of style. These texts might have been written by different authors, whose other writings are often hard to find. After an online ethnography of the movement, necessary to collect enough material written by these thirteen potential authors, we use supervised machine learning to build stylistic profiles for each of them. We then performed a rolling analysis of Q's writings, to see if any of those linguistic profiles match the so-called 'Q drops' in part or in their entirety. We conclude that two different individuals, Paul F. and Ron W., are the closest match to Q's linguistic signature, and they could have successively written Q's texts. These potential authors are not high-ranking personalities from the U.S. administration, but rather social media activists.
Chapter
Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches of text masking that we use to actively mask the underlying topic. We feed these feature sets to a SVM learner, and show that they lead to results that are comparable to those obtained by a BETO transformer, when the latter is trained on the original text, i.e., potentially learning from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors’ political affiliation and communication style. Keywords: Authorship Analysis, Text masking, Political speech
Article
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
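The rhythmic features described here can be approximated by reducing each document to a string of long/short syllable marks, counting character n-grams of that string, and training an SVM; the quantity-annotation step and the toy data below are assumptions made for this sketch.

```python
# Sketch: derive rhythmic features from a long/short syllable encoding and train
# an SVM. `to_quantities` is a placeholder for real syllabification and quantity
# annotation (an assumption, not the paper's tool).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def to_quantities(text):
    """Placeholder: map a Latin text to a string of '-' (long) and 'u' (short)."""
    ...

# Toy example with pre-encoded quantity strings (hypothetical data).
docs = ["-u-u--u-u-", "--uu--uu--", "-uu-uu-uu-", "u--u--u--u"]
authors = ["Cicero", "Cicero", "Livy", "Livy"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 4)),  # rhythmic n-grams
    LinearSVC(),
)
clf.fit(docs, authors)
print(clf.predict(["-u-u--u---"]))
```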
Article
Full-text available
The convergence of textuality and multimedia in the twenty-first century signals a profound shift in early modern scholarship as Shakespeare’s text is no longer separable from the diffuse presence of Shakespeare on film. Such transformative abstractions of Shakespearean linearity materialize throughout the perpetual remediations of Shakespeare on screen, and the theoretical frameworks of posthumanism, I argue, afford us the lens necessary to examine the interplay between film and text. Elaborating on André Bazin’s germinal essay “The Myth of Total Cinema,” which asserts that the original goal of film was to create “a total and complete representation of reality,” this article substantiates the posthuman potentiality of film to affect both humanity and textuality, and the tangible effects of such an encompassing cinema evince themselves across a myriad of Shakespearean appropriations in the twenty-first century (20). I propose that the textual discourses surrounding Shakespeare’s life and works are reconstructed through posthuman interventions in the cinematic representation of Shakespeare and his contemporaries. Couched in both film theory and cybernetics, the surfacing of posthuman interventions in Shakespearean appropriation urges the reconsideration of what it means to engage with Shakespeare on film and television. Challenging the notion of a static, new historicist reading of Shakespeare on screen, the introduction of posthumanist theory forces us to recognize the alternative ontologies shaping Shakespearean appropriation. Thus, the filmic representation of Shakespeare, in its mimetic and portentous embodiment, emerges as a tertiary actant alongside humanity and textuality as a form of posthuman collaboration.
Article
This article is devoted to Shakespeare and Fletcher’s historical play The Famous History of the Life of King Henry the Eight (1613). At first sight, this history play seems very different from the tetralogies, since it was staged later, concerns more recent events, displays a sumptuous scenography, and was written in collaboration with John Fletcher. Nevertheless, I argue that, far from “making” a monologic, exemplary, and providentialist history, the play is rather similar to other Shakespearean history plays as far as the representation of history is concerned. The play questions historical knowledge from a sceptical perspective, and it shows the ways in which historical representation can be handled and counterfeited.
Chapter
The technique known as contemporary stylometry uses different methods, including machine learning, to discover a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests its findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.
Preprint
Full-text available
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, i.e., on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility to employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
Article
Full-text available
This article describes pilot experiments performed as one part of a long-term project examining the possibilities for using versification analysis to determine the authorships of poetic texts. Since we are addressing this article to both stylometry experts and experts in the study of verse, we first introduce in detail the common classifiers used in contemporary stylometry (Burrows' Delta, Argamon's Quadratic Delta, Smith-Aldridge's Cosine Delta, and the Support Vector Machine) and explain how they work via graphic examples. We then provide an evaluation of these classifiers' performance when used with the versification features found in Czech, German, Spanish, and English poetry. We conclude that versification is a reasonable stylometric marker, the strength of which is comparable to the other markers traditionally used in stylometry (such as the frequencies of the most frequent words and the frequencies of the most frequent character n-grams).
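Of the classifiers introduced in this article, Smith-Aldridge's Cosine Delta differs from Burrows' Delta only in the comparison step: the same corpus-wide z-scores are compared with cosine distance rather than a mean absolute difference. A minimal sketch on toy versification frequencies (the numbers are illustrative):

```python
# Cosine Delta in brief: z-score the feature frequencies across the corpus,
# then take the cosine distance between two texts' z-score vectors.
import numpy as np

def cosine_delta(freqs, i, j):
    mu, sigma = freqs.mean(axis=0), freqs.std(axis=0) + 1e-12
    z = (freqs - mu) / sigma
    return 1 - np.dot(z[i], z[j]) / (np.linalg.norm(z[i]) * np.linalg.norm(z[j]))

# Rows = texts, columns = versification features (e.g., rhythm pattern frequencies).
freqs = np.array([[0.31, 0.12, 0.07],
                  [0.29, 0.14, 0.06],
                  [0.22, 0.20, 0.11]])
print(cosine_delta(freqs, 0, 1), cosine_delta(freqs, 0, 2))
```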
Article
Full-text available
Function word adjacency networks (WANs) are used to study the authorship of plays from the Early Modern English period. In these networks, nodes are function words and directed edges between two nodes represent the likelihood of ordered co-appearance of the two words. For every analyzed play a WAN is constructed and these are aggregated to generate author profile networks. We first study the similarity of writing styles between Early English playwrights by comparing the profile WANs. The accuracy of using WANs for authorship attribution is then demonstrated by attributing known plays among six popular playwrights. The WAN method is shown to additionally outperform other frequency-based methods on attributing Early English plays. This high classification power is then used to investigate the authorship of anonymous plays. Moreover, WANs are shown to be reliable classifiers even when attributing collaborative plays. For several plays of disputed co-authorship, a deeper analysis is performed by attributing every act and scene separately, in which we both corroborate existing breakdowns and provide evidence of new assignments. Finally, the impact of genre on attribution accuracy is examined, revealing that the genre of a play partially conditions the choice of the function words used in it.
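The core data structure this abstract describes, a function word adjacency network, can be sketched as a directed graph whose edge weights estimate how often one function word is followed by another within a short window; the window size, the flat (decay-free) weighting, and the tiny word list are simplifying assumptions rather than the paper's exact scheme.

```python
# Sketch of a function word adjacency network (WAN): directed edges weighted by
# how often one function word is followed by another within a short window.
from collections import defaultdict

FUNCTION_WORDS = {"the", "a", "of", "and", "to", "in", "that", "it"}

def build_wan(tokens, window=5):
    counts = defaultdict(lambda: defaultdict(float))
    positions = [i for i, t in enumerate(tokens) if t in FUNCTION_WORDS]
    for i in positions:
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] in FUNCTION_WORDS:
                counts[tokens[i]][tokens[j]] += 1
    # normalize outgoing edges into transition probabilities
    wan = {}
    for src, nbrs in counts.items():
        total = sum(nbrs.values())
        wan[src] = {dst: w / total for dst, w in nbrs.items()}
    return wan

tokens = "the king and the cardinal spoke of the matter in private".split()
print(build_wan(tokens))
```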
Article
Full-text available
The output of a classifier should be a calibrated posterior probability to enable post-processing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce non-sparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
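The procedure this abstract describes, training an SVM and then fitting a sigmoid that maps its outputs to posterior probabilities, corresponds to what scikit-learn exposes as sigmoid calibration; a sketch on synthetic data:

```python
# Sketch of Platt-style calibration: an SVM's decision scores are passed through
# a fitted sigmoid to obtain posterior probabilities. The data here is synthetic.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

svm = LinearSVC()                                    # uncalibrated margin scores
calibrated = CalibratedClassifierCV(svm, method="sigmoid", cv=5)
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))               # calibrated posteriors
```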
Article
No issue in Shakespeare studies is more important than determining what he wrote. For over two centuries scholars have discussed the evidence that Shakespeare worked with co-authors on several plays, and have used a variety of methods to differentiate their contributions from his. In this wide-ranging study the author takes up and extends these discussions, presenting compelling evidence that Shakespeare wrote Titus Andronicus together with George Peele, Timon of Athens with Thomas Middleton, Pericles with George Wilkins, and Henry VIII and The Two Noble Kinsmen with John Fletcher. Part one of the book reviews the standard processes of co-authorship as they can be reconstructed from documents connected with the Elizabethan stage, and shows that all major, and most minor, dramatists in the Elizabethan, Jacobean, and Caroline theatres, collaborated in getting plays written and staged. This is combined with a survey of the types of methodology used since the early nineteenth century to identify co-authorship, and a critical evaluation of some 'stylometric' techniques. Part two gives detailed analyses of the five collaborative plays, discussing every significant case made for and against Shakespeare's co-authorship. Synthesizing two centuries of discussion, the author reveals a scholarly tradition, builds on and extends previous work, and identifies the co-authors' contributions in increasing detail. The range and quantity of close verbal analysis brought together in this book present a case to counter those 'conservators' of Shakespeare who maintain that he is the sole author of his plays.
Article
This article introduces a new stylometric method that combines supervised machine-learning classification with the idea of sequential analysis. Unlike standard procedures, aimed at assessing style differentiation between discrete text samples, the new method, supported with compact visualization, tries to look inside a text represented as a set of linearly sliced chunks, in order to test their stylistic consistency. Three flavors of the method have been introduced: (1) Rolling SVM, relying on the support vector machines (SVM) classifier, (2) Rolling NSC, based on the nearest shrunken centroids method, and (3) Rolling Delta, using the classic Burrowsian measure of similarity. The technique is primarily intended to assess mixed authorship; however, it can be also used as a magnifying glass to inspect works with unclear stylometric signal. To demonstrate its applicability, three different examples of collaborative work have been briefly discussed: (1) the 13th-century French allegorical poem Roman de la Rose, (2) a 15th-century translation of the Bible into Polish known as Queen Sophia’s Bible, and (3) The Inheritors, a novel collaboratively written by Joseph Conrad and Ford Madox Ford in 1901.
Henry VIII: an investigation into the origin and the authorship of the play
  • Boyle
Another fresh confirmation of Mr. Spedding’s division and date of the play of Henry VIII
  • Furnivall
Conjectural History, or Shakespeare’s Henry VIII
  • Alexander
Mr. Boyle’s theory as to ‘Henry VIII’
  • Fleay
The shares of Fletcher and his collaborators in the Beaumont and Fletcher Canon VII
  • Hoy
Taylor’s method applied to Shakespeare and Fletcher
  • Merriam
Henry VIII, All is True?
  • Merriam
On the ‘weak endings’ of Shakspere, with some account of the history of the verse tests in general
  • Ingram
The works of Beaumont and Fletcher
  • Oliphant
Who wrote Shakespeare’s Henry VIII
  • Spedding
Shakespeares Anteil an ‘Henry VIII’
  • Ege
Colloquial contractions in Beaumont, Fletcher, Massinger and Shakespeare as a test of authorship
  • Farnham
Who wrote Shakespeare’s Henry VIII
  • Hickson
What Shakespeare wrote in ‘Henry VIII’: part two
  • Merriam
‘Extra monosyllables’ in Henry VIII and the problem of authorship
  • Oras