Figure 1 - uploaded by Chris Forstall

The entirety of the Iliad and Odyssey, in 35-line samples. The color of each sample represents the values of the first three principal components of a larger feature set composed of character bi-gram frequencies.
... the areas frequently identified as later additions stand out?
• Can internal patterns help us understand a poem’s composition?

While statistical studies of Homer have been made before, it is often difficult for the critic to move comfortably between the numbers and the subjective experience of interpreting poetry. David W. Packard, in pioneering computational work on sound patterns in Homer, cautioned that “we cannot expect to identify expressive passages merely by counting letters” (Packard, 1974). More recently, Marjorie Perloff, noting that the significance of sound is all too often overlooked in poetry criticism, has laid part of the blame on “‘scientific’ prosodic analysis,” which has relied on an empiricist model that allows for little generalization about poetic modes and values: the more thorough the description of a given poem’s rhythmic metrical units, its repetition of vowels and consonants, its pitch contours, the less we may be able to discern the larger contours of a given poet’s particular practice, much less a period style or cultural construct. (Perloff & Dworkin, 2009, 2)

In this poster we focus on visualizing the data in ways that bridge the gap between empirical data and the subjective experience of interpreting poetry. We take our inspiration from work such as that of Plamondon (2009) and Mandell (n.d.), which has shown that innovation in how we visualize data is vital to connecting computing with humanities scholarship. Plamondon, in particular, used color to represent multi-parameter sound data over individual poems, so that the reader can appreciate a poem’s structure, grounded in objective values, at a glance.

We divide the poem into samples of various sizes and calculate n-gram frequencies for the most common features. We then use principal components analysis (PCA) to concentrate the variance among fewer variables. The top three principal components are then assigned to the three color channels: red = PC1, green = PC2, blue = PC3.
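The sampling, PCA and color-mapping steps described above can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' code: all helper names (`split_into_samples`, `char_bigrams`, `feature_matrix`, `top3_principal_components`, `scores_to_rgb`, `sample_colors`) are hypothetical; the text is assumed to be lowercased with whitespace collapsed; the 100 most common character bigrams (an arbitrary cutoff) serve as features, expressed as relative frequencies; PCA is computed via SVD of the mean-centred matrix; and each component is min-max scaled to the range 0 to 255, because the poster does not say how component scores were converted to color intensities.

```python
from collections import Counter

import numpy as np


def split_into_samples(lines, size=35):
    """Join consecutive verse lines into samples of `size` lines (the last may be shorter)."""
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def char_bigrams(text):
    """Count overlapping character bigrams after lowercasing and collapsing whitespace (assumed normalization)."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def feature_matrix(samples, n_features=100):
    """Rows = samples; columns = relative frequencies of the overall most common bigrams."""
    counts = [char_bigrams(s) for s in samples]
    overall = Counter()
    for c in counts:
        overall.update(c)
    vocab = [bg for bg, _ in overall.most_common(n_features)]  # n_features is an assumed cutoff
    rows = []
    for c in counts:
        total = max(sum(c.values()), 1)  # avoid division by zero for an empty sample
        rows.append([c[bg] / total for bg in vocab])
    return np.array(rows, dtype=float), vocab


def top3_principal_components(X):
    """Project mean-centred samples onto the first three principal axes (PCA via SVD).

    Needs at least three samples and three features to return three columns.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)  # rows of vt ordered by decreasing singular value
    return Xc @ vt[:3].T


def scores_to_rgb(scores):
    """Min-max scale each component to 0..255: column 0 -> red (PC1), 1 -> green (PC2), 2 -> blue (PC3)."""
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant component maps to 0 instead of dividing by zero
    return np.rint(255 * (scores - lo) / span).astype(int)


def sample_colors(lines, size=35, n_features=100):
    """Return one (R, G, B) row per sample of `size` lines."""
    X, _ = feature_matrix(split_into_samples(lines, size), n_features)
    return scores_to_rgb(top3_principal_components(X))


# Toy input standing in for the Greek text: 140 lines -> four 35-line samples.
demo_lines = [f"line {i} " + "abcdefghij"[i % 10] * (i % 7 + 1) for i in range(140)]
colors = sample_colors(demo_lines, size=35)
```

Principal components are defined only up to sign, so a given passage's hue may differ between PCA implementations; it is the contrasts between samples, not the absolute colors, that carry information.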
Each sample is visualized as a color that simultaneously represents three parameters, each potentially encompassing the most important aspects of a much larger feature set. The flow of sound in the poem may be seen as a gradient with both local and large-scale variation (see Figure 1).

As a control, we also treat a text of Homer’s poetry in which the order of the lines has been randomized. This is in part a response to the sobering results shown by Eder (2010), who made a strong case that authorship analysis is unreliable for samples of fewer than several thousand words, and that it improves when sampling is randomized. It may be that smaller samples are less reliable at the author level precisely because they are sensitive to internal patterns in the text, which randomization should smooth over.

The color gradient produced by this visualization of PCA is useful to the classical philologist precisely because of its subjective quality; yet a more definitive analysis of the epics’ internal heterogeneity is also desirable. Which sections are the “most different”? Are units which are functionally related, for example the type scenes which are played out over and over by different ...
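The randomized-line control can be sketched in the same spirit: shuffle the verse lines, then cut the shuffled text into samples exactly as for the original, so that any difference between the two sets of samples reflects line order rather than content. This is a minimal Python sketch; the function names and the fixed seed (added only so the control text is reproducible) are assumptions, not details taken from the poster.

```python
import random


def shuffled_control(lines, seed=0):
    """Return a copy of `lines` in random order, leaving the input list untouched."""
    rng = random.Random(seed)  # hypothetical fixed seed for a reproducible control text
    control = list(lines)
    rng.shuffle(control)
    return control


def split_into_samples(lines, size=35):
    """Group consecutive lines into samples of `size` lines (the last may be shorter)."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


original = [f"verse {n}" for n in range(1, 101)]  # toy stand-in for the text
control = shuffled_control(original)

original_samples = split_into_samples(original)
control_samples = split_into_samples(control)  # same sample sizes, scrambled line order
```

Feeding both sample sets through the same feature extraction and PCA, and comparing how much neighbouring samples differ, is one possible way to check whether randomization does smooth over internal patterns.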
Citations
In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections. This was a rather revolutionary idea compared to traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the principal means of Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. To this end, we classify the surveyed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques in order to provide a multi-faceted view of the textual data. In addition, we examine the text sources used and the typical data transformation steps required for the proposed visualizations. Finally, we summarize our experiences of collaboration when developing visualizations for close and distant reading, and we give an outlook on future challenges in this research area.