Thesis

Localizing pack messages: A framework for corpus-based cross-cultural multimodal analysis

Abstract

The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others. This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement.
... To solve this challenge, Thomas (2009) explores the use of commercial optical character recognition software for automatically generating "proto-GeM" XML, but concludes that extensive post-processing required from the analyst rendered this approach impractical. Nevertheless, Thomas (2009, 243) also observes that computer vision techniques may contribute to automating parts of the annotation process in the future. ...
... Focusing on fast-moving consumer goods, such as toothpaste and shampoo packs, Thomas (2014) describes their genre structure and examines the differences between the two locales. His analysis builds on the more extensive work in Thomas (2009), which defines a set of common 'message types' in product packaging on the basis of a corpus of 24 packages. These messages range from expressing brand identity to providing consumers with instructions and contact information, whose multimodal structure Thomas interrogates using the GeM-annotated corpus, supported by tools developed for the purpose (Thomas 2007). ...
Article
This review article provides an overview of the research conducted within the Genre and Multimodality framework, which has been used to describe the multimodality of page-based documents and other multimodal artefacts over the past 15 years. The article explicates the motivation and inspiration for developing the framework, introduces its central theoretical concepts and presents its applications across a number of case studies. Finally, the article discusses the criticism directed towards the model and identifies avenues of future development.
... The GeM annotation schema, which is intended to "function as a tool for isolating significant patterns against the mass of detail that multimodal documents naturally present" (Bateman, 2014a, 33), has proven useful for comparing the multimodality of documents across cultures (Thomas, 2009; Kong, 2013) and describing their change over time (Hiippala, 2015b). Yet the GeM model has not been adopted widely, because applying the multi-layered annotation schema requires ample time and resources. ...
... Certain attempts have also been made to address the bottleneck issues of time and resources required for producing GeM-annotated corpora. Thomas (2009) explores the use of commercial optical character recognition (OCR) software for automatically producing GeM annotation by using XSLT and Perl to transform and enrich the OCR output. Using XML output from ABBYY FineReader 8.0 SDK for generating annotation for the base and layout layers, Thomas observes that OCR output proves useful for the time-consuming task of describing typographic features, but nevertheless requires extensive manual post-processing. ...
... The GeM annotation model is described comprehensively by Henschel (2003). Its extension to support the analysis presented here has been fully documented elsewhere (Thomas, 2009a). The scheme implements a series of layers of annotation. ...
... In addition to implementing a kind of multimodal concordancer, with a user interface which supports the design and modification of corpus queries, allowing the user to control variables across the various annotation layers described in section 5, and which presents results in a manner which preserves as much of the native graphic realization of search results as possible, it was also necessary to develop ad hoc scripts designed to retrieve results in response to particular queries. This software is documented in full elsewhere (Thomas, 2009a). ...
Article
In the context of rapid theoretical development in multimodal discourse analysis, and of its growing interdisciplinary influence, it is crucial that those working in the field give due consideration to methodological rigour. The corpus-based approach described here offers a means of addressing some key methodological issues. Firstly, this approach provides a check on over- and under-interpretation and also reveals a more nuanced picture of data about specific genres than might be derived from even the closest observation of individual instances. Thus it helps avoid pitfalls associated with relying on hand-picked examples. Secondly, the semi-automated implementation of a multilayered annotation scheme, which separates the representation of layout from rhetorical structure, supports the empirical investigation of a variety of research questions, while minimizing the influence of the analyst on the data by delaying interpretation insofar as possible until it becomes unavoidable. This article illustrates the corpus-based approach through a contrastive case study of one very visual genre, product packaging, with data taken from two locales, Taiwan and the UK. In so doing, issues of the selection of texts for inclusion and corpus design are addressed and the principles and practicalities involved in data preparation are discussed. Consideration is also given to the types of question which such an approach enables us to explore. In addition, since the data analyzed here are drawn from different languages and cultures, the present study sheds light on some issues of interest from the perspective of localization. Finally, some benefits of the approach are suggested, among which not least is that a stronger basis for the critique of designs in turn supports identification of opportunities for their improvement. This is not possible when the analysis is itself circular.
... For describing the discourse structure of diagrams, AI2D-RST uses Rhetorical Structure Theory (RST; see e.g. Mann and Thompson 1988; Taboada and Mann 2006), a theory of textual organisation and coherence which has been previously extended to diagrams in natural language generation (André and Rist 1995; Bateman et al. 2001; Bateman and Henschel 2007) and for describing discourse relations in research on multimodal documents and other artefacts (Bateman 2008; Thomas 2009; Taboada and Habel 2013; Hiippala 2015). This extension of RST, which may be described as multimodal RST, provides the foundation for discourse structure annotation in AI2D-RST, as exemplified in Fig. 6. ...
Article
This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
... For describing the discourse structure of diagrams, AI2D-RST uses Rhetorical Structure Theory (RST; see e.g. Mann & Thompson 1988, Taboada & Mann 2006), a theory of textual organisation and coherence which has been previously extended to diagrams in natural language generation (André & Rist 1995, Bateman et al. 2001, Bateman & Henschel 2007) and for describing discourse relations in research on multimodal documents and other artefacts (Bateman 2008, Thomas 2009, Taboada & Habel 2013, Hiippala 2015). This extension of RST, which may be described as multimodal RST, provides the foundation for discourse structure annotation in AI2D-RST, as exemplified in Figure 6. ...
Preprint
This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural science, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowd-sourced descriptions, which was originally developed for computational tasks such as automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
... Secondly, we will use a set of tools stemming from the multimodal document approach (Bateman et al., 2002; Bateman, 2008, 2013, 2014) to analyze screens coming from two popular managerial and strategy video games: Football Manager 2018 (Sports Interactive, 2017) and Europa Universalis IV (Paradox Development Studio, 2013a). The document-like screens are viewed as potential objects of a multimodal analysis performed using the GeM model (Bateman, 2008, 2013, 2014; Bateman et al., 2002; Hiippala, 2015, 2017; Thomas, 2007, 2009, 2014), which treats these pages as multi-layered semiotic artifacts. We will focus on those aspects of the model which might reveal the greatest differences between document-like video game screens and other multimodal documents. ...
Chapter
With the purpose of showing that video game studies should become part of the emerging discipline of multimodality, the present chapter introduces the basics of the study of gameworlds and uses the multimodal document approach to analyze document-like screens coming from two video games: Football Manager 2018 and Europa Universalis IV. These document-like screens are analyzed using the tools coming from the GeM model, which treats these pages as multi-layered semiotic artifacts. Within this approach, all four layers are covered: the base layer, the layout layer, the rhetorical layer, and the navigation layer. Our analysis proposal tries to pinpoint the semiotic specificities of the different layers and test whether the GeM model needs to be adapted for the purpose of approaching these screens. At the same time, the gameworld environment is viewed as an important mediator between the player and the digital game system. We hope that such an integrated approach can be beneficial to both multimodality and video game studies and expand the directions of future research endeavors.
... The Genre and Multimodality (GeM) project (Bateman, Delin & Henschel 2002) developed a model which uses XML to code a set of annotation layers which describe layout, rhetorical relations and other aspects of the documents in the corpus. This has been used with collections of documents in categories including field guides for bird-watchers (Bateman 2008), consumer packs (Thomas 2009), insurance documents (Thomas, Delin & Waller 2010), and tourist guides (Hiippala 2013). ...
Article
Multimodal studies are essentially interdisciplinary, crossing boundaries between the verbal and the visual, between creation and consumption, and, it is argued here, between academic analysis and professional practices. This paper presents a practice-based perspective on multimodal document genres which feature typography and page/screen layout. It is argued that in order to reliably account for observed data, analysts need to distinguish between effects which represent an effort to create meaning, those which are by-products of production technology, and those which are intended as navigational support for users. A range of problems is identified, facing those who seek to classify multimodal genres, including problems of method, interdisciplinarity, naming, granularity, and exemplification. Additional problems particular to the study of real documents include design competence, dysfunctional genres, creativity and continuous technical change. Insights from design practice are discussed, introducing a pattern language approach to develop a level of connoisseurship and good judgement that complements the more formal analysis represented by corpus studies and taxonomy.
... Although the concept of metafunctions proved useful, particularly for the needs of discourse analysis (O'Halloran 2008), criticism of the so-called metafunctional principle grew as the 2010s approached. John Bateman (2008, 46-50) and Martin Thomas (2009) emphasised the need for methods capable of explaining more precisely the differences and similarities between different kinds of multimodal texts and situations. The criticism was directed especially at the description of the textual metafunction, which Kress and van Leeuwen (2006: chapter 6) approached through the concept of composition. ...
... These levels have been used to annotate several corpora of different kinds of multimodal documents (cf. Thomas 2009; Hiippala 2013, 2014), allowing some significant issues in the methodology of visual communication research and its relation to empirical methods to be addressed (Thomas 2014). ...
Article
The digital turn in visual studies has played a major role in the terminological overlap between ‘archive’, ‘database’ and ‘corpus’, and it has brought about a number of positive developments such as improved accessibility and availability. At the same time, it has also raised important questions pertaining to the materiality, searchability, annotation and analysis of the data at hand. Through a series of theoretical constructs and empirical examples, this paper illustrates the necessity and benefits of interdisciplinary dialogue when tackling the multimodal corpus annotation challenge. The meaningful interrelations between semiotic modes, the combinations between manual and (semi)automated annotation, the seamless integration of coding and annotation schemes which share common logics and the contextual embedding of the presented analyses strongly suggest multimodal document analysis in all its forms will continuously benefit from a corpus-based approach.