L1/L2 Eye Movement Reading of Closed Captioning:
A Multimodal Analysis of Multimodal Use
By
Elizabeth A. Specker
________________________
Copyright © Elizabeth A. Specker 2008
A dissertation submitted to the Faculty of the
GRADUATE INTERDISCIPLINARY DOCTORAL PROGRAM IN
SECOND LANGUAGE ACQUISITION AND TEACHING
In partial fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2008
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Dissertation Committee, we certify that we have read the dissertation
prepared by Elizabeth A. Specker
entitled L1/L2 Eye Movement Reading of Closed Captioning: A Multimodal
Analysis of Multimodal Use
and recommend that it be accepted as fulfilling the dissertation requirement for the
Degree of Doctor of Philosophy
_______________________________________________________________________ Date: 11/09/2007
Linda R. Waugh
_______________________________________________________________________ Date: 11/09/2007
Thomas G. Bever
_______________________________________________________________________ Date: 11/09/2007
Yetta M. Goodman
Final approval and acceptance of this dissertation is contingent upon the candidate’s
submission of the final copies of the dissertation to the Graduate College.
I hereby certify that I have read this dissertation prepared under my direction and
recommend that it be accepted as fulfilling the dissertation requirement.
________________________________________________ Date: 11/09/2007
Dissertation co-Director: Linda R. Waugh
________________________________________________ Date: 11/09/2007
Dissertation co-Director: Thomas G. Bever
STATEMENT OF AUTHOR:
This dissertation has been submitted in partial fulfillment of requirements for an
advanced degree at the University of Arizona and is deposited in the University Library
to be made available to borrowers under the rules of the library.
Brief quotations from this dissertation are allowable without special permission,
provided that accurate acknowledgment of the source is made. Requests for permission
for extended quotation or reproduction of this manuscript in whole or in part may be
granted by the copyright holder.
Signed,
Elizabeth A. Specker
ACKNOWLEDGMENTS:
Of course there are many people that I would like to thank for supporting me and helping
me through this process, including:
… the people who were participants, patiently waiting while the eye tracker was
calibrated, and then reading and answering questions… and for being good sports about
the whole thing. At least this eye tracker didn’t have a bite bar.
… the EMMA lab, and to Joel Brown, Kathleen O’Brien de Ramirez, Marge Knox,
Yoshi Yamashita, Marie Ruiz and Yetta and Ken Goodman… I needed a community in
which to explore and feel a connection, and the EMMA lab and people gave me that. I
promise I’ll return the keys.
… the Bever Lab, which funded the new computer in the EMMA lab, without which my
dissertation would have been on a completely different topic.
… Linda Waugh, inspiring me to learn and apply, and who always had to read my strange
term papers involving media in some fashion or other, all of which eventually became the
building blocks of this study (and doubtless many more to come).
… Tom Bever who told me to do what I really wanted to do: closed captioning.
… Yetta Goodman, who took me into her lab under her guidance and put reading into
perspective – it’s prediction, really.
… my dissertation reading group, Nolvia Cortez and Kim Helmer: getting ready for a
meeting and coffee talk at Epic kept me moving on the writing. Or at least thinking about
it anyway. And to Yumika Muramatsu, who kept calling me to get out and write. And
we did.
… without a doubt, Michael McVey and Gregory Anderson came to the rescue of the
EMMA lab more than once to save the day as one monitor or another failed, one
connection or another became a little wonky, or we just couldn’t figure out why the
graphics card went weird. Also, to the lab crew of the OSCR labs (MLL & the MZ) for
not quite understanding why I asked you so many odd questions about closed captioning
and videos, but nevertheless doing your best to support my technical needs. And to
Nadia Hamrouni, who graciously translated forms from English to Arabic.
… And most of all: to Shawn Steinhart & Neil Johnson. My buddies.
~ thanks ~
DEDICATION:
To my mom. I still love to read. Thanks.
To my dad. I still love to tinker with technology. Thanks. $D$3 saved me!
To mi hermano. I couldn’t do the tinkering with video without you.
To my friends: You are the best. And you listen well.
To my advisors: Yetta, Lin & Tom. I appreciate you letting me go and explore and
research my passion: learning in a multimodal environment. May it be useful someday.
TABLE OF CONTENTS
LIST OF TABLES............................................................................................................ 8
LIST OF CHARTS ........................................................................................................... 9
LIST OF FIGURES........................................................................................................ 10
LIST OF GRAPHS ......................................................................................................... 11
LIST OF TERMS............................................................................................................ 12
ABSTRACT..................................................................................................................... 14
CHAPTER 1: WHAT’S CHIC?.................................................................................... 16
1.1 Multiple Multimodalities..................................................................................... 17
1.1.1 Overview of Research Questions................................................................. 20
1.2 Dissertation Overview ......................................................................................... 20
1.3 The context behind the texts and the participants............................................ 21
1.4 Merging across disciplines .................................................................................. 22
1.5 Multimodality: Different definitions in different disciplines........................... 23
1.5.1 Multimodality 1: Sensory ‘modes’:............................................................. 23
1.5.2 Multimodality 2: Modes of semiotic ‘literacies’ of different ‘modalities’ 26
1.6 Placing multimodality into communication: The Speech Event...................... 27
1.6.1 Jakobson’s Speech Event Model laid out.................................................... 29
1.6.2 A communication event: The multiple sides of the addresser .................. 31
1.6.3 A communication event: The very necessary contact and context........... 34
1.6.4 A communication event: Message & code.................................................. 37
1.6.5 A communicative event: Genre & intersubjectivity.................................. 40
1.6.6 A communication event: The all-important addressee.............................. 42
1.7 The puzzle and comprehension…....................................................................... 45
1.8 Summary............................................................................................................... 46
CHAPTER 2 –POSITIONING THIS STUDY............................................................. 49
2. Overview of Chapter 2........................................................................................... 49
2.1 Closed captioning and subtitling ........................................................................ 52
2.1.1 The history of closed captioning.................................................................. 56
2.1.2 Closed Captioning: What is it?.................................................................... 57
2.1.3 Additional audiences for closed captioning and subtitling ....................... 59
2.2 Closed Captioning & subtitles used in language research............................... 60
2.2.1 Closed Captioning as a learning tool in the literature............................... 60
2.2.2 Past research using closed captioning and subtitles: Different L1/L2..... 62
2.2.3 Same language subtitling and closed captioning........................................ 64
2.2.4 Motivation in self-learning from video. ...................................................... 67
2.3 Multimodality....................................................................................................... 69
2.3.1 The call for multimodality in pedagogy...................................................... 70
2.3.2 Multimodal theories...................................................................................... 72
2.3.3 Eye tracker and movement: Relevant terms and research....................... 78
2.3.4 Tracking attention in multimodal environments....................................... 80
2.4 Relevant reading theories and research for a multimodal literacy event....... 84
2.5 Unique literacy event challenges: L2 reading ................................................... 88
2.6 Individual preferences when using multimodal texts....................................... 91
2.7 Chapter Two summary........................................................................................ 93
CHAPTER 3: METHODOLOGY AND MATERIALS............................................. 94
3.1 The Research Questions and significance of this study.................................... 94
3.2 The conditions ...................................................................................................... 95
3.3 The data collection and the participants............................................................ 97
3.3.1 Types of data ................................................................................................. 97
3.3.2 Eye tracker numerical data and eye movement visual data (data types 1
and 2).................................................................................................................. 98
3.3.3 Types of data: Interviews and strategies (data types 3, 4 & 5)............... 100
3.4 Materials: Choosing the texts ........................................................................... 101
3.4.1 The text selection......................................................................................... 102
3.4.2 Extension of the texts: Selected internal units of texts ............................ 105
3.5 Materials: Production of the texts.................................................................... 108
3.6 The EMMA lab .................................................................................................. 113
3.7 Data collection – how it all fits together........................................................... 114
3.7.1 Order of data collection.............................................................................. 114
3.7.2 Equipment used........................................................................................... 116
3.7.3 Interviews: Comprehension questions as retell protocols....................... 117
3.7.4 Interviews: Extended interviews ............................................................... 119
3.7.5 Questionnaire: Learning Styles Inventory ............................................... 119
3.8 Problems: It’s all moving around..................................................................... 121
3.9 Summary............................................................................................................. 122
CHAPTER 4: ANALYSIS OF QUESTION ONE.................................................... 124
4.1 Analysis Overview............................................................................................... 124
4.1.1 Notations unique to this study ................................................................... 125
4.2 Presentation of the airbag and biker texts to the viewer................................ 126
4.2.1 Line presentation rhythm........................................................................... 127
4.2.2 Predictions in/of/about the text.................................................................. 132
4.3 Analysis of reading patterns ............................................................................. 136
4.4 Analysis: Question One ..................................................................................... 138
4.4.1 Reading and the use of eye trackers.......................................................... 139
4.5 Reading patterns of fixations............................................................................ 142
4.6 The whole text analysis: Number of fixations ................................................. 144
4.7 The whole text analysis: Fixation durations.................................................... 148
4.7.1 Research Question One, part one.............................................................. 148
4.7.2 Fixation duration trends across conditions .............................................. 150
4.7.3 Research question one, part one: Fixation durations.............................. 153
4.7.4 Research question one, part two: Fixation durations.............................. 155
4.8 The whole text analysis: Fixation times and saccade degrees of movement. 162
4.9 Introduction to row reading patterns .............................................................. 167
4.9.1 Analysis of the reading patterns by row use............................................. 168
4.9.2 Results of analysis for subquestion one..................................................... 172
4.9.3 Results of analysis for subquestion two .................................................... 176
4.10 Comparison of eye movement patterns: Across conditions......................... 178
4.11 Summary of Analysis for Question 1 ............................................................. 182
CHAPTER 5: RESEARCH QUESTION TWO: THE MULTIMODAL
ENVIRONMENT.......................................................................................................... 184
5.1 Overview of look zones by participant............................................................. 185
5.2 Use of the available modalities: Look zones .................................................... 187
5.3 Fixation durations within each look zone: An overview ................................ 189
5.3.1 Fixation durations in multimodal reading patterns ................................ 193
5.4 Consecutive fixations in look zone 1 (reading or noticing?) .......................... 196
5.5 A detailed multimodal analysis of the use of look zones (for NNS2)............. 198
5.6 Comparison of line use in multimodal texts .................................................... 206
5.7 Aural channel: The other modality present .................................................... 218
5.8 Summary and reflections regarding eye movements in multimodal
environments............................................................................................................. 218
CHAPTER 6: RESEARCH QUESTION 3: CASE STUDIES................................. 222
6.1 Questions 3: overview of procedures and questions....................................... 223
6.2 TARIQ ................................................................................................................ 225
6.3 FARID................................................................................................................. 230
6.4 ELENA................................................................................................................ 234
6.5 SARAH................................................................................................................ 238
6.6 Summary............................................................................................................. 242
CHAPTER 7: SUMMARY OF FINDINGS, IMPLICATIONS AND
RECOMMENDATIONS FOR FURTHER RESEARCH ........................................ 244
7.1 Summary of findings ......................................................................................... 245
7.2 The Puzzle: Back to the metaphor ................................................................... 252
7.3 Implications based on the findings................................................................... 253
7.4 Limitations of this study and Future research:............................................... 258
APPENDIX A: Airbags closed captioning text.......................................................... 264
APPENDIX B: Arizona Bikers closed captioning text .............................................. 268
APPENDIX C: RETELL AND EXTENDED INTERVIEW QUESTIONS: NNS 272
APPENDIX D: RETELL AND EXTENDED INTERVIEW QUESTIONS: NS.... 276
APPENDIX E: PRACTICE READING & RETELL............................................... 279
APPENDIX F: SAMPLE OF LEARNING STYLES INVENTORY (LSI)
QUESTIONS ................................................................................................................. 282
APPENDIX G: HANDOUT ABOUT LEARNING STYLES.................................. 283
APPENDIX H: MULTIMODAL ANALYSIS OF AIRBAG TEXT........................ 285
REFERENCES.............................................................................................................. 293
LIST OF TABLES
Table 2.1 Comparison of the two types of print modality in a multimodal event........... 55
Table 3.1 Participants with order of conditions (1 = viewed 1st, 2 = viewed 2nd)........ 100
Table 3.2 The video clips used as texts.......................................................................... 103
Table 3.3 Smaller units of text identification and characteristics.................................. 105
Table 3.4 Example of sequenced unit texts in the Airbag sequence used for question 1,
with text, type/token count and duration....................................................... 107
Table 4.1 Airbag total text and Biker total text statistics............................................... 127
Table 4.2 Biker text, lines 17 - 22.................................................................................. 132
Table 4.3 Total number of fixations for each participant in the dynamic reading
condition (one).............................................................................................. 145
Table 4.4 Fixation durations: number of fixation durations for each time category for
each participant in the airbags CC or airbags SCC (total text)..................... 152
Table 4.5 Fixation durations: number of fixation durations for each time category for
each participant in the bikers CC or bkSCC (total text) ............................... 152
Table 4.6 Total fixation times, total interfixation saccades by time and average
interfixation degree for the whole text for each participant.......................... 163
Table 4.7 Comparison of saccade length in degrees between dynamic and static texts 165
Table 4.8 Smaller units of text identification and characteristics.................................. 171
Table 4.9 Difference in the number of row changes between unit A (beginning) and unit
D (end) of dynamic text reading ................................................................... 174
Table 4.10 Comparison of eye movement patterns between static and dynamic texts.. 179
Table 5.1 Comparison chart illustrating overall fixation patterns in Condition Two
(multimodal). ................................................................................................ 188
Table 5.2 Percentage of fixations for each look zone.................................................... 189
Table 5.3 Fixation duration averages comparing look zones 1 and 2 for both participant
groups............................................................................................................ 193
Table 5.4 Comparison of average fixation duration across conditions in ms:............... 195
Table 5.5a Context: before unit A of Biker sequence. Note that the CC text carries over
from Camera Shot 2 to Camera Shot 3 ......................................................... 199
Table 5.5b Camera Shot 3. Relationship between appearance of printed text (CC), of
video description (graphic), of the audio transcription for this camera shot 200
Table 5.5c Camera shot 4, and 5, close up of bike wheel and tachometer during which
there is no change in Look zones.................................................................. 201
Table 5.5d Camera shot 9, illustrating the changes between LZ1 & 2 in relation to the
changes in the audio, graphic and printed modalities................................... 202
Table 5.6 The fixplots for the multimodal analyses in Table 5.5.b, c & d. ................... 204
Table 5.7 Illustration of scene change possibly preempting LZ change by NS8, fixations
highlighted by circles.................................................................................... 207
Table 5.8 Detailed multimodal analysis of NNS4 eye movements between LZ1 and 2
illustrating nonfamiliarity with the text. ....................................................... 209
Table 5.9 Multimodal analysis of NS7 illustrating distraction in LZ 2 and catch up using
printed text in LZ 1 ....................................................................................... 214
LIST OF CHARTS
Chart 3.1 Example of the combination of video texts and files for each condition....... 110
Chart 4.1 Illustration of the data type, use and relationships in this study between
Research Questions 1, 2 & 3......................................................................... 137
LIST OF FIGURES
Figure 1.1. Illustration of apex of study: The cross section between language learners
using reading in multimodal environments..................................................... 23
Figure 1.2 The factors and functions of the Speech Event Model (Jakobson, 1960). ..... 29
Figure 1.3 The Multimodal Event.................................................................................... 47
Figure 2.1 The Multimodal Multimedia Communication Event ..................................... 50
Figure 2.2 Areas of research included as influencing this study ..................................... 51
Figure 3.1 Example of cursor superimposed on the image............................................ 112
Figure 3.2 EMMA Lab computer and camera set-up. ................................................... 114
Figure 3.3 Example of a fixplot ..................................................................................... 117
Figure 4.1 Position of line appearances of closed captioning........................................ 125
Figure 4.2 Illustration of movement of text across and up the screen ........................... 126
Figure 4.2 Fixplot of fixations 73 – 81 of NNS3........................................................... 134
Figure 4.3 Cross strategies: Prediction and projection of the reading of spoken language
....................................................................................................................... 136
Figure 4.4 Comparison of Conditions 1 and 3............................................................... 139
Figure 4.5 Position of line appearances of closed captioning........................................ 168
Figure 4.6 Units A & D relative placement within the total text................................... 176
Figure 5.1 Example of the look zone parameters .......................................... 186
Figure 5.2 NNS4 fixations 38 – 46, corresponding to the visual data in Table 5.6....... 212
Figure 5.3 Fixplot of NS7 eye movements when distracted, then reading to catch the text.
....................................................................................................................... 217
Figure 6.1 The place of the reader in a Multimodal Multimedia Communicative Event
....................................................................................................................... 222
Figure 6.2 Tariq’s eye movements while watching the biker multimodal video text.... 229
Figure 6.3 Farid’s eye movements watching the airbag multimodal text...................... 233
Figure 6.4 Elena’s eye movements watching the biker multimodal text....................... 237
Figure 6.5 Sarah’s eye movements while watching the airbag multimodal sequence... 241
LIST OF GRAPHS
Graph 4.1 Rhythm of line presentation for airbag text .................................................. 129
Graph 4.2 Rhythm of line presentation for biker text.................................................... 129
Graph 4.3. Bimodal frequency of times for line presentations of airbag text................ 131
Graph 4.4 Total number of fixations for each participant in Condition One: Dynamic CC
text................................................................................................................. 146
Graph 4.5 Total fixations for the whole text for NS in condition 1 (dynamic presentation)
and in Condition Three (static presentation)................................................. 147
Graph 4.6 Comparison of the fixation durations of the two NS conditions: dynamic and
static text, converted to individual percentages. ........................................... 154
Graph 4.7 Comparison of the total fixation durations by both NS and NNS in both
conditions (SCC and CC).............................................................................. 157
Graph 4.8 Two illustrations of the diverging patterns between NS and NNS fixation
duration trends, focusing on two duration categories, in both total number of
fixations and as converted percentages......................................................... 158
Graph 4.8 Two illustrations of the diverging patterns between NS and NNS fixation
duration trends, focusing on two duration categories, in both total number of
fixations and as converted percentages......................................................... 159
Graph 4.9 Illustration of the combined participant categories for three categories,
highlighting the higher average number of longer fixations (300 – 400 ms)
made by NNS................................................................................................ 160
Graph 4.10 Illustration of fixation duration differences between NS and NNS in the
dynamic reading condition, converted to percentages.................................. 161
Graph 4.11 Fixations and Saccades with Calibration loss in Condition One ................ 164
Graph 4.12 Illustration of the differences in eye movements between dynamic and static
text presentations. ......................................................................................... 166
Graph 4.13 Comparison of NNS and NS use of rows in airbag unit texts .................... 173
Graph 4.14 Comparison of NNS and NS use of rows in biker unit texts ....................... 173
Graph 4.15 NNS and NS difference in change in rows for text at beginning and text at
end................................................................................................................. 175
Graph 4.16.1 Comparison of time of use of row 3 in units A & D for the biker text (in %)
....................................................................................................................... 177
Graph 4.16.2 Comparison of time of use of row 3 in units A & D for the airbag text (in
%).................................................................................................................. 177
Graph 5.1 Average fixation durations for Look zones 1 & 2 for each participant ........ 194
Graph 5.2 Frequency of consecutive fixations in look zone 1 (the reading area) ......... 197
LIST OF TERMS:
Channel: Term used by various researchers, including Géry d’Ydewalle, to refer to a
sensory source of information, such as audio, visual or written text. At times,
overlaps in definition with Mode.
Closed captioning: The presentation of a written transcription of a video or televised
broadcast in which the text is encoded within the video track and must be enabled
by the viewer in order to be seen.
Fixation: The location of the focal point of the eyes.
Fixation time (duration): The amount of time that the eyes are fixated on a point
(including small oculomotor movements).
Interlingual: Translation or reference between two languages (e.g. subtitles).
Intralingual: Translation or reference within a language (e.g. closed captioning).
Look zone: An area designated on the screen for analysis, usually a smaller area within
the total viewing area (see the illustrative sketch at the end of this list).
Mode: Term used by various researchers, including Gunther Kress and Theo van
Leeuwen, to refer to a semiotic source of information, such as illustrations,
printed text, font, color and layout in print. At times, overlaps in definition with
Channel.
Multimodality: There are various interpretations of multimodality; regardless of which
type of mode is used in which reference, there is more than one source of
information involved.
Saccade: The movement of the eyes in between fixations during which little to no visual
information is obtained.
Subtitle: The presentation of a written translation on a film; usually found on foreign
language films and DVDs. It is an overlaid presentation of the script on top of
the video track.
Teletext: The presentation of a written transcription or translation of the spoken text at
the bottom of a television screen found in many European countries.
Track: An audio/video text is usually comprised of several tracks, or layers, of
information that, when rendered, become the end result seen on a DVD, CD,
television or computer. Usually there is a video layer, an audio layer, a layer for
sound effects as well as another one for music, and a layer for the titles, e.g.
credits or subtitles.
Transcription: A direct copy, which in this study refers to a copy of spoken text to
written text (e.g. closed captioning).
Translation: A copy, which in this study refers to a copy of information from spoken to
written text although there may be interpretation and mediation in between the
copies that may take into account non-corresponding cultural differences and
grammatical differences (e.g. subtitles).
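
To make the relationships among several of the eye-movement terms above concrete (fixation, fixation duration, look zone), the short sketch below shows one way fixations could be assigned to look zones and their durations totaled per zone. It is purely illustrative: it is not the software used in this study, and all names, coordinates and numbers in it are invented for the example.

    from dataclasses import dataclass

    @dataclass
    class Fixation:
        x: float          # horizontal position of the focal point on the screen
        y: float          # vertical position of the focal point on the screen
        duration_ms: int  # fixation time, including small oculomotor movements

    @dataclass
    class LookZone:
        name: str
        left: float
        top: float
        right: float
        bottom: float

        def contains(self, f: Fixation) -> bool:
            # A fixation belongs to a zone when its focal point lies inside the zone's rectangle.
            return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom

    def duration_per_zone(fixations, zones):
        # Total fixation duration (in ms) falling inside each designated look zone.
        totals = {zone.name: 0 for zone in zones}
        for fixation in fixations:
            for zone in zones:
                if zone.contains(fixation):
                    totals[zone.name] += fixation.duration_ms
                    break
        return totals

    # Invented example: zone 1 covers the caption rows, zone 2 the rest of the image.
    zones = [LookZone("LZ1 (captions)", 0, 400, 640, 480),
             LookZone("LZ2 (image)", 0, 0, 640, 400)]
    fixations = [Fixation(120, 430, 250), Fixation(300, 200, 180), Fixation(150, 440, 310)]
    print(duration_per_zone(fixations, zones))

A tally of this kind corresponds in spirit to the per-zone fixation counts and durations reported in Chapter 5, although the actual analysis was carried out with the eye-tracking equipment and procedures described in Chapter 3.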
ABSTRACT:
Learning in a multimodal environment entails the presentation of information in a
combination of more than one mode, such as in written words, illustrations, and sound.
Research conducted across many disciplines (e.g. education, psycholinguistics, cognitive
science) shows that the multimodal presentation of information can be beneficial to
learners. This has been demonstrated in past studies that have included both school age
children and adult learners (e.g. Koolstra, van der Voort & d’Ydewalle, 1999; Neuman &
Koskinen, 1992). Other studies have described the benefit for both native and non-native
language learners (e.g. d’Ydewalle & Gielen, 1992; Kothari et al., 2002). Production by
learners of multimodal texts (Kress & van Leeuwen, 2001; Hull & Nelson, 2005) as well
as the interpretation by learners of multimodal texts (Scollon, 1999; van Leeuwen, 2004)
have recently been focal points of inquiry.
Building upon past studies, my research interests in multimodality instead focus
on the processes involved in the combination of various modalities and how this
combination is used by learners of differing proficiencies in English to gain better
comprehension (cf. Mayer, 1997, 2005; Graber, 1990; Slykhuis et al, 2005). As part of
this process, my dissertation focuses on the addition of the written mode (closed
captioning) to the already multimodal environment that exists in film and video
presentations, and in so doing I have presented the framework of a Multimodal
Multimedia Communicative Event in which to situate the language learner. Research
questions focus on the eye movements of the participants as they read moving text both
with and without the audio and video modes of information. Small case studies also give
a context to four of the participants by bringing their individual backgrounds and
observations regarding closed captioning and subtitles to bear on the use of multimodal
texts as language learning tools in a second or foreign language learning environment. It
was found that bilinguals and lower-proficiency Non-Native English Speakers (NNS)
(native speakers of Arabic) show longer eye movement patterns in reading moving text
(closed captioning), similar to those found with static text. Native Speakers of
English (NS) tend to have quicker eye movements when reading closed captioning.
When closed captioning was included with audio and video, the multimodal environment
was shown to be used differently by the two groups: NNS looked longer at the closed
captioning and NS were more able to quickly navigate the text presentation. While
associative activation between the audio and print modalities was not found to alter the
eye movement patterns of the NNS, participants were found to alternate between the
modalities in search of additional information. Other research using closed captioning
and subtitling as an additional modality has shown that viewing a video program with
written text added turns the activity into a reading activity (Jensema, 2000; d’Ydewalle,
1987). The current study found this to be the case, but the results differed in regard to
proficiency and strategy. Enabling the closed captioning while viewing multimedia has
been shown to help children improve their reading skills (Linebarger, 2001) and it is
hoped that this study will further knowledge of reading and also contribute to the second
language acquisition and language learning body of research regarding the use of
multimodal texts.
CHAPTER 1
WHAT’S CHIC?
“Reality exists outside language, but it is constantly mediated by and through
language: and what we can know and say has to be produced in and through
discourse” ~ Stuart Hall, Encoding/Decoding
“Chick,” I said. “It means a baby chicken - or it’s a slang word for a girl.”
“Or something with fashion?” my student said.
“What…?!?”
My young student, probably about 10 years old in 1997, had somehow learned a
strange low frequency word. After some further explanation, I figured out that he was
referring to ‘chic’. He’d read this word in the teletext prompt while watching American
television shows in his home in Vaslui, Romania. There wasn’t an abundance of shows
in English to choose from, but Seinfeld had started about a month before, Baywatch had
been on for a while, and I knew from previous questions that he liked to watch television.
He, and many of my other students, had an enviable proficiency and command of
the English language in just their third year of English classes. All of their teachers were
trained in British English, and had British accents. But I had many students who had
American accents, yet had never met an American before my arrival. They were
television and movie junkies as well.
I started to think about simultaneous reading, hearing and watching as learning
tools for language learning. Upon my return to the United States, I started researching
the use of closed captioning as a tool in learning and in retention of vocabulary. This
dissertation is an expansion of my earlier attempts at understanding this combination of
input; I’ve now branched into the interdisciplinary studies involved in exploring
multimodality.
1.1 Multiple Multimodalities
The term multimodality is used in different fields of language study and different
disciplines to refer to quite different concepts. It can refer to the physical modalities such
as seeing, hearing, touching, tasting, etc. This is generally the term used in
psycholinguistics and cognitive science, for example when discussing synaesthesia
studies or pedagogical uses of technology (e.g. Paivio, 1986; Mayer, 2001; Nelson,
2006). Somewhat related1, multimodality can also be used when talking about analyzing
conversational discourse, such as the use of pragmatic cues or gestures to communicate
(Norris, 2004). Multimodality is also used to refer to the combination of different
semiotic qualities of a physical representation, such as the use of color, font size or
pictures in a magazine article or a children’s drawing (e.g. Kress & Van Leeuwen, 2001).
The common theme between all of these uses is the premise that communication is being
conveyed through more than one ‘channel’, or more than one signal. A combination of
cues is used to bridge the gap between the speaker and hearer, or the author and the
viewer, the encoder and decoder, or whatever two terms could be used to describe a
communication event.

1 Statistics also uses affiliated terms, e.g. ‘bimodal’ for the occasion when the data does not fall into a
normal curve, rather into two areas of distribution. In the medical sciences, multimodality is used to refer
to the use of more than one medical test or more than one treatment for a diagnosis.
But to try to break a multimodal event into its pieces in order to observe and
analyze its constituents is quite challenging. One reason is that meaning may not be
contained in only one of the modalities; often it is a synthesis of the different channels of
meaning. Kress & Van Leeuwen, in the preface to their text Multimodal Discourse
(2001), admit to the complexity (and frustration) of trying to break down multimodal
texts into their individual pieces and instead had to create a framework and concepts to
go with their analysis. Sigrid Norris also needed to develop a framework to analyze the
combination of modes of communication in interaction, as she looked at the use of modes
such as gesture, gaze, intonation units, and even levels of awareness (2004). In order to
analyze the different aspects of websites, Lisbeth Thorlacius adapted Jakobson’s
Communication Model to break a website into its different functions and their rhetorical
appeals (2002).
However, all of these frameworks for analyzing the parts also take into account
the ‘whole’, and that the whole text, whatever the text is, “transcends the collective
contributions of its constituent parts”, according to Hull & Nelson (2005), who worked
with digital multimodal video texts created by teens at an urban community center
(2005). Perhaps the ultimate expression of this challenge of multimodal texts and their
analysis comes from an example originally used to describe language and thought by Lev
Vygotsky. In an analogy of a scientist looking for answers as to why water extinguishes
fire, Vygotsky illustrates that breaking this element into its pieces will not yield the
answers to the question since hydrogen burns and oxygen feeds a fire. Instead, it is the
union of the parts that is the process that should be studied (1986, p. 4). It is this very
notion that the multiple layers of meaning expressed in different modalities mean more
together than when in monomodal expressions, and that mass media can be accessed by
such a large percentage of the world’s population, often for free or at low cost, that led me
to become interested in using media as a learning tool. The physical modes (channels) of
meaning gave me reason to think that learners with different proficiencies in the different
modalities (e.g. listening or reading) would be able to use their preferred modality to
learn and gather as much meaning from a text as possible on an individual basis.
Up to now I have touched on a few of the issues that I encountered while trying to
decipher all the different uses and definitions across disciplines, in the quixotic attempt to
bring them together in one dissertation. This doesn’t quite happen. One of the issues that
I encounter is my adherence to a functional approach to language and the use of authentic
texts, which leads to methodological entanglements. The inherent mix of variables in a
multimodal presentation of a text, and the desired analysis of a variable that sits within
the context of its original presentation without alterations, is fascinating and difficult. As
a result, in order to gather research data about one of the modalities separately, an
alteration to the original text was made, and the text in Condition 1 was reduced from its
multimodal form to a monomodal form with which information about reading closed
captioning could be gathered. Otherwise, all errors in the text were kept and the
story was presented in its multimodal form with all of its channels. The use of
authentic texts is not unprecedented, although it is in the minority in eye movement data
methodology. These issues of the text, the conditions, the participants and the literature
surrounding multimodal texts in reading and learning are discussed further in the
remaining chapters. Below, I give a preview of the research questions discussed in
Chapters 4, 5, and 6 as I explore the modality of moving print and different uses of it by
different participants:
1.1.1 Overview of Research Questions
READING PATTERNS OF MOVING TEXT:
How do the reading patterns of dynamic text differ from the reading patterns of
static text? And is there a difference between native and non-native speakers of
English in their reading patterns of dynamic text?
VIEWING PATTERNS WHEN TEXT IS ADDED:
In what ways do the reading patterns of dynamic text change with the addition
of the multimodal environment? And, are there any similarities or differences
between native and non-native speakers in the reading patterns of dynamic
text?
SITUATING THE LEARNER: SMALL CASE STUDIES
What are the relationships, if any, that can be established between the
individual viewer’s reading patterns and the self-reported background history
related to multimodal use?
1.2 Dissertation Overview
This introduction gives the impetus behind the research questions and places this
dissertation in a broad context of multimodality and learning. It also presents the reader
with a framework in which to place the study: second language learners using different
channels of information available in a multimodal event as tools for improving and
gaining comprehension. It gives the background for setting this dissertation in the context
necessary to move in and out of the different disciplines revolving around this study and
places the participants in a multimodal framework. Chapter 2, the literature review,
narrows the context to that which directly affects the conditions and questions of this
study. Chapter 3 describes the methodology to discover the answers and the materials
that were created in order to test for them. Chapters 4, 5 and 6, the analyses of the three
research questions, detail the analysis of the data collected, breaking the large text into
smaller segments, and Chapter 7 connects the questions and the answers, listing
limitations of data collection and possible future research venues, including pedagogical
applications gained from this study.
1.3 The context behind the texts and the participants
In the remainder of this chapter, I will establish the broader context for this
research study by looking at the parts that synthesize into the whole, as mentioned above.
One complication is the very interdisciplinary nature of the topic of this study.
Multimodality is at the core of my research interests. I believe that it can be used to give
reinforcement and options to language learners and interpreters of a text. However,
different disciplines give different meanings to the same terms: ‘multimodality’ in
psychology is quite different from that of literacy studies, and for that matter, from the
term ‘modal’ as it is used by grammarians. A seemingly simple term ‘text’ is defined
differently by scholars in reading, media, linguistics and rhetorical studies. This chapter
will set up a general context or framework in which this study can be grounded and
viewed within a larger picture.
However, one of the most important aspects of this chapter is to present the reader
with the context for the present study. The different channels of information available
during a multimedia event (explained below) are complex. In order to explain how the
second language learner fits into the framework of a multimedia event, this chapter looks
at each factor of Jakobson’s Speech Event and complicates it by expanding it into a
Multimedia Event for second language learners. Once the second language learner is
thus placed, the relevancy of the study to the Multimedia Event and the availability of the
different channels of information for use in interpretation by the second language learner
are presented.
1.4 Merging across disciplines
This study touches on many topics – and necessarily so, as multimodality by its
very definition indicates multiplicity in its essence. Multimodality is enormous in its
breadth, and it will be refined in its scope and relevance to this study later in the chapter
(for a definition, please see the List of Terms at the beginning of the dissertation). This
study looks at a minute area of multimodality, that of reading a text with or without the
additions of audio and visual texts for the purposes of supporting better comprehension
for second language learners (see Figure 1.1 below). But first, the context of
multimodality is needed, in order to place this study within it. Then, the study is placed
within a framework of communication, the Speech Event Model, so that a common set of
ideas can be used for description. Following this broader context, the focus will narrow
to the theoretical backgrounds of reading and the idea of a transaction with the text,
including that of L2 readers, followed by a review of past research in closed
captioning/subtitling and learner strategies, and a brief overview of past eye movement
research. More specific literature regarding these will also appear in the
analysis sections. In the end, this study seeks to find the cross section between language
learning (and the language learner), reading, and multimodality, as illustrated in the
figure below.
Figure 1.1 Illustration of apex of study: The cross section between language learners
using reading in multimodal environments
1.5 Multimodality: Different definitions in different disciplines
There is more than one use of the term ‘multimodal’ in the literature. Various authors
use modality to mean different literacies, or abilities to interpret and manipulate verbal,
graphic, acoustic, etc, texts, including the ability to semiotically interpret, for example,
different genres, the internet, or advertisements. Other forms of the word ‘modal’ include
linguistic terms, such as believability or strength related to verbs (could, would, might).
1.5.1 Multimodality 1: Sensory ‘modes’:
In Psychology, a multimodal event is one in which more than one modality is
used for perception. A famous example of a bimodal event is the McGurk Effect
(McGurk & MacDonald, 1976), in which the sound produced (/ga/) is different from the
movements made visually by the speaker’s mouth (/ba/). The resulting effect is that the
person who is observing this altered bimodal event perceives the sound /da/ when the two
modalities, visual and aural, blend in perception. Multimodality, in this sense, refers to
instances when the modalities are physical in nature and there are multiple inputs of
different physical sensory modes, such as visual, aural, tactile, etc. An interesting
example of an integrated cross-modal experience is synaesthesia, in which activation or
use of one physical mode triggers another, e.g. hearing music and seeing colors.
In more common language experiences, recent experiments have shown the
bimodal synergistic effects between visual and aural stimuli. In these studies, response
times are faster when the stimuli are redundant (e.g. seeing a dog and hearing a dog
barking) (cf. Giard & Peronnet, 1999; Fort et al., 2002). Experiments involving affective
prosody (e.g. seeing a face while hearing a sentence, both on a happy to fearful
continuum) have shown that aural stimuli can bias perception of visual facial expression.
In other words, humans integrate the two modes and are influenced by both in an
automatic and mandatory response, regardless of attentional resources (cf. de Gelder &
Vroomen, 2000; Vroomen et al., 2001; Pourtois et al., 2005). In language learning,
gesture, too, has been shown to play a role in comprehension in a multimodal form of
communication (cf. McNeill, 1992). Breckenridge Church, Ayman-Nolley, & Mahootian
(2004) conducted a classroom experiment in which gesture either did or did not
accompany instruction in English to children with an L1 in Spanish. Fifty percent of the
children who received the concrete, representational gestures improved their
understanding in a post-test compared to only 20% of the children who received
monomodal instruction, that is, instruction without gesture. The authors point out that gesture should be
used to enhance instruction for learners and for those learning in a language other than
their first language.
It is evident that humans use multiple sources of information to inform themselves
of the world around them; this is particularly useful in the use of media as a tool for
learning a second language: using subtitling/closed captioning may reinforce the modal
inputs. These types of psychology experiments, in which the sensory modalities are
separated and strictly controlled, seem to go against a transactional or sociocultural view
of language in that they are based on the premise of studying the unitary nature of
complex items, instead of the parts. Vygotsky (1986) uses the metaphor of studying
water: If its composition is studied for its extinguishing properties by separately studying
oxygen, which fuels fire, and hydrogen, which burns in contact with fire (pp. 4-5), then
its combination is puzzling.
Likewise, multimodality should be studied in its totality. However, by studying
perceptions of bimodal stimuli, further advances in discovering how the brain functions
and perceives stimuli actually provide support for the complexity of thought and
language. Therefore, this study looks at both conditions: one in which the channels of
information have been isolated and so only dynamic text (closed captioning) is shown,
and one in which all of the modalities are present (aural, visual and print textual channels
of information).
1.5.2 Multimodality 2: Modes of semiotic ‘literacies’ of different ‘modalities’
But other definitions of multimodality exist in the literature. Kress and Van
Leeuwen (1995) argue for taking a fresh look at the economic changes in the post-
industrial world, in which the changes are “knowledge based” and “information-driven”
and in which “information of various kinds may be more aptly expressed in the visual
rather than in the verbal mode,” including looking at human semiosis in the domain of
communication and representation (p. 183). In the past, writing was primarily perceived
as monomodal in perception and valued over speech. The senses were often thought of
as separate. Kress and van Leeuwen give examples of this monomodality such as writing
without illustration, paintings used canvas and the same mediums (oils), in concerts
musicians dressed in similar uniforms, whereas today there is more crossing of
boundaries. The authors state the need to “explore principles behind multimodal
communication. [They] move away from the idea that the different modes in multimodal
texts have strictly bounded and framed specialist tasks” and give an example of the
prescribed modality of film in which “images may provide the action, sync sounds a
sense of realism, music a layer of emotion, and so on, with the editing process supplying
the ‘integration code’, the means for synchronizing the elements through a common
rhythm…” (p. 2). In opposition to this, Kress and van Leeuwen see “a view of
multimodality in which common semiotic principles operate in and across different
modes, and in which it is therefore quite possible for music to encode action, or images to
encode emotion” (p. 2) in which the ‘same’ meanings can often be expressed in different
semiotic modes.
Hull and Nelson (2005) explored the semiotic use of multimodality in digital
texts. In a discussion about the use of multimodality in literacy pedagogy, they state, “the
point is that images, written text, music, and so on each respectively impart certain kinds
of meanings more easily and naturally than others. We believe that this idea is the most
crucial conceptual tool that one must bring to bear in understanding the workings and
meanings of multimodal texts” (Hull & Nelson, p. 229). The authors explore the use of
multimodality in different semiotic modes in digital storytelling (including spoken words,
images, music, written text, and movement and transitions), finding that the different
modes act synesthetically in their multimodality through four qualities: a) the visual
pictorial mode can repurpose the written, linguistic mode; b) iconic and indexical images
can be rendered as symbols; c) titles, iconic and indexical images and thematic movement
can animate each other cooperatively; and d) modes can progressively become imbued
with the associative meanings of each other (Hull & Nelson, p. 239). In the end, the
authors call for a widening of the definition of writing in which multimodal composing is
included (p. 252).
1.6 Placing multimodality into communication: The Speech Event
For this study, in order to place everything into a unified structure, all aspects will
be grounded in the Speech Event Model as laid out by Roman Jakobson (1960). No
attempt is made to offer the model as the only way to look at language learning or at the
use of multimodal texts, but merely as a streamlined and effective one that gives the
reader and the author a way to establish terms and relationships within this research. In
fact, for this study, the Speech Event Model could really be re-titled as a Communicative
Event Model, or one that works with a synchronous, dialogic speech event occurrence.
Also relevant is a discussion of Literacy Events, but this will be discussed in the next
chapter. At this point, it is sufficient to attend to the broad picture: all interactions are
transactions, in that something is given and something is changed. In Sociocultural
Theory (Vygotsky, 1986), learning is not in isolation but rather it is the interactions
between people that can further assist learning, while in Rosenblatt’s Transactional
Theory of Reading and Writing (1994), as well as in Goodman’s Transactional
Sociopsycholinguistic View (1994), the reader is involved in an interaction with the text,
and brings to the text world knowledge and expectations that change as the reader reads.
This dissertation revolves around multimodality by concentrating on one of its
modalities: reading. In Chapter 2, a background of relevant reading literature and
theories will be explored. At this point, for this chapter, the idea of an interaction is the
main focus.
Similar to the idea of interactions and change, Jakobson discusses the Saussurian
ideas of langue and parole as he furthers their definitions as code and message
(1960/1990). As a speaker speaks (the message) he/she is adding to the fluidity of the
code, or the larger summation of language that is used by the language’s speakers. This
interaction and change is important for second language learners as a language is in
motion and never static. For this reason, Jakobson’s Speech Event Model will be laid out
so that the dynamic relationship between second language learners and the language can
be analyzed.
1.6.1 Jakobson’s Speech Event Model laid out
Jakobson used Karl Bühler’s tripartite model of the speech event to study
language as a dialogue by expanding it in its early stages to four factors: a speaker
(encoder), an addressee (a decoder), a thing referred to (context) and then he added the
message (or, the instance of parole communicated from speaker to addressee) (Waugh &
Monville-Burston, p. 15). Jakobson’s actual Speech Event Model, as later laid out in his
plenary speech for a “Linguistics and Poetics” conference, is composed of the six factors
(addresser, addressee, context, message, contact and code) which are variously focused
on in the message by six functions (emotive, conative, referential, poetic, phatic and
metalingual) (Jakobson, 1960, pp. 72-3) (see Figure 1.2 below). It is a broad model in its
deceptive simplicity, one which is easily used as a tool to analyze not only spoken but
also written discourse, including one of Jakobson’s passions, poetry.
Figure 1.2 The factors and functions of the Speech Event Model (Jakobson, 1960).
Six factors of the speech event:
CONTEXT
ADDRESSER MESSAGE ADDRESSEE
CONTACT
CODE
Six functions of the speech event:
REFERENTIAL
EMOTIVE POETIC CONATIVE
PHATIC
METALINGUAL
Its ease of application stems from its ability to encompass an entire speech event
within the model, and so it facilitates further expansion of the description and breakdown
of communication. As the process of communication necessarily entails at least two
parties, (or the essence of a dialogue whether it be inner-speech (Vygotsky, 1986) or in
face-to-face interaction), Jakobson’s model captures this dynamic nature of the speech
event in the factor he labeled ‘context’. To define the parts of the Speech Event Model,
as they relate to an attempt at a successful communicative act, the speech event
physically begins with the ADDRESSER, who, through CONTACT, sends a MESSAGE
(constructed on the basis of the CODE), which is embedded in a CONTEXT, to the
ADDRESSEE. If successful, the addressee will understand and be able to generate a
communicative act back if he or she so chooses.
As an example of this, two people, George and Leah, are talking. George says,
“I’m hungry – wanna catch a bite to eat?” to Leah. In this example, George is the
ADDRESSER, Leah is the ADDRESSEE, the MESSAGE is the utterance, which is in
English (the CODE) and the CONTEXT is that it is dinner time as the two are sitting on
the couch (CONTACT). But what happens in a multimodal environment is that part of
the MESSAGE could be conveyed by pragmatic means, such as indirect speech acts, or
by body language in the form of gestures, which could reinforce the simple message, or
could create confusion. While the message in this case seems fairly straightforward, it
could be misconstrued by a second language learner for whom the CODE is not as
familiar, especially if idiomatic phrases or slang terms are used. If Leah looks confused,
George can re-state his question using different words or gestures or intonation, etc., in
which case he is altering his speech act for his addressee and through the interaction of
the speech event the addressee can attempt understanding.
Jakobson’s speech event model includes the broad nature of the context, and
indeed, without the dialogic factor communication would fail. This would seem to pose a
problem for using this model as the base framework for this study on multimodality and
the use of multimedia; however, I contend that no use of multimedia is truly a one-way
communication event existing in isolation. Rather, a multimedia text is always created
for an audience so that a viewer will understand and this, coupled with the inner dialogue
of the viewer and the use of the multimedia as a learning tool, necessarily creates the
conditions needed to fill the schema of the addresser, addressee and the remaining factors
and functions of a speech event. Mainstream multimedia texts are rarely created to
exclude viewers, particularly multimedia texts which aim at large audiences (e.g.
television, film, and the internet). Again, the functions and factors as they relate to a
multimedia text and a viewer will be further explained as the chapter progresses and the
discussion progresses to a sociopsycholinguistic, transactional interaction between the
viewer and the text.
1.6.2 A communication event: The multiple sides of the addresser
The communication event then is composed of an interaction that can be analyzed
according to its functions and its factors. To start with, the degree and strength of the
orientation of the function depends on the composition and intent of the message; for
example, a message focused on the context would carry a greater degree of the referential
function and perhaps less of the poetic or metalingual function. In terms of a multimedia
example, this might be something similar to a newscaster (the addresser) talking about a
news item (the message) for the benefit of the audience (the addressee) in which the
intent is to deliver the news (the message is focused on the context, the referential
function). A contrastive example might be a commercial for a bath bubble product, in
which the language used is poetic in its alliteration, and flowery language is used to
convince the viewer that this bath bubble product will relax the buyer (and so the
message is quite emotive in its function on the part of the addresser, or the commercial’s director,
and uses poetic language and imagery in its message to reach the viewer, or the addressee,
who is the receiver of the conative function). The point is that a message in and of itself
has many components, and so an addressee, or the person who must decode the message,
must take this myriad of components into account to try to understand what the addresser
meant. If the addressee is not of the same culture, one or more of the components may
not be interpreted or decoded correctly.
For a second language learner, not only is the message a difficult path to navigate
at times, but so is the task of figuring out the context of the addresser and how the
‘framing’ of the context positions the addresser, especially if the addresser is embedded in a
multimedia text such as a movie or television show. Besides Jakobson’s framework,
there are other, similar, frameworks that illustrate the complexity that a person
encounters when engaging in communication. One of the more prominent
sociologists/anthropologists on the topic of communication and participation, Erving
Goffman, divides the speaker into three distinct roles. In his participation framework,
Goffman refers to the speaker, or the first person “I” in deictic terms, as the ‘animator’,
‘author’ and ‘principal’, who can coincide and be the same person, or can be
differentiated into the one who produces the message (in voice) (the animator), one who
is responsible for the selection of words (the author), and the one whose beliefs or
positions are being represented (the principal) (Duranti, 1997, p. 297-8). For example, in
the production of media texts, such as film and video, the animators would be the actors
and the characters that they play, the authors would be the scriptwriters, while the
principals would be the directors and producers.
Verschueren (1999) further expands upon the multiplicity of the addresser. In a
Bakhtinian sense, the addresser, or the ‘utterer’, can consist of different layers, or voices,
relating to the original ‘source’. As the “utterance is simply defined as a stretch of
discourse produced by the same utterer(s), with a relatively clear beginning and end” (p.
81), the utterer producing the utterance echoes that of Jakobson’s ‘addresser’ in that the
discourse can be either spoken, or written. In the case of written discourse, Verschueren
designates the option of the author as the main utterer although an ‘embedded utterer’,
who is one (or more) of the characters, produces the utterance. An example of the
embedded utterer in spoken discourse includes mass media, such as “in the world of
television, an interviewer and an interviewee are consciously engaged in the uttering and
interpreting of linguistic forms which they know or want to be embedded into a wider
communicative event with the television network at one end and a mass audience at the
other, either simultaneously or consecutively” (Verschueren, p. 82).
This concept of the embedded utterer is important, especially in reference to the
influence and use of media and technology in modern terms. As such, Verschueren lays
out the terminology for the multiplicity of voices and sources surrounding the main
utterer, using uttererE for the embedded utterer, and uttererV for a virtual utterer, or a
reference to an utterance which might or might not have been uttered. He also
designates the ‘source’ as the origin of information about which one is uttering, which
can again be further specified into a source that is not the actual original source but a
source once removed (source-1), or a virtual source (sourceV) in which the actual original
source of information is unknown. All of these utterers and sources, virtual or removed,
may be used within the course of a speech event. For the purpose of this study, suffice it
to say that, although a communication event can be quite complicated when it
involves a media text, the audience, including second language learners, needs to navigate
the multiplicity of the addresser and eventually the addressee in order to interpret the
message.
1.6.3 A communication event: The very necessary contact and context
Second language learners have to be able to engage in a communicative event
beyond just receiving the sound wave produced by speech: there is more to interpreting
the message than translation from one speech event to another. The addresser/utterer/
speaker has to have some way to communicate the message, and so the factor of
CONTACT is designated as, “a physical channel and psychological connection between
the addresser and the addressee, enabling both of them to enter and stay in
communication” (Jakobson, 1960, p. 73). The contact factor is related to a focus on the
message by the phatic function, and so the purpose is not only a physical presence, but
also an underlying psychological understanding of the communication act. Important for
this study and the idea of multimodality in general is the premise that a message may be
conveyed semiotically through non-linguistic communication (e.g. through visual
representations, through music, etc.). Therefore, in second language acquisition, a
participant may not understand all of the pragmatic communication that could fall under
the limits of the contact factor and its phatic functions in a communicative event but may
be able to pick up some of the message through other means.
The CONTEXT is emphasized by the referential function, or the “leading task of
numerous messages” (Waugh & Monville-Burston, p.15), or the communication of
information. Jakobson placed messages in the contiguity of a context: “whether
messages are exchanged or communication proceeds unilaterally from the addresser to
the addressee, there must be some kind of contiguity between the participants of any
speech event to assure the transmission of the message” (1956, p. 120). Between the
participants, an internal relation must exist, through “the separation in space, and often in
time…there must be a certain equivalence between the symbols used” in order for
communication to be successful.
As an example of how the context, and the signs selected, must be understood by
the participants in order for communication to take place, Barthes uses “The World of
Wrestling” (1957) as a communication event in French (entertainment) culture. As a
spectacle, rather than a sport, wrestling uses signs to signify meaning to the audience;
without the context of the spectacle, and knowledge of its semiotic system, the signs
would vary in their interpretation. But within the world of wrestling, the actions take on
clarity in their meaning as representations for larger concepts as an “intelligible
spectacle” (Barthes, p.20).
An illustration from a multimedia2 standpoint might further resonate: if a viewer
starts a film or a TV series in the middle, much of the context is lost and the storyline is
confusing. The beginning of the narrative is needed for full comprehension and
relevancy of the current plot. In the same way, for the effective exchange of information
a communication event must exist in a context that is known to both the speaker and the
hearer. Likewise, in second language acquisition, the context is highly relevant to
comprehension. An outsider entering into a speech event with a non-familiar context
(e.g. a new community, a new job, a new class, a new culture, a new language) will
encounter initial difficulty, as the semiotic clues may have different meanings. For a
second language learner, the instances of parole, or the messages, may be slightly
different from the phrases and language learned in a foreign language classroom and
static textbook, and so when entering into an environment, or communicative event, the
learner may not be able to correctly interpret or comprehend the message.
It is possible, however, that a modified multimedia text may be able to assist the
learner in accessing more recent and relevant instances of parole within a situated langue.
In the previous section, I indicated that I believe that all multimedia texts are modified to
some extent in that they must appeal to a wide audience in order to be successful media
texts. In the following section, I will further explain how meaning in media texts is
2 Multimedia is the use of words and pictures to present material (Mayer, 2001) while multimodality refers
to the broader definition in which more than one mode is used to communicate or present material (e.g.
aurally, haptically, visually, etc). See earlier in the chapter for definitions and uses of the term
multimodality.
shared between interlocutors, and further explain the idea of accessing the langue and
parole, and a modified multimedia text, in terms of recipient design.
1.6.4 A communication event: Message & code
Language has fluidity. The message and the code exemplify this. Jakobson
defines the MESSAGE by illustrating its essence and integration with the code, and so
the two are intertwined. The MESSAGE and the CODE are also known as parole and
langue, respectively; the instance of parole includes the influence of langue, and the necessary semiotic
knowledge in order to interpret the instance. In other words, the utterance by an
individual (parole) is influenced and incorporated within the intersubjective code of
society (langue). In “Langue and parole: code and message”, Jakobson discusses, in
depth, Saussure’s definitions of langue and parole by contrasting them with his own
extension of the two aspects of language. Whereas Saussure envisioned the two as
distinctly different, Jakobson found integration instead of dichotomy.
Saussure argued for linguistics to concentrate on langue, as parole was confusing
because of its individuality (Culler, p. 41). In this framework, Saussure worked towards
an explanation of what he called the arbitrary nature of signs, and in so doing needed to
separate the social nature of langue from the individual nature of parole and so create a
controlled system to study. Jakobson did not agree with this strict dichotomy, as he saw
langue and parole both at once as social and individual in their aspects. One of his main
arguments against the notion of parole as an individual act is based on the idea of
language as a dialogic action, always containing a speaker and listener (even if the
listener is internal), and so because “a conversation is a whole of which each remark is
but a part that cannot be separated from the whole except in an artificial manner…
[parole] is an intersubjective phenomenon, and consequently, a social one” (Jakobson,
1984, p. 93). This relationship between langue and parole as individual and social in
nature is important for second language learners: as access into a language community
progresses, a second language learner will hopefully learn more about the
intersubjectively held meanings and be able to use them when engaging with other
members. One way of accessing a new language community is through the multimedia
produced by its members, for its members.
The message, or parole, is also something that must be decoded by the addressee
in order for the intention of the communication act to be successful. In fact, Jakobson
later altered the terms, addresser and addressee, to encoder and decoder (Waugh &
Monville-Burston, p. 489). The concept of encoding a message which must be decoded,
using knowledge of the code and message, is echoed in the process of the ‘recipient
design’, or the “process of adapting forms of expression to interpreter roles”
(Verschueren, p. 86). In this process, “speakers design their speech according to their on-
going evaluation of their recipient as a member of a particular group or class” (Duranti,
p. 299). In recipient design, the utterance (or the message) is “designed specifically for an
intended audience, to ensure continued attention as well as the desirable level of
understanding” (Verschueren, p. 86). If it is a face-to-face conversation then the design,
encoding, and enactment of the message is simultaneous, while if it is a communication
event that is pre-drafted, pre-recorded, or rehearsed, such as in a media event, the
audience is considered and appropriate language use is judged ahead of time.
Recipient design in the use of the message (parole) within a common code
(langue) is relevant to another of Jakobson’s topics of inquiry: that of the axis of
equivalence. Jakobson emphasized the selection and combination of units of language
when building utterances, starting with lexical items and grammatical structure combined
to form sentences and on up. But these choices are limited in that the utterances must be
built “from the lexical storehouse” common to the addresser and addressee (Jakobson,
1953, p. 117), and thus “the efficiency of a speech event demands the use of a common
code by its participants” (p. 120). The message, then, is composed of these choices
which fall into two modes of arrangement: combination and selection.
On the level of the building of an utterance (for Jakobson uses these two modes in
the selection and combination of units as small as phonemes as well), the “addressee
perceives that the given utterance (message) is in a combination of constituent parts
(sentences, words, phonemes) selected from the repository of all constituent parts (the
code)” (pp. 119-120). The combination of words follows a path similar to that set out by Saussure,
in the paradigmatic and syntagmatic choices. It is within this dichotomy that the
addresser chooses the message. It is also within this dichotomy that the addresser must
choose linguistic signs that the addressee can decode. Within the ideas of
accommodation and recipient design, then, the choices for a message are restricted or
expanded depending on the common knowledge of a similar code, as well as knowledge
of the recipient. In second language acquisition, this choice of signs is fairly crucial; in
media the rhetorical devices chosen are likewise crucial to the intended decoding by the
recipients. Therefore, the specific codes chosen for accessible decoding by a wide
audience also have the potential for decoding by the extended audience: those viewers,
such as second language learners, who are outside the intended language community but
are still interested in accessing the multimedia event.
While the message is obviously an important part in a communication event, for
second language learners the code, e.g. the choice of which language is to be used in a
conversation, is crucial. The code must be at least a partial commonality between the
addresser and addressee in order for successful communication to occur. Much of the
code is agreed upon by the society in which the code rests. It is, as Duranti describes it,
“commonality between the participants, reference to cultural occurrences and events,
common expectation for cultural speech acts” (pp. 318-9). The addressee therefore must
be able to select the correct meaning that the addresser has embedded in the utterance.
1.6.5 A communicative event: Genre & intersubjectivity
Oftentimes the meaning is embedded in a selected common ‘genre’ or a typical
form of language, or utterance, including certain typical kinds of expressions used within
genres, i.e. greetings, farewells, table conversations, narrations, or as Bakhtin writes,
“typical situations of speech communication, typical themes, and, consequently, also …
particular contacts between the meanings of words and actual concrete reality under
certain typical circumstances” (Wertsch, pp. 60-1). Bakhtin referred to the existence of
different types of genre in speech as a necessary element; to further explain its integral
part of deciphering the code, he states that “a speech genre is not a form of language, but
a typical form [a type] of utterance; as such the genre also includes a certain typical kind
of expression that inheres in it” (quoted in Wertsch, p. 61). Wertsch writes that, in
Bakhtin’s view, “it is no more possible to produce an utterance without using some
speech genre than it is possible to produce an utterance without using some natural
language, such as English” (p. 61).
If the genre is so integrated with the code of the society, then the outsider, or a
second language learner, could encounter difficulties decoding the addresser’s selection
or use of genre and its multivoicedness, or ‘ventriloquism’. Indeed, Bakhtin addresses
the issue of necessary knowledge of the code when he discusses the multivoicedness of
speech genres in writing that “the expressiveness of individual words is not inherent in
the words themselves as units of language, nor does it issue directly from the meaning of
these words: it is either a typical generic expression or it is an echo of another’s
individual expression, which makes the word, as it were, representative of another’s
whole utterance from a particular evaluative position…” (Bakhtin, p. 85). These
expressions of words within different genres are used within pragmatic contexts of
speech communication held intersubjectively by a speech community, one of which
second language learners may or may not be aware.
On the other hand, the outsider or second language learner could also use the
recognizable patterns of known genres to aid in comprehension, such as those found in
media texts. A television news show is expected to exhibit certain features, such as
unbiased reporting, although there are varying degrees of reliability
and veracity within this genre. However, the point is that there are similar features across
different news shows that are recognizable and characteristic of the genre, such as the
reporter or newscaster and the on-screen presentation style of a person speaking
accompanied by additional pictures or video clips related to the newscast.
These characteristics make the genre easily recognizable and also somewhat predictable
in its presentation of information. Likewise, many films follow a similar narrative
pattern, with introductions, complications and a climax in the plot near the resolution at
the end.
The idea of the intersubjectivity of the code is also described by Eco as he
integrates semiotics within the code when he states that “any linguistic innovation can
work only when accepted and integrated by social consensus, and the same happens with
the other communicative systems,” and adds that, “any semiotic system is submitted to
general semiotic laws and functions as a code; but such codes are also linked to specific
communities (from village to ethnic unit) in the same way in which a language produces
its subcodes linked to given professions or activities…” (Eco, p. 112). The code, or
langue, including the use of speech genres, is held intersubjectively by a society; it
constitutes the semiotic knowledge and commonality that the addressee needs to decode a
message, and it must be accessed or at least partially shared by second language
learners in order for them to understand the communicative event effectively.
1.6.6 A communication event: The all-important addressee
Finally, as the last of the six factors of Jakobson’s Speech Event, the
ADDRESSEE can be discussed. In this study, the addressee is very important, as this
Speech Event factor is composed of the participants. The addressee, or the recipient,
decoder, interpreter, or hearer, of the speech event needs to have knowledge and
understanding of the context and code of the message in order to interpret the message.
This involves a similar “equivalence between the symbols used by the addresser and those
known and interpreted by the addressee” (Jakobson, 1954, p. 120). For Bakhtin, the
addressee “can be an immediate participant-interlocutor in an everyday dialogue,…a
more or less differentiated public, ethnic group, contemporaries, like-minded people,
opponents and enemies, a subordinate, a superior, … it can also be an indefinite,
unconcretized other” (p. 87). In other words, the addressee echoes the complexity of the
addresser. It is within the addressee that the communication event is successful or not
and, as Verschueren stated it, “once spoken, an utterance is no longer in the control of the
utterer: it begins to lead a life of its own in the mental worlds of others” (p. 169).
Within his production format of an utterance, Goffman separates the addressees,
or hearers, into two groups: ratified or unratified participants. These distinctions are
important as “speakers will modify how they speak, if not what they say” (Goffman,
p.136) depending on the audience and the participants. In a similar vein to Goffman,
Verschueren complicates the interpreter much like he does with the utterer. In order to
distinguish the multiple audiences possible of an utterance, Verschueren labels those
interlocutors closest to the utterer, or those directly involved in the speech event, as direct
addressees, while those who are also listening and acknowledged participants, or
presences, are labeled as side participants. All others are non-participants, but since these
non-interpreters are within hearing range of the communication event, they are also
factors accounted for and labeled as such. Bystanders are those who are nearby,
overhearers are possible interpreters, and these include listeners-in (those who are
attempting to listen) and eavesdroppers (those who are secretly listening) (Verschueren, p.
83). An addressee can even be an intermediary, or someone who is told a message with
directions to pass the information on to another, such as a press release given to a
journalist who gives it to be printed or released through a media outlet to a wider
population (a ‘removed addressee’, or addressee-1) (p. 85). An important note regarding the
addressee and media is discussed by Stuart Hall in which the interpretation and
acceptance of the message can be negotiated by the recipient (Hall, 1980).
To sum up Jakobson’s speech event, with its six factors as the focus, is to realize
the integrativeness of an act of communication. Bakhtin writes about the integrative
features:
We know our native language – its lexical composition and grammatical structure
– not from dictionaries and grammars but from concrete utterances that we hear
and that we ourselves reproduce in live speech communication with people
around us. We assimilate forms of language only in forms of utterances and in
conjunction with these forms. The forms of language and the typical forms of
utterances, that is, speech genres, enter our experience and our consciousness
together, and in close connection with one another… (p. 83)
It is within the code and message, the selection and combination of words and phrases,
with influence from the environment of the addresser and addressee and inclusion of the
contact and context, that Jakobson’s model delineates speech into a necessarily dialogic
act. For second language users and those without an intimate knowledge of the code
and/or context, communication can become ‘tricky’ and the intended interpretation of the
utterance can fail to be successful. However, if the message is presented in a multimodal
format, in which a second language user may choose the easiest, most familiar path to
approximate comprehension of the message, then the success rate may improve.
1.7 The puzzle and comprehension…
Taking the complexities of the functions of the speech event into consideration
gives rise to a metaphor which I will attempt to provide and examine in the context of
this study. If one takes the idea that comprehension of a message by the recipient entails
interpretation of bits of the message as conveyed through different means, such as sound,
gesture, color, the context, the speaker, the pretext of what happened before the utterance
and the predictions of what might come next, and that these bits of information come in
different modalities of presentation (sound, vision, touch, etc), then an interpreter of the
message picks the bits that carry the most relevant and easily interpretable meaning for
that individual. If a person is hard of hearing, then much of the message may come from
the visual aspects of the message presentation, while if a person is a second language
learner then perhaps the learner’s strengths in, say, reading, may be the preferred manner
of interpreting a message (therefore in a print medium). In other words, the meaning of
the message is put together much like the pieces of the puzzle, whereby the interpreter of
the message gathers all the information that he or she can and strategically puts them
together to form the best, most complete picture, or interpretation, possible. Therefore,
the metaphor that COMPREHENSION is a PUZZLE will be looked at from the point of
a second language learner in a multimodal learning environment. Questions two and
three will look at this metaphor in some detail using eye movements to look at the
viewer’s selection of information input, while Question three will use small case studies
to see if there is any correlation between a learner’s strategies for language learning and
what he or she actually does when watching a multimodal event.
1.8 Summary
This section has focused on setting up the ideas surrounding a Communication
Event, drawing on Jakobson’s Speech Event as the central schema. On the one side,
Verschueren provides insight into the varied composition of the addresser(s), while on
the other side exists the multiple layers of the addressee such as that posited by Goffman.
However, the terms used by Jakobson are slightly different from those used by Goffman
or by Verschueren, or by Kress & van Leeuwen or by other scholars who will be
referenced in the forthcoming chapters. In order to truly place the addresser and
addressee within the environment of a multimodal multimedia communication event,
which is constructed of different producers of a message and multiple receivers of that
message, a separate terminology will be used, as illustrated below in Figure 1.3:
Figure 1.3 The Multimodal Multimedia Event. [Diagram: the Addresser(s) (producers, editors, writers, actors, subtitlers, voice-overs) occupy the PRODUCER position; the Addressee(s) (a primary audience and a secondary audience: re-runs, internet, global satellite…) occupy the RECIPIENT position; the language learner is situated within the Multimodal Multimedia Communication Event between them.]
The Multimodal Multimedia Communication Event then is a media text that may
be viewed by audiences other than those originally intended: a global community.
Whether the original audience was an English speaking audience in Europe or North
America, due to global communication viewers may exist in every time zone, years in the
future, and could watch it in a variety of languages due to the addition of subtitles/closed
captioning or voiced-over narration. Language learners (either L1 or L2) can use these
media texts as tools to access the code/langue and the immediacy of the message/parole
of the characters and story as it was originally intended for the primary audience, or the
members of the original community. In addition, because of the multimodal aspects of
the media text, learners have optional access points for interpreting and comprehending
the actions and dialogue of the characters: gestures. Whether or not the gestures can be
labeled as ‘authentic’, since they are performed by actors, the actions nevertheless are
acceptable to the primary audience and used by that audience as well to interpret and
comprehend the media text. In that manner, they can also be used by a secondary
audience to learn. In Chapter two, the Multimodal Multimedia Communication Event
will be placed in relationship to this study and within relevant background literature.
CHAPTER 2 – POSITIONING THIS STUDY
2. Overview of Chapter 2
In Chapter 1, the participants involved in this study are placed in the wider
context of a multimodal event. On one side of the event is placed the producer of the
message, or the “encoder” in a communicative event and the “text” in a literacy event,
labeled as such because, in the context of a multimodal text, it consists of multiple producers of
the message or text. On the other side, the “recipient” interacts with the message or text.
The recipient is often composed of an extended audience that has the ability to
transcend a singular moment in time, since many multimodal texts are available to be
recorded and displayed repeatedly to different audiences in different cultures, which was
explained in Chapter 1 as ratified and unratified participants.
As illustrated in Figure 2.1 below, the left side of the Multimodal Multimedia
Communication Event is termed the ‘producer’ and encompasses the actions and input of
the various people and medium restrictions in the production of the message. The right
side of the Multimodal Multimedia Communication Event is termed the ‘recipient’, or the
person or persons who are the receivers of the message. Note that the term recipient has
been used (instead of hearer, listener, viewer, receiver, reader or decoder) which avoids
many of the discipline heavy interpretations and cross-interpretations and also avoids the
direct implication of the acceptance or comprehension of the intended message: the
recipient is only a recipient with no overt interpretation of the message: it is a referent for
50
the person in that position. What the person does with the message may involve
additional terminology depending on the message and the context (the multimodal event).
The name of the event, Multimodal Multimedia Communication Event, is
necessarily long because it is an accumulation of other types of Communication Events.
Referring to the framework set up by Jakobson, the Communication Event refers to the
integrated factors involved in communication. A Multimodal Event is an interaction or
presentation of information that involves more than one channel of information on a
physical, sensory level, such as aural and visual channels. A Multimedia Event is one in
which multimedia texts are used in the presentation of information, using semiotic
qualities found in color, font size, layout, sound, and video. A Multimodal Multimedia
Communication Event, however, is one in which there are factors involved that use
sensory channels of information integrated within the semiotic qualities of
communication to convey a message to the recipient, and that recipient may or may not be
the primary intended audience but is still recognized as a conscious component of the
event.
Figure 2.1 The Multimodal Multimedia Communication Event. [Diagram: the PRODUCER (producers, editors, writers, actors, subtitlers, voice-overs) on one side and the RECIPIENT (the primary audience, and the secondary audience of re-runs, internet, global satellite…) on the other, with the language learner positioned within the Multimodal Multimedia Communication Event.]
Therefore the multimodal multimedia communication event is recognized as being
similar to a text in the form of a book or article, in that it transcends time, and is
distinguishable from a momentary speech act which fades as time passes. The important
aspects, as laid out in Chapter One, that are particularly relevant to the construction and
theoretical positioning of this dissertation include the premises that the recipient is not
necessarily the primary audience of the producer, that the recipient is an active participant
in the interpretation of the intended message, and that each recipient’s individual
experiences and background knowledge affect the interpretation of the message. It
should also be restated that the term ‘language learner’ refers to L1 and L2 learners (NS
and NNS refer to the learner’s relationship with that language).
In this chapter, I use the context and placement of the recipient as explained in
Chapter One, and narrow the context by showing past research and relevant literature
that surrounds the participants in this study and their connection with it.
Figure 2.2 Areas of research included as influencing this study. [Diagram of overlapping circles: Multimodality: research and use; Closed captioning and subtitles; Research with closed captioning and subtitles; Reading theory & research / L2 reading; Eye movement research; Learners: styles and strategies; Participants and the study.]
Figure 2.2 represents the interdisciplinary areas of inquiry that affect research in
multimodality and in particular research with an eye tracker that focuses on reading
closed captioning with non-native speakers of English (NNS). All areas are important in
informing each other, and so they are shown as overlapping circles rather than bullet
points in a list implying hierarchy.
I start by briefly explaining the history and definitions of closed captioning and
subtitles, since that sets the stage for the study, and from there explain the research that
has already been conducted both in the United States and in Europe that uses closed
captioning or subtitles as a focus of the study. This literature segues into multimodality,
its multiple definitions and uses and research that has used it as a focus. While eye
movement research is used within both of the previous topics, without a background of
relevant reading research, and L2 reading in particular, the eye movement research is not
anchored within the multimodal event, and so it will be the last to be explained before the
methodology is laid out in Chapter 3 and the research questions are explored in Chapters
4, 5 and 6.
2.1 Closed captioning and subtitling
There are significant differences between closed captioning and subtitling,
including the terms, the technology and the intended audience. The distinction between
Open and Closed is one of the primary differences. With Open Captioning, text is always
present somewhere on the viewing screen, similar to the presentation of subtitles, while
Closed Captioning requires a decoder chip or box to display the encoded
text on the viewing screen. Subtitles fall into the category of Open Captioning, as they
are either always viewed, or are activated as a secondary track on a DVD. Recent
technology for television sets takes advantage of an optional closed captioning system,
whereby an encoded signal that contains the captioning text is displayed only when the
viewer turns on the decoder, usually by the remote control or through the television set’s
menu system.
Another difference between closed captioning and subtitling is the intended
audience for each. Subtitling is a translation between two texts, more often than not an
interlingual translation, or the translation between two separate languages such as
between Spanish and English. The text must fit into certain requirements, such as the
space allowed on the screen so as not to impede the visual aspects (the picture). Because
subtitling is interlingual, it is often intercultural as well, and so must try to accommodate
the differences between the language and culture of the primary audience and the
secondary audience (e.g. slang, culture specific ideas). Usually, the subtitle text is placed
in the middle at the bottom of the screen. Sometimes a dash (-) is used to distinguish
between two speakers. As the special effects and sound can be heard by the
audience, no indication of them is needed in the subtitles (González Rodríguez, 2006).
Closed captioning, however, is intralingual, or the transcription within the same
language, such as between spoken text and written text within English. The primary
audience is usually the deaf and hard of hearing. The text must still be confined to a
relatively small space on the screen, but oftentimes the text moves in order to
accommodate the pictures being displayed on the screen. For instance, at the beginning
of a show, because the opening credits are often displayed in the lower third of the
screen, the closed captioning is positioned to display at the top of the screen. Depending
on the type of show and the company that is contracted to do the captioning, the
captioning may be displayed in a few different styles (and fonts).
First, it is important to note that captioning can be added to either pre-recorded
video productions (e.g. for dramas, sitcoms, movies) or it can be done ‘live’, or as the
video is being broadcast (such as for a sports game, news announcement, live late night
programming). If the captioning is added to a pre-recorded video then the text display
can be placed more precisely to represent the spatial differences of the characters on the
screen. For example, if there are two characters shown on the screen, then the spoken
words can be placed more to the left or the right side to visually show the turns in the
conversation for each speaker. These lines are synchronized to “pop-on” and then
disappear with each phrase or sentence (cf. www.ncap.org), and so more than one word
will appear at a time.
Another style of captioning is the “roll-up”, in which one word at a time appears.
The appearance of each consecutive word in a phrase or sentence rolls across the screen,
often using three rows of text starting on the left, bottom side of the screen. As a new
line appears, it pushes the previous line up; once a phrase or sentence reaches the top line,
it will disappear as the next line appears. This type of captioning seems to roll across the
screen from left to right, and from bottom up. Often, this is used for live-captioning and
sometimes for pre-recorded captioning as well. The markers >> are used to indicate a
change in speakers, since all of the text starts on the left, and rolls to the right. With roll-
up captioning there is no positioning of the text near the on-screen characters similar to
that of the “pop-on” style of captioning presentation. Table 2.1 below illustrates the
similarities and differences between closed captioning and subtitling.
Table 2.1 Comparison of the two types of print modality in a multimodal event

                       Closed Captioning (tele-text)           Subtitling
Primary audience       Deaf, hard of hearing                   NNS
Extended audience      NNS                                     Deaf, hard of hearing
Audio to print         Intralingual                            Interlingual
Encoding               Closed: a special decoder chip is       Open: always displayed (DVDs with
                       required to display it                  language choices excepted)
Modality matching      Transcription: as close as possible     Translation: cultural translation,
                       to the actual audio track (spoken       sometimes a summary of the spoken
                       text, background sounds, music          text
                       soundtrack, etc.)
Placement              Can move around the screen              Almost always on the bottom third
                                                               of the screen
Style of appearance    Varies: can be placed centrally         Almost always placed as a united
                       using two lines, or can roll across     line of text, sometimes placed
                       and up using three lines of text        according to utterance or phrase
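To make the contrast in Table 2.1 concrete, the comparison can be imagined as a small data structure. The following Python sketch is purely illustrative: the class names, fields and timings are hypothetical and do not correspond to any broadcast standard or captioning tool; it simply encodes the pop-on versus roll-up distinction and the fact that only pop-on cues carry an explicit screen position.

from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class CaptionStyle(Enum):
    POP_ON = "pop-on"    # whole phrase appears at once, placed near the speaker
    ROLL_UP = "roll-up"  # lines roll upward, usually three rows at the bottom left

@dataclass
class CaptionCue:
    """One timed unit of caption text (hypothetical model, not a real broadcast format)."""
    start: float                                # seconds into the program
    end: float                                  # when the cue is cleared
    text: str                                   # transcription, including cues such as "(footsteps)"
    style: CaptionStyle
    position: Optional[Tuple[int, int]] = None  # (row, column); meaningful for pop-on only

# Invented example: a pop-on cue positioned near the left-hand speaker,
# and a roll-up cue marking a change of speaker with ">>".
cues = [
    CaptionCue(12.0, 14.5, "I'M HUNGRY - WANNA CATCH A BITE TO EAT?",
               CaptionStyle.POP_ON, position=(14, 4)),
    CaptionCue(15.0, 18.0, ">> SURE, LET'S GO.", CaptionStyle.ROLL_UP),
]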
Since closed captioning is primarily a transcription of the sounds of the video
production3, with the primary audience composed of the deaf and hard of hearing, there
are symbolic indications of music (♪), and references to on- or off-screen noises that are
important for the storyline (e.g. (footsteps), (laughing), (crowd talking), etc.), as well as
3 There is no industry-wide standard for how accurate the transcription used in closed captioning must be.
Instead, it varies by type of captioning (live or pre-recorded), the media genre (talk shows tend to miss
chunks of conversation due to inaudibility or the rate of speech) and the company that does the captioning.
In the United States, there are several companies that can be contracted to transcribe and place the
captioning, just as there are many smaller companies that may be contracted to assist with local cable
shows. Another alternative is to hire trained captioners or court stenographers to transcribe local shows
live. In other words, there is variability in the accuracy and the timing of closed captioning.
changes in the text appearance (usually all lowercase or in italics) to indicate changes in
the message such as for whispering. While these sound indications are usually left out of
subtitling, since the primary audience is one that can hear these sounds in the original
soundtrack, in closed captioning they are made salient in the captioning text. The written
text is now indicating more than one modality – that of the speech event between the
characters as well as that of the aural soundtrack which may contain semiotic meaning for
the plot and understanding of the storyline.
2.1.1 The history of closed captioning
Closed captioning is an encoded file within the televised signal4 that is not visible
unless a decoder chip or box is present. The decoder chip is built in to all televisions
(over 13”) produced in or for import to the United States as of 1993, as mandated by the
Decoder Circuitry Act of 1990 by the Federal Communications Commission (FCC).
Before that, a separate decoder box was needed to connect to the television in order to
display closed captioning. Interestingly, many of the decoder devices were purchased by
non-native speakers of English (Markham, 1993). In the early 1970s closed captioning
began with formatting trials between ABC and Gallaudet College (the trial was on
the Mod Squad) (cf. www.ncicap.org); PBS stations such as WGBH in Boston pioneered
the technology (cf. www.main.wgbh.com). In 1984 Dallas was first televised with
closed captioning, and in 1989 music videos were captioned. By the mid-1990s real-time
4 A televised signal refers to the broadcast signal for television. Since cable and satellite are also
possibilities to receive video signals, these shall fall into the category of televised signal. However,
receiving video via the internet does not (yet) include the possibility for an encoded signal such as closed
captioning or subtitles, nor does the monitor have a decoder chip similar to that of a television set.
Therefore, the video streamed and viewed on the internet is not included as a media option in the current
study if reading and literacy is to be involved in the multimodal event.
captioning was available for events such as the Presidential Inauguration, the Olympics
and other sports events (www.robson.org/capfaq/index.html, October 9, 1999).
Currently, the FCC has set regulations and transition rules for the amount of
programming that must be closed captioned with exceptions (for local cable broadcasting,
local newscasts, among others), recognizing the benefits of closed captioning for the deaf
and hard of hearing, for non-native speakers of English, and for literacy skills
(www.fcc.gov).
2.1.2 Closed Captioning: What is it?
For broadcast and cable television, the closed captioning text is encoded and sent
within the broadcast signal in line 21, a line of the signal that is not used for displaying
the picture. (Recent additions to line 21 include the signal for the V-chip, or parental
control over the viewing options). It can be enabled, or turned on, by the viewer, or
turned off; again, the option not to view it defines it as closed, rather than open text, which
cannot be turned off because it is ‘burned on’ to the video. The placement of the written
text can be handled in a few different ways, depending on the software being used and
the company that is performing the captioning service, but, regardless, a timecode is used
in order to precisely place the written text in the desired spot at the desired time (although
this is more precise for the “pop-on” style, and therefore the text takes more time to be
placed than the “roll-up” style).
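As a rough illustration of how such a timecode drives the display, consider the sketch below (in Python, with invented cue timings; it is not drawn from any actual captioning software): each caption is paired with a start and end time, and a decoder-like display loop simply shows whichever captions the current playback time falls within.

# Each cue is (start_seconds, end_seconds, text); the timings are invented for illustration.
cues = [
    (12.0, 14.5, "I'M HUNGRY - WANNA CATCH A BITE TO EAT?"),
    (15.0, 18.0, ">> SURE, LET'S GO."),
]

def visible_at(cues, playback_time):
    """Return the caption text that would be displayed at a given moment."""
    return [text for start, end, text in cues if start <= playback_time < end]

print(visible_at(cues, 13.0))   # shows the first cue
print(visible_at(cues, 14.7))   # empty list: between cues, the screen is clear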
Whereas subtitling uses a graphics layout, and orthographies other than Roman
lettering can be easily displayed, closed captioning is restricted to using Roman lettering
because of the original technology of the decoder chip. Due to this, the text displayed is
often in all capital letters, avoiding the use of lowercase lettering, which has descenders
(e.g. y, p, g, j) that can be misaligned when displayed by older television sets with older
decoder chips (cf. www.robson.org/capfaq ).
However, the FCC has recently implemented new regulations that include
Spanish channels and closed captioning (cf. www.fcc.gov/cgb/consumerfacts/closedcaptioning.html).
Since more recent decoder chips can use lowercase lettering and some
diacritic marks (e.g. ó, ñ, é) within Line 21, Spanish Language Programming can be
displayed. Interestingly, there are crossovers in the languages assigned to the CC1
channel. Currently, ABC and NBC, for a limited selection of programs, display
English closed captioning on CC1 and concurrently display Spanish closed captioning on
CC2, while Univision, with Spanish language programming, will use CC1 for Spanish
closed captioning. As Spanish language programming increases, and English language
programming continues to add captioning in Spanish to appeal to a wider audience, a
system of designation for CC1 and CC2 will be settled on, and as HDTV changes the
availability and broadcasting of video, closed captioning regulations will adapt as well.
Since closed captioning is a regulation mandated by the FCC (Federal Communications
Commission), it is a permanent feature of current programming in the United States and
is becoming a source of interlingual and intralingual multimodal texts.
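Since CC1 and CC2 are simply alternate caption channels carried within the same signal, the relationship between broadcaster, channel and language can be pictured as a small lookup table. The Python sketch below only restates the examples just given; the dictionary and function are hypothetical and are not part of any decoder or FCC specification.

# Hypothetical channel-to-language assignments, based on the examples in the text;
# actual assignments vary by program and change over time.
caption_services = {
    "ABC":       {"CC1": "English", "CC2": "Spanish"},
    "NBC":       {"CC1": "English", "CC2": "Spanish"},
    "Univision": {"CC1": "Spanish"},
}

def pick_channel(broadcaster, preferred_language):
    """Return the first caption channel offering the viewer's preferred language, if any."""
    for channel, language in caption_services.get(broadcaster, {}).items():
        if language == preferred_language:
            return channel
    return None

print(pick_channel("Univision", "Spanish"))  # CC1
print(pick_channel("ABC", "Spanish"))        # CC2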
Europe uses a slightly different, and often more easily manageable, form of
presenting same language and different language text, called teletext, or subtitling. The
European broadcast format differs slightly from that used in the United States (PAL
versus NTSC), which allows for more maneuverability in the presentation and
accessibility of the written text. As seen in the next section, research about language
learning has been conducted on both sides of the Atlantic – both with closed captioning
and with teletext subtitles.
2.1.3 Additional audiences for closed captioning and subtitling
With the advent of the VCR and the ability to buy or rent movies to show in
language classrooms, English closed captioning became useful for a secondary audience
other than the deaf or hard of hearing: the English as a Second Language learner.
Language instructors had the option of using the available closed captioning in the
classroom as a tool to help the language learner in comprehension, which only expanded
in options once DVDs were available to use in the classroom. In the 1980s and 1990s
there were a few researchers in the United States and in Europe who pursued research or
advocated use of closed captioning and subtitling for language acquisition. Due to the
multimodal nature of the object of study – that of video in the presentation forms of
television or film – the research can be divided into the research purposes of listening
skills, comprehension, reading, vocabulary development, L1 acquisition, L2 acquisition,
classroom use, use at home, with many of these variables overlapping. The next section
is an attempt to untangle and present some of the research literature dealing with
subtitling and closed captioning/teletext (the European term).
2.2 Closed Captioning & subtitles used in language research
The use of an additional modality, that of print, has been explored for use in
various aspects of language learning. Besides serving as a tool for improving comprehension, it
has also been researched as a tool to increase vocabulary knowledge and retention,
listening skills, reading skills and literacy strategies. Each of these will be addressed in
the following sections. While the majority of the studies cited are ‘same language’
subtitles or closed captioning, i.e. those in which the print modality is a transcription of
the audio track, there are relevant studies that are important to mention in which the
language of the subtitles is a different language than that of the aural modality.
2.2.1 Closed Captioning as a learning tool in the literature
While there is no doubt that closed captioning is useful to its primary audience, it
is the secondary audience, that of the language learner, that this review concentrates on
because of its relevance to this dissertation research. First, past research that specifically
focuses on closed captioning is presented (I include teletext as well when I refer to closed
captioning). Enough research has been done to show that television and video can be
useful in a classroom with accompanying instruction from the teacher (Weissenrieder,
1993; Cooper, 1996) and that even without specific instruction, but with the addition of
closed-captioning, there are incidental effects of increased vocabulary uptake (or the
learning of new words) (Koskinen et al., 1995; Neuman & Koskinen, 1992; Csapo-Sweet,
1997) and increased reading skills (Goldman, 1996; Goldman, M. & Goldman, S., 1988).
Some research has shown that students at higher levels of English benefit more from
closed captioning and television in general than students who are at beginner levels of
English (Neuman & Koskinen, 1992). This is presumably due to the ability to process
the grammar and vocabulary that television programs use at a quicker pace than that used
in beginner and intermediate classrooms. Markham’s (1993) study showed that if there
was a high correlation between audio and visual images, then lower level students were
able to learn vocabulary and gain a better understanding of the program, whereas more
advanced students were able to rely upon the subtitles instead of the audio-video relation
to understand the program. While it is no surprise that better students do better than less
capable students, Markham shows that the use of closed captioning differed according
to level of language proficiency.
In other studies that use print with audio/visual media and NNS, Vanderplank
(1993) stated that television is overlooked as a medium for teaching ESL. In his 1988
study he showed that subtitles (teletext) greatly aided the viewing of most programs. In
his 1990 study, he promoted the idea that a viewer must pay conscious attention while
viewing in order to gain from viewing television with subtitles. Vanderplank set up his
study with a core group of 15 students who watched programs over a period of three
months, during which time they were asked to perform a variety of tasks such as carrying
out oral or written tasks or noting words and striking phrases. From the study
Vanderplank concluded the following: (1) Subjects were able to ask about words and
expressions they had never seen before; (2) subjects were able to appreciate dialectal and
accentual features present in the programs; (3) subjects were able to follow and
understand complex information and ideas, and verbal humor; and (4) subjects were able
to compare their own lexical and grammatical knowledge with that presented in the
programs and update if necessary, amongst other benefits (1990:224-5). However, he
surmised that if the subjects did not participate actively, as in note-taking, but rather
enjoyed the programs as entertainment, watching passively, they were able to follow
the program, but retained little or no specific terminology (1990:226). Therefore, as
Vanderplank suggests, paying attention rather than passive watching by the viewer is the
necessary condition for acquiring language while viewing television. Schmidt (1990)
echoes the need for noticing, or a focal awareness, in learning second languages.
The next few sections elaborate on the use of closed captioning and subtitles by
language learners in different conditions and with different languages. Since there are so
many variables and purposes within the different research studies, I have grouped them
into the following sections: different L1/L2 presentations in the modalities, same
language presentations and finally a section that includes motivation and self-learning use
of multimedia texts.
2.2.2 Past research using closed captioning and subtitles: Different L1/L2
One of the earlier experiments using subtitles and bilingual participants was
conducted by de Bot, Jagt, Janssen, Kessels and Schils (1986). In it, the researchers
explore the uses by the participants of the written text (L1 subtitles) or of the spoken text
(L2 speech) in short news stories in which the information in the written text was altered
from that in the spoken text. The results showed that participants do make use of
the spoken language (in the L2) in comprehension. The authors conclude that “watching
foreign, subtitled TV programmes might be a factor in learning, relearning or maintaining
a foreign language” (p.80).
Several studies have been conducted by d’Ydewalle and his colleagues with the
use of different languages in multimedia texts and language learners. d’Ydewalle, Van
Rensbergen and Pollet (1987) used an eye tracker to begin exploring the preferences for
subtitle presentation (two lines were preferred), the rate of presentation (six seconds per
line was preferred) and the amount of reading by subjects in a study with subtitles in Dutch (L1) and the soundtrack (when available) in German (L1). They claim that even
when the audio channel5 is in the L1, the subjects still read the subtitles. This was
echoed in a 1992 review article in which d’Ydewalle and Gielen make the claims that
switching channels of information, or input, is effortless or automatic and that viewers
remember detailed information presented in the subtitles during multiple choice recall
tests. This was referred to again in a 1999 study by Koolstra, Van der Voort and d'Ydewalle: viewers cannot help but look at the subtitles. In this particular study, the participants were children watching television shows, including a sitcom and an action cartoon: the children spent more time reading the L2 subtitles of the sitcom, possibly as a result of less correspondence between the audio and visual representations (p. 420).
In yet another 1999 study, d’Ydewalle and van de Poel explored the incidental L2
acquisition by 327 children while watching television programs with subtitling and found
that children do better in vocabulary and language acquisition when the L2 is in the
soundtrack (versus in the subtitling) and that adults do better when the L2 is available visually in the printed subtitles. While this makes intuitive sense, since children are usually
less proficient at reading quickly than adults, the alternation between Dutch and French in
5 The literature authored by d’Ydewalle uses the term ‘channel’ to indicate a modality that provides input
for the viewer. It is used in a similar fashion to the term ‘mode’.
the soundtrack and the subtitles, with a control of Dutch/Dutch, has interesting
implications for language learners and language acquisition. It ties in with another study
with adults and teens, in which the claim was made that there is vocabulary acquisition
when two languages are involved, regardless of the order of L1 or L2 in the audio and
visual channels, and that there is better comprehension despite the reported annoyance
and cognitive load implications (d’Ydewalle and Pavakanun, 1995). Indeed, in a 2004
study by Stewart and Pertusa, classroom learners of Spanish as a foreign language
watched videos with the subtitles enabled and reported that the printed text helped
distinguish the audio segments and that they “paid more attention to the film’s audio
because there was a connection with the subtitles” (p. 440).
All in all, the majority of the studies that have been done with subtitles and audio channels in different languages have shown positive results in vocabulary learning as well as in retention of the content and comprehension of the message, although there has been research that shows a cognitive overload when visual and audio modalities are duplicated. While the difference in languages in the presentation modalities is relevant to
my study in that second language learning is involved, the next section deals directly with
same language modality presentations: either L1:L1 or L2:L2, and the effects on viewers
and language learners, such as those conditions used by the participants in my study as
discussed in Chapters Five, Six and Seven.
2.2.3 Same language subtitling and closed captioning
The audio and print modalities can also be presented in the same language, such
as with closed captioning (transcription) and when the same language is selected for
subtitles on a DVD (usually not a direct transcription, rather a condensed version and
closer to a translation or summary of the audio track on the DVD). This option, or the
selection of the same language subtitles, could be used by deaf and hard of hearing
audiences, as well as L2 learners. For instance, a student who is learning French could
watch a film in French and select the French subtitles. Many ESL instructors have used
this method for years in their classrooms to help with listening skills, vocabulary, reading
practice and as an aid in gaining better comprehension (such as explained in section 2.2.1
above).
One of the most comprehensive research endeavors in the examination of the
viewing patterns of people reading and using closed captioning while watching video
texts (television productions or film) was published in a series of reports (Jensema, 1998; Jensema et al., 2000a; Jensema et al., 2000b) in the American Annals of the Deaf. All of
the research was conducted at the Institute for Disabilities Research and Training, which
possibly explains the focus on the originally intended closed captioning audience of the
deaf and hard of hearing with control groups of hearing subjects. Jensema and his
colleagues looked at several questions, including the eye movement patterns of viewers,
the individual viewing strategies, prior knowledge, and the rate of captioning. This
federally funded research (U.S. Department of Education Grant H026R70003) does have a few similarities with my research, which focuses on a secondary audience of second language learners. But, as shall be seen, it diverges as well.
While Jensema and his colleagues used very short 30-second clips of authentic text to examine patterns of eye movement, my research uses authentic clips of just over five minutes. This gives context to the story and allows each viewer to "settle in" to the comprehension strategy that best suits them. In other words, Jensema's finding that "the viewing process becomes much more of a reading process" (p. 284) may have been the result of such short durations of video text. The participants were scrambling to get as much information as possible and, without a longer text, resorted to reading (a pattern made all the more focused because there was no audio information that the hearing participants could use for comprehension). The participants in my study, whether NS or NNS, had equal opportunity to gain better comprehension of the video text in accordance with their individual preferences, through either the visual or the graphic (written) information, together with the original audio text. In a multimodal presentation, all of these modalities are channels of separate yet overlapping sources of information.
Additional research in same language multimodal presentation has also been
conducted for the purpose of language learning and improving literacy rates for an
audience different from that of Jensema et al. In a paper that explores the use of same language subtitling in an effort to increase literacy rates, Kothari, Takeda, Joshi and Pandey (2002) advocate the use of subtitling in music videos as a way to make "reading practice an incidental, automatic and sub-conscious process" (p. 64). In a study conducted over three months in India, where the literacy rate is only 65%, the researchers found that viewing music videos with subtitles led to improvements in syllable and word reading compared with a control group that did not view with subtitles.
Research with same language subtitling has also been conducted by Linebarger
(2001) with children and literacy. She found that the addition of the print in the captions
helps children overcome some of the obstacles presented when learning to read, such as difficulty in understanding the alphabetic principle and in transferring comprehension skills from spoken to written language. Linebarger's research with literacy may have implications for second language learners as well: problems with orthographic familiarity are common for learners who must learn and practice a new orthography along with their new language.
Linebarger’s line of research reaffirms earlier work by Garza (1991) in which he
first explored the use of different orthographies and captioned video materials. Using
authentic (made for the native speaking audience, and not for learners of the language as
a foreign language, p. 241) video of different genres with Russian and English material,
he proposed that the captioning was a way of bridging the gap between the learner’s
competence in reading and listening, “by providing students with a familiar… graphic
representation of an utterance, they are empowered to begin to assign meaning to
previously unintelligible aural entities" (p. 246). Counterarguments to these ideas will be discussed in the next section, 2.3, as theories involving multimodality are explored.
2.2.4 Motivation in self-learning from video
An interesting case study of a 10-year-old Finnish girl provides additional evidence of the importance of language learners' motivation in their language improvement (Jylhä-Laide & Karreinen, 1993). Over two years, researchers met with
Laura and observed her as she recorded cartoons and played and replayed them in
English. The point that the investigators make is that video, as opposed to broadcast
television, can be stopped, rewound and forwarded, which gives the observer some
control over the delivery of the message. While it is not a truly interactive learning
environment with mitigating feedback, nonetheless, “the use of the video medium
enables the learner to control the learning context and thus to create a situation which is,
at least to some degree, interactive" (p.131). Video material is thus an optional language learning tool from which hypotheses can be made about language use and then tried out in a different environment. Laura was a highly motivated young language learner who
took advantage of available language materials outside of her schooling and improved her
English language abilities.
While the topic of the previous study revolved around whether or not video can be 'interactive' in some sense, it also segues into the idea that, within the multimodal event in which the 'producer' is a television screen and the 'recipient' is the audience (see Chapter Two), the viewer (that is, the recipient) is not a passive receptacle receiving information as though it were an infusion. Rather, the viewer has choices about what to do with the message. Hall, in his important essay "Encoding/Decoding" (1980), debunks the linear sender/message/receiver model and instead gives the receiver the power to decode the media message. The receivers (the audience) can then choose to accept the message or reject it as not part of their ideological beliefs.
The audience not only has a choice whether or not to accept the (ideological)
content of the message, but as Plass and Leutner (1998) state, “learners actively select
relevant verbal and visual information, organize the information into coherent mental
representations, and integrate these newly constructed visual and verbal representations
with one another" (p. 25). This topic of self-selection will be explored further in the next
section on multimodality, as well as in my study in Chapters Five and Six.
2.3 Multimodality
Kress and Van Leeuwen (2001) define multimodality as “common semiotic
principles [that] operate in and across different modes” (p.2). They work within a
multimodal framework that integrates semiotic meaning in all modes, for example
expressing meaning either visually or verbally. The semiotic exploration of
multimodality is not the precise definition of multimodality that I will be working with in
the current study, mostly because a detailed semiotic analysis of the text is not necessary at this point in exploring the data (however, it is a framework which could be used to
explore the data set).
The term ‘multimodality’ as used by Norris (2004) also includes a semiotic
framework when analyzing the communicative modes, or the units of analysis, such as
those that occur in interactions. Norris includes modes such as gesture, intonation, and posture in a multimodal interaction, since she deals mainly with conversations. Again, while this framework would be very interesting for exploring the data collected and the participants' interpretations in this study, I do not use the terms 'multimodal' and 'mode' in this manner.
Mayer (2001) defines the term 'multimedia learning' as something similar to multimodality when he explains that "multimedia learning occurs when students use information presented in two or more formats – such as a visually presented animation and verbally presented narration – to construct knowledge" (1994, pp. 389-390). His definition is closer to the use of the term 'multimodality' that I am using in this study; however, it must also include the semiotic meaning making that Kress & van Leeuwen and Norris discuss, otherwise there would be no interpretation or use of it.
Therefore, multimodality is the use of more than one mode of information to
express a communicative event, in which a mode refers to “any organized, regular means
of representation and communication, such as, still image, gesture, posture, speech,
music, writing…” (Jewitt, 2004). Ultimately, these prominent authors who study
multimedia from different perspectives, from semiotics to discourse to psychology and
classroom applications, all originate from a similar viewpoint: multimodality is used
everywhere, interactions within it should be studied further, and that newer generations of
students have greater proficiency in it and yet there needs to be a call for informing
critical thinking regarding it.
2.3.1 The call for multimodality in pedagogy
Regarding multimodality and the student or learner, several studies have shown
that technology that uses multimodality needs to cross from everyday life to the
classroom. Love (2003) discusses the promotion of the use of multimodal learning
interfaces in secondary ESL classes in Australian schools with newer generations of
teachers and students in order to engage “with texts which combine visual and verbal
elements… and puts English teachers under pressure to learn new disciplinary
knowledges and metalanguages" (p. 22). Kern (2006) also discusses how rapidly changing technology changes the way people learn, use and teach languages, and argues that pedagogy needs to adapt in order to succeed. In a similar stance, Kenner and Kress (2003)
emphasize that bilingual children can use their multisemiotic resources as an aid in
language learning (L1 & L2) and their knowledge and ability to operate in more than one
semiotic literacy are “considerable advantage[s] in a global context in which
communication increasingly occurs through a variety of modes and a variety of
languages” (p. 200).
Luke (2003) uses a social constructivist viewpoint and asserts that “multimodal
readings and experiences of the world begin in infancy and constitute the social practices
in everyday life”, and that the classroom is one of the few places where students are
discouraged from blending and matching knowledge from “diverse textual sources and
communications media” (p. 398). Instead, she offers the point that although reading and
writing have changed very little, the “process has shifted from the serial cognitive
processing of linear print text to parallel processing multimodal text-image information
sources” (p. 399). Other studies also include using multimodal arenas for language
learning, such as Nelson (2004) who illustrates how a synaesthesia of multimedia in L2
digi-storytelling offers “the L2 author the freedom to communicate and negotiate
meanings by means of media that are not the L2” in which “they often do not have a high
level of proficiency” (p. 71). Digi-storytelling is the use of video, music, transitions and
titles by students using computer software to create and tell a story, usually a personal
one, to an audience.
Vincent (2006) specifically calls for the increased use of teaching and
assessments that include multimodal composition and literacy creativity, since “Firstly it
is the way in which students see the world, and secondly, it releases certain children from
the trials of monomodal, verbal expression where they are unlikely to succeed” (p.56).
This idea, that the world lived in should be reflected in the definitions and uses of
literacy, is echoed again in research by Hull and Nelson (2005) in which they state that a
culture’s mediational means “are intimately connected with our capacities to think,
represent, and communicate” and so “it would seem hugely important to widen our
definition of writing to include multimodal composing as a newly available means” (pp.
251-252). In light of these calls for adapting definitions, uses and the teaching of literacy
skills to encompass multimodality6, a few of the theories that integrate literacy and
learning in multimodal presentations of information are presented in the next section.
2.3.2 Multimodal theories
While the call has been placed by researchers for the increased use of
multimodality in the learning and teaching curriculum, and section 2.2 gave examples of
many of the relevant published application studies, the theories regarding the
effectiveness of multimodality don’t always agree with each other. Paivio (1986, 2007)
6 The New London Group (1996) has discussed these very issues and promotes a literacy pedagogy that
includes recognition that globalized society is increasingly diverse, the cultural interrelations that produce
and circulate texts and the increasing variety of multimedia technologies used to produce information
within these plural texts (Hull & Nelson, 2005).
has offered his Dual Coding Theory (DCT), which was expanded upon by Mayer and his
colleagues. In essence, DCT represents cognition as a system of modalities, in which
representations fall into two systems: the verbal system, which uses logogens as
representational units, and the nonverbal system, which uses imagens. A logogen, or a
lexical representation, can be representations that are auditory, visual, motor and haptic in
their modalities (p.38), and can include formulaic sequences, phrases, idioms, or anything
that is remembered as a chunk (p. 39). Imagens are representational units that activate
imagery in the consciousness. To sum up, Paivio uses "imagens and logogens [as] modality-
specific internal structures that map onto the sensorimotor attributes of objects and
words” (p.40). Within this dual-coding system, multimodal learning can be explained,
especially regarding second language learners. For example, a cross-modality activation
can occur to aid learners in comprehension. As an example of a referential activation, Paivio (2007) gives an illustration of writing or saying the word ship (the logogen, or verbal linguistic modality), which activates imagens of a 'ship'; depending on recent experiences and context, a ship, schooner, frigate, etc. would be activated (pp. 44-45). In
a bilingual DCT system, there are separate logogen systems for two languages (L1 and
L2) based on translation equivalents.
While this could be a debatable point, and Paivio’s description of a bilingual
coding system does not seem thorough, the main point is that a theory in which images
can activate linguistic representational units, and vice versa, is useful for both L1 and L2
language learners using multimodal presentations as tools for learning and gaining
comprehension of the material. The cross activation between representational systems is
used in Chapter Five. The DCT is also useful since it recognizes an interactive theory of
reading, in which processing at one level can affect processing at another level (top-down
and bottom-up), so that “how we respond to print is influenced by prior knowledge of
language, prior knowledge of the content of the text, situational contexts associated with
the reading, and the context arising from previously interpreted parts of the text as well as
the print itself” (Sadoski & Paivio, 2001, p. 117). This is in congruence with the reading
theories presented in section 2.4.
Mayer and his colleagues have worked extensively on applying the Dual Coding Theory in what they refer to as the Cognitive Theory of Multimedia, which defines multimedia as the presentation of material using both words and pictures, in terms both of presentation mode (e.g. printed text and illustrations) and of sensory modality (e.g. auditory and visual) (2001). Mayer's Cognitive Theory of Multimedia is
useful in its use of terms in order to discuss the multiple channels of information that are
presented and possibly used by the participants in my study: much of Mayer’s research is
centered on the student classroom learner, and so can be pedagogically useful. However,
as stated later in Chapter Five, Mayer concentrates on research involving the cognitive processes in the interpretation and use of multimedia texts in science and in linear representations such as diagrams, and not the genre used in this study: news stories. There is a fundamental difference between multimodal representations for understanding how pumps work or how lightning strikes and a
multifaceted expository narrative that includes multiple speakers, cultural artifacts, and a
story constructed around multiple connected ideas.
A few of Mayer’s principles are useful, however, regardless of the multimedia
conditions present, not withstanding his learner-centered approach to using multimedia in
the classroom. Mayer explores the adaptation of multimedia as an aid to the learner and
in doing so, he has explored different combinations of modalities and the effects on the
learner. Keeping in mind the process oriented texts, Mayer has developed the following
principles (perceived as applicable to my study) (2001):
Multimedia principle: Students learn better from words and pictures than words
alone (in which students will be able to build verbal and pictorial mental models
and connections, rather than just a verbal mental model when only words are
present). (63)
Spatial contiguity principle: Students learn better when corresponding words and
pictures are presented near rather than far from each other on the page or screen
(less cognitive resources are used if searching for relevancy is not mandatory, and
when near to each other learners are more likely to hold them both in working
memory). (81)
Temporal contiguity principle: Students learn better when corresponding words
and pictures are presented simultaneously rather than successively (when they
occur at the same time, the learner is better able to hold mental representations of
both in working memory and build connections). (96)
Modality principle: Students learn better from animation and narration than from
animation and on-screen text (when pictures and words are both presented
visually, the visual/pictorial channel can become overloaded but the audio/verbal
channel is unused). (134)
Individual differences principle: Design effects are stronger for low-knowledge
learners than for high-knowledge learners, and for high-spatial learners rather
than for low-spatial learners (high-knowledge learners can use prior knowledge to
assist in forming mental images from words, and high-spatial learners possess the
cognitive capacity to mentally integrate visual and verbal representations). (161)
Coherence principle: Students learn better when extraneous material, i.e. interesting but irrelevant words and sounds, is excluded rather than included (the extraneous material competes for cognitive resources in working memory and can divert attention from the important material, can disrupt the process of organizing the material, and can prime the learner to organize the material around an inappropriate theme). (113)
Mayer’s principles work for Mayer’s research: dual modality settings used with
process-oriented content (cause and effect explanations of scientific systems), therefore,
for the current study not all of Mayer’s principles are relevant. The current study uses
three modalities which cross the two systems of representations of verbal and nonverbal:
the closed captioning and the audio activate logogens while the visual pictures within the
news stories activate the nonverbal imagens. Therefore, only a few of Mayer’s principles
are applicable, as they are stated at this point in Mayer’s research (2001), to the current
study. The Modality Principle is predicated on the combination of two modalities, and so
isn’t applicable as currently constructed to the current study. However, the Multimedia
Principle is quite relevant, since the current study focuses on the use of more than one
modality to help foreign and second language students with comprehension. The
Contiguity Effect is relevant, since there are many instances of the closed captioning
presentation occurring asynchronously behind the narration and spoken text in the visual
modality. However, as shall be seen in the data, the slight asynchronous presentation can
also be used as a tool when attention is disrupted. The Coherence Principle is useful for
texts constructed as educational material, in which too many ‘bells and whistles’, as
Mayer calls them, can be distracting and in which a summary, with little extra
information, is more useful to students than longer passages. Material created for educational use, however, greatly differs from the material used in the current study: video material that was created for purposes other than solely education.
Lastly, Mayer briefly discusses the different uses of multimedia by different types of
learners, finding that multimedia works best for low-knowledge and high-spatial learners
(p. 182). This means that students who bring prior knowledge, and students who have
difficulty generating and integrating visual images (spatial ability) (p. 172) do not show
as much benefit as those who have lower prior knowledge and high spatial ability (those
who benefit the most). With regard to the current study, the two participant groups fall roughly into these two categories, in which the lower-proficiency English readers (the NNS) have the most to benefit from the availability of multimodal representations and the higher-proficiency English readers (the NS) will not show as much benefit from the additional modality of closed captioning.
2.3.3 Eye tracker and movement: Relevant terms and research
At this point, a few terms used in eye tracking research will be useful. Eye movements are composed of fixations, in which the eyes remain relatively still, and saccades, in which the eyes are moving to the next fixation. The duration of a fixation varies with the purpose: silent reading averages about 225 ms, while fixations during oral reading increase slightly to 275 ms (all mean fixation times are as listed by Rayner, 1998, which gives an overview taken from a number of sources). Scene perception averages fixation durations of nearly 330 ms, while visual search fixations are similar in duration to those of oral reading. Eye fixations in reading English average 200-250 ms and the mean saccade is about 7-9 letter spaces (p. 375).
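As a concrete illustration of how these terms map onto raw eye tracker output, the short Python sketch below shows one common way of reducing a stream of gaze samples to fixations with a simple dispersion-threshold procedure, treating everything between fixations as saccades. The sample format, the function name, and the threshold values are illustrative assumptions only; this is not the algorithm or the parameter set used by the eye tracker in the current study.

def detect_fixations(samples, max_dispersion=25.0, min_duration=0.100):
    """Reduce gaze samples to fixations with a dispersion-threshold method.

    samples: list of (time_in_seconds, x_pixels, y_pixels), ordered by time.
    Returns a list of (start_time, end_time, mean_x, mean_y) fixations;
    the samples falling between fixations correspond to saccades.
    Note: the 25-pixel dispersion and 100 ms minimum duration are
    hypothetical illustration values, not the study's settings.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow a window from sample i until it spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break
        xs = [s[1] for s in samples[i:j + 1]]
        ys = [s[2] for s in samples[i:j + 1]]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            # The points are tightly clustered: extend the fixation as long
            # as adding the next sample keeps the dispersion under threshold.
            while j + 1 < n:
                xs.append(samples[j + 1][1])
                ys.append(samples[j + 1][2])
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    xs.pop()
                    ys.pop()
                    break
                j += 1
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1  # Skip past the fixation; what follows is a saccade.
        else:
            i += 1  # Too dispersed to start a fixation here; slide forward.
    return fixations

With a 60 Hz tracker, for instance, a typical 225 ms silent-reading fixation would contribute roughly 13 or 14 consecutive samples to one such cluster.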
Also relevant to eye movement and reading research are the terms for the eye itself: the fovea is the central area of vision (about 2°) which is used in visual recognition, while the parafoveal region extends the visual field about 5° on either side of fixation but has reduced visual acuity (p. 374). The eyes move so that the fovea is placed on the desired part of the stimulus (fixated), and no new information is obtained during a saccade. Another useful term is regression, in which the eye moves back rather than forward while reading; regressions can be due to overshooting the desired point of fixation, to problems fixating the current word, to not understanding the text, or to the reader deciding to gather more information. The number of words skipped during reading has been found to depend upon features such as the type of word: content words are fixated roughly 85% of the time while function words are fixated less, at only 35% of the time (Just & Carpenter, 1983; Rayner, 1998).
Other features include the length of the word: Drieghe, Brysbaert, Desmet and De
Baecke (2004) found that word skipping is primarily a function of the length of the
upcoming word, so that short words are skipped more often than long words, and
constraint factors play a role as well (cf. Rayner & Well, 1996). The characteristics of the parafoveal word (often referred to as n+1) have been found to affect fixation time and word skipping. However, the results in the literature vary. While some studies have found that the frequency of n+1 does not affect the fixation duration of n (Rayner, 1998), others, such as Starr and Inhoff (2004), have found that orthographic information is obtained from n+1 and that n-1 and n+1 are functionally independent in their effects on saccadic programming (how far and to where the saccade will go). In other words, the perceptual
span, or the parafoveal region around the fixation point, is of variable use in different
studies that utilize different techniques, materials and experimental designs. Other
interesting and relevant findings have been that readers often do not fixate on the first and
last 5-7 letter spaces from the ends of a line of reading text, that the first fixation on a line
is generally longer than other fixations and that the last is shorter (Rayner, 1998, p. 375).
Lastly, there has been much debate about the relationship between cognitive processing
of reading and eye movement; this will be briefly discussed in the next section, along
with a few of the uses of eye tracking and reading.
This eye movement research, involving words skipped, saccade lengths, and features of words, is relevant to the current study in that a large body of research has been conducted using single lines and whole texts, but not with moving text. This study, then, is a first step in investigating the eye movements of readers using a presentation of moving text within a multimodal event to make sense of that multimedia text.
2.3.4 Tracking attention in multimodal environments
One way to explore the use of multimodal texts by learners is to use an eye
tracker, which makes it possible to transcend time and materiality: the data can be recorded and analyzed later. Eye trackers can be, and have been, used in a multitude of
environments for a variety of purposes. Duchowski (2002) gives a broad review of the
range of applications of eye tracking, besides reading and psychology. These include
diagnostic and interactive applications. Media research has involved eye-tracking
regarding scene paths by viewers and users of webpages, as well as to observe consumer
choices in print ads and television, while industrial engineering has explored eye
movements of pilots and drivers, and computer scientists have integrated eye trackers
into gaze-based communication systems (i.e. eye-typing or eye-pointing). Rayner (1998)
gives an extensive review of the use of eye trackers in reading research, including the mechanics of eye movement (discussed in section 2.3.3).
Eye tracking and reading have had quite a long relationship dating back to
Edmund Huey’s research published in 1908 and Emile Javal’s initial observance that eyes
do not move smoothly but jump, or saccade, across the page (Paulson & Goodman,
1999). Huey was the first to conduct an experiment that noted that eye movements do not
land on every word, frequently do not fixate on the first word on a line, and do not
frequently fixate on the last word either. Buswell and Judd contributed to the reading
research literature, and in 1937 added that readers “need to be made aware of the fact that
reading habits should be flexible and properly adapted to the purpose and the type of
material which is read” (rpt in Paulson and Goodman, p.2). Their assertion is quite
relevant today, 70 years later, as the participants in my study are reading text that is
moving across the page: indeed, they must be flexible and adaptable in their reading
habits.
To return to eye movements and research in multimodality, analyses of eye movements have been used to demonstrate the relationships between audio and visual linguistic processing. One relevant result is that context plays a role in processing the information (cf. Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995, as an example). When participants hear directions, such as "Pick up the candy",
and both candy and a candle are available objects, the eye movement latency is longer
than when only candy is available to choose. However, if the context is ambiguous, such as in the sentence "Put the apple on the towel in the box", and an apple on a towel is a choice (in which "on the towel" is a modifier) as well as a towel (which could be seen as a destination), more than half of the participants look at the towel as well as the apple on the towel, indicating that "people seek to establish reference with respect to their behavioral goals during the earliest moments of linguistic properties" (p. 2) but are affected by the context. Of relevance to the current study, with its multimodal presentation of
information, in which both audio and visual linguistic processing occur to some extent
(which varies with language proficiency), eye movements using different channels of
information input will be in evidence.
Besides the connection between context and spoken language comprehension, the
context of reading material also plays an important role in eye movements. Rayner and
Well (1996) looked at the contextual constraint, or the predictability of words, as a factor
in eye movements in reading within a model of reading in which there is moment-to-
moment processing regarding when to move the eyes. They found that high-constraint words are skipped more often and low-constraint words are fixated longer. Research Questions One and Two of the current study will explore and analyze fixation durations for the two participant groups; context is integral to looking at the overall patterns of multimodal use, and so the contextual constraint of words could be a factor in the fixation duration differences between the two participant groups. This makes sense in light of past research regarding bilingual readers and less proficient readers: less ability to predict the text leads to longer fixation times, while a greater ability to predict, possibly because of greater contextual constraint or background knowledge, reduces fixation durations.
Reading research is also being conducted that observes contextual constraints
used as reading strategies. Flurkey and Goodman (2004) discuss contextual constraint
within literary genres, particularly within children’s books, and identify it as an aid for
reading and transacting with a text. As the reader progresses through the text, and as the
predictability and familiarity with the vocabulary and the content increase, the reader’s
reading pattern changes (in the case of Flurkey and Goodman, the types of miscues
changed, which tracks the transaction via oral reading).
An eye tracker is only useful in reading and multimodal environments, however, if the underlying supposition is that cognition is reflected to some extent in the correspondence between eye movements and perception. Murray (2000) promotes the use of
eye tracking as an indication of cognitive processes, but warns about using eye movement
as an exact indicator. He uses the metaphor of a rubber band to explain the relationship
between eye movement and cognitive processes, saying that one of the problems is the
elasticity of the coupling between the eye and the mind, or the language processor: when
is the rubber band tight or saggy? In other words, when attention and perception are
involved, the exact timing of when an eye movement shows cognitive processing cannot,
or should not, be deterministically measured. This makes sense when transactional
theories are used as well: the reader (viewer) brings his or her background knowledge along and perceives, notices, or reads in a dynamic situation.
For example, Hegarty (1992) looks at the use of diagrams and written text by
participants using an eyetracker but she acknowledges that the comprehension process
has three inputs (the text, the diagram, and the reader’s prior knowledge) (p.428).
Duckett (2000) looks at the use of pictures and text by children reading texts constructed
for children, and finds that the illustrations are often used to make meaning and that even
children skip words, especially when illustrations carry meaning. In the above studies,
the reader is acknowledged as bringing a personal background to the literacy event and
this in turn affects his or her perception of the text (further discussion is included in
section 2.4 below).
Several studies have been conducted in which the multimodal use of textbook illustrations, diagrams, and written text has been explored. Mayer and his colleagues
frequently use this genre of learning a scientific process in their research regarding
multimodal presentations of information (Mayer & Sims, 1994; Mayer, 1997; Mayer &
Moreno, 1998; Mayer, Moreno, Boire & Vagge, 1999; Moreno & Mayer, 2000; Mayer,
Heiser & Lonn, 2001; Mayer, 2005) as well as research based on Mayer’s theories such
as Kalyuga, Chandler & Sweller (1999). Slykhuis, Wiebe and Annetta (2005) use an eye
tracker to observe students’ attention to different categories of photographs in a
multimodal PowerPoint presentation. They found that students are able to quickly view
and dismiss irrelevant pictures, to attend to the photographs that are relevant and provide
additional information, and that the eye tracker does indeed provide insight regarding
attention allocation over multirepresentational instructional materials (p. 519). The next section turns to the reading theories and research relevant to a multimodal literacy event.
2.4 Relevant reading theories and research for a multimodal literacy event
In light of the prominence that I give to the recipient in a multimodal event as
explained in Chapter One, I refer to selected reading research that places value on the integration of the reader and the reader's background knowledge in the interpretation and prediction of the message in the reader's search for meaning7. Since the multimodal
event used in the current study uses reading printed text as a possible modality choice,
7 In this section, as well as in the entire dissertation, the term ‘reader’ is used to refer to the person as the
recipient of the multimodal event. In a multimodal event, the person is the recipient; in a literacy event in which the message is presented via printed text (with or without illustrations), the person will be more specifically referred to as a reader.
sentence comprehension is also a variable to be looked at. Therefore, a combination of
two viewpoints regarding reading will be applied to a multimodal literacy event: a macro
view of reading and text comprehension and a micro view of sentence comprehension.
The macro view of reading in this study uses Goodman’s Transactional
Sociopsycholinguistic Model of reading (1994) in which there are dual texts: the reader
constructs his/her own text that is parallel and closely related to the published text. The
reader is highlighted as essential to meaning making in this model, and the reader’s
individual text “involves inferences, references and coreferences based on schemata that
the reader brings to the transaction. And it is this reader’s text that the reader
comprehends and on which any later retelling is based” (p. 1114). From the perspective
of comprehension by a second language learner, in which concepts and prior knowledge
could be based on multicultural experiences, the individual interpretation is vital in a
reading model. Indeed, in the interactive, interpretive process of reading, L2 readers
often bring unique features separate from their L1 counterparts, including differences in
vocabulary knowledge and syntactic knowledge of the target language. Their
expectations and predictions may also differ if their experiential and background
schemata do not help them in predicting the text (e.g. Carrell, 1988).
Goodman explains the functions of reading and defines the major purposes of
reading in the following categories: environmental, occupational, informational,
recreational and ritualistic. For the current study, environmental, or that of semiotic
reading of the literacy environment, and occupational reading, or reading that is required during the course of one's day in school or in a workplace environment, may be considered as the purpose of the literacy event. The participants in the current study are
students who are studying English, and in doing so they must read their textbooks; they
must also read signs, menus, notices, etc. in their environment, since their schooling is in a (mostly) English-language environment. In Chapter Six, I report on the participants'
responses to questions about their recreational reading, and their use of the combination
of recreational reading for enjoyment and multimedia such as movies and television.
One of the ways in which the Transactional Sociopsycholinguistic model
illustrates the dual text creation and the role of the reader’s comprehension is through
miscue analysis, in which an oral reading gives insight to the reader’s text creation, or
‘comprehending’ as a process (Goodman, 1994, p. 1118). Miscues, or observed oral
responses to print that do not match the expected responses, provide a continuous
“window on the reading process” (Goodman, 2003, p. 125). Through analysis of the
miscues, a readers’ construction of the text can be observed as their own meaning is
expressed.
The receptive process, according to Goodman (2003/1970), involves more than
just the opposite process of producing; instead, the reader (language user) samples,
predicts, tests and confirms using strategies which “yields the most reliable prediction
with the minimum use of the information available” (pp. 247-248). This reading process
is universal across languages according to Goodman, and so second language readers also sample and use graphic cues as efficiently and minimally as possible to read and comprehend, in addition to drawing on what they know about grammatical patterns and semantic aspects of their second language (p. 251). He promotes the idea that "proficient
readers make generally successful predictions, but they are also able to recover when they
produce miscues which change the meaning in unacceptable ways” (italics in original) (p.
250).
For the readers in my study, their strategies must be able to select the most successful predictions in a context in which the text presentation of closed captioning only allows for roughly six seconds8 of available re-reading time (regressions), and so the available cues for reading meaning must be selected quickly; this corresponds to the idea
that “reading requires not so much skills as strategies that make it possible to select the
most productive cues” (ibid). Goodman acknowledges9 that the patterns and rules
operate differently in each language, but that the metastrategies remain the same:
graphophonic cues, syntactic cues and semantic cues will help make successful
predictions with the text. Reading strategies, however, are still linked with reading
proficiency in a language and its writing system. For second language readers, not only
will it be easier to learn to read a second language when literate in a first language, but
there should also be some grammatical control in the second language. In addition,
semantic input will help supplement syntactic input, and reading instruction should involve the use of natural, meaningful texts (pp. 252-253). As shall be seen in
Chapters Four, Five and Six, proficiency affects the strategies and choices seen in eye
movements when the written text is moving.
8 This information will be presented with more explanation in Chapter 4, materials and methodology: the
average amount of time that a word is present on the screen, regardless of which line of presentation, is
roughly six seconds (from initial presentation of the word to when it scrolls off the screen).
9 cf. Goodman, 1998/1975, The Reading Process
Lastly, the idea of early prediction matching the visual cues is also promoted in
sentence comprehension during the reading process. Bever and Townsend (2001) refer to processes similar to Goodman's when they promote an Analysis by Synthesis model of sentence comprehension. In this adaptation of previous models, a hypothetical (or
predictive) sequence is first used by the reader to assign a likely meaning, which is
temporarily stored and begins the process of comprehension, after which the sentence is
mapped onto a syntactic structure (which derives a surface structure). The third step
involves a matching between the initial likely meaning and the derived meaning. The prediction based on a 'pseudosyntax' relies on "statistical properties of the ecology of language and specific lexical information" (p. 164). Familiarity with the probabilistic possibilities is the key for second language readers: the more familiar they are with the second language and the higher their proficiency, the more skilled they are at producing a pseudosyntax to assist them with their reading process.
2.5 Unique literacy event challenges: L2 reading
Readers who are reading in a less familiar language, for example a second or third
language, encounter extra challenges compared to those reading in their first. At this
point in the current study, when readers are discussed as reading in a language other than their first, it is assumed that the reader is a proficient reader in their first language and that their second language is not their most proficient language within which to conduct reading10. While Goodman discussed a few aspects of second language reading, Koda
10 This reference to proficiency within this particular study is also predicated on the placement within levels of proficiency for the NNS within their Intensive English Program (see Chapter 4). The NNS had not yet demonstrated reading proficiency in English at a level that would permit them to enter the University based on their TOEFL scores, but they were in the highest levels and so are considered intermediate-high to advanced English Language Learners. The NS were skilled at reading and were all peer tutors in the Writing Center at the university.
(1994) focuses on second language readers and specifically discusses second language
reading strategies. She points out that the strategies used in one language do not always
cross to other languages, since “linguistic features that are essential to sentence
comprehension and production vary from one language to another, and, importantly, the
specific skills and strategies involved in language processing are developed to capitalize
on the essential linguistic information" (p. 3). She also discusses the unique background of the L2 reader: often adequate oral proficiency is not attained before starting to read (unlike for an L1 reader), and exposure to print materials in the L2 is a limited experience (again, unlike for an L1 beginning reader, who is usually surrounded by print in the environment).
While Koda calls for more bottom-up research involving word recognition
processes in an L2, she also acknowledges that some strategies are universal while others are language specific, customized to accommodate the linguistic features of a particular language, and that processing strategies should be explored cross-linguistically (p.
18). She gives three characteristics of reading strategies (not limited to L2): deliberate,
goal/problem-oriented, and reader-initiated/controlled (2004, p. 205). What will be seen
with the participants in this study, during their reading of moving text, whose presentation rate they cannot control, is that while there may
be deliberate reading strategies (such as which lines to read, and whether to use
regressions or not), the strategies may be difficult to maintain as reader controlled. One
of the reasons why this may be is that past research with eye movements has shown that while the number of eye fixations does not vary widely between proficiency levels, the fixation duration for lower proficiency learners is considerably longer (Oller & Tullius, 1973). Strategies, therefore, may be consciously deliberate, but factors such as proficiency may override reader control. Whether this is the case remains to be seen once the data are analyzed in later chapters.
Koda, amongst other researchers, also explores L2 reading using word and
sentence processing, lexical organizations, and cross-linguistic analysis; these areas are
integral to L2 reading insights, as well as strategies, proficiency results and pedagogical
implications. Not all of these are discussed in the current study; however, future research using these lenses and the data elicited in the current study could very well touch on many of these topics.
Another topic worthy of future research with second language readers and the
data elicited in the current study is socio-cultural theory and the L2 reader. It is
mentioned here, since the study was placed in the general perspectives of whole language
and sociocultural theory, although no claims are made at this time. If the literacy event,
such as a Multimodal Multimedia Communicative Event using closed captioning, is
placed within the framework of a natural setting with authentic texts, similar to that of the
Whole Language perspective (Goodman & Goodman, 1990), then the individual is
placed within a larger framework similar to that explained in Chapter Two. Closed
captioning of media texts is not a reduced environment, but a transcription of a speech
event intended for a proficient L1 audience11; for L2 learners, then, media texts can be
used as linguistic input with optional choices for accommodation (either aural, visual or
print). If viewing a movie is desirable but the language is difficult for an L2 viewer, the desire is nonetheless there, which is important in that it positions the L2 viewer as an active, independent, motivated learner, in line with a whole language perspective on reading.
Freeman and Freeman (1992) have spent considerable time integrating second
language learning with a whole language perspective in classroom settings. They
emphasize that oral and written language can develop together and that second language
learners can learn and communicate through many different modalities. In this sense,
multimodality refers not just to the sensory modalities of the channels, such as aural, visual, haptic, etc., but also to the semiotic modalities that Kress & van Leeuwen refer to (see section 2.3). Whole language perspectives with natural authentic resources for
learning are very much a part of the initial premises for the current study. Multimodal
resources are everywhere, and highly accessible in the current technological and
informational era.
2.6 Individual preferences when using multimodal texts
The use of the different modalities to gain better comprehension of a text depends upon the text content and the reader's prior knowledge, but also depends upon the
11 The use of proficient in this sentence is debatable: American Sign Language is a different language than
English, and its syntax is not the same. Therefore, the proficiency in English of the primary audience
varies according to who is using it at that particular time of presentation. For many people, ASL is a
second language and English is their first, but for many others English is considered their second language,
and so the closed captioning is presented as reading in a second language.
individual’s preferences for getting information, or learning styles. Reid (1995) defined
learning styles as habitual and preferred ways of absorbing, processing, and retaining new
information and skills. Oxford’s (1990, p.8) definition of learning strategies overlaps
Reid’s definition since she says that they are “operations employed by the learner to aid
the acquisition, storage, retrieval, and use of information” and that learning styles are
“the general approaches students use to learn a new language…” and that these styles are
used in solving various problems (Scarcella & Oxford, 1992, p. 61). Learning strategies
are extensions of learning styles in these definitions. Lincoln and Rademacher (2006)
discuss learning styles of ESL students on the premise that “every human favors one or
more senses for learning” (p. 487) and explain learning styles categorized into sensory
modalities such as:
Visual learning styles: Learners prefer visual stimuli such as pictures, charts and graphs
Aural learning styles: Learners learn best by listening to stories, lectures, or audiotapes
Tactile learning styles: Learners learn best through hands-on learning
Kinesthetic learning styles: Learners learn best when presented with practical information and when they are allowed to be physically mobile.
The authors also mention that in any population, it is common to find a combination
of learning styles, and they cite research wherein cultural differences and genetic
differences may be factors in perceptual learning styles (p. 487). Kinsella (1995) echoes
this finding, that “everyone has a learning style, but each person’s is as unique as a
signature” (p. 171). For example, Parry (1996) found that the cultural backgrounds of
two groups of ESL readers (from China and Nigeria) used different reading strategies
with texts. She connected these strategies with their different language backgrounds and
different experiences with literacy. She does not convey that culture should determine
teaching practices, since individuals within the groups varied (p. 687).
Likewise, the current study does not attempt to make any strong claims about learning styles per se; rather, the focus is on the individual's background experiences and knowledge: learning styles are another facet of this premise, which also
includes foregrounding the individual in the reading process as seen in sections 2.4 and
2.5.
2.7 Chapter Two summary
In this chapter, I have drawn theory from psychology and linguistics, reading and literacy, and pedagogy. Intertwined among these have been previous research agendas with the focal point of closed captioning or subtitles. I have also given a brief history of closed captioning, including its present and a (hopeful) future as technology eases the burden of transcribing spoken texts into written form. I have also included brief
allusions to a few of the challenges that English as a Second Language learners encounter with reading and a change in orthography, as well as a quick review of
learning strategies and styles. This multifaceted review is necessary for a multimodal
research study, which is at the confluence of many different fields of research.
CHAPTER 3: METHODOLOGY AND MATERIALS
This chapter introduces the research questions now that the context and rationale
for the study and the framework have been discussed in Chapters 1 and 2. In Chapter 3, I
explain the data collection and the participants involved, the material selection and
production, and give an overview of the procedures and analysis used with the collected
data. In order to set the stage for the analysis of data in Chapter 4, a textual analysis is
presented in this chapter which includes an overview of the entire texts used as well as the division of the texts into smaller analytical units. The chapter finishes with a summary.
3.1 The Research Questions and significance of this study
This dissertation seeks to investigate the use of multiple channels of information
by viewers of multimedia texts that incorporate dynamic text (closed captioning). In so
doing, it seeks to find answers to the following questions. Each question will be discussed in a separate chapter.
RESEARCH QUESTION ONE:
1) How do the reading patterns of dynamic text by native and non-native speakers
of English differ in comparison to the reading patterns of static text?
1a) Is there a difference between native and non-native speakers of English in their reading patterns of dynamic text?
RESEARCH QUESTION TWO:
2) In what ways do the reading patterns of dynamic text change with the addition of the multimodal environment?
2a) Are there any similarities or differences between native and non-native speakers?
RESEARCH QUESTION THREE:
3) What are the relationships, if any, that can be established between the individual viewer's reading patterns and the self-reported background history related to multimodal use?
This dissertation combines major issues in reading and eye movement, as well as
pedagogy and second language acquisition. It is significant in that it uses eye movements
to track viewers' selective attention in multimodal environments as they work toward
comprehension of a text, as well as isolating eye movements while participants read a
presentation of dynamic text (closed captioning). It is also significant in that it uses
whole texts, unaltered in their format, and therefore supports conclusions about the use of
multimodal texts by readers outside of experimental settings. These text formats are
essential for the methodology, in which the reader is placed firmly in the design, along
with the reader's background knowledge as an influential component in his/her choices
and strategies for interpreting and predicting the multimodal text.
3.2 The conditions
There were three conditions used in this experiment; each condition included the
same two texts (airbag text and biker text). For this study, the text refers to the video clip
and all of the channels of information used: graphic forms of printed text in layout
designs, narrations and voice-overs, the printed text of the closed captioning, the sound
effects, etc. Primarily, the term ‘text’ encompasses these mentioned modalities, the parts
that make up the whole. However, at times the text is manipulated to use only certain
modalities, such as only the closed captioning without the visual and aural channels of
information; this is still referred to as the ‘text’. The conditions of the text presentations
were as follows:
Condition 1: Closed Captioning Only (or dynamic condition)
In this condition, the words appeared as closed captioning on a black background.
The appearance (size, rate of appearance, etc.) matched that of the original text. The
text moved across the screen.
Condition 2: Closed Captioning + Video (the original text)
In this condition, the appearance was completely multimodal. The closed captioning
was presented at the bottom of the screen while the video played visual and aural
channels of information.
Condition 3: Static Text (or static condition)
In this condition, native speakers of English were presented with both the airbag and
the biker text, although the text was manipulated and presented statically (i.e., with no
movement across the screen). The participants viewed between 12 and
14 lines of text on a screen and were able to self select when to advance to the next
screen of text by clicking the computer’s mouse. The font and size of the text were
comparable to the text presented as closed captioning in conditions 1 and 2.
Each condition is included in order to gather information regarding Research
Questions 1 and 2. Condition 1, or only closed captioning, is designed to gather a
specific micro-analysis of each participant’s reading patterns. It is also possible to gather
information across participants regarding reading patterns between the two participant
groups, reflecting their different proficiencies in reading English in a dynamic form.
Condition 2, or the original text, is designed to gather information regarding the selective
attentional choices of the participants, with similarities and differences again based upon
differing proficiencies with the different modalities available. The individual
participants’ data can also be compared with the data collected during interviews, which
ties in with Research Question 3. The rationale for Condition 3 is to provide a comparison
of reading patterns between static and dynamic texts for NS, i.e. high-proficiency readers
of English.
3.3 The data collection and the participants
There are three groups of participants involved in this study. Group One consists
of non-native speakers of English (NNS), while the corresponding control group (Group
Two) consists of native speakers of English (NS). Group Three also consists of native
speakers of English; however, the condition differed in that the text was presented
statically rather than dynamically.
The following section describes the overall data that was collected and the
resulting data that was used for analysis.
3.3.1 Types of data
Data was collected from seventeen participants in total (n=17, NNS=5, NS=12,
m=8, f=9, mean age = 22). Data was collected using the following methods:
1) eye tracker numerical data,
2) eye movement data imposed on the scene background (visual data),
3) retell interviews,
4) extended interviews, and
5) computerized learning styles inventories.
The above represents the total data collected; for this study, only a segment of this data
was analyzed, in correspondence with the initial research questions. However, it should
be noted that the data collected varied by group: for the NS participants in Conditions 1
and 2 (the control group), all data types (1-5) were collected, while for the four NS
participants in Condition 3 (the static reading group) only the three types of data indicated
in 1) through 3) were collected.
3.3.2 Eye tracker numerical data and eye movement visual data (data types 1 and 2)
Due to the differences between the individual participants and the nature of the
eye tracker, not all participants could be suitably calibrated (calibration relies on the
reflection from the cornea and pupil), and as a result some of the eye movement data
collected was unusable. This resulted in the removal of the eye tracker and eye movement
data of one NNS and four NS. Since the administration of Conditions 1 and 2 was
counterbalanced for presentation order (with video vs. without video) and text order
(airbag topic vs. biker topic), the remaining NS participants with satisfactory calibration
were narrowed down to four, so that the N for each subject pool would be equal and the
counterbalanced conditions would also be equivalent, with the addition of three NS in the
static reading control condition.
As a result, the participants in the main study constituted four NS of English and
four NNS, with three NS in the text presentation control group (static text condition).
The resulting NNS were all male students whose L1 was Arabic and who were attending
an Intensive English Program (IEP) at a Research I university in the southwestern United
States; all had been in the United States between 1 and 1.2 years learning their L2, with
ages ranging from 20 to 34 and an average length of English study of 6.75 years (starting
in middle school or high school). At the time of the study, all of the NNS participants
were enrolled in at least the high intermediate level at the IEP (two were in the high
intermediate level, one was enrolled in the advanced level, and one had just finished the
advanced level). The resulting NS in Conditions 1 and 2 were all female, with an average
age of 20 years, while in Condition 3 there were two females and one male (average age:
20.6). All NS participants were attending the same university mentioned above for
undergraduate degrees. All were volunteers who responded to an email over a listserv
and were compensated monetarily for their participation and time.
All participants, NS and NNS, were studying an L2. While the NNS were
currently studying English as a Second Language, all of the NS had studied a foreign
language, and many of them had studied an L3 and L4. The range of languages included
commonly taught languages such as Spanish, French and German, but also less
commonly taught languages such as Italian, Portuguese, Japanese, Latin, Hindi, and
Polish. Clearly, all participants were aware to some extent of the learning strategies
needed for learning a foreign language, and many of them seemed to pursue language in
general as a topic of interest and study beyond the language classes required in high
school or for undergraduate degrees.
Table 3.1 below combines the information presented in section 3.2 (conditions
and participants) and adds the order of conditions for each participant.
Table 3.1 Participants with order of conditions (1 = viewed 1st, 2 = viewed 2nd)
Text topic:    Airbags          Airbags        Bikers           Bikers
Condition:     Video+CC text    CC text only   Video+CC text    CC text only
NNS 2 1 2
NNS 3 1 2
NNS 4 2 1
NNS 5 2 1
NS 1 1 2
NS 2 1 2
NS 7 2 1
NS 8 2 1
NS9 1 (static) 2 (static)
NS11 1 (static) 2 (static)
NS12 2 (static) 1 (static)
3.3.3 Types of data: Interviews and strategies (data types 3, 4 & 5)
All of the data collected by means of the interviews and the computerized learning
styles inventory software was usable. This data includes the retell interviews after each
condition of the video text clip, the extended interviews about learning strategies,
language background, and media habits, and the final data collection of the learning styles
inventory using a computer program. The retell interviews and extended interviews were
transcribed using the soundtrack recorded on a video camera and the brief notes taken
during the interviews (N=13, NNS=5, NS=8).
Retell interviews were used with a dual purpose: 1) as a motivating factor for the
participants to engage in the task, and 2) as a series of open-ended questions with which to
gather data about the participants' comprehension of the texts in the various conditions, in
their own words. Forced-answer questions, such as yes/no, true/false or multiple choice,
do not give participants a way of talking about a subject that they may understand even if
they do not remember the specific words; for the NNS, then, the retell interviews provided
a comprehension test rather than a memory or vocabulary test.
The extended interviews were designed as a series of open-ended questions with
which to collect data about the language learning strategies and studying habits employed
by the individual participants, as well as cultural and family factors regarding language
learning (see appendix for the questionnaires). For the NNS, questions were included
about the cultural environment of using English with friends and family, as well as what
options are available, and used, regarding subtitling in English. The results of the
extended interviews could then be compared to the selective attentional choices made by
the participants in the multimodal condition (Condition 2). Research Question 3 (Chapter
6) unites the types of data listed in section 3.3.1, using: 2) eye movement data imposed on
the scene background, 4) extended interviews, and 5) the Learning Styles Inventory. The
last type of data collected, the Learning Styles Inventory, was used as an outside control
regarding the participants' learning strategies and habits.
3.4 Materials: Choosing the texts
The majority of past eye movement research uses carefully controlled variables,
with text that is manipulated at the sentence, clause or word level (see previous chapter
for greater detail). However, there are eye movement studies that use authentic natural
text for reading experiments. This is the methodology followed in this study, and it is not
without precedent. Radach & Kennedy (2004), in their review article about eye
movement research and issues, explain the benefits of each type of research, using
controlled texts versus authentic texts, and state that the latter "can be very useful for
exploratory analyses and for the generation of hypotheses to be subsequently tested in
more controlled experiments" (p. 8). Just & Carpenter's seminal eye movement research
(1980) used the first few pages of a textbook for their viewing text, and McConkie, Kerr,
Reddix, & Zola (1988) also focused on sentences in a natural text to analyze launch sites
(points of departure within or near a word). There has also been eye movement research
on reading from the EMMA lab at the University of Arizona in the Language, Reading &
Culture Department in the College of Education, which routinely uses authentic text (cf.
Duckett, 2001; Paulson, 2000; Freeman, 2001; O'Brien de Ramirez, 2007; Gerard, 2007).
3.4.1 The text selection
That being said, the natural texts for this study are two news stories from a community
PBS station located at a large university in the Southwest. DVDs containing roughly
eight days of recorded shows were given by the television station with permission to use
them in an experimental setting. Each excerpt was considered, and two were ultimately
selected for their content (e.g. novelty of information presented in relation to familiar
topics, concrete versus abstract meaning, contiguity between the presentation of the aural
narrative text and the written text), for their length, and for the number of captioning lines
presented (see Table 3.2 below). Both were originally aired at least three years prior to
the present study and shown on a local PBS channel; it is therefore highly improbable that
the texts had been seen before by any of the participants, native or non-native speakers of
English. Indeed, none of the participants reported seeing either story before.
Table 3.2 The video clips used as texts. Each is a news story.
Sequence title   Originally aired   Duration (minutes:seconds)   Number of lines in captioning text   Number of words (tokens) in captioning text
Airbags          11/4/2002          5:11                         167                                  698
Bikers           11/5/2002          4:58                         162                                  705
Another element involved in the selection of news stories from this particular
show is that the episodes are ‘live captioned’ in that there is a person (a trained captioner)
who types the written text based on the verbal text as it is broadcast. The typed written
text is then combined with the feed as it is broadcast. The closed captioning text
therefore contains a delay (between less than a second and up to a few seconds) in the
written presentation of the verbal message, and at times is incorrect in spelling and in the
accuracy of verbal parsing. These inaccuracies and incongruities in the text may disrupt
the participants' perception and comprehension, and break up the textual reading and the
eye movement. More importantly, these interruptions and pauses in the presentation also
create situations in which the eye movement reveals the choices the reader/viewer makes
to gain better comprehension of the storyline. Examples will be shown in the analysis of
Research Question 2.
Both texts were chosen because they contained what would most likely be new
information for the participants. Since the retell interviews are open-ended, and the
participants would be using background information about the content of the story when
perceiving and comprehending it, both of the chosen texts had some common background
context (car airbags, motorcyclists) as well as less common information (e.g. sodium
azide, the development of airbags, non-stereotypical motorcyclists). The content of the
texts would therefore contain some information that the participants would likely know,
yet the retell interviews would show whether the participants understood the new
information presented in each text. Those who didn't would most likely only be able to
talk about the general concepts of the stories without being able to draw on the more
obscure, new information.
The 'Airbag' sequence is a short news story about a university scientist who, as a
graduate student, accidentally poisoned himself while working with a chemical called
sodium azide. Years later he discovered that this same chemical was being used in
automobile airbags. He illustrates the mechanics of an airbag and explains that when an
airbag explodes the sodium azide is converted into a harmless chemical, but that if an
automobile is sent to a junkyard with an intact airbag there is a potential for leakage and a
resulting poisoning of the land and water. He also talks about how his interest in the
poisonous sodium azide was used by an author in a mystery novel (the complete transcript
is in the appendix).
The 'Biker' sequence is a short news story about an active motorcyclist who is
attempting to dispel negative myths about motorcyclists and has written a book about
motorcycle road safety. He says that the impetus for the book was when his son started
to ride motorcycles. Besides talking about the myth of 'Hell's Angels', he says that
many motorcyclists are people with regular jobs who ride as a hobby. The author is also
active in designing a road sign to keep automobile drivers alert to the possible presence of
motorcyclists.
3.4.2 Extension of the texts: Selected internal units of texts
Within the larger framework of each story, smaller units of text were selected and
isolated in order to get a micro view of the reading patterns as well as to be able to
compare reading patterns across the whole text. This follows the methodological
procedures for material selection described at the beginning of this section. The smaller
unit texts have been labeled as follows, with each unit's duration and position within the
larger text:
Table 3.3 Smaller units of text: identification and characteristics (label, duration, and
start time (+) within the whole text; times in minutes:seconds.milliseconds)

Airbags (whole text 05:11.30):
  A  Dramatic-remembrance   duration 0:13.80   start + 0:29.00
  B  Active-ingredients     duration 0:21.50   start + 0:57.30
  C  Vehicular-stream       duration 0:22.70   start + 02:14.00
  D  Propel-ant             duration 0:23.14   start + 02:36.70

Bikers (whole text 04:58.90):
  A  Smell-smells           duration 0:23.00   start + 0:17.00
  B  Father-son             duration 0:22.00   start + 0:49.00
  C  Fun-ride               duration 0:13.00   start + 01:47.00
  D  Cow-deer               duration 0:18.10   start + 03:54.00
As seen in Table 3.3, both larger texts were reduced to four smaller texts varying
between 13 and 23 seconds in duration each. These smaller units enable the exploration
of the similarities and differences between the reading patterns of the two groups of
participants over time and within smaller, more manageable conditions. Each smaller
text was chosen for its discourse units, or the completeness of the idea within a short text.
For example, Airbag Text A, ‘dramatic remembrance’, was chosen for its
sentence structure (three sentences in which each sentence has referents or repeats from
the previous sentence). The structure of the short text gives a relative amount of context
within its short duration: unit texts A and B follow a pattern in which sentence 1 sets up
the problem, sentence 2 explains what happened, sentence 3 finishes the idea unit,
whereas unit texts C and D follow a question/answer format. In addition, there is at least
one repetition of a content word (e.g. in Unit Text A: the lemma “poison” is repeated, in
the forms of "poison" and "poisoned"). This arrangement made it possible to track eye
movement reading patterns in context with recurring variables across the shorter text
units. Possible variables include repetition of words and forms, errors in the text
presentation (e.g. texts C and D), line movement timing (both texts A and B contain one
line that skips position two), and general patterns in a reader-text transaction. Choosing
each unit on the basis of its idea unit results in varied total durations for the units
(between 13 and 23 seconds), which requires the analytical results to be converted to
percentages in order to compare across time units, as seen in the analysis of Research
Question 2. Table 3.4 below illustrates the unit ideas for the smaller text units A, B, C
and D in the Airbag text.
Table 3.4 Example of sequenced unit texts in the Airbag sequence used for Research
Question 1, with text, type/token count and duration. The characters >> indicate a change
in speaker; line breaks are the original breaks in the text. The unit texts for the biker text
can be found in the appendix.
Text topic: airbag sequence
Text A
(dramatic
remembrance
sequence)
>> ACTUALLY, I POISONED
MYSELF.
I INHALED SOME OF THE POISON
FUMES OF SODIUM AZIDE IN
WATER.
IT HAD SUCH A DRAMATIC
EFFECT ON ME THAT I
REMEMBERED IT FOR A LONG,
LONG TIME.
Types: 27
Tokens: 33
Duration: 13
seconds
Text B
(active
ingredient
sequence)
>> I WAS READING SOME OF THE
SCIENCE ALERT KIND OF
LITERATURE AND SAW SODIUM
AZIDE WAS THE ACTIVE
INGREDIENT IN AIRBAGS.
AT FIRST I DIDN'T BELIEVE
IT.
I THOUGHT IT WAS A TYPO,
BECAUSE OF THE EFFECT THE
AZIDE HAD HAD ON ME.
AND IT WAS NOT A TYPO, IT IS
THE ACTIVE INGREDIENT.
Types: 34
Tokens: 55
Duration: 19
seconds
Text C
(vehicular
stream
sequence)
DANGER, WHAT HAPPENS DOWN
THE LINE WHEN THE CAR IS
JUNKED?
>> NOW WE'RE GETTING TOWARDS
A DECADE OR SO, GETTING
TOWARDS LIFE-TIMES OF
VEHICLES, NOW WE'LL HAVE A
STREAM OF VEHICULAR VEHICLES,
GOING TO THE SCRAP EACH
HOLDING ONE HALF POUND OF
SODIUM AZIDE.
Types: 37
Tokens: 47
Duration: 23
seconds
Text D
(propel ant
sequence)
>> WHY WHY WOULD SOMETHING
SO TOXIC BE USED IN
SOMETHING SO WIDELY USED AS
AIRBAGS?
I'VE TRIED TO SURMISE WHAT
THE ANSWER IS.
AZIDE HAS BEEN USED BY THE
MILITARY FOR A LONG TIME AS
A PROPEL ANT.
NOT JUST AS AN EXPLOSIVE BUT
PROPEL ANT IN EJECTOR SEATS
IN FIGHTER PLANES, FOR
EXAMPLE.
Types: 41
Tokens: 55
Duration: 22
seconds
3.5 Materials: Production of the texts
Since closed captioning is an encoded signal (see Chapter 2 for the background on
closed captioning versus subtitling), it poses challenges for researchers wishing to
manipulate the video text yet retain the original presentation and timing of the captioning
text. One of the problems encountered when creating the materials for this research arose
with the closed-captioning-only condition, in which the visual and aural aspects of the
video are removed and the participant is only reading the dynamic text.
First, for this research experiment, the selected video clips were individually
ripped from the original DVD, and placed into Final Cut Pro. Final Cut is a video editing
program in which video clips can be manipulated, sound can be added, and limited
special effects can be used to produce professional quality videos and movies. To make
the data collection easier and smoother when conducting the experiments, a calibration
screen lasting eight minutes in length was placed before each video clip, and a two
minute calibration screen was placed after each video clip. (The calibration screen is
standard in eye movement data collection: it is a screen with numbers on it that is used to
calibrate the eye tracker camera to the participant’s unique eye curvature and pupil-
109
cornea relationship. The EMMA lab uses a calibration screen with nine numbers on it).
In this manner, the DVD of materials could be divided into the four conditions, and then
marked with chapters to facilitate changes between the conditions. The participant would
be able to be calibrated, and when the calibration was completed, the DVD could be
forwarded to the next chapter at which the video clip in the appropriate condition would
be started, and upon finishing the post calibration could quickly be captured. With this
design, the experimenter would not need to lean over the participant to switch between
programs on the reader's computer and therefore creates more authentic conditions for the
data collection.
Creating the digital files in a DVD format allowed the two conditions for each of the
video clips to be manipulated while the closed captioning remained the constant factor
between the two. A black overlay was placed over the video
in each of the video clips so that only the closed captioning would be displayed in its
original presentation and timing, without the pictures in the background. The sound file
was also disconnected, so that no sound would be heard by the participant (see Chart 3.1
below for a graphic description of the overlays and sequencing). Once all the pieces
were in place, chapter markers could be inserted, and each sequence was then rendered.
Rendering entails the blending of the separate pieces into one complete, smooth video
text, or the final video product.
Chart 3.1 Example of the combination of video texts and files for each condition

Timeline segments: Countdown | Video clip | Calibration screen

Condition 1: Only closed captioning (video blacked out and sound disconnected)
  Countdown (5 4 3 2 1) -> >> closed captioning text -> end
  Addition of track + .scc closed captioning file

Condition 2: Video text plus closed captioning
  Countdown (5 4 3 2 1) -> sound -> end
  Addition of track + .scc closed captioning file

Length in minutes & seconds: 00:05, 05:29, 00:03, 08:00
Unfortunately, Final Cut Pro eliminates the encoding in Line 21: as soon as
anything is changed or rendered, all previously encoded text is wiped out. In other
words, the resulting materials for the two video clips in two conditions for this
experiment were no longer useful, since they no longer contained the closed captioning.
To solve this problem, the closed captioning file needed to be added back into
each clip sequence. SCC_TOOLS is a freeware software package that allows a user to
extract files from DVDs and CDs, including subtitle files and closed captioning files
(www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML). Using the
video clips that were created to make the sequences, SCC_TOOLS was used to obtain a
.scc file, or the closed captioning raw data with the exact timecode needed to present the
closed captioning text exactly as it was presented in the original video clip. Therefore,
once a video sequence was created for each condition (two video clips x two conditions),
the four sequences could be rendered; yet a different program was still needed, one that
could create a single DVD with the four sequences and also add the closed captioning
.scc file back to the video.
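To make this extraction step easier to follow, below is a minimal sketch (in Python, which
was not part of the original toolchain) of how the timecodes in a .scc file can be listed; it
assumes the standard Scenarist layout (a "Scenarist_SCC V1.0" header followed by lines
that pair a timecode with hex-encoded caption data), and the file name is a hypothetical
placeholder rather than one of the actual experimental files.

    # Minimal sketch: listing caption timecodes from a Scenarist .scc file.
    # Assumes the standard layout: a "Scenarist_SCC V1.0" header, then lines of
    # "HH:MM:SS:FF<tab>hex hex hex ..." (drop-frame files use "HH:MM:SS;FF").
    # The file name below is a hypothetical placeholder.

    def read_scc_timecodes(path):
        """Return a list of (timecode, number_of_hex_words) tuples."""
        entries = []
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            header = f.readline()
            if "Scenarist_SCC" not in header:
                raise ValueError("Not a Scenarist .scc file")
            for line in f:
                line = line.strip()
                if not line:
                    continue
                timecode, _, payload = line.partition("\t")
                entries.append((timecode, len(payload.split())))
        return entries

    if __name__ == "__main__":
        for timecode, n_words in read_scc_timecodes("airbag_clip.scc"):
            print(timecode, n_words)

Listing the timecodes in this way makes it possible to verify that the re-inserted captions
keep the original timing of the broadcast text.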
DVD Studio Pro, which is compatible with Final Cut Pro, is a program designed
for creating professional DVDs; it is also one of the few programs that will read and
insert Line 21 files (.scc files). The extracted files were added to the
sequences, resulting in an assembled sequence with closed captioning that contained not
only the video clip, but the calibration screen as well. Chapters were added so that the
sequence could be started at the calibration screen, and then once the participant was
calibrated, the remote control could be used to go to the chapter with the video text and
properly aligned .scc file. The advantage of this long process is that the background
video and sound files could be 'taken out', leaving a black screen, while the closed
captioning retained its original presentation form and timeline codes. The authenticity of
the text as a natural text was retained.
The computer used in the EMMA lab was a Gateway PC with an added ATI
Multimedia Radeon X1900 graphics card. The ATI card was instrumental in conducting
the collection of data for this experiment because it can read and display the Line 21
timecode and closed captioning data on the computer monitor. When a video clip that
has closed captioning encoded in its signal is played via the ATI card DVD Player, the
closed captioning text can be enabled and viewed. The ATI player can display the closed
captioning either at the bottom of the viewing screen, replicating a television viewing
experience, or in an external window. For this experiment, it was displayed at the
bottom, again replicating the more natural conditions of watching television with closed
captioning shown at the bottom of the screen.
The ASL 504 eye tracker is a desk mounted camera used to measure the subject’s
eye line of gaze. The resulting measurement is displayed as a cursor that is superimposed
on the image from the monitor. The combination of the cursor and the image source is
captured and can be used for later data analysis (see figure 3.1 below). The eye tracker
uses a pupil-corneal reflection to calculate the movement of the participant’s eye.
Participants are not restricted in their head movements by restraining devices, and so the
experience is more natural, although the more the participant remains still, the better the
results and capture of data and eye movements. Prior to starting each viewing session,
the participant was told to sit as if at the hairdresser's: try not to move too much, so that
nothing 'bad' gets cut, but not to stay so tense that it becomes painful. All participants
seemed to understand this analogy and stayed relatively still during data collection
without complaining of discomfort.
Figure 3.1 Example of cursor superimposed on the image on the Viewing Computer
monitor. The cursor can be seen near the nose.
The data for visual analysis was collected using Windows Movie Maker (WMM).
The video signal generated by the eye tracker regarding where the participant was
looking was added together with the video signal of the background (the pictures on the
monitor). WMM recorded this combined signal, resulting in a video file of the original
text with the 'bouncing ball' cursor indicating the location on the screen where the
participant was looking. The .avi file could then be slowed down to view the movements
measured in milliseconds. Numerical data was also generated by the ASL software
(using EYEPOS, EYENAL and FIXPLOT), which could be exported into MS Excel files.
This data included fixation order, fixation duration, the X and Y coordinates of the
fixations, and the timecode for each fixation.
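As a rough illustration of how such an export can be worked with once it is saved as a
spreadsheet or CSV file, the sketch below (Python) computes two of the summary
measures used in later chapters, the total number of fixations and the mean fixation
duration; the column name and file name are hypothetical stand-ins, since the exact layout
of the EYENAL export is not reproduced here.

    # Illustrative only: summarizing an exported fixation listing.
    # The column name "duration_ms" and the file name are hypothetical; the real
    # EYENAL/Excel export may label its fields differently.
    import csv

    def summarize_fixations(path):
        durations = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                durations.append(float(row["duration_ms"]))
        total = len(durations)
        mean_dur = sum(durations) / total if total else 0.0
        return {"total_fixations": total, "mean_duration_ms": mean_dur}

    if __name__ == "__main__":
        print(summarize_fixations("NNS3_airbag_fixations.csv"))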
3.6 The EMMA lab
All data was collected in the EMMA lab (Eye Movement and Miscue Analysis).
The EMMA lab is set up with two computers, an eye camera, and several monitors
(detailed in section 3.7.2). Added to the lab for this experiment was a Panasonic PV-
GS180 video camera.
Figure 3.2 EMMA Lab computer and camera set-up (labeled components: viewing
computer monitor, data computer monitor, eye tracker control box, eye/scene camera
viewer, computer with graphics card, data collection computer, camcorder).
3.7 Data collection – how it all fits together
In this section I explain the order of the data collection as well as the equipment used in
the lab for data collection.
3.7.1 Order of data collection
For the non-native speaker participants, flyers were handed out after class to students
enrolled in the high intermediate and advanced levels of English as a Second Language
classes. Participants signed a list if they were interested in participating, after which email
correspondence was used to set up meeting times. Native speaker participants were
contacted via a common listserv; once interest was indicated, email was again used to set
up meeting times. Upon arrival at the EMMA lab, each
participant signed a consent form and, after the experiment was finished, was paid
compensation for his or her participation.
There were three forms of data collection that included a numerical form (e.g.
X,Y coordinates of the eye movements, fixation durations, saccade length, etc), a video
form (e.g. a video file that combined the representation of the eye movement in the form
of a black ball imposed upon the background of the video that the participants were
viewing and reading), and a video camera form (e.g. the entire interview was
videotaped). The procedures for both groups were the same.
After the consent forms were signed and the different computers in the EMMA
lab were explained, each participant was given a practice reading (see appendix for the
text). The practice reading was used primarily to give the reader an example of the type
of text, to practice the retell questions, to explain the use of the symbolic characters that
indicate a change in speaker (>>), and to help the participant feel more at ease in the lab
setting. The practice reading was the text of another news story from the same local
PBS show as the two texts used in the experiment.
For conditions 1 and 2, the participants were given the two texts, airbags and
bikers, in alternating presentation patterns in order to counterbalance the presentation and
avoid the possibility that the presentation order was a variable in the reading patterns of
the participants. For example, half of the participants viewed Condition 1 (closed
captioning only) first, and half of those viewed the airbag text first. After the
presentation of each text, the readers were asked questions and retell protocols were
collected. Both groups of participants were asked questions about learning strategies and
language use in extended interviews and then completed an online questionnaire about
learning styles. For Condition 3, the participants were not asked the extended interview
questions or asked to complete the learning styles inventory, since that condition served
only as a control for reading patterns. The retell protocol questions and the extended
interview questions used for participant groups one and two are listed in Appendices C, D
and E.
3.7.2 Equipment used
An eye tracker (the 5000SU Eye-Tracker, product of Applied Science
Laboratories) was used to track a participant’s eye movements. In the EMMA lab, the
eye tracker camera sits on a desk near the viewing monitor, facing the participant. It
follows the participant's eye movement using the reflection from the pupil and cornea.
The camera uses this information to display the movement of the participant's eye as a
cursor on a nearby screen monitor (see Figure 3.1 for an illustration). The viewing
monitor text and the scene monitor with the moving cursor are
combined and captured into a movie file using Windows Movie Maker and an ATI
graphics card.
Besides the visual data of the moving cursor representing the gaze of the
participant’s eye, numerical data is recorded that includes ordered fixations in points
(using vertical and horizontal coordinates relative to the screen), and duration of the
fixation points measured in milliseconds (in which a fixation is measured as duration of
gaze of at least 100 milliseconds). Using software supplied by the manufacturer, this raw
data, in the form of X,Y coordinates, can be converted to a graphic representation to
create fixplots, or visual representations of the fixation points. The fixplot is overlaid on
top of a picture (.bmp format) of the visual text that the participant was presented with at
the time of the eye tracking capture, creating a representation of where, when, and for
how long the participant was looking at the screen (see Figure 3.3 below for an example).
Figure 3.3 Example of a fixplot. Size of circle represents the length of fixation, the line
between the circles shows the order and path of eye movement, as also indicated by the
order of the numbers.
The fixplots, and the eye tracker in general, allow me to follow the participants’ reading
patterns and choices for attention in a multimodal environment (Plass, et al, 1998, 2002)
and break through the restriction of the materiality of time (Norris, 2004): the eye tracker
records enough data to allow researchers to visualize when and where the participants
were looking.
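For readers who wish to produce a comparable visualization from their own exported
data, the following is a minimal sketch of how a fixplot of this kind could be drawn; it is
not the FIXPLOT routine supplied with the ASL software, and the input format (a list of
x, y, duration triples) is a simplified, hypothetical stand-in.

    # Illustrative sketch of a fixplot: circles sized by fixation duration,
    # connected in fixation order and numbered, optionally drawn over a captured
    # background frame. Not the ASL FIXPLOT routine; input values are invented.
    import matplotlib.pyplot as plt

    def draw_fixplot(fixations, background=None):
        """fixations: list of (x, y, duration_ms) tuples in fixation order."""
        xs = [f[0] for f in fixations]
        ys = [f[1] for f in fixations]
        sizes = [f[2] for f in fixations]           # circle area grows with duration
        fig, ax = plt.subplots()
        if background is not None:
            ax.imshow(background)                   # e.g. a captured .bmp frame
        ax.plot(xs, ys, linewidth=1)                # path of the eye movement
        ax.scatter(xs, ys, s=sizes, alpha=0.5)      # fixation circles
        for i, (x, y) in enumerate(zip(xs, ys), start=1):
            ax.annotate(str(i), (x, y))             # fixation order
        ax.invert_yaxis()                           # screen coordinates grow downward
        return fig

    if __name__ == "__main__":
        demo = [(120, 400, 234), (210, 402, 184), (300, 398, 150), (410, 405, 584)]
        draw_fixplot(demo)
        plt.show()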
3.7.3 Interviews: Comprehension questions as retell protocols
It is important to note that after viewing each text the participant was asked
questions which formed a retell protocol, one that had already been used with the
participants before starting the eye tracker when the participants were given a practice
reading and a practice retell protocol. In order to avoid biasing the participants, the retell
questions were constructed to start with fairly open-ended questions and then to proceed
to more detailed prompts following a procedure similar to Fisch, Brown & Cohen (2001)
in which they used a verbal measure of comprehension with children who had watched
educational television shows for children. The retell stimulus itself followed those used
by Graber (1990), who looked at the schematic processing used by viewers of televised
news stories, in which a general prompt was used: “Your friend comes in the room and
asks you what you’re doing/reading about. What do you say?”. The retell prompts
continued in order to solicit additional comprehension artifacts from the participants, e.g.
“Your friend says, ‘Airbags? I’ve got airbags in my car. What do you mean?’ or ‘Tell me
more!’”. As the interlocutor in this situation, I added phatic communication such as
‘hmmmm’ in order to help the participants feel relaxed in the setting of the lab and
I avoided using gendered pronouns in case they might have affected the participant's
response. In general, I followed the suggestions for story retelling outlined by Goodman,
Watson & Burke (2005), in that the reader should be aware from the beginning
that s/he will retell the story, that after the reading is finished the reader should retell the
story without interruptions from the interviewer, and that following the initial response
the interviewer is free to use other prompts to further expand the retelling while
remaining a “neutral but interested observer” (pp 22-26).
Following the "summary" retell question, participants were asked open-ended
questions about what they remembered about the text (e.g. "What else do you remember
about the reading?"). The participants were videotaped; the video provides additional
insight into the appropriation of the text in the form of gestures.
3.7.4 Interviews: Extended interviews
All of the NNS and the NS participants in conditions one and two were then asked
to stay for an extended interview about viewing habits and possible interactions with
language acquisition. The purpose of this qualitative collection of data is to inquire into
the choices that the viewer/learner makes to obtain better comprehension. It attends
to Research Questions 2 and 3 regarding the reading patterns of dynamic text by NNS
and NS and whether there are any relationships between an individual's reading patterns
and his/her background and experience with using multimodal environments as language
learning tools (see the beginning of this chapter). A set of questions was asked regarding
the participant's language background, study habits related to language learning, and the
use of subtitles and closed captioning as a language tool (see Appendices C and D for the
complete set of questions).
3.7.5 Questionnaire: Learning Styles Inventory
A Learning Styles Inventory (LSI) was administered via computer as the last task
of each session for participant groups one and two (those that viewed multimodal texts as
part of their condition). The participants were able to keep a printed copy of the results.
In order not to bias the participants in their viewing performance during the experiments,
the LSI was administered at the end of the session. This questionnaire software was
offered by Educational Activities Software as promotional software for use in this
experiment.
The rationale for the inclusion of the LSI was that the students would be
answering questions about their studying habits and their use of multimodal multimedia
texts for learning and for recreation. As described in Chapter two, learning strategies
include how a learner approaches learning. The use of the eye-tracker in this study gives
a detailed view of where a viewer looks to get information for comprehension of the
presented text: it is an inside view of the learner’s navigation through the various
channels of presentation that are available. The LSI is an overview of what the learner
self-selects as his or her styles of learning and strategies for gathering information for
learning and comprehension of information. The two together give insight into both ends
of the spectrum: the selective attention patterns of a specific text plus the overall self-
reported strategies with which the participant believes that he or she works best.
For the LSI portion of the experiment, forty-five 'easy to read' statements about
individual learning styles were presented on the computer screen and took about fifteen
minutes to answer (a sample of the questions is located in Appendix F). The inventory
statements are based on a 1-4 Likert scale (e.g. "I like to work by myself", 1 - Least like
me, 2, 3, 4 - Most like me). The results are compiled by the software and can be printed
out; from these results two categories, "Auditory Language" ("students learn from
hearing words spoken") and "Visual Language" ("students learn well from seeing
words"), were used. Since the multimodal channels of information present in this
experiment included the options of hearing spoken words versus viewing printed words
and viewing semiotically related pictures and video clips, these two categories were
selected and used in an analysis that looked for correlations between learning style and
the participants' self-reported viewing habits obtained
during the interviews. Chapter 6, Research Question 3, reports the findings of the
overlay of the extended interview questions about viewing habits and learning styles and
strategies, with the multimodal eye movements for each participant and the results of the
LSI regarding auditory and visual language learning preferences.
3.8 Problems: It’s all moving around
The nature of this study entails movement: the video text is moving, the words in
the sentences are moving, and the eyes are moving as the attentional gaze is moving.
Tracking eye movements is essential, in that it at least partially reflects the cognitive
processing of the reader in gathering information over multiple modalities, but this study
is also heavily based in the framework that world knowledge and context are important
variables in studying comprehension. They are variables that cannot be controlled for,
and are also essential in understanding the transaction and interaction between the viewer
and the text. A highly controlled experiment, with exchangeable exemplars, can be quite
insightful into cognitive processes, yet turns a blind eye to the individual. This study
instead embraces the uniqueness of the individual, while at the same time it looks for
patterns in the reading. The use of authentic text, with all of its inherent context, allows
the interaction between the viewer and the text to be explored, and combines the
preciseness of the eye tracker data with qualitative data gathered from the interviews and
recall questions. Some of these issues were explored in Chapter 2 with regard to previous
eye movement research, multimodality and closed captioning. They will be discussed
further in Chapter 5 as the data analysis is presented.
There is an assumption made here that the eye movements reflect the area of
attention of the viewer and that the attention is directed in one area or another in order to
assist in comprehension. This covers the visual modalities of graphic text and written
text in the video clips, but does not reflect the use of the aural channel to gather
information for comprehension. This is an aspect that is beyond the control of this
experiment, considering that an authentic text was used, but one that perhaps can be
explored in a future application of this study’s results for further insight into multimodal
attention in the comprehension process.
3.9 Summary
This chapter sets the stage for the next three chapters: Research Questions One,
Two and Three. While it seems lengthy to go into all of the production challenges of the
materials, I believe that by annotating these procedures, others who want to study closed
captioning and reading may find it an easier and less daunting task. I could never have
accomplished it without the support of the knowledgeable people at the University’s
media centers and various online forums. The breakdown and explanation of the unit
texts was also lengthy but necessary, as various analyses in the next chapter are
dependent upon them. I have also given explanations and illustrations of a few of the key
terms needed in the next chapter, such as the fixplot, and have described the participants,
their backgrounds, and the rationale behind the methodology used in this experiment.
CHAPTER 4: ANALYSIS OF QUESTION ONE
4.1 Analysis Overview
As seen in the previous chapters, multimodality can be quite useful for learners in
that it offers a variety of tools to use to access the medium and more opportunities for the
learners to make sense of the text. For this study, the multimodal medium is one that
contains video, plus a soundtrack with people speaking and background noises, as well as
a transcription of that soundtrack in the form of closed captioning. In order to be able to
research this particular multimodal medium, the video and audio were removed and the
reading patterns analyzed by observing eye movement patterns for Research Question
One. Later, in Chapter 5, while answering Research Question Two, I examine the
multimodal channels again via eye movement patterns, but they will be examined as a
whole rather than as parts. I first start with a section that discusses the rhythmic
presentation of closed captioning as a reading text, since this literacy event differs from a
traditional reading environment such as print on paper. I then enter into the research
questions posed by this study and the application of the data analysis. Using both
numerical data (including statistics) and visual data (shown in graphical form, or
fixplots), the following sections examine the total number of fixations for participants,
the saccade length for participants, participants' eye movement patterns regarding length
of fixation durations, the use of the line presentations for reading, and the use of the
pictures versus the written text to comprehend the story. Before the research questions
are tackled, however, I will explain the uniqueness of the texts used in this study and
their distinctive place between reading and speaking.
4.1.1 Notations unique to this study
This study is one of the first to analyze reading patterns with dynamic closed
captioning text with the minute details available to an eye tracker. Previously, eye
movements had been used to record the attentional movements between subtitles (non-
dynamic printed translations of the spoken words) (e.g. d’Ydewalle & Gielen, 1992) and
overall patterns of eye movements with deaf subjects and closed captioning had been
monitored and reported (e.g. Jensema, et al, 2000), but none so far with dynamic text and
NNS. This study is also one of the first to analyze the reading patterns of the readers’ use
of the written lines of text as presented. Therefore, in order to be able to discuss the lines
of closed captioning text, the following two figures should be helpful:
Figure 4.1 Position of line appearances of closed captioning.
The written text sequentially appears on the bottom row first (row 1), rotates up to row 2,
then to row 3, and then disappears from screen. The line of text appears in short
segments, generally word by word, in row 1. The now completed line then rotates up to
row 2 and then 3 and gives the reader a repetitive opportunity to read the line of text.
Figure 4.2 shows the movement up the screen row by row.
[Screen diagram: three caption rows stacked on the screen, from row 1 at the bottom to
row 3 at the top.]
Figure 4.2 Illustration of movement of text across and up the screen
4.2a The text appears across the screen, word by word:
      WORD1 WORD2 ETC
4.2b The entire row moves up as a unit as new text appears:
      WORD1 WORD2 ETC
      word3 word4 etc
4.2c The entire row moves up again, then disappears off of the screen as a new line starts:
      WORD1 WORD2 ETC
      word3 word4 etc
      word5 word6 word7
line. It should be noted again that with this type of dynamic text, the line of text does not
appear all at once, as is regularly seen in subtitles on a movie. Rather, the words show
up first on the left side of the line and continue to pop up until the line of written text is
finished and the next line is ready to start. Interestingly, as will be seen below, while
each of the texts (‘bikers’ and ‘airbags’) used as the materials for the experiment is
individual in its rhythm of line presentation and rotation, the averages of the two text
presentations are nearly the same.
Both texts have similar overall statistics for the average words per line of text, the
average time of line presentation (how long a line was on screen before a change in rows)
and the average total amount of time on the screen (the time seen in all three lines).
Table 4.1 below shows the averages.
Table 4.1 Airbag total text and Biker total text statistics (time is shown in seconds (s)
and milliseconds (ms))

              Average words per line   Ave. time of line presentation (:s.ms)   Ave. total time word is on screen (:s.ms)
Airbag text   4.35                     :01.86                                   :05.56
Biker text    4.13                     :01.90                                   :05.60
The two texts were chosen for a variety of reasons in addition to the similarity of the
statistical averages for the words (refer to Chapter 3). What was later analyzed, and
found to be different, was the presentational rhythm of the words and lines of text, which
will be explained in the next section.
4.2.1 Line presentation rhythm
The amount of time a line was presented for was calculated for each line (start of
one line to the appearance of the next line). This gave the amount for the entire line,
measured in seconds and milliseconds, regardless of the number of words or the
presentation of individual words within the line. While the presentation of words across
a line is not uniform, meaning that the words do not roll across from left to right in an
even pattern, neither is the movement of a line of text from one row up to the next. This
non-uniform appearance of the text is due to a number of factors, the first consisting of
the very nature of closed-captioning of a live source (for this study, the news show). The
closed captioning at times appears jumpy, or choppy, because it is a product of the
process of the captioner listening and typing, after which the caption data is returned to
the source of broadcast (the station), where it is combined with
the original audio-video signal. The result of live captioning is a jerky presentation that
at times can be awkward and non-predictive in its appearance. While live captioning
may be awkward and annoying, it can still be a useful tool for language learners and
should not be dismissed; it is an access point for many viewers to current events and pop
culture. The graphs below show the rhythm of the two selected texts in which each dot
represents the time that the line takes to complete its presentation in row 1 until it moves
up to row 2 (i.e. its time of line presentation as it first scrolls across the screen).
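As a rough illustration of how these per-line times and their distribution can be derived,
the sketch below (Python) computes the interval from the start of one caption line to the
start of the next and tallies the intervals into a simple frequency table of the kind
summarized in the graphs that follow; the start times shown are invented placeholders,
not the actual timecodes of the airbag or biker text.

    # Illustrative sketch: per-line presentation times (start of one caption line
    # to the start of the next) and a coarse frequency table. The start times in
    # the demo are invented, not the real airbag or biker caption timecodes.
    from collections import Counter

    def line_presentation_times(start_times):
        """start_times: caption line start times in seconds, in order of appearance."""
        return [round(b - a, 1) for a, b in zip(start_times, start_times[1:])]

    def frequency_table(times, bin_width=0.3):
        """Group presentation times into bins of bin_width seconds."""
        return Counter(round((t // bin_width) * bin_width, 1) for t in times)

    if __name__ == "__main__":
        demo_starts = [0.0, 2.0, 3.0, 5.0, 6.1, 11.0, 12.6]   # hypothetical
        times = line_presentation_times(demo_starts)
        print("per-line times:", times)
        print("mean:", sum(times) / len(times))
        print("frequency:", dict(frequency_table(times)))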
[Graph: "Airbag text: Rhythm (time per line)" — x-axis: line in text, y-axis: time in
seconds.milliseconds.]
[Graph: "Biker text: Rhythm (time per line)" — x-axis: line of text, y-axis: time in
seconds.milliseconds.]
Graph 4.1 Rhythm of line presentation for airbag text
Graph 4.2 Rhythm of line presentation for biker text
The above graphs illustrate the fluctuating times of line presentation of the closed
captioning, which in turn reflect on (and may indeed influence) the reader's ability to
predict the appearance of the following line of text. The rhythm of line presentation
ultimately affects how the reader may strategize about how quickly to attempt to read a
line as well as how long to linger over the meaning of the text up to that point. Closed
captioning presented in this live-captioning manner does not allow the reader to stop and
reread at leisure; the text is only available for re-
reading for a short time, and no skimming ahead or forecasting is possible since the text
cannot appear ahead of time on the screen. Therefore the reader must strategize how
quickly to read; if a predictable rhythm to the text can be discerned, then this should help
readers, particularly those who are slower readers such as NNS.
Graph 4.2 (the biker text, above) illustrates a pattern of line presentation with a
mean of around 2.0 seconds in the middle of the text, with the end and beginning
of the story showing greater variability in the presentation. In Graph 4.1, however, the
airbag text shows a different pattern: not one of settling around a consistent number, but a
pattern of alternation between a quick presentation of approximately 1.1-1.4 seconds and
a slower one of approximately 2.3-2.6 seconds. The graph below presents this information by looking
at the same data from the perspective of the frequency of line presentation times. The
graphic representations indicate a somewhat bimodal presentation style for the closed
captioning text timing for the airbag text.
Graph 4.3. Bimodal frequency of times for line presentations of airbag text
[Histogram: "airbag text: line presentation frequency" — x-axis: time of presentation
(approx. 00:00.3 to 00:04.6), y-axis: frequency of occurrence (0 to 16).]
There are various reasons for presentation rhythm as seen in graphs 4.1 – 4.3. The line
presentation rhythm is tied to the type of text: the narrative style of a news story, with
multiple speakers and points of view, gives us a clue as to the cause of the bimodal
presentation of the airbag text and the relative rhythm of the biker text. The longer line
presentations, of which both Graphs 4.1 and 4.2 show a few, generally correspond with a
change of speaker, which, in the case of a news text, is a change between the interviewer
and the interviewee, so a longer pause occurs with turn-taking in the conversation.
As an example of the variability in the live closed captioning text presentation
within the news genre, a small textual analysis is illustrated below.12 Later, Figure 4.3
will show the eye movement fixation patterns of a reader that reflect the rhythm of the
line presentation. As marked by the circle and arrow in Graph 4.2 above, a long line
presentation in the biker text is illustrated in the chart below by the text in line 21 and the
corresponding time per line in seconds, along with the change in speakers; the rows
before and after give context and complete the idea unit in the form of a sentence:
12 Please note that it is not only the news genre that uses roll-up closed captioning with this kind of text
presentation format. Talk shows also use the roll-up form. In other words, many hours of televised media
use this form.
Table 4.2 Biker text, lines 17 - 22

Line #   Text                               # of words per line   Time per line (seconds)   Speaker
17       >> IF YOU EVER GO THROUGH SALT     6                     2.0                       Interviewee
18       RIVER CANYON YOU CAN SMELL         5                     1.0                       Interviewee
19       SMELLS DRIVING THERE YOU CAN'T     5                     2.0                       Interviewee
20       SMELL IN THE CAR, YOU KNOW WHY     7                     1.1                       Interviewee
21       YOU RIDE A MOTORCYCLE.             4                     4.9 *                     Interviewee
22       >> FEARED BY SOME, ADMIRED BY      5                     1.6                       Voice-over by narrator
Line 21 is on screen for an extended amount of time compared to the lines before and
after it, giving an indication that the following line is likely to involve a switch in
point of view, or perspective, because of a switch between the narrator (interviewer) and
the interviewee. This switch is primarily marked on screen by a character symbol ‘>>’
(see line 22 in Table 4.2). The possible implications of this presentation style for reading
strategies are discussed in the next section.
4.2.2 Predictions in/of/about the text
Since the rhythm of presentation is uneven, and for the most part unpredictable, a
reader must use other available clues (and previous experience with this style of
captioning presentation) to strategize for more complete comprehension. These strategies
may include a pattern of a greater number of fixations (than with static text), or coverage
of rows of lines, since the text is repeated as it moves from one row up to the next. These
strategies will undoubtedly change between Conditions 1 and 2, since the subtraction of the
aural/visual channels reduces the clues that can be used for improving comprehension.
All of these will be looked at in the following sections.
But before leaving the idea of the rhythm of the text, I’d like to propose that when
sound is absent, and therefore the contiguity between the aural/visual channels and the
printed text channel is then null, as in Condition One of this study, the reader may indeed
use the rhythm of the timing of the presentation in order to predict changes in perspective
in the text such as that seen in Table 4.2 above. Condition One, as a reminder, is the
presentation of text in which only the dynamic text is present and for which the aural and
visual channels are absent.
Evidence of prediction and the use of the predicted rhythmic presentation of the
text is seen in the eye movement data as proficient readers wait at the end of the line in
row 1 for the next word to appear. For a few of the readers, as the length of time extends
and a new word fails to appear, the eye moves back to the left in anticipation of a new
line or uses the time to regress back in the text until a new line is presented. The latter is
evidenced in the fixplot (figure 4.2) below, as the reader is responding to a developing
pattern as he transacts with the text.
Figure 4.2 Fixplot of fixations 73 – 81 of NNS3 (fixations 80 and 81 are actually on the
following screen when the lines have changed rows), with the fixation duration lengths listed below.

Fixation #   Duration (s)
73           .234
74           .184
75           .150
76           .584
77           .217
78           .701
79           .267
80           .133
81           .133

Here, as this fixplot illustrates, the reader reads the end of the line in row 2 (SMELL
IN THE CAR, YOU KNOW WHY), then fixates for a longer duration on YOU in row 1
(fixation 76, 584 ms), and continues through to the end of the sentence to fixate at least
twice on MOTORCYCLE (fixations 77, 217 ms, and 78, 701 ms).
There are a few possible reasons for this reading pattern. Up to this point in the
text, the word BIKER or BIKERS has been used, and line 21 is the first expansion of the
word to MOTORCYCLE. This could account for the double fixation, as could the
length of the word; additionally, the longer duration at the end of the sentence could be a
result of the prediction that there will be extra time before the next sentence starts. The
reader moves backwards in the text in fixation 79, and as the screen changes with a line
rotation the reader once again reads YOU RIDE with fixations 80 and 81, before starting
the new line (22) in row 1 (not seen in the above fixplot).
The reader appears to be using the waiting time at the end of the line to fixate for
a longer duration on MOTORCYCLE before continuing to use the time to regress back to
a complicated text that involves five instances of YOU and three instances of SMELL,
one of which is a verb while the other two are nouns. The regression of the eye back into
the text is terminated once the line is reread and then continues to row 1 again as the next
line has already appeared (fixations 82-83, not shown).
In a way, this waiting period at the end of a line makes sense. Prediction of turn
taking in spoken language occurs regularly. As Cameron notes, “people who are
listening to someone else’s speech can use their knowledge of the possible unit types to
project the end-point of the turn currently in progress” (p. 90). When reading is framed in
terms of prediction of the text, and listening to speech is framed in terms of projection of
the turn, then closed captioning, or the reading of spoken language, is unique in that it may
use a combination of the two. Below is a graphic representation of the combination of the
forms in reading closed captioning:
Figure 4.3 Cross strategies: Prediction and projection of the reading of spoken language.
[Diagram: Reading → predicting; Speaking → projecting; the two combine in Closed Captioning and Subtitles.]
This short section has explored the two texts used in this study using a textual
analysis of the presentation of the individual lines of the two news stories. The beginning
of this analysis is applicable for both Condition 1 and 2, while the latter half focuses on
Condition 1 in which the only clues that the reader can use for comprehension are the
words as they are presented. The following sections will explore patterns of reading
using the eye tracker to look at other means of gathering information for textual
comprehension when the text is dynamic in its presentation.
4.3 Analysis of reading patterns
The remainder of this chapter is devoted to using the data gathered to answer the
Research Question One as laid out in Chapter 3. The nature of the dynamic texts
provides unique opportunity to explore reading patterns by exploring the two sources of
data: numerical and video. As a reminder, the numerical data is collected in the form of
fixation duration times, X,Y coordinates of fixations, saccade degrees, etc., while the
video data is a representation of the fixation point in the form of a small black dot that is
superimposed on the reading/viewing text. Chart 4.1 gives a graphic explanation of the
interplay between the forms of data used to answer Research Questions One, Two and
Three, which are discussed in Chapters 4, 5 and 6 respectively.

Chart 4.1 Illustration of the data type, use and relationships in this study between
Research Questions 1, 2 & 3
[Chart summary:
Question 1 (reading patterns of unimodal dynamic text): fixation durations (NS static, NS dynamic, NNS dynamic; numerical data); saccade/fixation totals (NS static, NS dynamic, NNS dynamic; units A & D; numerical data); line usage (NS static, NS dynamic, NNS dynamic; units A & D; video data); saccade degrees (numerical data).
Question 2 (viewing patterns of multimodal dynamic text): overview of lookzone use by participant (numerical data); comparison of fixation durations when reading (whole text; numerical data); reading or noticing? consecutive fixations in LZ 1 (numerical data); multimodal analysis of line use when reading in the A/V condition (units A & D; video data).
Question 3 (small case studies of eye movements and personal histories): look zones for Tariq, Farid, Sarah and Elena.]
4.4 Analysis: Question One
The present study contains the opportunity for exploration on many fronts: it
lends itself to several different frames of analysis. Question one, “How do the reading
patterns of moving, or dynamic, text differ in comparison to past research of the reading
patterns of static text?”, including “are there any differences between NS and NNS”, is
answered using several different analyses.
After a description of the types of data that were collected and their relevancy to
this question, the first section describes the fundamental differences between reading
static and dynamic text presentation formats. As discussed in section 4.2.1, readers of
closed captioning must adapt their reading patterns to a timed presentation in which the
text appears and then disappears without the option of further regressions. A second
primary difference concerns the nature of the presentation form: dynamic closed
captioning is generally presented using three rows, and the use of these three
opportunities for reading is explored in section 4.9. First, I explore the similarities
between the two groups of participants and the two conditions of text presentation
through analysis using the following five descriptive tools, for which the rationale is
explained in section 4.4.1: 1) total fixations, 2) saccade time and saccade degree of
fixations, 3) fixation durations by time category, 4) comparison of the reading patterns in
the different text presentations, and 5) use of the lines of the text.
As a reminder of the different conditions in which the printed texts were
presented, figure 4.4 illustrates Condition One, dynamic text in a unimodal condition, and
comparison Condition Three, static text in a unimodal condition.
Figure 4.4 Comparison of Conditions 1 (CC) and 3 (SCC).
Condition 1 (CC): Unimodal, dynamic text
in which the printed text moves across and
up the screen.
Condition 3 (SCC): Unimodal, static text in which the printed text does not move.
4.4.1 Reading and the use of eye trackers
What makes this study challenging is its methodology: it is a combination of eye-
movement monitoring criteria from across disciplines. The analysis of reading patterns
differs according to discipline, theoretical views on the reading process, and the focus of
the study13. While Linguistics and Psychology often analyze the patterns of eye
movements in search for links between eye movement and cognition, including language
processing, researchers in the Education, Reading and Literacy fields tend to look at
reading patterns regarding the phenomenon of reading itself as a language process.
13 See Duchowski, A. (2002) for a general overview of the use of eye tracking in various fields and
applications, and Rayner, K. (1998) for a more specific overview of literature in which an eye tracker is
used to explore the reading process.
Cognitive studies of reading that use eye trackers often use a tightly structured visual
environment, controlling the variables of a single word, a single sentence (sometimes
referred to as ‘continuous text’) or a visual task, which generally limits the presentation
of text to one horizontal line across the viewing monitor in an environment much less
complex than one of a naturally occurring text (cf. Radach and Kennedy, 2004, for a
comparison).
However, eye movements across a complete text, with multiple lines of text in a story,
have also been recorded in previous research. In these types of experiments, both horizontal
and vertical eye movements are recorded, allowing the presentation of a whole page of
text on a screen. One such reading research field that utilizes the recording of eye
movements across a complete page of reading text is that of (Eye Movement) Miscue
Analysis (EMMA), in which the changes that readers produce in the verbal text as they
read a whole text out loud give insight into the reader’s perception of the printed
story. These miscues, or the differences between what is said out loud and what is
printed on the page, give insight into the reading process with the idea that reading is the
making of meaning of a text by a reader (Goodman, 1996). The combination of eye
tracking and the reading of longer texts (ideally stories) in these labs offers a window into
the reading process as a whole, versus the micro look at reading and the language process
as accomplished in many eye-tracker labs (cf. Rayner, 1998). Both types of research give
valuable insight into the reading process, and while this study refers to both, it is based
upon the transactional process of making meaning in reading and viewing texts, and uses
the lens of a research tradition in which context is vitally important.
The use of whole texts that are not manipulated for variables, and so in their
natural authentic state, is a rare methodology in the cognitive field of reading research
and more often found in the eye movement studies of research in the literacy field.
Therefore, in conjunction with the premise that reading is a transaction between the
reader and the whole text, smaller numbers of participants and in-depth analysis of not
only their eye movements but also their retellings of the stories are important (see the
Methodology Chapter for more information on the EMMA lab). As the participants for
my study were from two different cultures and language groups, with different
proficiencies in reading in English, their interactions with the text reflect their
individuality, yet also show trends of similarity between the groups. The data collected,
then, cannot be analyzed and presented in the manner of the majority of the eye-
movement literature; instead it is analyzed for patterns and trends of reading and viewing,
as a contribution to future research and to expanding the field of eye tracking, reading and
literacy, and most importantly the use of closed captioning and dynamic text by readers.
Therefore, the remaining sections in Chapter Four use the following criteria for
analysis: 1) the comparison of the numbers of fixations which may indicate the
proficiency of the reader along with 2) the saccade length which may indicate how far the
readers were jumping in between fixations and may give an insight into the control of
their reading patterns and text predictions. In addition, 3) fixation durations have been
split into time categories in order to look at the differences between the proficiency levels
and the influence of the text presentation format on reading patterns and reader strategies.
4.5 Reading patterns of fixations
The first topic of analysis, described below, is the fixation patterns of the NS and
NNS participants (who are also referred to as ‘readers’ and ‘viewers’). Eye movement
studies have indicated previously that fixation durations are connected to general
participant characteristics, although fixations and the resulting statistical analysis are
often dependent upon factors of the text such as comprehension difficulty, typographical
variables (quality of print), line and letter spacing, as well as general characteristics of the
writing system being used (cf. Rayner, 1998).
One of the overarching claims of this study is that readers have strategies for
improving their comprehension of a text and that these should be evident in a multimodal
situation. It should also be evident in the reading pattern differences between the
proficiencies of readers (highly skilled versus less skilled readers in English) as well as
the reading pattern differences when readers of comparable reading skill must
comprehend a text in different presentation styles (dynamic versus static). O’Regan has
contributed much to research on eye movements over the past 20 years, much of it with
focused, single word exemplars, observing oculomotor control and perceptual span
(1992). However, in a 1992 article, he writes about reading strategies, or “exploration
strategies”, that readers may or may not employ while reading which depend on the text
at hand and the choice of reading speed (p. 352). The idea that text-dependent strategies are
used fits nicely with the present study, since readers are presented with two different
conditions with which to work at obtaining comprehension, and strategies must be
employed to understand enough of the story in order to be able to retell something about
it afterwards. In question one, the reading speed is relative to the individual; the
individual varies in proficiency at reading English by experimental group. While the NS
can be grouped as a high proficiency group in reading their native language of English,
the NNS are grouped in a lower proficiency group, as they are intermediate/high L2
learners and their second language of English employs a different orthographic system
and reading direction (from their first language, Arabic).
In order to discuss the reading patterns of existing literature, and more specifically
that literature which concerns whole texts (vs. single sentence or word reading), common
points of reference and measurements must be decided upon. Eye movement reading
research with whole texts tends to use a variety of measurements to give an overall
picture of the readers’ movements using data points such as single fixation duration,
number of fixations on a target word, saccade length, etc. (Rayner, 1998, p. 378). Of
these options, selected measurements are used in this study, as well as measurements
unique to this study and its analysis of moving text.
As explained in Chapter 3, Materials and Methodology, my study uses two news
stories selected according to certain criteria of length, topic and genre, in which smaller
unit texts were selected for analysis. With the studies of Just & Carpenter (1980), in
which whole, unadulterated texts were used, and from which excerpts were also analyzed
(see also McConkie, Kerr, Reddix, & Zola, 1988) as precedent, and with the support of the
research in the literacy field, this study follows suit in using excerpts of selected texts to
analyze for eye movement and reading patterns. The use of smaller segments of text
from within the whole text to look at targeted areas selected for theoretical reasons is also
an alternative methodology when using whole texts (Rayner, 1998; McConkie et al,
1988; Radach and Kennedy, 2004). In order to compare the performances of the different
participants, measurements have been analyzed for the entire reading of the text, as well
as diachronically via the smaller chosen texts which are presented later in this chapter (in
section 4.9.3).
4.6 The whole text analysis: Number of fixations
One of the analyses used to observe reading patterns is to gather information
about the number of fixations for each reader. Total fixation numbers indicate how many
times an individual reader fixated on a text: participants with higher reading proficiencies
should be able to better predict the text and skip more words than those with lesser
proficiencies, who tend to fixate more words and have more regressions, which will be
represented by a higher number of total fixations.
Using the EYENAL software with the eye movement data collected by the eye
tracker camera, the total number of fixations for each participant in Condition One was
calculated. The data results represent a reading pattern over a whole text and,
importantly, one in which the text is presented at an identical rate for each participant.
The participants cannot speed up or slow down the presentation of the information, nor use
their regular reading practices and strategies for a canonical text presentation such as
when reading a page of a book. The table below illustrates the total fixations for the
dynamic texts for each reader (a between participant analysis of the dynamic reading
condition):
Table 4.3 Total number of fixations for each participant in the dynamic reading
condition (One)

Participant   Total # of fixations        Participant   Total # of fixations
NNS2          525                         NS1           137
NNS3          538                         NS2           239
NNS4          404                         NS7           369
NNS5          340                         NS8           387
A Fisher’s Exact Test for small n was used to calculate whether the difference between the NS
and NNS reading patterns regarding the total number of fixations was significant. The
result (n=8, df=1, p=.07) was only marginally significant. However, the trend can be seen
in Graph 4.4 below, in which the differences between the two proficiency groups (NS and
NNS) are highlighted by circles.
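Purely as an illustration of how such a small-n comparison can be computed (the dissertation does not report how the counts were arranged into a contingency table, so the median split below is an assumption and will not reproduce the reported p-value), a minimal sketch using SciPy follows:

# Hypothetical sketch: Fisher's exact test on a 2x2 table built from the
# total-fixation counts in Table 4.3. The median split used to form the
# table is an assumption, not the procedure used in this study.
from statistics import median
from scipy.stats import fisher_exact

totals = {"NNS2": 525, "NNS3": 538, "NNS4": 404, "NNS5": 340,
          "NS1": 137, "NS2": 239, "NS7": 369, "NS8": 387}
cut = median(totals.values())

def group_row(prefix):
    values = [v for name, v in totals.items() if name.startswith(prefix)]
    above = sum(v > cut for v in values)
    return [above, len(values) - above]   # [above median, at or below median]

table = [group_row("NNS"), group_row("NS")]
odds_ratio, p_value = fisher_exact(table)
print(table, round(p_value, 3))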
Graph 4.4 Total number of fixations for each participant in Condition One: Dynamic CC
text. The circles represent the general trend area for each group.
In a comparison across conditions, the following graph shows the total number of
fixations for the NS in Condition One (dynamic text only) and the NS in Condition Three
(static text only). Condition Three was included in the experimental design in order to be
able to compare reading patterns when the same text was presented in two different
conditions: one in which the text is moving across the screen with a maximum of three
lines showing at one time and one in which the text isn’t moving and is presented with at
least fourteen lines of text showing at one time and in which the reader can take as much
time as he or she likes to read the text. The prediction that the participants’ different
reading strategies would be observable with the different reading speeds is not evident in
the graph below; in fact, the NS did not present significantly different reading patterns
between the two texts when the total fixations were measured and compared. NS1, NS2,
NS7 and NS8 read a dynamic version of the text, while NS9, NS11 and NS12 read a
static version of both text sequences. The text sequences do not appear to act as a
variable in the data.
Graph 4.5 Total fixations for the whole text for NS in condition 1 (dynamic presentation)
and in Condition Three (static presentation) (the dotted line represents the average
number of fixations for all participants).
The general trend observed here is that the NS do not significantly change their
reading patterns regarding the number of fixations as a function of the textual
presentation (moving or static). This was surprising. Regardless of the reading speed or
the ability to control the presentation of the text, the higher proficiency readers used a similar
number of total fixations. Note that this figure does not represent where or for how long
the NS fixated, but only how many fixations they made. At this point in the reading pattern
observations, the reading strategies used by the higher proficiency readers do not change,
whereas there is a trend of difference in the reading strategies between the higher and
lower proficiency groups. Later in the chapter, alternative views of reading patterns
emerge that do show a different reading strategy with the different text presentations;
however, these patterns are illustrated by the words that are fixated.
4.7 The whole text analysis: Fixation durations
Using the software EYENAL®, the numerical data collected regarding the eye
movements of each subject were exported into MS EXCEL® which could then be sorted.
The numerical data includes a time log, fixation duration and order, interfix duration
(saccade latency), interfix degrees (length of saccade), pupil diameter, the X and Y
coordinates of the fixation and any data loss (e.g. loss of calibration or a blink). For the
purpose of my study, which is to research reading patterns with moving text and
multimodal focal attention, the data that was most useful proved to be fixation durations,
fixation numbers (order of fixations), saccade degrees, and X, Y coordinates.
The first data findings that I will discuss regarding fixation durations are those in
which I used the numerical data to determine the number of fixations per reading for each
participant. I then divide the fixation durations into categories of 100 ms so that trends
and patterns can be observed (100 ms is the smallest fixation setting used for the
calculation for this analysis section).
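As a hypothetical sketch of this binning step (the study used EYENAL and MS Excel; the Python code, the sample values and the handling of the 100 ms floor below are illustrative assumptions only):

# Hypothetical sketch of the 100 ms categorization described above, plus the
# conversion to per-participant percentages used later in this section.
from collections import Counter

def bin_label(duration_s):
    """Return a 100 ms category label, e.g. 0.234 s -> '0.2-<0.3'; 1 s and above -> '>1.0'."""
    if duration_s >= 1.0:
        return ">1.0"
    tenths = int(duration_s * 10 + 1e-9)          # floor to the nearest 0.1 s
    return f"{tenths / 10:.1f}-<{(tenths + 1) / 10:.1f}"

def bin_counts(durations):
    """Count fixations per 100 ms category, ignoring anything under the 100 ms floor."""
    return Counter(bin_label(d) for d in durations if d >= 0.1)

def bin_percentages(durations):
    counts = bin_counts(durations)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

# Example with invented durations (in seconds):
sample = [0.15, 0.22, 0.19, 0.34, 0.58, 1.2, 0.11]
print(bin_counts(sample))
print(bin_percentages(sample))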
4.7.1 Research Question One, part one
Reading patterns of static text tend to adhere to the following patterns: readers
with higher proficiencies have shorter fixations than less proficient readers (Oller &
Tullius, 1973); readers with lower proficiency in reading also tend to have more fixations
per text (ibid); readers with lower proficiency also tend to skip fewer words and have
more regressions than readers of higher levels of proficiency in reading (Ashby, Rayner
& Clifton, 2005). The current study also looks at proficiency patterns of the participant
groups, although the proficiency ratings are based upon proficiency in a language,
English: for participant group one English is their second language, while participant
group two has English as their first language. Therefore, it shall been seen in Research
Question One if the differential patterns found in the eye movements between high and
low proficiency readers in previous research studies will also be found when high and
low proficiency is based upon experience in a language.
Regarding the fixation locations, in general most readers do not fixate on the very
edges of the lines of text, instead initially fixating 5-7 letters from the beginning and
leaving the line of text about the same amount of distance from the end (Rayner, 1998).
In addition, bilingual readers tend to take longer to read texts (Oller & Tullius, 1973).
Much of the above research has been conducted both in eye movement studies using single-line
texts and in whole-text research.
What differentiates this current study from previous studies of reading is that the
texts in this study are moving14 and the participant reader does not have the control to
stop or change the speed of the text presentation, or to use many of the reading
techniques used by proficient and learning readers such as skimming or scanning the
text15. Instead, the text is presented line by line, without concern for phrasing or ease of
reading, and so the line breaks according to the length of the line (66 characters) rather
than by any other formatting convention (although, it should be noted, a new sentence
starts on a new line). The reader must adapt to this presentation style in order to transact
with and comprehend the text and the story.
14 Much eye movement research with reading has been conducted using a procedure with a continuous
window, or text that changes as the eye movement nears it. While the text is technically moving, in a sense,
the participant is still reading at a normal pace for that individual.
15 To clarify, there has been some research on closed captioning, and some eye movement research on
closed captioning or subtitles, but none with the depth of detail analyzed in this study. See Chapter 2.2 for
the background literature on closed captioning and subtitle research to date.
The data collected in this study give some insight into the actions of the readers as
they attempt to comprehend the stories presented in such a way. In the following
sections, the fixation duration trends will first be analyzed between texts, and then
between participants. Throughout the analysis, it was hypothesized that fixation duration
reading patterns of this metered style of presentation of information would differ from the
fixation duration patterns that are seen with texts presented in a more traditional manner
(as a whole).
4.7.2 Fixation duration trends across conditions
In order to compare the patterns that readers tend to exhibit with a static text and
the patterns exhibited with a moving text, general trends in reading the two selected news
texts in both a static condition and in a dynamic condition were compared. This third
condition (referred to as SCC, or static closed captioning text) served to confirm that the
data results were not a variable of the texts themselves, but of the readers’ interactions
with these texts. An analysis of the similarities and differences in eye movement patterns
exemplifying reading patterns is described in section 4.10.
The static text was presented on the viewer screen using MS PowerPoint in which
the text type and size of font in the SCC condition (Static Condition using the closed
captioning text) was approximately the same as that in the CC condition (Dynamic
Condition using the closed captioning text), Lucida Sans Typewriter font, size
20 point. In addition to the variable of the condition of static versus dynamic, the number
of lines present at one time on a page differed: in the static condition 12-15 lines of text
were viewable at once and there was no movement of text until the participant clicked the
mouse to move to the following screen of text. In other words, the SCC condition was a
self-paced style of text presentation.
Four participants who matched the NS control group read both topic texts
(‘airbags’ and ‘bikers’) given as a static text and gave retellings after each eye movement
reading using the same retell prompts as in Condition One and Two. During analysis of
the data collected, one of the participants’ eye movement data was excluded because not
only was the calibration inconsistent, but he read so fast that his data tended to be in an
outlier status.
In order to observe the reading patterns in terms of fixation durations, the
fixations were categorized according to duration times of 100 ms, starting at the smallest
unit of fixation (100 ms) up to fixations above 1sec. The text topics have been combined
(‘airbags’ and ‘bikers’), giving four readings each for the NNS CC condition, the same
number for the NS CC condition, and six readings for the NS SCC condition. The results
are presented in graphic form below, and in table form for each condition below that (not
all data could fit in one table). This micro-view of fixation duration pulls apart the
fixations and enables a detailed look at the fixation patterns that an analysis of the
average fixation duration would not show.
Table 4.4 Fixation durations: number of fixation durations for each time category for
each participant in the airbags CC or airbags SCC (total text)

Time category        Condition of text presentation (CC = dynamic, SCC = static)
(seconds)            CC                                  SCC
                     NNS2    NNS4    NS2     NS8         NS9     NS11    NS12
.1 - < .2            212     156     147     216         154     140     118
.2 - < .3            156     140     60      90          110     75      103
.3 - < .4            89      74      21      41          43      25      26
.4 - < .5            42      19      7       18          22      11      8
.5 - < .6            12      11      2       9           7       6       0
.6 - < .7            11      2       0       6           4       0       0
.7 - < .8            0       2       2       3           4       0       1
.8 - < .9            2       0       0       2           1       0       1
.9 - < 1.0           1       0       0       0           0       0       0
> 1.0                0       0       0       2           0       0       0
Total (.1 to > 1.0)  525     404     239     387         345     257     257
Table 4.5 Fixation durations: number of fixation durations for each time category for
each participant in the bikers CC or bikers SCC (total text)

Time category          Condition of text presentation (CC = dynamic, SCC = static)
(seconds)              CC                                  SCC
                       NNS3    NNS5    NS1     NS7         NS9     NS11    NS12
.1 - < .2              185     136     93      190         103     127     n/a
.2 - < .3              161     98      33      119         51      69
.3 - < .4              95      57      8       38          20      21
.4 - < .5              47      24      10      9           9       9
.5 - < .6              26      12      3       2           2       2
.6 - < .7              11      4       2       2           0       2
.7 - < .8              6       5       2       1           0       0
.8 - < .9              1       0       0       0           0       0
.9 - < 1.0             5       1       1       0           1       0
> 1.0                  5       3       1       1           0       0
Total # of fixations   542     340     153     361         186     230
* NS12 was not included due to data loss from calibration problems.
The tables above inform the graphs throughout the rest of this section and the next
section.
Interesting, and unexpected, are the patterns of reading exhibited in the similar
results between NS in the two conditions of moving and static text. As will be seen in
section 4.7.3, there were not significant differences. This leads to the understanding from
the recorded eye movements that, at least according to percentage of fixations per
individual, highly proficient readers read in a fairly similar fashion whether the
text is force-fed in its appearance (CC condition) or the reader has complete
control over the speed of the presentation (SCC condition).
4.7.3 Research question one, part one: Fixation durations
This section attempts to answer Research Question One, part one, regarding
fixation durations. Recall that Research Question One focuses on the differences and
similarities between reading patterns for a dynamic text and a static text. Past research
that has dealt with static text is reviewed in Chapter Three; this section instead deals with
a comparison between reading patterns based on the texts used specifically in this study
in the two conditions.
The chart below shows the graphic form of the above tables 4.4 and 4.5 in which
a similar fixation duration trend is found for highly proficient (NS) adult readers
regardless of the conditions: dynamic text or static text. There are four readings of the
dynamic texts (NS1, NS7, NS2 & NS8) and five static readings of the same two texts
(NS9ab, NS11ab, NS12ab, NS9bk & NS11bk; NS12bk was not used due to calibration
difficulties).
Graph 4.6 Comparison of the fixation durations of the two NS conditions: dynamic and
static text, converted to individual percentages.
Graph 4.6 illustrates that regardless of the text presentation, the high proficiency
readers use similar strategies for reading the text, and follow the same pattern of fixation
durations with around 50-60% of the fixation durations lasting between 100 - 200ms
while around 30% of the fixations fall in the 200-300ms category. These trends in the
number of fixations per category (converted to percentages to compare across individuals
in the above graph) indicate that the NS adults do indeed read in a similar fashion,
regardless of the presentation type of text, as far as fixation durations are concerned. The number of
fixations for NS 9, 11, 12 would be expected to show a differentiation from the fixations
of NS2 and NS8 since the former had control over the timing of the presentation of the
written text whereas the latter had only the option to read three lines at a time with no
control over the presentation. The results argue in favor of the individual
transaction with the text: perhaps once a high proficiency in reading is reached, the
reading patterns are adapted to a context and condition of the text presented.
4.7.4 Research question one, part two: Fixation durations
Research question one, part two, deals with the differences between NNS (lower
proficiency readers in their L2 of English) and NS (or higher proficiency readers in
English). There do seem to be some differences in the reading patterns16 of the two
subject groups. Based on past research, (cf. Chapter Two) it was hypothesized that the
NNS would have more fixations than NS due to reading proficiency in English. Here
reading proficiency includes the familiarity with the individual words (vocabulary
knowledge), the proficiency at predicting phrases and collocations, and may even include
the proficiency of reading patterns in a left to right motion (again, refer to Chapter Two
for more information on Arabic readers and orthography). Rayner and Pollatsek (1981)
refer to the direct control of eye movements, finding that “the information acquired
during a given fixation can influence the duration of that fixation as well as the amplitude
of the outgoing saccade” (Radach & Kennedy, 2004, p.3). This finding implies that the
less information acquired, the shallower and shorter the saccade will be and the greater
the number of fixations; in other words, as less information is acquired during a fixation
because of less proficiency with the language, the number of fixations should increase
and the saccade length in between fixations should decrease, as seen below in section 4.8.
16 Note that in this study eye movement patterns are also referred to as viewing patterns, which in turn are
divided into two terms according to the focus of the question: the term attentional patterns (used when
multimodal texts are available, e.g. sound, visual and printed texts) brings in the idea that the viewer is
engaged in the viewing process, whereas the term viewing patterns does not necessarily imply an
engagement or activeness in the process; the term reading patterns is used when only printed text is
available, such as in Conditions One and Three.
Combining the data from these tables, the graph below illustrates the total number
of fixations per individual for both the airbag text and the biker text, including both
moving and static text. Note that the presentation of the text is not the primary indicator
of differentiation: it is the background and proficiency of the reader. Within the reading-
only (CC) condition of the airbag topic, NNS were found to have considerably more
fixations than their NS counterparts with the same text, and as discussed above in section
4.7.3, the NS do not show a change in eye movement patterns that is dependent upon the
text presentation, as indicated in Graph 4.7 below.
Graph 4.7 Comparison of the total fixation durations by both NS and NNS in both
conditions (SCC and CC).
The above chart illustrates the NS and NNS trends in fixation duration by time.
The fixation durations for each participant were again separated into 100 ms categories,
starting at 100 ms. The number of fixations for each duration category were counted and
are represented in the above chart which illustrates the general trend for each individual’s
reading pattern. In the first category containing the quickest fixations of between 100
and 200ms, there are no discernable trends; however as the time of the duration increases,
the participant groups start to untangle and distinguish themselves.
Starting at fixations of 200 ms, NNS begin to show a greater number of longer
fixations compared to the NS participants. In order to compensate for individual
variations of total number of fixations, the number of fixations for each category was
divided by the total number of fixations to give a percentage for each category for each
individual participant. These results are shown in the charts below, highlighting the point
at which the NNS eye movements become noticeably different from the NS.
Graph 4.8 Two illustrations of the diverging patterns between NS and NNS fixation
duration trends, focusing on two duration categories, in both total number of fixations
and as converted percentages.
The above graphs illustrate the similarity of individuals in the total number of
fixations (using percentage of fixations to compensate for marked individual variation) in
the 200 to 300 ms range. A marked difference starts to appear as the fixations become
longer in duration: the NNS, or less proficient readers of English, have a higher
percentage of these longer fixations, whereas the NS, regardless of the conditions, have
relatively fewer (highlighted by the circles on the graph).
In order to get a larger picture of the eye movement patterns between subject
groups, Graph 4.9 exemplifies the differences between the groups’ number of fixations
when all of the participants for each group have been collapsed and their eye movements
averaged for each category.
Graph 4.9 Illustration of the combined participant categories for three categories,
highlighting the higher average number of longer fixations (300 – 400 ms) made by NNS
The differences in eye movements are clearly illustrated when grouped by subject group:
the study group (NNS), the control group (NS-CC), and the condition control group (NS-
SCC). The NNS’ quick fixations represent a smaller proportion of the total
because they have more fixations in the mid range. The NS, regardless of condition,
show a higher percentage of short fixations and a smaller proportion of mid-range fixations,
indicating that higher reading proficiency is related to shorter fixations. Also notable is
that the fixation duration trends again are not markedly different between the NS
dynamic text condition and the NS static text condition.
Finally, Graph 4.10 below separates the reading patterns as indicated by fixation
durations for the two sets of groups, NS and NNS, when compared in percentages. The
higher percentage of quick fixations for the NS is contrasted by the higher number of
longer fixations by NNS starting at 200 ms and continuing until around 600 ms when
there is little difference between the two groups. With the average fixation duration for
silent reading (of static texts) at about 200 – 250 ms (Rayner, 1998), the data from this
study shows a slightly quicker average per participant (not abnormal considering the
small n of this study and the nature of the dynamic text).
Graph 4.10 Illustration of fixation duration differences between NS and NNS in the
dynamic reading condition, converted to percentages.
What these graphs, charts and numbers show is that there is indeed a marginal
difference between the reading patterns of NNS and NS within the dynamic text reading
condition, insofar as fixation duration patterns are concerned. This isn’t
surprising, as the NNS group encounters a variety of challenges when reading moving
text, including that of the basic proficiency at reading in English. However, this is the
first study that actually is able to illustrate this difference in reading patterns in such
minute detail.
4.8 The whole text analysis: Fixation times and saccade degrees of movement
Eye movements are recorded in binary terms: fixations, where the eye stays
relatively in one spot, and the saccade, or the movement to the next fixation. Past
research has shown that while in motion, during a saccade, no perceptual information is
collected visually (cf. Paulson & Goodman, 1999). It is only during the fixation that a
visual perceptual process occurs. However, saccades have been studied extensively
regarding distance, landing points and take off points (cf. Rayner, 1998), indicating in
general that more proficient readers are more accurate when landing after a saccade, and
that a text’s difficulty affects the eye movements of readers in terms of fixation durations
and the number of fixations (Ashby, et. al, 2005). Since Condition One in this study has
two groups of readers of different backgrounds, a difference between the reading patterns
could be evident in the saccade data in which the average length of the saccade
(measured in degrees) for the higher proficiency group would be longer (indicating
greater familiarity with the text and greater accuracy both with reading the text and in
interpreting the text). Conversely, the lower proficiency group should have a shorter
average saccade length since this group should be fixating on more words, with greater
effort placed on word recognition and development of textual meaning.
Table 4.6 below shows the results from Condition One of the experiment: the eye
movement patterns by the participants when the text was moving in terms of the sum of
the individual’s fixations and its companion, the sum of the interfixation saccades. Also
included are the average length of the interfixation saccades for each participant
measured in degrees, in which a degree equals 60 arc minutes of an angle with the origin
at the eye (ASL tracking systems, p6). The table gives the information about not only the
total time spent fixating, and the total time spent saccading, but also the average distance
that the reader saccaded, all of which have associations with the proficiency of the reader
as briefly discussed earlier in this section.
Table 4.6 Total fixation times, total interfixation saccades by time and average
interfixation degree for the whole text for each participant

Participant   Fixation total time (s)   Interfix saccade total time (s)   Average interfix degree
NNS2          131.75                    132.81                            5.3
NNS3          152.95                    148.83                            6.3
NNS4          96.50                     231.15                            5.6
NNS5          90.28                     210.18                            5.4
NS1           26.49                     273.97                            6.0
NS2           46.02                     274.86                            5.8
NS7           76.40                     225.74                            5.9
NS8           87.12                     234.76                            6.1
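The saccade amplitudes above are reported directly in visual degrees by the tracking software; purely for illustration, the sketch below shows the underlying geometry for converting an on-screen distance to a visual angle (1 degree = 60 arc minutes). The monitor width, resolution and viewing distance are invented values, not the EMMA lab settings.

# Hypothetical sketch: converting an on-screen distance in pixels to visual
# degrees. The monitor geometry and viewing distance below are assumptions.
import math

SCREEN_WIDTH_CM = 36.0      # assumed physical width of the display
SCREEN_WIDTH_PX = 1024      # assumed horizontal resolution
VIEW_DISTANCE_CM = 60.0     # assumed eye-to-screen distance

def pixels_to_degrees(distance_px):
    """Visual angle (in degrees) subtended by a horizontal distance of distance_px pixels."""
    distance_cm = distance_px * SCREEN_WIDTH_CM / SCREEN_WIDTH_PX
    return math.degrees(math.atan2(distance_cm, VIEW_DISTANCE_CM))

# e.g. a saccade spanning 150 pixels on screen:
degrees = pixels_to_degrees(150)
print(f"{degrees:.1f} degrees ({degrees * 60:.0f} arc minutes)")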
Again, because of the small N, a nonparametric Fisher Exact Test was calculated for the
total fixation times with a significant finding of (n=8, df=1) p = .014. Illustrated below in
graphic form, the interfixation saccade times have been combined with the fixation total
times and the third companion that is often found in eye movement studies: the time loss
that occurs because of calibration loss.
Graph 4.11 Fixations and Saccades with Calibration loss in Condition One
Graph 4.11 illustrates the relationship between the saccades and fixations for each
participant, with the reading patterns of NS following a predictable pattern for proficient
readers. The two groups of readers in this study have considerably different reading
patterns when comparing the length in time of their saccade movements. The NS in this
study have lower total time spent fixating, as well as a lower number of fixations, which
implies that the readers are spending longer times saccading, which is indeed the case.
This, too, has been evident in past research (Oller & Tullius, 1973). Conversely, the
NNS’ patterns in the current study correspond with those patterns found in past research
for readers with lower proficiency with reading: more fixations per reading text (ibid),
longer fixations, and shorter saccades. In past research, it has been shown that readers
with lower proficiency at decoding a text spend a longer amount of time looking at a
word, and the analysis of the eye movements from this study also support this claim with
the addition that this pattern emerges regardless of the condition of the text presentation
as far as static versus dynamic texts.
For the present study, data regarding the length of a participant’s saccade is also
analyzed in that it shows differentiation between the reading patterns of the two groups
and their respective proficiency and familiarity with the textual presentation form
(English). Saccade length and accuracy is associated with reading proficiency, as seen in
past research (see Chapter Three).
To compare saccade movement in terms of degrees between the two text
presentations, NS average saccades for dynamic text reading patterns were compared to
static text reading patterns. There is a very surprising result, which we will come back to
in section 4.11 when we look at the fixplots of the eye movement patterns. At this point,
only the numerical data is being analyzed. Regardless, the data has been converted from
Table 4.7 into a graphic illustration in Graph 4.12 below.
Table 4.7 Comparison of saccade length in degrees between dynamic and static texts

Condition 1 (dynamic)   Saccade length (degrees)   Condition 3 (static)   Saccade length (degrees)
NS1                     6.0                        NS9ab                  3.68
NS2                     5.8                        NS11ab                 4.13
NS7                     5.9                        NS12ab                 4.01
NS8                     6.1                        NS9bk                  4.02
                                                   NS11bk                 3.95
Graph 4.12 Illustration of the differences in eye movements between dynamic and static
text presentations.
The saccade length, as presented in the graph above, shows the opposite of what had been
predicted, namely that the dynamic text would produce shorter saccade lengths as the
reader adapts to the moving text presentation. Instead, the dynamic condition produces
longer saccades by the readers, whereas the static text has shorter saccades. This finding
appears to contradict what is found when the data is viewed in the visual form of fixplots:
the static texts seem to have long saccades towards the end of a unit passage. Perhaps the
difference in findings lies in the difference between the data analysis of a UNIT, such as in
section 4.11, versus the WHOLE text average such as that seen above. Regardless, the
groups above (specifically NS dynamic versus NS static) show a significant difference
when a Fisher Exact Test is run (n=9, df=1, p=.008). These results will be discussed
further as the fixplots are examined across conditions later in the chapter in section 4.11.
4.9 Introduction to row reading patterns
Since reading patterns of moving text is one of the primary foci of this study, and
what distinguishes it from other previous studies, in the next section I will analyze the
uniqueness of reading when combined with the presentation features of roll-up closed
captioning. As a reminder of the unique notations mentioned at the beginning of this
chapter in section 4.1, Figure 4.5 below illustrates the row labels. Also, recall from
Chapter 3 that closed captioning is presented in two main ways: the “roll-up” and the
“pop-on”. The roll-up style of captioning, used mainly for captioning of live events such
as news, sports and late-night daily television shows, first appears at the bottom left
corner of the screen and rolls across presenting clusters of letters or words at a time.
When the line fills up (at approximately 66 characters), the line scrolls up to make room
for the next line of text. As that line finishes, it scrolls up and a third line begins, at
which point the first line gets bumped off the screen allowing only three lines of text to
be seen at once. It should also be noted that while the first presentation of the text
appears on the bottom row and rolls across in appearance, when that text scrolls up it
does so as one entity. The text only rolls across the screen in increments upon first
appearance in the bottom-most row.
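Purely as an illustration of the roll-up mechanics just described, a small sketch follows. The 66-character line length and the three visible rows come from the text above; the word-by-word rolling and the sample sentence are simplifications for the example.

# Hypothetical sketch of roll-up captioning: words roll across the bottom row;
# when a line is full (about 66 characters) it scrolls up, and only the three
# most recent lines remain visible on screen.
MAX_CHARS = 66
MAX_ROWS = 3

def roll_up_states(words):
    """Return the successive screen states (lists of visible rows, oldest row first)."""
    states, rows, current = [], [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) > MAX_CHARS:                   # line full: scroll up
            rows = (rows + [current])[-(MAX_ROWS - 1):]  # keep at most two older rows
            current = word
        else:
            current = candidate
        states.append(rows + [current])
    return states

demo = ("IF YOU EVER GO THROUGH SALT RIVER CANYON YOU CAN SMELL SMELLS "
        "DRIVING THERE YOU CANT SMELL IN THE CAR YOU KNOW WHY YOU RIDE A MOTORCYCLE").split()
for state in roll_up_states(demo):
    print(state)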
Figure 4.5 Position of line appearances of closed captioning. Appears on bottom row
first (row 1), rotates up to row 2, then to row 3, and then disappears from screen.
In order to clarify and distinguish which line I am referring to when analyzing the reading
patterns of roll-up closed captioning, I have labeled the rows as such in Figure 4.5 above
and will use the term “roll” to indicate the incremental appearance of the line of text from
left to right, and the term “scroll” to indicate the vertical movement upwards of the line
of text.
4.9.1 Analysis of the reading patterns by row use
The eye movements of each of the participants for Condition One were divided
into four categories based on areas of use: row 3, row 2, row 1, and ‘offline’. The term
‘offline’ encompasses the time in which the eye movements were either above or below
the area of the three lines of text, or off to the right in an area where there wasn’t any text.
The amount of time that the eye movement stayed within each area was totaled. It was
hypothesized that the NNS would spend more time in rows 2 and 3 because of the rate of
presentation of the text, and their general lower proficiency in reading English, and that
the more proficient readers in English might spend more time on row 1. This was not
necessarily the case, as it turns out.
In order to calculate the amount of time for each area, the video data showing the
eye movements on the scene monitor was used. This data differs slightly from the
numerical data (in the form of X,Y coordinates and with a time code of hh:mm:ss.ms, or
hour:minute:seconds.milliseconds) primarily because video is presented on a monitor that
refreshes its screen and this affects the scene perception. To further explain, using
Windows Movie Maker v. 6.0 software, the eye movement video data can be presented
on a timeline using a timecode of hh:mm:ss.ms which can be incrementally counted in
segments of approximately .03 seconds (one frame at 30 fps, or about 33 ms). This gives one
the ability to view the video and record amounts of time to the hundredth of a second (unlike
the hour, counted in units of 24, or the minute and second, counted in units of 60). As a
result of the monitor needing to be
refreshed, there are times where the picture on the screen is slightly askew (the cursor
representing the focal point of the eye might be moving at the time of the refresh rate and
appear twice per screen). This necessarily stems from the essence of presenting video on
a monitor: the frame rate typical for video is 29.97 fps (often referred to as 30), and the
refresh rate of the computer monitor used in the EMMA lab is 60 Hz (it refreshes its
image 60 times per second, and so the 30 fps video is mapped on twice)17.
17 This is actually a complicated ratio. Basically, if a video is shot at 29.97 fps and shown on a monitor
with a 60 Hz refresh rate, then each frame is shown twice. When film is used, however, the mapping is
altered since film uses a 24 fps rate, and so, using an algorithm, every second or third frame is repeated so
that the sum total is equal to 60. All of this applies only to NTSC video; PAL, common in many other
countries including Europe, uses 25 fps (www.techvt.edu/HTML/VirtualTextbook/...).
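A small worked example of this timing arithmetic (illustrative only; the timecode conversion below ignores drop-frame conventions):

# Illustrative arithmetic for the frame rate / refresh rate relationship above.
FPS = 29.97          # NTSC video frame rate
REFRESH_HZ = 60      # monitor refresh rate in the lab

frame_duration_ms = 1000 / FPS            # about 33.4 ms per video frame
refreshes_per_frame = REFRESH_HZ / FPS    # about 2 monitor refreshes per frame

def frame_to_timecode(frame_index, fps=FPS):
    """Convert a frame index to an hh:mm:ss.ms style timecode (non-drop-frame)."""
    total_seconds = frame_index / fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"

print(round(frame_duration_ms, 1), round(refreshes_per_frame, 2), frame_to_timecode(450))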
Regardless, the visual data captured by the eye camera appears as a small black
ball that is representative of the location on the scene monitor at which the eye tracker
camera calculated the focus of the eye (or where it thought the fovea of the eye was
looking). It is this small black ball that is used to capture the time (in
seconds.milliseconds) that the reader spent looking at each row. Importantly, in as far as
could be determined, a change of row use started at the beginning of a saccade jump
(when the eye movement leaves a focal point on its leap to the next landing). This
amount of time was included in the following line use, and not with the row on which the
reader was reading.
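As a hypothetical sketch of this row-by-row totaling (in the study the totals were taken from the video record; here the fixation coordinates, durations and the pixel boundaries of the three caption rows are invented for illustration):

# Hypothetical sketch: classify fixations into row 3 / row 2 / row 1 / offline
# by their Y coordinate and total the time spent in each area. The pixel
# bands and the sample fixations are illustrative assumptions only.
ROW_BANDS = {                 # (y_min, y_max) in screen pixels, top of screen = 0
    "row 3": (400, 440),
    "row 2": (440, 480),
    "row 1": (480, 520),
}

def classify(y):
    for row, (low, high) in ROW_BANDS.items():
        if low <= y < high:
            return row
    return "offline"

def time_per_area(fixations):
    """fixations: iterable of (x, y, duration_in_seconds) tuples."""
    totals = {"row 3": 0.0, "row 2": 0.0, "row 1": 0.0, "offline": 0.0}
    for _x, y, duration in fixations:
        totals[classify(y)] += duration
    return totals

sample = [(120, 495, 0.234), (300, 452, 0.184), (610, 300, 0.150)]
print(time_per_area(sample))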
In order to compare the participants’ patterns of reading, two smaller units of text
were selected from each of the total texts; these were selected for their approximate
placement within the entire text and for the similarity of the text composition (see
Chapter 3: Methods and Materials for the specifications and textual analysis and for the
text see Appendices: A and B). Table 4.8 below contains the basic characteristics of the
two unit texts regarding the total time of the text sequence and of the smaller unit texts,
with the time placement of the smaller texts (starting points) within the larger sequence.
It also shows information regarding the type/token counts for each unit text and the
number of lines with the average words per line. Each unit text was matched for these
characteristics so that the smaller text unit, which contains an entire discourse unit, from
the beginning of the text sequence could be compared to a text unit near the end of the
sequence.
Table 4.8 Smaller units of text: identification and characteristics (label, total time (=),
start time within whole text (+))

Whole text sequence        Unit A                        Unit D
Airbags (05:11.30)         Dramatic-remembrance          Propel-ant
                           = 0:13.80 (+ 0:29.00)         = 0:23.14 (+ 02:36.70)
  Types/tokens             27/33                         41/55
  # of lines               9                             13
  Average words per line   3.66                          4.23
Bikers (04:58.90)          Smell-smells                  Cow-deer
                           = 0:23.00 (+ 0:17.00)         = 0:18.10 (+ 03:54.00)
  Types/tokens             37/53                         30/49
  # of lines               11                            11
  Average words per line   4.82                          4.45
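For illustration, the sketch below shows how the type/token counts and average words per line in Table 4.8 can be derived from a unit text. The tokenization rule (split on whitespace, ignore the '>>' speaker marker, keep punctuation attached) is an assumption, so the counts are only indicative.

# Hypothetical sketch of the unit-text measures in Table 4.8: type/token
# counts, number of lines and average words per line. Tokenization is assumed.
def unit_text_stats(lines):
    tokens = [word for line in lines
              for word in line.upper().split() if word != ">>"]
    types = set(tokens)
    return {
        "types/tokens": f"{len(types)}/{len(tokens)}",
        "# of lines": len(lines),
        "average words per line": round(len(tokens) / len(lines), 2),
    }

# Example: the biker lines shown earlier in Table 4.2
lines_17_to_21 = [
    ">> IF YOU EVER GO THROUGH SALT",
    "RIVER CANYON YOU CAN SMELL",
    "SMELLS DRIVING THERE YOU CAN'T",
    "SMELL IN THE CAR, YOU KNOW WHY",
    "YOU RIDE A MOTORCYCLE.",
]
print(unit_text_stats(lines_17_to_21))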
Each participant saw one of the topics (airbags or bikers) in Condition One, the
unimodal dynamic condition of only reading. To break down the analysis of the reading
patterns according to row usage, I have analyzed the data according to a between- and a
within-subjects comparison using two subquestions.
Subquestion one: Is there a difference in the use of the rows between native and non-
native readers of English (or higher and lower proficiencies in reading English)? In
order to compare overall reading patterns of moving text, a between-participant
comparison is presented in which the overall usage of row 3, 2 and 1 is analyzed.
Subquestion two: Is there a difference in the individual reading patterns between the
beginning of the text and the end of the text? A marked difference in the reading patterns
would indicate that the reader is able to alter his/her reading strategies with the moving
text as the text becomes more familiar and predictable. This could be due to a strategy of
reading that compensates for familiarity with the story in that vocabulary is repeated or
the points in the story have passed an introductory stage and are now being elaborated
upon, or that the rhythm of the line presentation is becoming more predictable (see
Section 4.2 for analysis of the presentation rhythm of the roll-up closed captioning texts
used in this study). To inquire into these possibilities, a within-subject comparison is
presented between the two texts.
4.9.2 Results of analysis for subquestion one
The use by the readers of the different rows of presentation and their different
styles of presentation can be observed in two categories: in time, or the total reading time
for each row in each unit text, and in changes, or the number of times that each reader
changed rows to either follow the text up a row, perform a regression, or to start reading
on a new row.
The charts below show the totaled results of the readers’ use of the three possible
rows while reading moving text in the form of roll-up closed captioning (presented for
each text sequence). The amount of time spent on each row was added together, along
with the offline amount which included fixations that were above, below or off to the
right or left of a line of text and which could not be attributed or associated with a
particular text line.
Graph 4.13 Comparison of NNS and NS use of rows in airbag unit texts (x-axis: participant
and unit text, A or D, for NNS2, NNS4, NS2, NS8; y-axis: percentage of total use, 0-100%;
categories: offline, row 3, row 2, row 1)
Graph 4.14 Comparison of NNS and NS use of rows in biker unit texts (x-axis: participant
and unit texts, A or D, for NNS3, NNS5, NS1, NS7; y-axis: percentage of total use, 0-100%;
categories: offline, row 3, row 2, row 1)
These graphs illustrate the variability in the use of the rows. While there is no significant
difference between high and low proficiency readers when looking at the total time of use
for each row, there are patterns evident on an individual basis. This is an area that will be
further explored in subquestion two.
Another viewpoint on the reading patterns is to compare the number of changes in
row use for each group. Unit A is situated in the beginning of the news story at the point
when the purpose of the story is starting to be explained and new vocabulary is being
introduced. Unit D is situated near the end of the story, when vocabulary has already
been introduced, and when the conclusion of the story is about to be presented. The
reader interaction, then, may change as the text becomes easier to predict according to
genre expectations and familiarity with the story line. Table 4.9 below shows the number
of changes between the beginning and end for each proficiency group:
Table 4.9 Difference in the number of row changes between unit A (beginning) and unit
D (end) of dynamic text reading (matched for unit text differences by dividing the
number of changes by the number of lines)*

Participant   Difference in (# of changes / # of lines) between units A & D
              (- = increase, + = decrease)
NNS2           0.8
NNS3           0.1
NNS4           0.1
NNS5          -0.2
NS1            0.4
NS2            0.0
NS7           -0.2
NS8           -0.4
* For the airbag text, unit text A has 9 lines and unit D has 13; for the bikers text, both A & D have 11 lines.
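For concreteness, a minimal sketch of the normalization behind Table 4.9 follows; the counts in the example are invented, not the study's data, and the sign convention simply mirrors the table legend.

# Minimal sketch of the normalization used in Table 4.9 (sign convention from
# the table legend: a positive value means fewer changes per line in unit D).
# The counts below are made-up illustrations.

def change_difference(changes_a, lines_a, changes_d, lines_d):
    """(changes per line in unit A) minus (changes per line in unit D)."""
    return changes_a / lines_a - changes_d / lines_d

# e.g. 12 row changes over 9 lines in unit A vs. 13 changes over 13 lines in unit D
print(round(change_difference(12, 9, 13, 13), 1))   # 0.3, i.e. a decrease by unit D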
When this table is converted into a graphic form, the changes in reading patterns between
the beginning and the end texts can more clearly be seen, as in Graph 4.15 below.
Graph 4.15 NNS and NS difference in change in rows for text at beginning and text at
end (two panels: NS # of row changes per # of lines in unit texts, series NS1, NS2, NS7,
NS8; NNS # of row changes per # of lines in unit texts, series NNS2, NNS3, NNS4, NNS5;
x-axis: Beginning, End; y-axis: number of changes per line, 0.0-2.5)
The NS show three increases in the number of changes between the beginning and
the end of the story, indicating that the NS are moving between the rows more often as
the story nears its end than they were at the beginning of the story in unit A. The NNS,
however, show the opposite pattern: they generally decrease the number of changes
near the end of the story, indicating that they are sticking with the text presentation on
one or two rows and not saccading between rows as often as the NS. While there are not
enough participants to support a strong conclusion, the trends in eye movement show a
difference in the use of lines (and therefore in the timing and the repetition of the text
presentation on each line) between high and low proficiency readers as they move
through a story from beginning to end.
Subquestion one looked at the similarities and differences in the two subject groups'
use of the rows when reading dynamic text. While the differences were not significant,
relative patterns did emerge for the number of row changes for each group, but not
when the amount of time spent on each row was compared between participants.
4.9.3 Results of analysis for subquestion two
Flurkey and Goodman (2004) noticed that the patterns of eye movements changed
over time within a text reading. They found that as the reader became familiar with the
story and the plot, character(s), vocabulary, etc., he/she was able to better predict the text
through a transactional relationship with the text. Likewise, due to the nature of the
current study and its use of whole texts, similar comparisons can be observed regarding
the reading patterns over time within the reading of a single text. In other words, did the
row reading pattern change between unit A and unit D? Indeed, this change in reading
patterns was observed between the beginning and the end of the text (see Figure 4.6 for
an illustration of the relative placement of the units within the total text).
Figure 4.6 Units A & D relative placement within the total text (from start to end)
As the graphs (4.13 and 4.14) in subquestion one show, row 1 has the greatest use
by the readers, and is therefore useful as an indicator of change in reading patterns
and comprehension strategies across the different rows of closed captioning. Graph 4.16
below illustrates the change in the individual readers’ use of row 1 over time, using unit
A and unit D as the variables (just after the beginning of the text versus nearly the end of
the text) for the biker text only. In order to compensate and compare between texts, the
total time spent on row 1 was divided by the total time for each text, resulting in a
comparable percentage per text between participants.
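A minimal sketch of this normalization is given below; the row times and unit length are invented for illustration only.

# Minimal sketch of the within-subject comparison: the time spent on row 1 in a
# unit text divided by that unit's total time, giving a percentage that can be
# compared across units A and D. Values are illustrative.

def row1_percentage(row_times, total_time):
    """row_times: seconds per row for one unit text; total_time: unit length in s."""
    return 100.0 * row_times.get(1, 0.0) / total_time

unit_a = {1: 14.2, 2: 5.1, 3: 1.3, "offline": 2.4}
print(round(row1_percentage(unit_a, 23.0), 1))   # percent of unit A spent on row 1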
Graph 4.16.1 Comparison of time of use of row 1 in units A & D for the biker text (in %)
(x-axis: biker text units A and D; y-axis: percentage of row 1 use, within subject, 0-100%;
series: NNS3, NNS5, NS1, NS7)
Graph 4.16.2 Comparison of time of use of row 1 in units A & D for the airbag text (in %)
(x-axis: airbag units A and D; y-axis: percentage of row 1 use, within subject, 0-100%;
series: NNS2, NNS4, NS2, NS8)
This change indicates a subtle alteration in the reading patterns exhibited by
each reader except NNS4, who seems to abandon row 1, where the text rolls across the
screen, and instead devotes his effort to reading each line of text once it has been fully
presented, i.e., once it has scrolled up to rows 2 and 3. This is quite the opposite of NNS2,
who seems to have changed tactics and by the end of the news story is clinging to row 1 and
rarely saccading to the other rows. These percentages of use of the initial presentation
of each line of text are not the result of a change in the number of words presented per
line, which remained fairly constant between units A and D, but are instead possible
evidence of the reading strategies employed by the readers. These results indicate that
the readers are indeed using attentional strategies when presented with dynamic text
such as live closed captioning.
4.10 Comparison of eye movement patterns: Across conditions
As we have seen in the previous sections, fixation durations varied little between the
static and the dynamic texts for proficient (NS) readers, and the NNS fixated longer than
the NS group. We have also seen that the NS and NNS differ in their fixation durations:
the NS fixate for shorter durations than the NNS regardless of the text presentation style.
There is also an indication that readers change their reading patterns over the course of
a story, altering how they read as they become more familiar with the text content and
perhaps with the rhythm of the text presentation.
The last comparison between readers and texts concerns the eye movement
patterns in relation to the text itself, regardless of whether it is static or dynamic. By
comparing the eye movement patterns, we can see whether it is the text itself that drives
the reading patterns, or whether it is the condition of dynamic versus static presentation
of the text passage. Whereas in the previous section 4.7 the total amount of time was
used to indicate line use, the following fixplots are better able to illustrate the fixations
as they occur across the lines, giving a horizontal and vertical representation of the use
of the available text. Table 4.10 below highlights one of the units of text, the ‘dramatic
remembrance’ text (which in the dynamic condition is 13.8s in length, but in the static
condition varies with the individual reader). Seven readers’ patterns are presented: three
in the static text condition (NS9, NS11, NS12) and four in the dynamic condition (NS2,
NS8, NNS2, NNS4). A brief synopsis of the interpretation of the fixplots is to the right
of each visual fixplot and is further elaborated upon afterwards. The two white arrows in
the static text fixplot indicate the beginning and end of the ‘dramatic remembrance’
passage that is being analyzed.
Table 4.10 Comparison of eye movement patterns between static and dynamic texts:
the dramatic remembrance text passage (Unit A in the airbag text). In the fixplots, the
size of the symbols varies in relation to fixation duration (longer times = larger symbols).

Static text (NS9, NS11, NS12): Fixplot, NS9, 11 & 12, unit A, airbag text. Analysis:
All three readers make greater use of the left side of the text. All three readers make
greater use of the beginning of the text passage and skip more words as the passage
continues. Within the “Dramatic Remembrance” passage, all three readers fixate on/near
the following: ACTUALLY, MYSELF, FUMES.

Dynamic text (NS2): Fixplot, NS2, unit A, airbag text. Analysis: NS2 primarily uses
row 1 (the initial presentation of information) and tends to wait for new words to appear
in the first position (the lower left) as well as at the end of row 1 (indicated by the clump
at the right end of row 1).

Dynamic text (NS8): Fixplot, NS8, unit A, airbag text. Analysis: NS8 primarily uses
row 1 (the initial presentation of the information) but will skip up to row 2 and row 3 for
regressions.

Dynamic text (NNS2): Fixplot, NNS2, unit A, airbag text. Analysis: NNS2 uses only
rows 1 and 2 (the bottom and the middle); not many words are skipped.

Dynamic text (NNS4): Fixplot, NNS4, unit A, airbag text. Analysis: NNS4 utilizes row 2
for most of the passage. He starts on row 1, but must use row 2 to finish. If he doesn’t
finish, he follows the text up to row 3.
The dynamic text fixplots show the continuity of the eye movement pattern through time.
The background is shown here as blank primarily because it changes, by at least the
addition of one word, approximately every 10s, with a line change approximately every
1.25s. To better present the movement, and to avoid an overwhelming amount of minute
detail as the background changes, the above fixplots have been generated to represent a
macro view of the dramatic remembrance unit passage, or unit A in the airbag text. In
comparison, the static text fixplot starkly reminds us that the prototypical reading text is
quite different in its presentation.
The reading patterns with the SCC texts were similar to those in past research in
that the beginnings and ends of lines were not fixated (Rayner, 1998); the reading patterns
in the dynamic texts are similar in that respect. However, a noticeable difference
between the two texts is the numerous fixations in the middle of each line of text and at
the end of the line. While the dynamic texts, for both subject groups and especially for
the NNS group, show fixations moving from the left to the middle to the right end of
the rows, the static text fixplot shows the majority of the fixations occurring on the left
end of the rows. The overall pattern of eye movement for each of the subject groups is
slightly different: the static text reading patterns show a greater facility with predicting
the text, as indicated by the greater number of fixations at the beginning of the idea unit
and fewer at the end, whereas the participants reading the dynamic text cannot use the
same strategies for scanning a text since each line is presented one at a time. This is
evident in the fixplots: the text is fixated more evenly across the row.
While the reading patterns of the NS participants show more irregular movement,
the NNS seem to have fallen into set patterns. For example, NNS2 exhibits a pattern of
focusing his reading attention on the new text while it is in row 1, following it up to
row 2 and never utilizing row 3, whereas NNS4 instead uses a diagonal pattern of
focusing on the text presentation. NNS4 reads the text in row 1 as it is presented, but
does not quite finish while the text is in row 1 and must follow it up to row 2; once there,
he quickly jumps back down to row 1, but this is after the text has already started to roll
across the screen, which puts him behind, and so he must once again follow the line of
text as it scrolls up to row 2.
The fixplots are able to show the reading patterns of the participants in relation to
the reading condition they were given. All were fully aware that they would be asked
questions about their comprehension of the texts, and while the NS readers, who were
considered to be readers at a higher proficiency because they were reading in their L1,
were able to answer with greater detail, the readers with a lower proficiency of reading in
English (their L2) were still able to express some comprehension of the story plot. Based
on the assumption then that all of the participants’ eye movements were actively engaged
in reading and a transaction with the text with a goal of comprehending the text, the
fixplots show that there are indeed reading patterns exhibited by the participants, that
proficiency interacts with those reading patterns to some extent, and that these patterns
are also individual in nature.
4.11 Summary of Analysis for Question 1
After looking at the different rhythms in the presentations of live closed
captioning, a suggestion was put forth that closed captioning, the transcription of spoken
text, is at the intersection between the prediction of reading text and the projection of the
end of a spoken idea unit. This juncture possibly influences reading strategies for
dynamic text. The data collected by the eye tracker were used to look at differences in
reading patterns and eye movements between the two groups of readers and the two
conditions of text presentation. A significant between-group difference was found for
saccade duration (p = .014), and marginally significant differences were found for the
number of fixations and the average fixation durations (p = .07). What was surprising
was that the NS did not show a change in the number of fixations between text
presentations (dynamic versus static).
We have also seen that in the dynamic reading condition, readers’ patterns change
as the passage evolves. The majority of the readers used the bottom row, the line of
text that is the initial presentation of information, slightly less as the passage evolved;
more generally, the reading patterns regarding which row was used most often to gather
information changed over the course of the reading passage, indicating the
implementation of individual reading strategies for gaining comprehension of the story as
it progressed (both across the screen and across time through the plot). In the next
chapter, I move from an analysis of unimodal dynamic texts to multimodal dynamic
texts, addressing Research Question Two.
CHAPTER 5:
RESEARCH QUESTION TWO: THE MULTIMODAL ENVIRONMENT
The second part of this study deals with the reading patterns and use of dynamic
text as an additional modality. Instead of the closed captioning being isolated, as in
Condition One, Research Question Two compounds the options that the viewer has to use
when trying to understand a textual presentation. In Condition Two, the authentic,
original video text is presented which includes a multimodal presentation of information:
aural, visual (graphic) and written text.
Research Question Two: In what ways do the reading patterns of dynamic text change
with the addition of the multimodal environment? Are there any similarities or
differences between Native and Non-native Speakers of English?
In this chapter, the data analysis from Research Question One, regarding fixation
duration patterns and the reading patterns using the three closed captioning rows of
printed text, will be compared with the reading patterns when the other optional
modalities are present. The same participants’ data will be used (NNS2, NNS3, NNS4,
NNS5, NS1, NS2, NS7, NS8). First, an overview of the participants’ general patterns of
viewing will be explored using the analysis tools of “look zones” and fixation durations,
followed by a comparison of the reading patterns exhibited by the two groups in detailed
multimodal analyses. Included throughout are the topics of noticing and paying attention,
and the use of the available modalities by the participants based on their eye movements.
While Research Question One explored specifically the reading patterns associated with
dynamic text, Research Question Two looks at dynamic reading patterns as only one part
of the bigger picture. Therefore, there is an exploration of the use of the closed
captioning rows of text by participants, but it is within the larger framework of viewing
patterns.
5.1 Overview of look zones by participant
In order to analyze the patterns of eye movements and their assumed attentional
focus points, the scene under study can be divided into different zones so as to classify
and quantify the data. A look zone, therefore, is one of the areas within which fixations
are counted or measured. For example, Slykhuis, Wiebe, and Annetta (2005) used look
zones to analyze the viewing patterns of students observing the multimodal use of text
accompanying pictures in a PowerPoint presentation. The number of times that a student
looked at a picture was counted versus the accompanying text (therefore, two different
look zones) and the accumulated gaze time was calculated as well as the order of the gaze
pattern. They found that students were able to dismiss the pictures that have little
semantic relationship to the text and rather to attend, at the appropriate point in the text,
to the accompanying photographs that provide relevant additional information (p. 519).
Similar viewing and selective attentional patterns will be looked at for the current study
using the participants’ recorded eye movement gaze trails.
For this study, two look zones were used: the area in which the closed
captioning was presented, and the area in which it was not. The vertical line of
160° was used as the boundary to mark the difference between zone 1 (LZ 1) and zone 2
(LZ 2), as it was just above the top row of the closed captioning presentation area18.
18 The ASL 5000 eyetracker and the EYENAL software were set with parameters in which 0° on the
horizontal axis starts on the left side of the screen and continues to 250° on the right side, while the vertical
axis starts at 0° at the top of the screen and ends at 240° at the bottom of the screen. Following this
schema, 160° on the vertical axis occurs at a little less than 1/3 of the way from the bottom of the screen. It
was calculated to coincide with the placement of row 1 of the closed captioning text.
The
look zone two area consisted of the visual-graphic modality and the accompanying aural
modality, since those two modalities were synchronized in time. The written text in look
zone one was rarely presented simultaneously with the spoken narration in look zone two;
rather, the closed captioning lagged behind. The contiguity effect (Mayer & Sims, 1994;
Mayer, 2005), or the time difference in presentation between two modalities, can be a
hindrance to some viewers, or can be used as a tool for accessing the aural modality: if
vocabulary is unknown or if parsing the verbal stream is difficult, the lag time in the
presentation of the transcription can be used to help parse the verbal stream or give visual
form to the unknown vocabulary word. Therefore, observing the patterns of switching
between the two modalities, specified here by the use of the look zones, as illustrated in
Figure 5.1, proves to be an interesting analysis of language learners’ attentional choices
for comprehension.
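To make the look-zone bookkeeping concrete, the sketch below classifies fixations by their vertical coordinate using the 160° boundary described in footnote 18 and tallies the percentage of fixations per zone (as in Table 5.2); the tuple format, function names, and sample values are hypothetical, not the EYENAL output.

# Minimal sketch, assuming fixations are (x, y, duration) tuples in the eye
# tracker's coordinate units, with the vertical axis running from 0 at the top
# of the screen to 240 at the bottom (footnote 18). Fixations below the 160
# boundary on screen (y > 160) fall in look zone 1, the closed captioning area;
# everything above it is look zone 2 (video). Data are illustrative only.

LZ_BOUNDARY = 160  # vertical coordinate marking the top of the captioning area

def look_zone(y):
    return 1 if y > LZ_BOUNDARY else 2

def zone_percentages(fixations):
    """Percentage of fixation counts per look zone (cf. Table 5.2)."""
    counts = {1: 0, 2: 0}
    for _x, y, _dur in fixations:
        counts[look_zone(y)] += 1
    total = sum(counts.values()) or 1
    return {zone: 100.0 * n / total for zone, n in counts.items()}

sample = [(120, 200, 0.25), (130, 90, 0.40), (60, 80, 0.35), (110, 210, 0.22)]
print(zone_percentages(sample))   # e.g. {1: 50.0, 2: 50.0}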
Figure 5.1 Example of the look zone parameters (look zone 2 occupies the screen above
the 160° boundary; look zone 1, the closed captioning area, lies below it)
5.2 Use of the available modalities: Look zones
As expected, each participant used the modalities differently when he or she
transacted with the text; however, general trends were found. Below are the overall
fixplots for each participant viewing the text in the multimodal Condition Two.
Table 5.1 Comparison chart illustrating overall fixation patterns in Condition Two
(multimodal). The lines at the bottom of each fixplot indicate the approximate area used
for closed captioning line presentation. (Airbag text: NNS3 and NNS5 for the NNS group,
NS1 and NS7 for the NS group; biker text: NNS2 and NNS4 for the NNS group, NS2 and
NS8 for the NS group.)
The eye movement patterns in Table 5.1 above clearly illustrate the varied use of
the two visual modalities: the graphic or pictorial presentation of the news story (with an
accompanying synchronized audio) as seen by the fixations and gaze trails in the middle
of the screen (typically where the action takes place and where the faces of the people are
positioned) and the written presentation of the news story in the form of visual words and
sentences as seen on the bottom area of the screen represented by the three-lined box.
5.3 Fixation durations within each look zone: An overview
A quick overview shows the trend that the NNS group, as well as NS8, tends to use
the visual presentation of the words more than the rest of the NS group. Table 5.2 below shows
the actual percentage of fixations for each participant (viewing the multimodal condition
of either the airbag or the biker text) divided into look zones one and two.
Table 5.2 Percentage of fixations for each look zone

Text     Participant   Look zone 1 (CC area, >160°)   Look zone 2 (video, <160°)
Airbag   NNS3          26%                            74%
Airbag   NNS5          54%                            46%
Airbag   NS1           3%                             97%
Airbag   NS7           3%                             97%
Bikers   NNS2          61%                            39%
Bikers   NNS4          24%                            76%
Bikers   NS2           7%                             93%
Bikers   NS8           17%                            83%
What can be seen from Table 5.2 is that the NS participants in this study are using the
closed captioning far less often than the NNS. An interesting note is that the NNS, as
self-reported while answering questions in their extended interviews, tend to be more aural
learners and in general do not like to read (either in Arabic or in English). These
relationships, however, between the viewing patterns and the type of learner and learner
strategies are looked at in greater detail in Research Question Three in Chapter 6. For
now, the observation of the viewing patterns will be further detailed in the following
sections by comparing the patterns using fixation durations. Do the fixation durations
change when there are multiple modalities to choose from?
The hypothesis is that fixation durations in zone one in the multimodal
condition (experimental Condition Two) will be shorter than the fixation durations found
in the monomodal condition (experimental Condition One). The reasoning is that with
the dual input the aural modality is synchronized with the visual-graphic modality and
therefore has a chance to activate the word meaning first; since the printed text (the
closed captioning) appears slightly after the other two modalities, the reader will not
then need to fixate the written vocabulary word as long before saccading to another word.
Paivio addresses the cross-modal activation in his Dual Coding Theory, in which
the visual modes are located in a separate non-verbal imagery coding system of mental
representation, different from the linguistic coding system (the verbal system) (Sadoski &
Paivio, p. 43). There are two types of cross-modal activation in this system. ‘Referential
processing’ exists in which there is a possible spreading activation between systems. The
activation that occurs may depend on the “personal history of the individual and the
contextual factors” (p. 54). As an example, seeing the word cup or hearing /kʌp/ would
activate their respective logogens (the basic representational units in the verbal system)
and in turn activate an image of the referent object of a cup (or one of a variety of images
connected with cup) (p. 59).
However, this can also happen within systems, in ‘associative processing’, so that
reading the written form activates the logogen in the visual modality (in the verbal
system) and in turn may activate the logogens for the pronunciation /kʌp/ in the auditory
and motor modalities of the verbal system (p. 60). This type of system theory works very
well with the multiple modalities available to the participants in my study, such that the
sound and the related visual images appear, generally, a second or two before the printed
words appear below. The referential processing then, could be quite beneficial for
second language learners as an assistance to their reading comprehension (if the aural
representation is recognized by the hearer).
On another note, Mayer has done quite a bit of research regarding multimedia
systems, and the use of more than one modality in learning situations. Although his
theories have merit and research to support them, his experimental research does not
extend enough to include the type of multimedia being used by the participants in my
study. Mayer and his colleagues look at the various interactions and results for students
who are exposed to scientific texts with diagrams and sometimes with animated
diagramming or added narrative. My study does not use pre-conceived texts in which the
main message was constructed for classroom use and instruction, nor texts constructed
around a mechanical operation; rather, the texts are presented using an everyday object
that exists for the purpose of delivering information in a (somewhat) entertaining manner
and is widely available (not just to students in a classroom).
Relevant to this study, then, and this section of Research Question Two in particular, are
the concepts of the ‘multimedia effect’ and the ‘contiguity effect’ in Mayer’s Cognitive
Theory of Multimedia Learning (see Chapter 3 for the explanation) for the use of
multimedia as a learning tool for first and second language learners.
Actually addressing the uniqueness of Second Language Learners, Plass & Jones
(2005) walk through the cognitive processes involved in an integrated model, combining
Mayer’s theories with interactionist models and comprehensible input and output, from
which I’d like to highlight the idea of noticing, or the process of focusing attention on
certain aspects of the target language (p. 471). Schmidt (1990) refers to noticing as a
private experience, although with the use of the eye tracker, data can be gathered
regarding what the participant noticed and about what he or she possibly paid attention to
(automatic versus conscious, intentional effort) (p. 134). Perhaps the most important
aspect to mention is the idea that the L2 participants (and the L1 as well) will direct their
attention to the linguistic input that best suits their comprehension needs and that they
think is relevant. This is similar to the idea of apperception, or “the process of selecting
words and pictures to support interaction and thus attain comprehension of the material”
(Plass & Jones, p. 483).
The individual fixplots presented above in Table 5.1 illustrate the processes
of apperception and noticing, in that the individuals’ eye movements show when and
where they are ‘noticing’ the linguistic input available to them. By looking at the
number of fixations in each look zone, the eye movements can be compared within
each look zone and across participants. By looking at the duration of the fixations, not
just the ‘noticing’ (which may be represented by quick fixations) but also the ‘paying
attention’ to the relevant and accessible linguistic input in the optional modalities can
be compared. First, fixation durations will be used to distinguish reading patterns
between the two participant groups in Look zones 1 and 2. Following that, multimodal
analyses of various aspects of the attentional patterns of the two groups will be presented.
5.3.1 Fixation durations in multimodal reading patterns
The fixation duration data collected in Condition Two (the multimodal
presentation of the texts) for Research Question Two is illustrated in Table 5.3 below. It
has been divided into look zones one and two. The total average for fixation durations has
also been included in the table, as well as the number of fixations that are greater than 1
second in duration (which for reading, is very long).
Table 5.3 Fixation duration averages (in seconds) comparing look zones 1 and 2 for both
participant groups

Participant   Look zone 1 (CC)   Look zone 2 (video)   Ave total fix dur   Text   Fixations > 1.0 s, LZ 1   Fixations > 1.0 s, LZ 2
NNS2          0.251              0.292                 0.233               bk     4                         0
NNS3          0.292              0.418                 0.385               ab     2                         5
NNS4          0.297              0.472                 0.426               bk     2                         20
NNS5          0.316              0.499                 0.401               ab     4                         26
NNS ave       0.289              0.420                 0.361
NS1           0.128              0.345                 0.340               ab     0                         11
NS2           0.156              0.291                 0.281               bk     0                         6
NS7           0.287              0.452                 0.447               ab     1                         38
NS8           0.168              0.352                 0.320               bk     0                         10
NS ave        0.185              0.360                 0.347
The above numerical data is presented in graphic form in the graph below:
Graph 5.1 Average fixation durations for Look zones 1 & 2 for each participant (x-axis:
participant, NNS2 through NS8; y-axis: average fixation in s:ms, 0.0-1.2; series: Look
Zone 2 (video) and Look Zone 1 (CC))
Table 5.3 and Graph 5.1 illustrate the differences between the NNS and the NS fixation
durations for each look zone. The results correspond to those found for Research
Question 1 regarding dynamic text reading patterns (in a monomodal environment),
based on proficiency with reading English: the NNS participants had longer fixations
when reading, and this also held in Condition Two (in a multimodal environment) even
with the addition of other choices for gaining comprehension. In other words, when
compared to the higher proficiency group’s fixations, in general, the lower proficiency
group still fixated longer on words when they were reading. There were no
distinguishing differences between the two groups’ fixation durations when looking at the
video (Look zone 2) although they are slightly longer than those reported by Rayner
(1998) as the average fixation duration for scene perception (330 ms).
To further explore the two groups’ patterns with dynamic reading, a
comparison between the two conditions will help show the relationship between their
fixation durations. The lower proficiency group’s fixation durations in the dynamic
reading condition (Condition One) are generally shorter than their fixation durations
in Condition Two, in which there were multiple forms of information. An exception,
NNS2, actually had relatively the same fixation duration across conditions, indicating a
fairly fixed reading pattern when reading in his L2. His reading pattern is examined in
further detail below in section 5.5 and he is also one of the case studies explored in
Research Question 3 in Chapter 619. The opposite is found as a reading pattern with the
higher proficiency group: the NS participants exhibit a general trend of quicker fixations
in look zone 1 than in the monomodal Condition 1, when reading a constant stream of
dynamic text with no other modalities from which to choose.
Table 5.4 Comparison of average fixation duration across conditions (in seconds)

Participant   Condition 1 (ave total fix dur, sequence)   Condition 2 (ave fix dur, LZ 1)
NNS2          0.251                                       0.251
NNS3          0.284                                       0.292
NNS4          0.239                                       0.297
NNS5          0.265                                       0.316
NS1           0.193                                       0.128
NS2           0.193                                       0.156
NS7           0.210                                       0.287
NS8           0.225                                       0.168
19 NNS2 was chosen to be a case example in Research Question 3 (Chapter 6) because he had relatively
low amounts of data loss. It was a coincidence that his eye movement data was also explored in detail,
although from a different perspective, in this chapter.
This opposition in reading patterns illustrates that the proficiency of the readers is
a clear indicator of reading patterns, and that this holds in situations where the text and
information are presented without rate control. The NNS group do not seem to be reacting
to the text presentation by reading as quickly as an NS, which in all likelihood is due to a
greater difficulty in predicting the words and less familiarity in recognizing what the
words could be. However, refer back to the literature review in Chapter 3, in which the
benefits of closed captioning had been explored: while it has been shown that exposure
and use of closed captioning can improve vocabulary, it has only been minimally shown
to help with reading rates (Linebarger, 2001) and not in all subjects.
5.4 Consecutive fixations in look zone 1 (reading or noticing?)
It is highly unlikely that the number of fixations (expressed as percentages of the
total number in Table 5.2 above) is directly related to an individual’s fixation durations.
The data collected show that the NS participants very rarely looked down at the text.
Once this came to light, I decided to look at the number of consecutive fixations for the
participants in each group: a participant who was actually reading the text should show
shorter fixation durations on average and runs of consecutive fixations, whereas a
participant who was only looking at the text area without reading it, perhaps fixating
there briefly because a movement caught their attention, would show isolated fixations.
This actually seems to be the case.
When analyzed for the number of consecutive fixations in Look Zone 1, the NNS
showed a greater number while the NS consistently showed fixations that were limited to
one or two fixations in a row before looking back up at look zone 2 again.
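A rough sketch of how such runs can be counted is given below; the zone sequence in the example is invented, not a participant's data, and the function name is hypothetical.

# Minimal sketch of the consecutive-fixation count behind Graph 5.2: given the
# sequence of look zones visited by successive fixations, count the length of
# each uninterrupted run inside look zone 1.

def lz1_run_lengths(zone_sequence):
    """Return the lengths of consecutive runs of look-zone-1 fixations."""
    runs, current = [], 0
    for zone in zone_sequence:
        if zone == 1:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

print(lz1_run_lengths([2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1]))   # [3, 1, 2]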
Graph 5.2 Frequency of consecutive fixations in look zone 1 (the reading area) (x-axis:
number of consecutive fixations; y-axis: frequency; one series per participant)
* consecutive fixations start at 2, although to show the high number of single fixations in look
zone 1, frequency counts for that category are shown as well.
Graph 5.2 illustrates the frequency of consecutive fixations for both groups; NNS2, 3 and
4 show longer stretches of consecutive fixations indicating more fixations spent reading
the text versus the NS who show a greater frequency of quick glances of between 1 and 3
fixations in a row.
Indirectly, the frequency of consecutive fixations is showing the individual
strategies that the viewers are using: the NNS are at least looking at the printed text more
often and for longer durations of time than the NS who are primarily getting their
information from the visual and aural modalities. This was also echoed earlier in Table
5.2 when the percentage of fixations in each look zone was explored. Whether the non-
consecutive fixations and saccades indicate that a participant is noticing movement and
looking at a word in the text area, or whether the participant is checking a word in the
text area against information from the other modalities remains to be seen. The next
section uses one participant’s eye movements to micro-analyze the movements between
the two look zones.
5.5 A detailed multimodal analysis of the use of look zones (for NNS2)
There are inherent factors that may affect how and when a participant switched
between look zones 1 and 2. In this next section, a detailed multimodal analysis of the
saccade movements of NNS2 while watching the biker sequence is displayed, illustrating
his viewing patterns.
Table 5.5 graphically describes the saccades and fixations of NNS2 using a
multimodal analysis and several terms unique to this description. A Camera Shot
indicates the first display of a new scene. Each time the scene changes (which involves a
new camera shot), a different Camera Shot number is used, which aids in distinguishing
and breaking the table below into manageable sections. For example, in Table 5.5a,
Camera Shot 1 starts with the crossover between a display of the weather and the
television show’s host. At that point, the Transcription (audio) starts and the male host is
heard saying “As a subculture of American society…”. When he finishes his dialogue by
saying “record straight,” there is a change in the display, indicated by Camera Shot 2 in
column two, which shows the title of the section of the show, entitled, “Art Scene”. At
this point, the host is heard (but not seen) narrating “Producer SooYeon Lee….”.
There is a shift in contiguity, however, as the closed captioning starts to fall
behind the spoken text at this point, as indicated by the corresponding rows “CC text
(written)”. So far, the eye movements of NNS2 have not been displayed and only
the context has been given. Table 5.5b begins to include the eye movements in
conjunction with the available modalities, such as those just explained above for Table
5.5a. The Camera Shot shows the beginning of the scene, a road, before a motorcyclist
rides in front of the camera. Appendix H includes a sample multimodal analysis of the
Airbag text with a breakdown of the camera shots for the entire sequence.
Table 5.5a Context: before unit A of Biker sequence. Note that the CC text carries over
from Camera Shot 2 to Camera Shot 320
20 Note that the CC text is represented as such by using CAPS and Arial font with the original line breaks;
the audio transcription is also in Arial, and the multimodal and eye gaze notations are made in Times New
Roman. As a reminder, the << symbols are used in closed captioning to indicate a change in speaker.
Camera Shot 1 2
Description
(graphic)
Transcription
(audio)
>> As a subculture of
American society, bikers have
been largely misunderstood.
That's why a Tucson biker is
trying to set the record
straight.
Producer SooYeon Lee has
tonight's edition of Art scene.
CC text
(written)
THAT'S WHY A TUCSON BIKER IS
TRYING TO SET THE RECORD
STRA
STRAIGHT.
PRODUCER SOOYEON LEE HAS
TONIGHT'S EDITION OF ART SCENE.
Table 5.5b Camera Shot 3. Relationship between appearance of printed text (CC), of
video description (graphic), of the audio transcription for this camera shot
Camera
Shot 3
Graphic
shot 3
Picture of road, motorcycle
appears, biker looks at camera
and drives by
Audio
shot 3
>> why do I ride? It's a feeling of
freedom for me. It's the wind in
your face. The smells,
CC text
by
segment
STRAIGHT.
PRODUCER SOOYEON LEE HAS
TONIGHT'S EDITION OF ART SCENE.
PRODUCER SOOYEON LEE HAS
TONIGHT'S EDITION OF ART SCENE.
>> WHY DO I
0:18.57
NNS2 reads CC text until
0:22.67
person’s face appears on
screen. Quickly looks at it and
then reads text at start of new
unit (>> WHY DO I ).
NNS
fixations
by
segment
with time
listed
(s:ms)
0:26.33
Looks back up at person after
finishing RIDE
0:26.67
and then saccades back down
to CC when the row scrolls
up. Starts where he left off,
row 2, at IT’S A FEELING…
Table 5.5c Camera shot 4, and 5, close up of bike wheel and tachometer during which
there is no change in Look zones
Camera
Shot
4 5
Graphic
Shot 4&5
Camera focuses on close
up of motorcycle bike
wheel
Close up of tachometer
Audio
Shot 4&5
Good people you meet.
(none)
CC text
ME.
IT’S THE WIND IN YOUR FACE. ME.
IT’S THE WIND IN YOUR FACE.
THE SMELLS, GOOD PEOPLE YOU
NNS
fixations
0:29.53
Reads the CC text in LZ 1
0:31.27
Still reading the text, not
changing LZs
NNS2 continues reading the text and not looking up into look zone 2 until he finishes the
idea unit (which is unit A), and until there is a lengthier pause in written text presentation
and a change in the sound and the picture. The background sound includes the increased
volume of the background soundtrack song and the added extraneous sound of a
motorcycle engine revving.
At this point in the biker sequence the male interviewee’s speech has just ended and
the female narrator begins to talk. There are quite a few changes occurring, but perhaps
the most significant is the end of the idea unit which coincides with the change of visual
scenery also indicated by the audio track. Table 5.5.d skips to the look zone change
fixations about 10 seconds later in the text.
Table 5.5d Camera shot 9, illustrating the changes between LZ1 & LZ2 in relation to the
changes in the audio, graphic and printed modalities
Camera
Shot 9
graphic 0:40.07 (start)
Four motorcyclists are riding down the street.
audio (♫♪ Music starts)
>> Feared by some, idolized by others, some members of the biker
community are dispelling myths about those who ride two wheels
instead of four. >> I was a biker for many, many years.
CC text
RIVER CANYON YOU CAN SMELL
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CAR, YOU KNOW WHY
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CAR, YOU KNOW WHY
YOU RIDE A MOTORCYCLE
NNS
fixations
0:40.82
Finishes reading CC text 0:43.27
Looks up from reading to face
(and then back down again)
CC text
SMELL IN THE CAR, YOU KNOW WHY
YOU RIDE A MOTORCYCLE
<<
SMELL IN THE CAR, YOU KNOW WHY
YOU RIDE A MOTORCYCLE
<< FEARED BY
NNS
fixations
continued
0:45.13
Look up briefly at the
motorcycle again
0:45.80
Looks back down and starts
reading again
The above table shows that NNS2 reads the written text after the screen change
until he finishes the unit and then looks up at 0:43.27 to view the visual modality until,
at 0:44.33 (about 1 second later), he starts reading again. At this point the written text lags behind
the audio track, so that the narrator is talking and the corresponding text hasn’t been
presented yet. So, NNS2 looks back up at look zone 2 until the written text changes and
he begins reading again in look zone 1. He looks up and down two more times, in
between the text and the pictures of the motorcycles moving down the street until he
finishes the unit idea (marked by a period punctuation mark) and the video switches to a
mid-range close-up of the interviewee which is also marked by a switch in the audio as
the interviewee begins to talk and the background music fades out.
While the video data shows the eye movement in the form of a cursor, or a black
dot, as seen in the screen shots in Tables 5.5.a-d, the numerical data that is transposed
into fixplots shows similar, but not identical data. The fixplots below show the fixation
patterns and gaze trails for the fixations that correlate with Tables 5.5.b, c & d. Notice
that in Tables 5.5b and d, there are not as many switches between look zones as indicated
in the video form of data. This can be accounted for by the quickness of the fixations. If
the fixations were less than .100 seconds or did not remain within 1 visual degree
(as calculated by the eye tracker), then the fixation was not registered separately
as such by the EYENAL software and its duration was averaged into the prior and following
fixation durations. In doing this, the software and the eye tracker are controlling for natural
oculomotor movement.
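The following sketch is a simplified illustration of that filtering rule under the stated thresholds (0.100 s and 1 visual degree); it is not the actual EYENAL algorithm, and the coordinate and duration values are hypothetical. For simplicity, a sub-threshold sample is folded into the preceding registered fixation only, whereas the original procedure averaged it into the prior and following fixations.

# Minimal sketch: a sample is kept as a separate fixation only if it lasts at
# least MIN_DURATION and has moved more than MAX_DRIFT degrees from the
# previously registered fixation; otherwise its duration is folded into that
# previous fixation. Inputs are hypothetical (x, y, duration) tuples.

MIN_DURATION = 0.100   # seconds
MAX_DRIFT = 1.0        # visual degrees

def register_fixations(samples):
    fixations = []
    for x, y, dur in samples:
        if fixations:
            px, py, pdur = fixations[-1]
            drift = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if dur < MIN_DURATION or drift <= MAX_DRIFT:
                # too short or essentially the same place: absorb into the
                # previous fixation rather than registering a new one
                fixations[-1] = (px, py, pdur + dur)
                continue
        if dur >= MIN_DURATION:
            fixations.append((x, y, dur))
    return fixations

print(register_fixations([(100, 200, 0.25), (100.4, 200.2, 0.08), (150, 90, 0.30)]))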
Table 5.6 The fixplots for the multimodal analyses in Tables 5.5b, c & d

5.6.b (fixations 27–48): NNS2 is reading, and the two quick fixations in LZ 2 are not
registered.
5.6.c (fixations 53–58): Even though the video and audio changed, NNS2 kept reading in
LZ 1.
5.6.d (fixations 73–85): NNS2 read but kept looking up to LZ 2 at the figure on the
screen. Only two of four fixations were registered by EYENAL.
The multimodal analysis gives a more complete picture of the reading patterns
exhibited by the readers in this study. NNS2 is a fairly typical example of the NNS
group; the NS group, in general, show less reading, with only quick glances down in
almost the opposite pattern of NNS2. The causes of the changes often seem to be related
to changes in the units (ends of idea units, phrases, etc) or in a change in the visual
screen. Many saccades from look zone 1 to 2 occurred when a change in the camera shot
showed a face.
The three excerpts above (Tables 5.5b, c & d) exemplify the changes that
occurred between modalities. Conscious reasons for some of the changes were given
during the interviews, including an NS response that at one point she couldn’t distinguish
the narrator from the background noise and an NNS response that he was reading, but then
it just got too fast and so he started to devote his attention to the visual and aural
modalities. Interestingly, these two participants, at least, were aware of their actions and
motivations for modality use.
5.6 Comparison of line use in multimodal texts
The use of the lines available for text differed between the two participant groups.
The NNS participants, for the most part, read quite a bit of text. Their eye movements
indicated that as they read, once in a while they would look up at the screen. This next
section will look at what happens when the reader returns to look zone 1 after an elapsed
amount of time during which the text has moved. First, though, there were a few
noticeable causes for the look zone changes.
At times a look zone change was initiated as a referential verification between the
print and the picture. Table 5.5 c above showed NNS2 reading the text and looking up
and down at the picture of a motorcyclist as he read about bikers. Up until 5.5c the word
BIKER had been heard in the narrative and seen in print, but the word MOTORCYCLE
had yet to be used. In this case, the modalities complemented one another to aid the
viewer in word-object recognition.
However, the majority of the changes between look zones 1 and 2 were motivated
by different scene changes such as in the following table illustrating NS8’s attentional
patterns:
Table 5.7 Illustration of scene change possibly preempting LZ change by NS8, fixations
highlighted by circles.
A. LZ 1 B. scene change C. LZ 2
(0:25.83)
NS8 reading row 1,
THE SMELLS, GOOD
( 0:25.93)
scene changes to mid
distance shot of
motorcycle and people’s
legs
(0:26.30)
reader changes to look at
new scene, fixating the
motorcycle
Besides scene changes, there were occurrences of changes between look zones
that were motivated by a pause in the presentation of the text, during which the eye
movements showed a scanning for additional information, perhaps waiting for new text
to appear. On the whole, a change between speakers (and therefore camera direction) and
changes in the movement initiated a change, if only briefly, up to look zone 2 from look
zone 1.
A noticeable difference between the two groups was that while the NS
participants looked down to refer to a word or an utterance in print, the NNS participants
tended to read and look up for verification (while they were reading). When NNS
participants changed from look zone 1 to 2, they tended to first land on a word in row 3,
the top row, and then search for the word they were looking for in that or the other two
rows, while the NS participants tended to land on any of the three rows and either begin
to read or to jump back up to look zone 1. This trend was connected to the pattern
exhibited by many of the NNS participants, in that once they returned to the text they
were more often than not seen searching around, without reading in a consecutive fashion
from left to right but rather skipping from word to word, and line to line, until fixating
finally at the beginning of a line and then starting to read in a pattern from left to right.
In general, when NS participants did look down and read, they seemed to find their place
more easily (see Table 5.9 below). This isn’t surprising, since the higher proficiency
readers are reading in their L1 and have a greater degree of familiarity with decoding the
text quickly and predicting neighboring words and phrases.
The following analysis shows NNS4 looking between look zone 1 and 2 to gain
comprehension. He is unable to fully engage with the text as he returns and saccades
back to LZ 1, which is seen as he quickly moves around the lines before he eventually
devotes all of his attention to the modalities in LZ 2. In this table the available modalities
are in the “Audio/CC PRINT” column, which presents the soundtrack (indicated by
quotation marks) and the closed captioning (indicated by CAPITAL print) during the
camera shot; the camera shot itself is shown in the third column, in which the eye
movement is displayed (captured by the cursor), with a corresponding description below.
The text fixated in each scene illustration is indicated in CAPS.
Table 5.8 Detailed multimodal analysis of NNS4 eye movements between LZ1 and LZ2
illustrating nonfamiliarity with the text.
Time code   Audio (“ ”) / CC PRINT   Eye movement / Description
“and you can smell
A.
04:31.77
THE SMELLS, GOOD PEOPLE YOU
MEET.
>> IF YOU EVER GO THROUGH S
NNS4 fixates GO
“smells
B.
04:32.03
THE SMELLS, GOOD PEOPLE YOU
MEET.
>> IF YOU EVER GO THROUGH SA
Saccades to THROUGH
“driving there
C.
04:32.80
MEET.
>> IF YOU EVER GO THROUGH SALT
RIVER CANYON YOU
Starts to quickly move eyes
around, not recorded by
numerical data in fixplot (over 1
second of interfixation durations)
“you can’t smell in a car
D.
04:33.27
MEET.
>> IF YOU EVER GO THROUGH SALT
RIVER CANYON YOU CAN SMELL
Fixates CANYON
“you know why you ride
E.
04:33.87
>> IF YOU EVER GO THROUGH SALT
RIVER CANYON YOU CAN SMELL
SMELLS DRIVING THERE YOU
“a motor-
F.
04:34.07
>> IF YOU EVER GO THROUGH SALT
RIVER CANYON YOU CAN SMELL
SMELLS DRIVING THERE YOU
Eyes move around, not recorded
by numerical data in fixplot
(again, over 1 second of
interfixation durations)
“cycle
G.
04:34.83
RIVER CANYON YOU CAN SMELL
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CA
Moving around
(♪♫ music starts)
H.
04:35.27
RIVER CANYON YOU CAN SMELL
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CAR, YOU KNOW
Saccades to LZ 2
♪♫ Music
I.
04:36.30
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CAR, YOU KNOW WHY
YOU RIDE A
And saccades back to LZ 1
fixating THERE
narrator:
“feared by some…”
J.
04:37.33
SMELLS DRIVING THERE YOU CAN’T
SMELL IN THE CAR, YOU KNOW WHY
YOU RIDE A MOTORCYCLE.
Eyes move around, with a long
interfixation duration of 2.686
seconds, and fixate DRIVING,
moving back to LZ 2 and then
back to LZ 1
*note: row 1 is the break between
two idea units and there is a long
pause before the next line appears
(nearly 4.9 seconds)
Below is the corresponding fixplot, illustrating the fixations of 100 ms or more as
captured by the eye tracker and calculated by the EYENAL software. Notice that Table
5.8 B, C, F and G are not indicated by fixations in the fixplot.
Figure 5.2 NNS4 fixations 38 – 46, corresponding to the visual data in Table 5.8
NNS4 starts in LZ 2 with fixation 37 (517 ms) (looking at the man’s face as seen above
in Table 5.8A), then saccades into LZ1 fixating GO (5.8 B) (fixation 38, 284 ms), briefly
looking at THROUGH, fixating CANYON (5.8 D)(fixation 39, 117 ms) then saccading
up to YOU (5.8 E) (fixation 40, 133 ms) then continuing to move around in LZ1 as
illustrated by Table 5.8 F and G before saccading out of LZ 1 into LZ 2 in 5.8 H (and
fixations 41, 42 and 43). He then saccades back into LZ 1 and fixates THERE (5.8 I)
(fixation 44, 133 ms) and moves around fixating again in 5.8 J near DRIVING (fixation
45, 434 ms), quickly moving between LZ 1 and LZ 2, before finally leaving LZ 1 and
focusing on the motorcyclist driving down the street.
NNS4 abandons the text in LZ 1 for another 55 seconds while watching the visual
pictures in LZ 2, including the interviewee as he is talking about teaching his son to ride a
motorcycle and writing a book to help that endeavor. NNS4 never looked at the written
text during this section of the news story but in his retell protocol he mentioned it saying,
“he talked about his son [ok] and uh and he I think he…made a books...”. This
information he comprehended from the visual modality (there was a picture of the book
shown in the video clip) and from the dialogue presented in the aural modality (there was
never a picture of the son).
As mentioned before, the NS participants tended to be able to find the information
or words that they needed quickly. Table 5.9 illustrates NS7’s use of LZ 1 to supplement
her viewing in LZ 2 as she is distracted momentarily by a reflection in the background
behind the person sitting in the video clip (5.9 B), finds the information once it is
presented in print in LZ 1, and then moves back to LZ 2.
Table 5.9 Multimodal analysis of NS7 illustrating distraction in LZ 2 and catch up using
printed text in LZ 1
Time
code
Audio/PRINT (CC) Eye movement / Description
Audio
Speaker:
“for example … and my
guess is
A.
04:06.97
PRINT (CC)
MILITARY FOR A LONG TIME AS
A PROPEL ANT.
NOT JUST AS AN EXPLOSIVE
BUT
Watches the main face on the screen,
the interviewer, in front of a
bookcase.
Audio
“that, um,
(reflection indicated by circle)
B.
04:07.57
PRINT (CC)
A PROPEL ANT.
NOT JUST AS AN EXPLOSIVE
BUT
PROPEL ANT IN
looks at top right corner movement:
in original video, reflection off of
glass behind the interviewer/woman
Audio
“the, um, airbag industry
C.
04:09.63
NOT JUST AS AN EXPLOSIVE
BUT
PROPEL ANT IN EJECTOR SEATS
IN FIGHTER PLANES.
Afterwards… looks at faces again,
Audio
“or the automobile industry
were in need of something
D.
04:10.53
PRINT (CC)
NOT JUST AS AN EXPLOSIVE
BUT
PROPEL ANT IN EJECTOR SEATS
IN FIGHTER PLANES, FOR
and then at the text in LZ 1, perhaps
to pick up the storyline again…
looks at EXPLOSIVE, which
quickly scrolls off screen, then looks
around at EJECTOR, FIGHTER,
MY GUESS … THAT, and after a
very quick regression to LZ 2 and
man’s face,
Audio
“in a hurry
E.
04:13.00
PRINT (CC)
IN FIGHTER PLANES, FOR
EXAMPLE.
MY GUESS IS THAT THE
AIRBAG
lands back on row 1 and reads
consecutively where she left off:
THE AIRBAG
The interesting thing to note here, in this detailed analysis of a high proficiency
reader, is the ability to switch between look zones and continue reading where one left
off. In this case, NS7 was not reading the available text to much extent throughout the
video until now, instead using LZ 2. However, when she is distracted by the moving
reflection in the upper right corner, she then switches to LZ 1 and starts to read, and after
a momentary look at the speaker’s face again in LZ 2, is able to pick back up where she
left off in the sentence in LZ 1. While the aural modality is congruent with the video
clip, the printed text is not contiguous: the sentence happens to have already been spoken
(5.9A starts with the interviewee saying, “My guess is…”) about six seconds before it is
seen in the closed captioning (“MY GUESS IS…” starts in 5.9 E in the table above).
When NS7 begins reading the text in row 1, MY GUESS IS THAT… THE
AIRBAG appears on the screen; this is the sentence that was spoken just before she was
distracted by the reflection (in the circle in 5.9 B), so she is able to read the part that she
missed hearing, using the presentation incongruity of the two modalities as an aid to
comprehension.
The fixplot below corresponds to her eye movements as recorded numerically:
Figure 5.3 Fixplot of NS7 eye movements when distracted, then reading to catch the text.
Again, the fixation duration is indicated by the size of the circle. The longer
durations are focused around the faces of the speakers near the middle of the screen.
Fixations 394 and 395 are on the moving reflection, after which she returns to look at the
faces again, momentarily looking off to the right at the bookcase (fix 397). She looks
down at the text quickly, captured by fixations 400 (167 ms) and 401 (267 ms), then up to the
face again and then back to the text with fixation 403 (300 ms) on THE, and 404 (317
ms) and 405 (117 ms) on AIR and BAG respectively, before saccading back to look zone
2 and the speakers. There is a double fixation on AIRBAG, with fixation 404 landing
near the ‘R’, and 405 briefly fixating ‘G’. This example from NS7, then, shows the
ability and flexibility of the proficient readers in switching with agility between the look
zones and using the available extra modality of the transcribed speech in the form of
closed captioning to supplement her comprehension of the text.
5.7 Aural channel: The other modality present
The data presented in this section only shows a visual map of the eye movements
and does not directly deal with the aural modality. The experimental design does not
allow for tapping directly into the perception of the speech event by the participants.
Indirectly, however, we know from previous research that spoken word identification is closely tied to visual word identification: before a word is completely heard, eye movements toward the appropriate visual selection of a word are already recorded (e.g., Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In order to stay
true to a natural textual presentation, in which there would be a natural transaction with
the text for the participant, the original text was used in Condition Two: aural, graphic
and written modalities were present. Part three of the study, in which small case studies
are used, further explains the relationships between the viewer and the text and the
context surrounding the transaction (i.e. the reader’s background).
5.8 Summary and reflections regarding eye movements in multimodal
environments
Studies using closed captioning conducted by Jensema and his colleagues also looked at the eye movements of individuals over time (2000a, 2000b). Jensema, Sharkawy, Sarma Danturthi, Burch and Hsu (2000b) used two small groups of deaf or hearing participants, from which they made the broad statement that whenever closed captioning was present, reading would occur. I do not directly disagree with their statement, since the participants in my study did read the closed captioning (even if for an extremely brief amount of time, as in the case of the NS participants), but I do take issue with the experimental design from which that statement was made. Jensema et al.’s participants had only two modalities: closed captioning with video. The sound was not available, so it seems reasonable that, in order to gain any comprehension, especially for those participants who could not read lips, the printed text modality would be used and reading would take place. My study differs greatly in that my subjects, all with a normal range of hearing and eyesight (with correction), watched the videos in as near normal conditions as possible in Condition Two: all available modalities were present and available for use in comprehending the text. In the current study, closed captioning is framed as an aid for language learners, which differs in its premise from Jensema et al.’s study, whose purpose was to explore deaf adults’ use of the available closed captioning. However, in both of the studies discussed, Jensema et al.’s and the current one, the participants who needed the extra modality of transcribed print used it.
In sum, this chapter explores the different patterns used by the two participant groups as they viewed video texts that were supplemented by closed captioning. The NNS group (with lower proficiency in reading English) used the closed captioning to a greater extent than the NS group. They also seemed to encounter more difficulties reading when they switched between modalities, as their eye movements showed a somewhat random pattern at first, such as in Table 5.5, before either settling on an idea unit and starting to read or abandoning the print text and devoting attention to the video and audio tracks. Another observation concerned fixation durations and proficiency: the NNS fixated longer than the NS. This held true for reading in the monomodal dynamic condition as well as in the multimodal dynamic condition, and it extends to dynamic text the findings of eye movement studies of static reading conditions regarding the relationship between proficiency and fixation duration.
The ideas of noticing and paying attention were brought up, and explored through
the use of the fixplots and the multimodal analysis of the visual data. Both groups
displayed instances of paying attention, such as NNS2 when he ignored the scene change
and kept reading. Noticing was also evident, as when NS7 was distracted by the
extraneous reflection and then moved to look zone 1 to find the text she needed. In order
to do that, she had to find the print text in relation to the spoken text that she had already
heard. NS8 was used to exemplify the changes between look zones when a scene change
occurred; the participants, in general, were aware of their multimodal surroundings, or
options, and used them to their advantage to better comprehend the text.
Complementing this line of exploration, Paivio’s Dual Coding Theory was briefly
explained as one way of describing the interaction between the multiple modes of
information and the activation from one to the other, especially in light of the time lag
between the aural and visual modalities and the transcription into the print modality.
Participants were seen using these modalities together, such as NS7, as well as failing to transfer between them, such as NNS4, who could not find the information he needed, or could not catch up with the moving text, and went back to the aural and visual modalities. It seems that the options for comprehension were used by the individual participants in the manners that best suited their needs. In the next chapter, Research Question Three will look at four of the participants in greater detail, making
connections between their learning strategies, needs and uses of different tools to aid in
learning.
CHAPTER 6:
RESEARCH QUESTION 3: CASE STUDIES
The previous chapters have analyzed the data according to the numerical and
video data collected by the eye tracker and the screen monitor. This chapter changes the
perspective and places the reader centrally in the process. Whereas the reader was
definitely involved in the two previous chapters and analysis, this chapter places the
reader in the front seat, metaphorically speaking, as the driver. With the reader placed back into the reading process, he or she is now seen in relationship to the text being transacted with, and the reader’s personal background, which is drawn upon in this transaction, now becomes a principal part of the process.
Figure 6.1 The place of the reader in a Multimodal Multimedia Communicative Event.
The person’s background is included in the event, while the TV icon could also represent
movies or the internet.
Figure 6.1 illustrates the complexity involved in a Multimodal Multimedia
Communicative Event in which readers/viewers bring their background with them to the
event. Their background includes not only the languages that they are familiar with, but
the cultural meanings attached and integrated in those languages, and the previous
experiences with associated texts and themes similar to the one being viewed/read. The
transaction with the Event21 not only draws on the previous experiences and linguistic
knowledge of the participant, but also adds to the participant’s background and context,
as illustrated by double-headed arrows in Figure 6.1. It is a circular exchange within the
participant. In short, everything that makes that individual unique is used to interpret the
multimodal text in his/her unique way.
6.1 Question 3: Overview of procedures and questions
Four participants were selected for this personalized observation of the textual interaction conducted during the experiment. Part of the methodology for the experiment
entailed a retell protocol that was collected after each text was viewed. The participants
were told at the beginning of the study that they would be asked a series of questions
about the video or text that they were viewing and each participant had a sample (static)
reading with retell questions to prepare them for the experimental texts. The last portion
of the experiment also involved an extended interview and the completion of a learning
styles questionnaire. The questionnaire was a series of questions that the participants
answered on a computer. Afterwards, the answers were compiled into graphs from which
a person could be placed into general categories regarding preferences for auditory or
visual learning styles and activities. It is these parts of the data collection that will be sewn together to give an additional viewpoint from which to observe the multimodal choices that a person makes when using a multimodal text and the individualized background that influences and engages each individual in that process.

21 The “Event” could be labeled as a Literacy Event, a Communicative Event, a Media Event, etc., depending on one’s theoretical definitions. Here, as described in Chapter Two, I have made the Event one in which there is an attempt at communication, and the communication crosses the modalities of print, media and spoken discourse: a Multimodal Multimedia Communicative Event.
Research Question 3:
What are the relationships, if any, that can be established between the individual viewer’s
reading patterns and the self-reported background history related to multimodal use?
Predictions for Research Question 3 include that there will be eye movement
differences that can be traced back to the learners’ proficiencies and familiarity with
reading dynamic text in English.
The following forms of data are used to answer Research Question Three:
Extended interviews and results from a learning styles questionnaire
Fixplots showing the eye movements of one whole text that included all available
modalities (video, sound, printed text)
Retell protocols from that specific text
Four participants were selected based on the following criteria: two NS and two NNS
participants from Condition Two (the multimodal condition) for each text sequence
(airbag and biker). The four participants also had minimal amounts of data loss in the
eye movement data. Narrowing down the available participants resulted in the following
selection:
NNS2 who viewed the biker text in Condition Two
NNS 5 who viewed the airbag text in Condition Two
NS 8 who viewed the biker text in Condition Two
NS 7 who viewed the airbag text in Condition Two
From this point forward, the participants will be given pseudonyms, and their
backgrounds are discussed in turn with a separate section for each participant. For each
participant, age, background, experience with using multimodal mediums to learn,
learning styles and strategies will be described with the addition of the fixplots of their
viewing activity with the experimental text. This chapter is only meant to show the
relationships between the participants, their self-reported backgrounds, and their eye
movements. The research question involves how the person is reflected in his/her choices of attention in a multimodal text when comprehension will be asked about afterwards.22 Direct quotes by the participants that are presented in the following
sections will be in italics.
6.2 TARIQ
Tariq, at the time of the study, was 34 and studying English at an Intensive
English Program (IEP) in a southwestern Research I university23. From Saudi Arabia,
Tariq is married with three young girls, one of whom is also enrolled in school, in kindergarten. Tariq’s wife speaks English as a second language as well, although at
home Arabic is the main language of use with English used occasionally. Tariq studies
quite diligently to improve his English, working his way over the last year from a low-level classroom (level 30, one of the lowest) to his present level of intermediate high (level 70, the second highest level). His most recent TOEFL score on the institutional paper-based version was 524. He is a motivated student who used materials outside the classroom to try to actively improve his language abilities in his second language.

22 Please note that all quotes are transcriptions of the interviews and that all grammatical and lexical creativeness reflects the actual words used by the participant.

23 Studying in an IEP usually entails classes five days a week and about twenty hours of classroom time. Students usually study integrated skills such as listening, speaking, grammar, writing, pronunciation, etc. Students are placed into a level of general proficiency, and as their skills and abilities in the target language improve, they advance in the level of instruction. At the particular IEP mentioned in this study, level 30 is fairly low proficiency in English, at which students may or may not be able to conjugate verbs and can usually use the present tense. By level 70, or ‘advanced’, students are preparing to take the TOEFL (Test of English as a Foreign Language) exam and to enter into undergraduate or graduate studies at a university.
Tariq studied English starting at around eight years old and continued through high school; however, his English classes were conducted in Arabic by his teachers. Most of his friends speak English, as do his mother and father. As mentioned, his wife speaks
English as well, and speaks it better than he does since, he jokingly added, “females have
better memory.” He plans on using his English to get a new job back in Saudi Arabia and
so will get an MA in Finance in the United States, adding to his MA in Business
Management earned in his home country. Tariq likes a variety of topics, and uses them
to practice his reading in English, saying that management textbooks have easier
grammar and vocabulary while anthropology is an all new vocabulary for him. He uses a
strategy to help him understand the text in which he reads whole paragraphs and then
figures out the vocabulary from context and “just keeps on reading”. Tariq likes to read,
but says that this is a habit he has forced since arriving in the United States to study
English.
Aware of his learning strategies, Tariq is pointed in his discussion about them.
For example, while he mentions that reading and writing are difficult for him, “lots of
people have same problem. Same difficulty. Mostly.” Asked about why he thinks this is,
he responded that, “first of all, in Arabic we don’t have same characters. Uh, second, we
usually for, I guess for most of Arabic people, they are listener then reader. For that
reason, um, my hope is to read… not to listen. To improve my reading.” When he agreed
with my re-statement that he doesn’t like to read as much as to watch or listen or talk, I
added, “So, it’s a cultural thing.” Tariq objected to this statement, saying that it is not a
cultural thing, but behavioral. He elaborated, stating that he guessed this, “because I’m
changing. Most of Arabic – good listener, good speaker… but not good reader. I change
that behavior when I came here.”
Tariq utilizes his time as efficiently as possible, and after the exchange about
behavior versus culture, he gave me an example:
Tariq: Now I’m reader and also as well as listener. While I walk from… while I
drove … while I rode a bike from my home to library or [IEP] or you I
usually have.. usually have… uh headphone, uh cd player listening to
lecture, listen to something to improve my language.
Beth: In English?
Tariq: In English. Yes. Yes.
Beth: oh – wow. Good for you.
Tariq: Thank you. So that… also I have a job in library I spend four to six
hours to… this kind of job in library is boring .. is routine job… you do
every time – same thing. Put shelve the book, shelving the book in the
shelf… that give you, that give you feel boring.
Beth: Yeah yeah yeah … routine.
Tariq: Routine. yes. yes.
Beth: So you listen to…
Tariq: So listen to lecture or long conversation, short conversation .. to improve
my skills and enjoy my time.
Beth: Do you ever do podcasts?
Tariq: Yes yes.
Beth: Where do you do your podcasts from?
Tariq: Um National Public Radio.
Beth: Me too!
In the above dialogue, Tariq explained to me that reading is a part of the English
language that he is consciously trying to engage in, and that while he listens to podcasts,
watches movies and studies in his TOEFL book, he adds that “the best way for me is to
read. I read a book, any book. TOEFL book, anthropology, management… but, I guess,
easy book is much better than difficult book” after which I had to agree.
Tariq didn’t have time to watch much TV or movies when he lived in Saudi
Arabia, but now he tries to watch when he has time. He mentioned watching the popular
TV show “LOST” and reruns of “Friends”, and tries to watch with subtitles when
possible. Since he sometimes watches “LOST” on the internet, he can’t watch it with the
closed captioning on and wishes that he could since he uses television “as an
opportunity” to improve his English skills. Tariq specifically added that he watches TV
to get knowledge about “question-response” situations, such as “how to say things, how
to react” and to gather cultural and behavioral insight.
Tariq’s eye movements for the biker multimodal video text are presented below in
a fixplot. On the Learning Styles Inventory (LSI) (see Appendix F for an example of
questions asked), he scored a 13 out of 20 for ‘visual language’, and a 12/20 for auditory
language, indicating that he is fairly balanced in preferences for learning visually as well
as for learning aurally. He also scored 14/20 for both ‘oral’ and ‘written expressiveness’
indicating that he is balanced in feeling comfortable producing language both by talking
as well as through writing.
Figure 6.2 Tariq’s eye movements while watching the biker multimodal video text.
Tariq’s eye movements show he is indeed fairly well balanced in choosing the
different available modalities for gaining comprehension. Figure 6.2 shows many
fixations and trail gazes in the middle of the screen, typically the position of the face of
the person who is being interviewed in the news story. The dark lines towards the bottom
of the figure are his eye movements, including fixations and trail gazes, as he reads the
closed captioning. He spent about 61% of his fixations reading the closed captioning and
about 39% watching the video modality. This result isn’t far removed at all from what
Tariq expressed in the extended interview: he reads as much as he can in English, but
also realized that his strengths reside in listening. His familiarity with using closed
captioning and subtitles could be reflected in his use of the closed captioning in the news
clips used in the study, and although his retell protocol did not reflect that he understood
much, he was able to produce more in response to the prompts with the multimodal text
than he could in the condition with only closed captioning. Therefore, his eye
movements seem to reflect his language strengths and habits used in learning English.
6.3 FARID
Farid, at the time of the study, was 24 and had also been studying English for the past year at an IEP at a southwestern university. Farid’s first language is Arabic and English is his second language. Many family members, including three older brothers and one sister, know and/or speak English, mostly for business purposes, and although his father knows English, Farid thinks that at this point he speaks English better than his father does. Many of
his friends in Saudi Arabia speak English, although he says that they don’t use it much
outside of the classroom. He has used English in the past when talking to foreigners,
saying that English is useful because sometimes it is a common language. During the
interview, Farid spoke of the use of English as a lingua franca a few times, telling stories
about using it with people from India at the oil company where he interned as well as an
example from the United States to talk to a Moroccan whose Arabic dialect was too
different from his own to understand.
Farid has other influences and motivations to learn English besides the already
present connections with English in his family and friends. While he officially began his
study of English as a foreign language while young, in 7th grade, he didn’t take it
seriously until he started university. The university where he got his associate’s degree used English in the classroom, and so he had content courses in English for three years
before coming to the United States. When asked about this, Farid elaborated and said
that “we use English also there and we have an idea that the ones who know English…
he’s really educated. Even like the companies, they don’t look for the colleges or schools
that they just teach in Arabic.” Farid would eventually like to go back and work in Saudi
Arabia, but after he transfers to the University and completes a degree in Human
Resources.
We discussed his experiences learning languages and English in particular. Of
the interviewees, Farid is the only NNS who listed an L3. He is dabbling in learning
Spanish by talking to a friend, saying he “know[s] just a little… phrases or something”.
He would also like to learn Japanese, since his girlfriend is from Japan. Farid’s openness
to learning languages and cultures is always based on learning from “talking”. He does
not like to read, and said so a few times throughout the interview. Therefore, Farid
repeatedly said that he learns from speaking, and if he were to learn Japanese he would
start with talking and then turn to reading by finding a text book. He would also use
television or movies, but that would occur later on as well. He “wants to speak and use it
first.” But when asked about reading and what he does read, beyond that required by his
teachers in his IEP, Farid spoke of horoscopes and zodiacs and smaller texts. He doesn’t
like to read on the internet, and if he does then he will print it out first. It’s not reading in
English that Farid dislikes: “even in Arabic… I try to read… honestly… but that kind of
thing doesn’t come to me…”
Farid thinks that speaking and listening are the easiest parts of language for him,
while writing and reading are “kind of challenging. … I have to squeeze my mind”. There
were two places where Farid mentioned using print as a tool to help him with English.
One place where Farid doesn’t object to reading was the use of subtitles and closed
captioning. In fact, he is annoyed when he goes to friends’ houses and they are watching
TV or movies and the closed captioning is not turned on. He has used subtitles while
watching TV and movies in Saudi Arabia to help him with his language learning (and
sometimes to understand the movie) and says that he uses them because “sometimes I
can’t catch the word so I will read them.”
The other place that Farid uses print is to learn vocabulary. Farid places pieces of
paper with vocabulary that he wants to learn on the wall in his room and reads them
while getting dressed or writing an essay. He tries to use these vocabulary words during
the day. He says that learning vocabulary is easy, although he also adds that, “When I
learned English, like especially in the college, I had to memorize all vocabulary and it
was very hard because, you know, like it was English only in the classes and outside it
was puhhhh” (and makes a gesture with hands that probably symbolizes empty, or
nothing). He connects the idea of being able to actually use vocabulary with the ease of
learning grammar while here in the United States: “Grammar isn’t hard”, he says, “back
home was hard, and here can apply it”. For Farid, learning English is almost equal with
being able to use English in an environment outside of the classroom.
After talking with Farid, it was quite obvious that he presented himself as a
learner who first uses speaking and listening as strategies to learn and then uses print in
small doses to support his learning. When he took his LSI, he scored a 12 (out of a
possible 20) in visual language and a 17/20 on auditory language, placing him in the
range of an Auditory learner. He scored an 11/20 in written expressiveness versus an
18/20 in oral expressiveness, which was not a surprise given his favoritism towards
learning by talking with others. In the figure below, Farid’s eye movements are
displayed while he was watching the airbag multimodal video text. Notice that his use of the screen and the different visual modalities is not skewed at all, but rather balanced across the available modalities for gaining comprehension of the text.
Figure 6.3 Farid’s eye movements watching the airbag multimodal text.
Farid’s eye movements actually represent the most balanced viewing pattern
between reading the closed captioning and viewing the video of all 13 participants. He
watched the video for 46% of the eye fixations and read the closed captions for 54%.
While he didn’t produce the most accurate retells, it doesn’t seem to have been his
proficiency in English that prohibited him from understanding the video text. Why he
didn’t answer the retell prompts in detail is unknown. However, he did mention that he
understood the video condition better since, “that one was kind of help me because there
was video… even it’s new topic for me”. Farid seemed to warm up and speak more once
the extended interview started.
As Figure 6.3 illustrates, Farid uses the closed captioning quite frequently. This
corresponds to what he described as one of his strategies for understanding text: using
subtitles and closed captioning. He is not a stranger to the movement and presentation
style of reading in this format. His balanced use of the video is also in line with his self-reported and LSI-reported learning styles, in that he prefers oral modes of language. In Farid’s case, his eye movements represent his background and learning styles fairly well.
6.4 ELENA
Elena, at the time of the study, was 20 and a sophomore at the same southwestern
university. A native speaker of English, she had picked Journalism for her major and
Communications as a minor. She loves to write and keeps a journal, favoring English as
a subject. Originally from Seattle, Elena had already studied French for three years in
high school and was starting her second semester of French at the college level. She
showed a somewhat detached attitude about learning French during the interview: she studied it because it was a requirement, and although she had taken the placement exam for French, she did not place out of any classes and instead was placed in the first-semester beginning French 101. During the interview she stated that she didn’t think that studying
French was difficult, but that it would be easier if she studied more. When talking
specifically about her classes at the university, she pointed out that she thought her
interest in French was related to her teachers, and that this semester she liked her teacher
and was “excited to go to French… so that helps too”.
While Elena’s motivation wasn’t strongly in favor of learning a second language,
she didn’t have an overt aversion to French. She likes to read French magazines and
compare what the French say about Americans and Hollywood figures with what is said
in American magazines. She says that she’s comfortable reading in French, but her
proficiency limits her so that she “can’t read word for word, but […] can definitely pick
up the main idea” and that she’s “definitely a lot better at reading than speaking. Which
is unfortunate”. Her years of studying in high school had helped her in her 101 class, but
her 102 class is more challenging. She feels that she should have learned more in high
school, but just didn’t. When I asked her about whether she thinks she’d ever use French,
she thought about it and slowly answered, “I… I really want to use it. I mean, I want to
maybe study abroad or something but… to be completely honest, if I don’t go to …
Europe, then no I probably, realistically, no I won’t use it.” She admits that French is
helpful if she’s reading something and there’s a French word in it, but otherwise it isn’t
seen as very useful to her.
We talked about her study strategies, and Elena described practices associated
with a person who loves to write, saying “I make flash cards… um and … write things
over and over again, like conjugations when I’m trying to remember them. And make up
little… mnemonic devices. I make stuff up like that.. a lot of times. Well… usually
before tests or something I’ll make one up.” I then asked her if she remembers them after
the test and she laughed and responded “no.” In order to see what Elena would do with a
fresh start at learning another language, I gave her a hypothetical language and asked her
how she would start to learn it. She again started with written material, mentioning flash
cards for learning vocabulary and verb conjugations, and then proceeded to audio but
seemed unenthusiastic in her use of it:
Elena: Uh… read the book and uh.. like
Beth: You would definitely use a book?
Elena: Yeah. And.. like in French we have a workbook and it comes with a CD
so some of the exercises are, like you listen, then you repeat, but I don’t
feel that helps that much but I’d do it. [laugher] Cause you never know…
you might be able to soak it up.
Beth: OK. So – you do that sometimes in French? or not really…
Elena: Sometimes… but I usually have to write it down. I can’t really just hear it.
Her relationship with French also comes out in the above dialogue when she talks
about using the provided materials in her language class. Elena will use the CD half-
heartedly hoping that she’ll acquire speaking and listening proficiency in French although
she recognizes that writing helps her the best with her second language. At one point, she
talked about feeling confident in her French class with her pronunciation, but not with
word choice and extended conversations in French, adding with a laugh that, “I’m the
kind of person who I would say a sentence and it would sound like an American trying to
speak French.. it would be a direct translation and that’s not the way that a French
person would say it probably…”. Elena seems to know what studying strategies might
work for her, but with a lack of motivation and potential use, she doesn’t seem to employ
them regularly.
Since part of the interview was inquiry into her learning strategies using available
multimodal material such as TV and DVDs, Elena and I talked quite a bit about using
French subtitles and closed captioning on television shows broadcast in English. She
likes to use closed captioning in certain situations, such as when the television program is
muted because her roommate is studying, when she’s on the phone, or at a party. Elena
will also play with the subtitles when she rents movies and puts on the French-dubbed
version and the French subtitles, although she doesn’t do that very often. She depended
on the French subtitles when her French 101 class watched a French soap opera in class,
and says that she misses it now in 102 since they’ve turned off the subtitles. She added
that she can still follow the plot, but mainly because of the body language and that “it’s a
lot harder without the words on the screen.” Overall, Elena depends quite heavily on the
printed material to help her with her second language learning. She has exposure to
closed captioning and uses it from time to time in her daily activities, but particularly to
learn. Below is the fixplot for Elena’s eye movements as she watched the biker
sequence:
Figure 6.4 Elena’s eye movements watching the biker multimodal text.
Elena’s eye movement patterns reflect a greater use of the visual and audio
modalities than the print modality, but she does look at the closed captioning from time to
time. In fact, during her interview she was aware of it and she stated as much:
…what I just watched like I didn’t read the subtitles most of the time but then
there was one scene where it wasn’t pictures of people or anything it was just cars
and you could hear the traffic and … then the lady was talking over the sounds of
the traffic and I couldn’t like really like listen to her so I just read it instead.
Her fixations represented a much lower use of the closed captioning than Tariq’s or Farid’s, the two NNS in this chapter. Elena, as a NS of English, used the closed captioning for only 17% of her fixations and spent the other 83% up in the visual area. However, her reliance on the closed captioning for comprehension of the narration, such as in the example that she gave, does show support for Elena’s use of the print modality and her preference for printed materials. Her LSI results also corroborate her interview and to a
lesser extent her eye movements, as she was fairly balanced in her visual and auditory
language learning styles and she favored expression of language through written modes
over oral.
6.5 SARAH
Sarah was 21 at the time of the study, and just starting the final semester of her senior year at the University, with a double major in Journalism and Psychology. She had also
already been accepted to law school. A hard worker and voracious reader, Sarah excels
in many different subject areas, but never found a strong interest or ability with second
languages. Originally from Phoenix, Sarah spoke some Polish as a child since her father
was a native speaker and her mom had lived in Poland, but when her parents stopped
speaking it to her and her sisters, they lost most of what they had known. In high school,
Sarah took four years of Spanish, was able to transfer her requirements, and so has not
been in a foreign language classroom during her four years of university. I asked her
about her Spanish proficiency when she started as a Freshman:
Sarah: eh… it was ok. I don’t know… Spanish was never my thing. I didn’t
have a very good accent and I really didn’t like it that much and I didn’t
use it very much .. it was just this awkward thing… So I mean I could get
away with stuff and I could speak more than I can now but… I don’t know
.. languages are not… it would take me a long time to really fluently learn
to speak Spanish I think . and I don’t know why that is – I consider
myself pretty good in English .. pretty knowledgeable in English… so –
and my parents – my dad speaks like four languages so.. and my mom
speaks two at least… so …she picked it up in Poland and spoke basically
without an accent and Polish isn’t that easy to learn so…
Beth: mmmhmmm…
Sarah: but I kind of, I don’t know -- maybe it skips generations.
Both: [laughter]
Beth: yeah – the language gene.
Sarah: Yeah – I’m missing it.
Sarah knows what works for her as far as studying, but thinks that motivation is her
problem with becoming fluent in Spanish. In her other subjects she prefers to learn in an
unstructured environment, on her own, but with languages Sarah thinks that she needs a
more structured environment and wouldn’t study well on her own. She adds,
So it would be nice to know Spanish, it would be nice to know Polish, and I
always admire people who can command more than one language – that’s just
very impressive to me, but, you know, if I had a lot of time, then, sure I would
definitely try to learn Spanish. I think the thing about languages for me was I
think I needed a structured environment and that’s unusual for me cause I can
learn a lot and I like learning in an unstructured environment where I control the
pace … and I read and I do whatever, but with languages I would just not have the
motivation to learn alone.
Sarah finds that, in Spanish, reading is easier than writing, and that vocabulary comes
easily but that “getting them to string together grammatically correct is not one of my
strong points.” She doesn’t appreciate the textbooks that are used in high school Spanish
classes, nor the audio supplements, such as CDs, that come with the textbooks, which she
called, “lame recording CD things, spoken in a very unrealistic slow clear pace, and
you’re like ‘yeah, I understand Spanish’ and you get out and go ‘I don’t understand
Spanish’ unless you’re talking about desks and pencils… that’s about it.” All in all,
Sarah seems to know the limitations of learning Spanish in a foreign language classroom,
and yet has not made the effort to learn it outside the classroom either. For her, it is not a
priority at this point in her life.
When I gave Sarah a hypothetical new language to learn (Japanese) and asked her how she would go about learning it, she first responded with “I would enroll in a Japanese class at the university” and said she would then do whatever the teacher told her to do, which is quite the opposite of her regular studying habits. She would also use flash
cards, and maybe use Japanese media to watch and read and listen. She added that she
had tried that a few times in the past with Spanish, to practice listening, but never pursued
it. She feels that if she can’t understand it, then it’s a waste of her time. Sarah does
recommend that particular learning strategy to the ESL students who she tutors in
writing: watching children’s shows “where the vocabulary is slower and more limited
and you can just grasp things.” She sees television as a tool for language learners, but
finds it frustrating for herself.
Sarah also mentions that she doesn’t like subtitles herself, since she finds them
distracting. She gave me an example of the subtitles used at the opera, and how “you just
don’t get the full experience… you have to watch twice” and so she also doesn’t watch
many foreign films for the same reason. She does appreciate closed captioning and its practical uses, not only for the hard of hearing, such as her grandmother, and for ESL students, but also as a way to supplement the audio and “clear stuff up that you didn’t get.”
Though she rarely uses it herself, Sarah readily concedes that she can “much better
recognize – at least in Spanish – written words than […] spoken words.”
When it came to viewing the multimodal text, which for Sarah was the airbag sequence, she could recall a large number of details. She could also do that with the reading-only monomodal text. For Sarah, though, it wasn’t the choice of modalities that affected her retelling but rather the story itself; for her, the airbag story was much more coherent and the biker story was fragmented. Her journalism experience and love of writing inform her critique. Overall, she expressed a tendency in the interview towards
a reliance on reading and seeing printed words to help her comprehension with foreign
languages, but as a native speaker of English, she rarely used the closed captioning in the
experiment when audio and visual modalities were available. Her eye movements are
represented in the fixplot below:
Figure 6.5 Sarah’s eye movements while watching the airbag multimodal sequence
Sarah’s eye movements reflect her interview statements about closed captioning:
for her, personally, it’s there to be used when something isn’t understood, and Sarah
understood almost everything in this text. She rated the text as unfamiliar, since she had had no prior knowledge of sodium azide or its use in airbags, but the text and the story presented few comprehension problems. Instead, Sarah uses the visual and aural modalities in the story, as can be seen in the fixplot, since she looks at the areas where people’s faces are located in the middle part of the screen. Her percentage of fixations on the printed text was one of the lowest, at only 3%, leaving 97% of her fixations in the visual zone of the screen. However, her scores on the LSI also reflected Sarah’s use of language, in that her visual language was far higher (18/20) than her auditory language, as was her comfort with expression through writing (also 18/20). In Sarah’s case, our interview discussion revealed more about what she prefers to do when learning and studying than either the eye movements in her fixplot or her LSI results would alone.
6.6 Summary
In summation, this chapter answers Research Question Three and is an illustration of the intersection between the participant and a text. For all four of the participants, their eye movements reflected their individual patterns of comprehending text; these patterns, when taken into consideration with their discussions during the interviews and coupled with their LSI responses, indicate that eye movements can be representative, to some extent, of an individual’s viewing and reading style when multimodal choices are available. The point is really this: the choices that are available in multimodal environments are used in different ways by different learners based on their individual needs. For these four
participants, their backgrounds, which include their L1 and their previous experiences
such as their schooling, culture, and behavior, impact their comprehension styles. The
NNS, although in general they both did not like reading in English nor in Arabic,
nonetheless used the closed captioning to a far greater extent than the NS who were of
243
course much more proficient in English, although who said that they rely on reading
printed text when they are using their L2 of Spanish or French. In sum, then, the closed
captioning seems to be used based on proficiency in the viewing language, rather than on
styles or strategies, with the caveat that a person’s background is always present and
informing the transaction with the text.
CHAPTER 7: SUMMARY OF FINDINGS, IMPLICATIONS AND
RECOMMENDATIONS FOR FURTHER RESEARCH
In the previous three chapters I have examined and discussed data regarding the
three research questions and subquestions put forth. This study, as a whole, is a foray
into a rich area of multimodal research that will hopefully have practical implications and
applications in the future. The current study has reviewed literature from several
different disciplines, but always with one eye towards language learning. Informed by
many of the past research studies and their results, the current study sought to find a
center between language learning, reading and multimodality. In so doing, the
Multimodal Multimedia Communicative Event framework was used to explain how
important the transaction between the viewer and the texts is for language learning: the
background of the viewer is vital, as are the available modalities for the viewer to use,
engage in, and select for optimal strategized learning events.
The Multimodal Multimedia Communicative Event is also important for placing
the L2 learner in the context of a member of an extended audience, rather than the
primary audience, of a media text. In so doing, the differences in background
information, the predictions and perceptions of the motivations of the addressers, the
differences in the perceptions of the code and the contact, must be recognized as possible
challenges to comprehension. By offering multiple modalities in a Communicative
Event, the L2 learner has a greater chance to use his/her background knowledge to
interpret correctly the message. By studying how L2 learners engage in Multimodal
Multimedia Communicative Events, and enter into the stream of communication in other
languages and cultures, e.g. the langue and parole, perhaps future research and
pedagogical practices can further incorporate and efficiently use the multimedia texts
already available to a global audience.
In this chapter, the areas of research, including the findings from the current study
(see Figure 2.2 from Chapter 2), will be used to explore the intersection between language learning, reading and multimodality as seen in Figure 1.1 (from Chapter 1) below.
Figure 2.2 Areas of Research (multimodality: research and use; closed captioning and subtitles; research with closed captioning and subtitles; reading theory & research / L2 reading; eye movement research; learners: styles and strategies; participants and the study)
Figure 1.1 Cross sections of study
In the following sections I review the study, the findings and bring back to the
discussion the cognitive metaphor of COMPREHENSION is a PUZZLE. Implications
for pedagogical use, future research ideas and limitations follow.
7.1 Summary of findings
The participants in my study were placed into a communicative event as the
recipients of a message. The message was collectively produced as a Multimodal
Multimedia Communicative Event in the form of a televised news story that was
recorded and replayed three years later during this study. Therefore the participants
involved were not the originally intended audience, but an extended audience, as
explained in Chapter Two. Within the study, the conditions of the communicative event
were altered in order to view different reading and viewing patterns.
Twelve participants’ eye movements were recorded using an eye tracker, which provided data in numerical form as well as in video form. Eight of the participants
(four NNS and four NS of English) viewed dynamic presentations of text (closed
captioning) in conditions one and two, and an additional four (NS) viewed a static
presentation of identical texts (condition three). The eight participants who viewed the
dynamic condition viewed two texts: one monomodal with reading as the only option for
gaining comprehension, and one multimodal in which audio, visual and print modalities
were available. Retell protocols were collected afterwards for all participants, with
extended interviews for the eight participants in Conditions One and Two. All interviews
were videotaped and transcribed.
Research Question One:
Upon analysis of the data, reading patterns were found for the transaction with dynamic
text. The lower proficiency readers in English (the NNS group) lingered longer while
reading as indicated by their fixation durations. This is in concordance with previous
findings about bilingual speakers and about lower proficiency readers with static texts,
but it can now be extended to the use of closed captioning and dynamic text as well.
Readers of dynamic text also display a trend in prediction of the text and a transaction with the text. While the text presentation is nearly always in motion, with individual words visible for an average total of about six seconds as the captions scroll vertically up the screen roughly every two seconds, each reader must adapt his or her reading style to the rhythmic presentation of each particular text. It was found that each text had its own rhythm but that the presentation was not as random as it initially seemed.
In exploring the data for this study and in conjunction with other research about
reading and oral communication, I posited a combination of speaking and reading
predictions to be used by readers of closed captioning. Since closed captioning is a
combination of modalities, or transcribed speech represented in the graphic form of print,
a reader can predict the textual content and structure using both the projection of idea units and utterances from speaking and the prediction used in reading, signaled by punctuation and idea units (see Figure 4.5, illustrated again below).
Figure 4.5 Cross strategies: Prediction and Projection (PREDICTING from Reading and PROJECTING from Speaking converge on Closed Captioning and Subtitles)
Situating closed captioning in this unique position, between the worlds of reading
and of speaking, makes comprehension easier for higher proficiency learners, and
perhaps more challenging for lower proficiency learners but indicates that it is also a
profitable place for language learners to practice with available language input. It is also
a place for language instructors to exploit, as will be seen in the following section
regarding pedagogical applications stemming from this study.
One of the interesting findings in the exploration of Research Question One was the reading patterns of the NS group. The fixation durations were expected to change as the conditions of the textual presentation changed from dynamic to static, mainly due to the forced timing of the dynamic presentation and the limitations on possible regressions; however, the NS fixation duration remained fairly constant across conditions. This indicated that the higher proficiency readers (of English, in this study) read in a consistent manner in order to gain comprehension of the text, that they were not particularly rushed by the presentation of moving text, and that they did not adapt their reading strategies regarding the length of fixations whether the words traveled up the screen through the rows or remained stationary on the page.
The dynamic reading setting in Condition One illustrated other reading strategies or transactions with the texts that differed for the two groups of participants. Higher proficiency readers of English tended to use a greater number of short fixations (between 100 and 200 ms) than the lower proficiency readers (in English), and these fixations were slightly quicker than the average found in past reading literature involving eye movements (where the average for silent reading has been found to be between 200 and 250 ms). The lower proficiency readers of English in the current study were found to use fewer short fixations (100–200 ms) and more average fixations (200–300 ms) than the NS group, indicating that the NNS group is perhaps approximating the supposed average fixation duration for reading but is not matching what a more proficient reader of English tends to do in this condition. It also illustrates that eye movement patterns for reading in a dynamic condition are quite different from those reported in studies that record eye movements, and make theoretical assumptions about reading, with static text, as well as with text that is not authentic or not a whole text (using single sentences or words instead).
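As a concrete illustration of how such duration bands can be tallied, the short sketch below bins a list of fixation durations into the ranges discussed above. It is only a sketch under stated assumptions: the input is a plain list of durations in milliseconds, and the sample values are invented rather than data from the study.

def bin_durations(durations_ms):
    """Proportion of fixations falling into each duration band (in ms)."""
    bands = {"under 100": 0, "100-200 (short)": 0, "200-300 (average)": 0, "300 and over": 0}
    for d in durations_ms:
        if d < 100:
            bands["under 100"] += 1
        elif d < 200:
            bands["100-200 (short)"] += 1
        elif d < 300:
            bands["200-300 (average)"] += 1
        else:
            bands["300 and over"] += 1
    total = len(durations_ms) or 1
    return {band: count / total for band, count in bands.items()}

# Invented example: one participant's fixation durations in milliseconds.
print(bin_durations([167, 267, 300, 317, 117, 233, 150, 283]))

Comparing proportions of this kind for the NS and NNS groups is, in essence, the comparison summarized in this section.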
Another unique aspect of the current study is the analysis of the use of the
available lines. Closed captioning is different from subtitling in a few significant ways.
Besides transcription versus translation and the placement of the printed text on the
screen, closed captioning presents a repetitious pattern of textual presentation (while with
subtitles the line of text is presented only once). Therefore, the finding that there is a
difference in reading patterns of dynamic text in the amount of time spent on each row
for each viewer is novel. In researching Research Question One, I found that the
individual viewers alter their reading patterns as they move through the text from the
beginning to the end of the news story, indicating possible changes in reading strategies
or in transactions with the text as the content and the textual presentation rhythm become
more familiar.
Research Question Two:
In Condition 2, the dynamic presentation of only the text was replicated, although in this
condition there were the additional modalities of sound and visual graphics as options
from which the participants could choose to use for comprehension. The participants
were significantly divided in their choices: the NNS fixated more often, and with longer consecutive fixation patterns, in look zone 1 than the NS, who by and large remained in look zone 2. Congruent with Condition One, the lower proficiency group fixated longer in look zone 1 than the higher proficiency group, suggesting that proficiency is still a factor in how long a person fixates on a word regardless of the presentation style of the text. However, I had predicted that, because of associative activation, the fixations would be quicker in Condition Two, and this was not the case for the lower proficiency group. The average fixation duration instead increased, indicating that there was an alteration in the reading strategies and reactions to the text when more than one modality was used to present information.
The response to Research Question Two also included an analysis of the patterns
of use and the changes between the look zones and the modalities available. Switches
from reading in look zone 1 to viewing in look zone 2 seemed to be caused by changes in
the scenes in the video, or by pauses in the closed captioning presentation that were
usually at the end of an utterance or at the end of a speaker’s turn. Switches from
viewing to reading were examined when the cause could be reasonably determined, such
as when there was a distraction and the viewer found the missed information in the print
text, or when there was contiguity between the visual picture and the corresponding printed word. While this is not surprising, a detailed multimodal analysis of this sort adds to the existing literature.
Beyond the reasons for the switches between modalities, a general trend was
found regarding the landing fixations when switching from look zone 2 to look zone 1.
NNS participants were found to return and land in row 3, the top row, and then search for
a recognizable word from which to start reading in a more consecutive pattern from left
to right, while NS participants were found to land on a word and to not search around as
much. This saccade pattern indicates that the proficiency, and most likely the familiarity
with the orthography, vocabulary and adeptness with predicting texts in English, is a
factor in using the text in a more efficient manner as a supplement to the aural and visual
modalities. All in all, it seems that the options for comprehension were used by the
individual participants in the manners that best suited their needs.
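The switch-and-landing analysis described here can also be sketched in code. The sketch below, under assumed conventions only (a fixation record carrying a look-zone label and a y-coordinate, an invented caption row height, and invented sample values), locates the fixations that land in look zone 1 immediately after a fixation in look zone 2 and reports which caption row each landing falls on.

ROW_HEIGHT = 20          # assumed pixel height of one caption row (illustrative)
CAPTION_TOP_Y = 360      # assumed top edge of the caption area; row 3 is the top row

def caption_row(y):
    """Map a y-coordinate inside the caption area to row 3 (top) down to row 1 (bottom)."""
    offset = int((y - CAPTION_TOP_Y) // ROW_HEIGHT)
    return max(1, 3 - offset)

def landing_rows(fixations):
    """Caption row of each fixation that lands in LZ 1 right after a fixation in LZ 2."""
    landings = []
    for prev, curr in zip(fixations, fixations[1:]):
        if prev["zone"] == 2 and curr["zone"] == 1:
            landings.append(caption_row(curr["y"]))
    return landings

# Invented sequence: the viewer switches into the caption area twice.
seq = [{"zone": 2, "y": 120}, {"zone": 1, "y": 365},   # first switch lands on row 3
       {"zone": 1, "y": 405}, {"zone": 2, "y": 150},
       {"zone": 1, "y": 425}]                          # second switch lands on row 1
print(landing_rows(seq))

Counting how often such landings fall on row 3 as opposed to lower rows, separately for the NNS and NS participants, is the kind of comparison summarized in the preceding paragraph.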
Research Question Three:
Research Question 3 explored the individual in relation to the text. When the extended interviews and the information gathered from the eye tracker and the retell protocols were integrated, it became clear that the participants did indeed select the modalities most useful to them for gaining comprehension. The different options allowed the individuals to choose; the limiting factor in their comprehension seemed to be their proficiency in English, which is not unexpected. Since their comprehension was not quantitatively measured during the retell protocol, but merely collected as evidenced by each participant’s individual production, the telling mark is in the responses that each participant gave during the interviews. Interesting qualitative comments came from the open retell protocols, during which the NNS participants remarked that they understood more with the video than with just the closed captioning, and both NS and NNS often included short anecdotes and background information that they found germane to the story’s content.
7.2 The Puzzle: Back to the metaphor
In Chapter two, I explained that I wanted to use the metaphor of a puzzle to
explain the purpose and findings of this study. The metaphor of COMPREHENSION is
a PUZZLE is a cognitive metaphor that explores the use of language surrounding
comprehension and multimodality. A learner chooses a modality to pay attention to,
selects the relevant information, and switches between modalities when necessary to
complete the puzzle that is the ultimate goal: comprehension of the message. The pieces
of the puzzle together form the whole, and the learner must use all strategies available to
the best of his or her ability to piece together the meaning in the message. Pieces of the
comprehension puzzle include more than the sum of the modalities: as this study
evidences, the individual’s background, interpretation and perception of the message
must not be forgotten as vital in the prediction and construction of the meaning of the
text. They, too, are pieces in the puzzle. Some of these metaphorical word choices
reference the conceptual metaphor that KNOWLEDGE is a BUILDING, which makes
sense since the two domains of comprehension and knowledge overlap.
The metaphor, however, is that within the multimodal event, the recipients must use the most relevant and useful attributes of the information available to construct the message (and, to relate this to other disciplines and events: the decoder must choose what is most salient to his or her abilities in decoding the incoming message; the hearer must use not only the words but other contextual clues to piece together the meaning and the message; the reader must use prior knowledge to help predict and construct what is on the page or presented in print). Within the event, the puzzle that
is the message could be constructed in many ways by different recipients/hearers/readers/
decoders in conjunction with each person’s abilities and strategies. Condition Two and
Research Question Three found this to be the case: the recipients illustrated the selection
of the modalities which best aided their individual skills in improving their
comprehension of the text.
7.3 Implications based on the findings
The findings of this study carry implications for second language pedagogy and
literacy, as well as for policies regarding awareness of multimodality as a learning
tool. Past research that used printed text as an additional modality for language
learning can be divided into two areas: studies conducted in classrooms with students,
and those conducted in laboratory settings. Both are of value to the present study, which
draws on the two in a synergistic interplay whereby students are brought into a research
setting with controlled conditions and yet treated as individuals. In this manner, the
participants are seen in a context in which their background knowledge, experiences, and
individual learning styles as students are treated as influences upon their eye movements
and reading patterns. Using a transactional theory of reading (e.g., Goodman’s
Sociotransactional Theory (2003) and Rosenblatt (1994)), the participant transacts with a
text, using background knowledge to perceive and predict the texts in what I have
presented as a Multimodal Multimedia Communication Event (see Chapters Two and Three). In
this Event, a viewer is a reader and a listener or a receiver, as well as a participant.
As part of an extended audience, the viewer transacts with a text that was created to be
understood and that therefore adheres to generally predictable schemata and genre
boundaries. The viewer is integral to the Event, which is transacted with differently by
every viewer.
7.3.1 Implications for the classroom and language teachers
The Multimodal Multimedia Communicative Event can be put to use by
language learners: in a classroom with a teacher, or by the motivated learner seeking
additional comprehension or practice with a second language (whether in a second language
environment or in a foreign language setting). Chapter Three reviews past research
conducted in classrooms (e.g., Neuman & Koskinen, 1992; Csapó-Sweet, 1997; Scollon, 1999;
Tudor, 1987; Weyers, 1999) and studies exploring the addition of captioning, subtitles,
or other printed text to existing visual texts for language learning under controlled,
experimental conditions (e.g., Mayer, 2005; Graber, 1990; Fisch et al., 2001; Gunter et
al., 2000; Koolstra et al., 1999; Linebarger, 2001; Markham et al., 2001; Slykhuis et al.,
2005; Stewart & Pertusa, 2004; Taylor, 2004; Walma van der Molen & van der Voort, 2000).
The vast majority of these studies overlooked the individual identities of the
participants, grouping them by age, sex, language background, language proficiency, or
membership in the same language class. These studies are important and inform the current
study; however, the current study deviates from them in Chapter Six by giving a presence
and a place to four of the participants. These participants were truly placed into the
EVENT.
Classroom teachers can, and frequently do, use video texts in their classrooms.
The current study encourages the classroom use of video, especially in classrooms with a
range of proficiency levels and with students from a variety of cultural backgrounds and
language experiences. Seen at the intersection of language learning, reading, and
multimodality, as in Figure 1.1 above, pedagogy can be informed and encouraged by the
idea of allowing students multiple avenues of comprehensible input.
Along these lines, this study found supporting evidence that the proficiency of the
viewer affects modality selection, implying that emphasis should be placed on guiding
language learners to take advantage of the additional modality of print and to use it
when needed. A common complaint heard by teachers is that subtitling and closed
captioning are annoying. This is perhaps the case when the multimedia text is given to
high proficiency learners or, as for the teachers themselves, to viewers at native
competency levels. But for lower proficiency language learners, the best comprehensible
input, in whichever modality suits the individual student, should be made available.
In addition to past research showing improvement in vocabulary, listening skills,
and comprehension when closed captioning is enabled and used by the viewer, reading and
familiarity with the print and the orthography may also improve with the inclusion of the
print modality. The findings of this study imply that closed captioning, and by extension
subtitling, can improve the learner’s comprehension of the text if that modality is
chosen. If viewers choose not to use the print text, they do not have to pay attention to
it, and the eye movement patterns in this study have shown that it is indeed a choice.
There are short, quick glances at the printed text at times by those who choose not to
use it, and there is switching between the two by those who choose to use both.
The contiguity delay in the presentation of the print should be seen as a
positive quality when the print is offered as an additional choice that supplements the
audio and visual texts. Second and foreign language classrooms would therefore be well
advised to use the print text as a tool for increasing language proficiency, not only in
the classroom but also to support students who wish to use it outside the classroom. A
further implication is that, with motivation garnered either from self-study or from an
authority such as an instructor, the additional modality can be used to gain vocabulary
and pragmatic skills from authentic material rather than the somewhat stale material
constructed for textbooks.
The implications also include that strategies are involved in the individual choice
of modalities, and that these strategies can be promoted in the classroom and then
carried outside the classroom for personal use. If the use of the already available print
text were actively promoted, even literacy rates in a first language might improve, as
suggested by initial studies by researchers such as Deborah Linebarger (2001). Improved
literacy rates can only benefit a society as a whole, and if the choice is available,
motivated but less proficient readers may engage with interesting texts and use their own
strengths (such as listening) to help them build meaning and improve their reading
abilities with whole texts.
7.3.2 Implications for eye movement research
This study demonstrated that moving text can be studied, albeit with challenges not
presented in eye movement research involving static texts. Observing the eye movement
patterns of viewers interacting with moving text, and with moving text integrated into
additional modalities, opens possibilities for future research on closed captioning and
subtitling as language learning tools. Future research aimed at supporting the
subsidization of adding the print modality to video texts, in order to improve literacy
rates and increase language learning opportunities for all those with access to video, is
an avenue to be promoted. The texts are available: the internet is proof enough that
people are interested in subtitling. Many video clips, movies, and television shows are
already being translated through the addition of subtitles and posted for people who
might not otherwise have access to them, potentially giving viewers a cross-modal
opportunity to access and learn other languages. Further research into how people
cognitively use these available modalities, and how the combination adds to or subtracts
from the language learning process, is another avenue to be pursued. While Mayer and his
colleagues have started down this road, it needs to be widened beyond the scientific text
genre and directed toward texts that are more widely used and more widely available to a
greater population.
7.4 Limitations of this study and future research
In conducting this experiment and in collecting this data I’ve been able to observe
eight participants’ actions and choices in using different modalities to gain
comprehension. I’ve been able to look, for the first time in the published research
literature, at a few of the detailed patterns of reading moving text such as that
presented in closed captioning. One limitation of the moving text was the challenge of
collecting details such as which specific words were fixated and which were skipped. I
was able to see small windows of such actions, but not on a global scale. This is not to
say that it can’t be done; time limitations prevented such analysis for this
dissertation. Many of the “limitations” listed in this section are also interesting and
viable research avenues that should be explored.
Once detailed analysis was begun, several differences between the present study
and past research came to light. Whereas in past experiments, such as earlier research in
the EMMA lab, the text was presented over the majority of the page, the text in the
current study was confined to the natural position of closed captioning, all of it below
160° on the vertical axis. This made the tell-tale saccades that indicate switches
between rows and pages more difficult to pinpoint in the numerical data. The moving text
also made it difficult to match the background visual data to the numerical data in order
to produce a fixplot with the background present. I was able to get around this by
showing a series of fixplots, on a blank background, to represent the movement within the
same area of the screen. Corresponding screen shots from the visual data could then be
matched to the fixplots as an approximation of the visual background. In this manner, a
multimodal analysis of a multimodal literacy and communication event was the only way to
present the information. Although this was a deviation from normal procedures, since
showing the eye movements against a static background was impossible, it opened up
possibilities for looking at other characteristics of the data, such as those mentioned
above in Section 7.1.
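To make the fixplot procedure concrete, the following is a minimal sketch, in Python with matplotlib, of drawing the fixations from one time window on a blank background sized like the video frame, so that the plot can later be set beside a screen shot from the same window. The data format, screen dimensions, and function name are illustrative assumptions, not the actual tooling used in the EMMA lab.

```python
# A minimal sketch (assumed data format; not the lab's actual tooling):
# draw the fixations from one time window on a blank background sized like
# the video frame, so the plot can later be matched to a screen shot.
import matplotlib.pyplot as plt

def save_fixplot(fixations, start_ms, end_ms, out_path,
                 screen_w=640, screen_h=480):
    """fixations: list of dicts with keys start_ms, x, y (pixel coordinates)."""
    window = [f for f in fixations if start_ms <= f["start_ms"] < end_ms]
    fig, ax = plt.subplots(figsize=(6.4, 4.8))
    ax.set_xlim(0, screen_w)
    ax.set_ylim(screen_h, 0)                 # invert y: (0, 0) is top-left
    xs = [f["x"] for f in window]
    ys = [f["y"] for f in window]
    ax.plot(xs, ys, "-o", color="black")     # fixation sequence in time order
    for n, f in enumerate(window, 1):
        ax.annotate(str(n), (f["x"], f["y"]))  # number fixations for matching
    ax.set_title(f"Fixations {start_ms}-{end_ms} ms (blank background)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

Saving one such plot per time window would produce the series of fixplots described above, each of which can then be paired with the corresponding screen shot.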
Another difficulty in creating fixplots and calculating skipped words and
regressions arose when I found the eye movements represented in the visual data and those
represented in the numerical data nearly impossible to match: oftentimes I could watch
eye movements that did not show up in the numerical data as fixations. I eventually
attributed this to movements of under 100 ms: the oculomotor movement was visible in the
video but was not counted as a fixation by the EYENAL software. An option for future
research is to reduce the minimum fixation threshold to less than 100 ms, as addressed
later in this section under future research avenues.
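As a rough illustration of what lowering that threshold would involve, the sketch below implements a simple dispersion-based fixation filter with a configurable minimum duration. It is a minimal sketch in Python under assumed sample formats and dispersion limits; it is not the EYENAL/ASL algorithm, only a stand-in for re-processing raw gaze samples with a shorter cutoff.

```python
# A minimal sketch of dispersion-based fixation detection with a configurable
# minimum duration (assumed sample format and thresholds; not EYENAL/ASL).

def detect_fixations(samples, min_duration_ms=80, max_dispersion_px=25):
    """samples: time-ordered list of (time_ms, x_px, y_px) gaze points."""
    fixations = []
    i = 0
    while i < len(samples):
        j = i
        # Grow the window while the points stay within the dispersion limit.
        while j + 1 < len(samples):
            xs = [s[1] for s in samples[i:j + 2]]
            ys = [s[2] for s in samples[i:j + 2]]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion_px:
                break
            j += 1
        duration = samples[j][0] - samples[i][0]
        if duration >= min_duration_ms:      # lower this to catch brief glances
            xs = [s[1] for s in samples[i:j + 1]]
            ys = [s[2] for s in samples[i:j + 1]]
            fixations.append({"start_ms": samples[i][0],
                              "duration_ms": duration,
                              "x": sum(xs) / len(xs),
                              "y": sum(ys) / len(ys)})
            i = j + 1
        else:
            i += 1
    return fixations
```

Re-running such a filter over the raw samples with, say, a 60-80 ms minimum would show whether the brief glances visible in the video begin to register as fixations.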
Lowering the threshold makes sense when the data collected for the conditions in this
experiment are looked at again: the 200-250 ms average reported by Rayner (1998) is for
static reading conditions, and the 100 ms threshold set by the ASL software is likewise
set for static background conditions. The condition in this study involves constantly
moving exemplars, so perhaps the threshold for noticing needs to be lower than that for
reading. Further research, and a further review of the literature, would give some
insight into the possibilities regarding this issue. As an example, one of the NS
participants in the static reading condition read at an abnormally fast rate, almost in a
diagonal line across the page, and yet he remembered the details of the news story with
exceptional clarity. His eye movements could also give insight into the question of the
fixation threshold. It is, again, an avenue for future research when additional time to
explore those data is available.
Other limitations of this study, and future research avenues, stemmed from the
experimental design: a natural, authentic text was chosen, and that limited some of the
conclusions and analysis. I also chose not to use a multiple choice pre- or post-test for
comprehension, instead using the retell as a form of motivation for paying attention.
This limited the ability to draw statistical conclusions about comprehension rates, but I
don’t believe that producing a response in a second language is a clear indicator of
comprehension anyway. Instead, it is an indicator of the participant’s ability to compose
a sentence in a second language, which may or may not approximate understanding.
A greater number of NNS of English would have greatly strengthened any claims
made. One participant’s eye movement data was unusable due to calibration problems, and
two other participants had visual problems. Working with an eye tracker produces an
overwhelming amount of data when everything is calibrated correctly, when the participant
isn’t tired or suffering from allergies, when the lighting is correct, when no technical
malfunctions appear, and so on. Research using a small n, however, often produces
insightful observations and interactions with the participants and the background that
they bring with them. In other words, Chapter Three was based on the premise that each
participant was an individual reader transacting with the text in a unique way. A larger
n would have given better statistical accuracy, but it would not have honored the premise
of what a reader really does, as in a transactional model of reading.
Based on the data collected, together with the video data from the interviews, I
plan to pursue several additional avenues of future research:
• One avenue of future research with the eye movements was briefly mentioned
above: to look at the data through a new lens of a shorter fixation threshold.
• I would also like to look at the data in greater detail regarding line use and
return saccades from LZ 2 to LZ 1 in relation to proficiency and familiarity with the
text (one way to tabulate such transitions is sketched after this list).
• I would like to further explore the gestures of the participants as they retell
the stories as well as when they discuss their learning strategies. This could give
insight into their comprehension of the text, as well as into the conceptual metaphors
that they use when thinking about their own learning strategies.
• Regarding the use of formulaic sequences to aid in reading, prediction, and
comprehension, I would like to analyze the text for formulaic sequences and look for
patterns of their use in the retell protocols. It’s possible that the participants use
those exact formulaic sequences to help them remember and retell, while it’s also
possible that the semantic meaning was understood and then, according to proficiency,
retold by paraphrasing. This would give some insight into the comprehension process and
how it varies by proficiency.
• This study was about choices, but when relating Condition One to Condition Two I
focused mainly on features of the print that prompted shifts to Look Zone 2. I’d like to
further explore possible patterns in the video that prompt switches from the video and
audio modes to the print text (beyond the two I have already found: one involving a
distraction, and one from an interview in which the participant explicitly told me that
she switched because of background noise).
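As one way to make the look zone analysis mentioned in the second item above concrete, the sketch below is a minimal Python illustration; the look zone boundary and fixation format are assumptions, not the coordinates used in this study. It labels fixations by look zone and counts return saccades from LZ 2 back to LZ 1.

```python
# A minimal sketch (assumed screen layout: LZ 1 = video area, LZ 2 = caption
# area below an assumed boundary) that counts LZ 2 -> LZ 1 return saccades.

def look_zone(fix, caption_top_px=400):
    """Assign a fixation dict (with 'y' in pixels) to a look zone."""
    return "LZ2" if fix["y"] >= caption_top_px else "LZ1"

def count_returns(fixations, caption_top_px=400):
    """Count transitions from the caption zone (LZ2) back to the video zone (LZ1)."""
    zones = [look_zone(f, caption_top_px) for f in fixations]
    returns = sum(1 for prev, cur in zip(zones, zones[1:])
                  if prev == "LZ2" and cur == "LZ1")
    return returns, zones

# Example with made-up, time-ordered fixations:
fixes = [{"y": 200}, {"y": 420}, {"y": 430}, {"y": 210}, {"y": 450}, {"y": 190}]
n_returns, zone_sequence = count_returns(fixes)
print(n_returns, zone_sequence)   # prints 2 and the LZ labels in order
```

Tabulating such counts per participant would allow the return saccades to be related to proficiency and familiarity with the text.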
I have learned quite a bit from my participants in this study. While multimodality
is still the main focus of exploration, closed captioning has been my “pet” subject for
many years now. Rapidly changing technology has already updated and altered the use of
closed captioning for the deaf, the hard of hearing, and the language learner: I can now
practice my Spanish while watching closed captioning. And so:
• I’d like to study L2 Spanish language learners with Spanish subtitles and
either Spanish telenovelas or English drama shows. Using eye tracking, what do the
participants use to help them comprehend the text?
In the end, it is the combination of modalities that interested me and motivated me to
undertake this long and complex study, to troubleshoot the technical challenges, and to
observe the data from so many angles. I hope that the participants were able to take away
something about their learning styles and language learning strategies from the
interviews, and that I will be able to share the results and implications with others in
the future, so that multimodal use can be further exploited in the search for improving
language learning and literacy development, hopefully on a global scale as well as
locally in the classroom.
APPENDIX A
Airbags closed captioning text.
This text represents the closed captioning presentation on the screen, with line breaks
and spelling as originally represented. The lines of text have been numbered for
referral.
1. >> IT LOOKS LIKE SALT OR
2. SUGAR.
3. BUT IT IS A DEADLY POISON.
4. A LETHAL CHEMICAL CALLED
5. SODIUM AZIDE.
6. >> THE INCIDENTS OF PEOPLE
7. WHO HAVE TAKEN A TEASPOON,
8. OR LESS, AND HAVE COLLAPSED
9. AND TAKEN TO THE HOSPITAL
10. AND EVEN UNDER INTENSIVE
11. CARE COULD NOT BE SAVED.
12. >> U OF A SCIENTISTS,
13. DR. ERIC B. KNOWS ALL ABOUT
14. IT.
15. EVEN MORE THAN HE BARGAINED
16. FOR.
17. >> ACTUALLY, I POISONED
18. MYSELF.
19. I INHALED SOME OF THE POISON
20. FUMES OF SODIUM AZIDE IN
21. WATER.
22. IT HAD SUCH A DRAMATIC
23. EFFECT ON ME THAT I
24. REMEMBERED IT FOR A LONG,
25. LONG TIME.
26. >> THE DOCTOR SAYTHE
27. CHEMIC IS A BROAD SPECTRUM
28. BIOCIDE AND IN HIGH DOSES
29. CAN BE LETHAL IN ANYTHING IT
30. COMES IN CONTACT WITH.
31. WHEN HE FOUND THE CHEMICAL
32. WAS BEING USED IN AIRBAGS HE
33. WAS STUND.
34. >> I WAS READING SOME OF THE
35. SCIENCE ALERT KIND OF
36. LITERATURE AND SAW SODIUM
37. AZIDE WAS THE ACTIVE
38. INGREDIENT IN AIRBAGS.
39. AT FIRST I DIDN'T BELIEVE
40. IT.
41. I THOUGHT IT WAS A TYPO,
42. BECAUSE OF THE EFFECT THE
43. AZIDE HAD HAD ON ME.
44. AND IT WAS NOT A TYPO, IT IS
45. THE ACTIVE INGREDIENT.
46. THIS IS ROUGHLY GRAMS
47. GIVE OR TAKE OF IT.
48. >> AIRBAGS INFLATED WHEN AN
49. ELECTRIC CURRENT IGNITES THE
50. SODIUM AZIDE AND THE
51. POWDERED FORM OF THE
52. CHEMICAL CHANGES TO GAS.
53. >> WHAT HAPPENS IN AN
54. ACCIDENT, AN ELECTRIC RAL
55. CURRENT PASSES THROUGH THE
56. BACK OF THE INFLATABLE
57. MODULE AND CAUSES THE
58. CHEMICAL TO DECOMPOSE AND
59. THE PRODUCT IS NITROGEN GAS
60. WHICH ESCAPES FROM THE HOLES
61. AROUND THIS CAN HERE.
62. AND THAT ESCAPING NITROGEN
63. GAS IS USED TO INFLATE THE
64. AIRBAG, AND HERE IS AN
65. EXAMPLE OF AN AIRBAG.
66. >> TO SEE WHAT HAPPENS, A
67. TECHNICIAN IN A ROYAL BUICK
68. SET OFF THE AIRBAG.
69. >> THAT'S PERFECT.
70. >> AND WHILE THE DOCTOR SAYS
71. THE GAS FORM OF SODIUM AZIDE
72. IS SUPPOSEDLY SAFE, THE REAL
73. DANGER, WHAT HAPPENS DOWN
74. THE LINE WHEN THE CAR IS
75. JUNKED?
76. >> NOW WE'RE GETTING TOWARDS
77. A DECADE OR SO, GETTING
78. TOWARDS LIFE-TIMES OF
79. VEHICLES, NOW WE'LL HAVE A
80. STREAM OF VEHICULAR VEHICLES,
81. GOING TO THE SCRAP EACH
82. HOLDING ONE HALF POUND OF
83. SODIUM AZIDE.
84. >> AND THE DOCTOR SAYS IT
85. HAS A LONG SHELF LIFE AND
86. THESE SALVAGE YARDS COULD
87. BECOME SOME OF AMERICA'S
88. MOST POISONOUS PLACES.
89. >> THE GENIE IN THE BOTTLE,I'EFER TO KEEPT CAP
90. ON THE BOTTLE.
91. >> A RESEARCHER WORKING ON A
92. WATER CONTAMINATION PROBLEM,
93. THE DOCTOR ALSO WANTS TO
94. FIND OUT HOW DANGEROUS
95. SODIUM AZIDE IS TO THE
96. ENVIRONMENT.
97. >> WHAT HAPPENS TO ALL THE
98. TOXIC MATERIAL WHICH WILL
99. SOON BE ACCUMULATING IN OUR
100. JUNK YARDS?
101. >> AND HES ALSO HOPING TO
102. CONVINCE LAWMAKERS THAT
103. SOMETHING SHOULD BE DONE
104. ABOUT POISONOUS AIRBAGS.
105. >> SOME KIND OF REGULATIONS
106. INSTITUTED WHERE BY AIRBAGS
107. WOULD AUTOMATICALLY BE
108. DEPLOYED WHEN THE CAR COMES
109. TO A JUNK YARD, WHICH I
110. THINK WOULD BE THE MOST
111. SENSIBLE SOLUTION TO THIS
112. POTENTIAL PROBLEM.
113. BECAUSE UPON DEPLOYING AN
114. AIRBAG, YOU CONVERT THE
115. TOXIC SUBSTANCE INTO A
116. BENIGN SUBSTANCE, NITROGEN
117. GAS, WHICH IS MOSTLY WHAT
118. THE ATMOSPHERE IS COMPOSED
119. OF.
120. >> WHY WHY WOULD SOMETHING
121. SO TOXIC BE USED IN
122. SOMETHING SO WIDELY USED AS
123. AIRBAGS?
124. I'VE TRIED TO SURMISE WHAT
125. THE ANSWER IS.
126. AZIDE HAS BEEN USED BY THE
127. MILITARY FOR A LONG TIME AS
128. A PROPEL ANT.
129. NOT JUST AS AN EXPLOSIVE BUT
130. PROPEL ANT IN EJECTOR SEATS
131. IN FIGHTER PLANES, FOR
132. EXAMPLE.
133. MY GUESS IS THAT THE AIRBAG
134. INDUSTRY, AUTOMOBILE
135. INDUSTRY, NEEDED SOMETHING
136. BECAUSE -- IN A HURRY
137. BECAUSE CON GRACE MANDATED
138. THIS WOULD GO IN EFFECT IN 1994
139. >> THE FATAL POISON THE
140. SUBJECT OF A NEW BEST SELLER
141. CALLED "PARTNER INSHSH THE A FO H
142. BOOK AERERRDINGABABTHE
143. DOCT'S SEARCH.
144. WE HAD DINNER AND CHATTED
145. ABOUT SODIUM AZIDE AND THE
146. TOXIC PROPERTIES, AND
147. INCORPORATING THIS MATERIAL
148. INTO AIRBAGS.
149. SHE FLOATED A FEW IDEAS FOR
150. A PLOT FOR HER MYSTERY, AND
151. I DON'T KNOW IF THOSE IDEAS
152. WERE INCORPORATED IN THE
153. BOOK, BECAUSE I HAVEN'T READ
154. IT YET.
155. >> THE DOCTOR SAYS CAR
156. MANUFACTURERS ARE FINALLY
157. BECOMING TO PHASE OUT SODIUM
158. AZIDE FOR SAFER COMPOUNDS,
159. BUT IT COULD BE TOO LITTLE,
160. TOO LATE.
161. >> THE QUESTION IS, EVEN IF
162. WE ELIMINATED ALL THE AZIDE
163. TODAY, THERE WOULD STILL BE
164. TENS OF MILLIONS OF POUNDS
165. OUT IN THE STREET WE HAVE TO
166. DEAL WITH SOMEHOW IN THE
167. FUTURE.
APPENDIX B
Arizona Bikers closed captioning text.
This text represents the closed captioning presentation on the screen, with line breaks
and spelling as originally represented. The lines of text have been numbered for
referral.
1. >> AS A SUBCULTURE OF AMERICAN
2. SOCIETY, BIKERS HAVE BEEN
3. LARGELY MISUNDERSTOOD.
4. THAT'S WHY A TUCSON BIKER IS
5. TRYING TO SET THE RECORD
6. STRAIGHT.
7. PRODUCER SOOYEON LEE HAS
8. TONIGHT'S EDITION OF ARTSCENE.
9. >> WHY DO I RIDE?
10. IT'S A FEELING OF FREEDOM FOR
11. ME.
12. IT'S THE WIND IN YOUR FACE.
13. THE SMELLS, GOOD PEOPLE YOU
14. MEET.
15. >> IF YOU EVER GO THROUGH SALT
16. RIVER CANYON YOU CAN SMELL
17. SMELLS DRIVING THERE YOU CAN'T
18. SMELL IN THE CAR, YOU KNOW WHY
19. YOU RIDE A MOTORCYCLE.
20. >> FEARED BY SOME, ADMIRED BY
21. OTHERS.
22. SOME MEMBERS OF THE BIKER
23. COMMUNITY ARE DISPELLING MYTHS
24. ABOUT THOSE WHO RIDE TWO
25. WHEELS INSTEAD OF FOUR.
26. >> I WAS A BIKER FOR MANY,
27. MANY YEARS.
28. I TOOK MY SON, MARK, ON THE
29. DIRT AND I TAUGHT HIM HOW TO
30. RIDE WHEN HE WAS YEARS OLD.
31. BUT THEN I WAS WORRIED TO
32. DEATH WHEN HE WENT ON THE
33. STREET.
34. I SAID, WHAT CAN I TELL HIM?
35. AS A FATHER, I CAN TELL HIM
36. NOTHING THAT HE'S GOING TO
37. REMEMBER OR LISTEN TO BECAUSE
38. I'M HIS FATHER.
39. >> SO, STUWART MILLER WROTE A
40. BOOK ENTITLED "THE BIKER CODE"
41. WISDOM FOR THE RIDE.
42. >> THE BOOK IS WRITTEN FOR THE
43. NONBIKER AND THE BIKER WANNAB
44. IF YOU LEARN ONE THING,
45. RESPECT YOURSELF.
46. RESPECT LIFE.
47. EDUCATION IS GOING TO BE
48. SHARING THE ROAD AND
49. RESPECTING THAT.
50. >> THEIR LOOKS CAN BE
51. INTIMIDATING, BUT BIKERS HAVE
52. A SOFTER SIDE ACCORDING TO
53. MILLER.
54. >> FACT THAT THESE ARE HUMAN
55. BEINGS THAT THE GUY WITH THE
56. TATTOOS IS A LAWYER OR A
57. DOCTOR OR AN ENGINEER, HE'S
58. GOT FIVE KIDS AND HE'S OUT
59. THERE BECAUSE THIS IS HIS
60. OTHER PERSONA.
61. HE'S NOT A BAD GUY.
62. HE'S YOU AND HE'S ME AND HE'S
63. OUT FOR THE FREEDOM AND THE
64. FUN OF THE RIDE.
65. HE'S GOT MULTIPLE PERSONALITY,
66. HE'S A BIKER AND HE'S A
67. SURGEON.
68. HAVING A PREDISPOSED ATTITUDE
69. TOWARD THE BIKER IS WHAT THE
70. BOOK PUTS DOWN.
71. IF I MET A BIKER AND INVITE
72. HIM TO DINNER HE WILL PROBABLY
73. MORE FRIENDLY OR MORE
74. INTERESTED OR MORE OF RAPPORT
75. THAN IF INVITED AN UNCLE OR
76. AUNT OR COUSIN WHO HAVE
77. NOTHING TO DO WITH.
78. BECAUSE THERE'S A COMMUNITY
79. OUT ARES ABOUT
80. EACH OTHER.
81. L OTHER AND
82. SISTER.
83. >> YOU MAY NOT BE RIDING THAT
84. KIND OF BIKE OR LIVING THAT
85. KIND OF LIFESTYLE.
86. BUT ANYONE WHO RIDES A
87. MOTORCYCLE YOU CAN READ IT AND
88. THINK, HEY, I FEEL THAT WAY.
89. >> SAFETY IS SOMETHING THAT
90. CONCERNS EVERYONE ON THE ROAD.
91. >> WHEN YOU THINK OF EVERY CAR
92. AS BLIND AS A BAT YOU HAVE TO
93. BE %DEFEE.
94. YOU TAKETHE D FOR
95. D YOU GOT PROBLEM.
96. WHENOU HAVE TRS,YOU
97. SHOU HECK THE BIKE AND GO
98. TO A MOVIE.
99. YOU DON'T WANT TO ESCAPE YOUR
100. PROBLEMS WITH A MOTORCYCLE.
101. YOU HAVE TO BE LIKE FLYING A
102. PLANE, IF YOU'RE NOT %,
103. CHECK THE PLANE OUT, CHECK
104. YOURSELF OUT.
105. YOU GOT TO MAINTAIN IT.
106. YOU GOT TO UNDERSTAND IT.
107. IT'S GOT TO BE PART OF YOU,
108. FEEL IT.
109. GO TO MOTOS
110. SCHOOL.
111. TAKEA COURSE.
112. EVEN IYOU RIDIDI YEARS,
113. N'T INK YOU KNITALL.
114. BECAUSNODY K IT ALL.
115. DON'T EVER LET YOUR GUARD
116. DOWN.
117. >> MILLER DEVELOPED THIS BIKER
118. SPECIFIC ROAD SIGN WHICH
119. COAUTHOR, JEFFREY MOSS.
120. >> A HINKG OUT
121. SHAVERING THE ROAD AS
122. PHILOSOPHY, IT'S A WONDERFUL
123. PHILOSOPHY.
124. I'M TALKING TO A BAIT.
125. I'M SETTING MYSELF UP, MY
126. ATTENTION TO WORK ROAD SIGN
127. INTO THEAPE WHERE
128. PEOPLE RIDE ALONG SAY, BIKER.
129. IT'S NOT A COW.
130. IT'S NOT A DEER.
131. IT'S NOT A RUNNER.
132. IT'S A BIKER.
133. I JUST SAW SIGN THAT SAYS
134. BIKER.
135. AROUND THE NEXT CORNER THEY
136. MAY SEE A BIKER.
137. ALL OF A SUDDEN IT'S IN THE
138. SUBCONSCIOUS.
139. >> MOST PEOPLE THINK OF US AS
140. IN THE MOVIES AS HELL'S ANGELS
141. WHICH WE'RE NOT.
142. >> SOUNDS LIKE A WASHING
143. MACHINE.
144. >> JUST PEOPLE OF DIFFERENT
145. WALKS OF LIFE WHO LIKE TO RIDE
146. MOTORCYCLES.
147. THE BOOK PROVES IT.
148. >> MILLER SAYS, RIDING IS
149. ABOUT THE JOURNEY NOT ABOUT
150. THE DESTINATI.
151. >> LOVE LIFE ENOUGH TO EMBRACE
152. IT IN THE SENSE THAT I'M GOING
153. TO DRIVE A LITTLE SAFER, I'M
154. GOING TO THINK A ALITTLE MORE,
155. I'M NOT GOING TO TRY TO GET
156. FROM POINT A TO B AS FAST AS I
157. CAN.
158. I'M GOING TO ENJOY THE RIDE.
159. >> RIDING IS SYMBOLIC OF
160. AMERICAN FREEDOM.
161. THE FREEDOM OF ALL PEOPLE
162. SHARING THE ROAD.
APPENDIX C
RETELL AND EXTENDED INTERVIEW QUESTIONS: NNS
CESL level finished ____________
RECALL Questions:
Your friend walks in to the room, and asks you “What are you watching? Is it any
good?” Tell him/her about what you were watching.
Do you remember anything else about what you were watching?
What do you think about the topic or the story?
TEXT 1: ________________
TEXT 2: _________________
Background questionnaire: (yes/no, short answer)
Male/female age:
1. What country are you from?
2. How long have you been in the US?
3. Have you only studied English at CESL? Where else?
4. What other languages have you studied? Which do you think you are
the strongest in? the weakest? (List in order:
5. What kind of hobbies do you have?
6. Do you like to read? What kind of things do you like to read?
7. Do you like to read in a different language? Do you ever read in a
different language? What do you read?
8. Do you like movies? What kind of movies? Why do you watch movies?
9. What is/was your favorite subject in school? Why?
10. What do you think about closed captions and subtitles?
11. Do you know the difference between cc & subtitles?
A. MOTIVATION:
Why do you want to learn English?
expand: do you want to get a job that requires English?
Do you want to continue studying at a university in an English speaking country? (What
degree?)
Do other members of your family speak English?(Who?)
Do your friends speak English?... (How many?)
B. PAST EXPERIENCE:
1. What other languages have you studied?
How ‘good’ at those languages do you think you are?
Is it easy to learn other languages for you? Is it difficult?
What is easy or hard for you when you learn a new language? (writing, reading,
vocabulary, verb tenses/grammar…)
2. How is English the same/different as other languages you’ve learned?
When did you start to learn English?
Did you like learning English at first? How about now?
What makes it easier/harder now?
Did you learn English from a book, your teachers, and/or in other ways (what other
ways?)?
Did you speak English outside of the classroom?
C. HABITS:
What do you think is the best way for YOU to learn another language?….
What strategies do you engage in (use) to help you learn another language –
in the classroom? - on your own?
D. MEDIA ASSISTANCE (subtitles, dubbing, closed captioning?):
1. In your home country, are movies shown on television dubbed into your language or
do they have subtitles?
Are the subtitles in your writing system (in Arabic script) or in the Roman writing system
(like in English)?
Which subtitles do you like better/why?
Are television programs ever subtitled in Arabic (is the writing in the same language as
the talking?)
2. When you live at home, do you watch any television or movies in English?
Can you give examples?
Are there subtitles?
3. Here, in the United States, how often do you watch television or movies at home?
How often do you watch television or movies in Arabic?
Do you watch any in English?
With or without subtitles?
Are the subtitles in English? Or in Arabic?
What kind of movies do you watch in English at home… can you name some of them?
And for television? What do you normally watch?
APPENDIX D
RETELL AND EXTENDED INTERVIEW QUESTIONS: NS
RECALL Questions:
Your friend walks in to the room, and asks you “What are you watching? Is it any
good?” Tell him/her about what you were watching.
Do you remember anything else about what you were watching?
What do you think about the topic or the story?
TEXT 1: ________________
TEXT 2: _________________
Gender =   Age =   Year =   Major =   L1 =   L2 =   L3 =
Background questionnaire: (yes/no, short answer)
Male/female age:
13. Where are you from?
14. How long have you been at the UofA?
15. What is your major?
16. What other languages have you studied?
Which do you think you are the strongest in? the weakest? (List in order:
17. What kind of hobbies do you have?
18. Do you like to read? What kind of things do you like to read?
19. Do you like to read in a different language? Do you ever read in a
different language? What do you read?
20. Do you like movies? What kind of movies? Why do you watch movies?
21. What is/was your favorite subject in school? Why?
22. What do you think about closed captions and subtitles?
23. Do you know the difference between cc & subtitles?
A. MOTIVATION:
B. PAST EXPERIENCE:
1. What other languages have you studied?
How ‘good’ at those languages do you think you are?
Is it easy to learn other languages for you? Is it difficult?
What is easy or hard for you when you learn a new language? (writing, reading,
vocabulary, verb tenses/grammar…)
Have you ever lived in a different country?
How did you learn the language? What helped you the most in learning the language…?
did you want to learn the language?
C. HABITS:
What do you think is the best way for YOU to learn another language?….
What strategies do you engage in (use) to help you learn another language –
in the classroom? - on your own?
D. MEDIA ASSISTANCE (subtitles, dubbing, closed captioning?):
1. Here, in the United States, how often do you watch television or movies at home?
How often do you watch television or movies with subtitles or closed captioning?
APPENDIX E
PRACTICE READING & RETELL
Please read the following story silently.
Afterwards, I’d like to ask you a few questions about it.
This is a news story.
>> After spending years
literally watching the corn
grow, a University of Arizona
Molecular Biologist and
Geneticist was recently
elected to the National
Academy of Sciences.
Membership in the academy is
considered one of the highest
honors a U.S. scientist can
achieve.
Tonight we look at the work of
Vicki Chandler.
>> Scientist Vicky Chandler
was in bed sound asleep when a
friend called very early in
the morning to tell her the
good news.
>> She says, Vicki, I'm in
Washington, we just elected
you to the National Academy.
I thought -- in fact I said
if this is a joke it's a bad one.
She goes, no, no, no joke.
and passed the phone.
So there were about 15 people
gathered around by the cell
phone to give me
congratulations.
It was pretty exciting.
>> For most of her career, this
Molecular Biologist and
Geneticist has literally been
watching the corn grow.
>> The corn plant actually has
about the same number of
DNA genes as humans.
Probably more.
Probably more genes than humans.
That always shocks people.
They look at a corn plant, so simple.
We’re so complex.
But in fact plants are amazing
biochemical factories.
>> This is not your run of the
mill country corn field.
>> One plant is green and one
plant is purple.
This plant is both green and
purple.
It has stripes.
>> But a highly specialized
crop of maize crosses
will help Dr. Chandler find
out how certain genes turn on
and off.
>> The corn plant, there's a
different set of genes that
are made in the leaves versus
the roots.
Versus the flowering parts of
the plant.
And us, there's very different
genes that are expressed in
our liver versus our skin
cells, versus our brain cells.
turning the genes on in the
wrong place causes disease.
It causes -- in many cases cancer.
and the same one that causes
disease in plants and other
animals.
So controlling where and when
genes are expressed is really key.
>> Dr. Chandler and her
colleagues are more than
halfway through decoding the corn
genome.
Dr. Chandler’s discoveries
and crop genetics can be
applied to many different
species.
>> Part of what I love so much
about the research is that it
allows me to be outside,
working with the plants and
really doing biology - but also
in the lab doing the research.
>> Now all the hard work has
paid off.
Dr. Chandler’s now one of the
elite few who are members of
the National Academy of Sciences.
>> There’s probably several
hundred thousand practicing
scientists in the U.S.
so it's a small number that do
get elected.
There’s a lot more people worthy.
that's what makes it feel so
good is like, oh, somebody noticed.
Somebody cared about the work
I’ve been doing for the last 17 years.
Yeah, it was very thrilling.
When you finish please
tell me.
APPENDIX F
SAMPLE OF LEARNING STYLES INVENTORY (LSI) QUESTIONS
(Educational Activities Software)
(used with permission from EAS)
Rating scale: (1) = Least like ME … (4) = Most like ME
I understand spoken directions better than written ones. (1) (2) (3) (4)
Writing a spelling word several times helps me
remember it better. (1) (2) (3) (4)
It is easy for me to tell about the things I know. (1) (2) (3) (4)
The things I write on paper sound better than when I
say them. (1) (2) (3) (4)
When I really want to understand what I have read, I
read it softly to myself. (1) (2) (3) (4)
I do well in classes where most of the information
has to be read. (1) (2) (3) (4)
APPENDIX G
HANDOUT ABOUT LEARNING STYLES
WHAT DO YOUR RESULTS MEAN?
WHAT ARE YOUR LEARNING STYLES?
COGNITIVE STYLE
refers to the preferred mode of taking in information. There are five areas:
AUDITORY LANGUAGE: Cognitive Learning Styles: You learn from hearing words
spoken. You might like to read out loud, especially when you’re learning new things. You might
be good at understanding and remembering words or facts when you hear them.
VISUAL LANGUAGE: Cognitive Learning Styles: You learn well from seeing words in
books, on computers, on the board or on charts. You may like to take notes so that you can learn
by seeing the words on paper. You remember and use information better if you have read it.
AUDITORY NUMERICAL: Cognitive Learning Styles: You learn from hearing numbers
and oral explanation. You may remember phone and locker numbers easily, and you are
probably successful with oral numbers, games and puzzles. You may do just as well in math
without your textbook, since written materials aren’t necessary for you to learn. You can probably
work out problems in your head. You might like to quietly say problems out loud as you think
about them.
VISUAL NUMERICAL: Cognitive Learning Styles: You have to see numbers on the board,
in a book, on a computer, in a video or on paper in order to work with them. You are more likely
to remember and understand math facts if you have seen them.
TACTILE CONCRETE: Cognitive Learning Styles: You learn best by experience. You
need a combination of stimuli. The manipulation of material, along with sight and sound, really
helps you understand and learn. You like to handle, touch and work with what you are learning.
SOCIAL STYLE
refers to a learner’s preference to work alone or in a group
INDIVIDUAL LEARNING: Social Learning Styles: You get more work done alone. You
think best and remember more when you have learned alone. You do not allow other students
and their opinions to influence you: you value your own opinions.
GROUP LEARNING: Social Learning Styles: You like to study with at least one other
student and you don’t get as much done when you’re alone. Others’ opinions and preferences
are valued by you. Working in groups increases your learning and, later, your recognition of facts.
EXPRESSIVE STYLE
refers to the preferred methods of giving out information
ORAL EXPRESSIVENESS: Expressive Style: You can easily tell the teacher what you
know. You like to talk and communicate what you know and mean. You don’t mind giving
reports or talking to the teacher or to your classmates. You might not like writing as much as
talking, and writing and organizing your thoughts on paper can be tedious for you.
WRITTEN EXPRESSIVENESS: Expressive Style: You can write essays fluently and
answer essay questions to show what you know. You might feel uncomfortable when you have
to talk in class and give answers to questions. You like to organize your thoughts and write them
on paper.
APPENDIX H
MULTIMODAL ANALYSIS OF AIRBAG TEXT
Multimodal analysis of the “Airbag” text. Closed captioning is indicated in CAPS (Arial).
Text used with permission from KUAT, University of Arizona, Tucson, AZ.
The full table pairs each still frame from the news story (Pictures 1-58) with four
fields: Picture, Description (graphic), Transcription (audio), and CC text (written). The
screen shots and the Description (graphic) column are not reproduced here. The CC text
(written) column shows the successive roll-up states of the caption text listed in
Appendix A, preceded by the lead-in caption “A LETHAL CHEMICAL. PAM WHITE TONIGHT ON THE
POISON IN YOUR AIRBAG.” The Transcription (audio) column, grouped by frame, reads as
follows:

Pictures 1-3: >> It looks like salt or sugar. But it :: is a deadly poison. A lethal
chemical :: called Sodium Azide. >> There’re incidences of people who have taken a half a

Pictures 4-6: teaspoon, or less, and have collapsed and taken to the hospital and even
under intensive care could not be saved. >> U of A scientist, Dr. Eric Betterton. knows
all about it. Even more than he bargained for. >> I actually uh poisoned myself. I
inhaled some of the poison fumes of sodium azide in water.

Pictures 7-9: And um it had such a dramatic effect on me that I remembered it for a long,
long time. >> Doctor Betterton says Sodium Azide is a broad spectrum biocide and in high

Pictures 10-12: doses can be lethal in anything it comes in contact with. So when he found
the chemical was being used in airbags he was stunned. >> I was reading some of the
science alert kind of literature and saw that Sodium Azide was the active ingredient in
airbags. And at first I didn’t believe it. I thought it was a typo, because of the effect
that the Azide had had on me when I was a graduate student. And I tracked it down and lo
and behold it was not a typo, it is the active ingredient.

Pictures 13-15: This is roughly 100 grams give or take of it. >> airbags are inflated when
an electric

Pictures 16-18: current ignites the Sodium Azide and the powdered form of the chemical
changes to gas. >> What happens in an accident is an electrical current passes through
the contacts at the back of the inflator module

Pictures 19-21: It causes the Sodium Azide to decompose and the decomposition product is
nitrogen gas which escapes from the holes around this this can here. And that escaping
nitrogen gas is used to inflate the airbag, and here is an example of an airbag. >> To

Pictures 22-24: see what happens, a technician at Royal Buick set :: off these airbags for
us

Pictures 25-27: >> That’s perfect. >> and while Doctor Betterton says the gas form of
Sodium Azide is supposedly safe, :: the real danger, what happens down the line when the
car is junked? >> For now, we’re getting towards a decade or so, getting

Pictures 28-30: towards life-times of vehicles :: so now we’re gonna have a stream of
millions of vehicles every year going to the scrapyard each containing

Pictures 31-33: on the order of half a pound of sodium azide. >> Sodium Azide the doctor
says has a long shelf life and these salvage yards ::::::::::: could one day become some
of America’s most poisonous places. >> the genie in the bottle, I’d prefer to keep the
cap on

Pictures 34-36: the bottle >> A researcher working on Tucson’s PCE water contamination
problem,

Pictures 37-39: Doctor Betterton also wants to find out how dangerous Sodium Azide is to
the environment. >> What happens to all this toxic material which will soon be
accumulating in our junk yards?

Pictures 40-42: >> And he’s also hoping to convince lawmakers that something should be
done about poisonous airbags. >> Some kind of regulations instituted where by airbags
would automatically be deployed when the car comes to a junk yard, which I think would be
the most

Pictures 43-45: sensible solution to this potential :: problem. Because :: upon deploying
an airbag, you convert this very toxic substance, Azide, into a benign substance,
nitrogen gas, which is mostly what the atmosphere is composed of.

Pictures 46-48: >> But why would something so toxic be used in something as widely used as
airbags? Um, I’ve tried to surmise what the answer may be. Azide has been used by the
military for a long time as a propellant. Not just as an explosive but as a propellant in
ejector seats in fighter planes, for example. And my guess is that um

Pictures 49-51: the um the airbag industry, automobile industry, needed something in a
hurry because Congress mandated that this would go in effect in 1994 >> the fatal poison
is the subject of a new best seller By mystery writer JA Jance called Partner in Crime

Pictures 52-54: :: in fact, she got the idea for her book after reading about Dr.
Betterton’s research. We had dinner and we chatted about Sodium Azide and the toxic
properties, and incorporation of this material into airbags. And she floated a few ideas
for a plot for her mystery, and I don’t know if those ideas were incorporated in the
book, because I haven’t read it yet.

Pictures 55-57: >> Doctor Betterington says car manufacturers are finally ::: beginning to
phase out Sodium Azide for safer compounds, but it could be too little, too late. >> The
question is,

Picture 58: Even if we eliminated all the Azide today, there would still be tens of
millions of pounds out in the street that we have to deal with somehow in the future.
REFERENCES
ASL tracking systems (pdf) Applied Science Laboratories. http://www.a-s-l.com
Ashby, J., Rayner, K., & Clifton Jr., C. (2005). Eye movements of highly skilled and
average readers: Differential effects of frequency and predictability. The
Quarterly Journal of Experimental Psychology, 58A (6) 1065-1086.
Bakhtin, M. M. (1986). The problem of speech genres. In P. Morris (Ed.). The Bakhtin
Reader. (1994). (pp. 80-87). New York: St. Martin’s Press Inc.
Barthes, R. (1957). Mythologies. Reprinted and translated in 1972. New York, Hill and
Wang.
Breckenridge Church, R., Ayman-Nolley, S. & Mahootian, S. (2004). The Role of
gesture in bilingual education: Does gesture enhance learning? Bilingual
Education and Bilingualism, 7(4) 303-319.
Carrell, P. (1998). Some causes of text-boundedness and schema interference in ESL
reading. In P. Carrell, J. Devine & D. Eskey (Eds.), Interactive approaches to
second language reading. Cambridge, UK: Cambridge UP.
Cooper, R. (1996). Comprehending the genre of the television news report. TESOL
Matters (5) 10.
Csapó-Sweet, R.M. (1997). ’Sesame Street’, English vocabulary and word usage of
Hungarian ESL students. Communications 22(2). 175-190.
Culler, J. (1986). Ferdinand de Saussure. Ithaca, NY: Cornell University Press.
d'Ydewalle, G., & Gielen, I. (1992). Attention allocation with overlapping sound, image,
and text. Eye Movements and Visual Cognition: Scene Perception and Reading,
415-427.
d'Ydewalle, G. & Pavakanun, U. (1995). Acquisition of a second/foreign language by
viewing a television program. Psychology of Media in Europe, 51-64.
d'Ydewalle, G., & Van de Poel, M. (1999). Incidental foreign-language acquisition by children
watching subtitled television programs. Journal of Psycholinguistic Research, 28
(3) 227-244.
d'Ydewalle, G., Van Rensbergen, J., & Pollet, J. (1987). Reading a message when the same
message is available auditorily in another language: The case of subtitling. Eye
movements: From physiology to cognition, 313-321.
de Bot, K., Jagt, J., Janssen, H., Kessels, E., Schils, E. (1986). Foreign television and
language maintenance. Second Language Research 2(1) 72-82.
De Gelder, B. & Vrooman, J. (2000). The perception of emotions by ear and by eye.
Cognition and Emotion, 14(3), 289-311.
Drieghe, D., Brysbaert, M, Desmet, T. & De Baecke, C. (2004). Word skipping in
reading: On the interplay of linguistic and visual factors. European Journal of
Cognitive Psychology, 16, (1/2), 79-103.
Duchowski, A. (2002). A breadth-first survey of eye tracking applications. Behavior
Research Methods, Instruments and Computers, 1, 1-16.
Duckett, P. (2002). New Insights: Eye fixations and the reading process. Talking Points,
13(2). 16-21.
Duckett, P. (2001). First grade beginning readers’ use of pictures and print as they read:
A miscue analysis and eye movement study. Unpublished doctoral dissertation,
University of Arizona, Tucson.
Duranti, A. (1997). Linguistic Anthropology. West Nyack, NY: Cambridge University
Press.
Eco, U. (1987). The influence of Roman Jakobson on the development of semiotics. In
M. Krampen, K. Oehler, R. Posner, T. Sebeok, & T. von Uexküll (Eds.), Classics of
semiotics. New York: Plenum.
Fisch, S., McCann Brown, S & Cohen, D. (2001). Young children’s comprehension of
educational television: The role of visual information and intonation, Media
Psychology, 3, 365-378.
Flurkey, A. & Goodman, Y. (2004). The role of genre in a text: Reading through the
Waterworks. Language Arts, 81 (3) 233-244.
Fort, A., Delpuech, C., Pernier, J., & Giard, M. H. (2002). Early auditory-visual
interactions in human cortex during nonredundant target identification. Cognitive
Brain Research, 14, 20-30.
Freeman, Y. & Freeman, D. (1992). Whole language for second language learners.
Portsmouth, NH: Heinemann.
Freeman, A. (2001). The eyes have it: Oral miscue and eye movement analysis of the
reading of fourth grade Spanish/English bilinguals. Unpublished doctoral
dissertation, University of Arizona, Tucson.
Garza, T. (1991). Evaluating the use of captioned video materials in advanced foreign
language learning. Foreign Language Annals, 24(3). 239-258.
Gerard, J. (2007). The reading of formulaic language in an English text by native and
non-native speakers of English: An eye movement study. Unpublished doctoral
dissertation, University of Arizona, Tucson.
Giard, M.H. & Peronnet, F. (1999). Auditory-Visual integration during multimodal object
recognition in humans: A behavioral and electrophysiological study. Journal of
Cognitive Neuroscience 11(5), 473-490.
Goldman, M. (1996). If you can read this, thank TV. TESOL Journal, Winter, 15-18.
Goldman, M., & Goldman, S. (1988). Reading with close-captioned TV. Journal of
Reading, 31(5), 458-461.
González Rodríguez, M. (2006). El subtitulado cinematográfico: Fusión de palabra,
gesto y movimiento escénico. München, Germany: LINCOM GmbH.
Goodman, K.S. & Goodman, Y.M. (1990). Vygotsky in a whole-language perspective. In
L. Moll (Ed.). Vygotsky and education. (pp.223-250). Cambridge, UK:
Cambridge, UP.
Goodman, K. (2003/1970). Psycholinguistic universals in the reading process. In A.
Flurkey & J. Xu (Eds.), On the revolution of reading: The selected writings of
Kenneth S. Goodman (pp. 246-253). Portsmouth, NH: Heinemann.
Goodman, K. (2003/1976). What’s universal about the reading process. In A. Flurkey
& J. Xu (Eds.), On the revolution of reading: The selected writings of Kenneth S.
Goodman (pp. 87-93). Portsmouth, NH: Heinemann.
Goodman, K. (2003/1976). Miscue analysis: Theory and reality in reading. In A.
Flurkey & J. Xu (Eds.), On the revolution of reading: The selected writings of
Kenneth S. Goodman (pp. 124-136). Portsmouth, NH: Heinemann.
Goodman, K. (1994). Reading, writing, and written texts: A transactional
sociopsycholinguistic view. In R. Ruddell, M. Ruddell, & H. Singer (Eds.).
Theoretical models and processes of reading. (pp.1093-1130). Newark, DE: IRA.
Goodman, K. (1996). On reading. Portsmouth, NH: Heinemann.
Goodman, Y. & Burke, C. (1972). Reading miscue inventory. Toronto, Canada:
MacMillan Publ Co.
Goodman, Y., Watson, D., Burke, C. (2005). Reading miscue inventories: Alternative
procedures. Katonah, NY: Richard C Owen Publ.
Graber, D. (1990). Seeing is remembering: How visuals contribute to learning from
television news. Journal of Communication, 40(3), 134-155.
Hall, S. (2001/1980). Encoding/Decoding. In M. Durham & D. Kellner (Eds) Media and
cultural studies, (pp. 166-176). Blackwell Publishers.
Hegarty, M. (1992). The mechanics of comprehension and comprehension of mechanics.
Eye Movements and Visual Cognition: Scene Perception and Reading, 428-443.
Hull, G. & Nelson, M.E. (2005). Locating the semiotic power of multimodality. Written
Communication, 22(2), 224-261.
Jakobson, R. (1956). Two aspects of language and two types of aphasic disturbances. In
Fundamentals of language. … with Morris Halle, reprinted in L. R. Waugh & M.
Monville-Burston (Eds.), On Language (1990), (pp. 115-133). Cambridge, MA:
Harvard University Press.
Jakobson, R. (1984). “La théorie saussurienne en rétrospection.” Linguistics 22:161-196.
Reprinted as “Langue and parole: Code and message”. In L. R. Waugh & M.
Monville-Burston (Eds.), On Language (1990), (pp. 81–109). Cambridge, MA:
Harvard University Press.
Jakobson, R. (1960). Linguistics and Poetics. In T.A. Sebeok (Ed.), Style in language
(Cambridge, MA: MIT Press). Reprinted as “The speech event and the functions
of language” In L. R. Waugh & M. Monville-Burston (Eds.), On Language
(1990), (pp. 69-79). Cambridge, MA: Harvard University Press.
Jensema, C., Sarma Danturthi, R., & Burch, R. (2000a). Time spent viewing captions on
television programs. American Annals of the Deaf, 145(5), 464-468.
Jensema, C., Sharkawy, S., Sarma Danturthi, R., Burch, R. & Hsu, D. (2000b). Eye
movement patterns of captioned television viewers. American Annals of the Deaf,
145(3), 275-285.
Jewitt, C. (2004). Multimodality and new communication technologies.
In P. Levine and R. Scollon (Eds). Discourse & technology: Multimodal discourse
analysis. (pp. 184-195). Washington, DC: Georgetown UP.
Just, M. & Carpenter, P. (1980). A theory of reading: from eye fixations to
comprehension. Psychological Review, 87(4), 329-354.
Jylhä-Laide, J. & Karreinen, S. (1993). Play it again, Laura: Off-air cartoons and video as
a means of second language learning. In K. Sajavaara & S. Takala (Eds.) Finns as
Learners of English: Three Studies (pp.89-146).
Kalyuga, S., Chandler, P., & Sweller, J. (1999). Managing split-attention and redundancy
in multimedia instruction. Applied Cognitive Psychology, 13, 351-371.
Kenner, C. & Kress, G. (2003). “The multisemiotic resources of biliterate children”
Journal of Early Childhood Literacy, 3(2), 179-202.
Kern, R. (2006). Perspectives on technology in learning and teaching languages. TESOL
Quarterly, 40(1) 183-210.
Kinsella, K. (1995). Understanding and empowering diverse learners in ESL classrooms.
In J. Reid (Ed.). Learning Styles in the ESL/EFL classroom. (pp.170-194.) New
York: Heinle & Heinle Publ.
Koda, K. (1994). Second language reading research: Problems and possibilities. Applied
Psycholinguistics, 15, 1-28.
Koda, K. (1990). The use of L1 reading strategies in L2 reading: Effects of L1
orthographic structures on L2 phonological recoding strategies. Studies in Second
Language Acquisition 12, 393-410.
Koda, K. (2004). Insights into Second Language Reading: A Cross-Linguistic Approach.
Cambridge, UK: Cambridge UP.
Koolstra, C., Van der Voort, T., & d'Ydewalle, G. (1999). Lengthening the presentation time of
subtitles on television: Effects on children’s reading time and recognition.
Communications, 24, 407-422.
Koskinen, P., Knable, J., Markham, P., Jensema, D., & Kane, K. (1995). Captioned
television and the vocabulary acquisition of adult second language correctional
facility residents. Journal of Educational Technology Systems, 24(4), 359-373.
Kothari, B., Takeda, J., Joshi, A., & Pandey, A. (2002). Same language subtitling: a
butterfly for literacy? International Journal of Lifelong Education 21(1) 55-66.
Kress, G. & Van Leeuwen, T. (2001). Multimodal discourse: The modes and media of
contemporary communication. London: Arnold.
Kress, G. (2003). Literacy in the New Media Age. Routledge.
Kress, G. & Van Leeuwen, T. (1995). Reading images: The grammar of visual design.
London; New York: Routledge.
Lincoln, F. & Rademacher, B. (2006). Learning styles of ESL students in community
colleges. Community College Journal of Research and Practice, 30, 485-500.
Linebarger, D. (2001). Learning to read from television: The effects of using captions
and narration. Journal of Educational Psychology, 91(2), 288-298.
Love, K. (2003). Mediating generation shift in secondary English teaching in Australia:
The case study of “BUILT”. Educational Studies in Language and Literature, 3.
21-51.
Luke, C. (2003). Pedagogy, connectivity, multimodality, and interdisciplinarity. Reading
Research Quarterly, 38(3), 397-403.
Markham, P. L. (1993). Captioned television videotapes: effects of visual support on
second language comprehension. Educational Technology.
Markham, P., Peter, L., & McCarthy, T. (2001). The effects of native language vs. target
language captions on foreign language students’ DVD video comprehension.
Foreign Language Annals, 34(5). 439
Mayer, R. & Moreno, R. (1998). A split-attention effect in multimedia learning: Evidence
for dual processing systems in working memory. Journal of Educational
Psychology, 90(2), 312-320.
Mayer, R. & Sims, V. (1994). For whom is a picture worth a thousand words? Extensions
of a dual-coding theory of multimedia learning. Journal of Educational
Psychology, 86(3), 389-401.
Mayer, R. (1997). Multimedia learning: Are we asking the right questions? Educational
Psychologist, 32(1), 1-19.
Mayer, R. (2005). Multimedia learning: Guiding visuospatial thinking with
instructional animation. In P. Shah & A. Miyake (Eds.), The Cambridge handbook
of visuospatial thinking (pp. 477-508). New York: Cambridge UP.
Mayer, R., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning:
When presenting more material results in less understanding. Journal of
Educational Psychology, 93(1), 187-198.
Mayer, R., Moreno, R., Boire, M., & Vagge, S. (1999). Maximizing constructivist
learning from multimedia communications by minimizing cognitive load. Journal
of Educational Psychology, 91(4) 638-643.
Mayer, R. (2001). Multimedia Learning. New York: Cambridge UP.
McConkie, G.W., Kerr, P.W., Reddix, M.D. & Zola, D. (1988). Eye movement control
during reading: The location of initial eye fixations in words. Vision Research,
28(10), 1107-1118.
McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-
748.
McNeill, D. (1992). Hand and Mind: What gestures reveal about thought. Chicago, IL:
University of Chicago Press.
Moreno, R. & Mayer, R. (2000). A coherence effect in multimedia learning: The case for
minimizing irrelevant sounds in the design of multimedia instructional messages.
Journal of Educational Psychology 92(1) 117-125.
Murray, W. (2000) Sentence processing: Issues and measures. In Kennedy, A., R.
Radach, D. Heller & J. Pynte. (Eds.), Reading as a perceptual process. (pp. 649-
664).
Nelson, M. (2006). Mode, meaning, and synaesthesia in multimedia L2 writing.
Language Learning & Technology, 10(2), 56-76.
Neuman, S. & Koskinen, P. (1992). Captioned television as comprehensible input:
Effects of incidental word learning from context for language minority students.
Reading Research Quarterly 27(1) 95-106.
Norris, S. (2004). Analyzing multimodal interaction: A methodological framework.
New York: Routledge.
O’Brien de Ramirez, K. (2008). Eye movements and miscues during silent and oral reading
of English and French by native and non-native readers. Unpublished doctoral
dissertation, University of Arizona, Tucson.
O’Regan, K. (1992). Optimal viewing position in words and the strategy-tactics theory of
eye movements in reading. Eye movements and visual cognition: Scene
perception and reading. 333-354.
Oller, J. & Tullius, J. (1973). Reading skills of non-native speakers of English.
International Journal of Applied Linguistics 11(1) 67-80.
Oxford, R. (1990). Language learning strategies. New York: Harper Collins.
Paivio, A. (1986). Mental representations. New York: Oxford UP.
Paivio, A. (2007). Mind and its evolution. Mahwah, NJ: LEA.
Parry, K. (1996). Culture, Literacy, and L2 Reading. TESOL Quarterly, 30(4) 665-692.
Paulson, E. & Goodman, K. (1999). Influential studies in eye-movement research.
Reading Online. www.readingonline.org/research/eyemove.html
Paulson, E. (2000). Adult readers’ eye movements during the production of oral miscues.
Unpublished doctoral dissertation, University of Arizona, Tucson.
Plass, J., Chun, D., Mayer, R., & Leutner, D. (1998). Supporting visual and verbal
learning preferences in a second-language multimedia learning environment.
Journal of Educational Psychology, 90(1), 25-36.
Plass, J., Chun, D., Mayer, R., & Leutner, D. (2002). Cognitive load in reading a foreign
language text with multimedia aids and the influence of verbal and spatial
abilities. Computers in Human Behavior, 19, 221-243.
Plass, J., & Jones, L. (2005). Multimedia learning in second language acquisition. In R.
Mayer (Ed.), The Cambridge handbook of multimedia learning. Cambridge, UK:
Cambridge UP.
Pourtois, G., de Gelder, B., Bol, A. & Crommelinck, M. (2005). Perception of facial
expressions and voices and of their combination in the human brain. Cortex, 41,
49-59.
Radach, R. & Kennedy, A. (2004). Theoretical perspectives on eye movements in reading:
Past controversies, current issues, and an agenda of future research. European
Journal of Cognitive Psychology, 16(1/2), 3-26.
Rayner, K. & Well, A. (1996). Effects of contextual constraint on eye movements in
reading: A further examination. Psychonomic Bulletin & Review, 3(4), 504-509.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124(3), 372-422.
Riley, G., & Lee, J. (1996). A comparison of recall and summary protocols as measures
of second language reading comprehension. Language Testing, 13(2), 173-187.
Reid, J. (1995). Learning styles in the ESL/EFL classroom. New York: Heinle & Heinle
Publishers.
Rosenblatt, L.M. (1994). The transactional theory of reading and writing. In R. Ruddell,
M. Ruddell, & H. Singer (Eds.). Theoretical models and processes of reading.
(pp. 1057-1092). Newark, DE: IRA.
Sadoski, M., & Paivio, A. (2001). Imagery and text: A dual coding theory of reading and
writing. Mahwah, NJ: LEA Publishers.
Scarcella, R. & Oxford, R., (1992). The tapestry of language learning: The individual in
the communicative classroom. Florence, KY: Heinle & Heinle Publishers.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11, 129-158.
Slykhuis, D., Wiebe, E., & Annetta, L. (2005). Eye-tracking students’ attention to
PowerPoint photographs in a science education setting. Journal of Science
Education and Technology, 14(5/6), 509-520.
Starr, M. & Inhoff, A. (2004). Attention allocation to the right and left of a fixated word:
Use of orthographic information from multiple words during reading. European
Journal of Cognitive Psychology, 16(1/2), 203-225.
Stewart, M. & Pertusa, I. (2004). Gains to language learners from viewing target
language closed-captioned films. Foreign Language Annals, 37, 438-447.
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. (1995). Integration
of visual and linguistic information in spoken language comprehension. Science,
268(5217), 1632-1634.
Thorlacius, L. (2002). A model of visual, aesthetic communication focusing on web
sites. Digital Creativity, 13(2), 85-98.
Townsend, J. & Bever, T. (2001). Sentence Comprehension: The integration of habits
and rules. Cambridge MA: MIT Press.
Tan, A., Fujioka, Y., Bautista, D., Maldonado, R., Tan, G., & Wright, L. (2000). Influence
of television use and parental communication on educational aspirations of
Hispanic children. The Howard Journal of Communications, 11, 107-125.
Vanderplank, R. (1988). The value of teletext sub-titles in language learning. ELT
Journal, 42(4), 272-281.
Vanderplank, R. (1993). A verbal medium: Language learning through closed captions.
TESOL Journal, 3(1), 10-14.
Vanderplank, R. (1990). Paying attention to the words: Practical and theoretical problems
in watching television programmes with uni-lingual (CEEFAX) sub-titles. System,
18(2), 221-234.
Verschueren, J. (1999). Understanding pragmatics. London: Arnold.
Vincent, J. (2006). Children writing: Multimodality and assessment in the writing
classroom. Literacy, 40(1), 51-57.
Vroomen, J., Driver, J., & de Gelder, B. (2001). Is cross-modal integration of emotional
expressions independent of attentional resources? Cognitive, Affective, &
Behavioral Neuroscience, 1(4), 382-387.
Vygotsky, L. (1986). Thought and language. Cambridge, MA: The MIT Press.
Waugh, L. R. & Monville-Burston, M. (1990). On language. Cambridge, MA: Harvard
University Press.
Weissenrieder, M. (1983). Listening to the news in Spanish. In J. Oller & R. Amato
(Eds.), Methods that work: A smorgasbord of ideas for language teachers (pp.
267-271). Rowley, MA: Newbury House Publishers.
Wertsch, J. (1991). Voices of the mind: A sociocultural approach to mediated action.
Cambridge, MA: Harvard University Press.
www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML