All content in this area was uploaded by Mark Feng Teng on Jan 06, 2020
The Australian Educational Researcher
https://doi.org/10.1007/s13384-018-0279-6
Incidental vocabulary learning for primary school students:
the effects of L2 caption type and word exposure frequency
Feng Teng1
Received: 14 March 2018 / Accepted: 14 September 2018
© The Australian Association for Research in Education, Inc. 2018
Abstract
Within instructed second language research, there is growing interest in primary school vocabulary learning. Studies have emphasized classroom-based vocabulary learning, with a growing focus on the potential
for using captioned videos and increased word encounters. The present study inves-
tigated the effects of various captioning conditions (i.e. full captioning, keyword
captioning, and no captions), the number of word encounters (one and three), and
the combinations of these two variables on incidental learning of new words while
viewing a video. Six possible conditions were explored. A total of 257 primary
school students learning English as a second language (ESL) were divided into six
groups and randomly assigned to a condition in which 15 target lexical items were
included. A post-test, measuring the recognition of word form/meaning and recall of
word meaning, was administered immediately after participants viewed the video.
The post-test was not disclosed to the learners in advance. The group viewing the
full captioning video scored significantly higher than the keyword captioning group
and the no-captioning group. Repeated encounters with the targeted lexical items led
to more successful learning. The combination of full captioning and three encoun-
ters was most effective for incidental learning of lexical items. This quasi-experi-
mental study contributes to the literature by providing evidence which suggests that
captioned videos coordinate two domains (i.e. auditory and visual components) and
help ESL learners to obtain greater depth of word form processing, identify meaning
by unpacking language chunks, and reinforce the form-meaning link.
Keywords Full captions · Keyword captions · Frequency · Incidental vocabulary learning
* Feng Teng
tengfeng@uni.canberra.edu.au; markteng@life.hkbu.edu.hk
1 Department of Education Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
F.Teng
Introduction
Educators in Australia and worldwide have generally acknowledged the potential for using captioned videos to enhance primary school learners’ vocabulary development, an important part of English language learning with long-term implications for future academic development. Captions,1 which turn videos into a kind of storybook by presenting a stream of written text synchronously with the video and audio, were originally developed for the deaf and hard of hearing (Danan 2004). Scholars have contended that processing the bi-modal input of captioned videos is not cognitively overwhelming but instead serves as a support, offering learners multiple representations of the same information (Vanderplank 2016). Captioned videos are informative because they supply learners with several channels of information: the pictorial information, the original soundtrack, and on-screen text in the same language as the soundtrack. Captioned videos are thus becoming increasingly common as a tool for teaching and learning English, likely because of the growing accessibility of authentic videos via DVD, YouTube, ViewPoint, and mobile phone apps. Audio–visual materials enhanced with captions are a powerful pedagogical tool believed to improve vocabulary learning (Montero Perez et al. 2018). In addition, classroom practitioners and curriculum developers can easily caption videos using software such as Adobe Premiere, MAGpie, or iMovie. Many schools across the world, eager to enhance students’ English performance, are beginning to offer hybrid or blended-instruction courses in which some instruction is conducted within the classroom and some occurs independently outside the classroom (Teng 2017). Such classes naturally incorporate more online content, which often includes captioned videos.
The increasing use of captioned videos, particularly in multimedia-oriented
classrooms, marks a shift away from the print medium towards digital media. This
shift has promoted the development and distribution of resources for learning and
teaching. Ubiquitous computer and Internet technologies have also allowed text and
graphics to be combined in different ways, resulting in new methods for knowledge
representation along with novel ways in which this knowledge is communicated
and, in many cases, learned (Chan and Unsworth 2011). Given that online process-
ing demands pose challenges to vocabulary learning through audio–visual input
(Montero Perez et al. 2014), exploring whether the various sources of information in captioned videos can be processed simultaneously for incidental vocabulary learning is of interest. The topic has attracted international debate because incidental vocabulary learning has so far yielded only limited gains (Webb and Nation 2017).
1 To clarify the technical terms used in this field, it is necessary to distinguish subtitles, which refer to
on-screen text in students’ native language combined with a second language soundtrack, from captions,
which refer to on-screen text in a given language combined with a soundtrack in the same language. For
the purposes of this research, the term “captions” is used throughout to avoid repeated, potentially con-
fusing shifts in terminology.
Incidental vocabulary learning for primary school students:…
Discerning practical and effective ways to enhance students’ vocabulary is essen-
tial, as vocabulary is the primary building block for language learning (Teng 2018a;
Webb and Nation 2017). However, teaching primary school students English as a
second language (ESL) is no easy task. These learners may lack the attentional con-
trol to assimilate the given term for subsequent use (Koolstra and Beentjes 1999).
Learners may also find it difficult to establish a form-meaning link, defined as “the
assignment of meaning to the orthographical representation of the word” (Rott 2007,
p. 166). Earlier research has revealed that incidental vocabulary learning—namely,
the learning of a new word or expression without conscious intent to commit the ele-
ment to memory, e.g. “picking up” an unknown word from language input (Hulstijn
2013)—was challenging for students learning English as a foreign language (EFL)
(Teng 2016a; Webb and Chang 2015). Limited language processing during exposure
to language input may not allow learners to derive a term’s meaning and construct a
form-meaning link (Hulstijn 2001).
The main factors affecting incidental vocabulary learning performance include
target word frequency (Reynolds and Teng 2018; Reynolds and Wible 2014) and
contextual information provided to infer the meaning of target words (Teng 2016b;
Webb 2008). Different methods have been utilized to boost learners’ vocabulary
through input, such as providing captions to enhance comprehension of audio–visual
input (Montero Perez et al. 2018) and increasing encounters with target words (Chen and Teng 2017). Most of these studies were conducted in EFL contexts in which learners had little exposure to the target language outside the language classroom. In contrast, learners in ESL contexts, who are immersed in English-speaking environments that offer abundant incidental vocabulary exposure, have received less attention from vocabulary researchers. While features of the input (e.g. repeated occurrences of a word and the use of captions) can attract learners’ attention and increase the salience of novel words, such features and the resulting attention are only the initial step toward gains in vocabulary knowledge.
Captioned videos may offer a new perspective in exploring incidental vocabulary
learning. However, if learners are not allowed to return to a previous word or sen-
tence while watching videos, vocabulary learning through audio–visual input could
be limited due to rapid online processing demands and the difficulties in guessing
unknown words (Winke et al. 2010). This limitation suggests a need to investigate
the interplay between types of captions and word exposure frequency for incidental
vocabulary learning. Although studies have analysed these two variables separately, research combining the use of captions and word exposure frequency for students’ vocabulary learning remains underexplored. Primary school ESL learners with limited English proficiency tend to avoid printed reading material and exhibit
processing difficulties in English syntax, vocabulary, accessing phonological repre-
sentations, making inferences, understanding figurative language, and using short-
term memory efficiently (Koolstra and Beentjes 1999). Thus, the combination of
captioned videos and word exposure frequency may provide theoretical and practical
possibilities to enhance incidental vocabulary learning in primary school students.
Given international research interest in captioned videos, the present study
explores learners’ incidental learning of new lexical items while watching videos.
Two variables were considered, both alone and in combination: the frequency of exposure to target words (one occurrence and three occurrences) and the type of captioning (full captioning, keyword captioning, and no captions). Specifically, this study investigates (1) the effect of the type of captioning on initial learning of unknown words from audio–visual input, (2) the effect of word exposure frequency on initial learning of unknown words from audio–visual input, and (3) the effect of the interplay between the type of captioning and word exposure frequency on incidental vocabulary learning. This quasi-experimental study aimed to provide insight into the value of using captioned videos for primary school ESL learners. The author also sought to measure the combined effects of L2 caption types and word exposure frequency on incidental word learning, thereby contributing to the limited body of empirical literature that has integrated these two variables.
Literature review
Theoretical frameworks for captions
Theoretical frameworks supporting the effectiveness of captions on language learn-
ing include Paivio’s (1986) dual-coding theory, Baddeley’s (1986) working memory
model, and Fletcher and Tobias’s (2005) multimedia principle. The dual-coding the-
ory asserts that two separate representational systems—the verbal system and the
imagery system (mental images and pictures)—can co-activate each other as dually
coded items are linked by rich and meaningful referential interconnections. The
written text provided in captioned videos supplies a synopsis of the dynamic speech. According to dual-coding theory, learners recall language better and use new words more appropriately when those words are learned in direct association with appropriate nonverbal referents (e.g. objects or experiential elements, such as events and emotions).
Baddeley’s (1986) working memory model (Fig. 1) covers three components:
central executive, visuo-spatial sketchpad, and phonological loop, of which the latter
two are memory subsystems. The central executive component—a memory control
system—is responsible for coordinating information retrieved from the two stor-
age subsystems. The visuo-spatial sketchpad—a component of visual coding—is
involved in handling spatial imagery information. Acoustic or phonological coding,
represented by the phonological loop, plays a pivotal role in learning reading and
vocabulary. Based on the model, it is assumed that knowledge can be processed more effectively if verbal and pictorial representations of textual information are presented simultaneously through the auditory and visual channels.
Fig. 1 Baddeley’s (1986) working memory model
The multimedia principle asserts that learners can break down aural input from
videos into meaningful units and visualize it, thus gaining better comprehension of
the text (Fletcher and Tobias 2005). The multimedia principle also proposes that
adding pictures to words, instead of presenting text alone, may help learners better
understand and identify the meaning of textual information. Integrating images and
words may help learners better understand information and guide learners to selec-
tively attend to key items in the presented information.
In light of these frameworks, using captioned videos appears to be a promising
means of instruction to help learners grasp visual cues and language input. Dual-
coding theory (Paivio 1986), the working memory model (Baddeley 1986), and the
multimedia principle (Fletcher and Tobias 2005) offer sophisticated explanations
of the benefits of captioned videos. The bi-modal input provided by captioned videos serves as a support rather than a hindrance (Vanderplank 2016), offering learners multiple representations of the same language input. Captioned videos appear to provide comprehensible input (Krashen 1985) and may guide learners in developing cognitive processes and allocating attentional resources to notice and use language input (Schmidt 2001). Visual associations in memory, coupled with the mnemonic power of imagery, support the potential utility of video input: L2 vocabulary appears to be learned better when words are associated with actual objects or imagery (Vanderplank 2016).
Captions and incidental vocabulary learning
Existing studies have explored the effectiveness of captions in enhancing vocabu-
lary. For example, Montero Perez et al. (2014) investigated the effects of types of
captioning on incidental learning of unknown words by university students across
four experimental groups. The first group watched videos without captions (n = 32);
the second watched videos with captioned keywords (n = 34); the third watched
fully captioned videos (n = 30); and the fourth watched fully captioned videos with
highlighted keywords (n = 37). Results indicated that captioning did not affect mean-
ing recall (i.e. ability to supply a meaning of a target word), but greatly enhanced
form recognition (i.e. ability to identify the correct target word). Although captions appear to have the potential to help learners build form-meaning connections in their mental lexicon, they have been found to be more helpful in recognizing the meaning of new words (Neuman and Koskinen 1992). In contrast to Winke et al.’s (2010) findings
that revealed a beneficial effect of captioning on meaning recall, recalling target
word meanings with the help of captioning appeared to remain challenging for some
students.
As acknowledged by Montero Perez et al. (2014), the reasons for this difficulty included the following: (a) captioning provided little information regarding the meaning of difficult words, making meaning construction based on learners’ inferences unreliable; (b) learners were not given sufficient time to infer word meaning from context while watching the videos; and (c) inferring word meaning
was a challenging and unsuccessful process, and the meaning recall test adminis-
tered to the learners after viewing the video once was too demanding. Similarly,
Peters et al. (2016) conducted two exploratory studies investigating the effect of
first language (L1) subtitles and captions on various aspects of word knowledge (i.e.
form recognition and meaning recall). Findings for the Belgian English learners showed a positive effect of captions on learning word form, but not on learning word meaning. Two factors contributed to these results: learners’ vocabulary size and the frequency with which the target words occurred. Montero Perez et al. (2018) recently measured the effects of L2 captioning type (i.e. no captioning, keyword captioning, full captioning, and glossed keyword captioning) and
test announcement (i.e. informing vs. not informing students that a vocabulary test
would be administered after viewing the video) on incidental vocabulary learning.
A total of 227 Dutch-speaking university students participated in the study. Find-
ings revealed that students exposed to glossed keyword captioning scored highest
on form recognition and meaning recall. However, the test announcement did not
affect vocabulary learning results. Factors influencing test results included learners’
vocabulary size and their look-up behaviour in the glossed keyword condition.
Three general conclusions can be drawn from these studies. First, although only partial learning gains were demonstrated, captions appear to have helped learners construct an initial form-meaning map in their mental lexicon, suggesting that captions helped learners recognize the form and meaning of target words. Second, although captioning was found to be a powerful tool in helping learners recognize word form and meaning, learners still found it very difficult to recall word meaning. The success of captioning thus appeared to depend on the test modality, namely, the aspect of word knowledge being tested. Noticing form is the first step in the vocabulary-learning process (Hulstijn 2001), but the path from noticing to meaning recall is neither linear nor guaranteed. As proposed by the Noticing Hypothesis, input does not become intake for language learning unless it is noticed (i.e. consciously registered) (Schmidt 1990, 2001). Third, captions helped learners direct their attentional resources toward comprehending the novel words that appeared in a video. This comprehension was a conscious selection process based on noting and gathering information, which might have helped learners reflect on and notice disparities while comparing their L2 knowledge with captioned video input. Although existing studies
generally agree upon the benefits of captions in the context of vocabulary learning,
further investigation is required to determine whether the frequency of word occur-
rence is an important mediating factor in the potential for captions to facilitate inci-
dental vocabulary learning.
Frequency of word occurrence and incidental vocabulary learning
Research on incidental vocabulary learning through reading (Pellicer-Sánchez 2016), listening (Van Zeeland and Schmitt 2013), and reading-while-listening (Teng 2016a) has demonstrated that although no consensus exists regarding the precise number of encounters required for successful incidental vocabulary learning, repeated encounters with a lexical item occurring in the language input appear to
exert a positive effect on word learning. Even though word exposure frequency has been shown to influence word learning, the gain from listening is apparently much smaller than that from reading (Vidal 2011), which in turn appears smaller than the gain from reading-while-listening (Teng 2016a). Further, the frequency of word occurrence in reading-while-listening input has the potential to positively affect word learning.
Few incidental vocabulary learning studies have explored the role of word exposure frequency while viewing videos. Rodgers (2013) reported an average correlation (r = 0.30) between word exposure frequency and incidental vocabulary learning in a vocabulary test administered after viewing a captioned video. Peters et al.
(2016) argued that although the effect of word exposure frequency was related to
learners’ vocabulary size, repeated encounters with unknown words might increase
the likelihood of an item being noticed and retained. However, the effect of word
exposure frequency was larger in Experiment 2 (a Simpsons episode) than in Exper-
iment 1 (a documentary). This difference might be explained by the relevance of
some of the target words to input comprehension.
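A correlation such as Rodgers’s r = 0.30 is a Pearson product-moment coefficient. Assuming, for illustration, that it is computed across target words (each word contributing its number of occurrences and its learning rate), the minimal sketch below shows the calculation. The per-word data are invented and are not Rodgers’s data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical per-word data: occurrences of each word in a video, and the
# proportion of learners who recalled its meaning afterwards.
occurrences = [1, 1, 2, 3, 3, 5, 8]
proportion_learned = [0.10, 0.25, 0.15, 0.30, 0.20, 0.35, 0.40]

r = pearson_r(occurrences, proportion_learned)
print(f"r = {r:.2f}")
```

A moderate positive r means that more frequent words tend to be learned more often, while leaving much of the variation to other factors, such as those discussed in the next paragraph.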
The effect of word exposure frequency was contingent on various factors in ear-
lier studies on incidental vocabulary learning, especially participants’ vocabulary
size (Peters et al. 2016), word form variation (Reynolds 2015), and the context surrounding a word (Teng 2016b; Webb 2008). Learners may also struggle to connect a new word form with its meaning when they learn vocabulary incidentally from reading alone (Webb and Nation 2017). Captioned videos, which combine written text with audio–visual input, may therefore help learners establish a link between the form and meaning of an unknown word with fewer encounters.
The present study
Taken together, the results of the preceding studies indicate that using captions may be an effective approach to increasing incidental vocabulary learning. In addition, repeated encounters with a lexical item are likely to enhance incidental vocabulary-learning performance. The current study differs methodologically from previous work in measuring the interplay of captions and word exposure frequency, that is, in assessing their combined effects on incidental word learning. This study is therefore expected to enrich the limited information available in this area. The aim of this work is to identify an optimal combination of caption type and word exposure frequency. The present study thus seeks to provide insight into incidental vocabulary learning from captions by addressing the following questions:
1. To what extent does incidental learning of new words differ between three cap-
tioning types—full captioning, keyword captions, and no captions—when the
three conditions include the same number of encounters with target words?
2. To what extent does incidental learning of new words differ between the two
conditions involving different word occurrence frequencies (i.e. one and three)
for each captioning type?
3. To what extent does incidental learning of new words differ between specific
combinations of word exposure frequency and captioning conditions?
Method
Research design
The present study adopted a 2 × 3 between-subjects design. The first independent
variable was the frequency of target word exposure (one occurrence and three occur-
rences). The second independent variable was the type of captioning: full captions,
keyword captions, and no captions, as recommended by Montero Perez et al. (2013). The combination of these independent variables resulted in six experimental groups. Details on the combinations of various types of captioning and word encounter frequencies among the six groups are shown in Table 1.
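The 2 × 3 design is the Cartesian product of the two independent variables. The minimal sketch below assumes the group numbering of Table 1 (rows are captioning types, and the 3-times column precedes the 1-time column); the variable names are illustrative, and the group sizes are those reported in Table 1.

```python
from itertools import product

# Levels of the two independent variables in the 2 x 3 design.
CAPTION_TYPES = ["full captioning", "keyword captioning", "no captioning"]
EXPOSURES = [3, 1]  # times each target word occurs in the video

# Group sizes from Table 1.
GROUP_SIZES = {1: 46, 2: 42, 3: 43, 4: 43, 5: 42, 6: 41}

def build_conditions():
    """Number the six caption-by-exposure cells row by row, as in Table 1."""
    return {
        group: {"captions": captions, "exposures": exposures}
        for group, (captions, exposures) in enumerate(
            product(CAPTION_TYPES, EXPOSURES), start=1)
    }

conditions = build_conditions()
for group, cell in conditions.items():
    print(f"Group {group}: {cell['captions']}, {cell['exposures']}x, n = {GROUP_SIZES[group]}")
print("total n =", sum(GROUP_SIZES.values()))  # total n = 257
```

Because the design is between-subjects, each learner appears in exactly one of these six cells.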
Participants
Participants included 257 students in Grade 6 (131 boys, 126 girls) from six primary
schools in Hong Kong. All schools were Hong Kong direct subsidy scheme schools,
using English as the main language of instruction. English language instruction
for students began in kindergarten. All participants (mean age = 11.67, SD = 1.17) were learning ESL and spoke Chinese (Cantonese or Putonghua) as their L1. Approximately 100 sixth-grade students attended each school. With the help of six primary school teachers, a vocabulary levels test (see “Measures”) was administered to all
students from the chosen schools. All participating teachers were willing to help
with this study; the author and teachers knew each other, as they belonged to the
same Christian church. The vocabulary test served to measure participants’ vocabu-
lary proficiency.2
At the beginning of the study, 278 students who did not differ significantly in their English vocabulary proficiency were selected. Participants and their parents were informed that the study would involve watching video clips, completing corresponding English exercises, and learning English, and were assured that students’ performance in the experiment would not affect their normal studies. Consent was obtained from the participants, their parents, the teachers, and the schools. Data from 21 learners whose first language was not Chinese (e.g. Tamil) were excluded from the analysis; hence, the final dataset consisted of 257 students, with 42 (21 boys, 21 girls), 43 (23 boys, 20 girls), 41 (21 boys, 20 girls), 46 (22 boys, 24 girls), 43 (22 boys, 21 girls), and 42 (22 boys, 20 girls) participants recruited from each school, respectively. All participants from a given school formed a single group, a form of convenience sampling (Table 1); each school, however, was randomly assigned to a condition. Each teacher was assigned to one group. The six participating teachers had taught in primary schools for at least six years.
2 Information on the validation of this test is described in Nation and Gu (2007). This test has been successfully applied by Teng (2016b). The selected students’ average score was 22.24 out of 30 on the 2000-word level test, and these scores did not differ significantly among the six experimental groups (Group 1: 21.65 out of 30; Group 2: 22.84 out of 30; Group 3: 22.68 out of 30; Group 4: 22.65 out of 30; Group 5: 21.75 out of 30; Group 6: 21.87 out of 30; p = 0.65).
Table 1 Combination of the various types of captioning and word encounter frequency in the six groups
Captioning type | Word exposure frequency: 3 times | Word exposure frequency: 1 time
Full captioning | Group 1: Full captioning + 3-times word occurrence (n = 46) | Group 2: Full captioning + 1-time word occurrence (n = 42)
Keyword captioning | Group 3: Keyword captioning + 3-times word occurrence (n = 43) | Group 4: Keyword captioning + 1-time word occurrence (n = 43)
No captioning | Group 5: No captioning + 3-times word occurrence (n = 42) | Group 6: No captioning + 1-time word occurrence (n = 41)
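The participant counts reported above can be cross-checked, and the school-level random assignment sketched, in a few lines. The function name and seed are hypothetical: the paper does not say how the random assignment was carried out. It also does not state how the overall VLT mean of 22.24 was computed; an unweighted mean of the six group means happens to reproduce it.

```python
import random

# Per-school (boys, girls) counts, in the order listed in the text.
SCHOOL_COUNTS = [(21, 21), (23, 20), (21, 20), (22, 24), (22, 21), (22, 20)]
# Group sizes from Table 1, Groups 1-6.
TABLE1_SIZES = [46, 42, 43, 43, 42, 41]
# Mean 2000-level VLT scores (out of 30) for Groups 1-6, from footnote 2.
VLT_GROUP_MEANS = [21.65, 22.84, 22.68, 22.65, 21.75, 21.87]

def assign_schools(n_schools, n_groups, seed=None):
    """Randomly give each school (0-based index) one group number (1-based).

    Whole schools are assigned, so every pupil in a school shares a condition.
    """
    if n_schools != n_groups:
        raise ValueError("this sketch assumes exactly one school per group")
    groups = list(range(1, n_groups + 1))
    random.Random(seed).shuffle(groups)
    return dict(enumerate(groups))

boys = sum(b for b, _ in SCHOOL_COUNTS)
girls = sum(g for _, g in SCHOOL_COUNTS)
school_sizes = sorted(b + g for b, g in SCHOOL_COUNTS)
print("boys:", boys, "girls:", girls, "total:", boys + girls)
print("school sizes match Table 1:", school_sizes == sorted(TABLE1_SIZES))
print("mean of group VLT means:", round(sum(VLT_GROUP_MEANS) / len(VLT_GROUP_MEANS), 2))
print("school -> group:", assign_schools(len(SCHOOL_COUNTS), 6, seed=2018))
```

Assigning whole schools rather than individual pupils is what makes the design quasi-experimental: school-level differences travel with the condition.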
Materials
Video selection
A clip from a series of English video stories for young learners was selected for
this study. The video Polar Exploration, which shows how scientists use robots to explore underwater mountain ranges in the Arctic Ocean, was chosen. Because no short video was available in which a sufficient number of target words occurred three times, the scripts were discussed and edited so that target words occurred three times in appropriate contexts. Ultimately, two versions of the script were approved: one with target words occurring three times for Groups 1, 3, and 5, and one with target words occurring once for Groups 2, 4, and 6. A native speaker was invited to watch the original video and read the story aloud in time with the predetermined caption speed. The program Wondershare Filmora was used to replace the original audio track (i.e. stream) with a new audio file recorded by the native speaker. Thus, the video included a single narrator providing background information. The clip with target words occurring three times ran for 28 min 59 s, whereas the clip with target words occurring once ran for 26 min 52 s. Rodgers and Webb (2017) recommended that, to obtain sufficient L2 aural input, learners view full-length episodes, such as those ranging from 22 to 42 min.
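Because the two script versions differ in how often each target word occurs, one way to confirm an edited script is to count the target words in it. The helper names below are hypothetical, and the sketch assumes exact-form matching, so an inflected form such as erupted would not count as an encounter with erupt; the paper does not say how inflections were treated.

```python
import re
from collections import Counter

# The 15 target words listed under "Target words".
TARGET_WORDS = [
    "luxurious", "visible", "enormous", "horrible", "isolated",
    "decorate", "erupt", "accompany", "twist", "identify",
    "adventure", "advantage", "oxygen", "eggshell", "obstacles",
]

def target_counts(script, targets=TARGET_WORDS):
    """Count exact, case-insensitive occurrences of each target word."""
    tokens = Counter(re.findall(r"[a-z]+", script.lower()))
    return {word: tokens[word] for word in targets}

def check_exposure(script, expected, targets=TARGET_WORDS):
    """Return the targets whose count differs from the intended frequency."""
    return {word: n for word, n in target_counts(script, targets).items()
            if n != expected}

demo = "The volcano may erupt. Scientists saw it erupt. Would it erupt again?"
print(target_counts(demo, ["erupt", "oxygen"]))      # {'erupt': 3, 'oxygen': 0}
print(check_exposure(demo, 3, ["erupt", "oxygen"]))  # {'oxygen': 0}
```

Running check_exposure with expected=3 on the three-occurrence script, and expected=1 on the one-occurrence script, would return an empty dictionary if every target word appears as intended.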
Discussion with the six English teachers indicated that the difficulty level, topic,
and image functionality of the clip were appropriate for the participants. The teach-
ers noted that the video images would support the content without being highly
explicit. Captions presented a highly synchronized, nearly verbatim text to accom-
pany the soundtrack. The caption speed was set at 90 words per minute, a reason-
able rate for children’s television programs (Tyler et al. 2009). The video clip with
words occurring three times contained approximately 2700 words, of which nearly
65% belonged to the 1000-word vocabulary frequency level with 31% in the 2000-
word vocabulary frequency level. The clip with words occurring once contained
approximately 2600 words, of which nearly 65% were at the 1000-word level while
30% were at the 2000-word level. These proportions were measured using T. Cobb’s VocabProfile (Cobb n.d.). Considering participants’ vocabulary levels, the author assumed that learners would not struggle to comprehend videos in which about 95% of the words fell within the 2000-word level (Nation and Gu 2007), allowing them to focus on extracting the meaning of the target words.
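Coverage figures such as those above are proportions of running words (tokens) that fall within each frequency band. The simplified sketch below assumes hypothetical mini band lists and exact word-form matching; VocabProfile itself uses full word-family lists, so this is not its implementation. Percentages are kept as integers in the threshold check to avoid floating-point rounding at exactly 95%.

```python
import re

def band_coverage(text, bands):
    """Share of running words (tokens) in each frequency band.

    `bands` maps a band label to a set of word forms; tokens in no band are
    counted as "off-list". Exact forms only: real profilers such as
    VocabProfile match whole word families instead.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    counts = dict.fromkeys(bands, 0)
    counts["off-list"] = 0
    for token in tokens:
        for label, words in bands.items():
            if token in words:
                counts[label] += 1
                break
        else:  # no band contained the token
            counts["off-list"] += 1
    return {label: n / len(tokens) for label, n in counts.items()}

# Toy example with hypothetical mini band lists (not the real K1/K2 lists).
toy_bands = {"K1": {"the", "cat", "saw", "dog"}, "K2": {"enormous"}}
print(band_coverage("The cat saw the enormous dog", toy_bands))

# Reported profiles of the two clips (percent of running words at each level).
REPORTED = {"three-occurrence clip": (65, 31), "one-occurrence clip": (65, 30)}
for clip, (k1, k2) in REPORTED.items():
    print(f"{clip}: K1 + K2 = {k1 + k2}% (at least 95%: {k1 + k2 >= 95})")
```

Both reported profiles reach the 95% cumulative coverage that the author treated as sufficient for comprehension.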
Video with captions
The author added full or keyword captions for the video using MAGpie.3 Full cap-
tions represented verbatim transcription of dialogues and keyword captions repre-
sented a single word (e.g. decorate) or a maximum of three consecutive words (e.g.
strong and courageous). Keywords were defined as those words in a sentence that
were essential for the learners to construct meaning. The six teachers read the scripts and highlighted keywords, which represented approximately 17.4% of the total words (e.g. 472 of the 2700 words in the three-occurrence version). To make the keyword condition comparable to the full captioning condition, target words were included among the keywords. The keywords appeared in isolation and were centred on the captioning line. Each keyword was displayed for approximately one second (Guillory 1998), lengthened or shortened according to the length and type of the keyword (e.g. a single-word item or a multiword unit).
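To illustrate how keyword-only captions differ from full captions, the sketch below writes keyword cues in the common SubRip (.srt) format. The study used MAGpie, not this code. The onset times are invented, and the rule of adding 0.5 s per extra word is a hypothetical stand-in for the unspecified adjustment around the one-second baseline.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def keyword_duration(keyword, base=1.0, per_extra_word=0.5):
    """Hypothetical display time: about 1 s, longer for multiword keywords."""
    n_words = len(keyword.split())
    if not 1 <= n_words <= 3:
        raise ValueError("keyword captions hold one to three consecutive words")
    return base + per_extra_word * (n_words - 1)

def keyword_cues(keywords):
    """Build SRT text from (onset_seconds, keyword) pairs."""
    blocks = []
    for index, (onset, keyword) in enumerate(keywords, start=1):
        end = onset + keyword_duration(keyword)
        blocks.append(f"{index}\n{srt_timestamp(onset)} --> {srt_timestamp(end)}\n{keyword}\n")
    return "\n".join(blocks)

print(keyword_cues([(12.4, "decorate"), (95.0, "strong and courageous")]))
```

A full-captioning track would instead carry every narrated sentence as a cue; the keyword track shows only isolated items such as decorate, each briefly on screen.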
Target words
Potential target words were selected from the video clip based on discussion with the
six teachers. Initially, teachers selected approximately 30 items from the scripts. A
pilot study was conducted with 40 learners who were not involved in this study but had similar vocabulary levels. For each candidate word, they were required to provide the meaning and then choose the correct L1 translation from four options. Participants’ knowledge of the test items was not pretested, to avoid a test effect in which participants might discern form-meaning links by completing a meaning recognition test. Fifteen items that were unknown to the pilot learners and occurred three times in the video were chosen: luxurious, visible, enormous, horrible, isolated, decorate, erupt, accompany, twist, identify, adventure, advantage, oxygen, eggshell, and obstacles. These items consisted of five adjectives, five verbs, and five nouns, which are common parts of speech in natural texts. The same set
of target words was used for each group.
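The balance of parts of speech among the 15 target words can be checked mechanically. The tags below are an assumption of this sketch: several words can belong to more than one word class (twist, for example, can also be a noun).

```python
from collections import Counter

# Part-of-speech tags are this sketch's reading of the list, not the author's.
TARGETS = {
    "luxurious": "adjective", "visible": "adjective", "enormous": "adjective",
    "horrible": "adjective", "isolated": "adjective",
    "decorate": "verb", "erupt": "verb", "accompany": "verb",
    "twist": "verb", "identify": "verb",
    "adventure": "noun", "advantage": "noun", "oxygen": "noun",
    "eggshell": "noun", "obstacles": "noun",
}

pos_counts = Counter(TARGETS.values())
print(dict(pos_counts))  # {'adjective': 5, 'verb': 5, 'noun': 5}
print("balanced:", len(set(pos_counts.values())) == 1)
```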
Measures
Vocabulary levels test
To ensure the equivalence of vocabulary knowledge among participants, individuals
were selected based on their scores on the Vocabulary Levels Test (VLT; Schmitt
et al. 2001). The VLT measures written receptive vocabulary knowledge at four frequency levels of English word families: 2000, 3000, 5000, and 10,000. This test has been suggested as an appropriate measure of Hong Kong primary and secondary school learners’ vocabulary size (Tang 2007). Based on a pilot study involving a group of learners with similar proficiency levels and backgrounds, the 2000-word level was deemed suitable for participants in this study; the participating teachers also confirmed the suitability of this instrument. Cronbach’s alpha for this test was 0.88, indicating a high level of reliability.
3 MAGpie (Media Access Generator) is a free online tool used to create L1 Chinese or L2 English subtitles. More information is available at http://ncam.wgbh.org/invent_build/web_multimedia/tools-guidelines/magpie.
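The reliability coefficients reported in this study (0.88 for the VLT, 0.81 to 0.85 for the post-test parts) are Cronbach’s alpha values. The minimal sketch below implements the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), applied to an invented matrix of 0/1 scores.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used for both terms; sample variances would
    give the same ratio.
    """
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: four learners x three dichotomously scored (0/1) items.
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(demo), 2))  # 0.75
```

Alpha rises as items covary more strongly relative to their individual variances. Perfectly consistent items give 1.0.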
Post-test to measure vocabulary performance
Participants’ learning gains were measured using a computerized vocabulary test
developed by the author. This test consisted of three parts. The first part measured
receptive form recognition in which the learners were required to select “yes” if the
target word appeared in the clip or “no” if it did not. If learners selected “yes”, the
computer automatically proceeded to the next part, which measured meaning recall.
If learners selected “no”, the computer instead proceeded to the next word. The
meaning recall test was a productive test in which learners were required to provide
the L1 translation of a given target item. Following this step, learners were directed to a multiple-choice word meaning recognition test, in which they were required to select the correct response from four Chinese translations (e.g. luxurious: A. 漂亮的 ‘beautiful’, B. 可愛的 ‘cute’, C. 奢侈的 ‘luxurious’, or D. 真實的 ‘real’). The sequence of test sections was designed to prevent test effects: because meaning recall preceded meaning recognition, the translation options could not cue learners’ recall responses. The multiple test parts were intended to track small, incremental gains in word knowledge as learners build a form-meaning link. According to Nation (2001), vocabulary knowledge spans word form, meaning, and use. However, at the beginning of the incremental learning process, measuring the form-meaning link was deemed most appropriate (Schmitt 2010). Thus, various vocabulary measures were used in this study to examine different stages of learning word form and meaning. To keep learners from focusing solely on the target words, another 15 words, randomly selected from the first 1,000-word list, were interspersed at random among the target words. Cronbach’s alpha for the test parts ranged from 0.81 to 0.85, indicating acceptable reliability.
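The branching order of the computerized post-test (form recognition, then meaning recall, then meaning recognition, with a “no” answer skipping to the next word) can be sketched as follows. This is a hypothetical Python illustration, not the author’s test software: the class and function names are invented, scoring the skipped parts as 0 after a “no” response is an assumption, and a set of accepted translations stands in for the human raters who actually judged meaning recall answers.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ItemScore:
    """Dichotomous (0/1) scores for one target word on the three test parts."""
    form_recognition: int
    meaning_recall: int
    meaning_recognition: int


def score_target_word(said_yes: bool,
                      recall_answer: Optional[str],
                      accepted_recalls: Set[str],
                      chosen_option: Optional[str],
                      correct_option: str) -> ItemScore:
    """Score one target word in the order form -> recall -> recognition."""
    # Part 1, form recognition: every target word occurred in the clip,
    # so "yes" is the correct response.
    if not said_yes:
        # A "no" skips straight to the next word; the two meaning parts are
        # never shown, so they are assumed here to score 0.
        return ItemScore(0, 0, 0)
    # Part 2, meaning recall: a productive L1 translation. A set of accepted
    # answers stands in for the raters' judgement (hypothetical).
    recall = int(recall_answer is not None
                 and recall_answer.strip() in accepted_recalls)
    # Part 3, meaning recognition: one of four L1 options (A-D).
    recognition = int(chosen_option == correct_option)
    return ItemScore(1, recall, recognition)


# Hypothetical responses for "luxurious" (correct option C, 奢侈的).
print(score_target_word(True, "奢侈的", {"奢侈的", "奢侈"}, "C", "C"))
# ItemScore(form_recognition=1, meaning_recall=1, meaning_recognition=1)
print(score_target_word(False, None, {"奢侈的"}, None, "C"))
# ItemScore(form_recognition=0, meaning_recall=0, meaning_recognition=0)
```

Summing each field over the 15 target words gives a learner’s score out of 15 on each test part.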
Scoring
All test parts were scored dichotomously, awarding one point for a correct answer and zero for an incorrect answer. Two raters who did not teach any of the six groups rated the tests independently. Answers on which the two raters could not reach a consensus were examined by a third rater, and the final decision was based on majority opinion. Interrater reliability for meaning recall was 0.98, and full consensus was reached for the form recognition and meaning recognition tests.
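The adjudication rule described here (two independent raters, with a third rater and a majority decision for disagreements) can be expressed as a small function. This is a hypothetical sketch. The paper does not state which interrater reliability coefficient produced the 0.98 figure, so simple percent agreement is shown only as one possible index; chance-corrected indices such as Cohen’s kappa are common alternatives.

```python
from typing import List, Optional


def resolve_score(rater1: int, rater2: int, rater3: Optional[int] = None) -> int:
    """Final 0/1 score for one answer rated independently by two raters.

    If the two raters agree, their shared score stands. Otherwise a third
    rating is required and the majority of the three ratings decides.
    """
    if rater1 == rater2:
        return rater1
    if rater3 is None:
        raise ValueError("raters disagree; a third rating is required")
    return 1 if rater1 + rater2 + rater3 >= 2 else 0


def percent_agreement(ratings1: List[int], ratings2: List[int]) -> float:
    """Share of answers on which two raters gave the same score."""
    if len(ratings1) != len(ratings2) or not ratings1:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(a == b for a, b in zip(ratings1, ratings2)) / len(ratings1)


# Hypothetical ratings of ten meaning recall answers (not study data).
r1 = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
r2 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(percent_agreement(r1, r2))   # 0.9
print(resolve_score(1, 0, rater3=1))  # 1
```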
Procedures
All students were asked to watch the video and focus on its content without being
aware of the main objective of the experiment (i.e. to assess the effect of caption-
ing types and the frequency of word occurrences). The vocabulary test was not
announced; therefore, this study could be considered an incidental vocabulary learning study (Hulstijn 2013). During the learning session, each student worked individually on a personal computer and wore a headset. The selected video clip was available via a hyperlink, and students were allowed to watch it only once. The author and one teacher ensured that the participants followed the procedure precisely and had no trouble watching the videos. Learners proceeded to the test immediately after they finished watching the video. Thirty minutes were allotted for the test, as suggested by the pilot study. The experiment took approximately an hour to complete.
Data analysis
Multivariate analysis of variance (MANOVA) was employed to test the overall effects of captioning condition and word exposure frequency, and the interaction between these two variables, on the three aspects of word knowledge. A one-way ANOVA with repeated measures was performed to compare the effects of the three caption types in each frequency band. A least-squares means procedure, a post hoc method for comparing conditions, was performed to compare the frequency effects (once vs. three times) in each captioning condition. Effect sizes were calculated using η², with 0.01 regarded as a small effect, 0.06 as a medium effect, and 0.14 as a large effect (Cohen 1988). The significance level was set at 0.05.
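As a hedged illustration of these effect-size benchmarks, the sketch below computes the classical one-way eta squared (SS_between / SS_total) on toy data and labels a value with the thresholds cited from Cohen (1988). The study’s η² values come from a MANOVA, and the paper does not specify which η² variant (for example, partial η²) was reported, so this shows the general idea rather than the study’s exact computation. Function names and data are hypothetical.

```python
def eta_squared(groups):
    """Classical eta squared for a one-way design: SS_between / SS_total."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total


def cohen_label(eta2: float) -> str:
    """Label an eta-squared value using the 0.01 / 0.06 / 0.14 benchmarks."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "below small"


# Hypothetical scores for three groups (not data from this study).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(eta_squared(toy_groups))  # 0.5
print(cohen_label(0.07))        # medium
```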
Results
Table 2 presents the descriptive statistics for the three vocabulary post-test parts. Within the ‘one word occurrence’ condition, full captioning appeared to lead to a better outcome than keyword captioning and no captions. Within the ‘three word occurrences’ condition, full captioning also led to better performance than keyword captioning and no captions. Within each captioning condition, three word occurrences also yielded better performance than a single occurrence. Overall, the group exposed to full captioning with target words occurring three times demonstrated the best performance among the groups (form recognition: 13.14; meaning recall: 7.01; meaning recognition: 10.15).

Table 2 Descriptive statistics of vocabulary test parts
Word exposure frequency | Type of captioning | n | Form recognition M (SD) | Meaning recall M (SD) | Meaning recognition M (SD)
One occurrence | Full captioning | 42 | 11.78 (3.02) | 4.57 (3.06) | 7.91 (2.98)
One occurrence | Keyword captioning | 43 | 7.08 (3.21) | 2.01 (3.08) | 5.71 (2.92)
One occurrence | No captions | 41 | 4.89 (3.01) | 0.14 (3.12) | 3.02 (2.93)
Three occurrences | Full captioning | 46 | 13.14 (2.99) | 7.01 (3.07) | 10.15 (2.96)
Three occurrences | Keyword captioning | 43 | 10.12 (3.02) | 4.89 (3.05) | 7.05 (3.08)
Three occurrences | No captions | 42 | 7.02 (3.06) | 2.45 (3.08) | 4.56 (2.97)
The maximum score for each test part is 15 points
The first research question addressed the extent to which incidental learning of new words differed among the three captioning types (full captioning, keyword captioning, and no captions) within each word exposure frequency condition. A one-way repeated measures ANOVA was performed (Table 3). The F values and p values (p < 0.05) in Table 3 revealed significant differences in learners’ vocabulary performance across the three captioning conditions. Full captioning yielded the highest mean scores (Table 2), and this pattern was consistent across the three word knowledge dimensions.
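For readers less familiar with the F ratios in Table 3, a minimal pure-Python one-way ANOVA for independent groups is sketched below on hypothetical data. The degrees of freedom follow (k - 1, N - k); for the group sizes in Table 2 (42 + 43 + 41 = 126 learners with one occurrence, and 46 + 43 + 42 = 131 with three), this gives (2, 123) and (2, 128), the values reported in Table 3. The function name is hypothetical.

```python
def one_way_anova(groups):
    """F ratio and degrees of freedom for k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three small groups (not data from this study).
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)

# Degrees of freedom (k - 1, N - k) for the group sizes in Table 2.
for sizes in ([42, 43, 41], [46, 43, 42]):
    print(len(sizes) - 1, sum(sizes) - len(sizes))  # 2 123, then 2 128
```

In practice, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p value.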
The second research question addressed the extent to which the two word expo-
sure frequency conditions (one and three occurrences) differed in terms of learning
new words under identical captioning conditions. The present study compared frequency effects in each captioning condition through the least-squares means procedure. The findings in Table 4 indicate that the three-occurrence condition yielded
significantly better results than the one-occurrence condition (p < 0.05), implying
that encountering words thrice was more beneficial to participants irrespective of
the captioning conditions.
The third research question investigated how all six experimental conditions (combinations of word occurrence frequency and captioning condition) contrasted with each other. MANOVA was performed after its prerequisites were verified. Because MANOVA is most appropriate when the dependent variables are moderately correlated, correlations between the three test parts were checked first. The three test parts were significantly correlated: the form recognition test correlated significantly with the meaning recall test (r = 0.49, p < 0.05) and the meaning recognition test (r = 0.48, p < 0.05), and the meaning recall test correlated significantly with the meaning recognition test (r = 0.41, p < 0.05). These significant correlations indicated that multivariate analysis was suitable, so a two-way MANOVA was conducted. The results showed a significant main effect of type of captioning on the three test parts [Wilks’ lambda F(3, 254) = 19.20, p < 0.05, η² = 0.08]. The analysis also revealed a significant main effect of word exposure frequency on the three test parts [Wilks’ lambda F(3, 254) = 18.20, p < 0.05, η² = 0.07]. No significant interaction effect was detected between type of captioning and word exposure frequency [Wilks’ lambda F(3, 254) = 18.26, p > 0.05]. The η² values (ranging from 0.07 to 0.09) indicated medium effects (Cohen 1988), suggesting a moderate, practically meaningful effect of captioning type and word exposure frequency on the three test parts (Table 5).

Table 3 Types of captioning effect per word exposure frequency band (F-value, with df in parentheses)
Word exposure frequency (conditions compared) | Meaning recall | Meaning recognition | Form recognition
One occurrence: full captioning, keyword captioning, and no captions | 11.13 (2, 123)* | 10.21 (2, 123)* | 11.14 (2, 123)*
Three occurrences: full captioning, keyword captioning, and no captions | 10.78 (2, 128)* | 10.15 (2, 128)* | 12.25 (2, 128)*
*p < 0.05

Table 4 Comparison of word exposure frequency effect per captioning condition (F-value)
Comparison | Meaning recall | Meaning recognition | Form recognition
Full captioning: 1 occurrence versus 3 occurrences | 10.01* | 10.24* | 11.31*
Keyword captioning: 1 occurrence versus 3 occurrences | 10.14* | 12.15* | 11.51*
No caption: 1 occurrence versus 3 occurrences | 11.25* | 12.22* | 11.36*
*p < 0.05

Table 5 Results of MANOVA on vocabulary learning
Test | Source | df | p | η²
Form recognition | Type of captioning | 2 | < 0.05* | 0.07
Form recognition | Word exposure frequency | 1 | < 0.05* | 0.08
Form recognition | Type of captioning × word exposure frequency | 2 | > 0.05 | –
Word meaning recall | Type of captioning | 2 | < 0.05* | 0.08
Word meaning recall | Word exposure frequency | 1 | < 0.05* | 0.07
Word meaning recall | Type of captioning × word exposure frequency | 2 | > 0.05 | –
Word meaning recognition | Type of captioning | 2 | < 0.05* | 0.08
Word meaning recognition | Word exposure frequency | 1 | < 0.05* | 0.09
Word meaning recognition | Type of captioning × word exposure frequency | 2 | > 0.05 | –
Error | | 254 | |
*p < 0.05

Table 6 Post hoc Tukey mean comparisons of captioning and ‘number of encounters’ combinations
Test type | Condition | FC + 1 time | KC + 1 time | NC + 1 time
Meaning recall | FC + 3 times | 6.81* | 7.35* | 10.85**
Meaning recall | KC + 3 times | 1.75 | 8.15* | 9.12*
Meaning recall | NC + 3 times | 1.15 | 1.26 | 6.18*
Meaning recognition | FC + 3 times | 8.21* | 7.01* | 11.89**
Meaning recognition | KC + 3 times | 1.85 | 7.18* | 8.20*
Meaning recognition | NC + 3 times | 2.01 | 2.05 | 7.11*
Form recognition | FC + 3 times | 7.85* | 6.18* | 12.21**
Form recognition | KC + 3 times | 1.62 | 6.78* | 7.12*
Form recognition | NC + 3 times | 2.11 | 2.18 | 6.55*
FC full captioning, KC keyword captioning, NC no captioning
*p < 0.05, **p < 0.001
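The screening step that preceded the MANOVA, checking that the three test parts are correlated, amounts to computing Pearson’s r for each pair of dependent variables. A minimal sketch follows. The per-learner scores and function names are hypothetical, and the paper does not describe the software it used; `scipy.stats.pearsonr` would additionally return a p value for each pair.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def pairwise_correlations(parts):
    """r for every pair of test parts (dict of name -> per-learner scores)."""
    return {(a, b): pearson_r(parts[a], parts[b])
            for a, b in combinations(parts, 2)}


# Hypothetical per-learner totals out of 15 (not data from this study).
parts = {
    "form_recognition": [13, 10, 7, 12, 4, 9],
    "meaning_recall": [7, 5, 2, 6, 0, 3],
    "meaning_recognition": [10, 7, 4, 9, 3, 8],
}
for pair, r in pairwise_correlations(parts).items():
    print(pair, round(r, 2))
```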
Regarding the post hoc Tukey’s tests, Table 6 shows the differences in mean
scores between all possible pairs of experimental conditions. Irrespective of test
types, learners who encountered words thrice in the full captioning condition signifi-
cantly outperformed those who encountered words only once in the full captioning
(p < 0.05), keyword captioning (p < 0.05), and no captions conditions (p < 0.001).
However, encountering words thrice in the keyword captioning condition was not
found to be more effective than encountering them once in the full captioning condi-
tion (p > 0.05) but was determined to be more effective than encountering the words
once in the keyword captioning (p < 0.05) and no captions conditions (p < 0.05).
Likewise, encountering words thrice in the no captions condition was not observed
to be more effective than encountering words once in the full captioning (p > 0.05)
and keyword captioning conditions (p > 0.05) but was found to be more effective
than encountering the words once in the no captions condition (p < 0.05). These pat-
terns suggest that type of captioning and word exposure frequency each exerted a
pronounced effect on the learning of new words. Overall, the combination of three
encounters and the full captioning condition yielded the best results, which were
significantly better than any of the other five combinations. This finding remained
consistent across all dimensions of the vocabulary test.
Discussion
The present study explored incidental vocabulary learning among primary school ESL students under conditions defined by two variables and their combinations. In total, six experimental combinations (two word exposure frequency conditions × three captioning types) were examined. Learners encountered the same 15 target words in each combination, and their learning was assessed through a computerized post-test administered immediately after the intervention. The post-test measured three dimensions of vocabulary knowledge: word form recognition, word meaning recall, and word meaning recognition.
The results demonstrated that, when provided with target-language video material, primary school ESL students learned the most vocabulary incidentally from fully captioned videos, followed by keyword captioned videos and non-captioned videos. In line with a study by Montero Perez et al. (2013), full captioning produced greater gains in language learning than keyword captioning and no captioning. This finding corroborates previous findings regarding the benefits of full captioning for vocabulary learning (e.g. Montero Perez et al. 2018; Peters et al. 2016). Enhanced vocabulary learning through fully captioned videos is plausible for primary learners because they mostly learn L2 vocabulary in a classroom setting where they are heavily dependent on printed materials, unlike native speakers, who can more easily learn vocabulary in informal settings outside the classroom. Captioning, by pairing visual information (i.e. pictures and words) with auditory stimuli, possibly helped the primary school students connect the spoken word with the printed word while drawing upon their background knowledge, vocabulary, and comprehension strategies (Vanderplank 2016).
Learners’ mean scores in the keyword captioning group were lower than those in the full captioning group. These results are consistent with most previous studies, which have generally acknowledged the benefits of full captions. Similar to previous research (e.g. Guillory 1998; Montero Perez et al. 2013), learners in the keyword captioning condition demonstrated a weaker form-meaning link. However, this result contradicts findings reported by Montero Perez et al. (2018), which suggested that keyword captioning with access to meaning may enhance the quality of students’ focus and result in better incidental vocabulary learning than full captioning. Montero Perez et al. (2018) provided glosses in the keyword captioning condition, which may have led to a higher uptake rate for the form-meaning link. One explanation for the stronger effect of full captioning in the present study may be that full captions gave learners more information with which to process the linguistic message. Full captioning plausibly provided more semantic and syntactic context clues from which students could derive meaning. In contrast, the keyword captions, which represented only 17.4% of the total script, may not have provided enough information. Further studies of keyword captions that cover a higher percentage of the words in the script may be needed to substantiate this explanation.
The benefits of fully captioned videos in incidental vocabulary learning could be explained through Baddeley’s (1986) working memory model, in which tasks in two separate perceptual domains (i.e. a visual task and a verbal task) do not interfere with each other. The coordination of verbal associations and visual imagery is assumed to be governed by the central executive, a supervisory system that controls the flow of information, and the episodic buffer, a limited-capacity system that provides temporary storage of information (Baddeley 2000). In other words, while learners receive imagery information, they may be able to derive verbal information from the auditory channel, supporting a dual-modal presentation technique (Xu et al. 2008). These results can also be explained through Paivio’s (1986) dual-coding theory. According to Paivio, perceptual associations with image coding and semantic associations with verbal coding reinforce connections between the dual-modal representations. In other words, coding a stimulus (in this case, vocabulary) in two distinct ways promotes more effective recall than coding it through either representation alone.
However, these models do not address how the form-meaning link of new words is retained and integrated into memory. Two forms of mental representation, verbal and pictorial, interact with learners’ prior knowledge and make more working memory capacity available for processing concrete information related to new words. Given these results and insights, the present study proposes a new model (Fig. 2) to delineate how the dual-modal presentation technique enhances the form-meaning link of new words learned from captioned videos. This model assumes that captioned videos may provide opportunities for learners to cognitively coordinate the two perceptual domains (i.e. auditory and visual components), because the information presented in either modality alone is insufficient for meaning-making. The dual-modal representation (integrating the verbal mode from the ears and the pictorial mode from the eyes), while interacting with prior knowledge, may help learners’ working memory establish a form-meaning link for new words. Learners’ working memory, functioning like a central executive, coordinates the information retrieved from the two storage subsystems (the visuo-spatial sketchpad and the phonological loop), and this coordination facilitates a form-meaning link for new words.
Fig. 2 Dual modal representation in facilitating a form-meaning link of new words

In terms of the effectiveness of word occurrence frequency in each caption condition, participants demonstrated greater gains in learning new words when target words were encountered more often. This observation conforms to previous studies (e.g. Peters et al. 2016; Teng 2016a, b; Webb 2007), which suggested that repeated encounters with unknown words increase the likelihood of noticing these items. Repeated encounters with target words may help learners attend to and notice linguistic features of input (Schmidt 2001). Incidental vocabulary learning is largely a by-product of linguistic processing, yet it still requires focused attention. Word exposure frequency appears to help learners focus on word form (i.e. pronunciation and spelling) along with the available cues in audio–visual input that can lead to meaning identification, encouraging more effective incidental vocabulary learning. Additionally, the frequency effect produced substantial improvements in incidental vocabulary learning irrespective of the captioning type, contributing to knowledge in this line of research. These findings do not reveal the exact number of encounters necessary for incidental vocabulary learning; however, this parameter should be considered when developing videos for primary school students, because the likelihood of learning a word increased with the frequency of word encounters while watching a video.
The combination of full captioning and three encounters with a word greatly enhanced incidental word learning. For example, in the post-test, learners recognized on average 13.14 out of 15 (87.6%) word forms. They also recognized word meaning (10.15 out of 15; 67.7%) and recalled word meaning (7.01 out of 15; 46.7%). These outcomes suggest that students acquired a notable degree of vocabulary knowledge, especially in recognizing form-meaning links. Their performance is encouraging, given that this study was based only on incidental learning by young learners, measured with a post-test administered after a single 25-min video. In a recent study (Montero Perez et al. 2018), learners recognized only 10.47 out of 18 (58.2%) word forms in the full captioning condition; in the present study, adding repeated word exposure to full captioning further improved incidental word learning. Given these positive results, the combination of full captioning and three encounters could be argued to have reduced learners’ decoding load and given them more time to allocate attention to interpretation (e.g. Markham et al. 2001; Pulido 2007; Rodgers and Webb 2017), or to have reduced the cognitive workload required for comprehension (Peters et al. 2016), thus helping them to establish a form-meaning link (Winke et al. 2010).
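The percentages in this discussion are mean scores divided by the maximum score for each test part (15 points here; 18 in Montero Perez et al. 2018), rounded to one decimal place. The short sketch below reproduces that arithmetic for the full captioning, three-encounter condition; the function name is hypothetical.

```python
MAX_SCORE = 15  # each post-test part covered the 15 target words


def percent_of_max(mean_score: float, max_score: int = MAX_SCORE) -> float:
    """Express a mean test score as a percentage of the maximum, 1 decimal."""
    return round(100 * mean_score / max_score, 1)


# Means for full captioning with three encounters (Table 2).
for part, mean in [("form recognition", 13.14),
                   ("meaning recognition", 10.15),
                   ("meaning recall", 7.01)]:
    print(f"{part}: {mean} / {MAX_SCORE} = {percent_of_max(mean)}%")
# form recognition: 13.14 / 15 = 87.6%
# meaning recognition: 10.15 / 15 = 67.7%
# meaning recall: 7.01 / 15 = 46.7%

# Montero Perez et al. (2018): 10.47 of 18 word forms recognized.
print(percent_of_max(10.47, 18))  # 58.2
```

The same function applied to the one-encounter meaning recall means (4.57, 2.01, and 0.14) gives 30.5%, 13.4%, and 0.9%.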
Although the provision of on-screen text encouraged primary school ESL students to notice word form, they may still have had trouble linking the meaning of the target item to its L2 aural form in the verbal stream. For example, learners in the present study who encountered a word once could recall the meaning of only 4.57, 2.01, and 0.14 words out of 15 in the full captioning, keyword captioning, and no-captioning conditions, respectively. These meaning recall rates were disappointingly low: only 30.5% (full captioning), 13.4% (keyword captioning), and 0.9% (no captioning). Providing different kinds of information through captioning possibly enhanced the students’ incidental learning of word forms but overwhelmed their capacity to recall word meaning. As Barcroft (2002) asserted, when learners with limited cognitive resources focus on one aspect (e.g. form), they may devote less attention to other features (e.g. meaning). Hence, more automatized aspects of word knowledge (e.g. form recognition) require fewer resources, leaving more available for processing. By contrast, learners may be unable to process sufficient information for less automatic aspects of word knowledge (e.g. meaning recall). The development of the ability to recall word meaning may not always be linear, in the sense that knowledge of word meaning does not necessarily become more complete and precise with every new exposure. Consistent with Peters et al. (2016), vocabulary learning appeared to be an incremental process in which words must be encountered and retrieved repeatedly before they can be firmly entrenched in learners’ mental lexicon. Thus, expectations of what and how much word meaning can be learned through audio–visual input during a brief intervention for primary school ESL learners should be tempered, given the challenging processing demands.
The disappointing results of meaning recall provide insights into reframing
Schmidt’s (1990, 2001) Noticing Hypothesis. Although captioned videos offered
comprehensible input (Krashen 1985), tracking genuine intake from captioned input
to recalling word meaning in a decontextualized scenario still presented a marked
challenge for primary school ESL learners. Simply having access to comprehensible
input does not mean that extraction and recall of word meaning will occur seam-
lessly (Qin and Teng 2017; Teng 2018b). The actions of noticing and attending are
fleeting, situated at the bottom of the affective pyramid in cognitive and affective
taxonomy, lower than “responding” and “valuing” and very far from internalizing
(Vanderplank 2016, p. 243). Learners need time to develop their own conscious and
critical faculties to draw word meaning from captioned videos and build it into their
own competencies. Attention or noticing is not a single mechanism but includes
various sub-mechanisms or subsystems, including alertness, orientation, detection
within selective attention, facilitation, and inhibition (Schmidt 2001). Bearing this in
mind, while learners could learn vocabulary to some extent, vocabulary acquisition
while watching captioned videos requires subsequent reinforcement.
Concluding remarks
Overall, this article proposes that ESL students’ vocabulary learning can be
improved by using captioned videos. This finding is applicable to other ESL con-
texts, regardless of class size, given that beginning learners benefit from repeated
exposure to unknown vocabulary during literacy-learning task execution. Although a common goal in literacy learning, English curriculum, and TESOL (teaching English to speakers of other languages) work is to improve English vocabulary outcomes, ESL contexts may also face the challenge of large numbers of students with weak English proficiency and an overreliance on print literacy
activities. The findings of the present study indicate that primary school ESL learn-
ers can indeed learn new words by watching a video in English with full captions. In
addition, the frequency of word occurrence may also augment incidental vocabulary
learning. When words were encountered three times in the full captioning condition,
learners demonstrated better learning outcomes than in the other five conditions.
Nevertheless, the number of words learned in the present study was modest, particu-
larly regarding word meaning recall. Progressing from no knowledge of an item to
form recognition, meaning recognition, and meaning recall appeared to remain chal-
lenging for primary school ESL learners.
The results of the present study provide a series of pedagogical implications for
teaching and learning ESL in a global context. First, findings highlight audio–visual
input (namely captioned videos) as a rich language resource for ESL teaching and
learning. This discovery is useful considering that ESL teaching practices are tradi-
tionally dependent on written input. In today’s society, the development of Internet
technology and emerging importance of DVDs, YouTube, ViewPoint, and mobile
phone apps have made videos readily accessible, thus offering opportunities for ESL students to strengthen knowledge of previously learned words and to stimulate the noticing of new words. The true value of audio–visual input corresponds to Webb’s
(2015) plea for an extensive viewing program, i.e. routine, silent, uninterrupted
viewing of L2 television, in out-of-class contexts. Within and beyond classroom
practices, ESL teachers can use captioned videos as a cognitive counterweight to
the affective pull of well-constructed programmes designed for entertainment, easy
viewing, and vocabulary literacy development. Future studies may consider using
captioned videos to examine the literacy development of students from other back-
grounds and age groups. Second, as repeated encounters with target words can have
a significant effect on the incidental learning of new lexical items, teachers should
pay special attention to the number of times new words occur in a video so they will
be encountered and retrieved repeatedly to promote assimilation into learners’ men-
tal lexicon, similar to the proposal that vocabulary learning is incremental (Schmitt
2010). If possible, difficult words should be highlighted several times when using captioned videos with ESL learners. The increasing availability of software such as Adobe Premiere, MAGpie, and iMovie facilitates captioning for language teaching. Because combining L2 captioned videos with repeated word exposure can lead to better vocabulary learning, teachers should incorporate such exercises into the design of courses and learning materials. Future studies could use such software to edit captioned videos so that target words appear more than three times; this repetition would allow a fuller test of the effects of word exposure frequency while watching captioned videos. Finally, the drop in performance on word meaning recall should be considered. Although word recognition is relatively sensitive to increased word encounters while watching L2 captioned videos, word meaning recall remains a difficult dimension to learn. When designing conditions for vocabulary learning, researchers and ESL teachers need to consider learners’ varying trajectories in learning the different components of a word.
This study is not without limitations. First, captions in the present study were
presented in the target language only. Subtitles in the L1, rather than captions in the L2, might yield different results (Peters et al. 2016). Future studies should examine whether multilingual captions promote incidental vocabulary learning. Second, participants in
the present study were not informed that a test would be administered. The advance
announcement of a test might encourage learners to work harder to process informa-
tion. The potential effects of test announcement should be explored in future studies
that compare incidental and intentional learning in different captioning conditions.
Third, previous studies measuring the effects of captions on L2 learning used different types of videos, such as television reports (Montero Perez et al. 2013), television programs (Rodgers and Webb 2017), and documentary videos (Markham et al. 2001), whereas the present study used only storytelling videos for young learners. Further
investigations are essential to determining the most effective types of audio–visual
input for promoting incidental vocabulary learning. Fourth, performance differences between boys and girls, and between learners with larger and smaller vocabulary sizes, were not analysed; only learners’ vocabulary proficiency level was considered as a criterion for participant selection. Participants’ overall English proficiency levels might need to be tested as well. Fifth, many studies investigating the frequency of word exposure in incidental vocabulary learning have shown that L2 learners need at least 6–20 exposures to learn a word. However, this study included only videos with three or fewer target word occurrences, presenting an opportunity for future studies to investigate captioned videos involving more than three target word encounters. Finally, future studies may wish to employ eye-tracking technology to determine students’ attention allocation and facilitate objective analysis of caption effectiveness.
Despite these limitations, the present study is the first to investigate different
combinations of captioning types and word exposure frequency in ESL learning. A
brief intervention combining three encounters with target words and a full-captioning condition resulted in positive incidental vocabulary learning among
primary school ESL students. Therefore, this study should inspire further research
on incidental vocabulary learning through audio–visual input.
References
Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.
Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive
Sciences, 4(11), 417–423.
Barcroft, J. (2002). Semantic and structural elaboration in L2 lexical acquisition. Language Learning,
52(2), 323–363.
Chan, E., & Unsworth, L. (2011). Image–language interaction in online reading environments: Chal-
lenges for students’ reading comprehension. Australian Educational Researcher, 38(2), 181–202.
Chen, X. N., & Teng, F. (2017). Assessing the effects of word exposure frequency on incidental vocabu-
lary acquisition from reading and listening. Chinese Journal of Applied Linguistics, 40, 35–52.
Cobb, T. (n.d.). Web Vocabprofile. Retrieved January 2018 from http://www.lextutor.ca/vp/, an adaptation
of Heatley, Nation & Coxhead’s (2002) Range.
Cohen, J. (1988). Statistical power analysis for the behavioural sciences (2nd ed.). New York: Routledge.
Danan, M. (2004). Captioning and subtitling: Undervalued language learning strategies. Meta: Transla-
tors’ Journal, 49, 67–77.
Fletcher, J. D., & Tobias, S. (2005). The multimedia principle. In R. Mayer (Ed.), The Cambridge hand-
book of multimedia learning (pp. 117–133). Cambridge: Cambridge University Press.
Guillory, H. G. (1998). The effects of keyword captions to authentic French video on learner comprehen-
sion. CALICO Journal, 15, 89–108.
Hulstijn, J. H. (2001). Intentional and incidental second language vocabulary learning: A reappraisal
of elaboration, rehearsal and automaticity. In P. Robinson (Ed.), Cognition and second language
instruction (pp. 258–286). Cambridge: Cambridge University Press.
Hulstijn, J. H. (2013). Incidental learning in second language acquisition. In C. A. Chapelle (Ed.), The
encyclopaedia of applied linguistics (pp. 2632–2640). Chichester: Wiley-Blackwell.
Koolstra, C. M., & Beentjes, J. W. (1999). Children’s vocabulary acquisition in a foreign language through watching subtitled television programs at home. Educational Technology Research and Development, 47, 51–60.
Krashen, S. D. (1985). The Input Hypothesis: Issues and implications. New York: Longman.
Markham, P. L., Peter, L. A., & McCarthy, T. J. (2001). The effects of native language vs. target language captions on foreign language students’ DVD video comprehension. Foreign Language Annals, 34(5), 439–445.
Montero Perez, M., Peters, E., Clarebout, G., & Desmet, P. (2014). Effects of captioning on video comprehension and incidental vocabulary learning. Language Learning & Technology, 18, 118–141.
Montero Perez, M. M., Peters, E., & Desmet, P. (2013). Is less more? Effectiveness and perceived usefulness of keyword and full captioned video for L2 listening comprehension. ReCALL, 26, 21–43.
Montero Perez, M. M., Peters, E., & Desmet, P. (2018). Vocabulary learning through viewing video: The effect of two enhancement techniques. Computer Assisted Language Learning, 31, 1–26.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I. S. P., & Gu, P. Y. (2007). Focus on vocabulary. Sydney: National Centre for English Language Teaching and Research.
Neuman, S. B., & Koskinen, P. (1992). Captioned television as comprehensible input: Effects of incidental word learning from context for language minority students. Reading Research Quarterly, 27, 95–106.
Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.
Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading. Studies in Second Language Acquisition, 38, 97–130.
Peters, E., Heynen, E., & Puimège, E. (2016). Learning vocabulary through audiovisual input: The differential effect of L1 subtitles and captions. System, 63, 134–148.
Pulido, D. (2007). The relationship between text comprehension and second language incidental vocabulary acquisition: A matter of topic familiarity? Language Learning, 57, 155–199.
Qin, C., & Teng, F. (2017). Assessing the correlation between task-induced involvement load, word learning, and learners’ regulatory ability. Chinese Journal of Applied Linguistics, 40(3), 261–280.
Reynolds, B. L. (2015). The effects of word form variation and frequency on second language incidental vocabulary acquisition through reading. Applied Linguistics Review, 6(4), 467–497.
Reynolds, B. L., & Teng, F. (2018). Vocabulary bridge-building: A book review of Norbert Schmitt (2010), I. S. Paul Nation & Stuart Webb (2011), and Paul Meara & Imma Miralpeix (2016). Applied Linguistics. https://doi.org/10.1093/applin/amy021.
Reynolds, B. L., & Wible, D. (2014). Frequency in incidental vocabulary acquisition research: An undefined concept and some consequences. TESOL Quarterly, 48(4), 843–861.
Rodgers, M. P. H. (2013). English language learning through viewing television: An investigation of comprehension, incidental vocabulary acquisition, lexical coverage, attitudes, and captions. Unpublished Ph.D. thesis. Victoria University of Wellington.
Rodgers, M. P. H., & Webb, S. (2017). The effects of captions on EFL learners’ comprehension of English-language television programs. CALICO Journal, 34, 20–38.
Rott, S. (2007). The effect of frequency of input-enhancements on word learning and text comprehension. Language Learning, 57, 165–199.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Basingstoke: Palgrave Macmillan.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behavior of two new versions of the vocabulary levels test. Language Testing, 18, 55–88.
Tang, E. (2007). An exploratory study of the English vocabulary size of Hong Kong primary and junior secondary school students. The Journal of Asia TEFL, 4, 125–144.
Teng, F. (2016a). Incidental vocabulary acquisition from reading-only and reading-while-listening: A multi-dimensional approach. Innovation in Language Learning and Teaching. https://doi.org/10.1080/17501229.2016.1203328.
Teng, F. (2016b). The effects of context and word exposure frequency on incidental vocabulary acquisition and retention through reading. The Language Learning Journal. https://doi.org/10.1080/09571736.2016.1244217.
Teng, F. (2017). Flipping the classroom and tertiary level EFL students’ academic performance and satisfaction. The Journal of Asia TEFL, 14(4), 605–620.
Teng, F. (2018a). A learner-based approach of applying online reading to improve learner autonomy and lexical knowledge. Spanish Journal of Applied Linguistics, 31, 104–134.
Teng, F. (2018b). The effect of focus on form and focus on forms instruction on the acquisition of phrasal verbs by Chinese students. Asian EFL Journal, 20(2), 136–164.
Tyler, M. D., Jones, C., Grebennikov, L., Leigh, G., Noble, W., & Burnham, D. (2009). Effect of caption rate on the comprehension of educational television programmes by deaf school students. Deafness & Education International, 11(3), 152–162.
Van Zeeland, H., & Schmitt, N. (2013). Incidental vocabulary acquisition through L2 listening: A dimensions approach. System, 41, 609–624.
Vanderplank, R. (2016). ‘Effects of’ and ‘effects with’ captions: How exactly does watching a TV programme with same-language subtitles make a difference to language learners? Language Teaching, 49(2), 235–250.
Vidal, K. (2011). A comparison of the effects of reading and listening on incidental vocabulary acquisition. Language Learning, 61, 219–258.
Webb, S. (2007). The effects of repetition on vocabulary knowledge. Applied Linguistics, 28, 46–65.
Webb, S. (2008). The effects of context on incidental vocabulary learning. Reading in a Foreign Language, 20(2), 232–245.
Webb, S. (2015). Extensive viewing: Language learning through watching television. In D. Nunan & J. C. Richards (Eds.), Language learning beyond the classroom (pp. 159–168). New York/London: Routledge.
Webb, S., & Chang, A. C.-S. (2015). Second language vocabulary learning through extensive reading: How does frequency and distribution of occurrence affect learning? Language Teaching Research, 19(6), 667–686.
Webb, S., & Nation, I. S. P. (2017). How vocabulary is learned. Oxford: Oxford University Press.
Winke, P., Gass, S. M., & Sydorenko, T. (2010). The effects of captioning videos used for foreign language listening activities. Language Learning & Technology, 14, 66–87.
Xu, S., Fang, X. W., Brzezinski, J., & Chan, S. (2008). Development of a dual-modal presentation of texts for small screens. International Journal of Human-Computer Interaction, 24(8), 776–793.
Feng Teng is a language teacher educator with extensive teaching and research experience in China. He is now studying for a Ph.D. degree at Hong Kong Baptist University. His professional interests include metacognition and writing, EFL vocabulary development, and identity research. He has published widely in international SSCI journals, including Thinking Skills and Creativity, Applied Linguistics, Spanish Journal of Applied Linguistics, TESOL Quarterly, and Applied Linguistics Review. His recent book was published by Springer. He is a guest editor of a special issue on L2 literacy development for the Asian EFL Journal, as well as a special issue on metacognition and writing for the Journal of Writing Research.