About
28 Publications
30,538 Reads
525 Citations
Introduction
Additional affiliations
April 2019 - present
September 2006 - March 2019
Publications (28)
Vocabulary’s relationship to reading proficiency is frequently cited as a justification for the assessment of L2 written receptive vocabulary knowledge. However, to date, there has been relatively little research regarding which modalities of vocabulary knowledge have the strongest correlations to reading proficiency, and observed differences have...
In response to our State-of-the-Scholarship critical commentary (Stoeckel et al., 2021), Stuart Webb (2021) asserts that there is no research supporting our suggestions for improving tests of written receptive vocabulary knowledge by (a) using meaning-recall items, (b) making fewer presumptions about learner knowledge of word families, and (c) usin...
Hashimoto (2021) reported a correlation of −.50 (r² = .25) between word frequency rank and difficulty, concluding the construct of modern vocabulary size tests is questionable. In this response we show that the relationship between frequency and difficulty is clear albeit non-linear and demonstrate that if a wider range of frequencies is tested an...
In this short response to Webb's commentary, we argue that the Word Family (WF6) unit is likely inappropriate as a one-size-fits-all standard, and that rather, different units are likely to be appropriate for given populations and usages. We also address a number of points Webb raises regarding potential problems with using smaller lexical units th...
Validated under a Rasch framework (Beglar, 2010), the Vocabulary Size Test (VST) (Nation & Beglar, 2007) is an increasingly popular measure of decontextualized written receptive vocabulary size in the field of second language acquisition. However, although the validation indicates that the test has high internal reliability, still unaddressed is th...
Recent literature in the field of L2 vocabulary assessment has advocated for the development of written receptive vocabulary tests such as Vocabulary Levels Tests (VLTs) that use: (a) meaning-recall item formats, (b) a minimum of 40 items per 1,000-word frequency band to improve level estimates, and (c) lemmas (not word-families) as the lexical un...
This study investigated the effectiveness of word-frequency and teacher judgments in determining students' vocabulary knowledge and compared the predictive powers of both approaches when estimating vocabulary knowledge. Twenty-nine second language (L2) Spanish teachers were asked to predict how likely their students were to know words from a 216-word...
*** OPEN ACCESS ***
https://journals.sagepub.com/doi/epub/10.1177/02655322231162853
The purpose of this paper is to (a) establish whether meaning recall and meaning recognition item formats test psychometrically distinct constructs of vocabulary knowledge which measure separate skills, and, if so, (b) determine whether each construct possesses uni...
For the last few decades, word-frequency has been widely used to identify which words L2 learners are more or less likely to know (Hashimoto, 2021). However, research indicates that teachers often prefer to rely on their own intuition rather than using corpus-based vocabulary lists for making decisions about the words they want to teach in the clas...
While psychologists often use a combination of physiological and self-reported data to examine the dynamic effects of stress on performance, the impact of affective states on Foreign Language (FL) speaking performance has almost exclusively been assessed using self-report methodology (e.g., questionnaires, interviews). In fact, studies that correla...
In recent years, there has been increasing debate and research regarding which modality of vocabulary knowledge has the strongest correlation to reading, with particular focus on distinctions between testing L2 form and L2 meaning, and between recall of answers from memory and recognition of answers from fixed options. However, relatively little at...
Wearable technology could signify a breakthrough in the measurement of Language Learning (LL) anxiety because it allows researchers to objectively track student responses in "real time". Via a grant from Kyushu Sangyo University's Computer Network Center, IT specialist Nick May was commissioned to code an original research web application designed...
The last three decades have seen an increase in tests aimed at measuring an individual's vocabulary level or size. The target words used in these tests are typically sampled from word frequency lists, which are in turn based on language corpora. Conventionally, test developers sample items from frequency bands of 1,000 words; different tests employ...
The choice of lexical unit is a significant issue in L2 vocabulary research and pedagogy. This brief review examines two important questions bearing on this issue: (i) How encompassing a lexical unit can learners deal with receptively? and (ii) How much difference does the choice of lexical unit make in practice? Regarding the former, empirical evi...
The Fitbit Data Collection System (FDCS) is designed to facilitate measuring heart rate response related to Language Learning (LL) anxiety. Using Fitbit Wristbands and the Fitbit Cloud, the FDCS can track the heart rates for multiple test subjects on-demand or automatically. This data can be aggregated, synchronized and transferred directly to a st...
The Vocabulary Size Test (VST) was designed to measure the vocabulary needed for reading. Recent research, however, has questioned the “meaning-recognition” construct measured by the VST, arguing that “meaning-recall” is a more accurate estimate of reading vocabulary. The present study compared four variants of the VST to determine which, if any, c...
This chapter explores the methodological choices made in an illustrative complex and longitudinal study of classroom interest in a language task. The authors walk the reader through choices that must be made in a quantitative analysis step by step while also advocating for best practices in quantitative research, such as using technology as a partner in r...
Stewart (2014) questioned vocabulary size estimation methods proposed by Beglar and Nation for the VST, further arguing Rasch mean square (MSQ) fit statistics cannot determine the proportion of random guesses contained in the average learner’s raw score, as the average value will be near 1 by design. He illustrated this by demonstrating this is tru...
The Vocabulary Size Test (VST) was created to provide a reliable estimate of a second language learner’s written receptive vocabulary size, measuring from the most frequent fourteen 1000 word families of the spoken subsection of the British National Corpus (Nation & Beglar, 2007). While Beglar (2010) and Elgort (2013) recommend that users should li...
Perhaps the most qualitatively interpretable vocabulary test score is an estimate of the total number of words the learner knows in the tested domain, such as a frequency word list, or vocabulary taught as part of a course curriculum. In cases where it is not possible to test the entire domain word-for-word, vocabulary tests such as the vocabulary...
Yes/No tests offer an expedient method of testing learners' vocabulary knowledge, although a drawback of this method is that since the method is self-report, actual knowledge cannot be confirmed. "Pseudowords" have been used within such lists to test if learners are reporting knowledge of words they cannot possibly know, but it is unclear how to u...
Unlike classical test theory (CTT), where estimates of reliability are assumed to apply to all members of a population, item response theory provides a theoretical framework under which reliability can vary by test score. However, different IRT models can result in very different interpretations of reliability, as models that account for item qu...
Most researchers distinguish between receptive (passive) and productive (active) word knowledge. Most vocabulary tests employed in second language acquisition (SLA), such as the Vocabulary Levels Test (VLT) and Vocabulary Size Test (VST), test receptive knowledge. This is unfortunate, as the multiple-choice format employed on most receptive tests i...
Second language vocabulary acquisition has been modeled both as multidimensional in nature and as a continuum wherein the learner's knowledge of a word develops along a cline from recognition through production. In order to empirically examine and compare these models, the authors assess the degree to which the Vocabulary Knowledge Scale (VKS; Pari...
It has been frequently stated that Item Response Theory produces interval-scale measures where raw scores can only provide ordinal measures, and that therefore, researchers should choose IRT measures when selecting variables for common statistical tests, because raw scores may not meet their assumptions (Wright, 1992; Harwell & Gattie, 2001). In th...
Multiple choice tests such as the Vocabulary Levels Test (Nation, 1990) are often viewed as a preferable estimator of vocabulary knowledge when compared to yes/no checklists, as self-reports introduce the possibility of students over- or under-reporting how many words they know. However, multiple-choice tests have their own unique disadvantage: if...