Chapter

Utilizing Technology in Language Assessment


Abstract

This entry presents an overview of the past, present, and future of technology use in language assessment, also called computer-assisted language testing (CALT), with a focus on technology for delivering tests and processing test takers’ linguistic responses. The past developments include technical accomplishments that contributed to the development of computer-adaptive testing for efficiency, visions of innovation in language testing, and exploration of automated scoring of test takers’ writing. Major accomplishments include computer-adaptive testing as well as some more transformational influences for language testing: theoretical developments prompted by the need to reconsider the constructs assessed using technology, natural language-processing technologies used for evaluating learners’ spoken and written language, and the use of methods and findings from corpus linguistics. Current research investigates the comparability between computer-assisted language tests and those delivered through other means, expands the uses and usefulness of language tests through innovation, seeks high-tech solutions to security issues, and develops more powerful software for authoring language assessments. Authoring language tests with ever-changing hardware and software is a central issue in this area. Other challenges include understanding the many potential technological influences on test performance and evaluating the innovations in language assessment that are made possible through the use of technology. The potentials and challenges of technology use in language testing create the need for future language testers with a strong background in technology, language testing, and other areas of applied linguistics.


... According to Chapelle and Voss (2008), Higgins et al. (2011), and Leacock and Chodorow (2003), it is also designed to analyze the test-taker's responses and provide feedback or scores based on various linguistic features, such as grammar, vocabulary, fluency, pronunciation, or discourse. Moreover, as noted by Roever and McNamara (2006), NLP can help improve the validity and fairness of language tests by ensuring that the content and difficulty are suitable for a specific purpose and a particular social group. ...
... Another trend that is gaining momentum is the use of AI for automated assessment. AI systems have the capability to swiftly evaluate test responses, providing accurate and consistent results (Chapelle & Voss, 2008; Shermis & Burstein, 2013). This eliminates the possibility of human error and bias that can sometimes creep into manual assessments. ...
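As an illustration of the kind of feature-based scoring such systems build on, here is a minimal sketch in Python; the features and linear weights are invented for the example and stand in for the far richer models of the scoring engines described in Shermis and Burstein (2013).

import re

# Toy feature-based essay scorer: extract a few surface features and
# combine them linearly. Weights are illustrative assumptions only.
def extract_features(essay):
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "length": len(words),                             # fluency proxy
        "ttr": len(set(words)) / max(len(words), 1),      # vocabulary range
        "msl": len(words) / max(len(sentences), 1),       # sentence-complexity proxy
    }

def score(essay):
    f = extract_features(essay)
    # Hypothetical weights; a real system would fit these to rated essays.
    return 0.01 * f["length"] + 2.0 * f["ttr"] + 0.05 * f["msl"]

print(round(score("Technology has changed how we assess language. It offers speed and consistency."), 2))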
... To address these concerns, scholars in second language assessment have proposed the complementary use of a uniform large-scale computer-based assessment program (Alderson et al., 2015; Davison & Michell, 2014) to support students and teachers. There have been numerous computer-based assessment tools for teachers' use, with reported success in using computer-adaptive tests (CAT) for formative purposes (Brown & Hattie, 2012; Hattie et al., 2003); however, there are issues associated with their adoption, including access (Cleary & Zimmerman, 2012; Davison, 2013), ICT literacy requirements (Brown & Abeywickrama, 2018; Chapelle & Voss, 2017), and the interpretation and use of results to improve student learning (Bachman, 2015; Bonner, 2009; Leung et al., 2018; Kane, 2013). This study investigates classroom teachers' perceptions regarding the use of assessment data. ...
... The utility of CAT has been demonstrated in many projects to support specific groups of teachers, including the online diagnostic language assessment DIALANG (Alderson & Huhta, 2005); the Diagnostic English Language Needs Assessment (DELNA) and the Canadian Academic English Language (CAEL) Assessment for diagnostic purposes (Doe, 2014, 2015); Assessment Tools for Teaching and Learning (asTTle) (Brown, 2014; Brown, O'Leary, & Hattie, 2018; Hattie, Brown, & Keegan, 2003; Hattie & Brown, 2008); and Tools to Enhance Assessment Literacy for Teachers of English as an Additional Language (TEAL) (Davison, 2018). The use of a CAT can ensure more teacher consistency in identifying student strengths and weaknesses (Alderson et al., 2015) and streamline the assessment process for large-scale assessment (Chapelle & Voss, 2017). It provides teachers with valuable data for formative assessment and helps in building teacher assessment literacy (Davison, 2018, 2019; Mizumoto, Sasao & Webb, 2019), which consequently enhances classroom-based assessment systems (Alderson et al., 2015, p. 238). ...
Article
Full-text available
The use of assessment for formative purposes has become a major component of assessment reforms in many educational systems due to its potential to provide important data for teacher decision-making to improve learning. However, there is as yet no study with a robust objective measurement model to set up a continuum of teacher perceptions of the uses of a computer adaptive test (CAT) for enhancing formative practices. This study explores teachers’ perceptions of the potential use of an externally developed CAT, an assessment aimed to support the learning and teaching of English as an additional language (EAL). A Teacher Perception of the Use of CAT Scale (TPUCAT) was developed using both theoretical and empirical approaches to determine the indicators of the construct. A questionnaire with a six-point Likert-type scale and 36 items was administered to EAL teachers in one state educational system in Australia. Using Rasch item analysis, four statistically different possibilities of use for the CAT emerged from the data. These groupings of teachers were used to develop a typology of teachers’ perceptions of potential CAT use to support individual students in their learning. We establish that teachers’ perceptions about the use of CAT are varied and hence present a professional development opportunity. Our study is the first to establish this typology of teacher perception, which is a critical contribution to the theorisation of assessment. This typology from basic to expert provides a better description of potential teacher uses of a CAT for formative purposes and allows for targeted professional development for teachers to ensure that CAT is optimised to support teacher practices and student learning. Keywords: assessment for formative purposes, computer-adaptive test, typologies of practices, teachers, assessment
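The Rasch analysis referred to above rests on a simple logistic measurement model. As a point of reference (standard notation, not a detail taken from the TPUCAT study itself), the probability that person n endorses item i is

P(X_{ni} = 1) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

where \theta_n is the person's location on the latent trait (here, a teacher's perception of CAT use) and b_i is the item's endorsability. Because persons and items sit on the same logit scale, respondents can be ordered along the basic-to-expert continuum the abstract describes.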
... Specifically, the same scheme could be applied to observable measures derived from constructed responses; computational indicators (textual properties) are related to constructs, and such relations have an underlying measurement model. Chapelle and Voss (2017) remarked that technological advances in language testing and other natural language-processing evaluations need to show their comparability with classic psychoeducational tests to improve current approaches to language assessment (note a similar rationale behind how the term "validity" changed for language assessment in Chapelle, 1999). While there is an important relation between technological advances and language assessment (e.g., Chapelle & Voss, 2016, 2021), the field needs to continuously improve the design of computer-assisted language tests and the ways to demonstrate their validity. ...
... A promising proposal is to make the technological advances in language testing and other natural language-processing tasks comparable to classic psychoeducational tests (see, for example, the different rationales behind how the term "validity" has changed for language assessment: Chapelle, 1999; Chapelle & Voss, 2017). In this line, the automatic assessment of constructed responses can be useful for inferring different constructs from a big set of indicators (e.g., Foltz et al., 2013). ...
Article
Full-text available
In this paper, we highlight the importance of distilling the computational assessments of constructed responses to validate the indicators/proxies of constructs/trins using an empirical illustration in automated summary evaluation. We present the validation of the Inbuilt Rubric (IR) method that maps rubrics into vector spaces for concepts’ assessment. Specifically, we improved and validated its scores’ performance using latent variables, a common approach in psychometrics. We also validated a new hierarchical vector space, namely a bifactor IR. 205 Spanish undergraduate students produced 615 summaries of three different texts that were evaluated by human raters and different versions of the IR method using latent semantic analysis (LSA). The computational scores were validated using multiple linear regressions and different latent variable models like CFAs or SEMs. Convergent and discriminant validity was found for the IR scores using human rater scores as validity criteria. While this study was conducted in the Spanish language, the proposed scheme is language-independent and applicable to any language. We highlight four main conclusions: (1) Accurate performance can be observed in topic-detection tasks without hundreds/thousands of pre-scored samples required in supervised models. (2) Convergent/discriminant validity can be improved using measurement models for computational scores as they adjust for measurement errors. (3) Nouns embedded in fragments of instructional text can be an affordable alternative to use the IR method. (4) Hierarchical models, like the bifactor IR, can increase the validity of computational assessments evaluating general and specific knowledge in vector space models. R code is provided to apply the classic and bifactor IR method.
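To make the vector-space idea concrete, the sketch below scores a summary against a rubric concept by cosine similarity. It is a toy stand-in: the Inbuilt Rubric method proper projects rubric and summary into a latent semantic (LSA) space, whereas here plain term-frequency vectors play that role, and the example texts are invented.

import math
from collections import Counter

# Cosine similarity between two bag-of-words vectors.
def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A rubric "concept vector" and a student summary, as term counts.
rubric_concept = Counter("photosynthesis light energy chlorophyll glucose".split())
summary = Counter("plants use light energy to make glucose".split())

print(round(cosine(rubric_concept, summary), 3))  # higher = closer to the concept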
... Over the past decades, the field has gradually grown in scope and sophistication as researchers have adopted various interdisciplinary approaches to problematize and address old and new issues in language assessment as well as learning. The assessment and validation of reading, listening, speaking, and writing, as well as language elements such as vocabulary and grammar have formed the basis of extensive studies (e.g., Chapelle, 2008). Emergent research areas in the field include the assessment of sign languages (Kotowicz et al., 2021). ...
... Fan and Yan conducted a narrative review of papers published in two journals in language assessment: Language Assessment Quarterly and Language Testing. A total of 104 papers on speaking assessment were classified under the six types of inferences in an argument-based validation framework (Chapelle, 2008). Nearly half of the papers (40.38-48.08%) ...
... Formative assessment practices focus on enhancing learning and prompt students to take more responsibility for their own work (Black & Wiliam, 2009; Stiggins, 2008) through the development of 'intrinsic motivation', improving 'self-esteem', fostering 'independent learning methods', as well as developing 'the ability to improve cognitive strategies in solving problems' (Wei, 2011). Chapelle (2008) suggests that technologies can have three purposes in assessment. Educators, Chapelle writes, may want to create instruments and tasks that can be administered more efficiently than 'paper and pencil' formats. ...
... If we follow the logic set forth by Chapelle (2008), then the use of technologies in EFL programs implies that learning designs must align with institutional policies (Middaugh, 2010), departmental cultures (Boud, 2007), and classroom practices (Hill & McNamara, 2012). Accordingly, designs may help meet students' expectations that assessment tasks are authentic, unambiguous, and allow for choice and flexibility throughout a university course (James, McInnes, & Devlin, 2002). ...
Conference Paper
Full-text available
Despite a rise of blended learning approaches in foreign language education programs, little research has examined how such integration of technologies in the classroom affects assessment designs. Any 'electric dreams' that technologies will improve learning remain unproven without clear assessment designs. In this paper, we undertake a qualitative study of formative blended assessments within an English language program at a major Saudi university. Data were gathered through observations, semi-structured interviews, and Participatory Design (PD) sessions. Thematic analysis of the data resulted in four emergent themes: definitions, approaches, alignment, and requirements. After setting out and discussing the four themes, we conclude our paper with suggestions for further research.
... ZPD manifests as a result of learners' responsiveness to mediation (Poehner, 2008), ultimately fostering independent functioning. The surge of innovative trends in language education has spurred test researchers and developers to integrate inventive approaches and methods into language testing (Chapelle & Voss, 2008). DA has emerged as a novel research frontier in various technology-mediated studies (Authors, 2024; Rassaei, 2023; among many others). ...
Article
Full-text available
This study examined the effect of mobile-mediated dynamic assessment on the reading comprehension and reading fluency skills of Iranian L2 learners. With 50 Iranian L2 learners randomly assigned to experimental and control groups, a pretest-posttest control group design was used. Whereas the control group received traditional instructor translation of reading materials, the experimental group received mobile-mediated dynamic assessment consisting of interactive reading tasks mediated by the teacher via a mobile application. Pre- and post-assessment of L2 reading comprehension and reading fluency was carried out by means of researcher-made tests. Independent-samples t-tests showed that, in both reading comprehension and reading fluency, no difference existed between the two groups on the pretest. However, on the posttest, the experimental group outperformed the control group on both measures of reading comprehension and reading fluency, demonstrating the effectiveness of mobile-mediated dynamic assessment in improving L2 reading skills. The design and execution of the study benefited from theoretical foundations derived from mobile-mediated language learning and dynamic assessment. The results add to the body of knowledge by showing the potential of mobile-mediated dynamic assessment as a creative approach to raising the reading competency of L2 learners. The implications for language teachers, policy makers, syllabus design, and materials development are discussed, and recommendations for future studies to investigate the efficacy of mobile-mediated dynamic assessment in various instructional environments are offered.
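For readers unfamiliar with the statistic used in the study above, an independent-samples t-test compares two group means. A minimal illustration with invented score vectors (requires SciPy):

from scipy import stats

# Invented posttest scores for an experimental and a control group.
experimental = [78, 85, 82, 90, 76, 88, 84, 79]
control      = [70, 72, 75, 68, 74, 71, 73, 69]

t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < .05 would indicate a group difference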
... According to a recent study by Li et al. (2023), "The use of AI in TELL can enhance the personalisation of learning and provide learners with more accurate and timely feedback." Research by Chapelle and Voss on the implications of TELL for language assessment found that computer-based language assessment drawing on TELL technology enables a more authentic and comprehensive assessment of learners' language ability, especially in spoken and written production (Chapelle & Voss, 2016). ...
Article
Full-text available
Training to strengthen teachers' learning-evaluation skills through the Technology Enhanced Language Learning (TELL) approach was carried out at Madrasah Aliyah Muhammadiyah Balassuka. The training aimed to improve teachers' ability to conduct evaluation using a range of applications so that their learning-evaluation process becomes more objective and engaging. The community service was delivered through a training method: participants first received an introduction to the TELL approach and then practised preparing evaluation materials for their respective subjects. The results show that using the TELL approach in learning evaluation was new to the teachers and motivated them in managing instruction, particularly learning evaluation.
... According to a study by Cumming et al. (2005), the TOEFL iBT Reading section evaluates not only reading comprehension but also the ability to integrate information from multiple sources, which is crucial for academic success in higher education. A study by Chapelle and Voss (2008) emphasizes that TOEFL test scores are significantly correlated with students' performance in university-level writing courses, highlighting the test's role in predicting writing proficiency. In their research, Weigle and Goodwin (2016) found that TOEFL Listening scores are a strong predictor of students' ability to comprehend academic lectures and participate effectively in classroom discussions. ...
Article
Full-text available
Adequate English language competency is essential in today's era so that people do not encounter difficulties in global communication and interaction. The average English proficiency of Indonesian students aged 18-20 is low. To solve this problem, efforts are needed to increase English competency through appropriate learning activities. The system development method for this application uses the waterfall method. The use of the forward chaining method in an expert system helps determine appropriate and effective ways of teaching and learning English. The aim of this research is to apply an expert system to obtain recommendations for effective English learning based on TOEFL ITP scores. Three aspects of the TOEFL ITP, namely listening, reading, and structure, are used in the analysis process using forward chaining. The forward chaining method starts from the data set and then performs inference according to the rules used until the optimal inference is found. The inference engine continues through the process to arrive at the right decision. The expert system application is designed to be website-based, making it easier for users to use at any time.
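To make the inference strategy concrete, here is a minimal forward-chaining sketch over TOEFL ITP section scores. The facts, rules, and thresholds are invented for illustration; the paper's actual rule base is not reproduced here.

# Forward chaining: start from the facts (section scores) and keep firing
# rules until no new conclusions can be derived (a fixed point).
facts = {"listening": 45, "structure": 52, "reading": 40}

rules = [
    (lambda f: f["reading"] < 48, "weak_reading"),
    (lambda f: f["listening"] < 48, "weak_listening"),
    (lambda f: "weak_reading" in f.get("derived", set()),
     "recommend_extensive_reading_practice"),
]

def forward_chain(facts, rules):
    derived = set()
    facts = dict(facts, derived=derived)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if conclusion not in derived and condition(facts):
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))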
... Another notable example would be the use of bubble-card readers in the early 1970s, which exerted a powerful impact on scoring processes and contributed to the popularity of multiple-choice item types. Furthermore, the integration of technology into language testing and assessment practices gained increased momentum in the 1980s and 1990s, specifically fuelled by developments such as the availability of statistical programs, databases, and the language processing and recognition tools associated with the Computer Assisted Language Learning (CALL) movement (Burstein et al., 1996; Chapelle & Voss, 2017). For instance, the LTRC (Language Testing Research Colloquium) conference in 1985 ignited the initial attempts associated with computer-assisted language testing (CALT) (Chalhoub-Deville, 2001), and many papers delivered at the 1985 LTRC dealt with CALT-oriented topics, such as item-bank construction and computer adaptive testing (CAT). ...
Chapter
Full-text available
Natural language processing is a subfield of artificial intelligence investigating how computers can be utilised to understand and process natural language text or speech to accomplish useful things in various areas, and it draws on various disciplines, such as computer science, linguistics, and robotics. Natural language processing applications, including automated speech recognition and scoring, have several exciting prospects for language testing and assessment practices. These prospects include addressing practical constraints associated with test administration and scoring, securing standardisation in test delivery, ensuring objectiveness and reliability in scoring procedures, and providing personalised feedback for learning. This chapter focuses on automated speech scoring and its applications in language testing and assessment and discusses how these systems can be employed in assessment contexts. The chapter also discusses the potential benefits and drawbacks of automated speech scoring while focusing on construct-related and practical challenges surrounding such systems.
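As a concrete illustration of the fluency strand of such systems, the sketch below computes two common indicators, speech rate and long-pause count, from time-aligned ASR output. The word timings are invented, and real scoring engines add pronunciation, prosody, and content features on top.

# Fluency features from a hypothetical ASR word alignment:
# each entry is (word, start_sec, end_sec).
words = [
    ("the", 0.0, 0.2), ("test", 0.3, 0.7), ("was", 1.6, 1.8), ("easy", 1.9, 2.4),
]

speech_time = words[-1][2] - words[0][1]
rate = len(words) / speech_time                     # words per second
pauses = [b[1] - a[2] for a, b in zip(words, words[1:])]
long_pauses = sum(1 for p in pauses if p > 0.5)     # silent pauses over 500 ms

print(f"rate = {rate:.2f} w/s, long pauses = {long_pauses}")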
... With the discourse analytic measures used in Knoch et al. (2014), it is unclear how production measures, such as the number of clauses per T-unit or the average word length, can be interpreted within a model of L2 proficiency or how these measures can be used to provide actionable feedback for learners. That said, testing experts have effectively used production features to examine validity questions in the area of automated scoring and feedback systems (Chapelle, 2008). Several studies have, for example, correlated measures of production features with the extended, spontaneous production of written or spoken texts (Bernstein, Van Moere, & Cheng, 2010), or have examined the use of e-rater production measures as a complement to human scoring of learners' essays (Enright & Quinlan, 2010). ...
Chapter
The sociocultural, economic, and geopolitical forces in education, the workplace, and in our daily lives have significantly increased the linguistic competencies needed to function successfully in today's world. In this context, applied linguists need to assess language users' ability to use a range of linguistic resources for expressing propositions for a variety of purposes in written, spoken, and visual forms. Linguistic resources include the knowledge of grammar and its use for creating the meanings required for humans to interact and cooperate with others, or perform other real‐world tasks. This entry explains how the representations of linguistic resources of communication have been conceptualized by language testers first as knowledge of grammatical forms, and more recently as resources for communicating contextualized meanings. Assessment of linguistic resources for meaning making requires a meaning‐oriented framework as a basis for test development and validation. This meaning‐oriented model of L2 proficiency provides a framework not only for the assessment of forms and meanings, but also for form‐meaning mappings used to convey propositions, which function in context to convey a range of meanings. This entry also discusses how these resources have been measured and researched in applied linguistics and suggests the need for additional research.
... Thus, in an extended-response (RO) task, the learner should present an elaborated, multi-element, and appropriately ordered answer in verbal form, using diagrams, formulas, and the like. [Footnote 7: The literature distinguishes computer-based testing (CBT), web-based testing (WBT) (Malec, 2020, p. 102), and computer-assisted language testing (CALT) (cf. Chapelle & Voss, 2017); for the purposes of this text, however, we use the terms computer testing or computer-based testing in a broad sense, including remote testing over the Internet.] ...
Book
Full-text available
[Web-based Education – Theories and Applications. Companion for Language Teachers and Other Educators] The book is addressed to practitioners seeking guidance and inspiration on the methods and forms of remote teaching, to students of philology and education programmes who want to understand the challenges awaiting them, and to researchers interested in remote language education from the teacher's perspective. The discussion is also intended to help teachers develop a set of good practices for the skilful application of selected methods and forms characteristic of remote education. The publication consists of five chapters, four of them theoretical. These address: the development of the concept of remote education within selected currents of thought, a review of terminology related to e-learning, and forms of organising the remote education process (Chapter I); models for using new technologies and multimedia in remote teaching, and the competences, tasks, and role of the teacher working with digital tools (Chapter II); assessment and monitoring in distance teaching (Chapter III); and selected remote methods and techniques used in foreign language teaching that can also be applied to other subjects (Chapter IV). The final chapter (V) presents the results of a study entitled Remote Education Through the Eyes of Foreign Language Teachers.
... This allows for the assessment of dynamic language use as a result of interactions occurring amongst speakers (Lopez, Turkan, & Guzman-Orth, 2017). 6) Multimodal assessment: This classification of assessment is proposed since texts in different languages are conceptualized in multifaceted modes, presented with several meanings, delivered on-screen, live, or on paper, presented with various sensory modes, and presented through various channels and media (Chapelle & Voss, 2017). ...
Article
Full-text available
An amendment to this paper has been published and can be accessed via the original article.
... Innovative trends in language education propelled test researchers and developers to introduce new approaches and methods into language testing (Chapelle and Voss 2008). DA has recently emerged as an innovative research area in several technology-mediated studies (Andujar 2020; Ebadi and Rahimi 2019; Rezaee et al. 2019). ...
Article
Full-text available
This study, utilizing a sequential explanatory mixed-methods design, explored the impact of mobile-based dynamic assessment (MDA) on EFL learners’ writing skills. Three intact classes (N = 30), including intermediate EFL learners attending a private English language school in Iran, participated in this study. They were evenly divided into two experimental groups and one control group. The DIALANG online diagnostic test was used to assess the participants’ written proficiency and also as an instrument to collect the pre- and post-test scores. During the treatment sessions, the students were required to complete writing tasks over their Google Docs mobile app shared with the instructor. The instructor provided text- and voice-based mediations to the experimental groups following an interactionist DA approach using both WhatsApp and Google Docs. Follow-up interviews were conducted to assess the experimental groups’ perceptions of each mediation type. T-tests and ANOVA, along with thematic analysis, were used to analyze the quantitative and qualitative data, respectively. The findings showed that only the text group’s post-test scores significantly improved and that there was a significant difference among the three groups in their post-test scores, which indicated outperformance of the voice group. The overall results showed that MDA enhances EFL learners’ written proficiency as a result of the collaboration between the learners and the instructor using text- and voice-based mediation. The thematic analysis revealed the participants’ satisfaction with both mediation types in terms of being efficient, convenient, and causing less social pressure.
... The changes brought about by the Internet in the domain of education are likewise mind-boggling; teachers and students can capitalize on the potential of Internet-based media not only to communicate but also to manage their teaching and learning innovatively (Bakia, 2010). In the realm of second language (L2) teaching and learning, particularly in the field of English as a second language (ESL) or English as a foreign language (EFL), Internet-based technologies are helping to improve the quality of instruction by creating learner-centered environments, providing learning opportunities outside the classroom, and promoting independent learning, among other things (Chapelle & Sauro, 2017; Godwin-Jones, 2011; Levy, 2009). Such technological innovations are also shaping new paths in the way we assess learners' language proficiency (Burstein, Frase, Ginther, & Grant, 1996; Chapelle, 2003; Chapelle & Voss, 2017) and how we research issues related to language teaching and learning (Smith, 2017). ...
Article
Full-text available
In an increasingly digital world, online educational resources, apps, and other technologies can serve as incredibly effective tools to facilitate both teaching and learning. One such online tool is the Google Dictionary. This dictionary, an online service of Google, is probably one of the simplest dictionaries for English learners. The definitions usually use simple words and therefore are easy to understand. In addition to the definitions, examples, pictures, and usage notes, there is a separate pronunciation entry with interesting characteristics. This newly added entry provides users with the pronunciation of a word in two different accents, visemes, slow playback, and an option that lets Google collect feedback about the accuracy and helpfulness of the pronunciation recordings from users. This review paper offers a descriptive account of the entry, along with a critical evaluation of its strong points and limitations. The review concludes with some suggestions to improve the educational quality of the pronunciation entry.
Article
Full-text available
Recently, we have witnessed a growing interest in developing teachers' language assessment literacy. The ever-increasing demand for and use of assessment products and data by a more varied group of stakeholders than ever before, including newcomers to the field with limited assessment knowledge, and the knowledge assessors need to possess (Stiggins, Phi Delta Kappa 72:534-539, 1991) drive an ongoing discussion on assessment literacy. The 1990 Standards for Teacher Competence in Educational Assessment of Students (AFT, NCME, & NEA, Educational Measurement: Issues and Practice 9:30-32, 1990) made a considerable contribution to this field of study. Following these Standards, a substantial number of studies for and against have been published on the knowledge base and skills for assessment literacy, assessment goals, the stakeholders, formative assessment and accountability contexts, and measures examining teacher assessment literacy levels. This paper elaborates on the nature of language assessment literacy, its conceptual framework, the related studies on assessment literacy, and the various components of teacher assessment literacy and their interrelationships. The discussions, which focus on what language teachers and testers need to learn, unlearn, and relearn, should develop a deep understanding of the work of teachers, teacher trainers, professional developers, stakeholders, teacher educators, and educational policymakers. Further, the outcome of the present paper can provide more venues for further research.
... Research can be done to compare computer-based tests with traditional paper-and-pencil tests and to examine fairness of scoring (Yu & Zhang, 2017). However, focus should be placed on the security issues of computer-based and web-based tests (Chapelle & Voss, 2016), the expansion of passage-scoring capabilities, and the integration of automatic assessment with formative assessment (Liu, 2013). ...
Article
Full-text available
Based on the articles written by mainland Chinese scholars published in the most influential Chinese and international journals, the present article analyzed language testing research, compared the tendencies of seven categories between 2000-2009 and 2010-2019, and put forward future research directions by referring to international hot topics. Of all seven categories of research topics, validity, performance tests, and China’s Standards of English Language Ability were the three most popular themes, while classroom assessment, technology, rater/test-taker differences, and professionalization were much less popular. Except for research on performance tests and technology, the other five aspects showed an increase in the second decade, with China’s Standards of English Language Ability rising the most dramatically. Referring to international research trends, the article predicted that validity, classroom assessment, China’s Standards of English Language Ability, and professionalization, especially ethics and social justice, might be the promising research topics for language testers.
... The use of AWE for formative purposes-i.e. to support the development of students' writing-seems to hold more promise, especially in the field of ELT. Research has shown that the use of AWE software for formative purposes can encourage students to review their work (Chapelle 2008), as well as increase students' motivation for writing (Warschauer and Grimes 2008). A mixed-methods study with English language learners found that the use of Criterion not only led to increased revisions of written work, but also that the accuracy of that work improved over drafts due to the corrective feedback provided by the AWE software (Li, Link, and Hegelheimer 2015). ...
Article
Full-text available
In this series, we explore technology-related themes and topics. The series aims to discuss and demystify what may be new areas for some readers and to consider their relevance for English language teachers.
Chapter
Language proficiency assessment is the process by which an individual's ability to communicate effectively in a given language is delineated and measured. This chapter examines the different approaches and techniques used to assess language proficiency. It explores in depth the various tools available, such as standardized tests and oral and written assessments, as well as assessment methods based on emerging technologies. According to the chapter, it is essential to select the appropriate evaluation method according to the specific objectives and the context of use. We also discuss the challenges associated with the assessment of language skills, particularly with respect to validity, reliability, and fairness. Our chapter provides an overview of the various methods and best practices in the field of language skills assessment, while highlighting the paramount importance of implementing assessment tools and methods that are appropriate to the objectives defined at the start of training, effective, and valid.
Conference Paper
Full-text available
BOA HELMETO CONFERENCE 2024
Chapter
In Language Assessment Across Modalities: Paired-Papers on Signed and Spoken Language Assessment, volume editors Tobias Haug, Wolfgang Mann, and Ute Knoch bring together, for the first time, researchers, clinicians, and practitioners from two different fields: signed language and spoken language. The volume examines theoretical and practical issues related to 12 topics ranging from test development and language assessment of bi-/multilingual learners to construct issues of second-language assessment (including the Common European Framework of Reference [CEFR]) and language assessment literacy in second-language assessment contexts. Each topic is addressed separately for spoken and signed language by experts from the relevant field. This is followed by a joint discussion in which the chapter authors highlight key issues in each field and their possible implications for the other field. What makes this volume unique is that it is the first of its kind to bring experts from signed and spoken language assessment to the same table. The dialogues that result from this collaboration not only help to establish a shared appreciation and understanding of challenges experienced in the new field of signed language assessment but also breathe new life into, and provide a new perspective on, some of the issues that have occupied the field of spoken language assessment for decades. It is hoped that this will open the door to new and exciting cross-disciplinary collaborations.
Conference Paper
Full-text available
The objective of this empirical paper is to assess the vocabulary sophistication of learners' writings and to highlight some of the common measures used for gauging the lexical richness of texts. It aims to compare the academic writings of PhD and Master's students with the intention of identifying any substantial variations in lexical sophistication between the two groups. The study seeks to answer the following research questions: "What are the mean scores of lexical sophistication reflected in the abstracts of EFL theses and dissertations?" and "How does lexical sophistication change in relation to the academic level of students?". The present research implements a retrospective design and involves a comparative study based on a corpus composed of 120 abstracts. The analysis of the corpus is conducted through the use of the web-based lexical complexity analyser (LCA) introduced by Ai and Lu (2010). Multiple independent-samples t-tests were implemented to gauge the significance of the statistical differences in the mean scores corresponding to the lexical and verb sophistication reflected in students' writings. The major results showed that the discrepancies between most of the average scores of lexical/verb sophistication in the abstracts of doctoral theses were significantly greater (p < .05) than those pertaining to master's dissertations. A positive correlation is traced between the academic level of students and the degree of lexical/verb sophistication of abstracts. The reported effect sizes suggest the presence of small to very large differences depending on the proxy measures of lexical complexity used.
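As background to the measures reported above, one common lexical-sophistication proxy is the share of word types falling outside a high-frequency base list. The sketch below uses a ten-word stand-in for the 2,000-word frequency lists behind tools such as LCA; the example sentence is invented.

# Lexical sophistication as the proportion of word types not on a
# high-frequency base list. The tiny list here is a placeholder.
basic = {"the", "a", "of", "and", "to", "study", "results", "show", "is", "this"}

def lexical_sophistication(text):
    types = set(text.lower().split())
    sophisticated = {t for t in types if t not in basic}
    return len(sophisticated) / len(types) if types else 0.0

abstract = "this study examines epistemic stance and hedging in doctoral abstracts"
print(round(lexical_sophistication(abstract), 2))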
Conference Paper
Full-text available
This article examines the interpretation of homonymous words in “Lugati turkiy”, compiled by Fazlullah Barlos in the 18th century in India. This work is an important source on the history of Turkic languages, in particular the lexicology of the Uzbek language; it summarizes the dictionaries of its time and their main lexical units.
Article
The conceptualization of language assessment literacy (LAL) is currently a subject of debate in the language testing community, especially with regard to the role language components play in this construct. This is part of a broader discussion of generic and discipline-specific assessment literacy. Previous research on language teachers’ LAL aimed to uncover whether self-reported assessment knowledge complies with pre-established definitions. What transpired in the current study is a move in the opposite direction: following a generic course on assessment literacy, language teachers were asked to apply the course contents to their language-teaching objectives in designing a test and a performance task. Content analysis of the assessment artifacts created by 16 English and Hebrew teachers, points at partial employment of the generic assessment features acquired, with a more scant application and consideration of language-related construct components required as part of LAL for language assessment. LAL emerged as a process-oriented phenomenon, requiring an amalgamation of AL with language-related components but also with context-relevant variables.
Chapter
Full-text available
Individuals' learning of a second or foreign language has traditionally been measured with paper-and-pencil tests. Unfortunately, such assessment practice prevents learners from demonstrating the skills gained throughout the teaching-learning process and thus their actual ability to use the target language effectively. It also limits learners from receiving positive feedback, which opens doors for them to improve their language skills. The language teaching field demands that English as a Foreign Language (EFL) teachers have a vast knowledge of the fundamental concepts and theories that surround the assessment of EFL learning. It also requires that professionals who teach a foreign language keep up to date with assessment tendencies that go beyond paper-and-pencil tests, as is the case with authentic assessments. Assessment practices that go beyond traditional paper-and-pencil tests provide students with opportunities to be assessed in mental-stress-free environments. Teachers who promote this alternative form of assessment prompt learners to perform real-world tasks so that they can demonstrate their capability to apply essential knowledge and skills in creative and meaningful ways. In other words, teachers gain insights about how much students have grasped from their actual ability to perform in a specific situation instead of the number of right or wrong answers they produce on a test.
Article
Full-text available
Foreign language departments with the goal of advanced literacy must optimize student learning, especially at the initial stages of the program. Current practices for admission and placement rely mainly on students’ grades from previous studies, which may be the main reason why intra-group language proficiency often varies dramatically. One essential step towards creating an environment that enables students to progress according to their skill level is the development of assessment procedures for admission and placement. Such assessment must prominently include proficiency in the target language. This article promotes the incorporation of an automated C-test into gateway and placement procedures as an instrument that ranks candidates according to general language proficiency. It starts with a review of the literature on aspects of the validity of the C-test construct and contains an outline of the functional design of such an automated C-test. The article highlights the economic benefits of an automated C-test platform and the central role of proficiency-based student placement for the success of programs aiming to develop advanced literacy in a foreign language. The findings indicate that developing and using the outlined C-test platform has the potential to increase student achievement in advanced foreign language instruction significantly.
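The classic C-test deletion rule that such a platform would automate is simple to state: from the second word onward, delete the second half of every second word. A minimal sketch follows; real platforms add constraints, for instance exempting very short words and proper names.

# Generate a C-test line by blanking the second half of every second word.
def make_ctest(sentence):
    words = sentence.split()
    out = []
    for i, w in enumerate(words):
        if i % 2 == 1 and len(w) > 1:        # every second word
            keep = (len(w) + 1) // 2         # keep the first half
            out.append(w[:keep] + "_" * (len(w) - keep))
        else:
            out.append(w)
    return " ".join(out)

print(make_ctest("Languages are learned best when they are used for communication"))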
Article
Full-text available
This work explores the benefits of supporting learners affectively in a context-aware learning situation. It features a new challenge in the related literature: providing affective educational recommendations that take advantage of ambient intelligence and are delivered through actuators available in the environment, thus going beyond previous approaches, which provided computer-based recommendations that present some text or tell the learner aloud what to do. To address this open issue, we have applied the TORMES elicitation methodology, which has been used to investigate the potential of ambient intelligence for making more interactive recommendations in an emotionally challenging scenario (i.e., preparing for the oral examination of a second language learning course). The Arduino open-source electronics prototyping platform is used both to sense changes in the learners’ affective state and to deliver the recommendation in a more interactive way through different complementary sensory communication channels (sight, hearing, touch) in line with universal design. An Ambient Intelligence Context-aware Affective Recommender Platform (AICARP) has been built to support the whole experience, which represents progress in the state of the art. In particular, we have come up with what is most likely the first interactive context-aware affective educational recommendation. The value of this contribution lies in discussing the methodological and practical issues involved.
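The sense-then-recommend loop such a platform runs on the host side can be sketched as follows. The serial port name, the "hr:<bpm>" message format, the heart-rate threshold, and the "LED_ON" actuator command are all hypothetical, and the pyserial package is assumed; AICARP's actual protocol is not documented here.

import serial  # pyserial

port = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)  # hypothetical port

while True:
    line = port.readline().decode().strip()  # e.g., "hr:104" from the Arduino
    if not line.startswith("hr:"):
        continue
    heart_rate = int(line.split(":")[1])
    if heart_rate > 100:                     # crude arousal proxy (assumption)
        # A real system would choose the channel (light, sound, vibration)
        # suited to the learner and the context.
        port.write(b"LED_ON\n")              # visual cue: pause and breathe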
Conference Paper
Full-text available
Whilst in recent years there has been an increasing trend to design mobile apps for foreign language learning, most of the available apps support mainly one-way teacher-to-learner interaction and use mobile devices to deliver content rather than encouraging learners to interact amongst each other. To address this lack and to provide learners with more versatile opportunities to communicate amongst each other by sharing, assessing, and co-constructing their knowledge, we designed an app based on a highly interactive, ubiquitous, and constructive learning approach. This means that learning contents are not just delivered to our learners but are integrated into versatile tasks that, although individually performed, affect the community of learners. The current paper presents our first experience using the Guess it! Language Trainer app. Firstly, we detail how the app was used to support students’ language learning outside the classroom; secondly, how it helped learners to get actively involved in their own learning process by themselves adding content to the knowledge base; and thirdly, we show how the data on students’ usage of the app and participation in the learning process are used to automate the assessment of different competences by the language instructor. A pre-/post-test comparison showed very good results for students’ language acquisition. This is complemented with the analysis of the system logs, which highlights, firstly, that students spent much more time playing the app than we estimate they usually devote to independent learning, and secondly, that they significantly improved their learning outcomes whilst using the app. Additionally, other objective indicators of proficiency in other skills, such as the ability to explain terms in a foreign language or the competence to assess the definitions of peers, were obtained automatically.
Article
Full-text available
http://www.ifets.info/journals/17_2/3.pdf Few tests were delivered using mobile phones a few years ago, but the flexibility and capability of these devices make them valuable tools even for high-stakes testing. This paper addresses research done through the PAULEX (2007-2010) and OPENPAU (2012-2014) research projects at the Universidad Politécnica de Valencia and Universidad de Alcalá (Spain) to provide a powerful but low-cost delivery system for the foreign language paper of the Spanish College Entrance Examination (henceforth PAU). The first project, PAULEX, intended to create a robust mobile platform for language testing, while the second, OPENPAU, examined the specific applications of ubiquitous devices to create more dynamic forms of assessment. This paper focuses on the projects’ design, testing theory, and technical evolution, including visual ergonomics. The current results demonstrate the technical and didactic feasibility of mobile-based formal assessment that aligns student needs with the kind of inferences that the mobile-based language test should provide to academic authorities.
Article
Full-text available
The English Profile Programme is a major inter-disciplinary, research-based collaborative project to develop detailed reference level descriptions (RLDs) of the English of L2 language learners, linked to the general principles and approaches of the Council of Europe. The Common European Framework of Reference (CEFR) provides language proficiency bands against which the profile is categorised. Large samples of writing and speech are required for the Programme, with the Cambridge Learner Corpus providing data for the initial phase.
Article
Full-text available
In the past twenty years, language testing research and practice have witnessed the refinement of a rich variety of approaches and tools for research and development, along with a broadening of philosophical perspectives and the kinds of research questions that are being investigated. While this research has deepened our understanding of the factors and processes that affect performance on language tests, as well as of the consequences and ethics of test use, it has also revealed lacunae in our knowledge, and pointed to new areas for research. This article reviews developments in language testing research and practice over the past twenty years, and suggests some future directions in the areas of professionalizing the field and validation research. It is argued that concerns for ethical conduct must be grounded in valid test use, so that professionalization and validation research are inseparable. Thus, the way forward lies in a strong programme of validation that includes considerations of ethical test use, both as a paradigm for research and as a practical procedure for quality control in the design, development and use of language tests.
Article
Full-text available
While the psychometric and statistical models underlying the design of computer adaptive tests (CAT) are well understood, relatively few working models exist for the purpose of foreign language assessment. Likewise, little has been published concerning the practical considerations affecting the implementation of such tests. In the process of constructing the Monash/Melbourne French CAT, we discovered much about putting testing theory into practice. The present paper reports this experience in three parts. In a preliminary section, we describe the academic context in which the French CAT was created and trialed. This is followed by a detailed consideration of the test presentation platform and operating algorithms. Lastly, we give an evaluation of the first administration of the French CAT, accompanied by a discussion of the test’s reliability and validity as a placement instrument for first-year Australian university students.
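The core adaptive loop such a CAT implements can be stated compactly: estimate ability, administer the unanswered item whose difficulty is nearest the estimate, and update after each response. The sketch below uses an invented five-item bank and a fixed-step ability update; operational CATs like the one described use maximum-likelihood or Bayesian estimation instead.

# Minimal computer-adaptive testing loop over Rasch-style difficulties.
bank = {"i1": -1.0, "i2": -0.3, "i3": 0.2, "i4": 0.9, "i5": 1.5}

def run_cat(responses, theta=0.0, step=0.5):
    remaining = dict(bank)
    for _ in range(3):                      # fixed-length test for the sketch
        item = min(remaining, key=lambda k: abs(remaining[k] - theta))
        b = remaining.pop(item)             # administer the closest item
        correct = responses(item)
        theta += step if correct else -step # simple fixed-step update
        print(f"{item}: b={b:+.1f}, correct={correct}, theta={theta:+.1f}")
    return theta

run_cat(lambda item: item in {"i3", "i4"})  # simulated test taker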
Article
Full-text available
This paper describes the construction and small-scale implementation of a computer program which can be used on a self-access basis to assess secondary school students' ESL listening proficiency. The test involves an extended dictation which is in the form of a dialogue. Subjects both hear and see on the screen (to provide context) the first speaker's utterances, but only hear the second speaker's utterances. After each exchange, subjects have to type in the second speaker's utterance, and the match between their input and the utterance is scored. Results indicate a good correlation with traditional pen-and-paper tests, suggesting that the concept has the potential to assess listening other than by administering a test to a group of subjects via a taped recording at a single sitting.
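The scoring step described here (matching the typed utterance against the target) can be approximated with a similarity ratio so that minor slips earn partial credit. A sketch follows; the 0.9 acceptance threshold is an assumption, not taken from the paper.

from difflib import SequenceMatcher

# Score a typed attempt by character-level similarity to the target utterance.
def match_score(target, typed):
    return SequenceMatcher(None, target.lower(), typed.lower()).ratio()

target = "I'll meet you outside the library at six"
typed = "Ill meet you outside the libary at six"
score = match_score(target, typed)
print(f"similarity = {score:.2f}, accepted = {score >= 0.9}")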
Article
Full-text available
Co-constructing communicative effectiveness is often challenging in English as a lingua franca (ELF): speakers have considerably less to go on in terms of shared expectations of cultural knowledge and linguistic norms. A university environment provides a convenient backdrop for sharing at least academic conventions – although these vary more than might be surmised from the uniform labelling of such event types. This paper looks into some discourse and lexicogrammatical features in academic ELF, using ELFA as the database. The data consists of spoken language, which provides direct access to the ways in which meanings are negotiated in ongoing discourse, and the speech events are typically polylogic. ELF discourse requires close cooperation from the participants, which is reflected in its enhanced explicitness among other things. The explicitation strategies speakers display facilitate mutual comprehensibility and contribute to social cohesion within the multi-participant groups. Such strategies also help overcome the potential problems participants might have in dealing with a variety of formal deviations from ordinary English as a native language (ENL). Most of the time ELF bears a very close resemblance to Standard English, but signs of incipient ELF-specific developments are also in evidence.
Article
Full-text available
Despite improvements in educational indicators, such as enrolment, significant challenges remain with regard to the delivery of quality education in developing countries, particularly in rural and remote regions. In the attempt to find viable solutions to these challenges, much hope has been placed in new information and communication technologies (ICTs), mobile phones being one example. This article reviews the evidence of the role of mobile phone-facilitated mLearning in contributing to improved educational outcomes in the developing countries of Asia by exploring the results of six mLearning pilot projects that took place in the Philippines, Mongolia, Thailand, India, and Bangladesh. In particular, this article examines the extent to which the use of mobile phones helped to improve educational outcomes in two specific ways: 1) in improving access to education, and 2) in promoting new learning. Analysis of the projects indicates that while there is important evidence of mobile phones facilitating increased access, much less evidence exists as to how mobiles promote new learning.
Article
Full-text available
Persistent elements and relationships underlie the design and delivery of educational assessments, despite their widely varying purposes, contexts, and data types. One starting point for analyzing these relationships is the assessment as experienced by the examinee: 'What kinds of questions are on the test?,' 'Can I do them in any order?,' 'Which ones did I get wrong?,' and 'What's my score?' These questions, asked by people of all ages and backgrounds, reveal an awareness that an assessment generally entails the selection and presentation of tasks, the scoring of responses, and the accumulation of these response evaluations into some kind of summary score. A four-process architecture is presented for the delivery of assessments: Activity Selection, Presentation, Response Processing, and Summary Scoring. The roles and the interactions among these processes, and how they arise from an assessment design model, are discussed. The ideas are illustrated with hypothetical examples. The complementary modular structures of the delivery processes and the design framework are seen to encourage coherence among assessment purpose, design, and delivery, as well as to promote efficiency through the reuse of design objects and delivery processes.
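A minimal skeleton of the four-process architecture, with each process reduced to a single Python function; the task bank, exact-match scoring, and proportion-correct summary are placeholder choices, not part of the model itself.

# Activity Selection -> Presentation -> Response Processing -> Summary Scoring
tasks = [{"id": "t1", "prompt": "2 + 2 = ?", "key": "4"},
         {"id": "t2", "prompt": "Capital of France?", "key": "Paris"}]

def activity_selection(done):                # pick the next task
    return next((t for t in tasks if t["id"] not in done), None)

def presentation(task):                      # show it, collect a response
    return input(task["prompt"] + " ")

def response_processing(task, response):     # evaluate the response
    return {"task": task["id"], "correct": response.strip() == task["key"]}

def summary_scoring(evaluations):            # accumulate into a summary score
    return sum(e["correct"] for e in evaluations) / len(evaluations)

evaluations, done = [], set()
while (task := activity_selection(done)) is not None:
    evaluations.append(response_processing(task, presentation(task)))
    done.add(task["id"])
print(f"score = {summary_scoring(evaluations):.0%}")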
Book
This book explores implications for applied linguistics of recent developments in technologies used in second language teaching and assessment, language analysis, and language use. Focusing primarily on English language learning, the book identifies significant areas of interplay between technology and applied linguistics, and it explores current perspectives on perennial questions such as how theory and research on second language acquisition can help to inform technology-based language learning practices, how the multifaceted learning accomplished through technology can be evaluated, and how theoretical perspectives can offer insight on data obtained from research on interaction with and through technology. The book illustrates how the interplay between technology and applied linguistics can amplify and expand applied linguists’ understanding of fundamental issues in the field. Through discussion of computer-assisted approaches for investigating second language learning tasks and assessment, it illustrates how technology can be used as a tool for applied linguistics research.
Book
In 1998 and 1999, three of the largest providers of educational tests introduced computer-based versions of proficiency tests for English as a foreign language. Around the same time, many institutions began to offer Web-based tests for particular language courses and classes. These two phenomena have greatly added to the momentum of work in computer-assisted testing and mean that assessment through computer technology is becoming a fact for language learners in educational settings and therefore for teachers and researchers. This book is the first to consider the theoretical, methodological and practical issues and their implications for language-teaching professionals wishing to engage with computer-assisted assessment. It overviews the work in the field, evaluates examples of assessment though computer technology, and provides language teachers and researchers with practical guidelines for implementation.
Book
University students must cope with a bewildering array of registers, not only to learn academic content, but also to understand course expectations and requirements. While many previous studies have investigated academic writing, we know comparatively little about academic speech; and no linguistic study to date has investigated the range of academic and advising/management registers that students encounter. This book is a first step towards filling this gap. Based on analysis of the T2K-SWAL Corpus, the book describes university registers from several different perspectives, including: vocabulary patterns; the use of lexico-grammatical and syntactic features; the expression of stance; the use of extended collocations ('lexical bundles'); and a Multi-Dimensional analysis of the overall patterns of register variation. All linguistic patterns are interpreted in functional terms, resulting in an overall characterization of the typical kinds of language that students encounter in university registers: academic and non-academic; spoken and written.
Chapter
This chapter summarises the rationale for the development and validation work that took place over 2.5 years before the launch of the computer-based (CB) format of the Cambridge English Young Learners English tests (YLE). Several rounds of trials were carried out in a cyclical way, in a number of different locations across various countries, to ensure data was collected from a representative sample of candidates in terms of geographical location, age, L1, language ability, familiarity with the YLE tests, and experience of using different computer devices – PC, laptop and tablet. Validity evidence is presented from an empirical study, using a convergent mixed methods design to explore candidate performance in and reaction to the CB YLE tests. Regression analyses were conducted to investigate which individual test taker characteristics contribute to candidate performance in CB YLE tests. The results indicate that CB delivery presents a genuine choice for candidates in line with the Cambridge English ‘bias for best’ principle. Positive feedback from trial candidates, parents and examiners suggests that CB YLE tests offer a contemporary, fun, and accessible alternative to paper-based (PB) YLE tests to assess children’s English language ability.
Article
Young children’s literacy experiences at home shape the development of emergent literacy skills. Due to the increasing use of touch screen tablets (e.g., iPads) in homes and early education settings it is important to investigate the relationship between digital tools and emergent literacy. The present study examined the relationships between children’s (N = 57; aged 2 to 4 years) emergent literacy skills and home use of tablets for writing and reading. Correlational analysis showed a positive association between children’s access to apps and print knowledge. A positive association was found between the frequency of writing with tablets and print awareness, print knowledge, and sound knowledge. No associations occurred between emergent literacy skills and frequency of e-book reading. Further research is needed to investigate the effects of tablet writing on emergent literacy development.
Article
Nonverbal behavior plays an integral part in a majority of social interaction scenarios. Being able to adjust nonverbal behavior and influence others' responses are considered valuable social skills. A deficiency in nonverbal behavior can have detrimental consequences in personal as well as in professional life. Many people desire help, but due to limited resources, logistics, and social stigma, they are unable to get the training that they require. Therefore, there is a need for automated interventions to enhance human nonverbal behaviors that are standardized, objective, repeatable, low-cost, and deployable outside of the clinic. In this thesis, I design and validate a computational framework for enhancing human nonverbal behavior. As part of the framework, I developed My Automated Conversation coacH (MACH), a novel system that provides ubiquitous access to social skills training. The system includes a virtual agent that reads facial expressions, speech, and prosody, and responds with verbal and nonverbal behaviors in real time. As part of explorations on nonverbal behavior sensing, I present results on understanding the underlying meaning behind smiles elicited under frustration, delight, or politeness. I demonstrate that it is useful to model the dynamic properties of smiles that evolve through time and that, while a smile may occur in positive and in negative situations, its underlying temporal structure may help to disambiguate the underlying state, in some cases better than humans can. I demonstrate how the new insights and the technology developed in this thesis became part of a real-time system that provides visual feedback to participants on their nonverbal behavior. In particular, the system provides summary feedback on smile tracks, pauses, speaking rate, fillers, and intonation. It also provides focused feedback on volume modulation and enunciation, head gestures, and smiles for the entire interaction. Users can practice as many times as they wish and compare their data across sessions. I validate the MACH framework in the context of job interviews with 90 MIT undergraduate students. The findings indicate that students using MACH are perceived as stronger candidates compared to students in the control group. The results are based on the judgments of independent MIT career counselors and Mechanical Turk workers who did not participate in the study and were blind to the study conditions. Findings from this thesis could motivate further interaction possibilities for helping people with public speaking, social-communicative difficulties, language learning, dating, and more.
Article
Investigating how visuals affect test takers’ performance on video-based L2 listening tests has been the focus of many recent studies. While most existing research has been based on test scores and self-reported verbal data, few studies have examined test takers’ viewing behavior (Ockey, 2007; Wagner, 2007, 2010a). To address this gap, in the present study I employ eye-tracking technology to record the eye movements of 33 test takers during the Video-based Academic Listening Test (VALT). Specifically, I aim to explore test takers’ oculomotor engagement with two types of videos – context videos and content videos – from the VALT, and the relationship between the test takers’ viewing behavior and test performance. Eye-tracking measures comprising fixation rate, dwell rate, and the total dwell time for context and content videos were compared using paired-samples t-tests. Additionally, each measure was correlated with test scores for items associated with each video type. Results revealed statistically significant differences between fixation rates and between total dwell time values, but no difference between the dwell rates for context and content videos. No statistically significant relationship was found between the three eye-tracking measures and the test scores. Directions for future research on video-based L2 listening assessment are discussed.
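A minimal sketch of the comparison methodology described above: per-viewer measures for two video types compared with a paired-samples t-test, and one measure correlated with item scores. The data structures and values below are invented for illustration and do not come from the study.

```python
# Hypothetical per-participant eye-tracking measures (the study had 33
# viewers; 5 toy values are used here for brevity), e.g., total dwell
# time in seconds on context vs. content videos.
import numpy as np
from scipy import stats

context_dwell = np.array([41.2, 38.5, 50.1, 44.8, 39.9])
content_dwell = np.array([55.6, 49.3, 61.0, 57.4, 52.2])

# Paired-samples t-test: the same viewers watched both video types.
t_stat, p_value = stats.ttest_rel(context_dwell, content_dwell)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Correlating a viewing measure with scores on the matching items.
item_scores = np.array([7, 5, 9, 8, 6])
r, p = stats.pearsonr(content_dwell, item_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```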
Article
Two examples demonstrate an argument-based approach to validation of diagnostic assessment using automated writing evaluation (AWE). Criterion ®, was developed by Educational Testing Service to analyze students’ papers grammatically, providing sentence-level error feedback. An interpretive argument was developed for its use as part of the diagnostic assessment process in undergraduate university English for academic purposes (EAP) classes. The Intelligent Academic Discourse Evaluator (IADE) was developed for use in graduate EAP university classes, where the goal was to help students improve their discipline-specific writing. The validation for each was designed to support claims about the intended purposes of the assessments. We present the interpretive argument for each and show some of the data that have been gathered as backing for the respective validity arguments, which include the range of inferences that one would make in claiming validity of the interpretations, uses, and consequences of diagnostic AWE-based assessments.
Article
This article outlines the current state of and recent developments in the use of corpora for language assessment and considers future directions, with a special focus on computational methodology. Since corpora began to make inroads into language assessment in the 1990s, test developers have increasingly used them as a reference resource to become well versed in the linguistic characteristics of expert and novice speakers' usage and to identify the test construct. In developing and validating language tests, large representative corpora, learner corpora, and specialized corpora have been actively used, as they have made it possible to systematically compare the linguistic features associated with expert users with those found in learner language. Recent advances in computational approaches to assessment can greatly facilitate this comparison, using technologies in automated essay scoring and learner language analysis. As an emerging area in the field of language assessment, corpus-based research should extend to less explored areas, including the compilation and longitudinal analysis of developmental corpora, fine-grained microanalysis of learners' development, and assessment attuned to individual learners who use different linguistic varieties.
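As a toy illustration of the expert/learner comparison described above, the sketch below contrasts the relative frequencies of a few lexical features across two miniature "corpora". The corpora and the feature list are invented placeholders, not data from the article.

```python
# Toy sketch: compare relative frequencies of selected lexical features
# in an "expert" corpus vs. a "learner" corpus. Real studies would use
# large corpora and principled feature sets; these are placeholders.

from collections import Counter

def rel_freqs(tokens):
    """Relative frequency of each token in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return Counter({w: c / total for w, c in counts.items()})

expert = "the results suggest that the data indicate a robust effect".split()
learner = "the results is show that the datas is very good".split()

expert_f, learner_f = rel_freqs(expert), rel_freqs(learner)
for feature in ["suggest", "indicate", "very"]:
    print(feature, round(expert_f[feature], 3), round(learner_f[feature], 3))
```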
Book
Ronald K. Hambleton, H. Swaminathan, & H. Jane Rogers.
Article
Testlets are subsets of test items that are based on the same stimulus and are administered together. Tests that contain testlets are in widespread use in language testing, but they also share a fundamental problem: Items within a testlet are locally dependent with possibly adverse consequences for test score interpretation and use. Building on testlet response theory (Wainer, Bradlow, & Wang, 2007), the listening section of the Test of German as a Foreign Language (TestDaF) was analyzed to determine whether, and to which extent, testlet effects were present. Three listening passages (i.e., three testlets) with 8, 10, and 7 items, respectively, were analyzed using a two-parameter logistic testlet response model. The data came from two live exams administered in April 2010 (N = 2859) and November 2010 (N = 2214). Results indicated moderate effects for one testlet, and small effects for the other two testlets. As compared to a standard IRT analysis, neglecting these testlet effects led to an overestimation of test reliability and an underestimation of the standard error of ability estimates. Item difficulty and item discrimination estimates remained largely unaffected. Implications for the analysis and evaluation of testlet-based tests are discussed.
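For reference, the two-parameter logistic testlet response model underlying the analysis is conventionally written as follows; the notation is the standard one from the testlet response theory literature rather than copied from the article:

$$
P(X_{ij}=1 \mid \theta_i) = \frac{1}{1+\exp\!\big[-a_j\big(\theta_i - b_j - \gamma_{i\,d(j)}\big)\big]}
$$

where $\theta_i$ is person $i$'s ability, $a_j$ and $b_j$ are the discrimination and difficulty of item $j$, and $\gamma_{i\,d(j)}$ is a person-specific effect for the testlet $d(j)$ containing item $j$. The variance of $\gamma$ indexes the size of the testlet effect; setting it to zero recovers the standard 2PL model, which is why ignoring nonzero testlet effects inflates apparent reliability and understates the standard error of ability estimates.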
Article
Technology continues to move forward with smaller and faster hardware, more efficient software, multimedia capabilities, and universal reach through advanced telecommunications. Evidence for the successful use of technology in education is overwhelming (Kulik 1994, Snow and Mandinach 1991), provided there is reasoned use of that technology: uses that reflect an understanding of what is to be taught and tested. With the above in mind, this paper reviews current and developing technology uses that are relevant to language assessment and discusses examples of recent linguistic applications from our own laboratory at Educational Testing Service. Below, we describe the processes of language test development and the functions that they serve from the perspective of a large testing organization. We encourage readers to think of technology broadly, to include not only hardware, software, and telecommunications, but other technologies as well, such as job analysis, Item Response Theory (IRT) scaling, standards-setting, and group assessments. Only in this way can the broad activities of language testing, and the relevance of new technologies, be assessed.
Article
Videoconferencing offers new opportunities for language testers to assess speaking ability in low-stakes diagnostic tests. To be considered a trusted testing tool in language testing, a test should be examined employing appropriate validation processes (Chapelle, Jamieson, & Hegelheimer, 2003). While developing a speaking test, language testers need to gather evidence to build a validity argument with theoretical rationales. These rationales should be based on test purpose and validation considerations that affect decision making on test design and validation (Chapelle, 2001). To obtain theoretical soundness in validation, spec-driven test development (Davidson & Lynch, 2002) was applied to speaking test development. Experimental tests were carried out with 40 test takers using face-to-face and videoconferenced oral interviews. Findings indicated no significant difference in performance between test modes, neither overall nor across analytic scoring features. Findings from qualitative data also evidenced the comparability of the videoconferenced and face-to-face interviews in terms of comfort, computer familiarity, environment, non-verbal linguistic cues, interests, speaking opportunity, and topic/situation effects, with little interviewer effect. Data taken from test spec evolution, test scores, post-interviews, and observations were analyzed to build a validity argument using Bachman and Palmer's (1996) usefulness analysis table. The collected evidence suggests that the videoconferenced interview was comparable to the face-to-face interview with respect to reliability, construct validity, authenticity, interactiveness, impact, and practicality.
Article
The web offers new opportunities to realize some of the current ideals for interactive language assessment by providing learners information about their language ability at their convenience. If such tests are to be trusted to provide learners with information that might help to improve their language ability, the tests need to undergo validation processes, but validation theory does not offer specific guidance about what should be included in a validity argument. Conventional wisdom suggests that low-stakes tests require less rigorous validation than high-stakes tests, but what are the factors that affect decisions about the validation process for either? Attempting to make these contributing factors explicit, this article examines the ways in which the purpose of a low-stakes web-based ESL (English as a second language) test guided its design and the validation process. The validity argument resulting from the first phase of the validation process is illustrated.
Article
With the advent of the digital revolution, language testers have endeavored to utilize state-of-the-art computer technology to satisfy the ever-growing need for a tool that measures English communication skills with maximal accuracy and efficiency. Thanks to the concerted efforts of experts in fields such as computational linguistics, computer engineering, computer-assisted language learning, and psychometrics, language testers have recently succeeded in developing computer- and web-based language tests. Among them are the TOEFL CBT by Educational Testing Service and CommuniCAT by the University of Cambridge Local Examinations Syndicate. As with the paper-based language test (PBLT), more rigorous research is now being conducted on the validity of computer-based language tests (CBLT) and computer-adaptive language tests (CALT). Content analyses and comparability studies of PBLT and CBLT/CALT are prerequisites to such validation research. In this context, using an EFL test battery, the Test of English Proficiency developed by Seoul National University (TEPS), the present study addresses the comparability of PBLT and CBLT through content and construct validation, employing content analyses based on corpus-linguistic techniques in addition to statistical analyses such as correlational analysis, ANOVA, and confirmatory factor analysis. The findings support comparability between the CBLT and PBLT versions of the TEPS subtests in question (listening comprehension, grammar, vocabulary, and reading comprehension).
Article
Tests with their many inexact measurements have been the concern of psychometricians for many years. The two issues of adequate precision and a common 'yardstick' for measuring persons of different abilities have been particularly difficult to deal with. Computer-adaptive testing holds great promise in dealing with these and related issues in the field of testing. This paper describes several test variables of concern and explains the concept of computer-adaptive testing and its relationship to these variables.
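A minimal sketch of the computer-adaptive idea: after each response, re-estimate ability on a common scale and administer the unused item that is most informative at the current estimate. The 2PL model and the toy item bank below are illustrative assumptions, not a description of any operational test.

```python
# Minimal CAT loop under a 2PL model: maximum-information item selection
# plus a crude maximum-likelihood ability update. Toy item bank only.

import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, bank, used):
    """Pick the unused item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

def update_theta(theta, responses, lr=0.5, steps=25):
    """Crude maximum-likelihood update by gradient ascent on theta."""
    for _ in range(steps):
        grad = sum(a * (x - p_correct(theta, a, b)) for (a, b), x in responses)
        theta += lr * grad / max(len(responses), 1)
    return theta

bank = [(1.2, -1.0), (0.9, 0.0), (1.5, 0.5), (1.1, 1.2)]  # (a, b) pairs
theta, used, responses = 0.0, set(), []
for x in (1, 1, 0):  # placeholder scored responses
    j = next_item(theta, bank, used)
    used.add(j)
    responses.append((bank[j], x))
    theta = update_theta(theta, responses)
print(f"ability estimate after 3 items: {theta:.2f}")
```

Because every examinee is placed on the same theta scale regardless of which items they saw, the latent scale serves as the common "yardstick" the abstract refers to, and the accumulated item information governs the precision of each estimate.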
Article
Computerization of L2 reading tests has been of interest among language assessment researchers for the past 15 years, but few empirical studies have evaluated the equivalence of the construct being measured in computerized and conventional L2 reading tests and the generalizability of computerized reading test results to other reading conditions. In order to address various issues surrounding the effect of mode of presentation on L2 reading test performance, the present study reviews the literature in cognitive ability testing in educational and psychological measurement and the non-assessment literature in ergonomics, education, psychology, and L1 reading research. Generalization of the findings to computerized L2 assessment was found to be difficult: The nature of the abilities measured in the assessment literature does not necessarily involve language data; mode of presentation studies in the non-assessment literature involving L2 readers are scarce; and there are limitations in the research methodologies used. However, the literature raises important issues to be considered in future studies of mode of presentation in language assessment.
Article
The planned introduction of a computer-based Test of English as a Foreign Language (TOEFL) test raises concerns that language proficiency will be confounded with computer proficiency, introducing construct-irrelevant variance to the measurement of examinees' English-language abilities. We administered a questionnaire focusing on examinees' computer familiarity to 90,000 TOEFL test takers. A group of 1,200 “low-computer-familiar” and “high-computer-familiar” examinees from 12 international sites worked through a computer tutorial and a set of 60 computer-based TOEFL test items. We found no meaningful relationship between level of computer familiarity and level of performance on the computerized language tasks after controlling for English language ability. We concluded that no evidence exists of an adverse relationship between computer familiarity and computer-based TOEFL test performance due to lack of prior computer experience.
Article
Previous simulation studies of computerized adaptive tests (CATs) have revealed that the validity and precision of proficiency estimates can be maintained when review opportunities are limited to items within successive blocks. Our purpose in this study was to evaluate the effectiveness of CATs with such restricted review options in a live testing setting. Vocabulary CATs were compared under four conditions: (a) no item review allowed, (b) review allowed only within successive 5-item blocks, (c) review allowed only within successive 10-item blocks, and (d) review allowed only after answering all 40 items. Results revealed no trustworthy differences among conditions in vocabulary proficiency estimates, measurement error, or testing time. Within each review condition, ability estimates and number-correct scores increased slightly after review, more answers were changed from wrong to right than from right to wrong, most examinees who changed answers improved their proficiency estimates by doing so, and nearly all examinees indicated that they had an adequate opportunity to review their previous answers. These results suggest that restricting review opportunities on CATs may provide a viable way to satisfy examinee desires, maintain validity and measurement precision, and keep testing time at acceptable levels.
Article
The success of computerized instruction in second-language acquisition requires that some FL teachers learn enough about the operation of computers to be able to direct the preparation of computerized materials. The kind of relationship that needs to exist between FL teachers and computer programmers is examined. The present limitations of computerized instruction are analyzed; much remains to be done in error analysis, and adequate computer software must be developed. New technologies in the audio-visual domain (synthetic speech, digital compressed speech, videodiscs, etc.) may provide us with random, immediate-access equipment which is either lacking or too expensive today.
Article
There is no published material in the language testing literature on the process of, or good practice in, developing an interface for a computer-based language test. Nor do test development bodies make publicly available any information on how the interfaces for their computer-based language tests were developed. This article describes a three-phase process model for interface design, drawing on practices developed in the software industry and adapting them for computer-based language tests (CBTs). It describes good practice in initial design, emphasizes the importance of usability testing, and argues that only by following a principled approach to interface design can the threat of interface-related construct-irrelevant variance in test scores be avoided. The article also charts concurrent test development activities that take place during each phase of the design process. The model may be used in CBT project management, and it is argued that the publication of good interface design processes contributes to the mix of validity evidence presented to support the use of a CBT.
Innovations in language testing: Can the microcomputer help? Special Report No 1 Language Testing Update
  • J C Alderson
Alderson, J. C. (1988). Innovations in language testing: Can the microcomputer help? Special Report No 1 Language Testing Update. Lancaster: University of Lancaster.
Computer-enhanced language assessment
  • C Corbel
Corbel, C. (1993). Computer-enhanced language assessment. In G. Brindley (Ed.), Research report series 2, National Centre for English Language Teaching and Research. Sydney: Macquarie University.
Development and research in computer adaptive language testing. Cambridge: University of Cambridge Examinations Syndicate
  • M Chalhoub-Deville
Chalhoub-Deville, M. (Ed.). (1999). Development and research in computer adaptive language testing. Cambridge: University of Cambridge Examinations Syndicate/Cambridge University Press.
Multipurpose language tests: Is a conceptual and operational synthesis possible?
  • J L D Clark
Clark, J. L. D. (1989). Multipurpose language tests: Is a conceptual and operational synthesis possible? In J. E. Alatis (Ed.), Georgetown university round table on language and linguistics. Language teaching, testing, and technology: Lessons from the past with a view toward the future (pp. 206-215). Washington, DC: Georgetown University Press.
Computer-assisted testing of reading comprehension: Comparisons among multiple-choice and open-ended scoring methods
  • G Henning
  • M Anbar
  • C Helm
  • S D'Arcy
Henning, G., Anbar, M., Helm, C., & D'Arcy, S. (1993). Computer-assisted testing of reading comprehension: Comparisons among multiple-choice and open-ended scoring methods. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research. Alexandria: TESOL.
Longman English interactive
  • M Rost
Rost, M. (2003). Longman English interactive. New York: Pearson Education.
A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design (Doctoral dissertation)
  • E Voss
Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3539432).
Diagnosing foreign language proficiency: The interface between learning and assessment
  • J C Alderson
Alderson, J. C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. London: Continuum.
Computer-adaptive test of listening and reading comprehension: The Brigham Young University approach
  • H S Madsen
Madsen, H. S. (1991). Computer-adaptive test of listening and reading comprehension: The Brigham Young University approach. In P. Dunkel (Ed.), Computer-assisted language learning and testing: Research issues and practice (pp. 237-257). New York: Newbury House.
The Lexile framework for reading
  • Metametrics
Technology and language testing
  • C Stansfield
Stansfield, C. (Ed.). (1986). Technology and language testing. Washington, DC: TESOL Publications.
The English language profile -The first three years
  • N Saville
  • R Hawkey
Saville, N., & Hawkey, R. (2010). The English language profile: The first three years. English Language Journal, 1(1), 1-14.
Using mobile phones to improve educational outcomes: An analysis of evidence from Asia. The International Review of Research in Open and Distance Learning
  • J H Valk
  • A T Rashid
  • L Elder
Valk, J. H., Rashid, A. T., & Elder, L. (2010). Using mobile phones to improve educational outcomes: An analysis of evidence from Asia. The International Review of Research in Open and Distance Learning, 11(1), 117-140.