Book

Testing for Language Teachers

Abstract

This book provides an accessible guide to concepts in language testing and the testing of specific skills and systems. It combines theory and practical recommendations to help teachers understand the principles of testing and how they can be applied, supporting them to write better tests. The third edition has been extensively revised and updated to reflect recent developments in the field, while retaining the straightforward approach that made the earlier editions essential reading for trainee and experienced teachers alike. It features new content on technology, including computer adaptive testing and the use of automated scoring for all skills. It also includes an extended discussion of language testers' responsibilities, new chapters on non-testing methods of assessment and a checklist to help teachers choose tests.
... Reliability is the consistency of language test scores, whether they are obtained on separate occasions, with different measurement methods, or with different assessments (Bachman & Palmer, 1996; Hughes & Hughes, 2020). When designing a test, test designers try to minimize measurement inconsistency by keeping the controllable factors that affect test performance under control. ...
... Construct validity in language testing refers to the meaningfulness and appropriateness of the interpretations drawn from test scores, which must be justified by substantial supporting evidence (Bachman & Palmer, 1996). For instance, if all items in a language test were intended to assess specific skills but there was no empirical evidence that they actually measured those skills in test-takers, the construct validity of the test would be in doubt (Hughes & Hughes, 2020). Although the multiple-choice format often focuses on receptive skills (Jackson, 2022; Sato & Ikeda, 2015), incorporating CEFR descriptors into test specifications can enhance both the validity and the reliability of the test (Al Lawati, 2023). ...
... Interactiveness concerns the extent to which completing a test task engages the individual test-taker's language knowledge, topical knowledge, strategic competence, and affective responses, as real-world language use does (Bachman & Palmer, 1996; Cheewasukthaworn, 2022; Hughes & Hughes, 2020). According to Mattoussi (2018) and Ying (2020), improving the interactiveness of tests and test performance requires careful consideration of test-takers' topical knowledge, personal characteristics, and socio-cultural background when designing test content. ...
Article
Chiang Mai Rajabhat University Test of English Proficiency (CMRU-TEP) is an English proficiency test that all CMRU students must take before graduation. Despite its meticulous design, students have room to improve their scores through focused effort and targeted support. This study employs an explanatory sequential mixed-methods design, using surveys and interviews, to explore student perceptions of CMRU-TEP and to propose strategies for improving test performance. Guided by Bachman and Palmer’s (1996) model, the study involves 1,037 fourth-year students: 155 English majors and 882 non-English majors. Both groups consider CMRU-TEP “useful” with respect to the six qualities of test usefulness, and the perceptions of majors and non-majors are similar. To enhance CMRU-TEP, the following recommendations are proposed: 1) develop a test administration handbook; 2) integrate a writing and speaking portfolio into the proficiency assessment; 3) ensure the test aligns with the focus on communicative skills; 4) shorten the test time and reduce the number of tasks; 5) design tasks that stimulate real-world language use; 6) explore computer-based testing as an alternative. The paper concludes by proposing tailored support for English and non-English majors based on their identified needs.
... Moreover, such an assessment process might lead students to view assessment as purely high-stakes, causing stress and fostering anxiety, rather than as best practice through which students' overall potential can be recognized. In addition, heavy dependence on summative assessment might result in bias in grading, whether through subjective marking or inconsistent treatment of students, which might further complicate students' perceptions of the fairness of the whole process (Hughes, 2003). Another essential concern lies in the design and administration of assessment tools. ...
... Both test design and test administration play a pivotal role in shaping students' perceptions of assessment (Cheng, 2017). Poorly designed assessments that lack face validity or do not align properly with course objectives or real-life language use can have undue effects on assessment outcomes (Hughes, 2003). To the best of the researchers' knowledge, no studies have investigated Kurdish EFL students' attitudes toward the assessment process in terms of test and assessment design, administration, purpose, effectiveness, scoring and grading practices, feedback, and washback. ...
... Furthermore, effective control, in the form of careful invigilation during tests, provides all students with an equally secure and controlled setting (Bachman & Damböck, 2018; Crossley, 2022; Van Bergen & Lane, 2014). This is further supported by Hughes (2003), who states that an invigilator's conduct may cause the test taker discomfort, whereas good management and proctoring support student performance and ensure fair and transparent test administration. Likewise, excessive noise can negatively affect students' performance. ...
Article
The current study aims to investigate Kurdish EFL students' views of the assessment process at the EFL departments of public universities in the Kurdistan Region of Iraq (KRI). Because assessment is central to students' learning, involvement, and evaluation, and is the only gauge of their progress and development, the assessment process needs close attention. This study specifically examines Kurdish EFL students' perceptions of six criteria of the testing and assessment process: design, administration, purpose, effectiveness and washback, scoring and grading, and feedback. For data collection, a questionnaire was administered to 116 students in semesters 3, 5, and 7 at the English language departments of some public universities in the KRI during the 2024-2025 academic year. Cronbach's alpha was used to assess the reliability of the questionnaire items, SPSS (version 25) was used to compute item means, and ANOVA was used to compare mean values across the six criteria. Findings indicate significant challenges in the alignment and execution of testing and assessment processes in higher education. Although test and assessment items align with course objectives, they often fail to measure critical thinking and comprehensive language skills adequately. Procedural issues, including unclear instructions, unfair scoring and grading practices, and an overemphasis on grades rather than on students' progress and engagement, have undermined the effectiveness of assessments. Additionally, environmental factors such as cheating, unsupportive classroom dynamics, and poor-quality seating negatively affect students' performance. A lack of constructive feedback further hinders the development of students' overall skills and learning outcomes.
The findings further highlight the need for a holistic approach to assessment that emphasizes student growth, fair evaluation, and the integration of diverse language competencies.
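The abstract reports that Cronbach's alpha was used to check the reliability of the questionnaire items. As a minimal sketch of what that statistic computes, the Python below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), to a respondents-by-items score matrix. The function name `cronbach_alpha` and the four-respondent Likert data are hypothetical illustrations, not the study's data or its SPSS procedure.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses[i][j] is respondent i's score on item j.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's total score
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert answers: 4 respondents x 3 items
scores = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2]]
print(round(cronbach_alpha(scores), 3))  # 0.972
```

Because the same variance function is applied to the items and to the totals, using sample rather than population variance would give the same alpha.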
... Foundational publications in the field of language assessment (e.g., Alderson et al., 1995; Bachman, 1990; Bachman & Palmer, 2010; Brown, 2004; Brown & Abeywickrama, 2018; Hughes, 2003; McKay, 2006) are written in English and rely heavily on research into English as a non-native (foreign/second/additional) language. Most of the research in this book also stems from that context. ...
... Diagnostic assessment serves to establish pupils' initial or current knowledge and skills, most often at the beginning of a school period or a new teaching unit (Hughes, 2003). Diagnostic assessment plays a key role in early foreign language learning because it gives teachers detailed insight into pupils' prior knowledge, cognitive abilities, and potential difficulties. Its purpose is not only to assess current knowledge but also to identify specific areas that require additional support, which allows teaching to be adapted to pupils' individual needs (Hughes, 2003). In this context, diagnostic assessment is the foundation for sound lesson planning, since it enables systematic monitoring of pupils' progress and the development of differentiated teaching strategies. ...
Book
The book provides a comprehensive overview of current research on assessing the language competence of young learners (aged six to eleven). Starting from the developmental characteristics of children in middle childhood, it systematically analyzes the basic principles of teaching a foreign language at an early school age and the different types of assessment, with particular emphasis on a developmentally appropriate approach. The central part of the book is devoted to assessing the four basic language skills (listening, speaking, reading, and writing); each chapter is grounded in a relevant theoretical framework and offers concrete techniques and sample tasks adapted to younger learners. Particular attention is paid to learners' emotional safety, the importance of formative assessment, the use of digital tools, and the encouragement of self-assessment and peer assessment. The book also addresses feedback and error correction. Intended for researchers and teacher educators at higher education institutions who train future teachers of foreign languages for grades one to four of primary school, for student teachers, and finally for teachers themselves, the book serves as a theoretical foundation for designing a fair, encouraging, and effective assessment system in foreign language teaching in the lower grades of primary school.
... Language Testing International (LTI), a US-based test-development organization, defines language testing as "a broad category of testing that assesses aspects of a person's ability to understand or communicate in a particular language." Pioneering researchers and theorists in language assessment (Henning, 1987; Davies, 1990; Brown, 2004; Hughes & Hughes, 2020) identify five major kinds of language test: proficiency, achievement, diagnostic, placement, and aptitude. These tests are used for a variety of purposes, such as assessing language in academic and professional contexts. ...
... High-stakes tests require higher levels of validity because their purposes, such as placement, recruitment, achievement, and proficiency, are critical. The basic argument of test validation is that the whole test, or any of its component parts, should adequately and appropriately measure the language skill it is supposed to measure (Henning, 1987; Brown, 2004; Hughes & Hughes, 2020). Henning stresses that the primary question is the consistency between the test content and the test goal. ...
... Hence, the decisions made on the basis of test scores are also invalid and unreliable. Test validity can also be treated as synonymous with 'accuracy', which Hughes and Hughes (2020) describe in terms of "accurate measures of the test-takers ability" (p. 1). A language test is valid if it measures the intended ability by an accurate method. ...
Article
The current research aims to evaluate the validity of the Recruitment Test for Lecturer English (TRLE) conducted by the Punjab Public Service Commission (PPSC). The test scores are used to allocate subject experts to teach English at degree colleges in Punjab. The researcher follows a mixed-methods approach focusing on quantitative data analysis of the five PPSC-TRLEs held between 2013 and 2022. Theories of language test validity informed the validity argument and guided the researcher in devising a tailored analytical framework. The quantitative analysis and comparison of the five tests reveal that the test under-represents certain areas of language and exhibits inter-test and intra-test inconsistency. The inclusion of inappropriate and insufficient content leads to invalid test development and unfair decisions. The findings suggest that the test is not entirely accurate in assessing candidates’ language proficiency and predicting their language ability. The study recommends further research on the reliability of the same test and on recruitment tests for other subjects.
... She proposed that applying specific rubrics and using multiple raters would improve the reliability of speaking tests. Hughes (2003) further addressed this, recommending formalized rating scales to promote consistency in oral language assessment. Orr (2002) also studied speaking tests and stressed the importance of high-quality assessor training in exactly how to judge fluency, coherence, and pronunciation in spoken performance. ...
... Creating balanced rubrics has also been a formidable task in resource-poor contexts (Brown & Abeywickrama, 2010). The lack of qualified teachers adds to the variability of assessment processes (Hughes, 2003). These challenges are particularly acute in the Global South, where shortages of time, limited resources, and deficient infrastructure complicate robust evaluation. ...
... Urban students reported higher confidence due to more opportunities to practice English, while those from under-resourced areas struggled, reflecting inequities in exposure and resources (Hughes, 2003). ...
Article
Purpose. The current study aims to explore problems with the introduction of authentic speaking assessments in English as a Foreign Language (EFL) courses at Wollo University, Ethiopia. Methodology. Data were collected from 25 ELT teachers and 30 EFL students at the end of their ELT class through structured questionnaires, semi-structured interviews, and document reviews. Quantitative data were analyzed statistically and qualitative data thematically to identify deficiencies in assessment practice. Results. The results identified challenges including the absence of task-relevant context, inconsistent assessment criteria, insufficient resources, limited teacher professionalization, and student-related factors (e.g., anxiety, lack of feedback). Teachers face a major challenge in meeting the assessment needs of different learners while relating assessment procedures to real-life communication. Students, meanwhile, have limited opportunities to speak and to receive valuable feedback. Further challenges include time constraints, limited professional development, and a lack of classroom resources. Qualitative findings identify a critical need for culturally appropriate assessments that recognize Ethiopia's linguistic plurality. Conclusions. The mismatch between assessment methods and students' need for communicative coursework is also a limiting factor. Given the demand for genuine, equitable, and reliable assessment tools to improve students' spoken English, the study recommends targeted teacher training to address these issues, including the use of real-life tasks in assessment and effective exploitation of resources.
... Test validity is defined as a quality that affects writing assessment (Kunkun, 2015). Hughes (2003) explains that a test is valid if "it measures accurately what it is intended to measure" (p. 26). To evaluate research quality, findings must be used with great care. ...
... Test reliability shows that the operationalized measures of a study, when repeated by other researchers, produce similar and consistent findings. This definition is similar to those of Hughes (2003) and Noble and Smith (2015), who state that reliability requires consistent analytical procedures and test scores. Moreover, it has also been reported that a higher level of reliability and validity indicates appropriate data collection methods and appropriate use of results in decision-making or drawing conclusions (Noble & Smith, 2015). ...
... Other reliability considerations in writing tests include external evidence, reliability and correlation coefficients, test stability, consistency of test items and scores, and inter- and intra-rater reliability (Alderson et al., 1995; Hughes, 2003; Lissitz & Samuelsen, 2007; Kunkun, 2015). Table 1 outlines how the marking criteria for reliability were established. ...
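Inter-rater reliability is often reported as a correlation between two raters' marks; when raters assign categorical bands, Cohen's kappa is a common alternative that corrects raw agreement for the agreement expected by chance. The snippet does not name a specific statistic, so the sketch below is an assumption: a plain-Python kappa for two raters, with a hypothetical function name and invented band scores for ten scripts.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on categorical bands."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    n = len(rater_a)
    # Observed agreement: share of scripts given the same band by both raters
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned bands independently
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        # Both raters used one identical band for every script; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical bands given by two raters to the same ten writing scripts
rater_1 = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]
rater_2 = ["A", "B", "C", "C", "A", "B", "C", "B", "B", "A"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.697
```

Here the raters agree on 8 of 10 scripts (80 per cent), but because 34 per cent agreement would be expected by chance alone, kappa is lower than the raw agreement figure.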
... They define washback as the effect of tests on various aspects of the educational process, encompassing teachers, students, content, methods, pacing, sequencing, depth, degree, and attitudes towards teaching and learning. Washback is widely understood as the influence of tests on teaching and learning (Pearson, 1988; Hughes, 1989; Bailey, 1996; Alderson & Wall, 1993). Bailey (1996) acknowledges that washback "can be either positive or negative, promoting or hindering the achievement of educational goals" (p. ...
... Depending on its orientation, a test can significantly affect daily teaching and learning practices and thus guide educators in improving the process. Hughes (1989) argues that positive washback can be promoted by designing direct, cost-effective, objective-driven, and criterion-referenced tests. Negative washback, on the other hand, arises from teaching focused solely on test preparation, which neglects the development of essential skills expected of learners, or from misalignment between curriculum goals and examination content (Cheng & Curtis, 2004). ...
... The initial, largely negative perception of washback has evolved to encompass both positive and negative effects. Hughes (1989) advocated assessments that promote desirable skills, while Xin (2021) emphasised optimising tests for teachers and learners. Building on this, researchers have identified six key dimensions of washback: direction (positive or negative), extent (reach across different stakeholders), intensity (variation with test stakes), intentionality (planned or unintended effects), length (short-term or enduring impacts), and specificity (precise domains affected) (Bachman & Palmer, 2010; Green, 2007; Watanabe, 2004; Cheng, 2005). ...
... According to Hughes (2003), a test is considered valid if it accurately measures what it is intended to measure. Language tests are designed to assess theoretical constructs such as reading ability, speaking fluency, and control of grammar. ...
... In her view, a reliable English test is one that consistently measures the intended constructs under all conditions. Hughes (2003) identified two primary reasons for test unreliability: first, the interaction between the individual taking the test and the characteristics of the test itself; and second, the scoring process, which can also introduce sources of unreliability. ...
Article
Different testing tasks are used to measure learners’ awareness of English grammar. These tasks fall into two main types: tasks that test recognition of correct language structures and tasks that test the ability to produce acceptable language forms. This research applied an empirical approach to measure the consistency between rule recognition tasks and language production tasks in testing English grammar. It also aims to shed light on the validity and reliability of these tasks as a means of testing grammatical ability. The author used a split-half exam to test a group of 30 students: half of the exam consisted of rule recognition items and the other half of language production items. The correlation between the scores on the two halves was calculated, and the results were highly consistent. The results suggest that both methods can give a clear evaluation of learners’ grammatical ability.
Key words: grammar testing, rule recognition tasks, language production tasks
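The split-half procedure described above can be sketched as follows, under stated assumptions: scores on the two halves are correlated with Pearson's r, and the half-test coefficient is then stepped up with the Spearman-Brown formula, 2r / (1 + r), which is the usual way to estimate full-test reliability from two halves. The abstract does not say whether that correction was applied; the function names and the five students' scores are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_reliability(half_a: list[float], half_b: list[float]) -> tuple[float, float]:
    """Return (half-test r, Spearman-Brown estimate for the full-length test)."""
    r_half = pearson_r(half_a, half_b)
    r_full = 2 * r_half / (1 + r_half)  # each half is only half as long as the test
    return r_half, r_full


# Hypothetical scores of five students on the two halves of one exam
recognition = [12, 15, 9, 18, 14]
production = [11, 14, 10, 17, 12]
r_half, r_full = split_half_reliability(recognition, production)
print(round(r_half, 3), round(r_full, 3))  # 0.954 0.977
```

The step-up is needed because each half is a shorter test than the whole, so the raw half-test correlation understates the reliability of the full exam.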
... Language testing is one of the obligatory steps in most educational systems; consequently, the role of the examination is crucial. To be valid, "a test must accurately measure what it is intended to measure" [1,26]. ...
... Therefore, testing tends to have a considerable influence on learners and language instructors. One of the central effects of tests described in the literature is known as 'backwash' or 'washback' [1,23]: the impact that a test tends to have on both teaching and learning [3], [4], [5]. The literature analysis also showed that backwash influences both teachers and learners and can encourage or discourage further learning [6,98]. ...
Article
This paper describes how an interview guide, one of the data collection tools in qualitative research design, is created and validated. An interview guide allows careful planning of the structure and content of an interview, which may in turn allow deeper analysis of the amount and scope of information collected. Preparing a meaningful and insightful interview guide that covers all significant areas to be researched requires careful planning and preparation. The method implemented is based on two existing models for interview validation. The research presented describes the design and development of the interview guide, followed by a practical illustration of how it can be validated and piloted on the basis of the existing framework. The main aim of the interview guide is to collect data for the qualitative part of the study of the Basic English Examination and to contribute to the validation of this crucial assessment procedure from a broader perspective. Overall, the paper offers valuable insights into qualitative research and supports the effective development of research tools or the improvement of existing methodology.
... Finally, because the purpose of this test was to measure participants' pragmatic ability and their competence in the concept being tested, the researcher calculated the facility value (difficulty index) of each test item to investigate the degree to which each item measured what it was supposed to measure. The higher the value, the easier the item (Hughes, 2003). The facility value is the proportion of learners who answered an item correctly. ...
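The facility value defined above is simple to compute: for each item, divide the number of correct answers by the number of test takers. A minimal sketch, assuming responses are stored as one list of chosen options per test taker alongside an answer key of the same length; the function name `facility_values` and the data are hypothetical.

```python
def facility_values(answers: list[list[str]], key: list[str]) -> list[float]:
    """Facility value per item: share of test takers who chose the keyed answer.

    Values run from 0 (nobody correct) to 1 (everybody correct);
    a higher value means an easier item.
    """
    n_takers = len(answers)
    return [
        sum(response[i] == key[i] for response in answers) / n_takers
        for i in range(len(key))
    ]


# Hypothetical answer key and responses from five test takers
key = ["b", "a", "d", "c"]
answers = [
    ["b", "a", "d", "c"],
    ["b", "a", "d", "a"],
    ["b", "a", "b", "b"],
    ["b", "c", "a", "d"],
    ["a", "b", "c", "d"],
]
print(facility_values(answers, key))  # [0.8, 0.6, 0.4, 0.2]
```

Multiplying each value by 100 gives the percentage form in which facility values are sometimes reported.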
... To ensure consistency of measurement, the researcher took several points into account in designing the instrument: (a) explicit instructions were provided so that participants understood clearly what was expected of them; (b) participants were identified anonymously by ID number rather than by name, to encourage more reliable and accurate answers; and (c) the researcher acknowledged that a test with only a few items often fails to represent participants' ability. The more items a test has, the more reliable it becomes (Hughes, 2003). Participants were asked to analyze four types of conversational implicature twice in the test. ...
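The point that longer tests tend to be more reliable is commonly quantified with the Spearman-Brown prophecy formula, which the snippet itself does not cite. Assuming the added items are parallel to (of similar quality to) the existing ones, a test with reliability \(\rho_1\) that is lengthened by a factor \(k\) has the predicted reliability below; the value 0.60 is purely illustrative.

```latex
\rho_{k} = \frac{k\,\rho_{1}}{1 + (k - 1)\,\rho_{1}}
\qquad\text{e.g. } \rho_{1} = 0.60,\ k = 2:\quad
\rho_{2} = \frac{2(0.60)}{1 + 0.60} = \frac{1.20}{1.60} = 0.75
```

Tripling the same test (k = 3) would give 1.80/2.20 ≈ 0.82, so each further lengthening buys a smaller gain in reliability.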
Article
The present study investigates the effectiveness of a consciousness-raising approach in interpreting conversational implicature using audiovisual input. The study was conducted with 126 Saudi female students at the Department of English Language and Literature at Al-Imam Muhammad Ibn Saud Islamic University in Riyadh. The experimental group was exposed, both deductively and inductively, to 12 video extracts on four types of conversational implicature (i.e., irony, indirect criticism, manner, and relevance) taken from the American sitcom Friends. The control group received no treatment and was taught from the coursebook. Both groups completed a pre-test and a post-test in the form of a multiple-choice discourse completion test. Findings revealed that the consciousness-raising approach was effective in facilitating foreign language learners’ interpretation of conversational implicature types. In addition, the experimental group's performance in analyzing pragmalinguistic and sociopragmatic features improved significantly, which indicates the effectiveness of focused attention directed to these features in context. However, the non-significant improvement in the experimental group’s analysis of metapragmatic features implies that focused attention is not necessary for interpreting all pragmatic features and that global attention is more effective in raising awareness of the relationship between language and context based on social factors (power and distance) between interlocutors.
... There will be a washback effect, which may be positive or negative. Hughes (1989) defined the washback effect as the impact of language testing on teaching and learning. He also distinguishes two types: positive and negative. ...
... (2) Hughes's "PPP" Washback Effect Model

Hughes (1989) proposed a three-part model of the washback effect (3P or PPP), a "Participants-Process-Product" model that represents the mechanism of washback. Participants are learners, teachers, test administrators, test material editors, syllabus makers, publishers, etc. Process refers to participants' behaviors in the learning process, such as improving teaching methods, formulating plans, compiling textbooks, etc. Product refers to the results participants obtain after taking the test, such as knowledge, skills, and quality of learning. ...
Article
This study uses the literature review method to explore the washback effect of HSK (Hanyu Shuiping Kaoshi) on students’ learning and teachers’ teaching. It finds that HSK has a washback effect on both learning and teaching, and that the positive effects on both outweigh the negative ones. In general, the washback effect on learning is more significant than that on teaching, and the motivation for taking the test, perceptions of HSK, the difficulty of the test, and the regulations of colleges and universities are the main factors affecting the intensity of the washback effect. Based on these findings, the paper offers practical suggestions for students, teachers, and the test itself, in order to make full use of the positive washback effect of HSK on Chinese teaching and learning and to promote the development of both the test and teaching. Finally, the paper discusses the limitations of the study and directions for future research.
... QAE2 and QAE3 asked the participants to express their thoughts on the impact of the English Baccalaureate Exam on their teaching practices in terms of WHAT and HOW they teach [59]. ...
... Therefore, continuous professional development is required to help teachers reflect on their current teaching and assessment practices and develop positive attitudes and action plans so that they become more effective. Such professional development should be designed to motivate teachers to participate and to learn how to prepare their students for the English Baccalaureate Exam without generating negative washback [59][60][61][62]. ...
... Both the NWSC and DST were therefore screened for the occurrence of these four linguistic processes. Criterion-related validity compares test results with those provided by an independent and dependable assessment of the test taker's capacity (Hughes, 2013). In this study, we used students' academic grades (GPA) and their English academic reading, as measured by their IELTS³ scores, as the criterion measures against which the NWSC and DST tasks were validated. ...
... The four scores indicate a weak to moderate positive correlation. For a better understanding of the level of agreement between these scores, the coefficient should be squared and expressed as a percentage (Hughes, 2013). After this calculation, the scores on the NWSC predicted 8 per cent of the variation in students' GPA and 8 ...

Fig. 2. ICC curves for the sixteen DST items.
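The conversion mentioned above squares the correlation coefficient and multiplies by 100 to give the percentage of variance that two measures share. A minimal sketch with hypothetical function names; the coefficient 0.28 is not reported in the snippet and is used only because its square (0.0784) rounds to the 8 per cent figure given there.

```python
from math import sqrt


def shared_variance_percent(r: float) -> float:
    """Percentage of variance two measures share: 100 * r squared."""
    return 100 * r ** 2


def r_from_percent(percent: float) -> float:
    """Size of the correlation implied by a shared-variance percentage (sign is lost)."""
    return sqrt(percent / 100)


print(round(shared_variance_percent(0.28), 2))  # 7.84, i.e. roughly 8 per cent
print(round(r_from_percent(8), 3))              # 0.283
```

Squaring shows why a weak to moderate correlation accounts for only a small share of variance: even r = 0.5 corresponds to 25 per cent.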
... Promoting computer-based diagnostic assessment, Hughes (2003) described what digital diagnostic tests should look like and offered a possible solution to the numerous problems of designing diagnostic tests. Following Hughes (2003), Alderson (2005) listed a number of hypothetical features of diagnostic assessment, such as (a) focusing explicitly on remedies for future performance, (b) providing detailed analysis of problematic responses to particular items or tasks, (c) being more likely to be discrete-point than integrative, (d) focusing more on 'low-level' language skills such as sound discrimination or letter-sound correspondence, and (e) giving immediate diagnostic feedback. Recently, in a seminal work, Jang and Wagner (2014) contrasted L2 diagnostic assessment with traditional test feedback, arguing that traditional feedback is commonly product-oriented and relies on test scores or other summative information, whereas diagnostic feedback is more specific and learner-oriented, targeting L2 learners' language processes and cognitive strengths and weaknesses. ...
Article
Grounded in Vygotsky's sociocultural theory of mind and the learner-centered approach to second/foreign language acquisition (SLA), this study investigated the extent to which embedded differentiated instruction and diagnostic assessment, mediated through the Google Meet™ computer-mediated communication platform, would improve mixed-ability English-as-a-Foreign-Language (EFL) learners' pronunciation of English words and their degree of engagement in language learning. In a repeated-measures design, an intact group of 66 EFL learners was divided into three tiers (higher, mid-, and lower achievers) to complete a virtual pretest of listening comprehension, followed by a weekly series of parallel tiered performance tasks on English word pronunciation. Their task outcomes were then subjected to collective computer-mediated diagnostic assessment. After 10 sessions of intervention, the participants took an immediate virtual posttest of listening comprehension and a post hoc interview. A mixed between-within subjects analysis of variance (ANOVA) indicated significant learning progress by the tiers, as well as the outperformance of the lower achievers on the tiered tasks. An analysis of covariance (ANCOVA) similarly showed significant improvement in the tiers' performance on the pretest-posttest summative assessment. Inductive content analysis of the participants' responses to the structured interview yielded seven themes, interpreted as the participants' strong approval of the usefulness of differentiated instruction, the effectiveness of diagnostic assessment, and the appeal of the Google Meet platform.
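The diagnostic features attributed to Alderson (2005) above, in particular detailed analysis of responses to individual items, a focus on low-level skills, and immediate feedback, can be illustrated with a small per-skill profile. This Python sketch assumes each response is tagged with the sub-skill its item targets; the function name, skill labels, and the 60 per cent threshold are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def skill_profile(responses, threshold=0.6):
    """Summarise item responses by sub-skill.

    responses: iterable of (skill_label, is_correct) pairs.
    threshold: hypothetical cut-off below which a skill is flagged as a weakness.
    Returns {skill: (proportion_correct, "weakness" or "strength")}.
    """
    attempted = defaultdict(int)
    correct = defaultdict(int)
    for skill, is_correct in responses:
        attempted[skill] += 1
        correct[skill] += int(is_correct)
    profile = {}
    for skill, n in attempted.items():
        rate = correct[skill] / n
        profile[skill] = (rate, "weakness" if rate < threshold else "strength")
    return profile

# Hypothetical responses from one learner on four items
learner = [
    ("sound discrimination", True),
    ("sound discrimination", False),
    ("letter-sound correspondence", True),
    ("letter-sound correspondence", True),
]
for skill, (rate, label) in skill_profile(learner).items():
    print(f"{skill}: {rate:.0%} correct ({label})")
```

Unlike a single total score, this output points to the specific sub-skill that needs remedial work, which is closer to the learner-oriented feedback Jang and Wagner (2014) contrast with product-oriented score reports.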
... In summary, speaking is an oral form of language that includes the phonological system, the grammatical system, or both. Speaking also means conveying one's thoughts clearly, precisely, fluently, and accurately, and it encompasses five key components: pronunciation, vocabulary, grammar, fluency, and comprehension (Arthur Hughes, 2010, as cited in Murti & Jabu, 2022). ...
Article
This study investigated the impact of the Free4Talk online language platform on university students' English speaking skills. Using a quantitative descriptive approach with total sampling of 50 participants, the research employed a speaking test to evaluate the students' English speaking skills. Partial Least Squares (PLS) analysis revealed a significant positive effect of Free4Talk usage on students' speaking skills (t-statistic = 17.140, p < 0.05). The platform had a substantial influence across all speaking components, with particularly strong effects on fluency (loading factor = 0.907) and grammar (loading factor = 0.903). The R² value of 0.637 indicates that 63.7% of the variation in students' speaking skills is accounted for by Free4Talk usage. The findings indicate that Free4Talk is an effective tool for enhancing university students' English speaking skills by providing authentic interaction opportunities and immediate feedback mechanisms. The descriptive analysis further illustrates patterns of language development across different speaking components, offering insights into how virtual language exchange platforms can be optimally integrated into EFL education.
... Different explanations can be given for the discrepancies between the tutors' assessments and the raters' scores. One explanation is that, although the tutors were following criterion-referenced assessment, as is usually the case when using the CEFR scales (Fleckenstein et al. 2018; Hughes 2002), they may still have tended to compare students within or between their classes (norm-referenced assessment) (Fleckenstein et al. 2018; Lok et al. 2016). However, the grades assigned by the tutors were the most discriminating (different average CEFR levels were assigned to elementary, intermediate and advanced level students), whereas students and raters gave the same levels to elementary and intermediate students. ...
Article
This study explores the writing proficiency levels of Saudi Arabian medical track students after completing a one-year Preparatory Year Programme (PYP), as well as the applicability of the Common European Framework of Reference for Languages (CEFR) in assessing their proficiency. The standardized writing exam administered at the end of the PYP revealed a ceiling effect, with the majority of students achieving high scores, despite the fact that the PYP teaches English at three different levels (beginner, intermediate, advanced). To obtain a more nuanced understanding of students’ writing skills, alternative assessment methods were explored using selected CEFR scales, including self-assessment, tutor assessment, and assessment by raters recruited from the UK (experts in using CEFR scales). The study aimed to determine if these CEFR-based assessments can reliably differentiate among the three PYP levels, and if the CEFR scales are practical and applicable in this context. The findings show that the CEFR-based scores from all three assessor groups can reliably separate students according to their PYP level. The results highlight that the CEFR can serve as a valuable tool for understanding students' writing proficiency, even in non-European settings. This study encourages further exploration in the use of CEFR scales to assess proficiency levels.
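The distinction between criterion-referenced and norm-referenced assessment drawn in the excerpt above can be made concrete. In this Python sketch, a criterion-referenced decision compares a score with fixed cut scores, so it does not depend on who else took the test, whereas a norm-referenced result (here a percentile rank within the class) does. The cut scores, their mapping to CEFR labels, and the class data are all hypothetical; real CEFR alignment would require a standard-setting procedure.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-100 scale; not an official CEFR alignment.
CUT_SCORES = [40, 55, 70, 85]
LEVELS = ["A2", "B1", "B2", "C1", "C2"]

def criterion_referenced(score):
    """Return a level by comparing the score with fixed cut scores only."""
    return LEVELS[bisect_right(CUT_SCORES, score)]

def norm_referenced(scores):
    """Return each score's percentile rank: the percentage of the group scoring lower."""
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]

# Two hypothetical classes that both contain a student scoring 60
class_a = [45, 60, 62, 80]
class_b = [60, 75, 88, 90]
for name, group in [("Class A", class_a), ("Class B", class_b)]:
    ranks = norm_referenced(group)
    i = group.index(60)
    print(f"{name}: score 60 -> level {criterion_referenced(60)}, percentile rank {ranks[i]:.0f}")
```

The same score of 60 receives the same level in both classes under the criterion-referenced rule, but a different percentile rank in each class, which is the kind of drift toward within-class comparison the excerpt suggests tutors may have made.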
... Moreover, the test can check students' knowledge before a particular course begins. Hughes (1989) adds that diagnostic tests are meant to identify students' strengths and weaknesses. Heaton (1990) compares this type of test to the diagnosis of a patient, with the teacher as the doctor who makes the diagnosis. ...
Article
This study explores the significance and application of diagnostic tests in English language teaching. Diagnostic tests serve as essential tools for identifying students' strengths and weaknesses, enabling teachers to tailor instruction and enhance learning outcomes. The paper discusses theoretical perspectives on testing, emphasizing the role of diagnostic assessments in curriculum design and student motivation. It highlights the challenges associated with test construction, the impact of testing on learners' attitudes, and strategies for effective evaluation. The findings suggest that well-designed diagnostic tests can facilitate more targeted teaching interventions, improving students' language proficiency and overall learning experience.
... A central aspect of formative assessment in communicative grammar instruction is its multidimensional approach, incorporating self-assessment, peer assessment, and teacher feedback (Hughes, 2003). Self-assessment fosters learner autonomy, encouraging students to take responsibility for their own learning by evaluating their strengths and weaknesses (Cauley & McMillan, 2010). ...
Conference Paper
Foreign language teaching employs diverse methods and approaches that have evolved significantly over time. In response to critiques of earlier methodologies, the communicative approach emerged, emphasizing effective, sustainable, and practical language instruction. The communicative approach, along with its methodological and didactic principles, has reshaped teaching and learning practices, particularly in the assessment and evaluation of language skills. This study aims to explore the perspectives of German teacher candidates regarding the application of formative assessment techniques in communicative grammar instruction. The central research question guiding this inquiry is: "What are the perceptions of German teacher candidates on the integration of formative assessment techniques in communicative grammar lessons?" The study employs a qualitative research design and adopts a descriptive analysis framework. The purposive sample consists of 14 first-year German teacher candidates enrolled in the German Teaching Department at Trakya University. Over the course of four weeks, communicative grammar lessons were conducted, followed by data collection through semi-structured focus group interviews. The qualitative data were transcribed and systematically analyzed using the MAXQDA software, adhering to rigorous content analysis protocols. The findings offer valuable insights into German teacher candidates’ perceptions, emphasizing the role of assessment techniques in practice-oriented teacher training. The results underline the significance of these techniques in further developing communicative teaching methods, particularly in fostering practical and student-centered grammar instruction. Future research could build on these findings by involving larger and more diverse samples or by conducting interdisciplinary comparative studies.
... The following can be seen in Tables 1 and 2 below.
[Table (partial): Poor = 0-40. Note: Brown (2004) and Hughes (2013).] ...
Article
Introduction: This study explored the effects of the Gallery Walks technique on students' academic performance and self-confidence, considering gender differences. Objective: To assess the impact of a project-based strategy integrated with physical exercise through the Gallery Walks technique on students' academic performance and self-confidence. This study accounted for gender-based variations in its analysis. Methodology: The research employed a quasi-experimental methodology, incorporating a pre-test and post-test framework. The study encompassed 40 second-semester students from Universitas PGRI Palembang. Participants were randomly assigned to two groups: an experimental group and a control group, each comprising 20 students (10 males and 10 females) to ensure equitable gender representation. Results: The results showed that the experimental group achieved significant improvements in academic performance (+9.10 points, Sig. = 0.000) and self-confidence (+23.85 points, Sig. = 0.000), while the control group showed little improvement. Female students outperformed their male counterparts academically, while male students displayed greater self-confidence. Discussion: The Gallery Walks method markedly enhanced students' academic performance and self-assurance, with variations in impact according to gender. Conclusions: This study advocates for the broader implementation of this learning technique, emphasizing the importance of addressing gender disparities to enhance student learning results.
... Their insights, however, were largely overlooked. It was not until the 1980s that washback was recognized as a vital but complex educational phenomenon (Hughes, 1988). In their seminal work entitled "Does washback exist?", ...
Article
The present research employed bibliometric analyses to systematically examine empirical washback studies published in applied linguistics journals over the past three decades (1993 to 2023). The research identified the distribution of washback scholarship globally and in Asia, the key research themes, and their evolution over time. The primary studies were retrieved from three established databases (Web of Science, Scopus, and ERIC) and supplemented by tracking the reference lists of recent publications. Only English-language journal articles reporting empirical studies were included in the final collection (N = 243). The analysis focused on the distribution of research contexts, the most productive journals and authors, prominent research themes, and their evolution over time. The 243 washback studies, published in 149 journals, were conducted in 40 educational contexts worldwide by 386 researchers. They showed pronounced distribution characteristics: Asian countries and regions led this line of research and also differed from research outside Asia in research foci and methods. The most productive and impactful journals, authors, and references of washback research over the past 32 years, alongside the evolving basic, niche, and motor research themes, provide valuable information for understanding where the field has been and where it should go.
... According to Hedge (2000), fluency means that speakers can speak clearly, use appropriate intonation, and link the ideas they express cohesively and coherently. Hughes (2003) states that fluency is the speaker's ability to talk about a topic without stumbling. According to Leong and Ahmadi (2017), speaking skills can integrate the other language skills. ...
Article
This paper presents the process of developing and implementing the Soal Selidik Kemahiran Interaksi Pertuturan Bahasa Melayu (SSKIPBM; Malay Language Speaking Interaction Skills Questionnaire) based on the CIPP Evaluation Model. The study focuses on only three aspects: context evaluation, input evaluation, and process evaluation. The methodology consisted of document analysis of several questionnaires developed by other researchers. The procedure involved comparing each feature or element that a questionnaire should contain, to ensure that the questionnaire developed is of good quality. The SSKIPBM was then developed with reference to all three evaluations (context, input, and process) in the CIPP Evaluation Model. The findings show that the teacher questionnaire developed on the basis of the CIPP Evaluation Model contains 8 items for the context aspect, 15 items for the input aspect, and 10 items for the process aspect, plus two open-ended questions. The student questionnaire contains six items for the context aspect, eight items for the input aspect, and 11 items for the process aspect, plus two open-ended questions. In conclusion, the SSKIPBM provides a comprehensive questionnaire on speaking interaction skills. As an implication, teachers or researchers can use the SSKIPBM to obtain information on how effectively speaking interaction skills are being implemented.
... For the experimental part of the study, the "DİLKOB Test" was developed and used as a quantitative data-collection instrument. Achievement tests are administered at the end of a course or programme to measure the extent to which an individual has attained its learning objectives (Hughes, 2003; Cizek, 2010). In this study, an achievement test was developed to measure the knowledge levels of teachers who have students with speech and language difficulties. ...
Thesis
The purpose of this study is to evaluate the effectiveness of the "Speech and Language Difficulties Teacher Training Program (DİLKEP)", developed for teachers working with students who have speech and language difficulties, on the teachers' competencies in this field. The study used an exploratory sequential mixed methods design and was conducted in three stages. In the first stage, to determine teachers' needs regarding the education of students with speech and language difficulties, semi-structured interviews were conducted with 25 teachers who had such students. The data obtained were analyzed using content analysis. In the second stage, the "Speech and Language Difficulties Teacher Achievement Test (DİLKOB)" and the DİLKEP Training Program were developed and implemented. This stage used a pretest-posttest control group experimental design. Eighty teachers in Istanbul who worked with students with speech and language difficulties volunteered to participate, 40 in the experimental group and 40 in the control group. Quantitative data were collected with the DİLKOB Test from the experimental group before and after the DİLKEP Training Program, and from the control group, which received no training, at pretest and posttest.
The data were analyzed using dependent and independent samples t-tests. In the final stage, semi-structured interviews were conducted with 20 teachers in the experimental group to gather their views on the DİLKEP Training Program, and the data were analyzed using content analysis. The results indicated a statistically significant difference between the pretest and posttest DİLKOB scores of the experimental group. Teachers in the experimental group, who participated in the DİLKEP Training Program, showed significantly higher overall achievement and sub-dimension scores than the control group. In other words, the DİLKEP Training Program effectively enhanced the teachers' knowledge and practical competencies in the field of speech and language difficulties. Additionally, interviews conducted with 20 teachers in the experimental group after the program indicated that they felt more competent in this field and that the program positively influenced their knowledge, skills, and attitudes. Overall, the findings indicate that the training program increased the competencies of teachers working with students with speech and language difficulties and was effective in imparting knowledge, skills, and attitudes in this area.
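The thesis analyzes its DİLKOB scores with dependent (paired) samples t-tests for pretest-posttest change within a group and independent samples t-tests for comparing the experimental and control groups. This Python sketch computes only the two t statistics and their degrees of freedom. It assumes the pooled-variance (equal-variance) form of the independent test, since the abstract does not say which variant was used, and it leaves p-values to a statistics package. The function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Dependent-samples t for pretest/posttest scores of the same people; df = n - 1."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre/post lists with at least two people")
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    # Mean gain divided by its standard error (fails if every gain is identical)
    t = mean(gains) / (stdev(gains) / math.sqrt(n))
    return t, n - 1

def independent_t(group1, group2):
    """Independent-samples t with pooled variance (assumes equal variances); df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

# Hypothetical scores for three teachers (paired) and two small groups (independent)
print(paired_t([10, 12, 14], [12, 15, 17]))
print(independent_t([5, 7, 9], [1, 3, 5]))
```

A paired test suits the experimental group's pretest and posttest because the same teachers appear in both lists; the independent test suits the experimental-versus-control comparison because the two groups contain different teachers.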
... A key concept in understanding how SBA influences instructional practices and student learning is the washback phenomenon, which refers to the effects of assessments on teaching, learning, and curriculum implementation (Alderson & Wall, 1993;Cheng, 1998;Hughes, 1989). Washback is not a linear process; its impact depends on various factors at both macro and micro levels of an education system. ...
Article
This study explores the contextual factors mediating the washback effects of Malaysia's learning-oriented English language assessment reform, implemented at the lower-secondary level. The reform aims to balance formative and summative assessments to foster meaningful learning, reduce test-oriented teaching, and enhance critical thinking skills. Employing a mixed-methods approach, the study draws on data from document analysis, semi-structured interviews, and a survey. Participants of this study were 2 policymakers, 6 school administrators, 9 teachers from three secondary schools in Penang, a state in the northern region of Malaysia, and 124 teachers from four states in the northern region of Malaysia. The findings highlight both opportunities and challenges in implementing the assessment reform. While intended washback effects include enhanced communicative teaching, the integration of higher-order thinking skills (HOTS), and formative assessment practices, systemic contextual factors at the macro and micro levels impede these outcomes. Key challenges include inadequate teacher training, resource constraints, and societal emphasis on grades, which reinforce central summative assessment over the School-Based Assessment (SBA) component. Notably, the study identifies the paradox of central summative assessments undermining formative assessment goals despite their complementary role. The results underscore the need for targeted professional development, better stakeholder engagement, and policy recalibration to align assessment practices with reform objectives. These insights contribute to the understanding of washback from SBA, emphasizing the contextual dynamics of educational reforms.
... The influence of high-stakes tests on teaching and learning, a phenomenon known as "washback" or "backwash" (Hughes, 1989), has been extensively studied in applied linguistics, particularly following Alderson and Wall's (1993) work, which shifted the field from theoretical exploration to empirical investigation. Since then, research has illuminated the complex nature of washback, revealing its variability in intensity (Cheng, 1997), value (Watanabe, 2004), intentionality (McNamara, 1996), scope, and sustainability (Huang, 2011). ...
Article
Although it has been well-noted that high-stakes tests can intensify washback effects and contextual factors can mediate the manifestation of these effects, how these effects are influenced by policy-driven variables remains underexplored. This study adopted a hybrid design combining longitudinal tracking and cross-sectional approaches, examining the changes and sustainability of washback effects within the context of the twice-yearly National Matriculation English Test (hereafter NMET) reform. A total of 582 senior English teachers from one of the pilot cities for the NMET reform were surveyed and interviewed in 2015, 2016, and 2020. The findings revealed that teachers’ beliefs about exam-oriented teaching strategies have shifted from a supportive to a more cautious stance and that teachers’ attitudes towards test policy have evolved from skepticism to cautious acceptance. The sustainability of these effects was shaped by a combination of factors, including test design, institutional constraints, student autonomy, and societal pressures. The findings elucidate the influence of test policy on washback and provide important insights for researchers and teachers on enhancing the positive impact of high-stakes tests.
... This grading sheet is designed using the General Rating Criteria of the Center for Teaching & Learning Service at the University of Minnesota (Alderson, Clapham, & Wall, 1995;Bachman, 1990;Buck, 1990;Dandonoli & Henning, 1990;Hughes, 1989;Lumley & McNamara, 1995). It considers all elements of the five strategies, including international accents and word choices in the fourth point. ...
Conference Paper
Owing to the historical influence of the British Empire's extensive colonization, together with factors such as the economic power of the United States, international diplomacy, and the use of technology and the Internet in electronic communication, English has come to be perceived functionally as an international language (EIL) (Li et al., 2025; Zhao et al., 2022). Consequently, terms such as ESL (English as a second language), EIAL (English as an international auxiliary language), and EWL (English as a world language) have been introduced and have achieved global recognition. This research therefore emphasizes that students should develop pragmatic concepts to communicate effectively in international contexts, where people from different nations use an "international language" to interact. University students should be encouraged to learn English from a broader perspective, using appropriate materials that include source culture, target culture, international culture, and pragmatic concepts, to build their ability to manage international situations. Overall, this study recommends employing appropriate EIL materials and interactive, sensible pedagogical strategies for non-English-major learners in Taiwan; with these, teachers can observe significant advances in students' international interactive competence as students are trained with practical examples highlighting cultural differences. This article proposes that learners' familiarity and learning paradigms evolve into more effective ones encompassing interactive skills within Anglo-English communities and various global debating scenarios (Hasan, 2023) in non-native-speaking countries. The primary goal of this project is to assess the improvement in students' communication skills after applying five interactive strategies to an experimental group.
This class focuses on pragmatics and will be compared with a control group in which interactive approaches are not emphasized and are practiced openly. The participants are 60 non-English-major freshmen in two classes, forming the experimental and control groups. Students in the experimental group are expected to show greater improvement in communicative competence after being guided by the interactive strategies developed for this research, which highlight the importance of interaction. This project underscores the importance of EFL learners' listening and speaking proficiency as applied in real-world contexts. To enhance students' learning and support their progress toward effective communication and real-life English use, the study suggests that international strategies be provided to college students, who face fewer exam-preparation demands than high school students. The interactive strategies designed in this research fall into pedagogical and pragmatic models. They are hypothesized to be practical, workable, and realistic for university students in Taiwan because they encompass both traditional and innovative pedagogical and pragmatic skills for fluent and appropriate communication. These strategies activate the knowledge that students learn in school and familiarize them with real-world uses of English, facilitating message transmission, international expression, and multicultural interaction. The study advocates for English to be recognized as an official language worldwide. It also proposes evaluating inter-rater reliability by having two raters, one a native speaker and the other a non-native speaker, assess students from an international perspective to ensure fairness.
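The abstract proposes two raters but does not name an agreement statistic. One common choice for two raters who assign categorical scores is Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. The Python sketch below uses it as a swapped-in illustration, not as the study's method; the band scores are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per performance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of performances given the same category by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)

# Hypothetical band scores (1-5) from a native-speaker and a non-native-speaker rater
native_rater = [3, 3, 2, 4, 3]
non_native_rater = [3, 2, 2, 4, 3]
print(round(cohens_kappa(native_rater, non_native_rater), 3))
```

Because plain kappa treats every disagreement as equally serious, a weighted kappa or a correlation coefficient may suit ordinal band scales better.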
... Essentially, entertainment provides three important benefits for human life: it attracts tourists on a mass scale, it satisfies the desire for enjoyment, and it provides valuable experiences (Adeboye, 2012). Entertainment tourism is a light-hearted tourism activity that offers enjoyment, does not require tourists to undertake any particular activity, and does not demand a high level of appreciation of the entertainment products on offer (Hughes, 2000). One entertainment-tourism attraction in Indonesia that draws visitors is the live performance Bali Langen Kecak and Barong Dance Show, located at the Peninsula Island tourist attraction (Daya Tarik Wisata, DTW), The Nusa Dua. ...
Article
The Bali Langen Kecak and Barong Dance Show performances at The Nusa Dua are presented to revive the Kecak and Barong dance arts, especially in the Nusa Dua area. This study aims to determine visitors' perceptions of the Bali Langen Kecak and Barong Dance Show performances at Taksu Art Stage, Peninsula Island, The Nusa Dua. It uses a six-dimension framework following Natoradjo (2011): (1) marketing materials; (2) transportation, access, and welcoming guests; (3) ambience, atmosphere, and decoration; (4) food and culinary offerings; (5) entertainment or activities; and (6) facilities. The results indicate that visitors' perception of the performances is good or satisfactory, with an average score of 3.79.
... In teaching practice, teachers still mostly focus on results and pay less attention to students' responses to the test and to the process of independent learning (Hughes, 2003). Overall, evaluation is incomplete and poorly reasoned, and it lacks specific guidance. ...
Article
Teaching and testing are two crucial elements of education. They are complementary and mutually influential. As part of senior high school English teaching, English language testing plays an important role in gauging students' competence. Scientific and reliable classroom tests and assessments can help teachers achieve their objectives, change their pedagogical approaches, and continuously innovate their methods. Using questionnaire data gathered from 40 senior high school English teachers, this paper found that senior high school English classroom tests still have many problems, manifested mainly in arbitrary test planning, a lack of logic and stratification in question setting, and insufficient research on washback. This situation stems mainly from teachers' limited understanding of the educational function of classroom tests, the irrationality of the existing evaluation mechanism, and inadequate knowledge of students. Practical suggestions are made in this regard to maximize the value of English classroom assessments and to foster a productive English teaching and learning environment in the future.
... Washback refers to the influence of language assessment on language learning, teaching, and instruction, which can be either beneficial or detrimental: positive washback promotes effective teaching and learning, while negative washback leads to adverse outcomes such as rote memorization. Hughes (2010) states, "the effect of testing on teaching and learning is known as washback, and can be harmful or beneficial" (p. 1). Theoretical frameworks for multimodal assessment (Kress, 2010) emphasize various pedagogical modes of communication, representative of language in use today. ...
Article
This paper examines the shift from traditional to innovative and inclusive language testing methods, focusing on developments such as alternative assessment, AI integration, and computer-adaptive testing. Taking a desk-based approach, the study synthesizes relevant theories from the literature to highlight the increased focus on fairness, inclusivity, and authentic assessment. The key findings show that dynamic and performance-based assessments address learners' different learning needs, promote instrumentality towards language use, and incorporate socio-political and ethical dimensions into test construction. The paper also underlines the transformative power of technology to improve access and efficiency, while considering the potential challenges and inequalities associated with its use. By integrating traditional and modern practices, this study contributes to both the theoretical discourse and the practical advancement of language assessment, upholding equitable, ethical, and effective testing methods consistent with twenty-first-century educational standards and learner diversity.
... Additionally, teaching experience is believed to be one key teacher factor generating washback (Alderson & Wall, 1993; Cheng, 1999; Hughes, 2003; Onaiba, 2013; Shohamy, 1993). ...
Article
Washback, or the effect of tests on learning and teaching, is one of the important test qualities (Bachman & Palmer, 1996). A number of empirical studies have examined the washback of different tests on different stakeholders and their actions under test use, such as those by Brown (1997), Cheng (1997), McKinley & Thompson (2018), Nguyen (2017), Pizarro (2010), Shih (2009), and Xu & Liu (2018), to name but a few. These studies have shown that the washback of different tests on teaching and learning varies in mechanism, direction, and intensity. This study explores the washback of the high-stakes English tests in the Vietnamese National High School Graduation Exam on the teaching of EFL high school teachers. Six teachers who were teaching English to grade 12 students at the research site in Buon Ma Thuot City (Dak Lak Province, Vietnam) were purposefully selected for the study. This case study employed a two-phase explanatory design, using a questionnaire and follow-up interviews. The findings reveal that the high-stakes English tests affected various aspects of teaching, such as the teachers' choices of textbook coverage, time allotment for teaching content, provision of extracurricular content, in-class assessment tasks, choices of teaching methods, application of new teaching techniques, choices of classroom organization, and language of instruction. In addition, the study reveals the participants' unique teacher factors under the influence of the tests.
... Each level is described in terms of specific attributes and corresponding scores; see Table 4, Hughes' Oral Speaking Assessment: Fluency Section (1989, 2003; compiled by Tu). The effectiveness of pedagogical interventions was evaluated using quantitative methods. ...
Article
The purpose of this research was to find empirical evidence of the effectiveness of using WhatsApp in a Blended Learning model on students' ability to write narrative text. This research used a pre-experimental method. The sample consisted of 34 students (XI MIPA 1) selected by cluster sampling. The pre-experimental class was given treatments using WhatsApp in Blended Learning, and a pre-test was administered before the treatments. The results were as follows. First, the average pre-test score was 44.73; after the treatments, a post-test was given, and the average post-test score was 84.35. Second, the independent t-test yielded a t-value of 5.41, which exceeds the critical value of 1.693. In other words, the null hypothesis (Ho) was rejected and the alternative hypothesis (Ha) was accepted. It can therefore be concluded that using WhatsApp in the Blended Learning model was effective for teaching narrative writing to the eleventh-grade students at SMA N 5 Model Lubuklinggau in the academic year 2022/2023.
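The decision rule in the abstract, reject Ho when the computed t exceeds the critical value of 1.693, can be sketched as below. Because the design tests one class before and after the treatment, this sketch swaps in a paired (dependent-samples) t-test, the usual choice for pre-test/post-test scores from the same students; the abstract itself reports an independent t-test. The function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """t statistic for pre-test/post-test scores from the same students."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two students with both scores")
    gains = [after - before for before, after in zip(pre, post)]
    spread = stdev(gains)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("every student gained the same amount; t is undefined")
    return mean(gains) / (spread / sqrt(len(gains)))


T_CRITICAL = 1.693  # critical value reported in the abstract

# Hypothetical scores for five students (not the study's data).
pre_scores = [40, 45, 50, 42, 48]
post_scores = [80, 85, 88, 79, 90]
t = paired_t(pre_scores, post_scores)
print(f"t = {t:.2f}; reject H0: {t > T_CRITICAL}")
```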
Article
This project introduces an innovative, AI-driven platform tailored to streamline and enhance the process of question paper generation for educators at various academic levels. Utilizing cutting-edge technologies such as Natural Language Processing (NLP), Knowledge-Augmented Generation (KAG), and Large Language Models (LLMs), the system is capable of understanding and extracting key concepts from a variety of educational resources. These include user-uploaded materials such as textbooks, existing question banks, lecture notes, and previous year exam papers. Educators are provided with a user-friendly interface where they can define specific parameters, including subject area, topic coverage, question complexity (difficulty level), and the desired number of questions. Based on these inputs, the platform intelligently generates a diverse and balanced set of questions, ensuring coverage across cognitive levels—such as knowledge recall, application, and analysis. Additionally, each question is accompanied by a corresponding answer key, which helps educators expedite the evaluation process. Keywords - Automated Question Paper Generation, NLP-Based Assessment Design Tool, AI-Powered Educational Assessment System.
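The abstract describes educators setting parameters (topic, difficulty, number of questions) and receiving a set balanced across cognitive levels, together with an answer key. The sketch below illustrates only that parameter-driven assembly step. It also swaps generation for selection: questions are drawn from an already-tagged question bank, with no NLP or LLM involved. The `Question` class, `assemble_paper` function, and sample bank are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle

LEVELS = ("recall", "application", "analysis")  # cognitive levels named in the abstract


@dataclass(frozen=True)
class Question:
    text: str
    answer: str
    topic: str
    difficulty: str  # e.g. "easy", "medium", "hard"
    level: str       # one of LEVELS


def assemble_paper(bank, topic, difficulty, count):
    """Select up to `count` matching questions, rotating through cognitive levels for balance."""
    pools = {
        level: [q for q in bank
                if q.topic == topic and q.difficulty == difficulty and q.level == level]
        for level in LEVELS
    }
    paper = []
    for level in cycle(LEVELS):
        if len(paper) == count or not any(pools.values()):
            break
        if pools[level]:  # skip a level whose pool is exhausted
            paper.append(pools[level].pop(0))
    answer_key = [q.answer for q in paper]
    return paper, answer_key


# Hypothetical question bank for illustration.
bank = [
    Question("Name the tense in 'She has left.'", "A1", "tenses", "medium", "recall"),
    Question("Name the tense in 'They went.'", "A2", "tenses", "medium", "recall"),
    Question("Rewrite the sentence in the past perfect.", "A3", "tenses", "medium", "application"),
    Question("Complete the gap with the correct tense.", "A4", "tenses", "medium", "application"),
    Question("Explain why the past simple is used here.", "A5", "tenses", "medium", "analysis"),
    Question("Choose 'a' or 'an'.", "A6", "articles", "medium", "recall"),
]
paper, key = assemble_paper(bank, "tenses", "medium", 4)
print([q.level for q in paper], key)
```

Rotating through the levels keeps the mix balanced when every pool is large enough; when a pool runs out, the remaining slots are filled from the other levels.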
Article
This article analyzes the importance of data collection and progress measurement in the English education curriculum. Accurate data collection and effective measurement methods play an important role in evaluating the effectiveness of the English education curriculum. Relevant data collection methods include tests and examinations, portfolios, and class observations. Progress measurement methods include grades, performance scales, formative tests, project assessments, and self-evaluations. By using these methods, teachers can obtain comprehensive information about students' progress in English, which can then be used to design appropriate teaching strategies. Accurate data collection and effective measurement of progress will provide significant benefits in improving students' English learning.
Article
As the field of heritage language acquisition expands, there is a need for proficiency measures that allow speakers to be compared across groups and studies. Elicited imitation tasks (EITs) are efficient, cost-effective tasks with a long tradition in the proficiency assessment of second language (L2) learners, first-language children, and adults. However, little research has investigated their use with heritage speakers (HSs), even though their oral nature makes them appropriate for speakers with variable literacy skills. This study is a partial replication of Solon, Park, Dehghan-Chaleshtori, Carver & Long (2022), who administered an EIT originally developed for advanced L2 learners to a group of HSs. In this study, we administered the same EIT, with minor modifications, to 70 HSs and 132 L2 learners of Spanish at different levels of proficiency and ran a Rasch analysis to evaluate how the task functioned with the two groups. To obtain concurrent validity evidence, EIT scores were compared with participants' performance in an oral narration, evaluated for complexity, accuracy, and fluency (CAF), and with a standardized oral proficiency test, the Versant Spanish Test. The Rasch analyses showed that the EIT effectively distinguished different levels of ability in both groups, and further analyses showed moderate to strong correlations between the CAF measures and the EIT and very strong correlations between the EIT and the Versant Spanish Test. These results provide evidence that the EIT is an efficient and adequate proficiency test for HSs and L2 learners of Spanish; its use in research settings is recommended.
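For concurrent validity, the study correlates EIT scores with the Versant Spanish Test and with CAF measures. The abstract does not name the coefficient; Pearson's r is assumed here as a minimal illustration for two lists of interval-scale scores from the same participants. The function name and scores are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two lists of scores for the same participants."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations around each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six participants (not the study's data).
eit_scores = [52, 61, 70, 45, 88, 76]
versant_scores = [48, 55, 67, 40, 79, 73]
print(f"r = {pearson_r(eit_scores, versant_scores):.2f}")
```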
Article
Reading is useful for language acquisition, but it requires students to comprehend what they read. Comprehending a reading text involves constructing and extracting meaning; thus, reading comprehension (henceforth RC) depends on several cognitive processes. Critical thinking (henceforth CT) is a higher-order thinking skill that involves purposeful, self-regulatory judgment resulting in interpretation, analysis, and evaluation. Critical thinking may help university students improve their RC. The present study aims to find: (1) the average level of the students' CT; (2) the average level of the students' achievement in RC; (3) whether there is any significant difference between the students' achievement at the recognition level and at the production level of RC; and (4) the possible correlation between students' level of CT and their achievement in RC. To achieve these aims, the following questions are raised: (1) Is the average level of the EFL students within the theoretical mean score of CT? (2) Is the average level of the EFL students' achievement in RC within the theoretical mean score? (3) Is there any significant difference between students' achievement at the recognition and production levels of RC? (4) Is there any significant correlation between students' level of CT and their level of achievement in RC? A sample of one hundred second-year EFL students was selected from the College of Education for Humanities, University of Tikrit, to take part in the study; the sample represents 42.19% of its original population. The study was conducted during the first semester of the academic year 2022-2023. The data were gathered using a questionnaire and a diagnostic test to assess students' CT and RC.
Results of the study indicate that the second-year EFL university students' average level in both CT and RC is above the theoretical level. Results also show a positive correlation between the students' level of CT and their achievement in RC.
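The abstract compares the sample's average with a "theoretical mean". That term usually denotes the midpoint of the possible score range, for example 60 for a 20-item questionnaire scored 1-5. The sketch below assumes that definition and a one-sample t-test for the comparison; neither is spelled out in the abstract, and the item count and scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def theoretical_mean(n_items, min_item_score, max_item_score):
    """Midpoint of the possible total-score range, e.g. 20 items scored 1-5 gives 60."""
    return n_items * (min_item_score + max_item_score) / 2


def one_sample_t(scores, reference):
    """t statistic comparing a sample mean with a fixed reference value."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    spread = stdev(scores)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all scores are identical; t is undefined")
    return (mean(scores) - reference) / (spread / sqrt(len(scores)))


# Hypothetical totals on an assumed 20-item, 1-5 CT questionnaire (invented data).
ct_totals = [62, 70, 58, 75, 66, 71]
mu = theoretical_mean(20, 1, 5)
print(f"theoretical mean = {mu}, sample mean = {mean(ct_totals):.2f}, "
      f"t = {one_sample_t(ct_totals, mu):.2f}")
```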
Article
This research explores the correlation between creativity and the writing of report texts among EFL university students at Tikrit University. A quantitative correlational design is adopted to measure the degree of association between creativity and report-text writing. A sample of 100 third-year college students is randomly selected from the population. Data collection involves two diagnostic tests: the first assesses creativity and the second assesses report-text writing. Tests are scored using predefined scoring schemes. The results of the correlation analysis, using the Pearson correlation coefficient, reveal a correlation between Iraqi EFL university students' creativity and their writing of report texts.
Article
Textual competence is important for an efficient writer because it is concerned with coherence and rhetorical organization. The present research aims to identify Iraqi university students' textual competence in essay writing at Tikrit University; their knowledge of cohesion and rhetorical organization in essay writing; and, finally, differences in the difficulty of using textual competence according to the gender variable. The sample consists of 160 fourth-stage students in the Department of English, College of Education for Humanities, Tikrit University, during the academic year 2023-2024. An achievement test is used to gather data on students' textual competence in essay writing. Results show that the students' textual competence is higher than the theoretical mean score, that their level in cohesion is higher than in rhetorical organization, and, finally, that male students perform better than female students in essay writing.
Article
The current study addresses the crucial role of Critical Thinking Skills (CTSs) in today's decision-making and problem-solving landscape. It emphasizes the integration of CTSs within the context of English as a Foreign Language (EFL) instruction, with a particular focus on the situation in Iraq. While there is ample theoretical discourse on the significance of CTSs, their practical application in EFL classrooms, especially in Iraq, presents notable challenges, primarily because of the dearth of specialized teaching methodologies. The study's central objective is to evaluate the impact of incorporating Critical Thinking (CT) into EFL instruction, specifically in the analysis of literary texts. To achieve this, the research adopts a control-experimental pretest-posttest design and employs a CT test as an assessment tool. The findings of the study are noteworthy, demonstrating a significant improvement in the CTSs and textual interpretation abilities of students in the experimental group, especially when analysing poetry and short stories. This highlights the potential benefits of integrating CT into EFL instruction, enhancing students' analytical capabilities and their understanding of literary content.
Article
In an era of globalization, English has progressively become the medium of communication in every field, in both local and global settings. Accordingly, the ability to use English effectively is essential in every country. Teaching and learning English, in addition to the local language, is therefore essential for communicative purposes, to meet growing local, national, and global demands for English skills. The process involves developing certain communication skills. Communicative approaches recognize four essential skills in acquiring a foreign language such as English: listening, speaking, reading, and writing. Learning English at any level of education, whether primary, secondary, intermediate, or tertiary, requires teaching all four skills and giving equal importance to each. The growing need to use English worldwide, a result of its role as the world's global language, has created a need for more effective ways to teach these essential language skills. Language is an organized system of communication. Oral language is meant to be listened to and to sound conversational, which means that word choice must be simpler, more casual, and more repetitive. Language is the basic requirement of communication; it makes it easier for students to communicate with one another and with their teachers. Learners find technology an effective tool: through technology, students find easier and simpler ways to learn a language. Language itself is the universal technology; it helps learners manipulate ideas and concepts and analyze the language better. Technology-based learning can be delivered through electronic technologies, including audio and video conferencing, chat rooms, and CD-ROMs. This paper discusses different methods of teaching English to students.
Article
This research was conducted to improve the descriptive-text writing of eighth-grade students at SMP Negeri 3 Onan Ganjang by using the scaffolding technique. The writer used Classroom Action Research (CAR). The research showed that teaching writing through the Scaffolding Technique improved the eighth-grade students' descriptive-text writing at SMP Negeri 3 Onan Ganjang, and that students' responses to being taught with the technique were very good. The students' mean score was 46.87 on the pre-test, with 4.16% of students scoring above 68; 63.04 on the formative test, with 41.66% scoring above 68; and 81.75 on the post-test, with 70.83% scoring above 68. These findings show that the Scaffolding Technique can improve students' descriptive-text writing, and English teachers are advised to apply it to improve their students' writing. Keywords: Scaffolding Technique, Writing, Descriptive Text
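Classroom action research of this kind tracks, for each test, the class mean and the share of students who meet a criterion score. A minimal sketch of that bookkeeping follows; the function name and scores are invented. The abstract's wording is "above 68", so the code counts scores strictly greater than 68; whether the criterion was meant to include a score of exactly 68 is not clear from the abstract.

```python
from statistics import mean


def summarize_cycle(scores, threshold=68):
    """Return the class mean and the percentage of students scoring above `threshold`."""
    if not scores:
        raise ValueError("no scores")
    above = sum(score > threshold for score in scores)  # strictly above, per the abstract
    return mean(scores), 100 * above / len(scores)


# Hypothetical scores for one small class across three tests (invented data).
cycles = {
    "pre-test": [40, 52, 45, 70, 38, 50],
    "formative test": [60, 72, 58, 75, 55, 69],
    "post-test": [78, 85, 66, 90, 74, 80],
}
for name, scores in cycles.items():
    avg, pct = summarize_cycle(scores)
    print(f"{name}: mean {avg:.2f}, {pct:.2f}% above 68")
```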
Thesis
The current study investigates a number of variables related to the vocabulary development of female Saudi learners in the first intermediate grade. The coverage and repetition of the two thousand high-frequency words in the first intermediate girls' textbook, Say it in English, were analyzed using the RANGE and FREQUENCY software (Heatley, Nation & Coxhead, 2002). The textbook's coverage was then compared with participants' performance on two vocabulary tests, namely the 1000 vocabulary size test and the 2000 vocabulary level test (Nation, 2001b). Furthermore, an achievement test constructed by the researcher was administered to explore participants' vocabulary performance in two parts: the vocabulary sections and the rest of the book. Finally, information about participants' most and least commonly used vocabulary learning strategies was collected using a questionnaire adapted from Schmitt's (1997) taxonomy of vocabulary learning strategies. Overall, two hundred and eighty-six female students participated in the study. The RANGE analysis indicated that the textbook's coverage of the 1st and 2nd thousand high-frequency words is 5.6% and 25.8%, respectively. As for participants' vocabulary size, the average participant was found to know at least 14.09% of the 1st thousand most frequent words and 4.84% of the 2nd thousand. The frequency results showed that the fifty most frequent words are function words and that most words occur fewer than ten times in the textbook. The achievement test results, on the other hand, showed that participants know at least 40% of the words in the textbook. In addition, participants performed better on vocabulary items from the vocabulary sections of the textbook than on vocabulary items from the rest of the book.
In relation to vocabulary learning strategies, the most commonly used strategies indicate that learners are primarily interested in gaining surface knowledge of words, whereas the least commonly used strategies, which require deep processing of information, were neglected. Several applications can be drawn from the results of the present study. To start with, textbook writers need to plan the sequencing, presentation, and repetition of high-frequency words, bearing in mind that learners might end up learning less than 40% of the vocabulary items presented to them. Teachers, for their part, are urged to provide rich instruction in high-frequency words and to train learners to use a range of vocabulary learning strategies.
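The thesis used RANGE and FREQUENCY to measure how much of the first and second 1,000 high-frequency words the textbook covers and how often each word recurs. The sketch below approximates those two measures in plain Python under simplifying assumptions: it matches exact word forms rather than word families, and it defines coverage as the share of list words that appear at least once in the text. It does not reproduce RANGE's own procedures, and the text and word list are toy examples.

```python
import re
from collections import Counter


def word_counts(text):
    """Lower-cased word-form frequencies (no lemmatization or word families)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def list_coverage(text, word_list):
    """Percentage of list words that appear at least once in the text, with their counts."""
    if not word_list:
        raise ValueError("empty word list")
    counts = word_counts(text)
    found = {word: counts[word] for word in word_list if counts[word] > 0}
    return 100 * len(found) / len(word_list), found


def rarely_repeated(text, word_list, min_occurrences=10):
    """List words that occur in the text but fewer than `min_occurrences` times."""
    counts = word_counts(text)
    return sorted(word for word in word_list if 0 < counts[word] < min_occurrences)


sample_text = "The cat sat on the mat. The dog sat too."
high_frequency = ["the", "cat", "dog", "bird"]  # hypothetical stand-in for a 1,000-word list
print(list_coverage(sample_text, high_frequency))  # (75.0, {'the': 3, 'cat': 1, 'dog': 1})
print(rarely_repeated(sample_text, high_frequency, min_occurrences=2))  # ['cat', 'dog']
```

The default threshold of ten mirrors the abstract's observation that most words occur fewer than ten times in the textbook.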
Article
This paper evaluates the writing performance of Saudi junior students at Albaha University. Specifically, it assesses the academic key performance indicators of Saudi male students who took the Writing (1) course in the first term of 2019 at the College of Science and Arts in Almandaq. Twenty-four students participated, divided into a control group (n=12) and an experimental group (n=12). A pretest and a posttest were administered to both groups to collect the necessary data. To analyze the data, the study used percentages and compared key performance indicators based on the Thorndike Approach for evaluating students' results and on the normal distribution of their grades. Results revealed that while 24% of the control group participants failed the posttest, all experimental group participants passed it after the course was delivered through Blackboard. These results showed a statistically significant difference between the control and experimental groups in favor of the experimental group. In light of these findings, the academic key performance indicators showed that students' writing skills improved after the treatment was implemented through the Blackboard portal, which helps enhance students' writing skills and achieve students' learning outcomes in the College of Science and Arts in Almandaq and other colleges at Albaha University.
Article
Speaking is one of the most challenging skills in learning English, and many English teachers are concerned about how to improve their students' speaking performance. While many shy students prefer to work individually, others display their competence in group work. This study therefore investigates the effect of group discussion in authentic role-play on the speaking performance of English as a Foreign Language (EFL) young learners at a suburban primary school in the Mekong Delta, Vietnam. Participants included 80 students: an experimental group (N=40) and a control group (N=40). During six weeks of role-play sessions, students in the experimental group worked in groups, whereas participants in the control group prepared individually. Pre-tests and post-tests were used to examine the students' speaking performance, and interviews were conducted to compare students' perceptions of the two role-play formats. The speaking test results show that the experimental group scored higher than the control group. In the experimental group, five components improved: pronunciation, content, vocabulary, grammar, and fluency. In contrast, only three components (pronunciation, content, and grammar) improved in the control group. The interview findings revealed that students in both the group-discussion and individual-work formats engaged in role-play activities and substantially improved their language skills. However, they also faced some challenges in using the two role-play formats effectively during speaking lessons.
Chapter
This chapter presents a subset of results from a study that explores source-based writing skills in pre- and in-service English language teachers from Antofagasta, Chile. This exploratory study was conducted during an advanced academic English course offered at a university in this city over two consecutive semesters in 2021 and 2022. The qualitative data from verbal protocols and interviews were analysed following a thematic analysis approach and the essays were analysed using a coding rubric covering different aspects of source use. The overall results revealed that while there is evidence of students’ enhanced awareness and skill with some formal aspects of direct quoting and paraphrasing, other aspects of source-based writing, such as understanding core information in the sources and effectively integrating that information into their essays, remained problematic for some students. Findings suggest that more extended and consistent scaffolded instruction may be needed to foster the development of more complex and sophisticated academic literacy skills.
Chapter
This paper addresses the current state of Cantonese proficiency assessments and introduces a new test aligned with the Common European Framework of Reference for Languages (CEFR). This new test fills gaps in existing evaluations by focusing on Cantonese oral proficiency for less advanced learners (CEFR levels A1, A2, and B1) without requiring prior knowledge of English, Written Chinese, or Mandarin. Its design also considers cultural neutrality and avoids assumptions about Hong Kong-specific knowledge, emphasizing oral communication without bias towards accent or precise pronunciation. Conducted via one-to-one video conferencing, the test features a variety of tasks to assess interactive communication skills. This test aims to meet the sociolinguistic needs of diverse Cantonese learners globally and to encourage further promotion and development of relevant learning materials.