Chapter

Valid for the Elites? The Trade-Off Between Test Fairness and Test Validity


Abstract

The relationship between validity and fairness has been hotly debated in the literature (Kunnan, 2010). The orthodoxy is that test fairness should be subsumed under validity; that is, a valid test ensures fairness. Tracking the retrofits of a high-stakes national language test in Iran, the Specialized English Test (SET), used to determine admission to tertiary English language programs, and collecting data on test takers’ language learning experiences, this article argues against the established view that more valid tests necessarily promote fairness. As an achievement measure based on the secondary school English curriculum, the previous version of the SET was widely criticized for its construct underrepresentation (Farhady & Hedayati, 2009) and fuzziness. In its current version, the SET is more construct representative, for it goes beyond the high school curriculum and covers more areas of communicative competence. Data collected from 173 undergraduate students of English translation and literature at three national universities across the country revealed that an overwhelming majority of students come from families with high socio-economic status, with poorer students represented only in the low-tier university student population. This finding indicates that the improvement in validity has come at a cost to fairness and social mobility, reproducing the existing social order by denying underprivileged applicants access to quality tertiary language education programs. The paper further discusses issues of test validity and fairness and calls for a broader understanding of test consequences within a larger sociocultural perspective.


... The data for this study came from participants who were attending test preparation programs offered by private institutes. There is evidence that shadow education in general, and test preparation in particular, confers an advantage on socio-economically privileged students (Buchmann, Condron, and Roscigno 2010; Razavipour 2018). Accordingly, caution should be exercised in extending the findings, as the sample does not represent test takers of lower social class who prepare for the Higher Education Admission Test on a self-study basis. ...
Article
Of the many possible institutional and individual factors bearing on test preparation, one is how individuals go about choosing their achievement goals. Yet, the literature on the relationship between the two phenomena remains slim. The objectives of this study are twofold. First, it explores the range of test preparation practices exercised by test takers in preparing for the English module of the Higher Education Admission Test in Iran. Secondly, it investigates how individual goal orientations mediate test preparation. A goal orientations scale was translated, validated and administered to the participants, who were 357 test candidates, a convenience sample. The participants also completed a test preparation questionnaire with two underlying factors including desired and undesired test preparation practices. Descriptive statistics and paired samples t-tests revealed that preparation for the Higher Education Admission Test entailed a mix of both detrimental and beneficial practices, with the frequency of the former being significantly higher. It was also revealed that mastery goal orientations are associated with educationally defensible test preparation practices. Findings carry implications for testers, test preparation instructors and educational policy makers.
Chapter
Full-text available
This paper explores the relationship between testing and society from an untraditional angle, focusing on the effect of society on testing. Language testing in Norway, and more specifically the development and public reception of the national tests of English for Norwegian school children, is discussed as an illustration of this phenomenon.
Article
Full-text available
Background This study investigates the social impact of a policy requiring university graduates to pass an English proficiency test by examining the consequences of test use in the workplace in Taiwan. Methods Interviews were conducted with 19 business people in charge of recruiting potential employees in 17 industries across Taiwan. All 19 employers hired graduates from a technological university in southern Taiwan. These interviews sought to discover the importance of English certification as an element of job hunting, the opinions of businesses regarding various certification tests, and their attitudes towards the exit requirement. Results and conclusions Findings indicate that although these employers were favorably disposed towards this policy, only 13% of them required English certificates as a hiring criterion. Another finding was that 53% of employers regarded the certificates as evidence that applicants who possessed them were diligent and likely to be hard-working employees. These informants interpreted tests differently from testers, focusing on cultural notions of what personal qualities tests highlight rather than on language ability. Due to this and other factors, the impact of the test remained weak.
Article
Full-text available
Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance on a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response theory (1-p IRT) models. The PEET is a national test consisting of a centralized written examination designed to provide information on the eligibility of PhD applicants in TEFL to enter PhD programs. The 2013 administration of this test provided score data for a sample of 999 Iranian PhD applicants consisting of 397 males and 602 females. First, the data were subjected to DIF analysis through the logistic regression (LR) model. Then, to triangulate the findings, a 1-p IRT procedure was applied. The results indicated (1) more items flagged for DIF by LR than by 1-p IRT, (2) DIF cancellation (the number of DIF items was equal for both males and females), as revealed through LR, (3) an equal number of uniform and non-uniform DIF items, as tracked via LR, and (4) female superiority in test performance, as revealed via IRT analysis. Overall, the findings of the study indicated that the PEET suffers from DIF. As such, test developers and policymakers (such as the NOET and MSRT) are recommended to take these findings into serious consideration and to exercise care in fair test practice by dedicating effort to more unbiased test development and decision making.
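The logistic-regression DIF procedure described above can be sketched in a few lines. This is a minimal illustration on simulated data under stated assumptions, not the PEET analysis itself; all variable names and effect sizes are invented for the example. The idea is to fit three nested logistic models for one item (ability only; plus group; plus group-by-ability interaction) and read uniform and non-uniform DIF from likelihood-ratio chi-square statistics.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit logistic regression by Newton-Raphson; return coefficients and log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: beta += (X'WX)^-1 X'(y - p), with a tiny ridge for stability
        beta += np.linalg.solve((X * W[:, None]).T @ X + 1e-8 * np.eye(X.shape[1]),
                                X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, ll

def lr_dif(item, total, group):
    """Likelihood-ratio DIF test for one item: compare nested models
    with and without group and group*ability terms."""
    _, ll_base = fit_logistic(total[:, None], item)                      # ability only
    _, ll_uni = fit_logistic(np.column_stack([total, group]), item)      # + group
    _, ll_full = fit_logistic(np.column_stack([total, group, total * group]), item)
    return {"chi2_uniform": 2 * (ll_uni - ll_base),      # group main effect
            "chi2_nonuniform": 2 * (ll_full - ll_uni)}   # group*ability interaction

# Toy data with uniform DIF built in: the item is easier for group 1
# even at equal ability.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
total = theta + rng.normal(0, 0.3, n)                 # observed proxy for ability
p_item = 1 / (1 + np.exp(-(theta + 1.0 * group)))     # group advantage -> uniform DIF
item = (rng.random(n) < p_item).astype(float)

res = lr_dif(item, total, group)
print(res)
```

Each chi-square is compared against a critical value (3.84 at one degree of freedom, alpha = .05); here only the uniform-DIF statistic should be large, because the simulation contains no interaction effect.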
Article
Full-text available
This study addressed a need to examine and improve current assessments of listening comprehension (LC) of university EFL learners. These assessments adopted a traditional approach in which test-takers listened to an audio recording of a spoken interaction and then independently responded to a set of questions. This static approach to assessment is at odds with the way listening was taught in the classroom, where LC tasks often involved some scaffolding. To address this limitation, a dynamic assessment (DA) of a listening test was proposed and investigated. DA involves mediation and meaning negotiation when responding to LC tasks and items. This paper described: (a) the local assessment context, (b) the relevance of DA in this context, and (c) the findings of an empirical study that examined the new and current LC assessments. Sixty Tunisian EFL students responded to an LC test with two parts, static and dynamic. The tests were scored by 11 raters. Both the test-takers and raters were interviewed about their views of the two assessments. Score analyses, using Multi-Facet Rasch Measurement (MFRM) (FACETS program, version 3.61.0), indicated that test-taker ability, rater behavior, and item difficulty estimates varied across test types. Qualitative data analysis indicated that although the new assessment provided better insights into learners' cognitive and meta-cognitive processes than did the traditional assessment, raters were doubtful about the value of and processes involved in DA, mainly because they were unfamiliar with it. The paper discussed the findings and their implications for listening assessment practices in this context and for theory and research on listening assessment.
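The many-facet Rasch model behind the MFRM analysis can be illustrated with a toy probability function. This is a hedged sketch of the model's form only (the function name and parameter values are invented here), not the FACETS estimation procedure: the log-odds of success are modeled as test-taker ability minus item difficulty minus rater severity, which is how rater behavior enters the measurement alongside ability and difficulty.

```python
import math

def mfrm_prob(ability, difficulty, severity):
    """Probability of success under a simple many-facet Rasch model:
    logit = ability - item difficulty - rater severity (all in logits)."""
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# A harsher rater (higher severity) lowers the modeled success probability
# for the same test-taker and item.
lenient = mfrm_prob(ability=1.0, difficulty=0.5, severity=-0.5)
harsh = mfrm_prob(ability=1.0, difficulty=0.5, severity=0.5)
```

In a real MFRM analysis the facet parameters are estimated jointly from the score matrix; the point of the sketch is only that ability estimates can be adjusted for which rater a test-taker happened to get.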
Article
Full-text available
The search for fairness in language testing is distinct from other areas of educational measurement because the object of measurement, language, is part of the identity of the test takers. A host of issues therefore enters the scene when one starts to reflect on how to assess people's language abilities. As the quest for fairness in language testing is still in its infancy, even the need for such research has been controversial, with some (e.g., Davies, 2010) arguing that such research is entirely in vain. This paper provides an overview of some of the issues involved. Special attention is given to critical language testing (CLT), as it has had a large impact on language testing research. It is argued that although CLT has been very effective in revealing the ideological and value implications of the constructs of focus in language testing, extremism in this direction is not justified.
Article
Full-text available
Despite its geopolitical reputation, Iran has been quite underrepresented in studies of its sociopolitically dominated religious educational framework and its language teaching and assessment policy. In the aftermath of the Islamic Revolution in 1979, major restructuring was planned for foreign language teaching and assessment to accord with Islamic values. However, due to political obstacles, including the long-lasting war between Iran and Iraq, most of the plans could not be implemented until years after the revolution. Since high-stakes language tests including university entrance examinations are developed, administered, and scored by the government agencies, independent researchers do not have access to test data. A critical review of the language assessment policy in Iran requires research-based data, which is, in many cases, lacking or sporadic. This article is an attempt to provide some basic documented information about the educational system, foreign language teaching, and assessment in Iran.
Article
Full-text available
What two standards can teachers and administrators use to decide whether a particular way of preparing students for a test is appropriate? How do five commonly used test-preparation practices stack up to these standards? How do educators and school board members view these five practices?
Book
Validity is the hallmark of quality for educational and psychological measurement. But what does quality mean in this context? And to what, exactly, does the concept of validity apply? These apparently innocuous questions parachute the unwary inquirer into a minefield of tricky ideas. This book guides you through this minefield, investigating how the concept of validity has evolved from the nineteenth century to the present day. Communicating complicated concepts straightforwardly, the authors answer questions like: What does 'validity' mean? What does it mean to 'validate'? How many different kinds of validity are there? When does validation begin and end? Is reliability a part of validity, or distinct from it? This book will be of interest to anyone with a professional or academic interest in evaluating the quality of educational or psychological assessments, measurements and diagnoses.
Book
This book provides key insights into how educational leaders can successfully navigate the turbulence of political debate surrounding leading student assessment and professionalised practice. Given the highly politicised nature of assessment, it addresses leaders and aspiring leaders who are open to being challenged, willing to explore controversy, and capable of engaging in informed critical discourse. The book presents the macro concepts that these audiences must have to guide optimal assessment policy and practice. Collectively, the chapters highlight important assessment purposes and models, including intended and unintended effects of assessment in a globalised context. The book provides opportunities to explore cultural similarities and particularities. It invites readers to challenge taken-for-granted assumptions about ourselves and colleagues in other settings. The chapters highlight the cultural clashes that may occur when cross-cultural borrowing of assessment strategies, policies, and tools takes place. However, authors also encourage sophisticated critical analyses of potential lessons that may be drawn from other contexts and systems. Readers will encounter challenges from authors to deconstruct their assessment values, beliefs, and preconceptions. Indeed, one purpose of the book is to destabilise certainties about assessment that prevail and to embrace the assessment possibilities that can emerge from cognitive dissonance.
Chapter
In this chapter I first give a brief overview of different approaches to translation evaluation. Secondly, I sketch some ways of drawing on recent developments in the language sciences to improve translation evaluation procedures. Concretely, I suggest that translation quality assessment might benefit from contrastive pragmatic discourse studies involving many different lingua-cultures, corpus-linguistic approaches to validate translation evaluations by relating them to comparable and reference corpora, psycho-linguistic and socio-psychological approaches to complement corpus-based methods and integrate product-based and process-based approaches including accounts of translation in process via computer monitoring as well as recent neuro-linguistic and assessment work.
Chapter
Chinese education is historically examination-oriented. For centuries, high-stakes public examinations have been used as the primary assessment tool to make decisions on learning outcomes, educational upward movement, and social mobility. In present-day China, various initiatives have been adopted to address issues related to the deeply entrenched testing-oriented practices. Regardless, testing continues to play a major role in education and educational assessment, in particular for admission, progression, and accountability purposes. To gain an in-depth understanding of this situation, this chapter explores the social-cultural factors that influence the determination of fairness in a high-stakes test. The chapter first reviews the historical development of educational assessment in China and illustrates the significant influence of testing on classroom teaching and learning, followed by an introduction to four major, large-scale educational testing systems. Following that, the chapter focuses on one test in one of these testing systems and explores students’, teachers’, and administrators’ perceptions of the fairness of this test. Results found that the participants endorsed such a testing-oriented system for various reasons, including the fair testing process, the merit-based value, the testing-oriented tradition, and the pursuit of efficiency. Finally, the study questions whether fairness, driven by cultural and political ideology, serves to impede the development of fairness for those in the least advantaged positions in China. Dangers remain in treating testing as a predominantly fair and legitimate tool. As formative learning models gain impetus in the twenty-first century, educational policies and practices need to consider balanced and aligned assessment that represents the real benefits and interests of all students.
Chapter
Summarises the short history of attention to ethical issues in language testing, and argues for more attention to this area. Uses examples such as the UNESCO use of language tests as a barrier to migration, and asks where responsibility lies for misuses of language tests--an issue linked with research on test washback and impact. Refers to the (at that time) developing ILTA Code of Ethics and asks how we may include the voices of stakeholders other than language testing professionals in the debate about the role of ethics in language testing.
Article
Unlike static assessment, which relies on a student's assessment score as the primary indicator of an individual's abilities, dynamic assessment (DA) unifies instruction with assessment to provide learners with mediation to promote their hidden learning potential during assessment. Since many Freshman English classes in Taiwan are large in size, providing human-to-human mediation to each individual learner can be unrealistic. In this action research project, the Viewlet Quiz 3 software was used to develop a computerized dynamic assessment (C-DA) program that integrated mediation with assessment to support 68 Taiwanese college EFL learners' inferential reading skills. The C-DA program and the mediation design are presented in detail in this article. The participants' written reflections in their working portfolio are presented to show the effects of C-DA on promoting Taiwanese EFL college students' metacognitive reading strategies in making inferences. In addition, the participants' pre- and post-test scores were compared to determine whether the participants showed any significant progress after receiving computerized mediation in the C-DA program.
Article
Routledge Introductions to Applied Linguistics is a series of introductory level textbooks covering the core topics in Applied Linguistics, primarily designed for those beginning postgraduate studies, or taking an introductory MA course as well as advanced undergraduates. Titles in the series are also ideal for language professionals returning to academic study.
Article
Practical Language Testing equips you with the skills, knowledge and principles necessary to understand and construct language tests.
Article
While university admissions testing most likely began in eighteenth-century Europe, standardized admissions testing in the USA began with the establishment of the College Entrance Examination Board in 1900. The most widely used tests for college admission, the SAT and ACT, as well as graduate and professional school admissions tests used in the USA, are produced by private rather than governmental organizations. Higher education admissions processes vary internationally in terms of the role of admissions tests and the degree of government oversight and centralization. Validity evidence for admissions tests is typically provided in terms of the utility of test scores in predicting subsequent grades.
Article
Re-examining Language Testing explores ideas that form the foundations of language testing and assessment. The discussion is framed within the philosophical and social beliefs that have forged the practices endemic in language education and policy today. From historical and cultural perspectives, Glenn Fulcher considers the evolution of language assessment, and contrasting claims made about the nature of language and human communication, how we acquire knowledge of language abilities, and the ethics of test use. The book investigates why societies use tests, and the values that have driven changes in practice over time. The discussion is presented within an argument that an Enlightenment inspired view of human nature and advancement is most suited to a progressive, tolerant, and principled theory of language testing and validation. Covering key topics such as measurement, validity, accountability and values, Re-examining Language Testing provides a unique and innovative analysis of the ideas and social forces that shape the practice of language testing. It is an essential read for advanced undergraduate and postgraduate students of Applied Linguistics and Education. Professionals working in language testing and language teachers will also find this book invaluable.
Article
Routledge, 2010. 423 pages. ISBN-10: 0-8058-6185-8. When Larson-Hall wrote this book, she had in mind those second language researchers who feel a little uncomfortable when dealing with statistics. With this introduction to statistics through the Statistical Package for the Social Sciences (SPSS), she found a suitable way to help researchers first to interpret statistical tests and then to learn how to generate descriptive statistics, choose a statistical test, and conduct and interpret the basic tests a researcher may need. In A Guide to Doing Statistics in Second Language Research Using SPSS, Larson-Hall mainly draws her data sets from real second language acquisition studies, and these are featured on a companion website (http://www.routledge.com/textbooks/9780805861853) so that readers can use raw data to complete the exercises contained in the book. Thanks to these sets, the author allows readers to access and work on other researchers' data. Although she initially considered working with both the statistical program R and SPSS, she decided to make things easier and to focus on SPSS, as R, although freeware, is more difficult to cope with. Thus, the main issues of this book are illustrated with numerous SPSS windows, tables, and figures, and with a series of exercises and activities whose answers are found on the above-mentioned website. The book is meant to be read in chronological order: Part I presents fundamental concepts in statistics, and Part II provides information about statistical tests that are commonly used in second language research.
Article
This paper considers dynamic assessment (DA) as it relates to second language (L2) development. DA is grounded in Vygotsky's (1987) sociocultural theory of mind, which holds that human consciousness emerges as a result of participation in culturally organized social activities where mediation plays a key role in guiding development. In DA, the evidential basis for diagnosing and promoting development includes independent as well as mediated performance. Unlike in conventional testing, objectivity derives not from standardization, which treats everyone in precisely the same manner; rather, objectivity is argued on the grounds that mediation is guided by a viable theory of mind. Fairness in assessment is thus reframed with the understanding that the quality of support offered may vary across individuals. The paper illustrates how this transpires in the case of classroom-based and formal testing applications of DA with L2 learners.
Chapter
For the purpose of this chapter, philosophy will be seen as the study of the beliefs that we have about the world, and the use of rational argument and evidence in the formulation and support of those beliefs. In recent years it has been widely argued that language testing exists in a social context and that language assessment practices and test use can be understood only in terms of the exercise of power. This position is based on a particular view of the world in which meaning is created by human interaction and social structures, but where there is little use for reference in Frege's sense. It is a position that does not ask ontological questions, but prioritizes a social ethics of test use for instrumental purposes. This illustrates a deeper truth than any claim that tests exist in, and are shaped by, social conditions. Namely, that how we understand the role of tests and social conditions is determined by our prior understanding of how we think the world works. In this chapter I will attempt to unravel some of the key philosophical issues that face the language‐testing profession, and in the process help to shed some light on fault lines in current debate that are much deeper than mere disagreement on procedure or outcomes. These will include: the nature of our constructs; the role of “procedure” in validity theory; realism, instrumentalism, and constructionism; the place of “inference”; reductionism and language assessment as a “social” science. The discussion will be embedded in a loose historical narrative that relates language assessment to the development of philosophical beliefs about “assessing man” in the social sciences.
Article
Scientific theories can be viewed as attempts to explain phenomena by showing how they would arise, if certain assumptions concerning the structure of the world were true. Such theories invariably involve a reference to theoretical entities and attributes. Theoretical attributes include such things as electrical charge and distance in physics, inclusive fitness and selective pressure in biology, brain activity and anatomic structure in neuroscience, and intelligence and developmental stages in psychology. These attributes are not subject to direct observation but require an inferential process by which the researcher infers positions of objects on the attribute on the basis of a set of observations. To make such inferences, one needs to have an idea of how different observations map on to different positions on the attribute (which, after all, is not itself observable). This requires a measurement model. A measurement model explicates how the structure of theoretical attributes relates to the structure of observations. For instance, a measurement model for temperature may stipulate how the level of mercury in a thermometer is systematically related to temperature, or a measurement model for intelligence may specify how IQ scores are related to general intelligence. The reliance on a process of measurement and the associated measurement model usually involves a degree of uncertainty; the researcher assumes, but cannot know for sure, that a measurement procedure is appropriate in a given situation.
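The thermometer example above can be made concrete with a two-line sketch of a measurement model. The linear form and calibration constants below are illustrative assumptions, not taken from the article: one function maps a latent attribute value to an expected observation, and the other performs the inferential move the passage describes, inverting the model to locate an object on the attribute from what was observed.

```python
def predicted_reading(attribute, a=0.0, b=1.8):
    """Measurement model: map a latent attribute value (e.g., temperature)
    to an expected observation (e.g., mercury height).
    a and b are illustrative calibration constants."""
    return a + b * attribute

def infer_attribute(observation, a=0.0, b=1.8):
    """Invert the model: infer the position on the attribute from an
    observation, assuming the measurement model holds."""
    return (observation - a) / b
```

Real measurement models add an error term and must themselves be validated; the uncertainty the passage mentions lies precisely in assuming that this mapping is appropriate in a given situation.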
Article
Stable URL: http://links.jstor.org/sici?sici=0039-8322%28199822%2932%3A2%3C329%3AETPPTC%3E2.0.CO%3B2-5. Published in TESOL Quarterly by Teachers of English to Speakers of Other Languages, Inc. (TESOL).
Article
Previous test fairness frameworks have greatly expanded the scope of fairness, but do not provide a means to fully integrate fairness investigations and set priorities. This article proposes an approach to guide practitioners on fairness research and practices. This approach treats fairness as an aspect of validity and conceptualizes it as comparable validity for all relevant groups. Anything that weakens fairness compromises the validity of a test. This conceptualization expands the scope and enriches the interpretations of fairness by drawing on well-defined validity theories while enhancing the meaning of validity by integrating fairness in a principled way. TOEFL® iBT™ is then used to illustrate how a fairness argument may be established and supported in a validity argument. The fairness argument consists of a series of rebuttals to the validity argument that would compromise the comparability of score-based interpretations and uses for relevant groups, and it provides a logical mechanism for identifying critical research areas and setting research priorities. This approach will hopefully inspire more investigations motivated by and built on a central fairness argument. It may also foster a deeper understanding and expanded explorations of actions based on test results and social consequences, as impartiality and justice of actions and comparability of test consequences are at the core of fairness.
Article
The articles in this special issue were written for a symposium on the ethics of language testing held at the triennial congress of the Association Internationale de Linguistique Appliquée in August 1996 in Finland. This special issue addresses the role of ethics (and the limits of that role) in professional activities such as language testing. The nine articles that follow can be divided into four sections. The two articles in the first section (Spolsky and Hawthorne) consider language testing as a means of political control. The two in the second section (Elder, and Norton and Starfield) are concerned essentially with the definition of the test construct — Elder considering whether the construct has the same meaning for different groups and Norton and Starfield questioning whether it is form or content that is being assessed. The two articles in the third section (Hamp-Lyons and Rea-Dickins) consider the effects of language tests on the various stakeholders who are involved. In the fourth section, the three articles (Shohamy, Lynch and Davies) offer criteria for promoting ethicality in language testing, Lynch by an approach from principle (deontological), Davies through the professionalizing of the activity (teleological), and Shohamy through attention both to method and to consequence.
Article
For many years it was asserted that language tests had a negative impact on teaching and thereby on learning/learners. This impact has become widely known in language testing as 'washback' ('backwash'). A theory of washback is emerging and its domain being delineated; in this article the theory of washback is linked with the broader concept of 'impact' in educational measurement, and thence to the recent debate on construct validity associated with Messick.
Article
This paper looks at Galton’s work from the perspective of its influence on subsequent developments in assessment and especially psychometric latent variable models. It describes how Galton’s views on scientific validity and assumptions about data analysis came to be incorporated into later perspectives.
Article
The article reviews the usefulness of several models of proficiency that have influenced second language testing in the last two decades. The review indicates that several factors contribute to the lack of congruence between models and test construction, and makes a case for distinguishing between theoretical models, which attempt to represent the proficiency construct in various contexts and operational assessment frameworks, which depict the construct in particular contexts. Additionally, the article underscores the significance of an empirical, contextualized and structured approach to the development of assessment frameworks.
Article
Washback, a concept prominent in applied linguistics, refers to the extent to which the introduction and use of a test influences language teachers and learners to do things they would not otherwise do that promote or inhibit language learning. Some proponents have even maintained that a test's validity should be appraised by the degree to which it manifests positive or negative washback, a notion akin to the proposal of 'systemic validity' in the educational measurement literature. This article examines the concept of washback as an instance of the consequential aspect of construct validity, linking positive washback to so-called authentic and direct assessments and, more basically, to the need to minimize construct underrepresentation and construct-irrelevant difficulty in the test.
Article
Language tests reflect the complexities and power struggles of society. Critical language testing recognizes this aspect of language testing and broadens the field by engaging it in the sphere of social dialog. Studying, protecting, and guarding language tests is part of the process of providing quality learning and preserving democratic cultures, values, and ethics.
Article
Cheating manifests in creative ways in examinations and competitions in educational settings and is justified by numerous rationales or codes of (im)morality. Through the analysis of 3 filmic representations of such instances, this article examines ethical problems that arise in educational contexts and deals explicitly with notions of right conduct, fairness, and justice. The potential for conflict between the teacher-as-moral-agent's abstract, absolute beliefs and the complexities presented by the ethical dilemma in its immediate context is explored. It was concluded that teachers' decisions ultimately hinge on personally held beliefs regarding social and political inequities and that an educator's professional and personal identities are inseparable.
Article
Since Messick (1989) included test use in his validity matrix, there has been extensive debate about professional responsibility for test use. To theorize test use, some researchers have relied upon Foucault's social criticism, thereby stressing the negative role of tests in the surveillance of the marginalized. From a wider perspective, Shohamy (2001a) sees negative test impact as stemming from centralizing agencies, which still leaves open the possibility of positive test use. In this article I argue that how tests are used is a reflection of the wider political philosophy of a society. Political philosophy can generally be characterized as placing more emphasis on either the state or the citizen, leading to collectivist or individualist solutions to problems, be they real or perceived. In collectivist societies, tests, like history, are used to achieve conformity, control, and identity. In individualistic societies, they are used to promote individual progress. The role of tests within each broad approach will be described and illustrated. Finally, I briefly describe effect-driven test architecture as a method for testers to proscribe unintended uses of their tests.
Article
Most academics' intuitions about statistics follow those of naive laypeople: we often think that a sample should reflect the population characteristics more closely than it does, and expect less variability in samples than is truly found in them. These intuitions may prevent us from understanding why modern developments in statistics are needed. Another intuition most researchers hold is that it is better to be conservative when performing statistics, which may involve adjusting p-values for multiple tests, using more conservative post hoc tests, or setting an alpha value lower than .05 when possible. However, the more we guard against the error of being overeager to find differences, the higher the probability that we will fail to detect differences that actually exist. These two forces need to be counterbalanced, and this involves increasing the power of our tests. Robust statistics can increase the power of statistical tests to detect real differences. I discuss the need for robust techniques to avoid reliance on classical assumptions about the data. Examples of robust analyses with t-tests, correlation, and one-way ANOVA are shown.
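The power argument in the abstract above can be illustrated with a small sketch (not taken from the article): a classical Student's t-test loses power when a group contains extreme outliers, because they inflate the variance estimate, while a trimmed-means (Yuen-type) t-test is less affected. The data, group names, and trim level below are hypothetical; the `trim` parameter of `scipy.stats.ttest_ind` is available in SciPy 1.7 and later.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two groups with a true mean difference of 0.8.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=0.8, scale=1.0, size=40)

# Contaminate group_b with a few extreme outliers, which inflate the
# pooled variance estimate and can mask the real difference in means.
group_b[:3] = [15.0, -12.0, 18.0]

# Classical Student's t-test: assumes normality, sensitive to outliers.
t_classic, p_classic = stats.ttest_ind(group_a, group_b)

# Yuen's trimmed-means t-test: drop the most extreme 20% from each
# tail of each sample before comparing, restoring power under
# heavy-tailed contamination.
t_robust, p_robust = stats.ttest_ind(group_a, group_b, trim=0.2)

print(f"classical p = {p_classic:.4f}, trimmed p = {p_robust:.4f}")
```

The point of the sketch is the comparison, not the specific numbers: under contamination like this, the trimmed test will typically report a much smaller p-value than the classical one for the same underlying effect.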
Article
In this book, the authors foreground an aspect of language testing that is usually not much discussed and is frequently considered an "advanced" topic: the social dimension of language testing. They see various social dimensions in language testing. There are socially oriented language tests, i.e., tests which assess learners' ability to use language in social settings. These are primarily oral proficiency interviews and tests of second language pragmatics. But the authors also understand "social" as "societal" and look at the larger-scale impact of tests on individual learners or groups of learners by discussing fairness and bias in language testing. They also broaden their view and discuss the role of language testing in a macro-social context, e.g. as accountability measures in education systems, as gatekeeping instruments for migration, and as tools for constructing and defining social groups. Their discussion is anchored in traditional and more recent views of validity theory.
Article
You may be able to identify several ways students can be prepared to take an upcoming test, but which ways are ethical to use? There are available materials that you may purchase and use to prepare students for a test: Are they effective and is it ethical to use them? What constitutes an ethically acceptable test preparation procedure?