Article

Relationship of Admission Test Scores to Writing Performance of Native and Nonnative Speakers of English

Authors: Carlson, Bridgeman, Camp, and Waanders

Abstract

Four writing samples were obtained from 638 applicants for admission to U.S. institutions as undergraduates or as graduate students in business, engineering, or social science. The applicants represented three major foreign language groups (Arabic, Chinese, and Spanish), plus a small sample of native English speakers. Two of the writing topics were of the compare and contrast type and the other two involved chart and graph interpretation. The writing samples were scored by 23 readers who were English as a second language (ESL) specialists and 23 readers who were experts in English writing. Each of the four writing samples was scored holistically, and during a separate rating session two of the samples from each student were assigned separate scores for sentence-level and discourse-level skills. Representative subsamples of the papers also were scored descriptively with the Writer's Workbench computer program and by graduate-level subject matter professors in engineering and the social sciences. In addition to the writing sample scores, TOEFL scores were obtained for all students in the foreign sample. GRE General Test scores were obtained for students in the U.S. sample and for a subsample of students in the foreign sample. Students in the U.S. sample also took a multiple-choice measure of writing ability. Among the key findings were the following: (1) holistic scores, discourse-level scores, and sentence-level scores were so closely related that the holistic score alone should be sufficient; (2) correlations among topics were as high across topic types as within topic types; (3) scores of ESL raters, English raters, and subject matter raters were all highly correlated, suggesting substantial agreement in the standards used; (4) correlations and factor analyses indicated that scores on the writing samples and TOEFL were highly related, but that each also was reliably measuring some aspect of English language proficiency that was not assessed by the other; and (5) correlations of holistic writing sample scores with scores on item types within the sections of the GRE General Test yielded a pattern of relationships that was consistent with the relationships reported in other GRE studies.

... More recently, the GRE Board and TOEFL Policy Council have supported a study of the relationships of scores on sections of the GRE General Test and the TOEFL to a variety of scores on direct measures of writing, or writing samples (Carlson, Bridgeman, Camp, & Waanders, 1985). The scoring of these writing samples focused on the evaluation of writing ability from the perspective of academic competency in written English. ...
... The students in the sample consisted primarily of nonnative speakers of English whose native language was Arabic (6), Chinese (73), or Spanish (35), plus a sample of native speakers of English (89). The bulk of these data were collected for the previous GRE/TOEFL project (Carlson et al., 1985), in which papers written by each of 132 students on four topics were scored to reflect writing skills. To supplement the original sample of native speakers, additional writing samples were collected from 77 native speakers of English and 3 students whose native language was Arabic. ...
... Writing Skills A major portion of the writing samples had been holistically scored for the evaluation of writing skills for the TOEFL/GRE study. A full description of the scoring appears in the report of that research (Carlson et al., 1985). This method was duplicated in the scoring of the 80 additional papers per topic that were collected to supplement the writing samples for this study. ...
Article
The major objective of the study was to gain more information about the reasoning skills tapped by the GRE analytical measure by examining how performance on its constituent item types relates to alternative criteria. A second objective was to ascertain the extent to which additional information on examinees' analytical skills might be obtained from further analyses of their writing performance. The database for this study consisted of 406 writing samples, prepared by 203 students who had recently taken the GRE General Test for admission to institutions of higher education in the United States. The bulk of these data were collected for research funded by the GRE Board and TOEFL Policy Council, in which the writing samples were scored to reflect writing skills; these scores were related to scores on the GRE General Test and the TOEFL, as well as other measures. In order to supplement the sample of native speakers, additional writing samples were collected from 77 native speakers of English and 3 native speakers of Arabic who had recently taken the GRE General Test and who were in their first year of graduate education in the United States. The final sample included subsamples of nonnative speakers of English (6 Arabic, 73 Chinese, 35 Spanish) and the subsample of 89 native speakers of English. The objectives of this study were accomplished by developing several scoring methods that focused on the reasoning skills that are reflected in these papers. These scores, in addition to the scores for writing skills, were related to item type subscores derived from the verbal and analytical reasoning sections of the GRE General Test in order to determine if these item types relate differently to judgments of examinees' thinking and writing skills. Three scoring methods did not appear to provide additional information beyond what is obtained from the analytical reasoning and verbal sections of the GRE General Test. The Moss scheme, however, yielded scores that were relatively independent of these sections of the GRE. It is possible that these scores tapped verbal reasoning skills that are not assessed by the GRE General Test, but further research is needed to determine whether they represent important developed abilities. Writer's Workbench computerized text analyses suggested that the different writing tasks elicited different kinds of writing performance, and that the writing performance of students representing different native language groups may vary in complex ways in response to these tasks.
... A single twenty-five-minute essay was administered. The essay topic was one that had been used in an earlier study by Carlson, Bridgeman, Camp, and Waanders (1985), and was recommended by Carlson as especially successful in earlier research. (The essay question may be found in Appendix B.) Further information is contained in a report by Fowles (1986). ...
... These criteria were appropriate to the range of writing observed in this study, but were virtually identical to those used in earlier studies of TOEFL essays (Carlson et al., 1985). ...
... where t denotes the total test score, n the number of part scores, V_i the variance of a part score (i), and V_t the variance of the total test score. Turning to the issue of estimating the reliability of the graded essay score, the problem--in our study--is resolved by analogous reasoning. (A thorough discussion of the reliability of essay grades of TOEFL candidates may be found in Carlson et al., 1985.) In our case we obtained essay grades of two readers independently. ...
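The variable legend in the excerpt above matches the standard coefficient-alpha formula for the reliability of a total score built from part scores, and the two-reader case is usually handled with the Spearman-Brown formula. The following is only a sketch of those standard formulas under that assumption; it is not taken from the cited report itself.

```latex
% Coefficient alpha for a total score composed of n part scores,
% with V_i the variance of part score i and V_t the variance of the total score:
\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} V_i}{V_t}\right)

% Reliability of the combined grade from two independent readers,
% given the correlation r between the two readers (Spearman-Brown):
\rho_{\text{two readers}} = \frac{2r}{1 + r}
```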
Article
The purpose of this study was to investigate the validity of cloze-elide tests of English proficiency for students who are similar to the TOEFL candidate population. Cloze-elide tests used in this research consisted of exercises in which an examinee is required to edit a prose passage by eliminating extraneous words that have been randomly interspersed throughout the original text of the passage. Students enrolled in university-level intensive English language programs were administered a series of tests including, in addition to cloze-elide tests, a form of the TOEFL, a multiple-choice cloze test, a traditional cloze exercise, and an essay that was holistically scored. Students were also rated by their instructors in twelve areas of English proficiency, and the students rated their own language competency through self-assessments in ten areas. A variety of student background information was also obtained. The design of the study aimed at contributing evidence of the construct validity of cloze-elide tests. Concurrent validity information is relevant to this question, but the evidence arising from factor analyses of the intercorrelations among these many measures is more pertinent. In summary, the new cloze-elide measures demonstrated very strong concurrent validity for TOEFL and other more widely used measures of second language proficiency. The factor analyses suggest that cloze-elide tests are good, indirect measures of English language proficiency, comparing very favorably with more commonly used testing procedures. Multiple regression analyses confirmed the usefulness of cloze-elide tests, which were generally one of the two best predictors of teacher ratings of students' English proficiency.
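The abstract above explains the cloze-elide format: extraneous words are randomly interspersed in a passage, and the examinee must find and delete them. The sketch below is only an illustration of how such an exercise could be assembled; the function name, insertion rate, and word list are invented and are not the procedure used in the study. Scoring would then compare the examinee's deletions against the recorded positions of the inserted words.

```python
import random

def make_cloze_elide(passage: str, extra_words: list[str],
                     insert_rate: float = 0.15, seed: int = 0):
    """Randomly intersperse extraneous words into a passage.

    Returns the doctored passage and the positions of the inserted words,
    which serve as the scoring key (the examinee must cross them out).
    """
    rng = random.Random(seed)
    doctored, key = [], []
    for token in passage.split():
        doctored.append(token)
        if rng.random() < insert_rate:
            doctored.append(rng.choice(extra_words))
            key.append(len(doctored) - 1)  # index of the extraneous word
    return " ".join(doctored), key

text, answer_key = make_cloze_elide(
    "Students enrolled in intensive English programs took a series of tests.",
    extra_words=["apple", "quickly", "seven", "window"],
)
print(text)
print(answer_key)
```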
... (a) in samples tested in developmental research (Carlson, Bridgeman, Camp, and Waanders, 1985) involving the TOEFL and prototypical versions of the TWE, and (b) in samples taking both tests under fully operational conditions (see ETS, 1992b). ...
... In a comprehensive study (Carlson, Bridgeman, Camp, and Waanders, 1985) associated with development of the Test of Written English (TWE), holistic scores for ESL-writing samples were found to be somewhat more closely related to scores on the nonlistening portions of the TOEFL (Structure and Written Expression [SWE] and Reading Comprehension and Vocabulary [RCV]), respectively, than to the TOEFL Listening Comprehension score. ...
... It is particularly noteworthy in this connection that the levels of concurrent relationship observed in the LACCD context, between scores on a shortened version of the SLEP and scores on local writing tests--with attendant differences in topic, rating procedures, and so on--parallel levels that have been found to obtain between scores on the Test of English as a Foreign Language and the Test of Written English (writing samples elicited under standard conditions, scored under controlled conditions by at least two raters, and so on) in (more highly selected) (a) samples tested in developmental research (Carlson, Bridgeman, Camp, and Waanders, 1985) involving the TOEFL and prototypical versions of the TWE, and (b) samples taking both tests under fully operational conditions (see ETS, 1992b). Such findings attest to the validity and relevance for placement of both the SLEP and the local writing tests. ...
Article
This is the report of a study that was undertaken to obtain direct empirical evidence regarding aspects of the validity and usefulness for ESL placement of (a) a shortened version of the Secondary Level English Proficiency (SLEP) Test, being used for ESL placement by colleges in the Los Angeles Community College District (LACCD), and (b) locally developed and scored writing tests. The LACCD Central Office provided scores on the shortened version of the SLEP and the writing samples, grades in ESL courses, and background data (gender, language, educational status, and so on) for over 10,000 students. This report documents and evaluates
• patterns of performance on the shortened SLEP and the writing tests in the general ESL population and in selected demographic subpopulations,
• concurrent relationships among scores on the components of the LACCD placement battery,
• observed levels of correlation between scores on the shortened SLEP test, the writing test, and the placement composite, on the one hand, and student performance in ESL courses, as indexed by grade earned (a grade on the "A-F" scale, or a Pass/Fail grade), on the other, by course and by college, and in various subgroups (e.g., gender, educational level, age, language), and
• the extent to which observed relationships in placed samples are influenced by non-validity-related factors (for example, differential restriction of range on the tests that were used to place students, sample size, and type of grading system).
The findings of this collaborative undertaking provide direct empirical evidence that the shortened SLEP and locally developed writing tests are providing valid information regarding related aspects of ESL proficiency in the demographically diverse ESL student population being served by the LACCD; and the findings logically extend available evidence supportive of the validity of the Secondary Level English Proficiency Test for ESL assessment purposes in the LACCD and elsewhere. Based on study findings, the local writing tests and the shortened SLEP appear to provide an effective basis for placing students within the time constraints considered necessary from an administrative perspective. Research is needed to address questions regarding the extent to which use of the full-length version of the SLEP would enhance the overall validity of placement. Some pertinent lines of inquiry are suggested.
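The abstract above notes that correlations in placed samples are influenced by differential restriction of range. For context, the standard (Thorndike Case 2) correction for direct range restriction on the selection variable is sketched below; this is general psychometric background, not a formula reported in the study.

```latex
% The correlation r observed in the restricted (placed) sample understates
% the unrestricted correlation when selection reduces the predictor's variance.
% With S = unrestricted standard deviation and s = restricted standard
% deviation of the predictor, the corrected estimate is:
r_c = \frac{r\,(S/s)}{\sqrt{\,1 - r^2 + r^2 (S/s)^2\,}}
```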
... A few studies have compared traditional essay tasks with data commentary tasks (i.e., describing information found in graphs), and perhaps these two task types might invite different patterns of exposition (Carlson, Bridgeman, Camp, & Waanders, 1985; Park, 1988; Reid, 1990; Weigle, 1999). Park found, for example, that Chinese and English language background test takers who majored in "hard" sciences did better on a task describing information in a graph than on a traditional essay task, a difference that was not observed among those with social science majors. ...
... There is, in other words, an interaction. The findings of these studies in general are that these two task types resulted in different linguistic production (Reid, 1990), that inexperienced raters were more severe in rating data commentary tasks, perhaps due to unfamiliarity with the format (Weigle, 1999), but that correlations between the two types of tasks were generally high (Carlson et al., 1985). That the two task types studied are rated differently only by inexperienced raters seems to imply that, with rater training, scores on the two types of tasks can be comparable, thereby providing some evidence that the two kinds of tasks tap the same underlying ability. ...
Article
Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test takers' scores, which can affect the validity, reliability, and fairness of these tests. This study uses data from the Michigan English Language Assessment Battery (MELAB), including all official ratings given over a period of over four years (n = 29,831), to examine these issues related to scoring validity. It uses the multi-facet extension of Rasch methodology to model this data, producing measures on a common, interval scale. First, the study investigates the comparability of prompts that differ on topic domain, rhetorical task, prompt length, task constraint, expected grammatical person of response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters' ratings, and whether these are affected by time and by raters' experience and language background. It also considers whether raters alter their rating behavior depending on their perceptions of prompt difficulty and of test takers' prompt selection behavior. The results show that test takers' scores reflect actual ability in the construct being measured as operationalized in the rating scale, and are generally not affected by a range of prompt dimensions, rater variables, or test taker characteristics. It can be concluded that scores on this test and others whose particulars are like it have score validity, and, assuming that other inferences in the validity argument are similarly warranted, can be used as a basis for making appropriate decisions. Further studies to develop a framework of task difficulty and a model of rater development are proposed.
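The study summarized above uses the many-facet extension of the Rasch model. As background, the usual form of that model (for examinee n, prompt i, rater j, and rating-scale category k) is sketched below; the specific facet structure used in the MELAB analysis may differ.

```latex
% Many-facet Rasch (rating scale) model: log-odds of receiving category k
% rather than k-1 from rater j on prompt i for examinee n
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
% B_n = examinee ability, D_i = prompt difficulty,
% C_j = rater severity, F_k = difficulty of rating-scale step k.
```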
... The basic design of the TWE was developed and validated in the study done by Carlson, Bridgeman, Camp, and Waanders (1985), in which scoring methods and the performance of different formats were investigated. ...
... Different formats, topics, and topic types might elicit different writing performances from the same examinee or may promote successful performance for one examinee while impeding successful performance for another. While Carlson et al. (1985) found comparability both within and across format, Freedman and Calfee (1983) found significant differences between scores for different formats and different topics within a given format. Given these disparate findings, research in this area takes on greater importance. ...
Article
The Test of Written English (TWE), administered with certain designated TOEFL® examinations, consists of a single essay prompt to which examinees have 30 minutes to respond. It was introduced in 1986 to provide TOEFL score users with a direct measure of examinees' writing ability. Preliminary studies had indicated that the two different kinds of prompts, "compare, contrast, and take a position" (prose) and "describe or interpret a chart or graph," elicited comparable writing performance. However, questions were subsequently raised with respect to continued comparability of different TWE® prompts administered under operational conditions. The present study was designed to elicit essays for prompts that differed in both subject matter (Topic) and in the level of explicitness with which the essay task was presented (Topic Type). Eight different prompts were spiraled worldwide at the October 1989 TOEFL administration, with each prompt eliciting approximately 10,000 essays. The results of the analyses indicated that there were small differences among the prompts. The most notable differences were obtained among the scores for topics using the explicit comparison. Across all the prompts, the chart-graph prompt with the explicit comparison statement produced the highest mean scores. Because it was the first study of its kind to focus on the comparability of prompts in a major testing program, the authors had difficulty making definitive statements regarding the meaningfulness of the obtained differences. While many of the differences in means observed in this study were so small as to be of no practical significance, differences observed across prompts in the numbers of examinees at each score level were not. Such differences may warrant further consideration by the TOEFL program.
... Thus, on balance, findings involving SLEP summary section scores (LC and RC) with interview and essay ratings tend to be generally consistent not only with theoretical expectation but also with empirical findings involving similar measures in other testing contexts; and findings summarized in Table 9 suggest patterns of discriminant validity for item-type subscores that tend to parallel patterns observed for the summary scores to which they contribute, and the single exception to this (expected) parallelism involves a "listening comprehension" item type (Dictation) which appears to call for relatively extensive processing of written response options in order to respond to a spoken prompt. In research undertaken as part of the development and validation of the TWE (e.g., Carlson, Bridgeman, Camp, and Waanders, 1985), scores on the nonlistening sections of the TOEFL correlated more highly with essay ratings than did scores on the listening section. The sample of international students involved was linguistically heterogeneous and not restricted to students from the "Asian region". ...
... For further development of this general point, see Wilson (1989: pp. 60-61, p. 74); for a research review of problems involved in assessing ESL writing proficiency, see Carlson et al. (1985); for a comprehensive analysis of the numerous problems involved in the direct assessment of "writing ability" generally, see Breland (1983). ...
Article
The study reported herein assessed (a) levels and patterns of concurrent correlations for Listening Comprehension (LC) and Reading Comprehension (RC) scores provided by the Secondary Level English Proficiency (SLEP) test with direct measures of ESL speaking proficiency (interview rating) and writing proficiency (essay rating), respectively, and (b) the internal consistency and dimensionality of the SLEP, by analyzing intercorrelations of scores on SLEP item-type "parcels" (subsets of several items of each type included in the SLEP). Data for some 1,600 native-Japanese speakers (recent secondary-school graduates) applying for admission to Temple University-Japan (TU-J) were analyzed. The findings tended to confirm and extend previous research findings suggesting that the SLEP, which was originally developed for use with secondary-school students, also permits valid inferences regarding ESL listening- and reading-comprehension skills in postsecondary-level samples.
... Clarke and Walker (1980) contend that practicing writing in a timed context benefits student ability to do well on essay type examinations. In several studies of timed-writing samples, the findings have been consistent: essays with higher scores are longer than their lower rated counterparts (Carlson, Bridgeman, Camp, & Waanders, 1985; Ferris, 1994; Frase, Faletti, Ginther, & Grant, 1999; Grant & Ginther, 2000; Reid, 1986, 1990). ...
... Written texts that are rated highly are nearly always relatively long. Text length therefore appears to be a rather consistent predictor of perceived writing quality as discussed earlier (Carlson et al., 1985; Ferris, 1994; Frase et al., 1999; Grant & Ginther, 2000; Reid, 1986, 1990). ...
... In several studies of timed writing samples, the findings have been consistent: Essays with higher scores are longer than their lower rated counterparts (Carlson, Bridgeman, Camp, & Waanders, 1985; Ferris, 1994; Frase, Faletti, Ginther, & Grant, 1999; Grant & Ginther, 2000; Reid, 1986, 1990), use longer words on average (Frase et al., 1999; Grant & Ginther, 2000; Reid, 1986, 1990; Reppen, 1994), and show more diverse word use than their lower quality counterparts (e.g., Grant & Ginther, 2000; Jarvis, 2002a; Reppen, 1994). However, the means used for producing a text may make a difference in text length. ...
... We do not know whether being a good writer makes one write more, whether writing more makes one a better writer, or whether raters are simply biased towards longer texts, but we do know that written texts that are rated highly are nearly always relatively long. Text length therefore appears to be a rather consistent predictor of perceived writing quality, as discussed earlier (Carlson et al., 1985; Ferris, 1994; Frase et al., 1999; Grant & Ginther, 2000; Jarvis, 2002a; Linnarud, 1986; Reid, 1986, 1990). ...
Article
Recent research has come a long way in describing the linguistic features of large samples of written texts, although a satisfactory description of L2 writing remains problematic. Even when variables such as proficiency, language background, topic, and audience have been controlled, straightforward predictive relationships between linguistic variables and quality ratings have remained elusive, and perhaps they always will. We propose a different approach. Rather than assuming a linear relationship between linguistic features and quality ratings, we explore multiple profiles of highly rated timed compositions and describe how they compare in terms of their lexical, grammatical, and discourse features. To this end, we performed a cluster analysis on two sets of timed compositions to examine their patterns of use of several linguistic features. The purpose of the analysis was to investigate whether multiple profiles (or clusters) would emerge among the highly rated compositions in each data set. This did indeed occur. Within each data set, the profiles of highly rated texts differed significantly. Some profiles exhibited above-average levels for several linguistic features, whereas others showed below-average levels. We interpret the results as confirming that highly rated texts are not at all isometric, even though there do appear to be some identifiable constraints on the ways in which highly rated timed compositions may vary.
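The profile analysis described above groups highly rated essays by their patterns of linguistic features. The following is only a hypothetical sketch of that kind of cluster analysis; the feature names, values, and number of clusters are invented and are not the study's variables.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical per-essay feature matrix: rows = highly rated essays,
# columns = linguistic measures (e.g., word count, mean word length,
# type-token ratio, clauses per T-unit). All values are invented.
features = np.array([
    [310, 4.6, 0.52, 1.9],
    [295, 4.4, 0.48, 2.1],
    [410, 5.1, 0.61, 1.4],
    [388, 5.0, 0.58, 1.5],
    [250, 4.2, 0.45, 2.4],
    [265, 4.3, 0.47, 2.3],
])

# Standardize so each feature contributes comparably, then cluster.
z = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)
print(labels)  # essays with similar linguistic profiles share a cluster label
```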
... The TOEFL examination added a direct writing measure (Connor, 1991: 216) in 1986, the Test of Written English, which was marked holistically (TOEFL Test of Written English Guide, 1989). A great deal of research was conducted by the Educational Testing Service into the development and validation of a measure to assess communicative competence in writing (Bridgeman & Carlson, 1983; Carlson et al., 1985). A holistic scoring guide with six levels, incorporating syntactic and rhetorical criteria, was developed to mark two general topic types: comparison/contrast and describing a graph. ...
Article
Full-text available
It is now ten years since Communicative Language Teaching (CLT) was introduced in the secondary English curriculum of Bangladesh. Therefore, the test of English at the SSC level is now facing the challenge of assessing learners' communicative skills. This study looks at the existing model of the SSC English test and explores the possibilities of incorporating a more communicatively based test format. The study is based on an evaluation of the test items on writing skills set in the SSC test papers. It also explores the views of Bangladeshi secondary English teachers and internationally renowned language testing experts. It is argued that though secondary English education in Bangladesh stepped into a communicative era ten years back, the current SSC test is not in accordance with the curriculum objectives. The test items on writing lack both validity and reliability. Suggestions made for improving the current SSC test include: defining the purpose of communication in English for SSC-level learners, drafting test specifications, setting test items which are relevant to a communicative purpose, and developing a marking scheme to mark the subjective items.
... • Number of words: Several research projects have shown that higher-rated essays, in general, contain more words [4, 8]. • Domain ID: Domain IDs are the broad-level categorization of the responses. ...
Preprint
The Managed Care system within Medicaid (US Healthcare) uses Request For Proposals (RFP) to award contracts for various healthcare and related services. RFP responses are very detailed documents (hundreds of pages) submitted by competing organisations to win contracts. Subject matter expertise and domain knowledge play an important role in preparing RFP responses, along with analysis of historical submissions. Automated analysis of these responses through Natural Language Processing (NLP) systems can reduce the time and effort needed to explore historical responses and assist in writing better responses. Our work draws parallels between scoring RFPs and essay scoring models, while highlighting new challenges and the need for interpretability. Typical scoring models focus on word-level impacts to grade essays and other short write-ups. We propose a novel Bi-LSTM based regression model, and provide deeper insight into phrases which latently impact the scoring of responses. We contend the merits of our proposed methodology using extensive quantitative experiments. We also qualitatively assess the impact of important phrases using human evaluators. Finally, we introduce a novel problem statement that can be used to further improve the state of the art in NLP based automatic scoring systems.
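The preprint above describes a Bi-LSTM based regression model for scoring documents. Its exact architecture is not given here, so the following is only a generic sketch of a bidirectional-LSTM regressor in PyTorch; the class name, layer sizes, and vocabulary size are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMRegressor(nn.Module):
    """Generic bidirectional-LSTM scorer: token ids -> a single score."""

    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)  # concat of both directions

    def forward(self, token_ids):
        x = self.embed(token_ids)               # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(x)              # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=1)  # final forward + backward states
        return self.head(h).squeeze(-1)         # (batch,) predicted scores

model = BiLSTMRegressor()
dummy_batch = torch.randint(1, 10_000, (4, 200))  # 4 documents, 200 tokens each
print(model(dummy_batch).shape)                    # torch.Size([4])
```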
... Thus word and sentence length can be used to indicate the sophistication of a writer (Hiebert 2011). Several research projects have shown that higher-rated essays, in general, contain more words (Carlson et al. 1985; Ferris 1994; Reid 1990) and generally use longer words (Frase et al. 1998; Reppen 1995). Thus sentence length, word length, ... [Figure 1: Pipeline for scoring short answers using AutoSAS.] ...
Article
Full-text available
In the era of MOOCs, online exams are taken by millions of candidates, for whom scoring short answers is an integral part. It becomes intractable to evaluate them by human graders. Thus, a generic automated system capable of grading these responses should be designed and deployed. In this paper, we present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS). We propose and explain the design and development of a system for SAS, namely AutoSAS. Given a question along with its graded samples, AutoSAS can learn to grade that prompt successfully. This paper further lays down the features, such as lexical diversity, Word2Vec, prompt, and content overlap, that play a pivotal role in building our proposed model. We also present a methodology for indicating the factors responsible for scoring an answer. The trained model is evaluated on an extensively used public dataset, namely the Automated Student Assessment Prize Short Answer Scoring (ASAP-SAS) dataset. AutoSAS shows state-of-the-art performance and achieves better results by over 8% on some of the question prompts as measured by Quadratic Weighted Kappa (QWK), showing performance comparable to humans.
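Quadratic Weighted Kappa, the agreement metric mentioned above, penalizes disagreements in proportion to the square of their distance on the score scale. A minimal illustration with invented scores, using scikit-learn's implementation:

```python
from sklearn.metrics import cohen_kappa_score

# Invented example: human scores vs. automated scores on a 0-3 scale.
human = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
auto = [0, 1, 2, 2, 2, 1, 1, 3, 3, 1]

qwk = cohen_kappa_score(human, auto, weights="quadratic")
print(f"Quadratic Weighted Kappa: {qwk:.3f}")
```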
... In this sense, rater training and previous experience of the raters are considered effective factors for ensuring the reliability of scores in terms of intra- and inter-rater consistency. Therefore, investigating raters' scoring background can help reduce the variability in essay scores (Carlson, Bridgeman, Camp, & Waanders, 1985; Cumming, 1990; Hamp-Lyons, 1990; Homburg, 1984; Myers, 1980; Najimy, 1981; Reid, 1993; Upshur & Turner, 1995). ...
Thesis
This dissertation aimed to investigate the impact of rater experience and essay quality on rater behavior and scoring. In doing so, the variability of essay scores assigned to high-quality and low-quality essays was examined quantitatively while raters' decision-making strategies were investigated qualitatively. Using a convergent parallel design as a mixed-methods approach, data were collected from 31 EFL instructors and two research assistants working at higher education institutions in Turkey. While 15 of the participants were from a specific university, the remaining participants represented various universities across Turkey. Based on their reported rating experience, participants were divided into three groups: low-experienced (n = 13), medium-experienced (n = 10), and high-experienced raters (n = 10). Using an analytic scoring rubric, each participant assessed 50 essays of two distinct quality levels (high and low) and simultaneously recorded think-aloud protocols to determine the raters' decision-making processes while scoring EFL essays. In addition, raters' written explanations for their ratings were used to triangulate the verbal protocols. A total of 9,900 scores (1,650 total scores and 8,250 sub-scores), 446 think-aloud protocols, and 5,425 written score explanations were obtained from the participants. The analysis of quantitative data relied on the generalizability (G-) theory approach as well as descriptive and inferential statistics; qualitative data were analyzed through deductive and inductive coding. The results showed that high-experienced raters are more positive toward students' essays and assign higher scores compared to their less experienced peers. Furthermore, the high-experienced and low-experienced groups differed significantly in their total scores and mechanics component sub-scores assigned to low-quality essays. Additionally, G-theory analyses were conducted to determine the sources of measurement error and their relative contributions to the score variability. The results yielded a smaller rater effect when high- and low-quality essays were considered collectively, but it was found that raters contributed more to score variation when separate analyses were conducted for each essay quality. The qualitative findings suggested that raters in different experience groups display different decision-making behaviors while assessing essays of different proficiency levels. Overall, the findings provide striking insights for rater reliability in EFL writing assessment. Implications are discussed with respect to EFL writing assessment in the local and wider context from the perspective of fairness and rater reliability. Keywords: EFL writing assessment, essay quality, generalizability theory, rater behavior, rater experience, score variability, think-aloud protocols
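For readers unfamiliar with generalizability theory, the basic person-by-rater decomposition behind analyses like the one described above is sketched here; the dissertation's actual design (which also involves essay quality and rubric components) is more elaborate, so this is background only.

```latex
% Single-facet p x r design: observed score of person p from rater r
X_{pr} = \mu + \nu_p + \nu_r + \nu_{pr,e},
\qquad
\sigma^2(X_{pr}) = \sigma^2_p + \sigma^2_r + \sigma^2_{pr,e}

% Relative generalizability coefficient for the mean score over n_r raters:
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}
```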
... ESL and non-ESL raters scored papers written by native or non-native English test takers differently; even when the scores they reached were quite similar, the components they considered were different (e.g., Carlson, Bridgeman, Camp, & Waanders, 1985; O'Loughlin, 1993; Sweedler-Brown, 1993; Vann, Meyer, & Lorenz, 1984). An extensive body of research has also addressed writing assessment by lay versus professional raters (Cumming, 1990; Shohamy, Gordon, & Kraemer, 1992; Schoonen, Vergeer, & Eiting, 1997; Wolfe & Ranney, 1996). ...
Article
The present study reports the processes of development and use of an Analytic Dichotomous Evaluation Checklist (ADEC) which aims at enhancing both inter- and intra-rater reliability of writing evaluation. The ADEC consists of a total of 68 items comprising five subscales: content, organization, grammar, vocabulary, and mechanics. Eight raters assessed the writing performance of 20 Iranian EFL students using the ADEC. The raters were also asked to rate the same sample of essays holistically based on the Test of Written English (TWE) scale. To examine the inter-rater and intra-rater reliability of the ADEC, multiple approaches were employed, including correlation coefficients, the dichotomous Rasch model, and many-faceted Rasch measurement (MFRM). The findings of the study confirmed that the ADEC introduces higher reliability into the scoring procedure compared with holistic scoring. Future research with a greater number of raters and examinees may provide robust evidence for using an analytic scale rather than a holistic one.
... Test validation is immensely significant for all test users because "accepted practices of the validation are critical to decisions about what constitutes a good language test for a particular situation" (Chapelle, 1999, p. 254). Accordingly, the assessment literature is replete with studies examining the reliability and validity of numerous proficiency, aptitude, knowledge, and placement tests (see, for example, Carlson et al., 1985; Chi, 2011; Compton, 2011; Dandonolli & Henning, 1990; Drollinger et al., 2006; Eda et al., 2008; Greenberg, 1986; Johnson, 2001; Magnan, 1987; Patterson & Ewing, 2013; Sabers & Feldt, 1968; Stansfield & Kenyon, 1992; Thompson, 1995; Zhao, 2013). The sensitivity and significance of university entrance exams (UEE), especially in countries where UEEs are perceived as the sole gateway to qualify for university programs, have necessitated numerous in-depth inquiries into their reliability and validity around the world (see, for example, Frain, 2009; Hissbach et al., 2011; Ito, 2012; Kirkpatrick & Hlaing, 2013). Kirkpatrick and Hlaing (2013), for instance, examined the reliability and validity of the English section of the Myanmar UEE and concluded that the exam suffered from poor construct and content validity, leading to negative washback with regard to learning and teaching. ...
Article
Full-text available
Owing to their scope and decisiveness, Ph.D. program entrance exams (PPEE) ought to demonstrate acceptable reliability and validity. The current study aims to examine the reliability and validity of the new Teaching English as a Foreign Language (TEFL) PPEE from the perspective of both university professors and Ph.D. students. To this end, in-depth unstructured interviews were conducted with ten experienced TEFL university professors from four different Iranian state universities, along with ten Ph.D. students who sat both the new and old PPEEs. A detailed content analysis of the data suggested that the new exam was assumed to establish acceptable reliability through standardization and consistency in administration and scoring procedures. Conversely, the new exam was perceived to demonstrate defective face, content, predictive, and construct validity. This study further discusses the significance and implications of the findings in the context of Iranian TEFL higher education.
... A strong empirical relationship, not only between essay length and holistic score but also between essay length and each of the six analytic scores used, was confirmed in this study. This was not completely unexpected, given previous research findings on the strong relationships between essay length and holistic scores (Carlson, Bridgeman, Camp, & Waanders, 1985; Ferris, 1994; Frase, Faletti, Ginther, & Grant, 1999; Grant & Ginther, 2000; Jarvis, 2002; Jarvis et al., 2003; Reid, 1986) and between lexical diversity measures and holistic scores (Engber, 1995; Laufer & Nation, 1995). Interestingly, the holistic and development scores were found to be most highly correlated with essay length, while the mechanics score was least correlated with essay length. ...
Technical Report
Full-text available
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e-rater® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic essay scores provided by human raters and essay feature variable scores computed by e-rater (version 2.0) for two TOEFL CBT writing prompts. It was found that (a) all of the six analytic scores were not only correlated among themselves but also correlated with the holistic scores, (b) high correlations obtained among holistic and analytic scores were largely attributable to the impact of essay length on both analytic and holistic scoring, (c) there may be some potential for profile scoring based on analytic scores, and (d) some strong associations were confirmed between several e-rater variables and analytic ratings. Implications are discussed for improving the analytic scoring of essays, validating automated scores, and refining e-rater essay feature variables.
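The report above attributes much of the correlation among holistic and analytic scores to essay length. The following is only a hypothetical sketch of the kind of length-score correlation check involved; the data are invented.

```python
import numpy as np

# Invented data: word counts and holistic scores (1-6 scale) for ten essays.
word_counts = np.array([120, 150, 180, 210, 230, 260, 300, 320, 350, 400])
holistic_scores = np.array([2, 2, 3, 3, 4, 4, 4, 5, 5, 6])

# Pearson correlation between essay length and holistic score.
r = np.corrcoef(word_counts, holistic_scores)[0, 1]
print(f"Pearson r between essay length and holistic score: {r:.2f}")
```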
... Many studies in L2 writing measurement have paid attention only to topic variables. Carlson et al. (1985), Spaan (1993), Hamp-Lyons and Prochnow (1990), and Reid (1990) studied the interaction between topics and task types affecting writers' performance, and found that topic types were important factors affecting the writers' final product. ...
Article
Full-text available
Achievement test scores are used to diagnose strengths and weaknesses and as a basis for awarding prizes, scholarships, or degrees. They are also used in evaluating the influences of course of study, teachers, teaching methods, and other factors considered to be significant in educational practice. Still, sometimes there is a gap between the scores on essay tests and the existing knowledge of examinees. In the present study, the relationship between writing skill and the academic achievement of Iranian EFL students was examined to find a logical connection between them. The results of four final exams, taken as content scores, were examined and scored again in terms of writing ability using an analytic scoring scheme based on IELTS criteria. Then the average of the two sets of scores calculated by two raters was compared with the content scores of the same tests. The results showed that the correlation between the content scores of all students and their writing skills is significant at the 0.01 level. The results showed that there is a strong relationship between EFL students' content scores and their writing skill.
... If we want people to learn how to write compositions, we should get them to write compositions in the test" (p.54). Because the act of writing involves the production of a written piece, actual writing samples, or direct measures of writing, now are viewed as a more appropriate means for assessing writing performance because they more nearly approximate real discourse (Carlson, Bridgeman, Camp & Waanders, 1985). ...
Article
Full-text available
This study primarily investigated the validity and reliability of the writing assessments and their backwash effects on the undergraduates of the Institute of International Studies, Ramkhamhaeng University (IIS-RU). The English-major students had problems with academic writing skills, especially the non-native English speakers, whose writing ability was critical to their academic achievement as they were required to produce many academic writing tasks. There were also some native English speaking students who were unable to write essays or compose properly. The IIS professors motivated their students to develop their writing skills by using writing tests (e.g., essay exams and writing prompt tests) as instruments to measure students' writing competence, ability, and knowledge in the curriculum.
... 45). Carlson, Bridgeman, Camp, and Waanders (1985) studied essays written in English by foreign college students. Each student wrote two essays on compare/contrast topics and two essays on descriptions of charts or graphs. ...
Article
If a student writes two essays, the score reliability can be estimated from the correlation between the essays. However, if the essays are in different modes or require different skills, the reliability may be underestimated from the correlation. In Advanced Placement history examinations, students wrote one standard essay and one essay that required analysis and synthesis of historical documents that were included with the question statement. If these document-based questions (DBQ) were assessing substantially different skills from the standard essays, then the reliability of the DBQ scores would be underestimated from their correlation with a standard essay score. A sample of 1,045 U.S. history students and 891 European history students participated in a special study in which they wrote essays for either 2 DBQ questions and one standard essay question or 3 standard essay questions. The DBQ correlated as highly with a standard essay as with another DBQ, suggesting that the simple correlation of the two types of scores did not underestimate the reliability of the essay scores.
... A study by Carlson, Bridgeman, Camp, and Waanders (1985) explored correlations among Writer's Workbench measures and holistic essay scores. Significant correlations were found for some features and topics, but only a small set of essays was used. ...
Article
This project had three main objectives: (1) to establish a database of essays written by different language groups on a variety of topics for the Test of Written English (TWE) that can be used in future research; (2) to summarize, analyze, and compare the linguistic properties of those essays; and (3) to determine how the TWE performance of language groups relates to essay styles. As part of the first objective of this project we created a database of 1,737 essays, a data matrix of essay variables, files containing sorted phrases and vocabularies of different language groups, and files of the common and unique vocabulary items for each pair of language groups. The essay sample consisted of TWE essays from five language groups, including Arabic, Chinese, English (including native-English and nonnative-English), and Spanish speakers. Essays from English-speaking students in the United States were collected and scored to provide a baseline with which to compare essays by students who speak English as a second language (ESL). This report includes the analysis of 106 variables for each essay along with summary analyses, such as correlation, analyses of variance, discriminant analysis, and factor analysis. It also presents information on the accuracy and cost of data entry and on the accuracy of the text analysis programs, as well as extensive data on vocabulary. A program that assesses content was developed for this project. Results show that topic differences affected some essay variables, but these effects were generally felt equally by the different language groups. Factor analysis replicated work showing that major distinguishing features of academic writing include a nominal style, passive constructions, and complexity of sentence structure. Discriminant analyses suggest three features that might distinguish the performance of different language groups — directness, expressiveness, and academic stance. Nevertheless, a linguistic analysis of the accuracy of underlying text analysis programs shows that for some word classes implicated in defining "academic" or "expressive" styles, cautions are needed in interpreting program outputs. Two variables that can be measured unambiguously by computer — number of words and the average length of words — taken together are quite predictive of TWE essay scores of non-English speakers (multiple R > .8).
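The closing finding above, that number of words and average word length together yield a multiple R above .8 with TWE scores, amounts to a two-predictor linear regression. The sketch below illustrates that computation with invented data; it is not the study's dataset or result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: [word count, average word length] per essay, plus a TWE-like score.
X = np.array([[180, 4.1], [220, 4.3], [260, 4.4], [300, 4.6],
              [340, 4.8], [380, 5.0], [420, 5.1], [460, 5.3]])
y = np.array([2.5, 3.0, 3.5, 3.5, 4.0, 4.5, 5.0, 5.5])

model = LinearRegression().fit(X, y)
multiple_r = np.sqrt(model.score(X, y))  # multiple R = square root of R-squared
print(f"Multiple R: {multiple_r:.2f}")
```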
... cit.). For example, the TOEFL Program uses two topic types at different administrations of its Test of Written English, although analyses of trial prompts of the two topic types in the TOEFL Test of Written English (TWE) (Carlson, Bridgeman, Camp, & Waanders, 1985) showed that they behaved rather differently (they correlated at around .70). Reid (1989) looked more closely than Carlson et al. at the four writing prompts in the TWE study, and found that the students' tests varied significantly from topic type to topic type, even when the differences did not result in differing score patterns. ...
Article
Dan Horowitz was well-known for his research into essay examination prompts, and was greatly respected for the intellectual clarity of his work and for his humanistic grounding of that work in the central questions facing practitioners in ESL classrooms in colleges and universities. In this paper I review some of the work that has been done on prompt effects in ESL writing at the college or college preparatory level, focusing on just one small aspect in an attempt to move our work in this area toward a better general understanding. While I do not make explicit reference to Dan's work in the text, the collegial dialogue we maintained is an important underpinning of the paper. There has been a great deal of research into the question of whether topic types, topics, the linguistic structure of questions, and an array of what we may group together as "prompt effects" have a significant effect on the measured writing ability of native writers of English. While far less work has been done on those same issues in relation to the writing of nonnative English users, it seems likely that the effects of prompts, if such exist, will only be exacerbated when we look at nonnative writers rather than native writers. An overview of the field (Hamp-Lyons, in press) suggests that in first language writing assessment the trend is to treat topics, and even topic types, as equivalent. In the major ESL/EFL writing assessment programs the same trend emerges (Hamp-Lyons, op. cit.). For example, the TOEFL Program uses two topic types at different administrations of its Test of Written English, although analyses of trial prompts of the two topic types in the TOEFL Test of Written English (TWE) (Carlson, Bridgeman, Camp, & Waanders, 1985) showed that they behaved rather differently (they correlated at around .70). Reid (1989) looked more closely than Carlson et al. at the four writing prompts in the TWE study, and found that the students' tests varied significantly from topic type to topic type, even when the differences did not result in differing score patterns. The variation was most marked for strong writers. Perhaps weaker writers have less language flexibility, while those with higher scores seem able to adapt to different topics. In looking at the issue of whether there is a significant effect on student essay test writing from the prompts used, then, it would seem that the answer depends on one's research orientation and the questions one asks as much as it does on "hard" numbers. I can illustrate this from two studies carried out in the University of Michigan Testing and Certification Division. Both studies looked at prompts on the MELAB (Michigan English Language Assessment Battery, a test for nonnative users of English filling the same function as the TOEFL, but including a composition as a basic component rather than as an occasional, optional, extra). In the first, Spaan (1989) describes an experimental study of two prompts from the MELAB, chosen because they appeared on the surface to be dramatically different. Spaan provides a linguistic analysis to show the linguistic, cognitive, and schematic differences of the prompts, and interprets her score data as suggesting that even these prompts in fact yield scores which are significantly related. In the second study, Hamp-Lyons and Prochnow (1990), in a post-hoc study, looked at all sets of MELAB prompts used in the period 1986-89, including those Spaan had used.
They found that expert judges and student writers felt able to recognize easy and difficult topics, and that their judgments of the relative ease/difficulty of prompts were generally confirmed by score levels on prompts. While they confirm Spaan's assessment of her two topics as radically different in difficulty, considered a-contextually, by looking too at a general language proficiency measure they are able to suggest some reasons for essay scores that are less different than predicted. It seems that nonnative writers taking the MELAB writing test component assess which prompt is easier, and that students with weaker language proficiency choose the easier prompt while students with stronger language proficiency choose the harder prompt. Hypothesizing too that reader accommodation plays its part in pushing disparate prompts toward parity of treatment, they suggest that both weaker and stronger writers regress toward the mean in their writing score, relative to their scores on other language components. These two research studies ...
... Many previous studies measured fluency by the number of words written (e.g., Carlson, Bridgeman, Camp & Waanders, 1985; Fathman & Whalley, 1996; Reid & Findlay, 1986; Reid, 1996). In the present study, fluency is defined as the length of body paragraphs coherently related to the main statement. ...
Article
Full-text available
This study uses experimental and control group data to investigate whether learning to use transition words results in enhancing students' fluency in writing. Common sentence connectors, such as moreover, however, thus, etc., were chosen so that students would learn the use of transition words in text and improve their writing fluency. 36 first-year university students were placed in an intermediate class: 18 control group students and 18 experimental group students. Over a 12-week period, both groups received equal amounts of writing assignments. During the first half of the period, both groups were given content and form feedback, but the experimental group was given additional marginal comments on the use of sentence connectors. After six weeks, both groups were given identical types of feedback and comments. Fluency was measured by the number of words written and successful connections (SCs). These results were analyzed to determine if there was a significant difference in fluency between the two groups. Findings suggest that writing teachers should teach students the effectiveness of using transition words in EFL writing classes, and this may in part help to improve students' fluency.
... Without looking at these issues in contrastive rhetoric, it is difficult to determine to what degree they might have contributed to failure of the writing courses. As a rule, instructors more often overlook foreign language errors if they do not impede the content and general form of the writing (Carlson & Bridgeman, 1983; Carlson, Bridgeman, Camp & Waanders, 1985). It is also unclear to me how the instructors graded their students, if a uniform rubric was used, or if a standardized list of criteria was a basis for scoring achievement. ...
Article
Many might assume that English language learners who are originally from other countries, but are raised in the United States and graduate from American high schools (Generation 1.5), would fare better academically than English learners who graduate from high schools abroad and then migrate to the United States after graduation. However, as we demonstrate, this is not always the case. Through the perspectives of an ESOL teacher who interacts with students, and a quantitative researcher who measures students' performance, this paper discusses success in college-level ESOL writing courses, the influence of acculturation through living in the US, and the quality and significance of prior secondary academic preparation in the home language. The ESOL classroom teachers' years of practical experience complement and clarify the findings of researchers, and present a more accurate picture of English learners and their authentic production of written English.
... a. These TOEIC findings are consistent with general evidence that various aspects of English proficiency are relatively closely intercorrelated: it appears that coefficients centering around .70 can be expected between direct and indirect measures of basic ESL macroskills (writing, speaking, listening, reading) in representative samples of educated ESL users/learners, such as those who take the TOEIC in corporate settings, or the TOEFL in academic settings (see, for example, Hale, 1986; Carlson, Bridgeman, Camp, and Waanders, 1985; Oller, 1983; Clark, 1979, 1980; Pike, 1979; Echternacht, 1970; Carroll, 1967a, 1967b). This level of correlation (centering around .7) also tends to obtain between scores on the TOEFL and scores on the verbal sections of standard undergraduate- and graduate-level admission tests (e.g., SAT, GMAT, GRE), widely used in the United States, in both unselected samples (e.g., Wilson, 1982; Powers, 1980; Angelis, Swinton, and Cowell, 1979) and highly selected samples of enrolled graduate students (e.g., Yule and Hoffman, 1991; Wilson, 1986, 1985; Sharon, 1972). ...
Article
This exploratory study examined relationships between a test designed to assess English-language listening comprehension and reading skills in samples of nonnative-English speakers at lower levels of developed proficiency in English as a second or foreign language (ESL or EFL). The test--called ESL-EZY--was developed by using items similar to but easier, on the average, than those being used in an existing ESL proficiency test designed for intermediate- or higher-level ESL users/learners. This paper reports evidence regarding the relationship between ESL-EZY scores and teachers' ratings of oral English proficiency in samples--assessed in diverse settings in the United States and Japan--selected to include subgroups that tend to differ relatively widely in average level of developed English proficiency. Scores on ESL-EZY were found to correlate relatively strongly with teachers' ratings within the respective subgroups; correlations were especially strong in relatively less proficient subgroups, suggesting that for younger ESL students and other ESL learners with limited developed functional proficiency in English as a foreign or second language, a test embodying properties similar to those represented by ESL-EZY might provide useful supplementary assessment information.
... These three areas are discourse mode (e.g., Brown, Hilgers, & Marsella, 1991; Carlson, Bridgeman, Camp, & Waanders, 1985; Cumming et al., 2005; Nold & Freedman, 1977; Plakans, 2008, 2010; Quellmalz, Capell, & Chou, 1982; Reid, 1990), rhetorical specification (e.g., Brossell, 1983; Hult, 1987; Redd-Boyd & Slater, 1989; Yu, 2009), and wording and structure of writing prompts (e.g. ...
Article
The present study aims to continue a vein of research that examines the effects of essay prompts on examinees' writing performance by closely investigating 40 student essays produced from a university-wide reading-to-write test. Quantitative and qualitative results of this study show that native and non-native writers at different proficiency levels differ in their selection of lexical items and propositional material from the background reading. Among other things, the higher-rated native group outperformed the other groups in identifying topical information and in judging which details from the source text to include. The two non-native groups, although able to locate superordinate propositions of the source text, lacked the native writers' ability to readjust their selection of material according to the author's epistemological stance. The lower-rated native writers paid little attention to the source text and merely used its substance as a "springboard" to elicit their own opinions in response to the topic. Possible explanations for these results and their implications for writing pedagogy and assessment are also discussed.
... This consistency also has to be maintained between raters to avoid adverse effects on reliability. It has been shown, however, that with rater training and standardization, reliability can be achieved (Carlson et al., 1985; Homburg, 1984; Reid, 1993; Upshur & Turner, 1995). Studies on raters' rating processes using verbal protocol analysis (e.g. ...
Article
Full-text available
The present study was conducted with a twofold purpose. First, I aim to apply the socio-cognitive framework of Shaw and Weir (2007) in order to validate a summative writing test used in a Malaysian ESL secondary school context. Second, by applying the framework I also aim to illustrate practical ways in which teachers can gather validity evidence, which in turn would help them design and evaluate their tests in light of their teaching context and the purpose of assessment. In addition, teachers may be able to reflect on learners' progress and on areas where learners need to improve by looking at the interplay of tasks and learners' responses. Twenty exam scripts written by 16-year-old ESL learners were rated using a marking scheme to identify scoring validity. Finally, I conclude that the validity of score interpretations has been established to a certain degree and that the framework is practical for the purposes of the study.
... It is suggested that consideration be given to collecting data for alternative measures of listening comprehension on the same day that examinees take the TOEFL test. To the extent possible, the same procedures used by Carlson, Bridgeman, Camp, and Waanders (1984) would be employed. Basically, these entailed gaining the cooperation of TOEFL test center supervisors or agents in several countries. ...
Article
A literature review was conducted in order to identify various parameters underlying listening comprehension. The results of this review were used as a basis for a survey of faculty in six graduate fields as well as undergraduate English faculty. The purpose of the survey was to a) obtain faculty perceptions of the importance to academic success of various listening skills and activities, b) assess the degree to which both native and non-native speakers experience difficulties with these skills or activities, and c) determine faculty views of alternative means of evaluating these skills. Faculty perceived some listening skills as more important than others for academic success. These included nine skills in particular that were related primarily to various aspects of lecture content (e.g. identifying major ideas and relationships among them). As might be expected, faculty perceived that non-native students experience more difficulty than native students with all listening activities, and that non-native students have disproportionately greater difficulty with some activities, such as following lectures given at different speeds and comprehending or deducing the meaning of important vocabulary. With respect to measuring listening comprehension, some general approaches and specific item types were judged to be more appropriate than others. These included tasks that entail answering questions involving the recall of details as well as those involving inference or deductions. The results of the survey are used to suggest further research on the construct validity of the Listening Comprehension section of the TOEFL.
... Probably because of its large scale, the TWE test uses a group of raters who are predominantly English-trained rather than ESL-trained, and current rater training pays no specific attention to how ESL writing features affect the TWE essays. While the weight of the studies reported above suggests that ESL-trained readers are likely to give more valid (and reliable) readings of ESL essays, ETS has not undertaken a study since Carlson, Bridgeman, Camp, and Waanders (1985) to see whether non-ESL and ESL teachers react to the same features of an essay in arriving at a holistic TWE score. That study found only moderate agreement between the scores of the two kinds of readers, and this is an important issue that deserves to be revisited. ...
... See Breland (1983) for a comprehensive, detailed analysis of the numerous problems that are involved in the direct assessment of writing ability in samples of U.S. college-bound high-school seniors; see also Breland (1977). To obtain a useful overview of many of these problems as they apply to the development of writing samples suitable for use in testing ESL users/learners in the TOEFL testing context, see Carlson, Bridgeman, Camp, and Waanders (1985). ...
Article
This study was undertaken to develop guidelines for making interpretive inferences from scores on the Test of English for International Communication (TOEIC), a norm-referenced test of English-language listening comprehension (LC) and reading (R) skills, about level of ability to use English in face-to-face conversation, indexed by performance in the Language Proficiency Interview (LPI) situation. LPI performance, rated according to behaviorally defined levels on the LPI/ILR/FSI quasi-absolute proficiency scale, was treated as a context-independent criterion, using the familiar regression model in an apparently novel application (for such criterion-referenced purposes) in the context of a large-scale ESL-testing program. The study employed TOEIC/LPI data sets generated during operational ESL assessments in representative TOEIC-use settings (places of work or work-related ESL training) in Japan, France, Mexico, and Saudi Arabia, involving samples of adult, educated ESL users/learners in or preparing for ESL-essential positions with companies engaged in international commerce. The pattern of TOEIC/LPI concurrent correlations was consistent across samples, and there was relatively close fit between sample LPI means and estimates from TOEIC scores, especially TOEIC-LC, using combined-sample regression equations. Theoretical and pragmatic implications of the findings are discussed. General guidelines are provided for making inferences about LPI-assessed level of oral English proficiency from TOEIC scores. Directions are suggested for further research and development activities in the TOEIC testing context. (Author)
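As a rough illustration of the regression approach described in this abstract, the sketch below fits a simple least-squares line relating hypothetical TOEIC Listening Comprehension scores to interview-based proficiency ratings and uses it to estimate a level from a new score. The data, variable names, and the estimate_lpi helper are invented for illustration; they are not the study's data, coefficients, or published guidelines.

```python
# Illustrative only: hypothetical TOEIC-LC scores and LPI ratings, not study data.
import numpy as np

toeic_lc = np.array([255.0, 310.0, 340.0, 400.0, 445.0, 480.0])  # hypothetical TOEIC-LC scores
lpi_level = np.array([1.0, 1.5, 1.5, 2.0, 2.5, 3.0])             # hypothetical LPI ratings

# Ordinary least-squares line (degree-1 polynomial fit): lpi ~ slope * lc + intercept
slope, intercept = np.polyfit(toeic_lc, lpi_level, 1)

def estimate_lpi(lc_score):
    """Estimate an LPI level from a TOEIC-LC score using the fitted line (hypothetical)."""
    return slope * lc_score + intercept

print(round(estimate_lpi(370), 2))
```

The study itself reports combined-sample equations and behaviorally anchored guidelines; the point of the sketch is only the form of the prediction, not its values.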
... Recent developments in the testing of writing in a second language (e.g., Jacobs, Zinkgraf, Wormuth, Hartfiel, & Hughey, 1981; Carlson, Bridgeman, Camp, & Waanders, 1985; Hamp-Lyons, 1987), the continuing articulation of communicative testing theory (see especially Canale & Swain, 1980; Alderson, 1981; Johns-Lewis; Morrow, 1981; Olshtain & Blum-Kulka, 1985), and the emergence of powerful new psychometric methodologies for the analysis of rating scales (e.g., Andrich, 1978; Wright & Masters, 1982; Henning & Davidson, 1987; Pollitt & Hutchinson, 1987) have together encouraged us to carry out this study of what we shall refer to as communicative writing profiles. Particularly of interest to us has been the question of the validity of using a multiple-trait scoring method for scoring ESL writing produced in timed, impromptu, direct tests of writing, and reporting the learners' performance as communicative writing profiles when the scoring instrument has not been designed specifically for the context in which it has been used. ...
Article
This study investigated the validity of using a multiple-trait scoring procedure to obtain communicative writing profiles of the writing performance of adult nonnative English speakers in assessment contexts different from that for which the instrument was designed. Transferability could be of great benefit to those without the resources to design and pilot a multiple-trait scoring instrument of their own. A modification of the New Profile Scale (NPS) was applied in the rating of 170 essays taken from two non-NPS contexts, including 91 randomly selected essays of the Test of Written English and 79 essays written by a cohort of University of Michigan entering undergraduate nonnative English speaking students responding to the Michigan Writing Assessment. The scoring method taken as a whole appeared to be highly reliable in composite assessment, appropriate for application to essays of different timed lengths and rhetorical modes, and appropriate to writers of different levels of educational preparation. However, whereas the subscales of Communicative Quality and Linguistic Accuracy tended to show individual discriminant validity, little psychometric support for reporting scores on seven or five components of writing was found. Arguments for transferring the NPS for use in new writing assessment contexts would thus be educational rather than statistical.
... Surface code measures are those measures that assess word composition, lexical items, part of speech categories and syntactic composition at the surface level. In general, the studies that have used surface code measures have demonstrated that higher-rated essays contain more words (Carlson, Bridgeman, Camp & Waanders, 1985; Ferris, 1994; Frase et al., 1997; Reid, 1986, 1990), and use words with more letters or syllables (Frase et al., 1997; Grant & Ginther, 2000; Reid, 1986, 1990; Reppen, 1994). Syntactically, L2 essays that are rated as higher quality include more surface code measures such as subordination (Grant & Ginther, 2000) and instances of passive voice (Connor, 1990; Ferris, 1994; Grant & Ginther, 2000). ...
Article
Full-text available
This study addresses research gaps in predicting second language (L2) writing proficiency using linguistic features. Key to this analysis is the inclusion of linguistic measures at the surface, textbase and situation model level that assess text cohesion and linguistic sophistication. The results of this study demonstrate that five variables (lexical diversity, word frequency, word meaningfulness, aspect repetition and word familiarity) can be used to significantly predict L2 writing proficiency. The results demonstrate that L2 writers categorised as highly proficient do not produce essays that are more cohesive, but instead produce texts that are more linguistically sophisticated. These findings have important implications for L2 writing development and L2 writing pedagogy.
... We used the same prompt for both L1 and L2 writing except for changing "a newspaper" to "an English newspaper" for L2 writing.6 We chose the same topic in L1 and L2 because different topics might affect both quality and quantity of writing (see Carlson, Bridgeman, Camp, & Waanders, 1985; Hamp-Lyons, 1990; Reid, 1990). We selected a topic on women's roles because a similar topic had been used in a number of previous studies (e.g., Cumming, 1989; Jones & Tetroe, 1987), and also because a similar topic was the most popular among Japanese first-year university students who had discussed 10 different topics in English (Hirose & Kobayashi, 1991). ...
Article
This study investigated factors that might influence Japanese university students’ expository writing in English. We examined 70 students of low‐ to high‐intermediate English proficiency along a variety of dimensions, namely, second language (L2) proficiency, first language (L1) writing ability, writing strategies in L1 and L2, metaknowledge of L2 expository writing, past writing experiences, and instructional background. We considered these multiple factors as possible explanatory variables for L2 writing. Quantitative analysis revealed that (a) students’ L2 proficiency, L1 writing ability, and metaknowledge were all significant in explaining the L2 writing ability variance; (b) among these 3 independent variables, L2 proficiency explained the largest portion (52%) of the L2 writing ability variance, L1 writing ability the second largest (18%), and metaknowledge the smallest (11%); and (c) there were significant correlations among these independent variables. Qualitative analysis indicated that good writers were significantly different from weak writers in that good writers (a) paid more attention to overall organization while writing in L1 and L2; (b) wrote more fluently in L1 and L2; (c) exhibited greater confidence in L2 writing for academic purposes; and (d) had regularly written more than one English paragraph while in high school. There was no significant difference between good and weak writers for other writing strategies and experiences. On the basis of these results, we propose an explanatory model for EFL writing ability.
Article
Full-text available
This study delves into the intricate landscape of second language (SL) writing assessment, with a focus on the impact of Dynamic Assessment (DA) and learners' personality traits on their writing achievement. DA is an approach that combines instruction and assessment, mediating in the areas where students need help in order to promote learners' understanding and writing skills. To conduct this study, 90 participants were first given an online Oxford placement test and were then asked to write a descriptive essay as a pre-test, according to which the students were randomly assigned to interventionist, interactionist, and control groups. Subsequently, the two experimental groups attended five weekly sessions in which the instructor introduced the five components of descriptive writing. The students were then given a writing task as the post-test and were asked to fill out a Five Factor Personality Inventory. A t-test was used to compare the post-test writing scores of the two experimental groups to discover which treatment had a greater impact on intermediate learners' writing achievement. The analysis of the data obtained reveals that students' personality traits significantly affect their performance in descriptive writing. Also, both interactionist and interventionist techniques exhibit a positive influence on learners' writing achievement, though the interactionist group received a higher mean score in descriptive writing. These findings resonate with the broader academic discourse in which dynamic assessment is recognized as a central goal of language learning and autonomy is considered an indispensable prerequisite for successful language acquisition.
Article
Full-text available
This paper presents a study based on the linguistic profiling methodology to explore the relationship between the linguistic structure of a text and how it is perceived in terms of writing quality by humans. The approach is tested on a selection of Italian L1 learners' essays, which were taken from a larger longitudinal corpus of essays written by Italian L1 students enrolled in the first and second year of lower secondary school. Human ratings of writing quality by Italian native speakers were collected through a crowdsourcing task, in which annotators were asked to read pairs of essays and rate which one they believed to be better written. By analyzing these ratings, the study identifies a variety of linguistic phenomena spanning distinct levels of linguistic description that distinguish the essays considered 'winners' and evaluates the impact of students' errors on the human perception of writing quality.
Article
Although lexical diversity is often used as a measure of productive proficiency (e.g., as an aspect of lexical complexity) in SLA studies involving oral tasks, relatively little research has been conducted to support the reliability and/or validity of these indices in spoken contexts. Furthermore, SLA researchers commonly use indices of lexical diversity such as Root TTR (Guiraud’s index) and D (vocd-D and HD-D) that have been preliminarily shown to lack reliability in spoken L2 contexts and/or have been consistently shown to lack reliability in written L2 contexts. In this study, we empirically evaluate lexical diversity indices with respect to two aspects of reliability (text-length independence and across-task stability) and one aspect of validity (relationship with proficiency scores). The results indicated that neither Root TTR nor D is reliable across different text lengths. However, support for the reliability and validity of optimized versions of MATTR and MTLD was found.
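To make the indices named above concrete, the sketch below computes Guiraud's Root TTR and a basic moving-average type-token ratio (MATTR) over a whitespace-tokenized text. It is a minimal illustration under simple assumptions, not the optimized or validated implementations evaluated in the study; the window size and sample text are arbitrary.

```python
# Illustrative sketch of two lexical diversity indices; not a validated implementation.
import math

def root_ttr(tokens):
    """Guiraud's index: number of types divided by the square root of the token count."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def mattr(tokens, window=50):
    """Mean type-token ratio over all overlapping windows of a fixed length."""
    if len(tokens) < window:            # fall back to plain TTR for very short texts
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

text = "the quick brown fox jumps over the lazy dog " * 20
tokens = text.split()
print(root_ttr(tokens), mattr(tokens, window=20))
```

Because Root TTR still grows with text length while MATTR averages over fixed windows, the two behave differently on texts of different lengths, which is the reliability issue the study examines.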
Chapter
This chapter presents an account of the development and emergence of rating scales based on verbal descriptions and their use in L2 writing assessment from the 1970s, culminating in the central position they occupy in current practice. It describes the choices available to L2 writing test developers (as between holistic and analytic approaches), and the alternatives to rating scales provided by new tools, such as automated scoring, on the one hand, and more descriptive, negotiated approaches such as dynamic criteria mapping on the other.
Chapter
This edited volume is a collection of theoretical and empirical overviews of second language (L2) proficiency based on four skills: reading, writing, listening, and speaking. Each skill is reviewed in terms of how it has been conceptualized, measured, and studied over the years in relation to relevant (sub-) constructs of the language skill under discussion. This is followed by meta-analyses of correlation coefficients that examine the relationship between the L2 skill in question and its component variables. Unlike most meta-analyses that have a limited range of variables under investigation, our meta-analyses are much larger in scope to better clarify such relationships. By combining theoretical and empirical approaches, the book is helpful in deepening the understanding of how subcomponents or various variables are related to a particular L2 skill.
Article
This study investigates the potential for linguistic microfeatures related to length, complexity, cohesion, relevance, topic, and rhetorical style to predict L2 writing proficiency. Computational indices were calculated by two automated text analysis tools (Coh-Metrix and the Writing Assessment Tool) and used to predict human essay ratings in a corpus of 480 independent essays written for the TOEFL. A stepwise regression analysis indicated that six linguistic microfeatures explained 60% of the variance in human scores for essays in a test set, providing an exact accuracy of 55% and an adjacent accuracy of 96%. To examine the limitations of the model, a post-hoc analysis was conducted to investigate differences in the scoring outcomes produced by the model and the human raters for essays with score differences of two or greater (N = 20). Essays scored as high by the regression model and low by human raters contained more word types and perfect tense forms compared to essays scored high by humans and low by the regression model. Essays scored high by humans but low by the regression model had greater coherence, syntactic variety, syntactic accuracy, word choices, idiomaticity, vocabulary range, and spelling accuracy as compared to essays scored high by the model but low by humans. Overall, findings from this study provide important information about how linguistic microfeatures can predict L2 essay quality for TOEFL-type exams and about the strengths and weaknesses of automatic essay scoring models.
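The exact-accuracy and adjacent-accuracy figures reported above can be illustrated with a small sketch: exact agreement counts identical scores, adjacent agreement counts scores within one point. The score vectors below are hypothetical, and the helper is not part of Coh-Metrix or the Writing Assessment Tool.

```python
# Hypothetical scores only; illustrates exact and adjacent agreement rates.
def agreement_rates(predicted, human):
    """Return (exact, adjacent) agreement between two equal-length score lists."""
    exact = sum(p == h for p, h in zip(predicted, human)) / len(human)
    adjacent = sum(abs(p - h) <= 1 for p, h in zip(predicted, human)) / len(human)
    return exact, adjacent

pred  = [3, 4, 4, 2, 5, 3, 4]   # hypothetical rounded model scores
human = [3, 5, 4, 2, 4, 2, 4]   # hypothetical human ratings
print(agreement_rates(pred, human))   # all differences here are within one point
```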
Article
Full-text available
This study examined the relation of TOEFL® performance to a widely used variant of the cloze procedure–the multiple-choice (MC) cloze method. A main objective was to determine if categories of MC cloze items could be identified that related differentially to the various parts of the TOEFL. MC cloze items were prepared and classified according to whether the involvement of reading comprehension, as defined by sensitivity to long-range textual constraints, was primary or secondary. For two categories, reading comprehension was primary and knowledge of grammar or vocabulary was secondary, and for two other categories knowledge of grammar or vocabulary was primary and reading comprehension secondary. Examinees taking an operational TOEFL at domestic test centers were given the three basic sections of the test along with a fourth section containing the MC cloze items. Performance was examined for each of nine major language groups. Exploratory and confirmatory factor analyses for the basic TOEFL were performed first, to provide a basis for relating the MC cloze items to the TOEFL structure. These factor analyses suggested that, from a practical standpoint, TOEFL performance can be adequately described by just two factors, which relate to (a) Listening Comprehension, and (b) all other parts of the test–Structure, Written Expression, Vocabulary, and Reading Comprehension. Examination of the MC cloze test showed that the total MC cloze score was relatively reliable and that it was possible to estimate item response theory parameters for the MC cloze items with reasonable accuracy. Thus, the development of the MC cloze items was successful in these respects. However, the correlations among scores for the four MC cloze item categories were approximately as high as their reliabilities, thus providing no strong empirical evidence that the item types within the MC cloze test reflected distinct skills. Correlational analyses related the four MC cloze categories to the five parts of the TOEFL. These analyses revealed a slight tendency for MC cloze items that involved a combination of grammar and reading to relate more highly to the Structure and Written Expression parts of the TOEFL than the other parts, and for MC cloze items that involved a combination of vocabulary and reading to relate more highly to the Vocabulary and Reading Comprehension parts of the TOEFL than the other parts. Although this pattern was relatively consistent across language groups, however, the differences among correlations were not substantial enough to be of practical importance. Multiple regression analyses were performed, using total MC cloze score as the dependent variable and the five TOEFL parts as independent variables. The resulting multiple Rs were mostly in the lower to upper .90s, suggesting that total MC cloze performance can be predicted from TOEFL performance with a relatively high degree of accuracy. In general, the study provided no evidence that distinct skills are measured by the nonlistening parts of the TOEFL or by the four categories of MC cloze items. It would appear that the skills associated with grammar, vocabulary, and reading comprehension are highly interrelated, as assessed by the TOEFL and the MC cloze test.
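The multiple regression analyses described above can be sketched in miniature: a total score is regressed on several part scores, and the multiple correlation R is the correlation of the predicted values with the observed values. The data below are synthetic and the names (parts, cloze) are placeholders, not TOEFL or MC cloze scores.

```python
# Synthetic data only; 'parts' and 'cloze' are placeholder names, not real test scores.
import numpy as np

rng = np.random.default_rng(0)
parts = rng.normal(size=(200, 5))                                  # five part scores
cloze = parts @ np.array([0.5, 0.3, 0.4, 0.2, 0.1]) + rng.normal(scale=0.5, size=200)

X = np.column_stack([np.ones(len(cloze)), parts])                  # add an intercept column
coef, *_ = np.linalg.lstsq(X, cloze, rcond=None)                   # ordinary least squares
predicted = X @ coef
multiple_r = np.corrcoef(predicted, cloze)[0, 1]                   # multiple correlation R
print(round(multiple_r, 3))
```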
Article
In the past, the GRE Board supported research on an item type that measures higher-level cognitive abilities and that uses a free-response format: the Formulating Hypotheses (FH) item type. Further research was not recommended because of issues associated with the cost and feasibility of the operational use of a test composed of FH items. This project focused on the two major issues that need to be addressed in considering FH items for operational use: (1) the costs of scoring, and (2) the assignment of scores along a range of values rather than conventional number-right scoring. The first issue was addressed directly by seeking ways to increase the efficiency of scoring through computerized delivery and scoring. The second issue was addressed both directly and indirectly by recommending specific procedures for the computer recognition of responses and problem delivery that will be sufficiently reliable and well-rationalized to be acceptable to reasonable evaluators. This project involved collaboration with experts who are closely involved in confronting the issues involved in the computer recognition and evaluation of open-ended responses. After a series of analyses to explore the design and scoring of FH-type items for computer delivery, we arrived at specific recommendations for developing a system to deliver computerized problems of the FH type. When developed, the prototype also will serve as a computerized research tool to conduct further investigations of potential variations in these types of items.
Article
The focus of the study was the new GRE writing measure and the proposed GRE scoring guide. The objective was to determine the degree to which the features of student essays upon which scores will be based are the same features that graduate educators use at their institutions to evaluate students' writing. A sample of essays, which had been previously reviewed and judged by graduate deans and faculty, were rescored several times by trained readers. Each rescoring was based on a specific trait in the GRE scoring guide — development of ideas, sentence structure, and so on. The influence of each feature on the judgments made by faculty/deans and on the scores assigned by trained readers was compared. No evidence was uncovered to suggest any differences between graduate deans/faculty and GRE essay readers with respect to the bases on which they judge essay quality.
Article
In this article, I examine to what extent Computers and Composition: An International Journal for Teachers of Writing is international. My analysis of several aspects of the journal indicates limited international scope. I also discuss two issues important when considering the potential international scope of computers and writing research and practices: the differing uses of computers for writing by different language users and the differing concepts of identity and self in different cultures in relation to writing. I conclude with concrete suggestions for broadening our perspectives on computers and writing and making this journal truly international.
Article
The goal of the current study was to examine the validity and topic generality of a writing performance test designed to place international students into appropriate ESL courses at a large mid-western university. Because for each test administration the test randomly rotates three academic topics integrated with listening and reading sources, it is necessary to investigate the extent to which the three topics are compatible in terms of difficulty and generality across a diverse group of examinees. ESL Placement Test (EPT) scores from more than 1,000 examinees were modeled using multinomial logistic regression. Possible explanatory variables were identified as the assigned writing topic, students' majors, and their scores on the Test of English as a Foreign Language (TOEFL). Results indicate that after controlling for general English proficiency as measured by the TOEFL, students' majors were not related to their writing performance; however, the different topics did affect performance. In light of test validity, the demonstrated topic effect argues against the comparability of the three topics. Nevertheless, the absence of effect from the interaction of essay topic and writers' majors supports the generality of each topic for examinees from a wide range of disciplinary areas. Study limitations and future research suggestions are discussed.
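The modeling approach described above can be sketched as follows, assuming synthetic data and hypothetical variable names (topic, toefl, level); students' majors are omitted for brevity. This is a minimal illustration of a multinomial logistic regression of placement level on topic and TOEFL score, not the EPT analysis itself.

```python
# Synthetic data and hypothetical variable names; not the EPT data or model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
topic = rng.integers(0, 3, size=300)                       # three rotating writing topics
toefl = rng.normal(550, 40, size=300)                      # hypothetical TOEFL totals
level = np.digitize(toefl + rng.normal(0, 30, size=300),   # three placement levels
                    bins=[520, 580])

toefl_z = (toefl - toefl.mean()) / toefl.std()             # standardize for stable fitting
topic_dummies = np.eye(3)[topic][:, 1:]                    # dummy-code topic, drop baseline
X = np.column_stack([topic_dummies, toefl_z])

model = LogisticRegression(max_iter=1000).fit(X, level)    # multinomial logit
print(model.coef_.shape)                                   # one coefficient row per level
```

In a design like the one described, a topic effect shows up as nonzero topic coefficients after the TOEFL term is included, while a topic-by-major interaction would require additional product terms.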
Article
In 1986, Educational Testing Service (ETS) added the Test of Written English (TWE), which requires the production of a 30-minute writing sample, to some administrations of the Test of English as a Foreign Language (TOEFL). The test was developed on the basis of two major research projects which investigated the role of writing in the academic community and the type of writing tests that faculty felt their students should be able to produce (Bridgeman and Carlson, 1983; Carlson et al., 1985). ETS appointed a committee of outside consultants, the TWE Core Reader Group, to develop topics for the exam and to determine whether or not a given writing topic should be approved for the exam. This article details the topic development process for the TWE, the work of the Core Reader group, and the procedures for reading and scoring the TWE.
Article
This study investigates the degree to which differences exist in the rating of two NES and two ESL essays by 32 English and 30 ESL professors in the English Department of CUNY's Kingsborough campus. The two faculty groups were divided into subgroups, one rating the four essays holistically on a 1 to 6 scale and the other rating them on a 1 to 6 scale but in light of 10 specifically categorized features, 6 comprising rhetorical and 4 language features. The results indicated that in holistic evaluation, English and ESL faculty raters differed significantly, with English faculty assigning higher scores to all four essay samples. In analytic evaluation, the two groups did not evidence significant differences in rating the specifically categorized features. Raters with more years of experience in teaching and holistic evaluation tended to be more lenient in their holistic evaluation, whereas with respect to analytic evaluation, experience in the two areas was not an influencing factor. Also, in holistic evaluation, English faculty seemed to give greater weight to the overall content and quality of the rhetorical features in the writing samples than they did to language use.
Article
A comprehensive review was conducted of writing research literature and writing test program activities in a number of testing programs. The review was limited to writing assessments used for admission in higher education. Programs reviewed included ACT, Inc.'s ACT™ program, the California State Universities and Colleges (CSUC) testing program, the College Board's SAT® program, the Graduate Management Admissions Test® (GMAT®) program, the Graduate Record Examinations® (GRE®) test program, the Law School Admission Test® (LSAT®) program, the Medical College Admission Test® (MCAT®) program, and the Test of English as a Foreign Language™ (TOEFL®) testing program. Particular attention was given in the review to writing constructs, fairness and group differences, test reliability, and predictive validity. A number of recommendations are made for research on writing assessment.
Article
Like so many other aspects of language analysis, assessing the writing abilities of non-native English speakers (NNES) becomes an increasingly complex issue as one explores both its root meaning and its current uses; this complexity can be traced, in part, to the recognition that writing abilities develop in interaction with other language skills. In this volume, various chapters have narrowed, if artificially, the area of investigation by providing for a separate consideration of assessment as applied to each of the four language skills; clearly assessment of language proficiency as a total package also is of great concern in academic contexts, especially where NNES students are concerned (see also Resources in Language Testing website).
Article
This essay proposes ways to improve mandatory college placement for ESL writers and explores them through theory, an experiment, and a case study. Current methods of placement have problems with reader bias and instructional validity and sometimes disregard common facts of writing diagnosis. The proposed new method intends to avoid the problems by combining and balancing these cognitive acts. It divides readers into two tiers. The first is non-specialist faculty, who read essays with information about the writer hidden, but who can only place students into the most desired course; the second tier is specialist faculty who read with foreknowledge of the writer's name and background. Six years of placement outcomes of this system are reported at one university. Results are also reported of an experiment (participant N = 124) in the reading of the placement writing of a Japanese student (Kiyoko) in which foreknowledge about the writer was systematically varied. Results supported the proposed new system in that ethnic and language-status inferences about the writer (some incorrect) and foreknowledge about the writer's background were systematically associated with changes in evaluation and placement. Finally, the actual placement history of Kiyoko and the possible effects of knowledge about contrastive rhetoric on the placement are considered as further support of the method.
Article
Full-text available
Recently, first language composition researchers have shown that by using a traditional composition teaching method which focuses on the form and correctness of a finished product, teachers ask students to produce writing which does not reflect the actual writing process. Findings indicate that most school-sponsored writing does not involve the self-motivation, contemplation, exploration, and commitment which characterize real life writing. These researchers recommend that in addition to being taught expository writing, students should have more opportunity for expressive writing in their writing courses to allow them to become better academic writers. According to some second language composition researchers and teachers, these recommendations are applicable to ESL college students because their composing strategies are similar to those of native English-speaking college students.Ungraded, uncorrected journals can provide a non-threatening way for students to express themselves in written English. However, the student-teacher working journals which we describe in this article are unlike student personal journals in two important ways: first, the topic of working journals is not personal, but is rather an outgrowth of the writing class; and second, the teacher regularly writes journals to the class on the same subject and includes, in those journals, selected student journal entries. The advantages of this approach are that a group awareness develops around issues relevant to ESL composition, that students come to see writing as a way to generate ideas and to share them, and that teachers become participants in the writing process.
Article
Full-text available
A concern for the particular needs of university-bound Arabic-speaking students has been shared by many in ESL. Out of this concern, and especially out of concern for the characteristic English writing deficiencies of many Arab ESL students, various aspects of written Arabic—from orthographical conventions to rhetorical devices—are discussed. These contrasting features have been identified as potential contributors to observed error production and weaknesses in some reading skills, but most particularly in writing skills. Ideally, a better understanding of the language background of Arab students can aid the ESL specialist in better addressing the special needs of these students through supplemental curricular objectives and appropriate exercises.Although contrastive analysis is no longer seen as a foundation for instructional programs (Schachter 1974), it can be a useful tool in understanding characteristic language-learning weaknesses demonstrated by a particular language group. Implicit in this article is the assumption that a familiarity with some salient contrasting features of written Arabic and English may prove valuable to those in ESL concerned with addressing noted weaknesses in Arab students' English writing skills.
Article
Full-text available
Explored the question of why competent evaluators award the ratings they do to college students' expository essays. Essays were rewritten to be stronger or weaker in 4 categories: content, organization, sentence structure, and mechanics. 12 evaluators first used a 4-point holistic rating scale to judge the essays' quality. Then they rated whether each of the 4 rewriting categories in each rewritten essay was strong or weak (perceptions). ANOVAs revealed that content and organization affected ratings most.
Article
A survey of the academic writing skills needed by beginning undergraduate and graduate students was conducted. Faculty in 190 academic departments at thirty-four U.S. and Canadian universities with high foreign student enrollments completed the questionnaire. At the graduate level, six academic disciplines with relatively high numbers of nonnative students were surveyed: business management (MBA), civil engineering, electrical engineering, psychology, chemistry, and computer science. Undergraduate English departments were chosen to document the skills needed by undergraduate students. The major findings are summarized below. Although writing skill was rated as important to success in graduate training, it was consistently rated as even more important to success after graduation. Even disciplines with relatively light writing requirements (e.g., electrical engineering) reported that some writing is required of first-year students. The writing skills perceived as most important varied across departments. Faculty members reported that, in their evaluations of student writing, they rely more on discourse-level characteristics than on word- or sentence-level characteristics. Discourse-level writing skills of natives and nonnatives were perceived as fairly similar, but significant differences between natives and nonnatives were reported for sentence- and word-level skills and for overall writing. Among the ten writing sample topic types provided, preferred topic types differed across departments. Although some important common elements among the different departments were reported, the survey data distinctly indicate that different disciplines do not uniformly agree on the writing task demands and on a single preferred mode of discourse for evaluating entering undergraduate and graduate students.
Article
Technical writing required of employees in business and industry has been investigated, but the writing demands on graduate students have not been systematically surveyed. To find out what kinds of writing are required of graduate engineering students, twenty-five engineering faculty members from the Engineering College at the University of Florida listed the kinds of writing assigned to graduate classes during the academic year 1979–80. Since the faculty members were asked to rank-order the writing kinds from most frequent to least frequent, the Friedman analysis of variance and the Wilcoxon Signed Ranks test were used to test for differences in the rank ordering. The tests showed that faculty assigned examinations, quantitative problems, and reports most frequently, that they assigned homework and papers (term and publication) less frequently, and that they assigned progress reports and proposals least frequently.
Article
In their attempts to design ESOL programs that will produce mature readers, the authors discovered that too little attention has been given to the characteristics of writing in English. Non-native learners need systematic exposure to elements of prose style, to enable them not to become writers but to become better readers. In this article, the authors set up a model of the relationship between writing and reading which links the work of separate fields of investigation, all of them relevant to reading: prescriptive and contrastive rhetoric, textual discourse analysis, ESP text research, and psycholinguistics and reading. The model suggests how these endeavors remind us of the blind men touching different parts of the elephant. One area, the literature on prescriptive rhetoric, offers a valuable source for extracting principles which affect writing in English. It indicates a fresh perspective from which to view expository writing, the type with which ESOL readers must be able to cope. From this literature, the authors isolate two pervasive axioms for writers: planning, and using discourse devices. Within each of these broad areas, they outline how prescriptions for good writing can be translated into strategies for effective reading.
Article
This paper is a follow-up to the article "Presuppositional Information in EST Discourse," by the same authors (TESOL Quarterly, 1976). In both these papers we study what appears to be a serious learning problem for advanced learners attempting to learn to read their subject matter in English: namely, the apparent inability of the learner to gain access to the total meaning of a written piece of EST discourse even when he or she may be able to understand all of the individual words in each sentence of an EST paragraph, and/or all of the sentences in that particular paragraph. In this paper, we briefly describe two methods of EST paragraph development: rhetorical process development and rhetorical function-shift development. (To our knowledge, the latter has not been described before.) Next, we discuss a series of hypotheses set up to account for explicit and implicit information in some EST function-shift paragraphs that our students have had difficulty learning to understand. Finally, we discuss some possible pedagogical and research implications of our work.
Article
The American Language Institute, University of Southern California, recently conducted a study of their students' assessments of both what academic skills they expected to need in order to successfully complete their studies, and a self-assessment of their success in using English in varied social and business settings. The study revealed: There is a clear distinction between the academic skills needed by graduate and undergraduate students; many students were concerned over their inability to read complex academic material; some of the skills assessed were major-specific and, although students felt quite comfortable in some language settings, their confidence decreased sharply in those settings requiring creative language skills.
Article
An operational definition of levels of instruction in the teaching of composition is based on the description of the sets of subskills at each level. Level I includes all skills required for the production of a single word; Level II includes all skills required to produce a single sentence of any complexity; Level III subsumes I and II and includes the additional skills required to produce text greater than a single sentence. The last level is equated in this essay with the less specific term "advanced." At Level I the subskills are essentially psychomotor; at Level II they are concerned with the application of syntactic structure to writing and the use of lexical items; at Level III, there are six goals: to become independent of controls imposed by text and teacher; to write for a variety of communicative purposes; to extend and refine the use of vocabulary and syntactic patterns; to write conceptual paragraphs; to write longer units of discourse; and to use awareness of cultural differences in writing.
Article
This paper is a report on the progress we are making in an attempt to establish a second language acquisition index of development. Such an index would be a developmental yardstick by which researchers could expediently and reliably gauge a learner's proficiency in a second language. Encouraged by the findings of an earlier pilot study, a more ambitious project involving the analysis of 212 compositions was undertaken. These compositions, written by university ESL students, were analyzed using several measures based on Kellogg Hunt's T-unit performance variable. Two measures applied in the analysis, the percentage of error-free T-units and the average length of error-free T-units, proved to be the best discriminators among the five levels of ESL proficiency represented in this population. In addition to a discussion of these results, included in this paper is a survey of L2 acquisition studies which have also employed T-unit length as a proficiency measure. Finally, an outline is offered of the studies currently being conducted in an attempt to further refine these measures.
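The two best-discriminating measures mentioned above are straightforward to compute once T-units have been identified and marked for errors. The sketch below assumes a hand-annotated list of (T-unit, error-free) pairs; it illustrates only the arithmetic, not the study's segmentation or error-coding procedures.

```python
# Hypothetical annotations; illustrates the two T-unit measures named in the abstract.
def tunit_measures(tunits):
    """tunits: list of (text, is_error_free) pairs for one composition."""
    error_free = [text for text, ok in tunits if ok]
    pct_error_free = 100.0 * len(error_free) / len(tunits)
    mean_len_error_free = (sum(len(t.split()) for t in error_free) / len(error_free)
                           if error_free else 0.0)
    return pct_error_free, mean_len_error_free

sample = [("The students finished the experiment", True),
          ("because they was tired they stopped", False),
          ("and reported the results clearly", True)]
print(tunit_measures(sample))   # percentage of error-free T-units, mean length in words
```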
Article
The purpose of this study was to serve as a stepping stone toward closer agreement among judges of student writing at the point of admission to college by revealing common causes of disagreement. It was expected and found that more than half the variability in grades of a large number of judges on the same set of papers was due to "error" (random variation) or the idiosyncratic preferences of individual readers. In the variability that was not random or idiosyncratic, it was expected and found that there was a substantial core of common agreement on the general merit of the papers and that a small number of "schools of thought" would account for most of the systematic differences in grading standards. Factor analysis of correlations among the grades of 53 distinguished readers, representing six different fields, on 300 papers written by college freshmen of widely varying ability revealed just five such "schools of thought," emphasizing: Ideas: relevance, clarity, quantity, development, persuasiveness; Form: organization and analysis; Flavor: style, interest, sincerity; Mechanics: specific errors in grammar, punctuation, etc.; Wording: choice and arrangement of words. No standards or criteria for judging the papers were suggested to the readers. Instead, they were told to use "whatever hunches, intuitions, or preferences you normally use in deciding that one paper is better than another." They sorted the papers into nine piles in order of "general merit." The only restriction was that all nine piles must be used, with not less than 4% of the papers in any pile. The five reader-factors or "schools of thought" were identified by a "blind" classification of 11,018 comments written on 3,557 papers that were graded high (7-8-9) or low (1-2-3) by the three highest and three lowest readers on each factor. The person who classified the comments did not know the standing of any reader on any factor. In addition to the reader factors, three College Board tests taken by these students formed a separate "test-factor" that had practically zero correlations with all reader-factors except Mechanics (.50) and Wording (.45). It was not the purpose of this study to achieve a high degree of unanimity among the readers but to reveal the differences of opinion that prevail in uncontrolled grading, both in the academic community and in the educated public. To that end, the readers included college English teachers, social scientists, natural scientists, writers and editors, lawyers, and business executives. Nonetheless, it was disturbing to find that 94% of the papers received either seven, eight, or nine of the nine possible grades; that no paper received less than five different grades; and that the median correlation between readers was .31. Readers in each field, however, agreed slightly better with the English teachers than with one another.
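A much smaller analogue of the factor analysis described above can be sketched as follows: a papers-by-readers matrix of grades is factored so that each reader loads on a small number of factors (the "schools of thought"). The data below are randomly generated, and the dimensions are far smaller than the 53 readers and 300 papers of the study; this is an illustration of the technique, not a reanalysis.

```python
# Randomly generated grades; dimensions are much smaller than the study's 53 x 300 matrix.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n_papers, n_readers, n_factors = 120, 10, 3
true_loadings = rng.normal(size=(n_readers, n_factors))
paper_traits = rng.normal(size=(n_papers, n_factors))
grades = paper_traits @ true_loadings.T + rng.normal(scale=0.5, size=(n_papers, n_readers))

fa = FactorAnalysis(n_components=n_factors).fit(grades)
print(fa.components_.shape)   # (n_factors, n_readers): each column holds one reader's loadings
```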
Article
Many university-bound ESL students who have completed intensive English courses seem to lack the essential reading, writing, and study skills needed for successful academic work. Students supposedly ready to begin university work need to read with speed and comprehension, to write cogent essays and reports, to understand and take notes from lectures, and to employ effective study techniques. These skills are difficult for native speakers as well, and they take time to learn. By postponing their introduction until the high intermediate and advanced ESL levels, we give our students only a brief term or two of practice in both conceptually and technically difficult areas. Most of the so-called advanced skills and concepts involved in successful reading, writing, and studying can be adapted for use in the low-level ESL class, the assumption being that students unskilled in the linguistic aspects of the language can still conceptualize.
Article
An Academic Skills Questionnaire was distributed at San Diego State University to 200 randomly selected faculty from all departments in order to determine which skills (reading, writing, speaking or listening) were most essential to non-native speaker success in university classes. The receptive skills, reading and listening, were ranked first by faculty teaching both lower division and upper division/graduate classes. The faculty of all departments but Engineering ranked General English above Specific Purposes English. This study concludes with implications for testing, literacy requirements and curriculum development.
Article
Teaching advanced writing in ESL classes presents several difficulties, ranging from methods for coping with large, heterogeneous classes to the scarcity of ESL materials concerned with the organization of writing above the paragraph level. In this article, we offer a versatile design for writing exercises initially developed for the University of Michigan EAP classes. The design has two purposes: 1) to provide more individualized instruction for advanced writing students, thereby helping them to write more effectively in the genres they will actually need to use, and 2) to acquaint students with some of the English thought patterns which underlie the organization of longer units of discourse, from compositions to full-length academic papers. This design for advanced writing materials isolates one organizational function at a time for intensive practice, includes practice in the use of appropriate discourse signals, and stresses the need for integrating functional reading and writing activities. The rationale behind this approach is discussed, the design presented, and then illustrated with samples of exercises on writing introductions to compositions, academic papers, and letters.
Article
Current research in applied linguistics claims that most adult learners acquire a second language only to the extent that they are exposed to and actively involved in real, meaningful communication in that language. An ESL class which sets out to provide opportunities for such communication, therefore, requires at least two basic components: an environment which will encourage learners to exercise their own initiative in communicating, and activities which will motivate them to do so.This article explores these issues by briefly reviewing the research which supports incorporating a strong communicative component in language teaching. It then discusses five features of real communication which have implications for the design of such a component and highlights the need to consider not only curricular content but methodology as well. It stresses the importance of classroom atmosphere for the learning and practicing of communicative skills and discusses some of the potential benefits of student-centered teaching. It then outlines some principles for creating appropriate task-oriented classroom materials which promote real communication and can involve the use of any of the four language skills. This article concludes with a discussion of the role of explicit grammar instruction within the context of communicative, student-centered teaching.
Article
The purpose of this paper is to criticize the concept of cohesion as a measure of the coherence of a text. The paper begins with a brief overview of Halliday and Hasan's (1976) cohesion concept as an index of textual coherence. Next, the paper criticizes the concept of cohesion as a measure of textual coherence in the light of schema-theoretical views of text processing (e.g. reading) as an interactive process between the text and the reader. This criticism, which is drawn from both theoretical and empirical work in schema theory, attempts to show that text-analytic procedures such as Halliday and Hasan's cohesion concept, which encourage the belief that coherence is located in the text and can be defined as a configuration of textual features, and which fail to take the contributions of the text's reader into account, are incapable of accounting for textual coherence. The paper concludes with a caution to second language (EFL/ESL) teachers and researchers not to expect cohesion theory to be the solution to EFL/ESL reading/writing coherence problems at the level of the text.
Article
Reading and writing experimental-research papers is important to academic and professional success in the sciences and social sciences, and is becoming increasingly important in the humanities. Few ESL teachers, however, feel comfortable teaching ESL students to read and write such papers. This paper presents both a discussion of experimental-research paper organization and a method for teaching reading and writing of experimental-research articles to ESL students. ESL teachers are advised to teach students to analyze the reading purpose first, and then to select a reading strategy to meet that purpose. Activities must be structured so that students move from teacher-supplied data to student-collected data.
Article
The concept of Contrastive Rhetoric was first articulated in 1966. In the intervening decade, a number of studies have been undertaken to test the basic assumption that the organization of paragraphs written in any language by individuals who are not native speakers of that language will be influenced by the rhetorical preferences of the native language. In retrospect, some dozen studies by a variety of scholars are reviewed. Since the primary assumption appears to have survived analysis, a preliminary taxonomy of syntactic devices operating on inter‐sentence transitions and a framework for the analysis of discourse blocs are developed.
Article
Teaching students to outline their essays before they actually write them is a common practice which presumes that writing is a uni-directional process of recording pre-sorted, pre-digested ideas. While it is certainly true that much of an essay can be planned in advance, one must also recognize that the very act of writing can itself serve to facilitate thought and shape ideas. Essay writing is thus viewed as a bi-directional movement between content and written form. In the ESL classroom this model translates into an approach which places composition revision in a central position. Students are taught how to write and rewrite, refine and recast rough ideas and sketchy drafts into a polished essay. This approach more closely reflects what we actually do when we write.
Article
A review of the use of readability formulas in the military indicated that they are generally invalid and a possible source of significant misjudgments about the adequacy of written technical materials. Strategies are discussed for predicting comprehension levels for existing text and for ensuring that the initial production of new text will result in a comprehensible product. (Author)
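For readers unfamiliar with what such formulas compute, the sketch below implements one well-known example, the Flesch Reading Ease score, using a crude vowel-group heuristic for syllable counting. It is illustrative only; it is not one of the formulas reviewed in the military study, and its inclusion is not an endorsement of formula-based judgments of text adequacy.

```python
# One well-known readability formula, with a deliberately crude syllable heuristic.
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The manual is short. Read it before you start the pump."), 1))
```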
Article
Results of the 1981-1982 census of foreign students in the United States are presented. In addition to an overview of foreign study in the United States and other countries, data are provided on students' nationality, academic characteristics, personal characteristics, distribution by state, two- and four-year institutions, public and private institutions, institutions with the most foreign students, expenditures for living costs, intensive English language programs, and study abroad programs. Appendices provide information on the following: foreign student enrollments by institution and state for 1980-1981 and 1981-1982; foreign student detailed nationality data by region (i.e., extrapolated count, base numbers, percentage distribution, and percentage change); detailed field of study categories; codes for countries by continent and subregion; states within U.S. regions; foreign student enrollments in intensive English language programs by program and location; the number of U.S. college-sponsored study abroad programs, 1980-1981 by region/country, as well as the number of students by sex and fields of study; the number of foreign students in the 45 leading host countries, 1978; and characteristics of countries of origin. Information on data collection procedures and sources of data is included, along with a sample questionnaire. (SW)
Article
A review of literature on error correction shows a lack of agreement on the benefits of error correction in second language learning and confusion on which errors to correct and the approach to take to correction of both oral and written language. This monograph deals with these problems and provides examples of techniques in English, French, German, and Spanish. The chapter on selection of errors to correct presents 15 areas that research has suggested and proposes a system for choosing errors for correction based on the criteria of comprehensibility, frequency, pedagogical focus, and individual student concerns. With regard to techniques for correcting oral work, there is general agreement that the approach should be positive. Within this perspective, a number of techniques are suggested for oral correction under the headings of self-correction, peer-correction and teacher-correction. The same categories are used to discuss techniques for correcting compositions and other written work. Appendices include a checklist of frequent errors made by ESL students, a list of points to aid essay-writers, and two composition check-lists. A list of references completes the volume. (AMH)
Article
This study investigated the validity of various approaches to the measurement of English composition skills. Over 600 11th and 12th-grade students were asked to write five 20-minute essays on different topics, to take six objective tests of writing ability, and to do two interlinear exercises. Twenty-five experienced readers assigned scores of three, two, or one to each essay. The total of the 25 scores per essay became the criterion for evaluating the validity of the objective tests and interlinear exercises. The sums of 20 ratings on four of the essay topics became the criterion for evaluating the fifth topic as a predictor. Later, a larger number of readers regraded the essays on two of the topics to assess the effects of reading under field conditions. Findings indicated that (1) the reliability of essay scores is primarily a function of the number of different essays and the number of different readings, (2) objective questions designed to measure writing skills prove to be highly valid when evaluated against a reliable criterion, and (3) the most efficient predictor of a reliable direct measure of writing ability is one which includes essay questions or interlinear exercises combined with objective questions. (Tables presenting data from the study are included.) (JS)
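The first finding, that reliability grows with the number of essays and readings, is commonly modeled with the Spearman-Brown prophecy formula. The abstract gives no formula or coefficients, so the sketch below is only an illustration under that assumption, and the single-essay, single-reading reliability used is hypothetical rather than taken from the study.

    # Illustration of how score reliability grows as more essays or readings are
    # pooled, assuming the classical Spearman-Brown prophecy formula. The value of
    # r_single is hypothetical; the study's own estimates are not reproduced here.
    def spearman_brown(r_single, k):
        """Projected reliability when the measure is lengthened k times."""
        return k * r_single / (1 + (k - 1) * r_single)

    r_single = 0.40  # hypothetical reliability of one essay scored once
    for k in (1, 2, 4, 5):
        print(k, round(spearman_brown(r_single, k), 2))

Under this assumption, pooling five essays with a per-essay reliability of .40 would project a composite reliability of about .77, which is the pattern the finding describes.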
Article
Addressed to teachers, educators, and other specialists in language learning, this volume sketches a pedagogical theory of discourse for a language arts curriculum (see "A Student-Centered Language Arts Curriculum, Grades K-13: A Handbook for Teachers," TE 001 468). The emphasis is upon mastering the art of communication through the everyday use of language and upon developing language skills through a sequence of activities which correlate with the student's intellectual and emotional growth. The first chapter defines "structure," "English," and the elements of discourse (speaker, listener, and subject) and explains their interrelationships. The kinds and order of discourse (e.g., interior dialogue, correspondence, public narrative) are presented in chapter two. Narrative and drama as particular kinds of basic discourse are discussed in chapters three and four, with particular emphasis being placed upon parallels between literature and daily life. In the next chapter, the importance and limitations of grammar and sentences as substructures of discourse are examined. The value to the writer of trial-and-error composition and of feedback, and the lack of value of composition textbooks are taken up in chapter six. A brief final chapter urges a reorganization of the total educational curriculum to eliminate conventional subject divisions. (LH)
Article
This evaluative and developmental study was undertaken between 1972 and 1974 to determine the effectiveness of items used for the Test of English as a Foreign Language (TOEFL) in relationship to other item types used in assessing English proficiency, and to recommend possible changes in TOEFL content and format. TOEFL was developed to assess the English proficiency of non-native English-speaking students applying to institutions of higher education in the United States. Questions of validation, criterion selection and content specification were first investigated before nine written and oral TOEFL item formats were evaluated for possible use in a revised test. Both original and new formats were administered to 98 Peruvian, 145 Chilean and 199 Japanese subjects in their native countries. Open-ended response measures and multiple-choice measures were examined. Intercorrelations among test scores indicated that the test could be revised to incorporate three instead of five components: (1) listening comprehension; (2) English structure and writing ability; (3) reading comprehension and vocabulary in context. Four objective subtests aimed at increasing TOEFL effectiveness, as well as tailored criterion measures of the English productive skills of speaking and writing, were also developed. (AEF)
Article
New varieties of English have developed in various parts of the world in recent years in countries where English functions as a second, rather than a foreign, language. The processes by which distinctive varieties of English develop in such settings are described. The functional and linguistic characteristics of the processes of nativization and indigenization are discussed with reference to several nativized varieties of English. A distinction is made between two contrasting norms for speech events in these varieties of English: rhetorical and communicative norms. Rhetorical norms are repertoires of English used for speech events which have the functional status of Public, Formal, High, Distant, Impersonal, etc. Communicative norms are speech repertoires used for speech events which have the contrasting functional status of Private, Informal, Low, Intimate, etc. Five different linguistic processes commonly used to mark a shift from rhetorical to communicative norm in several new varieties of English are discussed in terms of the employment of variable linguistic rules. Acquisition of rhetorical norms is related to socialization. Implications are discussed for language teaching and for creative literary writing in English.
Article
This paper describes the Writer's Workbench programs, which analyze English prose and suggest improvements. Some limited data on the use of the Writer's Workbench and its acceptance are also presented. The Writer's Workbench incorporates the style and diction programs, described in a previous paper of this TRANSACTIONS, into a more extensive system to help writers improve their writing. The system runs under the UNIX™ operating system, and includes programs to: 1) proofread, 2) comment on stylistic features of text, and 3) provide reference information about the English language. Among other writing faults, the programs detect split infinitives, errors in spelling and punctuation, overly long sentences, wordy phrases, and passive sentences.
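The Writer's Workbench itself is a suite of UNIX programs whose internal rules are not given in the abstract. As a rough illustration of two of the surface checks it lists, overly long sentences and passive constructions, the sketch below uses an arbitrary length cutoff and a simple regular expression; neither is claimed to match the actual programs.

    # Rough imitation of two surface checks named in the abstract (long sentences,
    # passive constructions). The threshold and pattern are illustrative only,
    # not those of the actual Writer's Workbench programs.
    import re

    LONG_SENTENCE = 30  # words; illustrative cutoff
    PASSIVE = re.compile(r"\b(am|is|are|was|were|be|been|being)\s+\w+(ed|en)\b", re.I)

    def check_prose(text):
        findings = []
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for number, sentence in enumerate(sentences, start=1):
            words = sentence.split()
            if len(words) > LONG_SENTENCE:
                findings.append((number, "long sentence ({} words)".format(len(words))))
            if PASSIVE.search(sentence):
                findings.append((number, "possible passive construction"))
        return findings

    sample = "The report was completed in haste. It covers three topics."
    for number, message in check_prose(sample):
        print(number, message)

Checks of this kind flag surface features rather than judge writing quality, which is consistent with the abstract's description of the programs as proofreading and commenting aids rather than scorers.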
Article
The purpose of this study is to determine whether “constellations” (White 1975) of cohesive items occur in three types of applied and academic written discourse: letters, reports and textbooks. Twenty complete letters, and randomly selected pages from annual reports and ten business and economics textbooks were coded for cohesive elements using the Halliday and Hasan scheme (1976:333–339). Results show that Lexical Cohesion is the most common category in all three discourse types (letters 46%, reports 79%, and textbooks 79%) but that the occurrence of lexical subcategories (e.g., synonym, same item) varies among discourse types. Reference is the second most common category (letters 42%, reports 14%, and textbooks 11%); again, differences appear in the subcategories. Conjunction represents less than 10% of the items in any discourse type; in letters and reports a large number of conjunction subtype categories (e.g., additives, adversatives, causals) appear, whereas in letters the additive AND predominates. It is concluded that although generalizations cannot be made about cohesive features in the broad classes of applied and academic EBE discourse, constellations of cohesive elements can be identified in each letter type as well as in reports and textbooks. Suggestions are made for curriculum preparation and further study.
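For readers unfamiliar with the coding step, the sketch below shows one way coded cohesive items could be tallied into per-discourse-type category percentages. The top-level category names follow Halliday and Hasan as cited in the abstract, but the coded items and counts are invented purely to illustrate the bookkeeping, not to reproduce the study's data.

    # Tally coded cohesive items into per-discourse-type category percentages.
    # Category names follow Halliday and Hasan as cited in the abstract; the
    # coded items below are invented solely to show the bookkeeping.
    from collections import Counter, defaultdict

    coded_items = [
        ("letter", "lexical"), ("letter", "reference"), ("letter", "conjunction"),
        ("report", "lexical"), ("report", "lexical"), ("report", "reference"),
        ("textbook", "lexical"), ("textbook", "lexical"), ("textbook", "reference"),
    ]

    by_type = defaultdict(Counter)
    for discourse_type, category in coded_items:
        by_type[discourse_type][category] += 1

    for discourse_type, counts in by_type.items():
        total = sum(counts.values())
        shares = {cat: round(100 * n / total) for cat, n in counts.items()}
        print(discourse_type, shares)

Percentages computed this way are what allow the study to compare the weight of Lexical Cohesion, Reference, and Conjunction across letters, reports, and textbooks.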