Article

Automated evaluation of text and discourse with Coh-Metrix

Authors: Danielle S. McNamara, Arthur C. Graesser, Philip M. McCarthy, Zhiqiang Cai

Abstract

Coh-Metrix is among the broadest and most sophisticated automated textual assessment tools available today. Automated Evaluation of Text and Discourse with Coh-Metrix describes this computational tool, as well as the wide range of language and discourse measures it provides. Section I of the book focuses on the theoretical perspectives that led to the development of Coh-Metrix, its measures, and empirical work that has been conducted using this approach. Section II shifts to the practical arena, describing how to use Coh-Metrix and how to analyze, interpret, and describe results. Coh-Metrix opens the door to a new paradigm of research that coordinates studies of language, corpus analysis, computational linguistics, education, and cognitive science. This tool empowers anyone with an interest in text to pursue a wide array of previously unanswerable research questions.


... According to this approach, comprehensibility is high when a text uses short, common words and short, syntactically simple sentences, when the text gives explicit cues about its structure, and when it is coherent both globally and locally (e.g. McNamara et al., 2014). Comprehensibility is equated with text complexity here. ...
... Frequent words are more easily processed than infrequent words (Crossley et al., 2008; McNamara et al., 2014), and short sentences are generally easier to comprehend than long sentences (McNamara et al., 2014). It is therefore reasonable to assume that masculine-only forms can be read more fluently than gender-fair language. ...
Article
Gender-fair language makes women and other genders, their interests, and their achievements more visible and is particularly relevant to grammatical gender languages such as German, in which most nouns and personal pronouns are assigned to a specific gender. The present study tested the often repeated critical claims that gender-fair language impairs the comprehensibility and aesthetic appeal of videos. In an experiment with N = 105 students, participants watched a video on self-determination theory, either with masculine-only forms or using the glottal stop, a form of spoken gender-fair language that inserts an abrupt and sustained closure of the vocal cords in the larynx between the masculine form or stem and the feminine ending of a word (e.g. German “Leserʔinnen”, roughly “feʔmale readers”). Subsequently, participants completed a questionnaire regarding the video's comprehensibility. The results show no statistically significant impairment regarding general subjective comprehensibility (partial η² < .01), the ease of ascribing meaning to the words (partial η² < .01), the ease of decoding the syntax of the sentences (partial η² = .03), or the aesthetic appeal of the videos (partial η² = .02). The critics’ claims are therefore questioned.
... We also used SEO data about 1565 vaccine web pages without medical misinformation for comparison, as well as psycholinguistic and cognitive data about 82 high-traffic vaccine web pages without medical misinformation identified by Wolfe et al. (under review). We also present published psycholinguistic norms for social studies texts for 11th grade through adult readers for comparison (McNamara et al., 2014). Our interest is in generalizable knowledge rather than specific web pages. ...
... Having investigated search behavior, our next step was to analyze the content of pages presenting medical misinformation about vaccination for readability, ease of understanding, and ease of making inferences using discourse technologies. Coh-Metrix measures of readability, simplicity, gist inferences, and syntax are presented in Table 4, along with comparisons to high-traffic vaccine web pages without misinformation and norms for social studies texts for 11th grade through adult published by McNamara et al. (2014). Here it can be seen that the most visited pages vary greatly on measures of readability and comprehension. ...
... The impact of medical misinformation on readers for whom English is a second language (ESL) also deserves more research attention. The finding that misinformation pages scored significantly lower on the Coh-Metrix Second Language Readability measure (McNamara et al., 2014) than other vaccine pages suggests that such readers may be more vulnerable to medical misinformation than others, which is concerning. ...
Article
Given the high rates of vaccine hesitancy, web-based medical misinformation about vaccination is a serious issue. We sought to understand the nature of Google searches leading to medical misinformation about vaccination, and guided by fuzzy-trace theory, the characteristics of misinformation pages related to comprehension, inference-making, and medical decision-making. We collected data from web pages presenting vaccination information. We assessed whether web pages presented medical misinformation, had an overarching gist, used narrative, and employed emotional appeals. We used Search Engine Optimization tools to determine the number of backlinks from other web pages, monthly Google traffic, and Google Keywords. We used Coh-Metrix to measure readability and Gist Inference Scores (GIS). For medical misinformation web pages, Google traffic and backlinks were heavily skewed with means of 138.8 visitors/month and 805 backlinks per page. Medical misinformation pages were significantly more likely than other vaccine pages to have backlinks from other pages, and significantly less likely to receive at least one visitor from Google searches per month. The top Google searches leading to medical misinformation were "the truth about vaccinations," "dangers of vaccination," and "pro con vaccines." Most frequently, pages challenged vaccine safety, with 32.7% having an overarching gist, 7.7% presenting narratives, and 17.3% making emotional appeals. Emotional appeals were significantly more common with medical misinformation than other high-traffic vaccination pages. Misinformation pages had a mean readability grade level of 11.5, and a mean GIS of −0.234. Low GIS scores are a likely barrier to understanding gist, and are the "Achilles' heel" of misinformation pages.
... The linguistic features of the two Lectures were compared to establish equity in their length, syntactic features, semantic features, and cohesion (McNamara and Graesser 2012). According to McNamara et al. (2014), these textual features can predict comprehenders' comprehension processes. For example, lengthy sentences and passages that contain sophisticated vocabulary, complex clauses, and few cohesive devices would be more difficult to process and comprehend compared with shorter passages containing easy vocabulary and syntax (see Révész and Brunfaut 2013). ...
... One of the first steps taken in this research was to compare the linguistic features in both the lectures using the software Coh-Metrix 3.0, which indicated that the lectures shared similar linguistic features. One of the reasons to conduct this step is that text characteristics can affect (listening) comprehension (McNamara et al. 2014). Specifically, linguistic cues such as phonetic, lexical, syntactic and discourse features seem to have a role in listening comprehension as shown by Révész and Brunfaut (2013). ...
... The linguistic indices that were found useful in establishing test equity consist of descriptive features, such as word and sentence length, and cohesion features, such as LSA-based sentence and paragraph overlaps. These features capture distinctive properties of textual genres in reading research (McNamara et al. 2014). Thus, it may be said that Coh-Metrix analysis is generally appropriate in providing supporting evidence to establish the generic similarity of (listening) passages. ...
Article
The present study explored the potential of a new neurocognitive approach to test equity, which integrates evidence from eye-tracking and functional near-infrared spectroscopy with conventional test content analysis and psychometric analysis. The participants of the study (n = 29) were neurotypical university students who took two tests of English lecture comprehension. Test equity was examined in this study at four levels: the linguistic level (content evidence) and the test score level, which are conventional levels in test equity; and the gaze behavior and neurocognitive levels, which are novel to this study. It was found that the linguistic features of the two test forms being equated were similar and that there was no significant difference at the neurocognitive and behavioral levels. However, there was a significant difference in gaze behaviors, measured by fixation counts and visit counts, although fixation duration and visit duration did not vary across the two tests. Overall, test equity was supported, despite partial counterevidence from the gaze data. We discuss the implications of this approach for future equity research and response processes in language assessment.
... Furthermore, this study examined a wide range of features applied to the problem of automatic identification of Social Presence in online discussion transcripts. This paper reports on the findings of a study that investigated some features extracted by using 1) traditional text mining approaches based on the analysis of words used in the content of messages exchanged in online discussions [19]; 2) pretrained models (i.e., sentiment analysis and latent semantic analysis) [20]; 3) social network analysis [21]; 4) indicators of different psychological processes, and measures of text coherence and complexity [22], [23]. Finally, this study also evaluates the performance of BERT classifier in order to compare the decision tree algorithms with a deep learning algorithm. ...
... Recent studies examined the use of different features and classifiers. For instance, Kovanović et al. [28] developed an approach that relies on features based on Coh-Metrix [23], LIWC [22], LSA similarity, named entities, and discussion context [30] instead of word counts used by previous works. Moreover, they applied a random forest algorithm to classify the messages according to the categories of Cognitive Presence. ...
... 2) Coh-Metrix features: In addition to the psychological indicators provided by LIWC, the Coh-Metrix linguistic resource [23] was also applied. It allows the extraction of 108 features related to textual cohesion, coherence, linguistic complexity, text readability, and lexical category [23]. ...
... Discourse cohesion signifies the extent of semantic relationships among words or phrases in a text [4] and is an important facilitator of text comprehension [5], [6]. Cohesion aids in the development of narrative structure, which strengthens the quality of writing. ...
... Finally, we are left with 1056/1175 sections from the 'Pos' category and 503/649 from the 'Neg' category. We compute SLIC scores for these sections and compare them against the above-mentioned TAACO indices computed using the recent TAACO 2.0 tool [16]. ...
Article
Full-text available
Discourse cohesion facilitates text comprehension and helps the reader form a coherent narrative. In this study, we aim to computationally analyze the discourse cohesion in scientific scholarly texts using multilayer network representation and quantify the writing quality of the document. Exploiting the hierarchical structure of scientific scholarly texts, we design section-level and document-level metrics to assess the extent of lexical cohesion in text. We use a publicly available dataset along with a curated set of contrasting examples to validate the proposed metrics by comparing them against select indices computed using existing cohesion analysis tools. We observe that the proposed metrics correlate as expected with the existing cohesion indices. We also present an analytical framework, CHIAA (CHeck It Again, Author), to provide pointers to the author for potential improvements in the manuscript with the help of the section-level and document-level metrics. The proposed CHIAA framework furnishes a clear and precise prescription to the author for improving writing by localizing regions in text with cohesion gaps. We demonstrate the efficacy of CHIAA framework using succinct examples from cohesion-deficient text excerpts in the experimental dataset.
... The majority of these students were white (n = 2,464, 75%). This study focusses on 1,035 students in this larger sample whose aggregated posts included more than 100 words, because NLP indices are not reliable with small language samples and many of our indices (e.g., lexical diversity) require a minimum of 100 words [32]. Those who included more words in their posts had significantly higher FSA scores, t(3275) = 5.79, p < .001 (M ≤100 words = 354.08, ...
... We assessed students' Math Nation Wall discourse using two linguistic tools, namely Coh-Metrix [32] and SEANCE [33], which report linguistic indices related to language sophistication, cohesion, and sentiment. Use of these two tools was motivated by prior work relating academic performance in mathematics to these features of language in online forums and discussion boards [20][21][22][23][24]. ...
Conference Paper
Full-text available
This study leverages natural language processing to assess dimensions of language and discourse in students' discussion board posts and comments within an online learning platform, Math Nation. This study focusses on 1,035 students whose aggregated posts included more than 100 words. Students' wall post discourse was assessed using two linguistic tools, Coh-Metrix and SEANCE, which report linguistic indices related to language sophistication, cohesion, and sentiment. A linear model including prior math scores (i.e., Mathematics Florida Standards Assessments), grade level, semantic overlap (i.e., LSA givenness), incidence of pronouns, and noun hypernymy accounted for 64.48% of the variance in Algebra I end-of-course scores (RMSE = 13.73). Students with stronger course outcomes used more sophisticated language, across a wider range of topics, and with less personalized language. Overall, this study confirms the contributions of language and communication skills over and above prior math abilities to performance in mathematics courses such as Algebra.
... Alternatively, Peter et al. (2018) argued for the utility of recently developed approaches to questionnaire item readability. They specifically reviewed two tools that were based on recent advancements in empirically supported text-analysis technology: Coh-Metrix (McNamara et al., 2014) and Question Understanding Aid (QUAID; Graesser et al., 2006). These tools reportedly improve on traditional readability formulas by having a stronger theoretical base and an empirically driven approach to predicting item comprehension. ...
... For example, the output from Coh-Metrix includes an index that evaluates the left embeddedness of the main verb in a sentence. This represents the number of words that come before the main verb of the sentence and is a reliable predictor of comprehension ease (McNamara et al., 2014). ...
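As a rough illustration of how such an index can be computed, left-embeddedness can be approximated with any dependency parser by counting the tokens that precede the main verb of a sentence. The sketch below is a minimal Python example using spaCy; the parser model, the use of the dependency ROOT as a stand-in for the main verb, and the punctuation filter are assumptions, not Coh-Metrix's exact implementation.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def left_embeddedness(sentence: str) -> int:
    # Count the non-punctuation tokens that precede the dependency ROOT,
    # used here as a proxy for the main verb of the sentence.
    doc = nlp(sentence)
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    return sum(1 for tok in doc if tok.i < root.i and not tok.is_punct)

print(left_embeddedness("The report that the committee commissioned last year finally appeared."))
```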
Article
Full-text available
Many individuals diagnosed with an addictive disorder are members of disadvantaged groups and obtain a high school education or less, yet self-report questionnaires widely used to identify symptoms of addictive disorders do not use best practices to ensure item clarity and comprehension. In the present study, we explore how advanced text-analysis technology can be used to guide the development of a diagnostic questionnaire with an emphasis on maximizing its readability and then test the accuracy of this questionnaire. In Study 1, a self-report questionnaire for symptoms of gambling disorder was created using best practices for item clarity and comprehension. In Study 2, an experimental design was used to test whether the measure with enhanced readability, compared to a commonly used screening instrument, improved diagnostic symptom accuracy among samples of high school and college-educated individuals. Subsequent analyses revealed that education was positively related to item comprehension, and participants who completed the maximized readability questionnaire correctly identified more symptoms of gambling disorder than participants who completed the comparison questionnaire, regardless of educational attainment. These studies indicate that the rate at which individuals accurately identify symptoms of psychopathology is strongly related to their educational attainment and the readability of the questionnaire items themselves.
... Coh-Metrix plays an integral role in the analysis of language used by students in the current study, namely, to quantify the complexity of the mathematical discourse. Coh-Metrix provides measures of language and cohesion pertaining to words, sentences, and paragraphs, including lexical (word-based) and semantic cohesion (e.g., leveraging latent semantic analysis), lexical diversity, connectives, and text difficulty measures at the word and sentence levels [53]. Coh-Metrix provides eight component scores related to text ease and cohesion, namely narrativity, syntactic simplicity, word concreteness, referential cohesion, deep cohesion, verb cohesion, connectivity, and temporality [55][56]. ...
... A total of 89 Coh-Metrix indices were extracted. The reliability and stability of some linguistic measures (e.g., cohesion, lexical diversity) require a minimum of 100 words [53]. Hence, threads with fewer than 100 words were removed from the final dataset. ...
... Honnibal & Johnson, 2015;Manning et al., 2014), and several syntactic complexity analyzers can now be used to assess the degree of complexity of the syntactic structures used in spoken or written production (e.g. Biber et al., 1999;Kyle, 2016;Lu, 2009Lu, , 2010McNamara et al., 2014). Such tools have dramatically increased the scope and scale of computational, theoretical, as well as applied linguistics research that involves analyzing the syntactic structures or complexity of spoken or written texts. ...
... or included in Nomlex (Macleod et al., 2001). Finite dependent clauses: number of finite dependent clauses, whether nominal, adjectival, or adverbial (Lu, 2010). Non-finite dependent clauses: number of non-finite dependent clauses with gerund, infinitive, or past-participle forms (Biber et al., 2011). Left-embeddedness: number of words before the main verb of the sentence (McNamara et al., 2014). ...
Chapter
This chapter examines the affordances of automated syntactic analysis (ASA) for language teaching. Following an overview of the state-of-the-art natural language processing (NLP) technologies for ASA, the chapter systematically reviews three key research strands that demonstrate how such NLP capabilities can be implemented in language pedagogy with distinct pedagogical foci. In the first strand, ASA is used to identify, highlight, and create exercises on target syntactic features in pedagogical texts, with the goal of promoting learners’ grammar acquisition. The pedagogical integration of ASA in this strand often draws on focus-on-form and input enhancement-based pedagogical activities. In the second strand, the role of ASA is to parse pedagogical texts and present them in modified formats in order to facilitate syntactic processing. This strand of research posits that syntactic enhancement affords learners greater access to syntactic relationships within a text, which, in turn, may promote the development of the learners’ reading skills and potentially other language skills as well. The third strand integrates ASA with rhetorical/functional analysis of texts to promote the development of learners’ ability to recognize genre-specific form-function mappings. The chapter concludes with a discussion of future directions of ASA-informed language pedagogy.
... Researchers have begun exploring the promise of automated scoring to support certain key formative assessment decisions. Mercer et al. (2019) found that an automated scoring model based on features extracted from the Coh-Metrix web tool (McNamara et al., 2014) applied to 7-min narrative writing samples had validity coefficients equal to human-scored measures, suggesting the possibility of substituting automated scoring for human scoring for curriculum-based measurement (CBM). These researchers also documented that automated scoring methods applied to 3-min narrative CBM probes had equal or higher diagnostic accuracy than rater-scored methods for identifying students who were non-responsive to core instruction (Keller-Margulis et al., 2021). ...
Article
We investigated the promise of a novel approach to formative writing assessment at scale that involved an automated writing evaluation (AWE) system called MI Write. Specifically, we investigated elementary teachers’ perceptions and implementation of MI Write and changes in students’ writing performance in three genres from Fall to Spring associated with this implementation. Teachers in Grades 3–5 (n = 14) reported that MI Write was usable and acceptable, useful, and desirable; however, teachers tended to implement MI Write in a limited manner. Multilevel repeated measures analyses indicated that students in Grades 3–5 (n = 570) tended not to increase their performance from Fall to Spring except for third graders in all genres and fourth graders’ narrative writing. Findings illustrate the importance of educators utilising scalable formative assessments to evaluate and adjust core instruction.
... The Flesch Reading Ease readability score provides a score from 0 (very difficult to read) to 100 (very easy to read) that is a function of the length of the sentences and the number of syllables per word, and it is often used as a proxy for the readability level of a text. The Flesch Reading Ease score does not consider multiple aspects that may impact text readability (e.g., see McNamara et al., 2014), but it is easily computed for any given text and is commonly used as a measure of a text's readability. ...
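For reference, the Flesch Reading Ease score is a simple linear function of average sentence length and average syllables per word. The sketch below computes it in Python under stated assumptions: a regex-based sentence and word splitter and a naive vowel-group syllable counter, rather than the dictionary-based syllabification used by standard readability tools.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per group of consecutive vowels (assumption;
    # standard tools use dictionary-based syllabification).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

print(flesch_reading_ease("The cat sat on the mat. It was warm, and the cat was happy."))
```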
Article
Full-text available
Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase the cost and effort to produce. In this paper, we introduce Age of Exposure (AoE) version 2, a proxy for human exposure to new vocabulary terms that expands AoA word lists through training regressors to predict AoA scores. Word2vec word embeddings are trained on cumulatively increasing corpora of texts, word exposure trajectories are generated by aligning the word2vec vector spaces, and features of words are derived for modeling AoA scores. Our prediction models achieve low errors (from 13% with a corresponding R² of .35 up to 7% with an R² of .74), can be uniformly applied to different AoA word lists, and generalize to the entire vocabulary of a language. Our method benefits from using existing readability indices to define the order of texts in the corpora, while the performed analyses confirm that the generated AoA scores accurately predicted the difficulty of texts (R² of .84, surpassing related previous work). Further, we provide evidence of the internal reliability of our word trajectory features, demonstrate the effectiveness of the word trajectory features when contrasted with simple lexical features, and show that the exclusion of features that rely on external resources does not significantly impact performance.
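One common way to align embedding spaces trained on different (e.g., cumulatively growing) corpora is an orthogonal Procrustes rotation over a shared vocabulary; whether AoE uses exactly this procedure is not stated here, so the sketch below is only a generic illustration, with randomly generated matrices standing in for real embeddings.

```python
import numpy as np

def procrustes_align(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Orthogonal Procrustes: find the rotation R minimizing ||source @ R - target||_F,
    # where rows of both matrices correspond to the same shared vocabulary.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Toy stand-ins for embeddings trained on a smaller and a larger corpus
# (in practice these would come from word2vec models sharing a vocabulary).
rng = np.random.default_rng(0)
emb_small = rng.normal(size=(5000, 100))
emb_large = rng.normal(size=(5000, 100))

R = procrustes_align(emb_large, emb_small)
emb_large_aligned = emb_large @ R  # rotated into the space of emb_small
```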
... This involved extracting and copying the text content into MS Word documents initially, removing all titles and subheadings, information on contact details and REC approval, pictures, and proper nouns. Each document was then 'cleaned' as recommended [30], by removing bullet points, any numbering outside the text, extra line spacing, indentations, columns, and inverted commas. All other punctuation was retained. ...
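A minimal sketch of that kind of pre-processing is given below; the exact cleaning rules used in the cited study are not reproduced, so the regular expressions (bullets, list numbering, quotation marks, indentation, blank lines) are illustrative assumptions.

```python
import re

def clean_text(text: str) -> str:
    # Illustrative pre-processing before readability analysis; the patterns are
    # assumptions, not the exact rules applied in the cited study.
    text = re.sub(r"^[ \t]*[•◦▪\-\*]+\s*", "", text, flags=re.MULTILINE)  # bullet points
    text = re.sub(r"^[ \t]*\d+[.)]\s+", "", text, flags=re.MULTILINE)     # numbering outside the text
    text = re.sub(r"[\"“”‘’]", "", text)                                  # inverted commas / quotation marks
    text = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)               # indentation
    text = re.sub(r"\n\s*\n+", "\n", text)                                # extra line spacing
    return text.strip()  # all other punctuation is retained
```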
Article
Full-text available
Objectives This study aimed to determine the characteristics of ethical review and recruitment processes, concerning the inclusion of adults with capacity-affecting conditions and associated communication difficulties in ethically sound research, under the provisions of the Mental Capacity Act (MCA, 2005) for England and Wales. Design A documentary-based survey was conducted focusing on adults with capacity-affecting conditions and associated communication difficulties. The survey investigated: (1) retrospective studies during the implementation period of the MCA (2007–2017); (2) prospective applications to MCA-approved Research Ethics Committees (RECs) during a 12-month period (2018–19); (3) presentational and linguistic content of participant information sheets used with this population. Setting Studies conducted and approved in England and Wales. Sample Studies focused on adults with the following capacity-affecting conditions: acquired brain injury; aphasia after stroke; autism; dementia; intellectual disabilities; mental health conditions. The sample comprised: (1) 1605 studies; (2) 83 studies; (3) 25 participant information sheets. Primary and secondary outcome measures The primary outcome was the inclusion/exclusion of adults with capacity-affecting conditions from studies. The secondary outcome was the provisions deployed to support their inclusion. Results The retrospective survey showed an incremental rise in research applications post-MCA implementation from 2 (2012) to 402 (2017). The prospective survey revealed exclusions of people on the bases of: ‘lack of capacity’ (n=21; 25%); ‘communication difficulties’ (n=5; 6%); ‘lack of consultee’ (n=11; 13%); and ‘limited English’ (n=17; 20%). REC recommendations focused mainly on participant-facing documentation. The participant information sheets were characterised by inconsistent use of images, typography and layout, volume of words and sentences; some simplified language content, but variable readability scores. Conclusions People with capacity-affecting conditions and associated communication difficulties continue to be excluded from research, with recruitment efforts largely concentrated around participant-facing documentation. There is a need for a more nuanced approach if such individuals are to be included in ethically sound research.
... At the phrasal level, L2 English syntactic complexity studies have focused heavily on complex noun phrases, including, in particular, different amounts and types of nominal modification, for higher-quality writing or at the advanced proficiency level (e.g., Biber et al., 2011; Biber et al., 2016; Kyle & Crossley, 2018; McNamara, Graesser, McCarthy, & Cai, 2014; Taguchi et al., 2013). Our findings indicate that, in the case of L2 Chinese, predicate-complement structures and predicate modification play a useful role in assessing phraseological usage and indexing writing quality and proficiency. ...
Article
This study investigated the relationship of a set of word-combination-based measures of phraseological diversity, sophistication, and complexity to second language (L2) Chinese proficiency and writing quality in comparison to that of a set of large-grained topic-comment-unit-based measures. Our dataset consisted of 101 assessed narratives produced by Korean learners of Chinese as an L2 at three proficiency levels. Multiple phraseological measures exhibited stronger correlations with quality ratings and/or larger effect sizes for proficiency than did the large-grained topic-comment-unit-based measures. Measures pertaining to language-specific features, including topic-comment-unit-based measures and phraseological measures based on language-specific word combination types, exhibited stronger discriminative power for intermediate and advanced levels than for beginning and intermediate levels. Our results also revealed the importance of predicate-related combinations in assessing L2 Chinese phraseological diversity and complexity. We discuss the implications of our findings for L2 Chinese writing research and L2 Chinese pedagogy.
... Although the above two studies have offered some evidence that certain textual features might be associated with inconsistent scoring, they were criticized by Lim (2019) for inappropriate operationalizations of some of those features. Lim (2019) therefore analyzed the impact of textual features on scoring inconsistency more systematically by incorporating nine features retrieved from both Coh-Metrix (McNamara, Graesser, McCarthy, & Cai, 2014) and Authorial Voice Analyzer (Yoon, 2017). By comparing the textual features in 56 discrepantly scored essays and 53 consistently scored essays, Lim (2019) reported that seven features (i.e., apparent length, spelling errors, syntactic diversity, voice strength, conceptual cohesion, noun phrase density, and negation density) were related to score discrepancy. ...
Article
Although theoretical conceptions of voice vary, researchers now generally agree on its amalgamated and dialogical nature, highlighting the interplay among the reader, the writer, and the text (Canagarajah, 2015; Matsuda, 2015). While much research has investigated the elements of the writer and the text in voice construction, far less has examined voice reconstruction from readers’ perspectives. The current study therefore explores reader reconstruction of writer voice, focusing particularly on understanding the phenomenon of discrepant voice perceptions by different readers. Two raters double-rated 65 EFL essays, simulating conventional writing assessment practice. Independent-samples t tests on various linguistic indices across essays that received consistent vs. inconsistent voice ratings were carried out to identify linguistic elements that might be sources of inconsistency in raters’ voice perceptions. A semi-structured rater interview was conducted to both triangulate the quantitative findings and explore other potential sources of inconsistency. Results showed that most of the language features did not seem to be associated with discrepant voice perceptions, but raters’ differing perceptions of the effectiveness of certain language elements, essay structure, and idiosyncratic interpretations of certain evaluative criteria might lead to divergent reconstructions of voice. Implications are discussed to inform L2 writing assessment, pedagogy, and future research.
... The replicated random forest model from [31] followed the original paper's settings. We used the features and settings provided by the authors, setting the number of estimators to 500 and the bootstrap sample size to 10,000, and including 600+ lexical features comprising n-gram frequencies from the Google API as well as Coh-Metrix [46], sentiment [11], psycholinguistic [10], and other lexical sophistication features [10, 36]. ...
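For orientation, the quoted settings map directly onto a standard random forest configuration such as scikit-learn's; the snippet below is a sketch of that configuration only, with the feature matrix X and labels y assumed to be prepared elsewhere, and it is not the original authors' pipeline.

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch of the quoted configuration (500 trees, bootstrap samples of 10,000).
# X (n_samples x ~600 lexical features) and y are assumed to be built separately;
# an integer max_samples requires at least 10,000 training rows.
clf = RandomForestClassifier(
    n_estimators=500,
    bootstrap=True,
    max_samples=10_000,
    random_state=42,
    n_jobs=-1,
)
# clf.fit(X, y)
# predictions = clf.predict(X_new)
```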
Preprint
Full-text available
Both humans and machines learn the meaning of unknown words through contextual information in a sentence, but not all contexts are equally helpful for learning. We introduce an effective method for capturing the level of contextual informativeness with respect to a given target word. Our study makes three main contributions. First, we develop models for estimating contextual informativeness, focusing on the instructional aspect of sentences. Our attention-based approach using pre-trained embeddings demonstrates state-of-the-art performance on our single-context dataset and an existing multi-sentence context dataset. Second, we show how our model identifies key contextual elements in a sentence that are likely to contribute most to a reader's understanding of the target word. Third, we examine how our contextual informativeness model, originally developed for vocabulary learning applications for students, can be used for developing better training curricula for word embedding models in batch learning and few-shot machine learning settings. We believe our results open new possibilities for applications that support language learning for both human and machine learners.
... Participants' essays were analysed in terms of CALF. The measurement of syntactic and lexical complexity and fluency was performed with the help of computerized tools such as Coh-Metrix 3.0 (Graesser, McNamara & Kulikowich, 2011;McNamara, Graesser, McCarthy & Cai, 2014) and Synlex (Lu, 2010;Lu & Ai, 2015). Accuracy, however, was measured manually due to a lack of reliable computerized tools. ...
Article
To test the predictive power of the SSARC (stabilize, simplify, automatize, reconstruct, and complexify) model of pedagogic task sequencing in second language (L2) writing development, the present study explores the performance of written decision-making tasks with varied levels of cognitive complexity in a simple-to-complex sequence in comparison to complex-to-simple and individual task performance sequences over time. The participants, 100 advanced-mid learners of English as a second language (ESL), who were divided into three groups completed writing tasks either (1) in a simple-to-complex sequence, (2) in a complex-to-simple sequence, or (3) at one level of complexity (i.e. simple, medium, or complex task versions). Their written production was analysed for syntactic complexity, accuracy, lexical complexity, and fluency (CALF). Quantitative analyses found that the simple-to-complex group produced more syntactically complex and accurate essays over time than the complex-to-simple group, although neither group’s progression was linear. When the two sequencing conditions were compared to the individual task performance condition, the results showed more improvement in CALF in both sequencing groups than in the individual performance group over time, with more and steadier growth in the simple-to-complex condition. These findings support the SSARC model and expand current understandings of the relationship between cognitive task complexity and L2 writing development under different task sequencing conditions, with implications for L2 writing pedagogy.
... Coh-Metrix is an online analyzer designed and developed by McNamara et al. (2014) that can be used to evaluate the coherence and cohesion of written and oral texts. This analyzer can provide researchers and practitioners with real-time analysis for evaluating text difficulty and readability. ...
Article
Full-text available
Recently, there has been a surge of interest in the exploration of psychological properties in a second language context. Considerable literature has grown up around the influence of these psychological properties on L2 writing specifically. However, the impact of academic procrastination, which is an important psychological property, has been understudied and it remains unclear how affective factors in L2 might play a role in the above potential influence on L2 writing. Therefore, the current study explored the impact of academic procrastination on L2 writing and examined the mediating role of L2 writing anxiety, by adopting text readability as an innovative approach to assessing L2 writing performance. Participants were 55 Chinese speakers of L2 English. By utilizing the collected questionnaire data and the readability indicators of the L2 writing task, the current research conducted correlation analysis, regression analysis, and structural equation modeling analysis. The results revealed that academic procrastination had a significant negative impact on the readability indicator of Flesch-Kincaid Grade Level in L2 writing. L2 writing anxiety played a complete mediating role in the impact. Academic procrastination can significantly affect Flesch-Kincaid Grade Level of L2 writing indirectly through L2 writing anxiety. Pedagogical implications and future studies were discussed.
... Inter-rater reliability, assessed by an independent transcriber on 20% of randomly selected transcripts, was 98% for morpheme-by-morpheme agreement and 84% for the segmentation of utterances into C-units. Syntactic complexity was evaluated from the transcripts using Coh-Metrix 3.0 [62], a computational linguistics and discourse processing software that integrates a variety of natural language processing tools to analyze texts, including part-of-speech taggers [63], lexicons, syntactic parsers, latent semantic analysis, and pattern classifiers [64]. Although a variety of different metrics have been used in prior research to index syntactic complexity (e.g., counts of left-branching clauses, hand-scoring methods such as IPSyn), the use of Coh-Metrix has implementation advantages given that it does not require specialized software, programming expertise, or laborious hand-coding or tagging. ...
Article
Full-text available
Background Women who carry a premutation allele of the FMR1 gene are at increased vulnerability to an array of age-related symptoms and disorders, including age-related decline in select cognitive skills. However, the risk factors for age-related decline are poorly understood, including the potential role of family history and genetic factors. In other forms of pathological aging, early decline in syntactic complexity is observed and predicts the later onset of neurodegenerative disease. To shed light on the earliest signs of degeneration, the present study characterized longitudinal changes in the syntactic complexity of women with the FMR1 premutation across midlife, and associations with family history of fragile X-associated tremor/ataxia syndrome (FXTAS) and CGG repeat length. Methods Forty-five women with the FMR1 premutation aged 35–64 years at study entry participated in 1–5 longitudinal assessments spaced approximately a year apart (130 observations total). All participants were mothers of children with confirmed fragile X syndrome. Language samples were analyzed for syntactic complexity and participants provided information on family history of FXTAS. CGG repeat length was determined via molecular genetic testing. Results Hierarchical linear models indicated that women who reported a family history of FXTAS exhibited faster age-related decline in syntactic complexity than those without a family history, with that difference emerging as the women reached their mid-50s. CGG repeat length was not a significant predictor of age-related change. Conclusions Results suggest that women with the FMR1 premutation who have a family history of FXTAS may be at increased risk for neurodegenerative disease, as indicated by age-related loss of syntactic complexity. Thus, family history of FXTAS may represent a personalized risk factor for age-related disease. Follow-up study is needed to determine whether syntactic decline is an early indicator of FXTAS specifically, as opposed to being a more general age-related cognitive decline associated with the FMR1 premutation.
... Discourse cohesion signifies the extent of semantic relationships among words or phrases in a text (Halliday & Hasan, 1976) and is an important facilitator of text comprehension (Graesser et al., 2004; McNamara et al., 2014). Cohesion aids in the development of narrative structure, which strengthens the quality of writing. ...
Preprint
Full-text available
Discourse cohesion facilitates text comprehension and helps the reader form a coherent narrative. In this study, we aim to computationally analyze the discourse cohesion in scientific scholarly texts using multilayer network representation and quantify the writing quality of the document. Exploiting the hierarchical structure of scientific scholarly texts, we design section-level and document-level metrics to assess the extent of lexical cohesion in text. We use a publicly available dataset along with a curated set of contrasting examples to validate the proposed metrics by comparing them against select indices computed using existing cohesion analysis tools. We observe that the proposed metrics correlate as expected with the existing cohesion indices. We also present an analytical framework, CHIAA (CHeck It Again, Author), to provide pointers to the author for potential improvements in the manuscript with the help of the section-level and document-level metrics. The proposed CHIAA framework furnishes a clear and precise prescription to the author for improving writing by localizing regions in text with cohesion gaps. We demonstrate the efficacy of CHIAA framework using succinct examples from cohesion-deficient text excerpts in the experimental dataset.
... To ensure the homogeneity of the texts and their relevance to the grade level of the children involved in the experiment, we ran several measurements at the lexical, semantic, and syntactic levels, on the basis of the metrics developed by Coh-Metrix (Graesser et al., 2004; McNamara et al., 2014) and used in the study designed by Porion et al. (2016) with similar French materials. As a first step, using the Manulex-infra database (Lété et al., 2004), both texts were analyzed in terms of word length and frequency (occurrences per million, OPM). ...
Article
In this study, we compared the effects of two media (Interactive Whiteboards and Paper) on the reading comprehension of both expository and narrative texts among 5th-grade primary school children. Two texts were constructed according to the same controlled hierarchical structure. Comprehension was assessed by a multiple-choice questionnaire including three types of questions (surface, semantic, inferential). Results of the comprehension test revealed no difference between the two media. Regardless of medium, we found better performance for the narrative text, as well as an interaction between the Text and Question factors, revealing that children had more difficulty drawing inferences when reading the expository text. These results are in line with previous findings showing that texts with a similar structure and a single-page presentation elicit similar performance on paper and on electronic devices. They also provide interesting perspectives about the use and impact of Interactive Whiteboards during reading activities or lessons in classrooms.
... Cognitive scientists and psycholinguists have long been interested in exploring which characteristics of words affect their processing and learnability (Grainger, 1990; Whaley, 1978). Among several word properties, the psycholinguistic properties of concreteness, meaningfulness, imageability, word familiarity, and age of acquisition have been investigated in the text complexity and readability literature (McNamara et al., 2014). The first of these properties, concreteness, refers to the extent to which content words in a text refer to concrete objects or events as opposed to abstract concepts or ideas (Brysbaert et al., 2014); for example, a word is said to be concrete if one can simply point to the object it signifies (e.g., "chair" or "apple"), and a word is abstract if it can only be described by other words (e.g., "happiness" or "problem"). ...
Article
Full-text available
Although core to the teaching of academic language skills, little research to date has investigated what makes video-recorded lectures difficult for language learners. As part of a larger program to develop automated videotext complexity measures, this study reports on selected dimensions of linguistic complexity to understand how they contribute to overall videotext difficulty. Based on the ratings of English language learners of 320 video lectures, we built regression models to predict subjective estimates of video lecture difficulty. The results of our analysis demonstrate that a 4-component partial least squares regression model explains 52% of the variance in video difficulty and significantly outperformed a baseline model in predicting the difficulty of videos in an out-of-sample testing set. The results of our study point to the use of linguistic complexity features for predicting overall videotext difficulty and raise the possibility of developing automated systems for measuring video difficulty, akin to those already available for estimating the readability of written materials.
... We used the entire answers to extract the features instead of extracting them at a sentence level, as done by [1], capturing the overall essence of their experiences and learning, and assigned a single grade to each answer. We used the LIWC (Linguistic Inquiry and Word Count) [44] and Coh-Metrix [45] tools to extract a large set of features indicative of psychological processes and text cohesion, respectively. While Coh-Metrix is an analytical tool used to measure different aspects of writing cohesion, we used LIWC to extract features indicative of different linguistic categories and biological and psychological processes. ...
... This software supplements earlier language analysis tools in that it provides analysis for different languages (English, Spanish and Basque). This overcomes the language limitations of Coh-Metrix (McNamara et al. 2014), one of the most popular language tools to date. Authors (Lorenzo et al. 2019) have already employed Coh-Metrix to analyse an L2 English longitudinal corpus produced by the same learners. ...
... One useful NLP metric for examining integration processes during reading is cohesion. Cohesion refers to the explicit cues in text that establish connections among text content (Gernsbacher, 1990;McNamara et al., 2014). For example, the repetition of words in a text, overlapping ideas within a text, and the use of connectives (e.g., "and," "because") are all markers of cohesion and indicate the presence of interconnected ideas. ...
... The listening tests comprised the Lectures (Section 4) of two forms of the IELTS listening tests, hereafter called IELTS-1 and IELTS-2. The two lectures shared similar linguistic features computed using Coh-Metrix (McNamara et al., 2014). The participants were required to listen to the audio texts and complete each test item. ...
Article
Full-text available
This study aims to investigate whether and how test takers' academic listening test performance is predicted by their metacognitive and neurocognitive processes under different test method conditions. Eighty test takers completed two tests consisting of while-listening performance (WLP) and post-listening performance (PLP) test methods. Their metacognitive awareness was measured by the Metacognitive Awareness Listening Questionnaire (MALQ), and gaze behavior and brain activation were measured by an eye-tracker and functional near-infrared spectroscopy (fNIRS), respectively. The results of automatic linear modeling indicated that WLP and PLP test performances were predicted by different factors. The predictors of WLP test performance included two metacognitive awareness measures (i.e., person knowledge and mental translation) and fixation duration. In contrast, the predictors of PLP performance comprised two metacognitive awareness measures (i.e., mental translation and directed attention), visit counts, and importantly, three brain activity measures: the dmPFC measure in the answering phase, the IFG measure in the listening phase, and the IFG measure in the answering phase. Implications of these findings for language assessment are discussed.
... Previous studies on text readability argued that traditional readability formulas were outdated (Hartley, 2016) and suffered from the issue of construct validity (Davison & Kantor, 1982). To address these issues, new measures that are based on psycholinguistic and cognitive models (McNamara et al., 2014), or natural language processing (NLP) methods (Martinc et al., 2021) have been proposed and the studies yielded promising results (Cha et al., 2017;Smeuninx et al., 2020;Zheng & Yu, 2018). However, these studies did not, unfortunately, include variables of adjectives and adverbs. ...
Article
Writing in clear and simple language is critical for scientific communication. Previous studies argued that the use of adjectives and adverbs cluttered writing and made scientific text less readable. The present study aims to investigate whether articles in the life sciences have become more cluttered and less readable over the past 50 years in terms of the use of adjectives and adverbs. The data used in the study were a large dataset of 775,456 scientific texts published between 1969 and 2019 in 123 scientific journals. Results showed that an increasing number of adjectives and adverbs were used and that the readability of scientific texts has decreased over the examined years. More importantly, the use of emotion adjectives and adverbs also demonstrated an upward trend, while that of non-emotion adjectives and adverbs did not increase. To our knowledge, this is probably the first large-scale diachronic study on the use of adjectives and adverbs in scientific writing. Possible explanations for these findings are discussed.
... As part of the larger construct of linguistic complexity, syntactic complexity has been extensively examined in L2 acquisition and L2 writing research in relation to L2 proficiency (e.g., Lu, 2011;Norris & Ortega, 2009), L2 development (e.g., Bulté & Housen, 2014;Yoon & Polio, 2017), and L2 writing quality (e.g., Biber et al., 2016;Kyle & Crossley, 2018;Yang et al., 2015). A multitude of syntactic complexity measures have been proposed and used in such research (e.g., Biber et al., 2011;Kyle, 2016;Lu, 2010;McNamara et al., 2014;Wolf-Quintero et al., 1998). The consensus among L2 writing researchers now is that syntactic complexity needs to be conceptualized as a multidimensional construct and operationalized using measures that encompass diverse structural levels and types (e.g., Kyle & Crossley, 2018;Norris & Ortega, 2009). ...
Article
This study proposed a set of measures for assessing noun phrase (NP) complexity in second language (L2) Chinese writing and compared the predictive power of these measures for L2 Chinese writing quality to that of a set of syntactic complexity measures based on the topic-comment unit (TC-unit). Our data consisted of 101 narratives written by beginning-intermediate, intermediate-advanced, and advanced Korean Chinese-as-a-second-language (CSL) learners and rated by 2 trained CSL teachers. Results showed that the NP complexity measures explained a substantially larger proportion of variance in holistic writing scores than the TC-unit-based measures. Our findings confirmed the validity of the NP complexity measures we proposed and the need to attend to phrasal complexity in assessing L2 Chinese writing quality.
Article
Full-text available
Research substantiates that inferencing is a critical component to making sense of texts. The ability to make logical inferences is a key characteristic of proficient comprehenders that can be developed before children become fluent readers. This article argues for teaching inferencing via teacher or parent read-alouds to help young readers develop comprehension starting in the earliest grades. Highlighting the importance of inferencing as thinking through text, the authors explain the what, why, and how of inferencing, then promote its value as a read-aloud interaction with numerous pedagogical recommendations to support comprehension development and its assessment.
Article
Full-text available
Written expression curriculum-based measurement (WE-CBM) is a formative assessment approach for screening and progress monitoring. To extend evaluation of WE-CBM, we compared hand-calculated and automated scoring approaches in relation to the number of screening samples needed per student for valid scores, the long-term predictive validity and diagnostic accuracy of scores, and predictive and diagnostic bias for underrepresented student groups. Second- to fifth-grade students (n = 609) completed five WE-CBM tasks during one academic year and a standardised writing test in fourth and seventh grade. Averaging WE-CBM scores across multiple samples improved validity. Complex hand-calculated metrics and automated tools outperformed simpler metrics for the long-term prediction of writing performance. No evidence of bias was observed between African American and Hispanic students. The study illustrates the absence of test bias as a necessary condition for fair and equitable screening procedures and highlights the importance of future research including comparisons with majority groups.
Article
Full-text available
This paper presents and makes publicly available the NILC-Metrix, a computational system comprising 200 metrics proposed in studies on discourse, psycholinguistics, cognitive and computational linguistics, to assess textual complexity in Brazilian Portuguese (BP). These metrics are relevant for descriptive analysis and the creation of computational models and can be used to extract information from various linguistic levels of written and spoken language. The metrics in NILC-Metrix were developed during the last 13 years, starting in 2008 with Coh-Metrix-Port, a tool developed within the scope of the PorSimples project. Coh-Metrix-Port adapted some metrics to BP from the Coh-Metrix tool that computes metrics related to cohesion and coherence of texts in English. After the end of PorSimples in 2010, new metrics were added to the initial 48 metrics of Coh-Metrix-Port. Given the large number of metrics, we present them following an organisation similar to the metrics of Coh-Metrix v3.0 to facilitate comparisons made with metrics in Portuguese and English. In this paper, we illustrate the potential of NILC-Metrix by presenting three applications: (i) a descriptive analysis of the differences between children's film subtitles and texts written for Elementary School I and II (Final Years); (ii) a new predictor of textual complexity for the corpus of original and simplified texts of the PorSimples project; (iii) a complexity prediction model for school grades, using transcripts of children's story narratives told by teenagers. For each application, we evaluate which groups of metrics are more discriminative, showing their contribution for each task.
Article
In an increasingly diverse society, young children are likely to speak different first languages that are not the majority language of society. Preschool might be one of the first and few environments where they experience the majority language. The present study investigated how preschool teachers communicate with monolingual English preschoolers and preschoolers learning English as an additional language (EAL). We recorded and transcribed four hours of naturalistic preschool classroom activities and observed whether and how preschool teachers tailored their speech to children of different language proficiency levels and linguistic backgrounds (monolingual English: n = 13; EAL: n = 10), using a suite of tools for analysing quantity and quality of speech. We found that teachers used more diverse vocabulary and more complex syntax with the monolingual children and children who were more proficient in English, showing sensitivity to individual children’s language capabilities and adapting their language use accordingly.
Article
Linguistic abnormalities can emerge early in the course of psychotic illness. Computational tools that quantify response similarity in standardized tasks such as the verbal fluency test could efficiently characterize the nature and functional correlates of these deficits. Participants with early-stage psychosis (n=20) and demographically matched controls without a psychiatric diagnosis (n=20) performed category and letter verbal fluency. Semantic similarity was measured via predicted context co-occurrence in a large text corpus using Word2Vec. Phonetic similarity was measured via edit distance using the VFClust tool. Responses were designated as clusters (related items) or switches (transitions to less related items) using similarity-based thresholds. Results revealed that participants with early-stage psychosis compared to controls had lower fluency scores, lower cluster-related semantic similarity, and fewer switches; mean cluster size and phonetic similarity did not differ by group. Lower fluency semantic similarity was correlated with greater speech disorganization (Communication Disturbances Index), although more strongly in controls, and correlated with poorer social functioning (Global Functioning: Social), primarily in the psychosis group. Findings suggest that search for semantically related words may be impaired soon after psychosis onset. Future work is warranted to investigate the impact of language disturbances on social functioning over the course of psychotic illness.
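A minimal sketch of this kind of similarity-based cluster/switch coding is shown below; the pretrained Word2Vec model, the similarity threshold, and the coding rule are illustrative assumptions rather than the parameters used in the cited study.

```python
import gensim.downloader as api

# Pretrained Word2Vec vectors; the model choice and threshold are assumptions.
wv = api.load("word2vec-google-news-300")
THRESHOLD = 0.30

def code_transitions(responses):
    # Label each transition between consecutive fluency responses as a within-cluster
    # continuation (similarity at or above threshold) or a switch.
    labels = []
    for prev, curr in zip(responses, responses[1:]):
        related = prev in wv and curr in wv and wv.similarity(prev, curr) >= THRESHOLD
        labels.append("cluster" if related else "switch")
    return labels

print(code_transitions(["dog", "cat", "hamster", "eagle", "sparrow"]))
```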
Article
Full-text available
As large-scale, sophisticated open and distance learning environments expand in higher education globally, so does the need to support learning at scale in real time. Valid, reliable rubrics of critical discourse are an essential foundation for developing artificial intelligence tools that automatically analyse learning in educator-student dialogue. This article reports on a validation study where discussion transcripts from a target massive open online course (MOOC) were categorised into phases of cognitive presence to cross validate the use of an adapted rubric with a larger dataset and with more coders involved. Our results indicate that the adapted rubric remains stable for categorising the target MOOC discussion transcripts to some extent. However, the proportion of disagreements between the coders increased compared to the previous experimental study with fewer data and coders. The informal writing styles in MOOC discussions, which are not as prevalent in for-credit courses, caused ambiguities for the coders. We also found most of the disagreements appeared at adjacent phases of cognitive presence, especially in the middle phases. The results suggest additional phases may exist adjacent to current categories of cognitive presence when the educational context changes from traditional, smaller-scale courses to MOOCs. Other researchers can use these findings to build automatic analysis applications to support online teaching and learning for broader educational contexts in open and distance learning. We propose refinements to methods of cognitive presence and suggest adaptations to certain elements of the Community of Inquiry (CoI) framework when it is used in the context of MOOCs.
Article
Full-text available
Although researchers have investigated technical adequacy and usability of written-expression curriculum-based measures (WE-CBM), the economic implications of different scoring approaches have largely been ignored. The absence of such knowledge can undermine the effective allocation of resources and lead to the adoption of suboptimal measures for the identification of students at risk for poor writing outcomes. Therefore, we used the Ingredients Method to compare implementation costs and cost-effectiveness of hand-calculated and automated scoring approaches. Data analyses were conducted on secondary data from a study that evaluated predictive validity and diagnostic accuracy of quantitative approaches for scoring WE-CBM samples. Findings showed that automated approaches offered more economic solutions than hand-calculated methods; for automated scores, the effects were stronger when the free writeAlizer R package was employed, whereas for hand-calculated scores, simpler WE-CBM metrics were less costly than more complex metrics. Sensitivity analyses confirmed the relative advantage of automated scores when the number of classrooms, students, and assessment occasions per school year increased; again, writeAlizer was less sensitive to the changes in the ingredients than the other approaches. Finally, the visualization of the cost-effectiveness ratio illustrated that writeAlizer offered the optimal balance between implementation costs and diagnostic accuracy, followed by complex hand-calculated metrics and a proprietary automated program. Implications for the use of hand-calculated and automated scores for the universal screening of written expression with elementary students are discussed.
Article
Task design features have different effects on second language (L2) production and can be adopted for different pedagogical purposes. However, the synergistic effects of task features have remained largely unexplored in the extant task-based literature. The present study investigated the synergistic effects of two task design features, namely, prior knowledge and reasoning demands, on the writing performance of Chinese learners of English as a foreign language (EFL). Fifty EFL learners were invited to complete two writing tasks, with varying reasoning demands, under one of two conditions: with or without prior knowledge available. Their written texts were analysed in terms of complexity, accuracy, fluency, and communicative adequacy, to reflect the multi-componential nature of task performance. The results revealed that increasing reasoning demands reduced syntactic complexity. Furthermore, the availability of prior knowledge resulted in greater lexical sophistication. The findings also showed that the interaction of prior knowledge and reasoning demands led to substantial effects on lexical diversity, lexical sophistication, and communicative adequacy. These findings are interpreted in light of Skehan's Limited Capacity Model and Robinson's Cognition Hypothesis. Suggestions are provided for the direction of further research on the influence of task complexity on EFL writing performance.
Article
Comprehensibility (readability) is understood as the ease with which a certain reader can conduct the processes needed to comprehend a certain text in a certain situation. Comprehensibility is a special form of fluency and has been shown to have a considerable influence on comprehension. Based on fluency theory and the four-phase model of interest development, hypotheses are derived regarding the positive influence of comprehensibility on comprehension, interestingness, and interest. A study with N = 302 university students and 15 texts showed substantial effects of comprehensibility on all dependent variables, regardless of which of three instruments was used to assess comprehensibility: one of two comprehensibility questionnaires or the LIX readability formula. The results highlight the importance of fluency for the design of learning materials.
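For readers unfamiliar with LIX, a common formulation of the formula is the average number of words per sentence plus the percentage of long words (more than six letters). The sketch below assumes a naive regex tokenizer and an invented sample sentence.

```python
# LIX readability (a common formulation): average sentence length plus the
# percentage of words longer than six letters.
import re

def lix(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

print(round(lix("Comprehensibility influences comprehension. Short sentences help readers."), 1))
```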
Article
Full-text available
Discourse connection is a challenging aspect of writing in a second language. This study investigates the effects of two types of classroom instruction on discourse connection in the writing of EFL college students, focusing on their argumentative writing. Three classes were exposed to different pre-task conditions: receiving reading materials that provide content support for the writing, receiving planning instructions on effective outlining, and receiving no resources. The results showed that the instructions helped students attain better overall coherence in writing. However, noticeable differences between the two experimental groups emerged in terms of cohesion features. The reading group was found to employ more lexical cohesion devices in writing than the outline group, which indicated a heightened genre awareness. This inquiry also identified the reading group's alignment with the content-support materials, particularly a change in stance, as a factor contributing to the higher level of lexical cohesion in their writing.
Article
This study examined the moderating role of two individual difference factors, metacognitive awareness of listening and motivation, in young second language (L2) learners’ incidental vocabulary acquisition from listening to stories. Participants were 66 fifth-grade English as a Foreign Language learners in South Korea who were randomly assigned to one of two groups: listening to stories or control. A vocabulary meaning recognition test was administered as a pretest, posttest, and delayed posttest. Self-reported questionnaires were employed to assess participants’ metacognitive awareness and motivation. Metacognitive awareness of listening, or more specifically, mental translation strategies, were shown to moderate the effects of treatment such that L2 learners who indicated greater awareness of translation strategies learned more vocabulary from listening to stories than L2 learners who had less awareness of these strategies. Motivation also moderated the effects of treatment such that L2 learners who had higher intrinsic motivation to learn English were able to acquire more vocabulary through listening to stories than learners who were less motivated.
Article
Full-text available
The use of automatic computer-based text analysis in the study of elders' discourse is an essential application of artificial intelligence in the field of Gerontolinguistics. Coh-Metrix and LIWC are the two most commonly used automatic text analysis tools and have been widely used in Gerontolinguistic research abroad. Coh-Metrix evaluates the coherence and cohesion of elders' discourse from the perspective of discourse structure features. LIWC mainly measures elders' vocabulary to investigate their mode of thinking, inner state, and personality characteristics. Both tools demonstrate the feasibility of automated text analysis for the early diagnosis of dementia, with important clinical implications. Future research can focus on automatic transcription and segmentation and on long-term follow-up studies of dementia patients to improve diagnostic accuracy and the early detection of cognitive changes in clinical trials. This paper also suggests speeding up the construction of a Chinese corpus of tagged elders' discourse and developing automatic text analysis tools for the Chinese language.
Article
The aim of the present study is twofold: (1) to assess the degree of register flexibility in advanced second language (L2) learners of English and (2) to determine whether and to what extent this flexibility is impacted by inter-individual variability in experiential factors and personality traits. Register flexibility is quantitatively measured as the degree of differentiation in the use of linguistic complexity – gauged by a range of lexical, syntactic, and information-theoretic complexity measures – across three writing tasks. At the methodological level, we aim to demonstrate how a corpus-based approach combined with natural language processing (NLP) techniques and a within-subjects design can be a valuable complement to experimental approaches to language adaptation.
Article
The reader's ability to connect new information to existing knowledge is crucial when reading a text. Nonetheless, text complexity is, in many ways, more linguistic than cognitive: it encompasses a text's degree of sophistication and how challenging a reading passage is. Depending on the passage, such difficulty may appear at the vocabulary level, in the organizational structure, or in coherence and cohesion. In the globalized world of scientific communication, research articles published in non-Anglophone academic journals require English abstracts to gain access to international databases and citation possibilities. This paper describes syntactic complexity in journal research article abstracts. A corpus of abstracts written in English and published in Anglophone and non-Anglophone contexts was sampled. The English sub-corpora underwent software-based text analysis using fourteen syntactic complexity measures with the second language (L2) Syntactic Complexity Analyzer (Lu, 2010). Significant differences between texts in Anglophone and non-Anglophone journals appeared in only four of the fourteen syntactic indices, and the non-native groups showed lower mean values on thirteen of the fourteen measures. The study affords insights for L2 writing research into producing texts that are accurate in content and structure. Ideally, the findings will uncover pedagogical implications and applications for academic writing instruction.
Article
Full-text available
The study presents an overview of discursive complexology, an integral paradigm of linguistics, cognitive studies, and computational linguistics aimed at defining discourse complexity. The article comprises three main parts, which successively outline views on the category of linguistic complexity, the history of discursive complexology, and modern methods of text complexity assessment. Distinguishing the concepts of linguistic complexity, text complexity, and discourse complexity, we recognize the absolute nature of text complexity assessment and the relative nature of discourse complexity, which is determined by the linguistic and cognitive abilities of a recipient. Founded in the 19th century, text complexity theory is still focused on defining and validating complexity predictors and criteria for text perception difficulty. We briefly characterize the five previous stages of discursive complexology: formative, classical, the period of closed tests, constructive-cognitive, and the period of natural language processing. We also present the theoretical foundations of Coh-Metrix, an automatic analyzer based on a five-level cognitive model of perception. Computing not only lexical and syntactic parameters but also text-level parameters, situational models, and rhetorical structures, Coh-Metrix provides a high level of accuracy in discourse complexity assessment. We also show the benefits of natural language processing models and the wide range of application areas of text profilers and digital platforms such as LEXILE and ReaderBench. We view the parametrization and development of a complexity matrix for texts of various genres as the nearest prospect for discursive complexology, one that may enable greater accuracy in inter- and intra-linguistic contrastive studies, as well as the automated selection and modification of texts for various pragmatic purposes.
Chapter
The COVID-19 pandemic has affected teachers' practices at all education levels worldwide. Alternative educational practices have, to a greater or lesser extent, been implemented online in response to the global pandemic, including in early childhood and primary education. Access to recent research outcomes on effective approaches to quality online teaching is fundamental in preschool and in the first critical years of primary school. In this chapter, we provide a review of the key challenges that online classes pose for young children and teachers. In parallel, we discuss, based on professional and research-informed insights, best practice principles for online teaching to support preschool and primary school teachers in transforming online approaches into effective teaching practices that meet children's needs. Both the challenges and the effective online approaches are grouped under two main headings, each related to several outcomes. A major challenge occurring during online classes, falling under the first heading, is the limited face-to-face interaction between learners and teachers. The second challenge concerns difficulties in oral and written language. This chapter concludes with a reflection on the implications for the use of best practice principles for online teaching in the early childhood and primary school setting.
Chapter
Previous research on automated analyses of written texts has focused on detecting lexical, collocational, and grammatical errors in written texts and on identifying linguistic features of written texts that are discriminative of proficiency levels or predictive of writing quality. This chapter starts with a brief description of the history of this line of research and the cumulative knowledge it has generated. It then proposes an agenda for future research on automated analyses of written texts. Three research questions are raised: 1) How can we improve the accuracy of natural language processing tools on texts produced by second language learners? 2) How can we automatically assess whether the linguistic features deployed in written texts are appropriate and effective for the rhetorical functions they are used to realize? 3) How can the capability to automatically identify form-function mappings and assess their appropriateness and effectiveness in written texts be utilized to inform and promote the teaching and learning of second language writing? Two empirical studies are suggested for each research question, each taking a different approach. The chapter concludes with a discussion of the critical importance of a functional turn in research on automated analyses of written texts.
Article
Virtually all researchers understand the need to present their studies in peer-reviewed English-medium journals. Russian scientific writers understand this necessity too; however, evidence suggests that these particular researchers are under-performing relative to similar non-native English speakers. The considerable challenge Russians face centers on articulating their ideas in English in a way that meets the norms and expectations of their international discourse community. That is, their writing is characterized as wordy, cumbersome, overly academic, and syntactically complex. These issues need to be addressed in the early stages of developing writing skills. In this study, we address discourse differences between the scientific writing of Russian engineering students and that of international experts. Using the computational linguistics tools Coh-Metrix and Gramulator, we compare a corpus of students' manuscripts with a similar corpus of experts' published papers. We focused on six conceptual categories: readability, writing quality, cohesion, syntax, word choice, and genre purity. The overall results suggest that student writing differs significantly on multiple characteristics of text and discourse. A discriminant analysis provided a model that successfully predicts group membership and helps identify the most important issues in Russian student writing. Measures such as noun phrase density, genre purity, word age of acquisition, and variance in sentence length were found to be significant positive predictors of Russian student writing, whereas lexical diversity, adversative/contrastive connectives, adverbial phrase density, and word concreteness were significant positive predictors of expert writing. Our analysis allowed us to provide guidance for instructors and materials designers, as well as for technological assessment tools.
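As a hedged illustration of the discriminant-analysis step only: the sketch below fits a linear discriminant model to synthetic per-text feature vectors standing in for Coh-Metrix/Gramulator indices; scikit-learn's LinearDiscriminantAnalysis is an assumed stand-in, not the software the authors used.

```python
# Sketch of discriminant analysis over per-text linguistic features
# (e.g., noun phrase density, lexical diversity); features are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
students = rng.normal(loc=[0.8, 0.4, 0.3], scale=0.1, size=(40, 3))
experts = rng.normal(loc=[0.6, 0.6, 0.5], scale=0.1, size=(40, 3))
X = np.vstack([students, experts])
y = np.array([0] * 40 + [1] * 40)        # 0 = student, 1 = expert

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", lda.score(X, y))
print("feature weights:", lda.coef_)
```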
Article
Full-text available
In this paper, we highlight the importance of distilling the computational assessments of constructed responses to validate the indicators/proxies of constructs/trins, using an empirical illustration in automated summary evaluation. We present the validation of the Inbuilt Rubric (IR) method, which maps rubrics into vector spaces for the assessment of concepts. Specifically, we improved and validated its scores' performance using latent variables, a common approach in psychometrics. We also validated a new hierarchical vector space, namely a bifactor IR. A total of 205 Spanish undergraduate students produced 615 summaries of three different texts, which were evaluated by human raters and by different versions of the IR method using latent semantic analysis (LSA). The computational scores were validated using multiple linear regressions and different latent variable models such as CFAs or SEMs. Convergent and discriminant validity was found for the IR scores using human rater scores as validity criteria. While this study was conducted in the Spanish language, the proposed scheme is language-independent and applicable to any language. We highlight four main conclusions: (1) Accurate performance can be observed in topic-detection tasks without the hundreds or thousands of pre-scored samples required in supervised models. (2) Convergent/discriminant validity can be improved using measurement models for computational scores, as they adjust for measurement errors. (3) Nouns embedded in fragments of instructional text can be an affordable alternative for applying the IR method. (4) Hierarchical models, like the bifactor IR, can increase the validity of computational assessments evaluating general and specific knowledge in vector space models. R code is provided to apply the classic and bifactor IR method.
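The sketch below illustrates the general rubric-in-vector-space idea with a generic LSA pipeline (TF-IDF plus truncated SVD); it is not the authors' Inbuilt Rubric implementation, and the reference documents, rubric concepts, and summaries are invented.

```python
# Generic LSA sketch: project rubric concepts and student summaries into a
# reduced semantic space and score summaries by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference_docs = ["motivation needs autonomy competence relatedness",
                  "intrinsic motivation grows from interest and enjoyment",
                  "extrinsic rewards can undermine intrinsic motivation",
                  "autonomy support improves engagement in classrooms"]
rubric_concepts = ["autonomy competence relatedness",
                   "effects of extrinsic rewards"]
summaries = ["students need autonomy and competence to stay motivated",
             "rewards sometimes reduce interest in the task"]

vectorizer = TfidfVectorizer().fit(reference_docs)
svd = TruncatedSVD(n_components=3, random_state=0).fit(
    vectorizer.transform(reference_docs))

def project(texts):
    return svd.transform(vectorizer.transform(texts))

scores = cosine_similarity(project(summaries), project(rubric_concepts))
print(scores)   # rows = summaries, columns = rubric concepts
```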
Article
Full-text available
This article discusses a distinction present in many theories of relation categorization: the Source of Coherence, which distinguishes between semantic and pragmatic relations. Existing categorizations of both relations and connectives show a reasonable consensus on prototypical examples. Still, there are many ambiguous cases. How can the distinction be clarified? And to what extent does it depend on the context in which relations occur? A more precise text-linguistic definition is presented in the form of a paraphrase test, intended to systematically check analysts' intuitions. A paraphrase experiment shows that language users recognize the difference between clear cases in context. More importantly, the type of context (descriptive, argumentative) appeared not to influence the interpretation of clear cases, whereas subjects' judgements of ambiguous relations are influenced by the type of context. A corpus study further illustrates the link between text type and relation type: informative texts are dominated by semantic relations, whereas persuasive and expressive texts are dominated by pragmatic relations.
Conference Paper
Full-text available
This study investigates the importance of human evaluations of coherence in predicting human judgments of holistic essay quality. Of secondary interest is the potential for computational indices of cohesion and coherence to model human judgments of coherence. The results indicate that human judgments of coherence are the most predictive features of holistic essay scores and that computational indices related to text structure, semantic coherence, lexical sophistication, and grammatical complexity best explain human judgments of text coherence. These findings have important implications for understanding the role of coherence in writing quality.
Article
Full-text available
The purpose of this research was to determine if individual differences in working‐memory capacity are related to the ways readers use inferences to facilitate text comprehension. Two groups of subjects, who differed in working‐memory span, read difficult narrative passages a few sentences at a time. The subjects furnished “thinking out loud” protocols of their emerging interpretations. Idea units from the subjects’ protocols were categorized with particular attention to those idea units that expressed a general or specific elaborative inference. Several differences between the two groups of subjects emerged. Low‐memory‐span subjects produced significantly more specific elaborations than the high‐span readers. In addition, most of the specific elaborations that were produced by the high‐span readers were toward the end of a passage. Low‐span readers had a more even distribution of specific elaborations throughout their protocols. Thus, readers with adequate working‐memory capacity can keep their interpretations more open‐ended and await more information from the text. Readers with low working‐memory capacity appear to face a tradeoff between maintaining an overall passage representation (global coherence) and maintaining sentence‐to‐sentence connections (local coherence). An analysis of the number of inferences that represented new thematic interpretations suggested that some low‐span readers in our sample emphasized global coherence and others emphasized local coherence.
Article
Full-text available
This paper introduces a new algorithm for calculating semantic similarity within and between texts. We refer to this algorithm as NLS, for Non-Latent Similarity. This algorithm makes use of a second-order similarity matrix (SOM) based on the cosine of the vectors from a first-order (non-latent) matrix. This first-order matrix (FOM) could be generated in any number of ways; here we used a method modified from Lin (1998). Our question regarded the ability of NLS to predict word associations. We compared NLS to both Latent Semantic Analysis (LSA) and the FOM. Across two sets of norms, we found that LSA, NLS, and FOM were equally predictive of associates to modifiers and verbs. However, the NLS and FOM algorithms better predicted associates to nouns than did LSA.
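A minimal sketch of the second-order similarity computation described above, assuming a made-up first-order co-occurrence matrix; the actual FOM construction (here, the Lin-style method) is not reproduced.

```python
# Derive a second-order similarity matrix whose entries are cosines between
# the rows of a first-order word-by-feature matrix.
import numpy as np

def second_order_matrix(fom: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(fom, axis=1, keepdims=True)
    unit = fom / np.clip(norms, 1e-12, None)
    return unit @ unit.T          # SOM[i, j] = cosine(row i, row j)

# Toy first-order matrix: 4 words x 5 co-occurrence features (invented).
fom = np.array([[3, 0, 1, 0, 2],
                [2, 1, 0, 0, 3],
                [0, 4, 0, 2, 0],
                [0, 3, 1, 2, 0]], dtype=float)
print(np.round(second_order_matrix(fom), 2))
```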
Article
Full-text available
Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the taxonomy of Prince (1981) as the gold standard. The central computational technique used is span (Hu et al., 2003), a derivative of latent semantic analysis (LSA). We analyzed noun phrases from two expository and two narrative texts. Predictors of newness included span as well as pronoun status, determiners, and word overlap with previous noun phrases. Logistic regression showed that span was superior to LSA in categorizing noun-phrases, producing an increase in accuracy from 74% to 80%.
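A schematic of the classification step only, assuming synthetic feature values; the real study derived span scores from an LSA-based measure and used Prince's (1981) taxonomy as the gold standard, neither of which is reproduced here.

```python
# Sketch: predict given (1) vs. new (0) noun phrases from simple features
# (span-style score, pronoun status, definite determiner, lexical overlap).
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: span_score, is_pronoun, has_definite_det, overlap_with_prior_text
X = np.array([[0.9, 1, 0, 1],
              [0.8, 0, 1, 1],
              [0.2, 0, 0, 0],
              [0.1, 0, 0, 0],
              [0.7, 0, 1, 1],
              [0.3, 0, 0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0, 1, 1], [0.15, 0, 0, 0]]))
```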
Article
Full-text available
Just as a sentence is far more than a mere concatenation of words, a text is far more than a mere concatenation of sentences. Texts contain pertinent information that co-refers across sentences and paragraphs [30]; texts contain relations between phrases, clauses, and sentences that are often causally linked [21], [51], [56]; and texts that depend on relating a series of chronological events contain temporal features that help the reader to build a coherent representation of the text [19], [55]. We refer to textual features such as these as cohesive elements, and they occur within paragraphs (locally), across paragraphs (globally), and in forms such as referential, causal, temporal, and structural [18], [22], [36]. But cohesive elements, and by consequence cohesion, do not simply feature in a text as dialogues tend to feature in narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not present or absent in a binary or optional sense. Instead, cohesion in text exists on a continuum of presence, which is sometimes indicative of the text-type in question [12], [37], [41] and sometimes indicative of the audience for which the text was written [44], [47]. In this chapter, we discuss the nature and importance of cohesion; we demonstrate a computational tool that measures cohesion; and, most importantly, we demonstrate a novel approach to identifying text-types by incorporating contrasting rates of cohesion.
Article
Full-text available
Self-explanation refers to explaining text to oneself while reading. We examined the quality of middle-school students' self-explanations of a science text which were collected while they were engaged with iSTART, an interactive computer program that teaches reading strategies. Our analysis included an examination of how the quality of paraphrases (i.e., restating the sentence) and elaborations (i.e., drawing on prior knowledge) were mediated by individual difference measures (reading comprehension skill and prior science knowledge) and sentence difficulty (based on information density). Reading comprehension skill was an important determinant in the production of paraphrases and elaborations. In addition, reading skill affected the quality of self-explanations produced; that is, skilled readers produced better quality elaborations (e.g., elaborations that helped build a global understanding of the text). Prior knowledge was also important, with high-knowledge students providing more 'distant' paraphrases. Finally, the production of elaborations and paraphrases was influenced by sentence difficulty. Fewer accurate elaborations were produced for the more difficult sentences. Implications for individual differences and sentence difficulty in the production of self-explanations are discussed.
Article
Full-text available
This study examined how the contribution of self-explanation to science text comprehension is affected by the cohesion of a text at a local level. Psychology undergraduates read and self-explained a science text with either low or high local cohesion. Local cohesion was manipulated by the presence or absence of connectives and referential words or phrases that explicitly link successive sentences. After the self-explanation activity, participants answered open-ended comprehension questions about the text. Participants in the high local cohesion condition produced higher quality explanations, including more local bridging self-explanations, than those in the low local cohesion condition. However, these explanations, although higher in quality, did not improve comprehension. Performance on text-based comprehension questions was better in the low local cohesion condition. In addition, the correlation between self-explanation quality and comprehension performance was generally higher in the low local cohesion condition compared to the high local cohesion condition, even after factoring out participants' level of topic-relevant knowledge. These data suggest that the contribution of self-explanation to comprehension is larger when the text lacks certain cues that facilitate making connections between successive ideas in a text. Further, the results imply that a key contribution of self-explanation to text comprehension is to induce active inference processes whereby readers fill in conceptual gaps in challenging texts.
Article
Full-text available
Connectives are cohesive devices that signal the relations between clauses and are critical to the construction of a coherent representation of a text's meaning. The authors investigated young readers' knowledge, processing, and comprehension of temporal, causal, and adversative connectives using offline and online tasks. In a cloze task, 10-year-olds were more accurate than 8-year-olds on temporal and adversative connectives, but both age groups differed from adult levels of performance (Experiment 1). When required to rate the “sense” of 2-clause sentences linked by connectives, 10-year-olds and adults were better at discriminating between clauses linked by appropriate and inappropriate connectives than were 8-year-olds. The 10-year-olds differed from adults only on the temporal connectives (Experiment 2). In contrast, online reading time measures indicated that 8-year-olds' processing of text is influenced by connectives as they read, in much the same way as 10-year-olds'. Both age groups read text more quickly when target 2-clause sentences were linked by an appropriate connective compared with texts in which a connective was neutral (and), inappropriate to the meaning conveyed by the 2 clauses, or not present (Experiments 3 and 4). These findings indicate that although knowledge and comprehension of connectives is still developing in young readers, connectives aid text processing in typically developing readers.
Conference Paper
Full-text available
This study investigates the roles of cohesion and coherence in evaluations of essay quality. Cohesion generally has a facilitative effect on text comprehension and is assumed to be related to essay coherence. By contrast, recent studies of essay writing have demonstrated that computational indices of cohesion are not predictive of evaluations of writing quality. This study investigates expert ratings of individual text features, including coherence, in order to examine their relation to evaluations of holistic essay quality. The results suggest that coherence is an important attribute of overall essay quality, but that expert raters evaluate coherence based on the absence of cohesive cues in the essays rather than their presence. This finding has important implications for text understanding and the role of coherence in writing quality.
Article
Full-text available
This study investigated second language (L2) lexical development in the spontaneous speech of six adult, L2 English learners in a 1-year longitudinal study. One important aspect of lexical development is lexical organization and depth of knowledge. Hypernymic relations, the hierarchical relationships among related words that vary in relation to their semantic specificity (e.g., Golden Retriever vs. dog vs. animal), are an important indicator of both lexical organization and depth of knowledge. Thus, this study used hypernymy values from the WordNet database and a lexical diversity measure to analyze lexical development. Statistical analyses in this study indicated that both hypernymic relations and lexical diversity in L2 learners increase over time. Additionally, lexical diversity and hypernymic values correlated significantly, suggesting that as learners' lexicons grow, learners have access to a wider range of hypernymy levels. These findings are discussed in relation to developing abstractness in language, extending hypernymic knowledge, and the growth of lexical networks.
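A small sketch of how hypernym depth can be read off WordNet with NLTK (deeper synsets sit lower in the hierarchy and are more specific, e.g. "retriever" versus "animal"). The word list is illustrative, taking only the first synset per word is a simplification, and the NLTK WordNet data must be downloaded first (nltk.download("wordnet")).

```python
# Mean hypernym depth as a rough proxy for lexical specificity.
from nltk.corpus import wordnet as wn

def mean_hypernym_depth(words):
    depths = []
    for w in words:
        synsets = wn.synsets(w, pos=wn.NOUN)
        if synsets:
            depths.append(synsets[0].min_depth())   # depth of the first sense
    return sum(depths) / len(depths) if depths else 0.0

print(mean_hypernym_depth(["animal", "dog", "retriever"]))
```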
Article
Full-text available
Comprehension emerges as the result of inference and strategic processes that support the construction of a coherent mental model for a text. However, the vast majority of comprehension skills tests adopt a format that does not afford an assessment of these processes as they operate during reading. This study assessed the viability of the Reading Strategy Assessment Tool (RSAT), an automated computer-based reading assessment designed to measure readers' comprehension and spontaneous use of reading strategies while reading texts. In the tool, readers comprehend passages one sentence at a time and are asked either an indirect (“What are your thoughts regarding your understanding of the sentence in the context of the passage?”) or direct (e.g., why X?) question after reading each pre-selected target sentence. The answers to the indirect questions are analyzed for the extent to which they contain words associated with comprehension processes. The answers to direct questions are coded for the number of content words in common with an ideal answer, which is intended to be an assessment of emerging comprehension. In the study, the RSAT approach was shown to predict measures of comprehension comparably to standardized tests. The RSAT variables were also shown to correlate with human ratings. The results of this study constitute a “proof of concept” and demonstrate that it is possible to develop a comprehension skills assessment tool that assesses both comprehension and comprehension strategies.
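A rough sketch of the content-word-overlap idea used for the direct questions; the stopword list, normalization, and example answers below are placeholders, not RSAT's actual scoring rules.

```python
# Score an answer by the proportion of the ideal answer's content words it
# shares, ignoring a small set of function words.
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "it", "that"}

def content_overlap(answer: str, ideal: str) -> float:
    tokenize = lambda t: {w for w in re.findall(r"\w+", t.lower())
                          if w not in STOPWORDS}
    ans, gold = tokenize(answer), tokenize(ideal)
    return len(ans & gold) / len(gold) if gold else 0.0

print(content_overlap("Plants use sunlight to make sugar",
                      "Plants convert sunlight into sugar by photosynthesis"))
```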
Conference Paper
Full-text available
Natural language processing and statistical methods were used to identify linguistic features associated with the quality of student-generated paragraphs. Linguistic features were assessed using Coh-Metrix. The resulting computational models demonstrated small to medium effect sizes for predicting paragraph quality: introduction quality r2 = .25, body quality r2 = .10, and conclusion quality r2 = .11. Although the variance explained was somewhat low, the linguistic features identified were consistent with the rhetorical goals of paragraph types. Avenues for bolstering this approach by considering individual writing styles and techniques are considered.
Conference Paper
Full-text available
The purpose of this study is to evaluate the validity of measuring grammatical diversity with a specifically designed Lexical Diversity Assessment Tool (LDAT). A secondary objective is to use LDAT to determine if the level of difficulty assigned to English as a Second Language (ESL) texts corresponds to increases in grammatical, lexical, and temporal diversity. Other methods of lexical diversity assessment, such as type-token ratio (TTR), have been used with varying accuracy in an effort to determine the complexity or level of texts. We analyzed 120 ESL texts independently assigned by their sources to one of four levels (Beginner, Lower- intermediate, Upper-intermediate, and Advanced). We demonstrated that LDAT significantly reflected the grammatical diversity within these texts. While the findings conflicted with the prediction that grammatical and lexical diversity would increase with assigned level, we concluded that the implementation of LDAT in text design could provide reliable assessments of grammatical diversity.
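As a minimal point of reference for the diversity measures mentioned above, the sketch below computes a raw type-token ratio for two invented leveled texts; LDAT itself is not reproduced, and raw TTR's sensitivity to text length is one reason curve-based measures exist.

```python
# Raw type-token ratio (TTR) per text; texts and level labels are invented.
import re

def ttr(text: str) -> float:
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

levels = {
    "Beginner": "The cat sat. The cat ran. The cat sat again.",
    "Advanced": "Curious readers eventually encounter unfamiliar, morphologically complex vocabulary.",
}
for level, text in levels.items():
    print(level, round(ttr(text), 2))
```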
Article
Full-text available
Four experiments were conducted to assess two models of topic sentencehood identification: the derived model and the free model. According to the derived model, topic sentences are identified in the context of the paragraph and in terms of how well each sentence in the paragraph captures the paragraph's theme. In contrast, according to the free model, topic sentences can be identified on the basis of sentential features without reference to other sentences in the paragraph (i.e., without context). The results of the experiments suggest that human raters can identify topic sentences both with and without the context of the other sentences in the paragraph. Another goal of this study was to develop computational measures that approximated each of these models. When computational versions were assessed, the results for the free model were promising; however, the derived model results were poor. These results collectively imply that humans' identification of topic sentences in context may rely more heavily on sentential features than on the relationships between sentences in a paragraph.
Article
Full-text available
We tested a computer-based procedure for assessing reader strategies that was based on verbal protocols that utilized latent semantic analysis (LSA). Students were given self-explanation-reading training (SERT), which teaches strategies that facilitate self-explanation during reading, such as elaboration based on world knowledge and bridging between text sentences. During a computerized version of SERT practice, students read texts and typed self-explanations into a computer after each sentence. The use of SERT strategies during this practice was assessed by determining the extent to which students used the information in the current sentence versus the prior text or world knowledge in their self-explanations. This assessment was made on the basis of human judgments and LSA. Both human judgments and LSA were remarkably similar and indicated that students who were not complying with SERT tended to paraphrase the text sentences, whereas students who were compliant with SERT tended to explain the sentences in terms of what they knew about the world and of information provided in the prior text context. The similarity between human judgments and LSA indicates that LSA will be useful in accounting for reading strategies in a Web-based version of SERT.
Book
This collaboration provides extensive insight into the impact of communication technology on survey research. As previously unimaginable communication technologies rapidly become commonplace, survey researchers are presented with both opportunities and obstacles when collecting and interpreting data based on human response. Envisioning the Survey Interview of the Future explores the increasing influence of emerging technologies on the data collection process and, in particular, self-report data collection in interviews, providing the key principles for using these new modes of communication. With contributions written by leading researchers in the fields of survey methodology and communication technology, this compilation integrates the use of modern technological developments with established social science theory. The book familiarizes readers with these new modes of communication by discussing the challenges to accuracy, legitimacy, and confidentiality that researchers must anticipate while collecting data, and it also provides tools for adopting new technologies in order to obtain high-quality results with minimal error or bias. Envisioning the Survey Interview of the Future addresses questions that researchers in survey methodology and communication technology must consider, such as: How and when should new communication technology be adopted in the interview process? What are the principles that extend beyond particular technologies? Why do respondents answer questions from a computer differently than questions from a human interviewer? How can systems adapt to respondents' thinking and feeling? What new ethical concerns about privacy and confidentiality are raised from using new communication technologies? With its multidisciplinary approach, extensive discussion of existing and future technologies, and practical guidelines for adopting new technology, Envisioning the Survey Interview of the Future is an essential resource for survey methodologists, questionnaire designers, and communication technologists in any field that conducts survey research. It also serves as an excellent supplement for courses in research methods at the upper-undergraduate or graduate level.
Article
Cognitive load theory investigates instructional consequences of processing limitations of the human cognitive system. Because of these limitations, text processing may result in an excessive cognitive load that would influence comprehension and learning from texts, as well as change learner affective states. This chapter reviews basic assumptions of cognitive load theory, their consequences for optimizing the design of information presentations, and implications for processing written and spoken texts.
Article
Cloze tests are frequently used to measure reading comprehension. Although cloze is generally accepted as a global measure of reading comprehension, and cloze test results are reportedly well correlated with those of traditional comprehension tests, the question of which specific components of reading comprehension are measured by cloze tests has not been adequately explored. This study assessed the sensitivity of cloze passages as measures of the ability to use information across sentence boundaries. Three experiments were carried out. Standard cloze passages were administered to subjects; these same passages, with scrambled sentence sequences, were administered to other subjects; still other passages were used which were constructed by embedding single sentences from the original passages in other, non-supportive text. No performance differences due to sentence order or to the presence of supportive text were found, even with a timed cloze test, suggesting an important limitation on cloze as a measure of this aspect of reading comprehension. Implications of the findings are discussed.
Article
The authors investigated effects of text coherence and active engagement on students' comprehension of textbook information. A revised version of a textbook passage about a climatological phenomenon represented enhanced textual coherence; a thinking aloud procedure represented active engagement. There were four conditions in each of two studies: original or revised text combined with silent reading or thinking aloud. In Study 1, sixth graders were asked to recall what they had read and answer open-ended questions immediately after reading. Study 2 extended Study 1 to include varying levels of student ability and retention of information a week later. Results suggest a continuum of increased performance from original silent text, to original text with thinking aloud, to revised text read silently, and finally revised text with thinking aloud. The revised text was shown to bring performance of middle-level readers close to that of their upper-level counterparts reading the textbook version. Also, students who read the revised text tended to connect recalled information, whereas students who read the original text tended to list it.
Article
Two stories from basal readers were revised to improve their coherence without altering their plot. Although the revisions increased the difficulty of the passages as indexed by traditional readability formulas, they enhanced comprehension of both skilled and less skilled readers as indexed by recall and answers to forced-choice questions. Implications for assessing text difficulty are discussed.
Article
The purpose of this paper is to criticize the concept of cohesion as a measure of the coherence of a text. The paper begins with a brief overview of Halliday and Hasan's (1976) cohesion concept as an index of textual coherence. Next, the paper criticizes the concept of cohesion as a measure of textual coherence in the light of schema-theoretical views of text processing (e.g. reading) as an interactive process between the text and the reader. This criticism, which is drawn from both theoretical and empirical work in schema theory, attempts to show that text-analytic procedures such as Halliday and Hasan's cohesion concept, which encourage the belief that coherence is located in the text and can be defined as a configuration of textual features, and which fail to take the contributions of the text's reader into account, are incapable of accounting for textual coherence. The paper concludes with a caution to second language (EFL/ESL) teachers and researchers not to expect cohesion theory to be the solution to EFL/ESL reading/writing coherence problems at the level of the text.
Article
Reading times were collected for sentences in passages in order to examine how cognitive resources are distributed among different components of reading. Multiple regression analyses indicated that most of the reading time variance was predicted by macrostructure processing which integrates information from different sentences, as opposed to microstructure processing which includes the processing of words, syntax, and propositions. Experiment 1 revealed that slower readers require more time than faster readers to perform microstructure processing, but no differences were found for macrostructure components of reading. Experiment 2 revealed that variations in reading goals influence macrostructure processing but not microstructure processing. These findings suggest that functionally separate reading skills may be involved in microstructure versus macrostructure processing.
Article
The purpose of the present study was to conduct a comprehensive study of book genres used in preschool classrooms. Text titles gathered from the reading logs of 84 preschool teachers were analyzed and coded for genre (narrative, expository, or mixed). Expository or mixed texts were then further examined according to topics covered. Analyses indicated that (a) narrative texts dominated the genre of text being utilized in preschool classroom read-alouds, representing 82.3% of texts read, (b) among the 125 texts identified as expository or mixed genres, the topic of living creatures was the most common focus, and (c) informational texts on the topics of transportation and geography were read very infrequently. These findings suggest that informational texts are seldom read in preschool classrooms and that when children are exposed to them, the topics addressed do not reflect a wide range of information about the social and natural world.
Following up on recent work by Malvern and Richards (1997, this issue; McKee et al., 2000) concerning the measurement of lexical diversity through curve fitting, the present study compares the accuracy of five formulae in terms of their ability to model the type-token curves of written texts produced by learners and native speakers. The most accurate models are then used to consider unresolved issues that have been at the forefront of past research on lexical diversity: the relationship between lexical diversity and age, second language (L2) instruction, L2 proficiency, first language (L1) background, writing quality and vocabulary knowledge. The participants in the study comprise 140 Finnish-speaking and 70 Swedish-speaking learners of English, and an additional group of 66 native English speakers. The data include written narrative descriptions of a silent film, and the results show that two of the curve-fitting formulae provide accurate models of the type-token curves of over 90% of the texts. The texts for which accurate models were obtained were subjected to further analyses, and the results indicate a clear relationship between lexical diversity and amount of instruction, but a more complicated relationship between lexical diversity and L1 background, writing quality and vocabulary knowledge.
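A hedged sketch of the curve-fitting idea, assuming the Malvern and Richards type-token model TTR(N) = (D/N) * (sqrt(1 + 2N/D) - 1), a placeholder text, and an arbitrary sampling scheme; the five formulae compared in the study are not reproduced here.

```python
# Fit the D parameter of a type-token curve to empirical TTR values
# computed over increasing token counts of a sample text.
import re
import numpy as np
from scipy.optimize import curve_fit

def ttr_curve(n, d):
    return (d / n) * (np.sqrt(1 + 2 * n / d) - 1)

def empirical_ttr(tokens, sizes):
    return [len(set(tokens[:n])) / n for n in sizes]

text = ("the small boy walked to the old bridge and watched the river "
        "while his sister collected smooth stones near the quiet bank")
tokens = re.findall(r"\w+", text.lower())
sizes = np.array(range(5, len(tokens) + 1, 2))
ttrs = empirical_ttr(tokens, sizes)

(d_hat,), _ = curve_fit(ttr_curve, sizes, ttrs, p0=[20.0])
print("estimated D:", round(d_hat, 1))
```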
Article
Educators know that an achievement gap exists between students of low-income and middle-income families, a gap that is especially evident in fourth grade and beyond. This essay explores issues related to this gap, including primary-level children being immersed in narrative text and, therefore, unprepared for the challenges of informational text and content-specific vocabulary; lack of available material children are interested in reading; and limited reading opportunities created by a focus on high-stakes, test-preparation regimens.
Article
This article investigates whether expectations about discourse genre influence the process and products of text comprehension. Participants read texts with either a literary story or a news story as the purported genre. Subsequently, they verified statements pertaining to the texts. Two experiments demonstrated that participants reading under a literary perspective had longer reading times, better memory for surface information, and poorer memory for situational information than those reading under a news perspective. Regression analyses of reading times produced findings that were consistent with the memory data. The results support the notion that readers differentially allocate their processing resources according to their expectations about the genre of a text.
Article
This paper follows up on the work of Crossley, Louwerse, McCarthy & McNamara (2007), who conducted an exploratory study of the linguistic differences of simplified and authentic texts found in beginner level English as a Second Language (ESL) textbooks using the computational tool Coh-Metrix. The purpose of this study is to provide a more comprehensive study of second language (L2) reading texts than that provided by Crossley et al. (2007) by investigating the differences between the linguistic structures of a larger and more selective corpus of intermediate reading texts. This study is important because advocates of both approaches to ESL text construction cite linguistic features, syntax, and discourse structures as essential elements of text readability, but only the Crossley et al. (2007) study has measured the differences between these text types and their implications for L2 learners. This research replicates the methods of the earlier study. The findings of this study provide a more thorough understanding of the linguistic features that construct simplified and authentic texts. This work will enable material developers, publishers, and reading researchers to more accurately judge the values of simplified and authentic L2 texts as well as improve measures for matching readers to text.
Article
This experiment investigated comprehension of four types of anaphor (reference, ellipsis, substitution and lexical) in 7- to 8-year-old good and poor comprehenders, matched in decoding skills but differing in reading comprehension skill. Poor comprehenders performed less well than skilled comprehenders both in identifying antecedents of anaphors in a story, and in answering questions on the text which required anaphor resolution. Both groups performed more poorly as distance between anaphor and antecedent increased, and poor comprehenders were more adversely affected by distance than good comprehenders for ellipsis. Children's errors are used to suggest differences between the groups in processes of resolving anaphors, in terms of scanning text for appropriate antecedents and integrating text with world knowledge.