Article

Rewriting Content with GPT-4 to Support Emerging Readers in Adaptive Mathematics Software


Abstract

Large language models (LLMs) offer an opportunity to make large-scale changes to educational content that would otherwise be too costly to implement. The work here highlights how LLMs (in particular GPT-4) can be prompted to revise educational math content for large-scale deployment in real-world learning environments. We tested the ability of LLMs to improve the readability of math word problems and then looked at how these readability improvements impacted learners, especially those identified as emerging readers. Working with math word problems in the context of an intelligent tutoring system (i.e., MATHia by Carnegie Learning, Inc.), we developed an automated process that can rewrite thousands of problems in a fraction of the time required for manual revision. GPT-4 was able to produce revisions with improved scores on common readability metrics. However, when we examined student learning outcomes, the problems revised by GPT-4 showed mixed results. In general, students were more likely to achieve mastery of the concepts when working with problems revised by GPT-4 than with the original, non-revised problems, but this benefit was not consistent across all content areas. Further complicating this finding, students had higher error rates on GPT-4-revised problems in some content areas and lower error rates in others. These findings highlight the potential of LLMs for making large-scale improvements to math word problems, but also the importance of additional nuanced study to understand how the readability of math word problems affects learning.
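The automated rewrite-then-check process the abstract describes can be sketched in outline. All names below are hypothetical and the GPT-4 call is stubbed out with an identity function, since the paper's actual prompts and pipeline are not given here; the sketch only shows the general shape of a batch rewrite with a readability gate and a fallback to the original problem.

```python
def rewrite_with_llm(problem_text: str) -> str:
    """Placeholder for an LLM call that rewrites a word problem for
    readability while preserving the math. Returns the input unchanged
    so this sketch is runnable without an API."""
    return problem_text

def mean_sentence_length(text: str) -> float:
    """Crude surface proxy for readability: average words per sentence."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return len(text.split()) / max(len(sentences), 1)

def batch_rewrite(problems, max_words_per_sentence=15.0):
    """Rewrite each problem; keep the revision only if the surface
    readability check does not get worse, otherwise fall back."""
    revised = []
    for original in problems:
        candidate = rewrite_with_llm(original)
        if mean_sentence_length(candidate) <= max(mean_sentence_length(original),
                                                  max_words_per_sentence):
            revised.append(candidate)
        else:
            revised.append(original)  # keep the non-revised problem
    return revised

problems = ["A train travels 60 miles in 2 hours. What is its speed?"]
print(batch_rewrite(problems)[0])
```

A production pipeline would replace the stub with a real model call and the word-count proxy with validated readability metrics, but the keep-or-fall-back structure is the key idea for deploying thousands of revisions safely.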


... The third strand focuses on enhancing adaptive digital learning environments with LLMs. Norberg et al. (2024) utilize GPT-4 to revise educational math content with the aim of enhancing readability for deployment in an adaptive learning system. The researchers found that GPT-4 could improve readability metrics, such as word frequency, sentence complexity, and semantic similarity. ...
Preprint
Full-text available
This study presents a reinforcement learning-driven multi-agent AI tutor that advances beyond traditional intelligent tutoring systems (ITS) by integrating adaptive intervention selection, real-time engagement tracking, and multi-agent feedback refinement. Unlike rule-based ITS and static LLM-generated responses, our system dynamically adjusts instructional strategies based on student behavior, incorporating reinforcement learning (RL) for intervention optimization, neural knowledge tracing (NKT) for misconception prediction, and an engagement prediction model (EPM) to sustain student participation. Additionally, a multi-agent debate mechanism refines AI-generated explanations, enhancing clarity and pedagogical alignment. Experimental validation demonstrates that our approach improves intervention adaptability by 28.6%, reduces recurring student errors by 31.2%, and lowers dropout rates by 24.8%, surpassing existing ITS and static AI tutors. These findings indicate that reinforcement learning and multi-agent collaboration enable AI tutors to provide more responsive, personalized, and effective learning support. Beyond technical improvements, this work contributes to scalable, real-world AI tutoring solutions that align with established learning science principles. Future research should explore deployment in diverse educational settings and further refinements in balancing personalization with instructional equity.
Article
Full-text available
Technology plays an important role in mathematics education around the world. Artificial intelligence (AI) is widely used in academic activities, and many studies have examined its use in mathematics education. This paper aims to examine research trends on AI in mathematics education. The Systematic Literature Review (SLR) method was used to review 35 articles from internationally reputable journals indexed in Scopus. The article search used the keywords AI, artificial intelligence, and mathematics education. The findings showed a sharp spike in the publication of AI articles in 2024, especially on chatbots. Chatbots have become the focus of AI research in mathematics education, with most studies conducted at the college level with student subjects. The application of AI in mathematics education is characterized by its use to realize personalized learning. Finally, this paper recommends that future research increase the diversity of work on AI in mathematics education, especially among teachers and students at various levels, given that students today can easily access AI. Guidance on maintaining academic ethics is also needed to avoid misuse of AI in mathematics education.
Article
Full-text available
Recently, the readability of texts has become a focus of reading research because it is believed to have implications for reading comprehension, which is of utmost importance in the field of English as a foreign language (EFL), particularly in the teaching, learning, and assessment of reading comprehension. Unfortunately, the influence of text readability on reading comprehension (and reading time) has not been well studied in the EFL context. Most text readability studies are conducted in medical contexts, and these studies are often limited to predicting readability scores for sample texts. To address this gap, the current study evaluated the influence of text readability levels (based on the Flesch-Kincaid grade level, FKGL) on students' reading comprehension and reading time. Data were collected through a reading test and analyzed using SPSS version 22. The Friedman test revealed that the distributions of students' reading comprehension scores (χ² = 197.532, p < .001) and reading time (χ² = 215.323, p < .001) differed across texts, suggesting that text readability has a significant influence on both. This study contributes to the practice of reading instruction and assessment. Limitations and suggestions for further research are briefly discussed.
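The Flesch-Kincaid grade level used in this study is a simple surface formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch follows; the vowel-group syllable counter is a naive stand-in (real implementations use dictionary-quality syllable counts), so treat the scores as illustrative.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(len(groups), 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    return 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59

print(round(fkgl("The cat sat. The dog ran."), 2))
```

Short sentences of monosyllabic words score near (or below) grade 0, while long sentences of polysyllabic words score far higher, which is exactly the surface behavior the formula encodes.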
Conference Paper
Full-text available
This study introduces the leveled-text generation task, which aims to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting significantly improves performance in readability manipulation and information preservation. LLaMA-2 70B performs better at achieving the desired difficulty range, while GPT-3.5 better maintains the original meaning. However, manual inspection highlights concerns such as the introduction of misinformation and inconsistent edit distribution. These findings emphasize the need for further research to ensure the quality of generated educational content.
Article
Full-text available
Background Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. Objective The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and humanities. Methods Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. Results Ten topics were included, 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, with 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). Significant disparities were found in DOI presence between the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), along with significant differences in accuracy between the two disciplines (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/47, 89.4%). The Levenshtein distance was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. Conclusions ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy.
Article
Full-text available
Achievement in mathematics has been shown to partially depend on verbal skills. In multilingual educational settings, varying language proficiencies might therefore contribute to differences in mathematics achievement. We explored the relationship between mathematics achievement and language competency in terms of home language and instruction language proficiency in the highly multilingual society of Luxembourg. We focussed on third graders' linguistic and mathematical achievement and used data from the national school monitoring program from two consecutive years to assess the influence of children's language profiles on reading comprehension in German (the instruction language) and mathematics performance. Results were similar for both cohorts. Regression analysis indicated that German reading comprehension was a significant predictor of mathematics achievement when accounting for both home language group and socioeconomic status. Moreover, mediation analysis showed that the lower mathematics achievement of students with a home language very different from the instruction language, relative to the Luxembourgish reference group, was significantly mediated by achievement in German reading comprehension. These findings show that differences in mathematics achievement between speakers of a home language that is similar to the instruction language and speakers of distant home languages can be explained by their underachievement in reading comprehension in the instruction language. Possible explanations for varying performance patterns between language groups, as well as potential solutions, are discussed.
Conference Paper
Full-text available
This paper describes a new, open source tool for A/B testing in educational software called UpGrade. We motivate UpGrade's approach, describe development goals and UpGrade's software architecture, and provide a brief overview of working within UpGrade to define and monitor experiments. We conclude with some avenues for future research and development.
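A/B testing platforms like UpGrade need, among other things, deterministic condition assignment so that a learner sees the same experimental variant across sessions. The sketch below shows one standard hash-based approach to that single piece; it is a generic illustration, not UpGrade's actual API, and all names are hypothetical.

```python
import hashlib

def assign_condition(user_id: str, experiment: str,
                     conditions=("control", "treatment")) -> str:
    """Deterministically map (experiment, user) to a condition by
    hashing, so repeat visits get the same arm without storing state.
    Generic sketch; not UpGrade's actual assignment logic."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(conditions)
    return conditions[bucket]

# The same learner always lands in the same arm of a given experiment.
print(assign_condition("learner-42", "word-problem-rewrite"))
```

Hash-based assignment also makes experiments independent of one another: including the experiment name in the hashed key re-randomizes users across experiments rather than reusing one global split.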
Article
Full-text available
Background Advances in natural language processing (NLP) and computational linguistics have facilitated major improvements on traditional readability formulas that aim at predicting the overall difficulty of a text. Recent studies have identified several types of linguistic features that are theoretically motivated and predictive of human judgments of text readability, which outperform predictions made by traditional readability formulas, such as Flesch–Kincaid. The purpose of this study is to develop new readability models using advanced NLP tools to measure both text comprehension and reading speed. Methods This study used crowdsourcing techniques to collect human judgments of text comprehension and reading speed across a diverse variety of topic domains (science, technology and history). Linguistic features taken from state‐of‐the‐art NLP tools were used to develop models explaining human judgments of text comprehension and reading speed. The accuracy of these models was then compared with classic readability formulas. Results The results indicated that models employing linguistic features more theoretically related to text comprehension and reading speed outperform classic readability models. Conclusions This study developed new readability formulas based on advanced NLP tools for both text comprehension and reading speed. These formulas, based on linguistic features that better represent theoretical and behavioural accounts of the reading process, significantly outperformed classic readability formulas.
Article
Full-text available
If we wish to embed assessment for accountability within instruction, we need to better understand the relative contribution of different types of learner data to statistical models that predict scores and discrete achievement levels on assessments used for accountability purposes. The present work scales up and extends predictive models of math test scores and achievement levels from the existing literature and specifies six categories of models that incorporate information about student prior knowledge, socio-demographics, and performance within the MATHia intelligent tutoring system. Linear regression, ordinal logistic regression, and random forest regression and classification models are learned within each category and generalized over a sample of 23,000+ learners in Grades 6, 7, and 8 over three academic years in Miami-Dade County Public Schools. After briefly exploring hierarchical models of this data, we discuss a variety of technical and practical applications, limitations, and open questions related to this work, especially concerning the potential use of instructional platforms like MATHia as a replacement for time-consuming standardized tests.
Article
Full-text available
This study addressed the concept of reading instructional level. First, we reviewed the history of the concept, including some recent criticisms of its validity (Schwanenflugel & Knapp, 2017; Shanahan, 2014). Next, we examined, from an instructional-level perspective, the oral reading performance of 248 third graders. We divided the sample into quartiles (based on third-grade reading rate) and tested for between-quartile differences in the students' reading of second-, third-, and fourth-grade passages. A major finding was that the lowest quartile differed significantly from the other quartiles in oral reading accuracy and rate, showing, at best, a second-grade print-processing level at the end of third grade. Finally, we considered how our findings raise old—and still controversial—questions about the best way to provide instruction to low-achieving readers in the elementary grades.
Article
Full-text available
Research has identified a number of linguistic features that influence the reading comprehension of young readers; yet, less is known about whether and how these findings extend to adult readers. This study examines text comprehension, processing, and familiarity judgment provided by adult readers using a number of different approaches (i.e., natural language processing, crowd-sourced ratings, and machine learning). The primary focus is on the identification of the linguistic features that predict adult text readability judgments, and how these features perform when compared to traditional text readability formulas such as the Flesch-Kincaid grade level formula. The results indicate the traditional readability formulas are less predictive than models of text comprehension, processing, and familiarity derived from advanced natural language processing tools.
Article
Full-text available
Linear mixed-effects models (LMMs) have increasingly replaced mixed-model analyses of variance for statistical inference in factorial psycholinguistic experiments. The advantages of LMMs over ANOVAs, however, come at a cost: setting up an LMM is not as straightforward as running an ANOVA. One simple option, when numerically possible, is to fit the full variance-covariance structure of random effects (the "maximal" model; Barr et al., 2013), presumably to keep Type I error down to the nominal α in the presence of random effects. Although it is true that fitting a model with only random intercepts may lead to higher Type I error, fitting a maximal model also has a cost: it can lead to a significant loss of power. We demonstrate this with simulations and suggest that for typical psychological and psycholinguistic data, models with a random effect structure that is supported by the data have optimal Type I error and power properties.
Article
Full-text available
Word problems (WPs) belong to the most difficult and complex problem types that pupils encounter during their elementary-level mathematical development. In the classroom setting, they are often viewed as merely arithmetic tasks; however, recent research shows that a number of linguistic verbal components not directly related to arithmetic contribute greatly to their difficulty. In this review, we will distinguish three components of WP difficulty: (i) the linguistic complexity of the problem text itself, (ii) the numerical complexity of the arithmetic problem, and (iii) the relation between the linguistic and numerical complexity of a problem. We will discuss the impact of each of these factors on WP difficulty and motivate the need for a high degree of control in stimuli design for experiments that manipulate WP difficulty for a given age group.
Article
Full-text available
This article explores how differences in problem representations change both the performance and underlying cognitive processes of beginning algebra students engaged in quantitative reasoning. Contrary to beliefs held by practitioners and researchers in mathematics education, students were more successful solving simple algebra story problems than solving mathematically equivalent equations. Contrary to some views of situated cognition, this result is not simply a consequence of situated world knowledge facilitating problem-solving performance, but rather a consequence of student difficulties with comprehending the formal symbolic representation of quantitative relations. We draw on analyses of students' strategies and errors as the basis for a cognitive process explanation of when, why, and how differences in problem representation affect problem solving. We conclude that differences in external representations can affect performance and learning when one representation is easier to comprehend than another or when one representation elicits more reliable and meaningful solution strategies than another.
Article
Full-text available
Students with low knowledge have been shown to better understand and learn more from more cohesive texts, whereas high-knowledge students have been shown to learn more from lower-cohesion texts; this has been called the reverse cohesion effect. This study examines whether students' comprehension skill affects the interaction between text cohesion and their domain knowledge. College students (n = 143) read either a high- or a low-cohesion text and answered text-based and bridging inference questions. The results indicated that the benefit of low-cohesion text was restricted to less skilled, high-knowledge readers, whereas skilled comprehenders with high knowledge benefited from a high-cohesion text. Consistent with McNamara (2001), the interaction of text cohesion and knowledge was restricted to text-based questions. In addition, for low-knowledge readers, the benefits of high-cohesion texts emerged in their responses to bridging inference questions but not text-based questions. The results suggest a more complex view of when and for whom textual cohesion affects comprehension.
Article
Full-text available
The purpose of this study was to examine the cognitive correlates of 3rd-grade skill in arithmetic, algorithmic computation, and arithmetic word problems. Third graders (N = 312) were measured on language, nonverbal problem solving, concept formation, processing speed, long-term memory, working memory, phonological decoding, and sight word efficiency as well as on arithmetic, algorithmic computation, and arithmetic word problems. Teacher ratings of inattentive behavior also were collected. Path analysis indicated that arithmetic was linked to algorithmic computation and to arithmetic word problems and that inattentive behavior independently predicted all 3 aspects of mathematics performance. Other independent predictors of arithmetic were phonological decoding and processing speed. Other independent predictors of arithmetic word problems were nonverbal problem solving, concept formation, sight word efficiency, and language. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Connectives are cohesive devices that signal the relations between clauses and are critical to the construction of a coherent representation of a text's meaning. The authors investigated young readers' knowledge, processing, and comprehension of temporal, causal, and adversative connectives using offline and online tasks. In a cloze task, 10-year-olds were more accurate than 8-year-olds on temporal and adversative connectives, but both age groups differed from adult levels of performance (Experiment 1). When required to rate the “sense” of 2-clause sentences linked by connectives, 10-year-olds and adults were better at discriminating between clauses linked by appropriate and inappropriate connectives than were 8-year-olds. The 10-year-olds differed from adults only on the temporal connectives (Experiment 2). In contrast, online reading time measures indicated that 8-year-olds' processing of text is influenced by connectives as they read, in much the same way as 10-year-olds'. Both age groups read text more quickly when target 2-clause sentences were linked by an appropriate connective compared with texts in which a connective was neutral (and), inappropriate to the meaning conveyed by the 2 clauses, or not present (Experiments 3 and 4). These findings indicate that although knowledge and comprehension of connectives is still developing in young readers, connectives aid text processing in typically developing readers. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
This study examined how the chronological distance between 2 consecutively narrated story events affects the on-line comprehension and mental representation of these events. College students read short narrative passages from a computer screen and responded to recognition probes. The results of 4 experiments consistently demonstrated that readers used temporal information to construct situation models while comprehending narratives. First, sentence reading times increased when there was a narrative time shift (e.g., as denoted by an hour later) as opposed to when there was no narrative time shift (e.g., as denoted by a moment later). Second, information from the previously narrated event was less accessible when it was followed by a time shift than when it was not. Third, 2 events that were separated by a narrative time shift were less strongly connected in long-term memory than 2 events that were not separated by a narrative time shift. The results suggest that readers use a strong iconicity assumption during story comprehension. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
This article investigates how people's metacognitive judgments influence subsequent study-time-allocation strategies. The authors present a comprehensive literature review indicating that people allocate more study time to judged-difficult than to judged-easy items--consistent with extant models of study-time allocation. However, typically, the materials were short, and participants had ample time for study. In contrast, in Experiment 1, when participants had insufficient time to study, they allocated more time to the judged-easy items than to the judged-difficult items, especially when expecting a test. In Experiment 2, when the materials were shorter, people allocated more study time to the judged-difficult materials. In Experiment 3, under high time pressure, people preferred studying judged-easy sonnets; under moderate time pressure, they showed no preference. These results provide new evidence against extant theories of study-time allocation.
Article
Full-text available
The ERP experiment reported here addresses some outstanding questions regarding word processing in sentential contexts: (1) Does only the 'message-level' representation (the representation of sentence meaning combining lexico-semantic and syntactic constraints) affect the processing of the incoming word [J. Exp. Psychol.: Learn. Mem. Cogn. 20 (1994) 92]? (2) Is lexically specified semantic relatedness between multiple words the primary factor instead [J. Exp. Psychol.: Learn. Mem. Cogn. 15 (1989) 791]? (3) Alternatively, do word and sentence level information interact during sentence comprehension? Volunteers read sentences (e.g. Dutch sentences resembling The javelin was by the athletes...) in which the (passive) syntactic structure and the semantic content of the lexical items together created a strong expectation of a specific final word (e.g., thrown), but also sentences in which the syntactic structure was changed from passive to active (e.g. Dutch sentences resembling The javelin has the athletes...), which altered the message level constraint substantially and strongly reduced the expectation of any particular completion. Half of the sentences ended in a final word with a good lexico-semantic fit relative to the preceding content words (e.g. thrown, fitting well with the preceding javelin and athletes). This creates very plausible sentences in the strong constraint context but semantically anomalous ones in the weakly constraining context (e.g., The javelin has the athletes thrown). In the other half the final word had a poor lexico-semantic fit (e.g., summarized that does not fit at all with javelin and athletes). Good lexico-semantic fit endings showed no difference in N400 amplitude in the strong and weak message-level constraint sentences, despite the fact that the latter were semantically anomalous. 
This result suggests that lexico-semantic fit can be more important for word processing than the meaning of the sentence as determined by the syntactic structure, at least initially. These conditions did differ, however, in the region of the P600 where the anomalous weak constraint version was much more positive, a pattern usually seen with ungrammatical sentences. The processing of poor lexico-semantic fit words showed a quite different pattern; in both strong and weak constraint sentences they elicited a substantial N400 effect, but N400-amplitude was significantly more negative following strong constraint contexts, even though both sentence contexts were equivalently anomalous. Taken together, these findings provide evidence for the importance of both message-level and lexico-semantic information during sentence comprehension. The implications for theories of sentence interpretation are discussed and an extension of the message-based hypothesis will be proposed.
Article
Full-text available
Two experiments, theoretically motivated by the construction-integration model of text comprehension (Kintsch, 1988), investigated the role of text coherence in the comprehension of science texts. In Experiment 1, junior-high students' comprehension of one of three versions of a biology text was examined via free recall, written questions, and a keyword sorting task. This study demonstrates advantages for globally coherent text and for more explanatory text. In Experiment 2, interactions between local and global text coherence, readers' background knowledge, and levels of understanding were examined. Using the same methods as in Experiment 1, we examined students' comprehension of one of four versions of a text, orthogonally varying local and global coherence. We found that readers who know little about the domain of the text benefit from a coherent text, whereas high-knowledge readers benefit from a minimally coherent text. We argue that the poorly written text forces the knowledgeable readers to engage in compensatory processing to infer unstated relationships in the text. These findings, however, depended on the level of understanding, textbase or situational, being measured by the three comprehension tasks. Whereas the free-recall measure and text-based questions primarily tapped readers' superficial understanding of the text, the inference questions, problem-solving questions, and sorting task relied on a situational understanding of the text. This study provides evidence that the rewards to be gained from active processing are primarily at the level of the situation model, rather than at the superficial level of textbase understanding.
Article
Background and objectives: Interest surrounding generative large language models (LLMs) has rapidly grown. Although ChatGPT (GPT-3.5), a general LLM, has shown near-passing performance on medical student board examinations, the performance of ChatGPT or its successor GPT-4 on specialized examinations and the factors affecting accuracy remain unclear. This study aims to assess the performance of ChatGPT and GPT-4 on a 500-question mock neurosurgical written board examination. Methods: The Self-Assessment Neurosurgery Examinations (SANS) American Board of Neurological Surgery Self-Assessment Examination 1 was used to evaluate ChatGPT and GPT-4. Questions were in single best answer, multiple-choice format. χ2, Fisher exact, and univariable logistic regression tests were used to assess performance differences in relation to question characteristics. Results: ChatGPT (GPT-3.5) and GPT-4 achieved scores of 73.4% (95% CI: 69.3%-77.2%) and 83.4% (95% CI: 79.8%-86.5%), respectively, relative to the user average of 72.8% (95% CI: 68.6%-76.6%). Both LLMs exceeded last year's passing threshold of 69%. Although scores between ChatGPT and question bank users were equivalent (P = .963), GPT-4 outperformed both (both P < .001). GPT-4 answered every question answered correctly by ChatGPT and 37.6% (50/133) of remaining incorrect questions correctly. Among 12 question categories, GPT-4 significantly outperformed users in each but performed comparably with ChatGPT in 3 (functional, other general, and spine) and outperformed both users and ChatGPT for tumor questions. Increased word count (odds ratio = 0.89 of answering a question correctly per +10 words) and higher-order problem-solving (odds ratio = 0.40, P = .009) were associated with lower accuracy for ChatGPT, but not for GPT-4 (both P > .005). 
Multimodal input was not available at the time of this study; hence, on questions with image content, ChatGPT and GPT-4 answered 49.5% and 56.8% of questions correctly based on contextual clues alone. Conclusion: LLMs achieved passing scores on a mock 500-question neurosurgical written board examination, with GPT-4 significantly outperforming ChatGPT.
Chapter
Readability formulas can be used to better match readers and texts. Current state-of-the-art readability formulas rely on large language models like transformer models (e.g., BERT) that model language semantics. However, the size and runtimes make them impractical in educational settings. This study examines the effectiveness of new readability formulas developed on the CommonLit Ease of Readability (CLEAR) corpus using more efficient sentence-embedding models including doc2vec, Universal Sentence Encoder, and Sentence BERT. This study compares sentence-embedding models to traditional readability formulas, newer NLP-informed linguistic feature formulas, and newer BERT-based models. The results indicate that sentence-embedding readability formulas perform well and are practical for use in various educational settings. The study also introduces an open-source NLP website to readily assess the readability of texts along with an application programming interface (API) that can be integrated into online educational learning systems to better match texts to readers. Keywords: Text Readability, Large Language Models, Natural Language Processing
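The traditional formulas that the embedding-based models above are benchmarked against can be computed in a few lines. The sketch below implements the standard Flesch-Kincaid grade-level formula with a crude vowel-group syllable counter; it is an assumption-laden illustration, not the CLEAR-corpus models described in the chapter, and production formulas use dictionary-based syllable counts.

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels; real implementations
    # use pronunciation dictionaries for accuracy.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Short sentences of common monosyllabic words score far lower (easier) than long, polysyllabic sentences, which is exactly the dimension that embedding-based formulas try to capture more robustly.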
Chapter
We present a randomized field trial delivered in Carnegie Learning's MATHia intelligent tutoring system to 12,374 learners, intended to test whether rewriting content in "word problems" improves student mathematics performance within this content, especially among students who are emerging as English language readers. In addition to describing facets of word problems targeted for rewriting and the design of the experiment, we present an artificial intelligence-driven approach to evaluating the effectiveness of the rewrite intervention for emerging readers. Data about students' reading ability is generally neither collected nor available to MATHia's developers. Instead, we rely on a recently developed neural network predictive model that infers whether students will likely be in this target sub-population. We present the results of the intervention on a variety of performance metrics in MATHia and compare performance of the intervention group to the entire user base of MATHia, as well as by comparing likely emerging readers to those who are not inferred to be emerging readers. We conclude with areas for future work using more comprehensive models of learners. Keywords: machine learning, A/B testing, intelligent tutoring systems, reading ability, middle school mathematics
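The trial's own evaluation relies on a neural predictive model, but the basic A/B comparison it describes can be sketched with a conventional two-proportion z-test. The function and condition framing below are illustrative assumptions, not the authors' actual analysis.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic, e.g. for comparing mastery rates
    between a rewritten-content condition and a control condition."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pool the two groups to estimate the common proportion under H0.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se
```

With samples on the order of the trial's 12,374 learners, even small differences in mastery rate between conditions produce large z-statistics, which is why field trials at this scale can detect subtle content effects.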
Article
This study was designed to deepen insights on whether word-problem (WP) solving is a form of text comprehension (TC) and on the role of language in WPs. A sample of 325 second graders, representing high, average, and low reading and math performance, was assessed on (a) start-of-year TC, WP skill, language, nonlinguistic reasoning, working memory, and foundational skill (word identification, arithmetic) and (b) year-end WP solving, WP-language processing (understanding WP statements, without calculation demands), and calculations. Multivariate, multilevel path analysis, accounting for classroom and school effects, indicated that TC was a significant and comparably strong predictor of all outcomes. Start-of-year language was a significantly stronger predictor of both year-end WP outcomes than of calculations, whereas start-of-year arithmetic was a significantly stronger predictor of calculations than of either WP measure. Implications are discussed in terms of WP solving as a form of TC and a theoretically coordinated approach, focused on language, for addressing TC and WP-solving instruction.
Chapter
There has been widespread application of readability formulas as a tool for defining plain English in the production of texts as well as in judging existing documents. There are numerous reasons why readability formulas have been selected to fulfil this defining role. However, the findings of Duffy and Kabance, along with those of Kniffen et al, present a strong case against the readable writing approach to revision, and hence against the use of a readability formula as a feedback device for the writer. Kniffen et al used a readable writing style manual. In both cases, conditions were optimal for the readability improvements to facilitate comprehension. Yet in both cases the manipulations, with one exception, resulted in no effect or, at best, a marginal effect on comprehension. If the revision approach does not produce large comprehension effects under ideal testing conditions, then there is little expectation that the approach will be effective in practical application. The findings of Duffy and Kabance, in fact, suggest that some readable writing techniques will not be effective in improving comprehension under any circumstances. The effectiveness of other simplification strategies will depend on the reading requirements and reading conditions.
Article
Recent studies have shown that speakers and writers use linguistic devices to signal a shift of topic in their discourse. The present paper considers the comprehension function of one of these segmentation markers, namely a temporal adverbial, by studying its role in a well-established boundary effect: Reading time for the first sentence of a new discourse unit is longer than reading time for the other sentences. In four experiments, participants read short narratives in which a target sentence was preceded by highly congruent sentences (topic continuity condition) or by weakly congruent sentences (topic shift condition). As predicted by the boundary hypothesis, topic shift sentences were read more slowly than topic continuous sentences. However, the boundary effect disappeared when the segmentation marker was inserted at the beginning of the topic shift sentence (Experiment 1), though not at its end (Experiment 2). The third experiment showed that not just any adverbial produces this effect. The fourth experiment confirmed that segmentation markers specifically reduce the amount of processing required for the part of the sentence that is topic discontinuous.
Article
The results of 4 experiments, which involved 239 college students, indicate that the presence of a connective such as because increases the activation level of the 1st clause when placed between 2 clauses of a sentence. Immediately after reading 2 clauses that were either linked or not linked by a connective, Ss judged whether a probe word had been mentioned in 1 of the clauses. The recognition probe times to the verb from the 1st statement were faster when a connective had conjoined the statements than when the statements constituted 2 separate sentences. Exp 2 indicated that the reactivation of the 1st clause occurred at the end of the 2nd statement but not at the beginning of the 2nd statement. The results of Exp 3 revealed that the reactivation effect occurred for related statement pairs but not for unrelated statement pairs. Exp 4 showed that the reactivation effect also generalized to the connective although.
Article
The effect of providing middle school students with a video accommodation for a standardized mathematics test was examined. Two hundred forty-seven students were asked to solve 60 word problems. One half of the questions were presented in standard form, while the other half were read by an actor on a video monitor. Students were grouped according to mathematics and reading ability. A test accommodation effect was found for students possessing below-average mathematics skills. The problems were identified as having relatively high reading difficulty according to word count, number of verbs, and word familiarity. Students with above-average mathematics proficiency but low reading skill performed better when the questions were presented in video format. This accommodation may be useful on specific test items for students with certain reading deficiencies.
Article
This book explores the question of how students become thoughtful, independent readers who deeply understand what they read by examining the thought processes of proficient readers. The book then uses these processes as models for the strategies it offers, strategies intended to help children become more flexible, adaptive, independent, and engaged readers. The book offers a new instructional paradigm focused on in-depth instruction in the strategies used by proficient readers. It goes beyond the traditional classroom into literature-based, workshop-oriented classrooms. Through vivid portraits of these environments, the book explores how instruction looks in dynamic, literature-rich reader's workshops. According to the book, as the students connect their reading to their background knowledge, create sensory images, ask questions, draw inferences, determine what is important, synthesize ideas, and solve problems, they are able to construct a mosaic of meaning. The book is relevant to all literature-based classrooms, regardless of level. It offers practical tools for inservice teachers, as well as essential methods instruction for preservice teachers at both the undergraduate and graduate levels. Appendixes cover frequent questions and brief responses. (NKA)
Article
The problem of identifying and providing for individual needs in the classroom is the main theme of this book. Part I, The Reading Situation, includes a brief introduction to the author's point of view, a discussion of the evolution of our graded school system, and attempts to break this lockstep and reorganize the school to meet pupil needs. Part II, The Reading Problem, discusses the reading facet of language and goals of reading instruction. In Part III, The Nature of Readiness, social and emotional, as well as visual and auditory readiness, are discussed. Part IV, Developing Readiness, presents a program of activities and materials designed to develop in each child the necessary background of experience, language facility, and visual and auditory discrimination. Part V, Reading Instruction, is concerned with the reading program proper: initial reading experiences, ways of discovering specific reading needs and developing basic reading abilities, directing reading activities, and encouraging vocabulary development, with a concluding discussion of levels of differentiation of instruction and their importance in a democratic society. Extensive bibliographies conclude most chapters, and photographs illustrate many of the recommended classroom procedures.
Article
This paper describes an effort to model students' changing knowledge state during skill acquisition. Students in this research are learning to write short programs with the ACT Programming Tutor (APT). APT is constructed around a production rule cognitive model of programming knowledge, called the ideal student model. This model allows the tutor to solve exercises along with the student and provide assistance as necessary. As the student works, the tutor also maintains an estimate of the probability that the student has learned each of the rules in the ideal model, in a process called knowledge tracing. The tutor presents an individualized sequence of exercises to the student based on these probability estimates until the student has mastered each rule. The programming tutor, cognitive model and learning and performance assumptions are described. A series of studies is reviewed that examine the empirical validity of knowledge tracing and has led to modifications in the process. Currently the model is quite successful in predicting test performance. Further modifications in the modeling process are discussed that may improve performance levels.
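The probability estimate described above is commonly formalized as Bayesian Knowledge Tracing. Below is a minimal sketch of one update step; the slip, guess, and learn parameter values are illustrative defaults, not the fitted values used in the ACT Programming Tutor.

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.15):
    """One step of Bayesian Knowledge Tracing: update the probability
    that a student knows a rule after observing one response, then
    apply the learning transition."""
    if correct:
        # A correct answer comes from knowing (no slip) or guessing.
        evidence = p_know * (1 - slip)
        posterior = evidence / (evidence + (1 - p_know) * guess)
    else:
        # An incorrect answer comes from slipping or failing to guess.
        evidence = p_know * slip
        posterior = evidence / (evidence + (1 - p_know) * (1 - guess))
    # Transition: the student may learn the rule after this opportunity.
    return posterior + (1 - posterior) * learn
```

Iterating this update over a student's responses and advancing past a skill once the estimate crosses a mastery threshold (0.95 is a common choice) reproduces the individualized sequencing the paper describes.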
Article
This study examined how text features (i.e., cohesion) and individual differences (i.e., reading skill and prior knowledge) contribute to biology text comprehension. College students with low and high levels of biology knowledge read two biology texts, one of which was high in cohesion and the other low in cohesion. The two groups were similar in reading skill. Participants' text comprehension was assessed with open-ended comprehension questions that measure different levels of comprehension (i.e., text-based, local-bridging, global-bridging). Results indicated: (a) reading a high-cohesion text improved text-based comprehension; (b) overall comprehension was positively correlated with participants' prior knowledge, and (c) the degree to which participants benefited from reading a high-cohesion text depended on participants' reading skill, such that skilled participants gained more from high-cohesion text.
Article
This account of the Matthew effect is another small exercise in the psychosociological analysis of the workings of science as a social institution. The initial problem is transformed by a shift in theoretical perspective. As originally identified, the Matthew effect was construed in terms of enhancement of the position of already eminent scientists who are given disproportionate credit in cases of collaboration or of independent multiple discoveries. Its significance was thus confined to its implications for the reward system of science. By shifting the angle of vision, we note other possible kinds of consequences, this time for the communication system of science. The Matthew effect may serve to heighten the visibility of contributions to science by scientists of acknowledged standing and to reduce the visibility of contributions by authors who are less well known. We examine the psychosocial conditions and mechanisms underlying this effect and find a correlation between the redundancy function of multiple discoveries and the focalizing function of eminent men of science-a function which is reinforced by the great value these men place upon finding basic problems and by their self-assurance. This self-assurance, which is partly inherent, partly the result of experiences and associations in creative scientific environments, and partly a result of later social validation of their position, encourages them to search out risky but important problems and to highlight the results of their inquiry. A macrosocial version of the Matthew principle is apparently involved in those processes of social selection that currently lead to the concentration of scientific resources and talent.
Article
Presents a model of reading comprehension that accounts for the allocation of eye fixations of 14 college students reading scientific passages. The model deals with processing at the level of words, clauses, and text units. Readers made longer pauses at points where processing loads were greater. Greater loads occurred while readers were accessing infrequent words, integrating information from important clauses, and making inferences at the ends of sentences. The model accounts for the gaze duration on each word of text as a function of the involvement of the various levels of processing. The model is embedded in a theoretical framework capable of accommodating the flexibility of reading.
Article
For 25 years, we have been working to build cognitive models of mathematics, which have become a basis for middle- and high-school curricula. We discuss the theoretical background of this approach and evidence that the resulting curricula are more effective than other approaches to instruction. We also discuss how embedding a well specified theory in our instructional software allows us to dynamically evaluate the effectiveness of our instruction at a more detailed level than was previously possible. The current widespread use of the software is allowing us to test hypotheses across large numbers of students. We believe that this will lead to new approaches both to understanding mathematical cognition and to improving instruction.
Model Card and Evaluations for Claude Models
  • Anthropic
emmeans: Estimated marginal means, aka least-squares means
  • R Lenth
Desirable difficulties and studying in the region of proximal learning. Successful remembering and successful forgetting: A Festschrift in honor of Robert A. Bjork
  • J Metcalfe
Prompt Engineering Guide
  • E Saravia
Avoiding miscomprehension: A metacognitive perspective for how readers identify and overcome comprehension failure, Doctoral dissertation
  • K A Norberg
Rewriting Math Word Problems with Large Language Models
  • K A Norberg
  • H Almoubayyed