Fig 4 - uploaded by William Dubay
Source publication
A brief introduction to the research on readability (reading ease) and the readability formulas. Readability is tightly related to reading comprehension, retention, reading speed, and persistence. The readability formulas use variables that are known to be among the first causes of reading difficulty. While there are many other features of language...
Context in source publication
Context 1
... authors found that content, with a slight margin over style, was most important. Third in importance was format, and almost equal to it, "features of organization," referring to the chapters, sections, headings, and paragraphs that show the organization of ideas (See Figure 4). They found they could not measure content, format, or organization statistically, though many would later try (See below, "The measurement of content"). ...
Citations
... Among the most commonly used formulas are Simple Measure of Gobbledygook (SMOG), the Dale-Chall Readability formula, the Flesch Reading Ease formula, the Fog Index, and the Fry Readability Graph. More details about readability methods can be found in [26]. ...
Timely and relevant information enables clinicians to make informed decisions about patient care outcomes. However, discovering related and understandable information from the vast medical literature is challenging. To address this problem, we aim to enable the development of search engines that meet the needs of medical practitioners by incorporating text difficulty features. We collected a dataset of 209 scientific research abstracts from different medical fields, available in both English and German. To determine the difficulty aspects of readability and technical level of each abstract, 216 medical experts annotated the dataset. We used a pre-trained BERT model, fine-tuned to our dataset, to develop a regression model predicting those difficulty features of abstracts. To highlight the strength of this approach, the model was compared to readability formulas currently in use. Analysis of the dataset revealed that German abstracts are more technically complex and less readable than their English counterparts. Our baseline model showed greater efficacy than current readability formulas in predicting domain-specific readability aspects. Conclusion: Incorporating these text difficulty aspects into the search engine will provide healthcare professionals with reliable and efficient information retrieval tools. Additionally, the dataset can serve as a starting point for future research.
... For their linguistic diversity analysis, Kumarage et al. (2023) used the Flesch Reading Ease score (Flesch, 1948; Kincaid et al., 1975). Readability is what makes some texts easier than others (DuBay, 2004; DuBay, 2007), and consequently estimates the difficulty of texts (Si and Callan, 2001) and how easy it is to read them (Das and Cui, 2019). ...
... DuBay (2007) highlighted that prior knowledge and reading skills might impact how easy a text is. Most readability scores refer to a ranking of the reading level a person should have to understand the text (see Dubay (2004) and DuBay (2007) for a review on readability). One of the most common variables used in existing formulas is the number of words, but according to Si and Callan (2001), they ignore text content. ...
Online questionnaires that use crowd-sourcing platforms to recruit participants have become commonplace, due to their ease of use and low costs. Artificial Intelligence (AI) based Large Language Models (LLM) have made it easy for bad actors to automatically fill in online forms, including generating meaningful text for open-ended tasks. These technological advances threaten the data quality for studies that use online questionnaires. This study tested if text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems. While humans were able to correctly identify authorship of text above chance level (76 percent accuracy), their performance was still below what would be required to ensure satisfactory data quality. Researchers currently have to rely on the disinterest of bad actors to successfully use open-ended responses as a useful tool for ensuring data quality. Automatic AI detection systems are currently completely unusable. If AIs become too prevalent in submitting responses then the costs associated with detecting fraudulent submissions will outweigh the benefits of online questionnaires. Individual attention checks will no longer be a sufficient tool to ensure good data quality. This problem can only be systematically addressed by crowd-sourcing platforms. They cannot rely on automatic AI detection systems and it is unclear how they can ensure data quality for their paying clients.
... Klare [5] defines readability as "the ease of understanding or comprehension due to the style of writing". Readability is frequently confused with legibility, which relates to the typeface and layout [6]. Separately, this definition concentrates on the writing style more than the issues and this writing style includes content, coherence, and arrangement. ...
... Separately, this definition concentrates on the writing style more than the issues and this writing style includes content, coherence, and arrangement. Similarly, Hargis and her colleagues at IBM (1998), as described in [6], state that readability is the "ease of reading words and sentences", as an attribute of clarity. ...
Predicting the reading difficulty level of English texts is a critical process for second language education and assessment. Reading difficulty level is concerned with the problem of matching a reader's proficiency to an appropriate text. Reading difficulty level assessment, or readability assessment, is the process of predicting the reading grade level required by an input text or document, which corresponds to the reader and to the materials. Students in Jordan face obstacles in finding relevant, readable material for any subject at their academic level. This paper introduces a model that predicts the reading difficulty level of a given text in terms of a student's ability to read and understand English as a non-native English speaker in Jordanian schools. In this paper, Jordanian students were classified into four categories according to their knowledge of English. The reading difficulty level is predicted using a modern statistical model based on the Bayes model. The model compares the given text with a standard predefined text that strongly reflects the ability to read and understand English text. The accuracy of the proposed model was tested using the hold-out method. The overall prediction accuracy was 75.9%.
... Readability is determined by the complexity of sentence structure and the vocabulary used (Bailin & Grafstein, 2016). It is tightly related to reading comprehension, retention, reading speed, and persistence (DuBay, 2004). Readability formulas have been used since the 1920s by educators and researchers to predict the difficulty level of a selected text (Flesch, 1948; Kincaid et al., 1975; DuBay, 2004). ...
... It is tightly related to reading comprehension, retention, reading speed, and persistence (DuBay, 2004). Readability formulas have been used since the 1920s by educators and researchers to predict the difficulty level of a selected text (Flesch, 1948; Kincaid et al., 1975; DuBay, 2004). Giving a historical review of readability is not the intention of this study. ...
... Giving a historical review of readability is not the intention of this study. DuBay (2004) gives an extensive review of research that spans 100 years. The readability formulas use variables that are known to be among the first causes of reading difficulties, such as the average length of sentences, the number of new words contained, and the grammatical complexity of the language used in a passage. ...
... Readability formulae [1][2][3][4][5][6][7][8][9] are applicable to any alphabetical language. Based on the length of words and sentences, they allow for the comparison of diverse texts automatically and objectively to assess the difficulty that readers may find in reading them. ...
We have studied how the readability of a text can change in translation by considering Matthew’s Gospel, written in Greek, translated into Latin and 35 modern languages. We have found that the deep-language parameters CP (characters per word), PF (words per sentence), IP (words per interpunctions), MF (interpunctions per sentence) and a universal readability index GU of each translation are so diverse from language to language, and even within a given language for which there are many versions of Matthew—such as in English and Spanish—that the resulting texts mathematically seem to be diverse. The several tens of versions of Matthew’s Gospel studied appear to address very diverse audiences. If a reader could understand all of them well, he/she would have the impression of reading texts written by diverse authors, although all of them tell the same story.
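The four deep-language parameters named in the abstract above are simple ratios, so they can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `deep_language_parameters` and the tokenization rules are assumptions (a word is a run of letters, a sentence ends at a run of `.`, `!` or `?`, and an interpunction is any run of `, ; : . ! ?`). The universal index GU is not reproduced because its formula does not appear in the text.

```python
import re


def deep_language_parameters(text: str) -> dict:
    """Compute CP, PF, IP and MF for a text.

    Assumed tokenization (not taken from the source paper):
    - word: a run of Unicode letters
    - sentence end: a run of '.', '!' or '?'
    - interpunction: a run of ',', ';', ':', '.', '!' or '?'
    """
    words = re.findall(r"[^\W\d_]+", text)
    n_sentences = len(re.findall(r"[.!?]+", text))
    n_interpunctions = len(re.findall(r"[,;:.!?]+", text))

    if not words or n_sentences == 0 or n_interpunctions == 0:
        raise ValueError("text needs at least one word and one sentence end")

    n_chars = sum(len(w) for w in words)
    return {
        "CP": n_chars / len(words),            # characters per word
        "PF": len(words) / n_sentences,        # words per sentence
        "IP": len(words) / n_interpunctions,   # words per interpunction
        "MF": n_interpunctions / n_sentences,  # interpunctions per sentence
    }


if __name__ == "__main__":
    sample = "In the beginning, he spoke; they listened. Then all left."
    print(deep_language_parameters(sample))
```

Under these counting rules MF always equals PF divided by IP, so only three of the four ratios are independent.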
... Readability is essential in achieving competencies and maximizing learning time for students in a distance education system. Recent studies emphasized the importance of text features and their ease of comprehension (DuBay, 2004). Furthermore, Abrosymova (2021) argued that readability referred to the ease of reading and the readers' ability to understand the text. ...
Purpose
This article presented the results of studies that examined the appropriateness of the content, readability of printed learning materials and the effectiveness of external resources in ecology course offered at Universitas Terbuka. To integrate external resources, links to their websites were provided in the printed materials.
Design/methodology/approach
An in-depth interview with a content expert was employed to review the course content, while digital and printed learning materials were reviewed for readability and to determine the usefulness of the external resources. A total of 47 students completed surveys and a focus group discussion that included in-depth interviews were conducted with 21 selected students.
Findings
The results revealed that the content of ecology course was conceptually valid. However, two key aspects needed to be emphasized, including the application of ecology phenomena for further development of the science and its applications in real-life situations. Regarding readability, students stated that the course materials were easily comprehended. In terms of the benefit, 79% of the students found the external resources interesting and helpful in understanding the learning materials.
Practical implications
Printed learning materials were crucial for students, specifically those residing in remote areas. Therefore, the institution should ensure that the materials were high-quality, easy to comprehend and enriched with up-to-date content/materials through scannable links to external resources.
Originality/value
The value added to the findings of this study was that the provision of links to external resources within printed learning materials improves students' understanding of the course content.
... Various readability indices associated with short-term memory capacity and the difficulty of a text reading have been developed in empirical and heuristic studies during the last 70 years in works by Flesch [1,2], Kincaid et al. [3], Dale and Chall [4,5], Gunning [6], and Spache [7]. An extensive review of earlier studies and other indices was performed by DuBay [8,9]. These indices have been implemented toward multiple practical purposes by educators to assess the reading level of students, by librarians and publishers to rank the difficulty of texts, by medical and other specialists for efficient communication with lay people, and in word-processing programs. ...
... This is a general result for the optimum number of words for any given values of sentences and elements of the words. It is interesting to note that up to the constants, this expression reduces to the structure of the SMOG grade index, with the elements in the expression in (15) in place of the polysyllables in the index defined in (8). ...
The work considers formal structure and features of the readability indices widely employed in various information and education fields, including theory of communication, cognitive psychology, linguistics, and multiple applications. In spite of the importance and popularity of readability indices in practical research, their intrinsic properties have not yet been sufficiently investigated. This paper aims to fill this gap between the theory and application of these indices by presenting them in a uniform expression which permits analyzing their features and deriving new properties that are useful in practice. Three theorems are proved for relations between the units of a text structure. The general characteristics are illustrated by numerical examples which can be helpful for researchers and practitioners.
... Descriptive statistics can be used to summarize and understand data, such as by exploring patterns and relationships within the data, getting a better understanding of the data set, or identifying any changes in the distribution of the data. Readability metrics, which assess the clarity and ease of understanding of written text, have a variety of applications, including the design of educational materials and the improvement of legal or technical documents (DuBay, 2004). Dependency distance can be used as a measure of language comprehension difficulty or of sentence complexity and has been used for analysing properties of natural language or for similar purposes as readability metrics (Gibson et al., 2019; Liu, 2008). ...
... The resulting level ranges between 0 and 18 (lower is better) and assesses the approximate reading grade level of a text according to the US education system [54,55]. As an understandability metric, we used the SMOG index that, as in the case of the Flesch-Kincaid grade level, estimates the number of years of education required to be able to fully understand a text [55,57,58]. For this study, the application provided by the website app.readable.com ...
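For readers unfamiliar with the two grade-level metrics mentioned in this excerpt, here is a minimal sketch of their standard published formulas. It is not the implementation used by the cited study or by the website it names. Both functions take pre-computed counts, and the function and parameter names are illustrative. The results are not clamped to the 0 to 18 range the excerpt describes.

```python
import math


def flesch_kincaid_grade(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch-Kincaid Grade Level (Kincaid et al., 1975), from raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


def smog_grade(n_polysyllables: int, n_sentences: int) -> float:
    """SMOG grade; polysyllables are words of three or more syllables.

    The count is normalized to a 30-sentence sample, as in the usual
    statement of the formula.
    """
    return 1.0430 * math.sqrt(n_polysyllables * (30 / n_sentences)) + 3.1291


if __name__ == "__main__":
    # 100 words, 5 sentences, 150 syllables
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
    # 25 polysyllabic words in a 30-sentence sample
    print(round(smog_grade(25, 30), 4))
```

Both results are read as approximate US school grades, so lower values indicate easier text.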
Phishing, the deceptive act of stealing personal and sensitive information by sending messages that seem to come from trusted entities, is one of the most widespread and effective cyberattacks. Automated defensive techniques against these attacks have been widely investigated. These solutions often exploit AI-based systems that, when a suspect website is detected, show a dialog that warns users about the potential risk. Despite significant advances in creating warning dialogs for phishing, this type of attack is still very effective. To overcome the limitations of existing warning dialogs and better defend users from phishing attacks, this article presents a novel technique to create warning dialogs that not only warn users about a possible attack, as in traditional solutions, but also explain why a website is suspicious, addressing in the explanation the most malicious feature of the suspect website. An experimental study that consisted of a remote survey and analyzed data from 150 participants is reported. The goal was to evaluate the proposed warning dialogs with explanations and to compare them with the dialogs presented by Chrome, Firefox, and Edge. The study revealed interesting results: most explanations were understandable and familiar to users; they also showed some potential of diverting users from visiting malicious sites. However, more attention should be devoted to aspects such as features to be explained, as well as user interest and trust in warning dialogs. The lessons learned that might drive the design of more powerful warning dialogs are provided.
... It requires counting the number of words per sentence (in a sample of 100 words) and the number of syllables per hundred words. It scores texts on a 100-point scale: the higher the score, the easier the material is to comprehend (DuBay, 2004). The results of the previous study showed that using these readability formulas has the advantage of providing a solid figure for text difficulty. ...
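The procedure described in this excerpt corresponds to the Flesch Reading Ease formula (Flesch, 1948). A minimal sketch follows. The formula constants are the standard published ones, but the syllable counter is a rough vowel-group heuristic assumed here for illustration, and all function names are hypothetical. A careful manual count or a pronunciation dictionary would give more accurate syllable totals.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups and drop a
    silent trailing 'e', keeping a final '-le' as its own syllable."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch Reading Ease from raw counts; higher means easier."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


def score_text(text: str) -> float:
    """Score a text using naive sentence and word splitting (assumed rules)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    print(round(score_text("The cat sat. The dog ran."), 2))
```

Very simple text can score above 100, and very dense text can score below 0, so the 100-point scale is nominal rather than a hard bound.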
... As mentioned by DuBay (2004) in his study of readability principles, a literary work can be examined based on its level of difficulty. In this part, the researchers would like to disclose three literary works' levels of difficulty based on the readability study. ...
... To triangulate data on the readability level of those three short stories, the researchers did the test twice for each short story. For the first test, the researchers did the test manually using the Flesch Reading Ease formula from DuBay (2004). After that, the researchers checked the readability level using the Flesch-Kincaid Grade Level from Gopal et al. (2021) to gain the readability level. ...
The short story as a literary work has been ubiquitously used to teach language and literature. However, not all short stories are appropriate and suitable to be used in the language-learning classroom. One of the aspects that should be considered in choosing short stories to teach language and literature is the readability level. The matching of appropriate text difficulty level to the readers’ reading ability is crucial to inculcate an interest in reading and elicit comprehension. Derived from this rationale, this study aimed to investigate the readability level of short stories used in the Introduction to Educational English Literature Course (IEEL). Moreover, this study also attempted to analyse the students’ perceptions of the short stories assigned to them to read in the IEEL course. In measuring the readability level, document analysis utilizing the Flesch-Kincaid Readability Formula was employed. To strengthen the data and ensure triangulation, 5 participants were purposely selected to be interviewed to explore their perspectives after reading those short stories. The results showed that 3 short stories namely The Birthday Party, The Dead Men’s Path, and Turning Thirty were categorized as standard (63.1), fairly easy (74.4), and easy level (83.7) respectively. Lastly, further inquiries through semi-structured interviews found that the 5 participants were reported to have positive viewpoints after reading those short stories.
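The categories reported in this abstract (standard, fairly easy, easy) match the conventional interpretation bands for Flesch Reading Ease scores. The sketch below maps a score to its band. The cut-offs are the commonly quoted ones, and the label for the lowest band varies between sources, so treat the table as an assumption, not as the study's own rubric. The names `FLESCH_BANDS` and `flesch_band` are illustrative.

```python
# (lower bound, label) pairs, checked from the easiest band downwards.
FLESCH_BANDS = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult"),
]


def flesch_band(score: float) -> str:
    """Return the conventional difficulty label for a Flesch Reading Ease score."""
    for lower, label in FLESCH_BANDS:
        if score >= lower:
            return label
    return "very difficult"  # scores below 30; label differs across sources


if __name__ == "__main__":
    # Scores reported in the abstract for the three short stories
    for score in (63.1, 74.4, 83.7):
        print(score, flesch_band(score))
```

Applied to the three reported scores, the mapping reproduces the categories given in the abstract.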