Article

A Computer Readability Formula Designed for Machine Scoring

Authors: Meri Coleman and T. L. Liau

Abstract

Existing computer programs that measure readability are based largely upon subroutines which estimate the number of syllables, usually by counting vowels. The shortcoming of estimating syllables is that it necessitates keypunching the prose into the computer. There is no need to estimate syllables, since word length in letters is a better predictor of readability than word length in syllables. Therefore, a new readability formula was computed that has for its predictors letters per 100 words and sentences per 100 words. Both predictors can be counted by an optical scanning device, and thus the formula makes it economically feasible for an organization (e.g., the US Office of Education) to calibrate the readability of all textbooks for the public school system.
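Because both predictors are simple counts, the formula is straightforward to automate. The sketch below is a minimal Python illustration; the coefficients (0.0588, 0.296, 15.8) come from the published Coleman-Liau formula rather than from this abstract, and the regex-based word and sentence counting is an illustrative assumption, not the authors' procedure.

import re

def coleman_liau_grade(text):
    # Count words, letters, and sentences with simple heuristics.
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    letters = sum(len(w) for w in words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    L = 100.0 * letters / len(words)      # letters per 100 words
    S = 100.0 * sentences / len(words)    # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8  # estimated U.S. grade level

Short, simple sentences score at or below the earliest grade levels, while long words packed into long sentences push the estimate toward college-level grades.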

No full-text available

... We see that the shortest factors (3, Human talent, and 4, Change management) show a higher lexical density, which indicates that they are somewhat more complex texts; however, the difference is minimal (10%), which indicates that these are the same informants addressing the same global topic. It is worth noting that the density of all the subcorpora is closer to the proportion found in oral texts than in written ones, since the latter lie above 40%; moreover, considering that academic texts usually show densities between 40% and 65% depending on the level of specialization (Castello, 2008), we can state that the informants gave spontaneous answers with little technical elaboration. ...
... Finally, readability is an indicator of a text's complexity from the perspective of educational level, based on counting the number of characters per word and per sentence (Coleman & Liau, 1975). In this case, scores above 12 correspond to people with post-secondary education, such as technical, technological, and undergraduate degrees, which is consistent with the population of informants. ...
Technical Report
Full-text available
Drawing on computational linguistic analysis tools, this report explores trends in the corpus formed by the open-ended questionnaire responses of teachers and administrative staff, collected during the evaluation phase of the academic and administrative restructuring of the Facultad de Comunicaciones y Filología. Four factors, corresponding to the structure of the survey, are distinguished: strategy, processes, human talent, and change management. The analysis of the information and the drafting of the document were carried out by the M.Data Node, with the participation of Dr. Víctor Julián Vallejo and Dr. Jorge Mauricio Molina, lecturers in computational and corpus linguistics in the undergraduate program in Filología Hispánica.
... This is a modified version of the Flesch-Kincaid Reading Ease that was developed in conjunction with the U.S. Navy. It estimates the U.S. grade level required to adequately read a piece of text. Test developed by Coleman and Liau in 1975. It is based on the principle that measuring readability by the number of letters is superior to measuring it by syllable length. ...
... Descriptions of each readability test used: W/S, number of words divided by number of sentences; Sl/W, number of syllables divided by number of words; CW, complex words (≥3 syllables); L, average number of letters per 100 words; S100, average number of sentences per 100 words; W, number of words [15][16][17]. ...
Article
Introduction: Cataract is the leading cause of blindness worldwide. Phacoemulsification is now the gold standard for cataract extraction and is greatly needed in low socioeconomic status (SES) communities, rural and older patient populations, and patients with poor vision. This greatly increases the importance of high readability for online resources on this topic. This study aims to assess the readability of online information about phacoemulsification based on readability scores for each resource. Methods: We conducted a retrospective cross-sectional study. The term "phacoemulsification" was searched online, and each website was categorized by type: academic, physician, non-physician, commercial, social media, and unspecified. The readability scores for each website were calculated using six different readability tests, and a composite score reflecting reading grade level was obtained. To evaluate the differences between the categories of websites, analysis of variance (ANOVA) testing was used. All test scores were compared with the 6th grade standard recommendation using a one-sample t-test. Results: A total of 20 websites were analyzed. Three websites (3/20; 15%) had a score corresponding to a 6th grade reading level or below. Seventeen websites (17/20; 85%) had a score corresponding to a college reading level or above. None of the readability scores had a mean below a 6th grade reading level. No category had an average readability score at or below a 6th grade reading level. None of the mean readability scores differed significantly across categories. All readability tests had an average score significantly different from a 6th grade reading level (p<0.001). Conclusions: This is the first study to focus on the accessibility of online English resources on phacoemulsification and to apply multiple standardized readability scores to cataract surgery resources. It provides further overwhelming evidence that online resources on phacoemulsification are too complex for the average patient to understand. Interventions should be implemented to improve readability.
... The FORCAST test quantifies single-syllable words and calculates a grade level, while the Fry test uses the number of syllables and sentences to illustrate grade level on a graph [20]. The Coleman-Liau score takes into account sentence length and character count, while the New Fog Test considers sentence length and the use of polysyllabic words (words > 4 syllables) [21]. The SMOG test also considers polysyllabic words, along with sentence length, to determine a grade level [22]. ...
... The FKGL supplements the difficulty with an estimated grade level [23,24]. The NDC test uses the number of unfamiliar words, meaning those not found on a list of words commonly used and comprehended by 4th graders, along with sentence length [21]. The Gunning-Fog assessment uses polysyllabic words (defined as "complex") and total number of sentences to determine a grade level based on American standards [22]. ...
Article
Full-text available
Introduction: The National Institutes of Health (NIH), American Medical Association (AMA), and the US Department of Health and Human Services (USDHHS) recommend that patient education materials (PEMs) be written at the 4th to 6th grade reading level to ensure readability by the average American. In this study, we examine the reading levels of online patient education materials from major anesthesiology organizations. Methods: Readability analysis of PEMs found on the websites of anesthesiology organizations was performed using the Flesch Reading Ease score, Flesch-Kincaid Grade Level, Simple Measure of Gobbledygook, Gunning Frequency of Gobbledygook, New Dale-Chall test, Coleman-Liau Index, New Fog Count, Raygor Readability Estimate, the FORCAST test, and the Fry Score. Results: Most patient education materials from the websites of the anesthesiology organizations evaluated were written at or above the 10th grade reading level. Conclusions: Online patient education materials from the major anesthesiology societies are written at levels higher than the average American adult's reading skill level and higher than recommended by the National Institutes of Health, American Medical Association, and US Department of Health and Human Services. Online resources should be revised to improve readability. Simplifying text and using shorter sentences and terms are strategies online resources can implement to improve readability. Future studies should incorporate comprehensibility, user-friendliness, and linguistic ease to further understand the implications for overall healthcare.
... For example, through three experiments, Eagly demonstrated how lowering comprehensibility lessened message acceptance (Eagly, 1974). This directly relates to classical (manually tuned) readability indices like the Flesch-Kincaid Index (Kincaid et al., 1975), the Gunning Fog Index (Gunning, 1952), and the Coleman-Liau Index (Coleman & Liau, 1975). Moreover, the criteria "easy to follow," "less complex," and "easy to understand" have been identified by researchers as a means to assess content comprehensibility (Collins-Thompson, 2014; Van der Sluis et al., 2014). ...
... Stajner et al. [22] and Brigo et al. [4] suggest readability indices to measure text complexity. Accordingly, we implement Flesch Reading Ease [23], its successor the Flesch-Kincaid, and the Gunning-Fog Index [5], the SMOG [14], and Coleman-Liau [8], as well as the Automated Readability Index (ARI) [21]. ...
Preprint
Informal learning on the Web using search engines as well as more structured learning on MOOC platforms have become very popular in recent years. As a result of the vast amount of available learning resources, intelligent retrieval and recommendation methods are indispensable -- this is true also for MOOC videos. However, the automatic assessment of this content with regard to predicting (potential) knowledge gain has not been addressed by previous work yet. In this paper, we investigate whether we can predict learning success after MOOC video consumption using 1) multimodal features covering slide and speech content, and 2) a wide range of text-based features describing the content of the video. In a comprehensive experimental setting, we test four different classifiers and various feature subset combinations. We conduct a detailed feature importance analysis to gain insights into which modality benefits knowledge gain prediction the most.
... (3) Diversity metric [98,106]: Rao et al. [98] proposed a metric to measure diversity in a model's outputs by computing the ratio of unique trigrams present in a generated summary. (4) Readability evaluation metrics [45]: Guo et al. use the Flesch-Kincaid grade level [66], Gunning fog index [39], and Coleman-Liau index [29] to compute the ease of readability and fluency of generated summaries. (5) Fact-based evaluation: To measure the accuracy of medical facts generated by the model in the output summary, Enarvi et al. [36] utilize a machine-learning clinical fact extractor module that is capable of extracting medical facts such as treatment or diagnosis and fine-grained attributes such as body part or medications. ...
Preprint
Full-text available
The internet has had a dramatic effect on the healthcare industry, allowing documents to be saved, shared, and managed digitally. This has made it easier to locate and share important data, improving patient care and providing more opportunities for medical studies. As there is so much data accessible to doctors and patients alike, summarizing it has become increasingly necessary; this need has been supported by the introduction of deep learning and transformer-based networks, which have significantly boosted the sector in recent years. This paper gives a comprehensive survey of the current techniques and trends in medical summarization.
... For this study we used the following readability formulae: Flesch-Kincaid [26], Gunning Fog [27], Coleman-Liau Index [28], Simple Measure of Gobbledygook Index (SMOG) [29] and the Automated Readability Index [30]. ...
Article
Full-text available
Introduction Fibroadenomas are benign lesions found in the breast tissue. Widespread access to and use of the internet has resulted in more individuals using online resources to better understand health conditions, their prognosis and treatment. The aim of this study was to investigate the readability and visual appearance of online patient resources for fibroadenoma. Methods We searched Google™, Bing™ and Yahoo™ on 6 July 2022 using the search terms "fibroadenoma", "breast lumps", "non-cancerous breast lumps", "benign breast lumps" and "benign breast lesions" to identify the top ten websites that appeared on each of the search engines. We excluded advertised websites, links to individual pdf documents and links to blogs/chats. We compiled a complete list of websites identified using the three search engines and the search terms and analysed the content. We only selected pages that were relevant to fibroadenoma. We excluded pages which only contained contact details and no narrative information relating to the condition. We did not assess information where links were directed to alternative websites. We undertook a qualitative visual assessment of each of the websites using a framework of pre-determined key criteria based on the Centers for Medicare and Medicaid Services toolkit. This involved assessing characteristics such as overall design, page layout, font size and colour. Each criterion was scored as: +1, criterion achieved; -1, criterion not achieved; and 0, no evidence, unclear or not applicable (maximum total score 43). We then assessed the readability of each website to determine the UK and US reading age using five different readability tests: Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG, and the Automated Readability Index. We compared the readability scores to determine if there were any significant differences across the websites identified. We also generated scores for the Flesch Reading Ease as well as information about sentence structure (number of syllables per sentence and proportion of words with a high number of syllables) and the proportion of people the text was readable to. Results We identified 39 websites for readability and visual assessment. The visual assessment scores for the 39 websites ranged from -19 to 31 points out of a possible 43. The median readability score for the identified websites was 8.58 (age 14–15), with a range of 6.69–12.22 (age 12–13 to university level). There was a statistically significant difference between the readability scores obtained across websites (p<0.001). Almost half of the websites (18/39; 46.2%) were classified as very difficult by the Flesch Reading Ease score, with only 13/39 (33.3%) classified as fairly easy or plain English. Conclusion We found wide differences in the general appearance, layout and focus of the fibroadenoma websites identified. The readability of most of the websites was also much higher than the recommended level for the public to understand. Fibroadenoma website information needs to be simplified, with less jargon and greater specificity to the condition, so that individuals can better comprehend it. In addition, the websites' visual appearance could be improved by changing the layout and including images and diagrams.
... We first evaluate generation quality using ROUGE-L (Lin, 2004) and BERTScore. The Coleman-Liau readability score (Coleman and Liau, 1975) assesses the ease with which a reader can understand a passage, and word familiarity (Leroy and Kauchak, 2014) measures the inverse document frequency of unigrams in text using frequency counts from Wikipedia. A lower Coleman-Liau score and lower word familiarity indicate that text is easier to read. ...
Preprint
Full-text available
Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract and the corresponding lay language summary are written by domain experts, assuring the quality of our dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries has revealed background explanation as a key strategy to increase accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification by adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for the task of background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval.
... We also added the sentence-level subjectivity measure from an LSTM-based model that was trained on a dataset released by Pang and Lee (2004). Finally, various readability indicators were used: Flesch-Kincaid (Kincaid et al. 1975; Flesch 1948), ARI (Smith and Senter 1967), Coleman-Liau (Coleman and Liau 1975), etc. They can be used in various downstream tasks and have proven to yield good results in many text classification problems. ...
Article
Full-text available
Fake news detection is an emerging topic that has attracted a lot of attention among researchers and in the industry. This paper focuses on fake news detection as a text classification problem: on the basis of five publicly available corpora with documents labeled as true or fake, the task was to automatically distinguish both classes without relying on fact-checking. The aim of our research was to test the feasibility of a universal model: one that produces satisfactory results on all data sets tested in our article. We attempted to do so by training a set of classification models on one collection and testing them on another. As it turned out, this resulted in a sharp performance degradation. Therefore, this paper focuses on finding the most effective approach to utilizing information in a transferable manner. We examined a variety of methods: feature selection, machine learning approaches to data set shift (instance re-weighting and projection-based), and deep learning approaches based on domain transfer. These methods were applied to various feature spaces: linguistic and psycholinguistic, embeddings obtained from the Universal Sentence Encoder, and GloVe embeddings. A detailed analysis showed that some combinations of these methods and selected feature spaces bring significant improvements. When using linguistic data, feature selection yielded the best overall mean improvement (across all train-test pairs) of 4%. Among the domain adaptation methods, the greatest improvement of 3% was achieved by subspace alignment.
... Traditional readability formulas. Three established heuristics-based readability metrics are adopted: the Flesch-Kincaid Grade (Kincaid et al., 1975), the Coleman-Liau Index (Coleman and Liau, 1975), and the Automated Readability Index (ARI) (Senter and Smith, 1967), which are used to approximate the U.S. grade level required to understand a written text. These metrics rely on shallow features like the length of a sentence or the number of characters in words and thus are unable to fully reveal the gap between the summaries of biomedical documents at different readability levels (Gao et al., 2019). ...
Preprint
Unlike general documents, biomedical texts vary considerably in how easily people can understand them, owing to their highly technical nature and the variance in readers' domain knowledge. However, existing biomedical document summarization systems have paid little attention to readability control, leaving users with summaries that are incompatible with their levels of expertise. In recognition of this urgent demand, we introduce a new task of readability-controllable summarization for biomedical documents, which aims to recognise users' readability demands and generate summaries that better suit their needs: technical summaries for experts and plain language summaries (PLS) for laymen. To establish this task, we construct a corpus consisting of biomedical papers with technical summaries and PLSs written by the authors, and benchmark multiple advanced controllable abstractive and extractive summarization models based on pre-trained language models (PLMs) with prevalent controlling and generation techniques. Moreover, we propose a novel masked language model (MLM) based metric and its variant to effectively evaluate the readability discrepancy between lay and technical summaries. Experimental results from automated and human evaluations show that though current control techniques allow for a certain degree of readability adjustment during generation, the performance of existing controllable summarization methods is far from desirable in this task.
... Only a few works have studied the correlations between headlines and their linked articles for clickbait identification. Biyani et al. [12] adopted n-gram-based gradient-boosted decision trees (GBDT) to evaluate information similarities, along with various measurements such as Coleman-Liau scores [54] and RIX and LIX indices [55]. Kumar et al. [13] applied Siamese networks to determine the textual similarity between headlines and linked articles. ...
Article
Full-text available
Clickbait is a commonly used social engineering technique to carry out phishing attacks, illegitimate marketing, and dissemination of disinformation. As a result, clickbait detection has become a popular research topic in recent years due to the prevalence of clickbait on the web and social media. In this article, we propose a novel attention-based neural network for the task of clickbait detection. To the best of our knowledge, our work is the first that incorporates human semantic knowledge into an artificial neural network, and uses linguistic knowledge graphs to guide attention mechanisms for the clickbait detection task. Extensive experimental results show that the proposed model outperforms existing state-of-the-art clickbait classifiers, even when training data is limited. The proposed model also performs better or comparably to powerful pre-trained models, namely, BERT, RoBERTa, and XLNet, while being much more lightweight. Furthermore, we conducted experiments to demonstrate that the use of human semantic knowledge can significantly enhance the performance of pre-trained models in the semi-supervised domain such as BERT, RoBERTa, and XLNet.
... The Coleman-Liau index (Índice de Coleman-Liau) follows the criterion adopted by the ARI method, in the sense that it was developed with the purpose of being an index that is easy to implement computationally. Its formula, created by Meri Coleman and T. L. Liau, is given by [9]

Coleman-Liau grade level = 5.88 × (letters / words) − 29.6 × (sentences / words) − 15.8. ...
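As a consistency check (the coefficients above are reconstructed here, since they were garbled in extraction), the per-word form is the same as the per-100-words form quoted later in this listing, with the factor of 100 folded into the coefficients:

\[
5.88\,\frac{\text{letters}}{\text{words}} \;-\; 29.6\,\frac{\text{sentences}}{\text{words}} \;-\; 15.8 \;=\; 0.0588\,L \;-\; 0.296\,S \;-\; 15.8,
\]

where \(L = 100 \times \text{letters}/\text{words}\) and \(S = 100 \times \text{sentences}/\text{words}\) are the letters and sentences per 100 words.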
Preprint
From the earliest stages of human life, communication, seen as a process of social interaction, has always been the best way to reach consensus between parties. Understanding and credibility in this process are essential for mutual agreement to be validated. But how can such communication reach the broad masses? This is the main challenge when the goal is the dissemination of information and its acceptance. In this context, this study presents the ALT software, developed from original readability metrics adapted to the Portuguese language and available on the web, to reduce communication difficulties. The development of the software was motivated by Habermas's theory of communicative action, which takes a multidisciplinary approach to measuring the credibility of discourse in the communication channels used to build and maintain a safe and healthy relationship with the public.
... The reading ease score indicates the readability of a text by computing the average length of the sentences and the average number of syllables per word. A lower score indicates that more complicated and longer words are used in the text, making it more difficult for readers to process (Coleman and Liau 1975). ...
Preprint
COVID-19 poses disproportionate mental health consequences to the public during different phases of the pandemic. We use a computational approach to capture the specific aspects that trigger an online community's anxiety about the pandemic and investigate how these aspects change over time. First, we identified nine subjects of anxiety (SOAs) in a sample of Reddit posts ($N$=86) from r/COVID19\_support using thematic analysis. Then, we quantified Reddit users' anxiety by training algorithms on a manually annotated sample ($N$=793) to automatically label the SOAs in a larger chronological sample ($N$=6,535). The nine SOAs align with items in various recently developed pandemic anxiety measurement scales. We observed that Reddit users' concerns about health risks remained high in the first eight months of the pandemic. These concerns diminished dramatically despite the surge of cases occurring later. In general, users' language disclosing the SOAs became less intense as the pandemic progressed. However, worries about mental health and the future increased steadily throughout the period covered in this study. People also tended to use more intense language to describe mental health concerns than health risks or death concerns. Our results suggest that this online group's mental health condition does not necessarily improve despite COVID-19 gradually weakening as a health threat due to appropriate countermeasures. Our system lays the groundwork for population health and epidemiology scholars to examine aspects that provoke pandemic anxiety in a timely fashion.
... Readability: These measure the ease with which one can expect a text to be comprehended. Included are features such as CLI (Coleman and Liau 1975), and the Gunning fog index (Gunning 1969). Prior literature, including Potthast et al. (2017), have used a similar set of readability features for stylometric analysis. ...
Preprint
Full-text available
Content has historically been the primary lens used to study language in online communities. This paper instead focuses on the linguistic style of communities. While we know that individuals have distinguishable styles, here we ask whether communities have distinguishable styles. Additionally, while prior work has relied on a narrow definition of style, we employ a broad definition involving 262 features to analyze the linguistic style of 9 online communities from 3 social media platforms discussing politics, television and travel. We find that communities indeed have distinct styles. Also, style is an excellent predictor of group membership (F-score 0.952 and Accuracy 96.09%). While on average it is statistically equivalent to predictions using content alone, it is more resilient to reductions in training data.
... where C denotes the number of digits and letters, W the number of spaces, and S the number of sentences [118]. CLI, on the other hand, suggested by Meri Coleman and T. L. Liau [123], is given in Eq. (6). ...
Article
Full-text available
Phishing attacks are still seen as a significant threat to cyber security, and large parts of the industry rely on anti-phishing simulations to minimize the risk imposed by such attacks. This study conducted a large-scale anti-phishing training with more than 31,000 participants and 144 different simulated phishing attacks to develop a data-driven model to classify how users would perceive a phishing simulation. Furthermore, we analyze the results of our large-scale anti-phishing training and give novel insights into users' click behavior. Analyzing our anti-phishing training data, we find that 66% of users do not fall victim to credential-based phishing attacks even after being exposed to twelve weeks of phishing simulations. To further enhance the effectiveness of phishing awareness training, we developed a novel manifold-learning-powered machine learning model that can predict how many people would fall for a phishing simulation using several structural and state-of-the-art NLP features extracted from the emails. In this way, we present a systematic approach for training implementers to estimate the average "convincing power" of emails prior to rolling them out. Moreover, we reveal the most vital factors in the classification. In addition, our model presents significant benefits over traditional rule-based approaches in classifying the difficulty of phishing simulations. Our results clearly show that anti-phishing training should focus on the training of individual users rather than on large user groups. Additionally, we present a promising generic machine learning model for predicting phishing susceptibility.
... Readability evaluation: Beyond how much information is retained in the summary, we are also interested in assessing the ease with which a reader can understand a passage, defined as readability. We use three standard metrics to evaluate readability: Flesch-Kincaid grade level (Kincaid et al. 1975), Gunning fog index (Gunning 1952), and Coleman-Liau index (Coleman and Liau 1975). These scores are computed using textstat; in their formulae, complex words are those with three or more syllables. ...
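Since these snippets compute the scores with the textstat Python package, here is a brief usage sketch; the sample string is illustrative, and the three calls are textstat's standard grade-level functions:

import textstat

sample = ("Health literacy has emerged as a crucial factor in making "
          "appropriate health decisions and ensuring treatment outcomes.")

# Each call returns an approximate U.S. grade level for the text.
print(textstat.flesch_kincaid_grade(sample))
print(textstat.gunning_fog(sample))
print(textstat.coleman_liau_index(sample))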
Article
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes. However, medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret. Thus, there is an urgent unmet need for automated methods to enhance the accessibility of the biomedical literature to the general population. This problem can be framed as a type of translation problem between the language of healthcare professionals, and that of the general public. In this paper, we introduce the novel task of automated generation of lay language summaries of biomedical scientific reviews, and construct a dataset to support the development and evaluation of automated methods through which to enhance the accessibility of the biomedical literature. We conduct analyses of the various challenges in performing this task, including not only summarization of the key points but also explanation of background knowledge and simplification of professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with reference summaries developed for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). We also discuss the limitations of the current effort, providing insights and directions for future work.
... Textual web content with a higher reading level may be considered more professional, and therefore of higher quality, impacting credibility assessments. Using the readability library to compare various readability grades, we find that the Coleman-Liau index [34] performs best and use it in our signal evaluation. The same readability index is also utilised by [35] and [33]. ...
Chapter
Full-text available
The credibility and trustworthiness of online content has become a major societal issue as human communication and information exchange continues to evolve digitally. The prevalence of misinformation, circulated by fraudsters, trolls, political activists and state-sponsored actors, has motivated a heightened interest in automated content evaluation and curation tools. We present an automated credibility evaluation system to aid users in credibility assessments of web pages, focusing on the automated analysis of 23 mostly language- and content-related credibility signals of web content. We find that emotional characteristics, various morphological and syntactical properties of the language, and exclamation mark and all caps usage are particularly indicative of credibility. Less credible web pages have more emotional, shorter and less complex texts, and put a greater emphasis on the headline, which is longer, contains more all caps and is frequently clickbait. Our system achieves a 63% accuracy in fake news classification, and a 28% accuracy in predicting the credibility rating of web pages on a five-point Likert scale.
... Readability analysis was performed using Readability Studio Professional Edition Version 2015 (Oleander Software, Ltd), applying nine validated formulas to quantify article readability: Coleman-Liau Index [15], Flesch-Kincaid Grade Level [16], FORCAST formula [17], Fry graph [18], Gunning Fog Index [19], New Dale-Chall [20], New Fog Count [16], Raygor Reading Estimate [21], and SMOG (Simple Measure of Gobbledygook) [22]. Nine different readability scales were used to minimize the bias that comes with using only one scale. ...
Preprint
BACKGROUND The COVID-19 pandemic spurred an increase in online information regarding disease spread and symptomatology. OBJECTIVE Our purpose is to systematically assess the quality and readability of articles resulting from frequently Google-searched COVID-19 terms in the United States. METHODS We used Google Trends to determine the 25 most commonly searched health-related phrases between February 29 and April 30, 2020. The first 30 search results for each term were collected, and articles were analyzed using the Quality Evaluation Scoring Tool (QUEST). Three raters scored each article on authorship, attribution, conflict of interest, currency, complementarity, and tone. A readability analysis was also conducted. RESULTS In total, 709 articles were screened, and 195 fulfilled inclusion criteria. The mean article score was 18.4 (SD 2.6) out of 28, with 7% (14/189) scoring in the top quartile. National news outlets published the largest share (70/189, 36%) of articles. Peer-reviewed journals attained the highest average QUEST score compared to national/regional news outlets, national/state government sites, and global health organizations (all P<.05). The average reading level was 11.7 (SD 1.9, range 5.4-16.9). Only 3 (1.6%) articles were written at the recommended sixth grade level. CONCLUSIONS COVID-19-related articles vary widely in their attributes and levels of bias, and would benefit from revisions for increased readability.
... In order to measure the ease of comprehension of the content of the e-learning courses, five readability formulas were deployed in the present study. Among such indices, the most popular, and those within the scope of the present study, are the Automated Readability Index (ARI) [5], Flesch Reading Index (FRI) [13], Coleman-Liau Index (CLI) [29], Gunning Fog Index (GFI) [30], and Flesch-Kincaid Grade Level (FKGL) [32]. It is notable that FRI is also called Flesch Reading Ease (FRE) [13]. ...
... Prior work has shown that readability can indicate the quality and the validity of the work items (Zimmermann et al. 2010; Fan et al. 2018). Similar to prior work (Fan et al. 2018), we use seven readability metrics, i.e., Flesch (read-flesch; Flesch 1948), Fog (read-fog; Gunning 1952), Lix (read-lix; Anderson 1983), Flesch-Kincaid (read-kincaid; Kincaid et al. 1975), Automated Readability Index (read-ari; Senter and Smith 1967), Coleman-Liau (read-coleman-liau; Coleman and Liau 1975), and SMOG (read-smog; McLaughlin 1969). These metrics indicate the education level required to comprehend the text based on the number of syllables, words, and length of sentences. ...
Article
Full-text available
Story Points (SP) are an effort unit that is used to represent the relative effort of a work item. In Agile software development, SP allows a development team to estimate their delivery capacity and facilitate the sprint planning activities. Although Agile embraces changes, SP changes after the sprint planning may negatively impact the sprint plan. To minimize the impact, there is a need to better understand the SP changes and an automated approach to predict the SP changes. Hence, to better understand the SP changes, we examine the prevalence, accuracy, and impact of information changes on SP changes. Through the analyses based on 19,349 work items spread across seven open-source projects, we find that on average, 10% of the work items have SP changes. These work items typically have SP value increased by 58%-100% relative to the initial SP value when they were assigned to a sprint. We also find that the unchanged SP reflect the development time better than the changed SP. Our qualitative analysis shows that the work items with changed SP often have information changes relating to updating the scope of work. Our empirical results suggest that SP and the scope of work should be reviewed prior to or during sprint planning to achieve a reliable sprint plan. Yet, it could be a tedious task to review all work items in the product (or sprint) backlog. Therefore, we develop a classifier to predict whether a work item will have SP changes after being assigned to a sprint. Our classifier achieves an AUC of 0.69-0.8, which is significantly better than the baselines. Our results suggest that to better manage and prepare for the unreliability in SP estimation, the team can leverage our insights and the classifier during the sprint planning. To facilitate future studies, we provide the replication package and the datasets, which are available online.
... • Coleman-Liau: Coleman-Liau readability score (Coleman and Liau, 1975). ...
Conference Paper
Full-text available
This paper describes the winning approach in the first automated German text complexity assessment shared task as part of KONVENS 2022. To solve this difficult problem, the evaluated system relies on an ensemble of regression models that successfully combines both traditional feature engineering and pre-trained resources. Moreover, the use of adversarial validation is proposed as a method for countering the data drift identified during the development phase, thus helping to select relevant models and features and avoid leaderboard overfitting. The best submission reached 0.43 mapped RMSE on the test set during the final phase of the competition.
... The CLI was developed as a readability formula that lends itself to machine-assisted scoring [29]. The CLI yields a US school grade reading level based on the average number of letters per 100 words and the average number of sentences per 100 words. ...
Article
Full-text available
Self-report measures are central in capturing young people’s perspectives on mental health concerns and treatment outcomes. For children and adolescents to complete such measures meaningfully and independently, the reading difficulty must match their reading ability. Prior research suggests a frequent mismatch for mental health symptom measures. Similar analyses are lacking for measures of Quality of Life (QoL). We analysed the readability of 13 commonly used QoL self-report measures for children and adolescents aged 6 to 18 years by computing five readability formulas and a mean reading age across formulas. Across measures, the mean reading age for item sets was 10.7 years (SD = 1.2). For almost two-thirds of the questionnaires, the required reading age exceeded the minimum age of the target group by at least one year, with an average discrepancy of 3.0 years (SD = 1.2). Questionnaires with matching reading ages primarily targeted adolescents. Our study suggests a frequent mismatch between the reading difficulty of QoL self-report measures for pre-adolescent children and this group’s expected reading ability. Such discrepancies risk undermining the validity of measurement, especially where children also have learning or attention difficulties. Readability should be critically considered in measure development, as one aspect of the content validity of self-report measures for youth.
... The result of the formula represents the American school grade that the reader should be in, in order to understand the text. In the same year, another formula was devised by [17], the output of which represents the grade level of the text. In 2006, Weir and Ritchie proposed the Strathclyde readability measure that focused on the frequency of a word [18]. ...
Article
Full-text available
Accessibility of text is an attribute that deserves the attention of researchers and content creators. This study is an attempt to determine the lexical features that play a key role in identifying complex words in Hindi text. As the first step, we studied the parameters used in readability metrics in different languages and tested their importance on classifiers built on datasets created with the help of a user study. In part of the study, we reported the results of two different approaches used to label a word as complex. In this part, we compare the previous results with the results obtained from a third labeling approach. We found satisfactory evidence for certain parameters and also observed a new parameter that could be used while devising readability metrics for Hindi.
... At this stage, readability and the degree of change are used to evaluate the lexical simplification task. We use three different readability indices, Flesch-Kincaid (Kincaid, Fishburne, Rogers, & Chissom, 1975), Gunning-Fog (Gunning, 1952), and Coleman-Liau (Coleman & Liau, 1975), the mathematical formulations of which are provided in Equations (1) to (3) ...
Article
Purpose Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging obstacles in health information dissemination to consumers by healthcare providers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions. Design/methodology/approach The authors propose a natural language processing pipeline that is capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. Particularly, the authors develop an unsupervised keyword matching method to extract relevant information from clinical notes. To automatically evaluate the completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify lexicons in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services. Findings The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes so that the converted notes have a high level of readability but a low degree of meaning change. Social implications The proposed pipeline can help healthcare consumers understand their medical information well and therefore strengthen communication between healthcare providers and consumers for better care. Originality/value An innovative pipeline approach is developed to address the health literacy problem confronted by healthcare providers and consumers in the ongoing digital transformation process in the healthcare industry.
... To demonstrate that our results do not depend on this choice, we employ a variety of alternatives. The most common measures for linguistic complexity are the Flesch Reading Ease (Flesch, 1948), the Gunning Fog Index (Gunning, 1952), the SMOG Index (McLaughlin, 1969), the Coleman-Liau Index (Coleman and Liau, 1975), and the Automated Readability Index (Senter and Smith, 1967). Table 6 sets out the respective definitions. ...
Article
We empirically examine how the complexity of ECB communications affects financial market trading based on high-frequency data from European stock index futures trading between 2009 and 2017. Analysing the linguistic complexity of the ECB’s introductory statements and differentiating between press conferences with and without announcements of unconventional monetary policy measures (UMPM), we find that more complex communication, i.e. high linguistic complexity and UMPM-announcement, is associated with a lower level of contemporaneous trading activity. Moreover, complex communication leads to a temporal shift in trading activity towards the subsequent Q&A session, which suggests that Q&A sessions facilitate market participants’ information processing.
... The Coleman-Liau index (ICL) [26] was developed for computer analysis of text readability using word processors and is mostly used in specialized software for translation analysis:

ICL = 0.0588 × L − 0.296 × S − 15.8,

where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. ...
Article
Full-text available
Research on the development of methods for identifying signs of hidden manipulation (destructive information and psychological impact) in text messages that are published on Internet sites and distributed among users of social networks is relevant. One of the main problems in the development of these methods is the difficulty of formalizing the process of identifying signs of manipulation in text messages of social network agents. To do this, based on morphological synthesis, it is necessary to determine relevant indicators for analyzing text messages and criteria for making a decision about the presence of signs of manipulation in text messages. Based on morphological synthesis, a method for determining manipulation indicators in text messages was developed, taking into account the achievements of modern technologies of intelligent content analysis of text messages, machine learning methods, fuzzy logic, and computational linguistics, which made it possible to reasonably determine a group of indicators for evaluating text messages for signs of manipulation. The stages of the method include evaluating the text message at the level of perception by the indicator of text readability, at the phonetic level by the indicator of emotional impact on the subconscious, and at the graphic level by the indicator of text marking intensity, and then calculating the integral indicator for making a decision about the presence of manipulation in the text message. Based on the proposed method, specialized software was developed that provided 13% greater accuracy in evaluating messages for manipulative impact compared to the known method of expert evaluations, reducing the influence of the subjective factor on the evaluation result.
... For that purpose, the measurements are explainable and can be interpreted in a follow-up qualitative manual analysis. The text elements computed by Udat are described in detail in (Shamir, 2020). In summary, they include the following: Readability: the Automated Readability Index (Smith and Senter, 1967) and the Coleman-Liau index (Coleman and Liau, 1975) are established methods for estimating the level of difficulty of reading the text. Both methods are based on the length of words and length of sentences, which are expected to provide an indication of the level of reading difficulty. ...
Article
Popular music lyrics exhibit clear differences between songwriters. This study describes a quantitative approach to the analysis of popular music lyrics. The method uses explainable measurements of the lyrics and therefore allows the use of quantitative measurements for consequent qualitative analyses. This study applies the automatic quantitative text analytics to 18,577 songs from 89 popular music artists. The analysis quantifies different elements of the lyrics that might be impractical to measure manually. The analysis includes basic supervised machine learning, and the explainable nature of the measurements also allows to identify specific differences between the artists. For instance, the sentiments expressed in the lyrics, the diversity in the selection of words, the frequency of gender-related words, and the distribution of the sounds of the words show differences between popular music artists. The analysis also shows a correlation between the easiness of readability and the positivity of the sentiments expressed in the lyrics. The analysis can be used as a new approach to studying popular music lyrics. The software developed for the study is publicly available and can be used for future studies of popular music lyrics.
... In addition, the approach is not adaptable to different educational/domain contexts that can vary greatly. Therefore, automatic quantitative formulas of lexical complexity such as the Lexile reader measure (Stenner, 1996), the Flesch-Kincaid index (Kincaid, Fishburne, Rogers, & Chissom, 1975) or the Coleman-Liau index (Coleman & Liau, 1975) have become viable alternatives. These formulas provide automated estimates of text difficulty level based on the difficulty of the words and the sentences. ...
Article
Textual complexity is widely used to assess the difficulty of reading materials and writing quality in student essays. At a lexical level, word complexity can represent a building block for creating a comprehensive model of lexical networks that adequately estimates learners' understanding. In order to best capture how lexical associations are created between related concepts, we propose automated indices of word complexity based on Age of Exposure (AoE). AoE indices computationally model the lexical learning process as a function of a learner's experience with language. This study describes a proof of concept based on a large-scale learning corpus (i.e., TASA). The results indicate that AoE indices yield strong associations with human ratings of age of acquisition, word frequency, entropy, and human lexical response latencies, providing evidence of convergent validity.
... Prior work has shown that readability can indicate the quality and the validity of the work items (Zimmermann et al., 2010; Fan et al., 2018). Similar to prior work (Fan et al., 2018), we use seven readability metrics, i.e., Flesch (read-flesch; Flesch 1948), Fog (read-fog; Gunning 1952), Lix (read-lix; Anderson 1983), Flesch-Kincaid (read-kincaid; Kincaid et al. 1975), Automated Readability Index (read-ari; Senter and Smith 1967), Coleman-Liau (read-coleman-liau; Coleman and Liau 1975), and SMOG (read-smog; McLaughlin 1969). These metrics indicate the education level required to comprehend the text based on the number of syllables, words, and length of sentences. ...
Preprint
Full-text available
Story Points (SP) are an effort unit that is used to represent the relative effort of a work item. In Agile software development, SP allows a development team to estimate their delivery capacity and facilitate the sprint planning activities. Although Agile embraces changes, SP changes after the sprint planning may negatively impact the sprint plan. To minimize the impact, there is a need to better understand the SP changes and an automated approach to predict the SP changes. Hence, to better understand the SP changes, we examine the prevalence, accuracy, and impact of information changes on SP changes. Through the analyses based on 13,902 work items spread across seven open-source projects, we find that on average, 10% of the work items have SP changes. These work items typically have SP value increased by 58%-100% relative to the initial SP value when they were assigned to a sprint. We also find that the unchanged SP reflect the development time better than the changed SP. Our qualitative analysis shows that the work items with changed SP often have information changes relating to updating the scope of work. Our empirical results suggest that SP and the scope of work should be reviewed prior to or during sprint planning to achieve a reliable sprint plan. Yet, it could be a tedious task to review all work items in the product (or sprint) backlog. Therefore, we develop a classifier to predict whether a work item will have SP changes after being assigned to a sprint. Our classifier achieves an AUC of 0.69-0.8, which is significantly better than the baselines. Our results suggest that to better manage and prepare for the unreliability in SP estimation, the team can leverage our insights and the classifier during the sprint planning. To facilitate future studies, we provide the replication package and the datasets, which are available online.
... RF shows an accuracy of 0.83 and an AUC of 0.93, which demonstrates its better performance and very high precision. To obtain these results, the following features were selected as the best features: the age of the account of the user who left the comment, the average word length and the number of difficult words in the comment, the average number of words per comment and posts per author, comment karma, the number of data science terms, the polarity of the comment, several readability scores (the Coleman-Liau index (Coleman & Liau, 1975), the Dale-Chall readability score (Dale & Chall, 1948) and the Spache readability formula (Spache, 1953)), and term frequency-inverse document frequency derived features. These exact features are important for detecting experts because they represent essential characteristics of the user who wrote the comment and of the comment itself. ...
Chapter
Full-text available
This chapter uncovers the opportunities that online media portals like content sharing and consumption sites or photography sites have for informal learning. The authors explored online portals that can provide evidence of evaluating, inferring, measuring skills, and/or contributing to the development of competencies and capabilities of the 21st century with two case studies. The first one is focused on identifying data science topical experts across the Reddit community. The second one uses online Flickr data to apply a model on the photographs to rate them as aesthetically attractive and technically sound, which can serve as a base for measuring the photography capabilities of Flickr users. The presented results can serve as a base to replicate these methodologies to infer other competencies and capabilities across online portals. This novel approach can be an effective alternative evaluation of key 21st century skills for the modern workforce with multiple applications.
Article
Maritime accident reporting is performed as a means for experience feedback within and across organizations. While the quality and representativeness of the findings are critical to prevent similar accidents from occurring in the future, various contextual factors concerning the reports can affect the ability of various actors to use these effectively as a basis for learning and action. Research suggests that the readability of safety documents is essential to their successful adoption and use. However, there is currently no empirical knowledge about the readability of maritime accident reports. Consequently, this study presents a comparative analysis of quantitative readability metrics of maritime accident reports. Three years of data extracted from reports by five English-language national accident investigation authorities and one industry reporting system are used. The results show that the language used is commonly at the post-secondary reading level. Reports by the Nautical Institute's Mariners' Alerting and Reporting Scheme are written at a high school level and thus easier to read. The statistical variation in readability across reports by different organizations is significant. Implications for future research and practice are discussed. The main recommendation for reporting organizations is to be mindful of language complexity and simplify where possible.
Article
Full-text available
ABSTRACT Well-developed reading skills are endorsed in the National Curriculum and Assessment Policy Statement (CAPS) as essential for successful learning, as well as for full participation in the community and the working world (DBE, 2011). Despite the importance of effective reading skills, the reading standard of intermediate-phase learners remains alarmingly low when the results of international reading assessments are taken into account. These results make it clear that learners do not have adequate reading comprehension when they are assessed on the content of a reading text. Although clear criteria for text selection exist, the cause of the poor reading comprehension can possibly be attributed to text selection. For this article, the readability level of prescribed texts was investigated as a possible obstacle. Various readability indices were examined, specifically those that can be used for Afrikaans Home Language to measure text complexity. The readability of texts in home language textbooks (intermediate phase) was calculated. The investigation was approached from an interpretivist perspective and is qualitative in nature. First, an extensive literature review of the reading situation was undertaken with regard to intermediate-phase learners, texts, criteria for text selection, the readability of texts, and readability indices. Second, the CAPS (Afrikaans Home Language, intermediate phase) was analysed to establish its guidelines concerning texts, text selection, and readability. Finally, CAPS-accredited textbooks were analysed to determine the readability levels of selected texts in them. The overarching themes from these three data-collection methods were used to make suggestions to Afrikaans Home Language teachers and textbook compilers regarding self-directed text selection for intermediate-phase learners with the help of a readability index.
Article
This article investigates the syllabification of Sesotho words using a rule-based approach. A total of eleven syllabification rules are proposed based on Guma's (1982) three types of syllables, that is, consonant only (C), consonant and vowel (CV), and vowel only (V) syllable types. The syllabification rules are established using the South African Sesotho (SAS) orthography. The proposed syllabification rules are illustrated and applied to an extract from Masowa (2017). The outcomes indicate that the syllabification rules proposed in this article are sufficient for the syllabification of Sesotho words. Among other findings, we differ from Guma (1982) by proposing the removal of the /ny/ consonant digraph from the list of C syllable subtypes. Moreover, we extend Madigoe's (2003) list of CV syllable subtypes by adding two more CV syllable subtypes based on the number of consonants preceding the vowel in CV syllables. We believe that if the rules suggested in this article are applied correctly, the development of an automated syllabification system for Sesotho can be achieved.
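As an illustration of what a rule-based syllabifier for the three syllable types (C, CV, V) can look like, here is a deliberately simplified Python sketch. It implements only a toy subset of rules, not the eleven rules proposed in the article, and it does not treat digraphs such as /ny/ specially.

```python
# Toy rule-based syllabifier for C / CV / V syllable types; a simplification.
VOWELS = set("aeiou")

def syllabify(word: str) -> list[str]:
    w = word.lower()
    syllables, onset = [], ""
    for i, ch in enumerate(w):
        if ch in VOWELS:
            syllables.append(onset + ch)     # CV syllable (or V if no onset)
            onset = ""
        elif ch in "mn" and (i + 1 == len(w) or w[i + 1] not in VOWELS):
            syllables.append(onset + ch)     # syllabic nasal -> C syllable
            onset = ""
        else:
            onset += ch                      # accumulate onset consonants
    if onset:
        syllables.append(onset)              # trailing consonants (rare)
    return syllables

print(syllabify("motho"))  # -> ['mo', 'tho']
print(syllabify("nka"))    # -> ['n', 'ka']
```

Multi-letter onsets such as "th" fall out naturally from accumulating consonants until a vowel is reached; a full implementation would need the article's complete rule inventory.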
Article
Full-text available
Background/Objective Informed consent forms (ICFs) and practices vary widely across institutions. This project expands on previous work at the University of Arkansas for Medical Sciences (UAMS) Center for Health Literacy to develop a plain language ICF template. Our interdisciplinary team of researchers, composed of biomedical informaticists, health literacy experts, and stakeholders in the Institutional Review Board (IRB) process, has developed the ICF Navigator, a novel tool to facilitate the creation of plain language ICFs that comply with all relevant regulatory requirements. Methods Our team first developed requirements for the ICF Navigator tool. The tool was then implemented by a technical team of informaticists and software developers, in consultation with an informed consent legal expert. We developed and formalized a detailed knowledge map modeling regulatory requirements for ICFs, which drives workflows within the tool. Results The ICF Navigator is a web-based tool that guides researchers through creating an ICF as they answer questions about their project. The navigator uses those responses to produce a clear and compliant ICF, displaying a real-time preview of the final form as content is added. Versioning and edits can be tracked to facilitate collaborative revisions by the research team and communication with the IRB. The navigator helps guide the creation of study-specific language, ensures compliance with regulatory requirements, and ensures that the resulting ICF is easy to read and understand. Conclusion The ICF Navigator is an innovative, customizable, open-source software tool that helps researchers produce custom readable and compliant ICFs for research studies involving human subjects.
Article
The increase in healthcare coverage for transgender populations has made facial feminization surgeries (FFS) more accessible. The majority of patients interested in surgery regularly consult online medical information to help them understand surgical procedures, risks, and recovery. National health organizations recommend that patient information material be written at a sixth-grade reading level, but online material often surpasses patient health literacy. This study evaluates the readability of online FFS resources. An Internet search of the top 100 Web sites was conducted using the keywords “facial feminization surgery.” Web sites were analyzed for relevant patient information articles on FFS and categorized into health care and nonhealth care groups. Readability examinations were performed on the written text using the Automated Readability Index, Coleman-Liau Index, Flesch-Kincaid Grade Level, Gunning Fog Index, and Simple Measure of Gobbledygook Index. Statistical analysis was performed using 2-tailed z tests, with statistical significance set at P≤0.05. A total of 100 articles from 100 Web sites were examined. The average readability of all online FFS resources was at a 12th-grade writing level. Articles from health care organizations were at a 13th-grade reading level and nonhealth care organization articles were at a 12th-grade reading level (P<0.01). Online patient information for FFS is more complex than nationally recommended writing levels, which may interfere with patient decision making and outcomes. Patient resources for FFS should be written at a lower reading level to promote patient education, satisfaction, and compliance.
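For readers who want to reproduce this kind of readability battery, all five formulas named above are available in the third-party Python package `textstat` (pip install textstat). A brief sketch; the sample sentence is invented, not drawn from the study:

```python
# Scoring one passage with the five indices used in the study, via textstat.
import textstat

text = "Facial feminization surgery reshapes facial features to appear more feminine."

for name, fn in [
    ("Automated Readability Index", textstat.automated_readability_index),
    ("Coleman-Liau Index", textstat.coleman_liau_index),
    ("Flesch-Kincaid Grade Level", textstat.flesch_kincaid_grade),
    ("Gunning Fog Index", textstat.gunning_fog),
    ("SMOG Index", textstat.smog_index),
]:
    print(f"{name}: {fn(text):.1f}")  # each approximates a U.S. grade level
```

Note that SMOG was designed for samples of 30 or more sentences, so single-sentence scores from any such tool should be treated as rough indications only.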
Article
Objectives: Mobile applications (apps) are multiplying in laryngology, with little standardization of content, functionality, or accessibility. The purpose of this study is to evaluate the quality, functionality, health literacy, readability, accessibility, and inclusivity of laryngology mobile applications. Methods: Of the 3230 apps identified from the Apple and Google Play stores, 28 patient-facing apps met inclusion criteria. Apps were evaluated using validated scales assessing quality and functionality: the Mobile App Rating Scale (MARS) and the Institute for Healthcare Informatics App Functionality Scale. The CDC Clear Communication Index, the Institute of Medicine's Strategies for Creating Health Literate Mobile Applications, and the Patient Education Materials Assessment Tool (PEMAT) were used to evaluate the apps' health literacy level. Readability was assessed using established readability formulas. Apps were evaluated for language, accessibility features, and representation of a diverse population. Results: Twenty-six apps (92%) had adequate quality (MARS score > 3). The mean PEMAT score was 89% for actionability and 86% for understandability. On average, apps utilized 25 of 33 health literate strategies. Twenty-two apps (79%) did not pass the CDC index threshold of 90% for health literacy. Twenty-four app descriptions (86%) were above an 8th-grade reading level. Only 4 apps (14%) showed diverse representation, 3 (11%) had non-English language functions, and 2 (7%) offered subtitles. Inter-rater reliability for MARS was adequate (CA-ICC = 0.715). Conclusion: While most apps scored well in quality and functionality, many laryngology apps did not meet standards for health literacy. Most apps were written at a reading level above the national average, lacked accessibility features, and did not represent diverse populations.
Article
Full-text available
The issue of web accessibility is prevalent in society. However, few studies have looked at social interactions on Twitter associated with the issue. Sentiment analysis and readability analysis were used to assess the emotions reflected in the tweets and to determine whether the tweets were easy to understand or not. In addition, the relationship between the features of the tweets and their readability was also assessed using statistical analysis techniques. A total of 11,483 tweets associated with web accessibility were extracted and analysed using sentiment and statistical analysis. For readability analysis, 200 randomly selected tweets from the dataset were used. Sentiment analysis highlighted that overall, the tweets reflected a positive sentiment, with 'trust' being the highest-scoring emotion. The most common words and hashtags show a focus on technology and the inclusion of various users. Readability analysis showed that the 200 selected tweets had a level of reading difficulty associated with the readability level of college students.
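A minimal sketch of tweet-level sentiment scoring in this spirit, assuming NLTK's VADER analyzer (not necessarily the tool the authors used); the example tweets are invented:

```python
# Per-tweet sentiment with NLTK's VADER; compound score ranges over [-1, 1].
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
tweets = [
    "Great to see #WebAccessibility baked into the new release!",
    "Still can't use this site with a screen reader. Frustrating.",
]
for t in tweets:
    scores = sia.polarity_scores(t)   # dict with neg/neu/pos and a compound score
    print(scores["compound"], t)
```

Detecting named emotions such as 'trust' would require an emotion lexicon (for example, the NRC lexicon) rather than a polarity analyzer, so this sketch covers only the positive/negative dimension.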
Article
Questionnaire designers use readability measures to ensure that questions can be understood by the target population. The most common measure is the Flesch-Kincaid Grade level, but other formulas exist. This article compares six different readability measures across 150 questions in a self-administered questionnaire, finding notable variation in calculated readability across measures. Some question formats, including those that are part of a battery, require important decisions that have large effects on the estimated readability of survey items. Other question evaluation tools, such as the Question Understanding Aid (QUAID) and the Survey Quality Predictor (SQP), may identify similar problems in questions, making readability measures less useful. We find little overlap between QUAID, SQP, and the readability measures, and little differentiation in the tools’ prediction of item nonresponse rates. Questionnaire designers are encouraged to use multiple question evaluation tools and develop readability measures specifically for survey questions.
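The variation the authors report is easy to demonstrate: scoring a single survey question with several formulas typically yields a spread of estimated grade levels. A sketch assuming the `textstat` package, with an invented question:

```python
# One survey question, several readability formulas, and the spread among them.
import textstat
from statistics import pstdev

question = ("In the past 12 months, how often did you postpone "
            "medical care because of cost?")

measures = {
    "Flesch-Kincaid": textstat.flesch_kincaid_grade(question),
    "Coleman-Liau": textstat.coleman_liau_index(question),
    "Gunning Fog": textstat.gunning_fog(question),
    "ARI": textstat.automated_readability_index(question),
}
print(measures)
print("spread (std dev):", round(pstdev(measures.values()), 2))
```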
Article
Purpose: New therapies for retinitis pigmentosa (RP) have led to patients desiring more information about their disease. We assessed the readability, content, and accountability of online health information for RP and its treatments. Methods: Two internet queries were performed: one pertaining to the condition RP, and another pertaining to treatments of RP. Three analyses were performed on the top search results that met eligibility criteria: (1) A readability analysis produced an average reading level; (2) A content analysis was conducted to score each source on the accuracy, completeness, clarity, and organization of the content; and (3) An accountability analysis was performed to evaluate adherence to accountability benchmarks, including authorship, attribution, disclosure, and currency. Results: The mean reading level was 12.0 (SD = 3.2, 95% CI = 11.0-13.0) for the 8 RP webpages and 12.5 (SD = 3.1, 95% CI = 11.7-13.4) for the 10 RP treatment webpages. The mean content score for RP sites was 21.3 of 32 points (SD = 4.1, 95% CI = 19.5-23.0). The mean content score for RP treatment sites was 5.5 out of 16 points (SD = 3.7, 95% CI = 4.1-6.9). The inter-rater reliability was 0.973 (Cronbach's alpha). For RP sites, the mean accountability score was 2.6 out of 4 points (SD = 0.9, 95% CI = 1.9-3.4). For RP treatment sites, the mean accountability score was 2 out of 4 points (SD = 0.9, 95% CI = 1.4-2.6). Conclusion: Our data suggest that the online information available to patients regarding RP and RP treatment options exceeds the AMA-recommended sixth-grade reading level and contains gaps in content relevant to patients.
Article
Objective Plain language in medicine has long been advocated as a way to improve patient understanding and engagement. As the field of Natural Language Processing has progressed, increasingly sophisticated methods have been explored for the automatic simplification of existing biomedical text for consumers. We survey the literature in this area with the goals of characterizing approaches and applications, summarizing existing resources, and identifying remaining challenges. Materials and Methods We search English language literature using lists of synonyms for both the task (eg, “text simplification”) and the domain (eg, “biomedical”), and searching for all pairs of these synonyms using Google Scholar, Semantic Scholar, PubMed, ACL Anthology, and DBLP. We expand search terms based on results and further include any pertinent papers not in the search results but cited by those that are. Results We find 45 papers that we deem relevant to the automatic simplification of biomedical text, with data spanning 7 natural languages. Of these (nonexclusively), 32 describe tools or methods, 13 present data sets or resources, and 9 describe impacts on human comprehension. Of the tools or methods, 22 are chiefly procedural and 10 are chiefly neural. Conclusions Though neural methods hold promise for this task, scarcity of parallel data has led to continued development of procedural methods. Various low-resource mitigations have been proposed to advance neural methods, including paragraph-level and unsupervised models and augmentation of neural models with procedural elements drawing from knowledge bases. However, high-quality parallel data will likely be crucial for developing fully automated biomedical text simplification.
Article
Clickbaits are articles with misleading titles that exaggerate the content on the landing page. Their goal is to entice users to click on the title in order to monetize the landing page. The content on the landing page is usually of low quality. Their presence in the homepage stream of news aggregator sites (e.g., Yahoo News, Google News) may adversely impact user experience. Hence, it is important to identify and demote or block them on homepages. In this paper, we present a machine-learning model to detect clickbaits. We use a variety of features and show that the degree of informality of a webpage (as measured by different metrics) is a strong indicator of it being a clickbait. We conduct extensive experiments to evaluate our approach and analyze properties of clickbait and non-clickbait articles. Our model achieves high performance (74.9% F1 score) in predicting clickbaits.
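A toy version of such a classifier, sketching the shape of the approach rather than the authors' actual model: the two surface features below (title length and exclamation/question marks) are invented stand-ins for the paper's informality metrics, and the training data is fabricated for illustration.

```python
# Toy clickbait classifier: hand-crafted informality-style features + logistic regression.
from sklearn.linear_model import LogisticRegression

def features(title: str) -> list[float]:
    return [len(title.split()),                       # title word count
            sum(title.count(c) for c in "!?")]        # exclamation/question marks

titles = [
    ("You won't BELIEVE what happened next!", 1),
    ("10 tricks doctors don't want you to know!!", 1),
    ("Fed raises interest rates by 25 basis points", 0),
    ("Study links readability to citation counts", 0),
]
X = [features(t) for t, _ in titles]
y = [label for _, label in titles]

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("This one weird trick will shock you!")]))  # likely [1]
```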
Article
Community question answering (CQA) platforms are receiving increased attention and are becoming an indispensable source of information in different domains, ranging from board games to physics. The success of these platforms depends on how efficiently new questions are assigned to community experts, a task known as question routing. In this paper, we address the problem of question routing by adopting a learning-to-rank approach over five CQA websites, in the context of which we introduce 74 features and systematically classify them into content-based and social-based categories. Our extensive experiments on datasets from five real online question answering websites indicate that content-based features related to tags and topics, as well as social features related to user characteristics and user temporality, are effective for question routing. Our approach improves performance compared to state-of-the-art neural matchmaking methods while offering interpretability that those methods lack. The improvement is on average as high as 2.47% and 1.10% in terms of the common ranking metrics Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP), respectively, compared to our best baselines.
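For context on the metrics quoted, NDCG can be computed with scikit-learn; the relevance labels and model scores below are invented, standing in for one routed question and its candidate experts:

```python
# NDCG for a single routed question; higher means better-ranked experts.
import numpy as np
from sklearn.metrics import ndcg_score

# Five candidate experts: 2 = ideal answerer, 1 = plausible, 0 = irrelevant.
true_relevance = np.asarray([[2, 1, 0, 0, 1]])
model_scores   = np.asarray([[0.9, 0.2, 0.1, 0.4, 0.7]])

print("NDCG@5:", round(ndcg_score(true_relevance, model_scores, k=5), 3))
```

In a full evaluation these scores would be averaged over all test questions, and MAP computed analogously over binary relevance labels.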
Article
Review scores collect users’ opinions in a simple and intuitive manner. However, review scores are also easily manipulable, hence they are often accompanied by explanations. A substantial amount of research has been devoted to ascertaining the quality of reviews, to identify the most useful and authentic scores through explanation analysis. In this paper, we advance the state of the art in review quality analysis. We introduce a rating system to identify review arguments and to define an appropriate weighted semantics through formal argumentation theory. We introduce an algorithm to construct a corresponding graph, based on a selection of weighted arguments, their semantic distance, and the supported ratings. We also provide an algorithm to identify the model of such an argumentation graph, maximizing the overall weight of the admitted nodes and edges. We evaluate these contributions on the Amazon review dataset by McAuley et al. (2015), by comparing the results of our argumentation assessment with the upvotes received by the reviews. Also, we deepen the evaluation by crowdsourcing a multidimensional assessment of reviews and comparing it to the argumentation assessment. Lastly, we perform a user study to evaluate the explainability of our method, i.e., to test whether the automated method we use to assess reviews is understandable by humans. Our method achieves two goals: (1) it identifies reviews that are considered useful, comprehensible, and complete by online users, and does so in an unsupervised manner, and (2) it provides an explanation of quality assessments.
Article
Readability reflects the ease of reading a text; high readability indicates an easy text. Based on a corpus consisting of 71,628 abstracts published in SSCI journals in language and linguistics from 1991 to 2020, this paper employs nine readability indexes to analyze abstract readability and its relationship with citations. The results show that the readability of abstracts in journals of language and linguistics is low. Moreover, over the past 30 years, the readability of these abstracts has been decreasing. Meanwhile, readability is significantly negatively correlated with the number of citations, even though the effect size is very small. These results suggest that abstracts are very difficult to read; that they are becoming more difficult than before; and that the abstracts of articles with more citations appear to be less readable. Faced with decreasing readability, it is suggested that scholars take care to make themselves understood when expressing their ideas with jargon. This study not only has implications for scholars seeking to use linguistic features to improve readability, but also provides quantitative support for research on readability.
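The core analysis, correlating a readability score with citation counts, reduces to a rank correlation. A sketch with invented numbers (the paper's actual effect is negative but small; this toy data exaggerates it):

```python
# Rank correlation between a readability score and citation counts, via SciPy.
from scipy import stats

reading_ease = [45.2, 38.9, 52.1, 30.4, 41.7, 35.0]   # higher = easier (hypothetical)
citations    = [12,   30,   8,    55,   20,   26]      # hypothetical citation counts

rho, p = stats.spearmanr(reading_ease, citations)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")        # negative rho: harder reads, more cites
```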
Article
This paper examines whether the presence of risk management committees is associated with the readability of risk management disclosure. Specifically, we consider the presence and the effectiveness of risk management committees. We measure the readability of risk management disclosure using six different readability indices, namely: the Bog Index; the Flesch Reading Ease score; the Coleman–Liau Index; the Flesch–Kincaid Grade Level; the Simple Measure of Gobbledygook; and the Automated Readability Index. We find that the presence and the effectiveness of risk management committees are associated with higher readability of risk management disclosure. We adopt various methods, including an instrumental variable approach, the entropy balancing method and the dynamic generalised method of moments, to address endogeneity concerns. Taken together, our results highlight the important role of the risk management committee in communicating risk management information.
Article
Full-text available
Background The internet has become an increasingly popular resource among sports medicine patients seeking injury-related information. Numerous organizations recommend that patient educational materials (PEMs) should not exceed sixth-grade reading level. Despite this, studies have consistently shown the reading grade level (RGL) of PEMs to be too demanding across a range of surgical specialties. Purpose To determine the readability of online sports medicine PEMs. Study Design Cross-sectional study. Methods The readability of 363 articles pertaining to sports medicine from 5 leading North American websites was assessed using 8 readability formulas: Flesch-Kincaid Reading Grade Level, Flesch Reading Ease Score, Raygor Estimate, Fry Readability Formula, Simple Measure of Gobbledygook, Coleman-Liau Index, FORCAST Readability Formula, and Gunning Fog Index. The mean RGL of each article was compared with the sixth- and eighth-grade reading level in the United States. The cumulative mean website RGL was also compared among individual websites. Results The overall cumulative mean RGL was 12.2 (range, 7.0-17.7). No article (0%) was written at a sixth-grade reading level, and only 3 articles (0.8%) were written at or below the eighth-grade reading level. The overall cumulative mean RGL was significantly higher than the sixth-grade (95% CI for the difference, 6.0-6.5; P < .001) and eighth-grade (95% CI, 4.0-4.5; P < .001) reading levels. There was a significant difference among the cumulative mean RGLs of the 5 websites assessed. Conclusion Sports medicine PEMs produced by leading North American specialty websites have readability scores that are above the recommended levels. Given the increasing preference of patients for online health care materials, the imperative role of health literacy in patient outcomes, and the growing body of online resources, significant work needs to be undertaken to improve the readability of these materials.
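The comparison reported here is, in essence, a one-sample test of mean reading grade level against a fixed benchmark. A sketch with invented RGLs, assuming SciPy; the study's own CI-based comparison would proceed along similar lines:

```python
# One-sample t-test: do these articles' reading grade levels differ from grade 8?
from scipy import stats

rgl = [12.1, 11.4, 13.0, 12.8, 10.9, 12.5, 13.6, 11.8]  # hypothetical article RGLs

t, p = stats.ttest_1samp(rgl, popmean=8.0)
print(f"t = {t:.2f}, p = {p:.4f}")   # small p -> mean RGL differs from the benchmark
```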