Project

LIN: A validated reading level tool for Dutch

Goal: While there is an urgent need for robust readability assessment tools applicable to a range of communicative domains, none of the existing tools for Dutch offers a valid empirical basis and sufficient functionality. Recent developments in computational linguistics and discourse processing create possibilities to change this situation. We propose to develop a validated reading level tool, aimed at secondary school readers, adult readers of public information, and, by extension, the Dutch-speaking population at large. First, a text-analytic tool is developed using the latest results of computational linguistics research. Second, cloze comprehension data are collected among secondary school readers, in a design assessing both differences between texts and differences between versions of the same text. Third, a subsample of texts is reused to investigate online processing in eye-tracking studies. The combination of comprehension data and online processing measures provides insight into the way textual features affect the construction of cognitive representations. Fourth, additional cloze data are collected for adults reading government-citizen information texts in the Netherlands and Flanders. Fifth, the relation between cloze data and reading times on the one hand, and text features on the other, is analyzed in a multilevel regression analysis and in a machine learning study. These analyses are used to develop a reading level prediction tool: LIN (LeesbaarheidsIndex voor het Nederlands). This validated reading level tool is relevant to various domains in society: education, publishing, and government-citizen communication. It will provide the foundation, so far missing, for developing domain-specific readability and writing tools.
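The fifth step, linking text features to comprehension outcomes, can be illustrated with a minimal regression sketch. This is not the project's actual model (which is multilevel and fitted on real cloze data); the feature names and numbers below are invented for illustration, using ordinary least squares in place of the multilevel analysis.

```python
import numpy as np

# Hypothetical per-text features: mean word frequency, mean sentence
# length, and connective density. Cloze scores are synthetic proportions.
features = np.array([
    [4.2, 12.0, 0.8],
    [3.1, 22.5, 0.3],
    [4.8,  9.5, 1.1],
    [2.9, 25.0, 0.2],
    [3.7, 17.0, 0.6],
])
cloze_scores = np.array([0.78, 0.52, 0.85, 0.44, 0.63])

# Add an intercept column and fit ordinary least squares.
X = np.column_stack([np.ones(len(features)), features])
coefs, *_ = np.linalg.lstsq(X, cloze_scores, rcond=None)

def predict_readability(feat):
    """Predict a cloze-comprehension score for a new text's features."""
    return float(coefs[0] + feat @ coefs[1:])
```

Once fitted, the coefficients turn any new text's feature vector into a predicted comprehension score, which is the basic mechanism behind a readability index like LIN.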


Project log

Suzanne Kleijn
added a research item
Studies investigating the effect of connectives on comprehension have yielded different results, most likely because of differences in methodology and limited samples of texts and readers. We added and removed causal, temporal, contrastive, and additive connectives in 20 authentic Dutch texts. Dutch adolescents (n = 794) differing in reading proficiency filled out four “HyTeC” cloze tests. Connectives were found to affect comprehension on a local level but not on a global level. However, a post-hoc analysis revealed a global comprehension effect for difficult texts but not for easy texts. Local effects, too, were predominantly carried by the difficult texts. The direction of the effect did not vary with reading proficiency or readers’ educational level, but it did vary between types of coherence relations. Contrastive and causal connectives increased comprehension, whereas additive connectives reduced it. Our large-scale study shows that the effects of connectives on text comprehension are consistent between readers but not between texts and types of coherence relations.
Suzanne Kleijn
added a research item
Although there are many methods available for assessing text comprehension, the cloze test is not widely acknowledged as one of them. Critiques of cloze testing center on its supposedly limited ability to measure comprehension beyond the sentence. However, these critiques do not hold for all types of cloze tests; the particular configuration of a cloze determines its validity. We review various cloze configurations and discuss their strengths and weaknesses. We propose a new cloze procedure specifically designed to gauge text comprehension: the Hybrid Text Comprehension cloze (HyTeC-cloze). It employs a hybrid mechanical-rational deletion strategy and semantic scoring of answers. The procedure was tested in a large-scale study, involving 2926 Dutch secondary school students and 120 unique cloze tests. Our results show that, in terms of reliability and validity, the HyTeC-cloze matches and sometimes outperforms standardized tests of reading ability.
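The mechanical half of such a deletion strategy can be sketched in a few lines. This illustrates only the simplest ingredient (every n-th word blanked, after some intact lead-in context); the rational selection of deletion targets and the semantic scoring of answers that distinguish the HyTeC-cloze are not shown, and the parameter defaults are invented for the sketch.

```python
def make_cloze(text, n=7, start=2):
    """Blank out every n-th word, leaving the first `start` words intact
    as lead-in context. Returns the gapped text and the answer key."""
    words = text.split()
    answers = []
    for i in range(start + n - 1, len(words), n):
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers
```

A test taker fills in each blank, and answers are then scored against the key; semantic scoring would accept synonyms and paraphrases rather than only exact matches.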
Suzanne Kleijn
added a research item
In my dissertation (Kleijn, 2018), I studied the effects of different linguistic features on the readability of texts for Dutch adolescents. Sixty texts were turned into cloze tests using the newly developed HyTeC-cloze procedure, and each text was carefully manipulated on one stylistic linguistic feature to create an ‘easy’ and a ‘difficult’ version of the same text. As a result, causal effects of these linguistic features on readability could be separated from correlational relationships. In other words: we know how well these factors predict readability versus how much they can actually improve the readability of texts for Dutch adolescents. In the current study we examine the generalizability of these results to other Dutch-speaking populations. In two replication studies we collected comprehension data from Dutch and Flemish adults, as well as from Flemish adolescents. I will present the results of these studies and compare them to the earlier findings for Dutch adolescents.
Suzanne Kleijn
added a research item
Open Access version via Utrecht University Repository: https://dspace.library.uu.nl/handle/1874/363346
The first readability formulae were developed almost 100 years ago. Despite a fair amount of critique, readability formulae have retained their overall popularity. The main reason for this is that the need for objective measures of readability has only increased. Fortunately, developments in computational linguistics have opened up new possibilities to improve the old readability formulae. In this dissertation, current language technology is combined with insights from readability research and discourse processing in an attempt to build an empirically validated readability tool for Dutch secondary school readers. We investigate the relationship between linguistic features and two aspects of readability: comprehension and processing ease. In addition, we use an integrated methodological design in which we combine experimental with correlational work to disentangle causal effects of linguistic features on readability from correlational relationships. That is, we study readability differences between texts and differences between stylistic variants of the same text. In three separate experiments we change only the lexical complexity, the syntactic complexity, or the number of coherence markers within texts to see whether these factors really affect readability. This way we are able to provide a realistic (and sobering) view of the importance of these factors and their potential for reducing the difficulty level of a given text, without altering its content. Due to our design we are able to generalize our results across a large number of texts and across adolescent readers differing in reading proficiency. Hence, our findings are relevant both to the field of discourse processing and to practitioners aiming for readability improvement.
Suzanne Kleijn
added 2 research items
Cloze has never been widely accepted as a valid measure of text comprehension. We address the problems previously reported in the literature and introduce an improved procedure: the HyTeC-cloze. The procedure was evaluated using data collected among 2855 Dutch secondary school students. It matches and sometimes outperforms standardized tests in validity and reliability. Its sensitivity to differences between texts, text versions, and readers makes the procedure an appealing method for experimental and correlational studies.
The effect of lexical complexity on text comprehension and text processing was investigated in a strictly controlled cloze study (PPN=786) and an eye-tracking study (PPN=181). Secondary school students enrolled in different levels of the Dutch school system participated in the experiments. Comprehension scores increased and reading times decreased when lexical complexity was reduced but the effects were small. Higher-level students read faster and scored higher on comprehension than lower-level students. Education level interacted with lexical complexity.
Ted J.M. Sanders
added a research item
This special issue is devoted to a new national research programme that started in September 2010: Begrijpelijke Taal – fundamenten en toepassingen van effectieve communicatie (Comprehensible Language – foundations and applications of effective communication). The programme is supported by the Netherlands Organisation for Scientific Research (NWO). Its goal is to stimulate fundamental and applied research on comprehensibility in communication. An important aspect of the programme is its public-private cooperation: societal organisations make a substantial contribution to the research. Together with representatives from academia, these organisations therefore share responsibility for the research agenda. In this article we sketch the background of the programme, set out its goals, offer a first look at the results, and outline the contents of this issue.
Suzanne Kleijn
added a research item
T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and from resources such as SoNaR, SUBTLEX-NL, and the Referentie Bestand Nederlands. This paper offers a qualitative discussion of a number of T-Scan features, based on a minimal demonstration corpus of six texts, three of them scientific articles and three of them drawn from a women's magazine. We discuss features concerning lexical complexity, sentence complexity, referential cohesion and lexical diversity, lexical semantics, and personal style. For all these domains we examine the construct validity as well as the reliability of a number of important features. We conclude that T-Scan offers a number of promising lexical and syntactic features, while the interpretation of referential cohesion/lexical diversity features and personal style features is less clear. Further developing the application and analyzing authentic text need to go hand in hand.
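As a rough illustration of the kind of surface features such a tool computes: the sketch below derives two simple lexical measures from raw text. T-Scan's actual features rely on parsers and frequency resources (Frog, Alpino, SUBTLEX-NL); the function and key names here are invented for the example.

```python
import re

def surface_features(text):
    """Compute two simple lexical features: mean word length
    (a crude complexity proxy) and type-token ratio (diversity)."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    if not tokens:
        return {"mean_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }
```

Real lexical-complexity features would replace word length with corpus frequency lookups, and a diversity measure robust to text length would be preferred over the raw type-token ratio, which falls as texts get longer.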
Suzanne Kleijn
added a project goal