LIN: A validated reading level tool for Dutch
Studies investigating the effect of connectives on comprehension have yielded different results, most likely because of differences in methodology and limited samples of texts and readers. We added and removed causal, temporal, contrastive, and additive connectives in 20 authentic Dutch texts. Dutch adolescents (n = 794) differing in reading proficiency filled out four “HyTeC” cloze tests. Connectives were found to affect comprehension on a local level but not on a global level. However, a post-hoc analysis revealed a global comprehension effect for difficult texts but not for easy texts. Local effects were predominantly carried by the difficult texts as well. The direction of the effect did not vary with readers’ reading proficiency or educational level but did vary between types of coherence relations. Contrastive and causal connectives increased comprehension, whereas additive connectives reduced comprehension. Our large-scale study shows that effects of connectives on text comprehension are consistent between readers but not between texts and types of coherence relations.
Although there are many methods available for assessing text comprehension, the cloze test is not widely acknowledged as one of them. Critiques of cloze testing center on its supposedly limited ability to measure comprehension beyond the sentence. However, these critiques do not hold for all types of cloze tests; the particular configuration of a cloze test determines its validity. We review various cloze configurations and discuss their strengths and weaknesses. We propose a new cloze procedure specifically designed to gauge text comprehension: the Hybrid Text Comprehension cloze (HyTeC-cloze). It employs a hybrid mechanical-rational deletion strategy and semantic scoring of answers. The procedure was tested in a large-scale study involving 2926 Dutch secondary school students and 120 unique cloze tests. Our results show that, in terms of reliability and validity, the HyTeC-cloze matches and sometimes outperforms standardized tests of reading ability.
In my dissertation (Kleijn, 2018) I studied the effects of different linguistic features on the readability of texts for Dutch adolescents. Sixty texts were turned into cloze tests using the newly developed HyTeC-cloze procedure, and each text was carefully manipulated on one stylistic linguistic feature to create an ‘easy’ and a ‘difficult’ version of the same text. As a result, causal effects of these linguistic features on readability could be separated from correlational relationships. In other words, we know how well these factors predict readability and how much they can actually improve the readability of texts for Dutch adolescents. In the current study we look at the generalizability of these results to other Dutch-speaking populations. In two replication studies we collected comprehension data from Dutch and Flemish adults as well as from Flemish adolescents. I will present the results of these studies and compare them to the earlier findings for Dutch adolescents.
Open Access version via Utrecht University Repository (https://dspace.library.uu.nl/handle/1874/363346)
The first readability formulae were developed almost 100 years ago. Despite a fair amount of critique, readability formulae have retained their overall popularity. The main reason for this is that the need for objective measures of readability has only increased. Fortunately, developments in computational linguistics have opened up new possibilities to improve the old readability formulae. In this dissertation current language technology is combined with insights from readability research and discourse processing in an attempt to build an empirically validated readability tool for Dutch secondary school readers. We investigate the relationship between linguistic features and two aspects of readability: comprehension and processing ease. In addition, we use an integrated methodological design in which we combine experimental with correlational work to disentangle causal effects of linguistic features on readability from correlational relationships. That is, we study readability differences between texts and differences between stylistic variants of the same text. In three separate experiments we change only the lexical complexity, the syntactic complexity or the number of coherence markers within texts to see whether these factors really affect readability. This way we are able to provide a realistic (and sobering) view of the importance of these factors and their potential for reducing the difficulty level of a given text, without altering its content. Due to our design we are able to generalize our results across a large number of texts and across adolescent readers differing in reading proficiency. Hence, our findings are relevant both to the field of discourse processing and practitioners aiming for readability improvement.
Cloze has never been widely accepted as a valid measure of text comprehension. We address the problems previously reported in the literature and introduce an improved procedure: the HyTeC-cloze. The procedure was evaluated using data collected among 2855 Dutch secondary school students. The procedure matches and sometimes outperforms standardized tests in validity and reliability. Its sensitivity to differences between texts, text versions and readers makes the procedure an appealing method for experimental and correlational studies.
The effect of lexical complexity on text comprehension and text processing was investigated in a strictly controlled cloze study (PPN = 786) and an eye-tracking study (PPN = 181). Secondary school students enrolled in different levels of the Dutch school system participated in the experiments. Comprehension scores increased and reading times decreased when lexical complexity was reduced, but the effects were small. Higher-level students read faster and scored higher on comprehension than lower-level students. Education level interacted with lexical complexity.
This special issue is devoted to a new national research program that started in September 2010: Begrijpelijke Taal (Comprehensible Language): foundations and applications of effective communication. This thematic program is supported by the Netherlands Organisation for Scientific Research (NWO). The program aims to stimulate fundamental and applied research into comprehensibility in communication. An important aspect of the program is its public-private collaboration: societal organizations make a substantial contribution to the research. Together with representatives from academia, these organizations therefore share responsibility for the research agenda. In this article we sketch the background of the program, set out its goals, give a first look at the results, and outline the contents of the issue.
T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and resources such as SoNaR, SUBTLEX-NL and Referentie Bestand Nederlands. This paper offers a qualitative discussion of a number of T-Scan features, based on a minimal demonstration corpus of six texts, three of them scientific articles and three of them drawn from a women's magazine. We discuss features concerning lexical complexity, sentence complexity, referential cohesion and lexical diversity, lexical semantics and personal style. For all these domains we examine the construct validity as well as the reliability of a number of important features. We conclude that T-Scan offers a number of promising lexical and syntactic features, while the interpretation of referential cohesion/lexical diversity features and personal style features is less clear. Further development of the application and analysis of authentic text need to go hand in hand.