
Wilco H M Emons - Tilburg University
79
Publications
26,148
Reads
2,498
Citations
Publications (79)
Background: This article reports on the prevalence of preregistration of empirical studies presented at three editions of the World Conference on Research Integrity (Hong Kong, 2019; Cape Town, 2022; Athens, 2024) at the time of abstract submission, and the association of preregistration with the characteristics of the study and of the researchers subm...
At its fifth meeting, held at the Vrije Universiteit Amsterdam in 2017 (5th WCRI), the World Conference on Research Integrity (WCRI) adopted the Amsterdam Agenda (AA). The AA is a policy statement stipulating six key elements that researchers doing research on aspects of research integrity should preferably use when preregistering their empirical resea...
This chapter explores targeted testing in applications where the main interest is in classifying the test takers into three (or more) ordered proficiency levels. A targeted test consists of several fixed booklets, balanced in content but varying in the overall difficulty. The booklets are assigned to test takers using background information about...
Educational assessments can have far-reaching consequences for individuals. To allow test users to make valid decisions, it is important to provide evidence about the uncertainties in the observed scores on which the individual decisions are based. In this chapter we examine standard errors of measurement defined for specific score groups, which ar...
Nonparametric methods are appropriate when certain assumptions about distributions that common parametric methods make are questionable. In this entry, we review nonparametric statistical tests based on exact or simulated sampling distributions. We also discuss methods for nonparametric data exploration and nonparametric regression and we explain c...
Various methods exist to assess the temporal stability of psychological constructs. In this paper we discuss common methods based on a review of the personality traits negative affectivity and social inhibition. Most methods ignore the non-normal distributions and measurement error in the questionnaire item scores. We illustrate how to handle these...
Summary: In a reanalysis of data from a national assessment study conducted in the fall of 2015, we examined the added informational value of dynamic testing over static testing for assessing inquiry skills in the Science & Technology (Natuur & Techniek) domain among pupils in groep 8 (the final year) of primary school. To this end, the...
Continuous norming is an increasingly popular approach to establish norms when the performance on a test is dependent on age. However, current continuous norming methods rely on a number of assumptions that are quite restrictive and may introduce bias. In this study, quantile regression was introduced as a more flexible alternative. Bias and precisio...
Comparisons of countries according to their PISA results can be considered cross-cultural studies. An important issue in these studies is that the measurement tools must be culturally and linguistically equivalent. Cultural or linguistic differences in measurement tools may threaten validity. Aberrant behavior is another important factor that affects va...
This study reports on an Evidence Centered Design (ECD) project in the Netherlands, involving the theory exam for prospective car drivers. In particular, we illustrate how cognitive load theory, task-analysis, response process models, and explanatory item-response theory can be used to systematically develop and refine task models. Based on a cogni...
Purpose:
An increasing number of patients with multiple (> 4) brain metastases (BM) are being treated with stereotactic radiosurgery (SRS). Preserving patients' health-related quality of life (HRQoL) is an important treatment goal. The aim of this study was to assess (individual) changes in HRQoL in patients with 1-10 BM over time.
Methods:
A total of...
Background
A proposal to encourage the preregistration of research on research integrity was developed and adopted as the Amsterdam Agenda at the 5th World Conference on Research Integrity (Amsterdam, 2017). This paper reports on the degree to which abstracts of the 6th World Conference on Research Integrity (Hong Kong, 2019) reported on preregiste...
Research on the longitudinal association between self-esteem and satisfaction with social relationships has led to ambiguous conclusions regarding the temporal order and strength of this relation. Existing studies have examined this association across intervals ranging from days to years, leaving it unclear as to what extent differences in timing m...
Clinical, medical, and health psychologists use difference scores obtained from pretest-posttest designs employing the same test to assess intra-individual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorder, or addiction. Reliability of difference scores is important for interpreting observed cha...
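As a rough illustration of why the reliability of difference scores matters, the sketch below computes the classical test theory reliability of a difference score D = Y - X from the standard deviations, reliabilities, and correlation of the pretest and posttest. This is the textbook formula, not necessarily the approach taken in the article; the function name and the example values are assumptions chosen for illustration.

```python
def difference_score_reliability(sd_x, sd_y, rel_x, rel_y, r_xy):
    """Classical test theory reliability of the difference score D = Y - X.

    sd_x, sd_y   : standard deviations of pretest and posttest scores
    rel_x, rel_y : reliabilities of pretest and posttest scores
    r_xy         : correlation between pretest and posttest scores
    Assumes measurement errors are uncorrelated across occasions.
    """
    covariance = r_xy * sd_x * sd_y
    # True-score variance of D: reliable variance of each test minus twice the covariance.
    true_var = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - 2 * covariance
    # Observed-score variance of D.
    observed_var = sd_x ** 2 + sd_y ** 2 - 2 * covariance
    if observed_var <= 0:
        raise ValueError("Observed variance of the difference score must be positive.")
    return true_var / observed_var


# Hypothetical values: equal SDs, both tests reliable at .80, correlated .50.
example = difference_score_reliability(10.0, 10.0, 0.80, 0.80, 0.50)
print(round(example, 2))
```

With these hypothetical inputs the difference score is less reliable than either test score, which is the usual pattern when pretest and posttest are positively correlated.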
In this newly published study, my colleagues and I focused on three popular methods to model interactions between two constructs containing measurement error in predicting an observed binary outcome: logistic regression using (1) observed scores, (2) factor scores, and (3) Structural Equation Modeling (SEM). It is still unclear how they compare wit...
Research on the longitudinal association between self-esteem and satisfaction with social relationships led to ambiguous conclusions regarding the temporal order and strength of this relation. Existing studies have examined this association across intervals ranging from days to years, leaving it unclear as to what extent differences in timing may e...
To interpret a person’s change score, one typically transforms the change score into, for example, a percentile, so that one knows a person’s location in a distribution of change scores. Transformed scores are referred to as norms and the construction of norms is referred to as norming. Two often-used norming methods for change scores are the regre...
Continuous norming is an increasingly popular approach to establish norms when the performance on a test is dependent on age. However, current continuous norming methods rely on a number of assumptions that are quite restrictive and may introduce bias. In this study, quantile regression was introduced as a more flexible alternative. Bias and precision...
Purpose
The Hospital Anxiety and Depression Scale (HADS-A) and State-Trait Anxiety Inventory (STAI-S) are popular instruments for assessing anxiety and are considered interchangeable, although little is known about their equivalence. Hence, we examined whether the two instruments are (i) equivalent with respect to determining the prevalence of prob...
The aim of this study was to assess the extent to which discrepancies between self-reported and clinician-rated severity of depression are due to inconsistent self-reports. Response inconsistency threatens the validity of the test score. We used data from a large sample of outpatients (N = 5,959) who completed the self-report Beck Depression Inventor...
Introduction: Alexithymia may moderate the effectiveness of treatment and may predict impaired general functioning of patients suffering from somatic symptom and related disorders (SSRD).
Aim: We compared alexithymia levels in a clinical prospective study with 234 consecutive patients suffering from SSRD from the Centre of Excellence for Body, Mind...
This study examined test-retest reliabilities and (predictors of) practice effects of the widely used computerized neuropsychological battery CNS Vital Signs. The sample consisted of 158 Dutch healthy adults. At 3 and 12 months follow-up, 131 and 77 participants were retested. Results revealed low to high test–retest reliability coefficients for CN...
Introduction
The Bermond–Vorst Alexithymia Questionnaire (BVAQ) has been validated in student samples and small clinical samples, but not in the general population; thus, representative general-population norms are lacking.
Aim
We examined the factor structure of the BVAQ in Longitudinal Internet Studies for the Social Sciences panel data from the...
Change scores obtained in pretest–posttest designs are important for evaluating treatment effectiveness and for assessing change of individual test scores in psychological research. However, over the years the use of change scores has raised much controversy. In this article, from a multilevel perspective, we provide a structured treatise on severa...
Objective: To evaluate the effect of haptotherapy on severe fear of childbirth in pregnant women.
Design: Randomized controlled trial.
Setting: Community midwifery practices and a teaching hospital in the Netherlands.
Population or Sample: Primi- and multigravida, suffering from severe fear of childbirth (N = 134).
Methods: Haptotherapy, psycho-edu...
Introduction: Central Nervous System Vital Signs (CNS VS) is a computerized neuropsychological battery that has been translated into many languages. However, the published CNS VS normative data were established over a decade ago, are solely age-corrected, and were collected in an American population only. Method: Mean performance of healthy Dutch participants on...
Purpose:
In the absence of measurement invariance across measurement occasions, change scores based on pretest-posttest measurements may be inaccurate representations of real change on the latent variable. In this study, we examined whether measurement invariance held in the Dutch version of Outcome Questionnaire-45 (OQ-45).
Method:
Using second...
The current study compares the closeness to unidimensionality (CU) and measurement precision (MP) of the Narcissistic Personality Inventory (NPI)—with either a pairwise forced-choice or 5-point Likert-type scale response format—to the Narcissistic Admiration and Rivalry Questionnaire (NARQ). Minimum rank factor analysis and item information curves...
Aim:
To compare levels of paediatric parenting stress in the fathers and mothers of young children with Type 1 diabetes and study the variation in this stress over time.
Methods:
One hundred and twelve parents (56 mothers and 56 fathers) of young children (0-7 years) with Type 1 diabetes participated in this study. They completed the Pediatric I...
Clinical psychologists are advised to assess clinical and statistical significance when assessing change in individual patients. Individual change assessment can be conducted using either the methodologies of classical test theory (CTT) or item response theory (IRT). Researchers have been optimistic about the possible advantages of using IRT rather...
Purpose - Because of the increased risk of long-term sickness leave for employees with a major depressive disorder (MDD), it is important for occupational health professionals to recognize depression in a timely manner. The Patient Health Questionnaire-9 (PHQ-9) has proven to be a reliable and valid instrument for screening MDD, but has not been va...
Conclusion:
The results support the notion that diabetes does not only affect the child with T1DM: T1DM is a family disease, as parenting factors (like stress and parent-child interactions) are associated with important child outcomes. Therefore, it is important for health-care providers to not only focus on the child with T1DM, but also on the fa...
Background Allocation of inevitably limited financial resources for health care requires assessment of an intervention's effectiveness. Interventions likely affect quality of life (QOL) more broadly than is measurable with commonly used health-related QOL utility scales. In line with the World Health Organization's definition of health, a recent De...
In young children (0-7 years) with diabetes mellitus type 1 (T1DM), parents have full responsibility for the diabetes management of their child. The tasks needed to achieve optimal blood glucose control may interfere with normal behavioral processes in childhood, which could negatively affect the parent-child interaction. Currently, there is no diab...
Research has shown that taking care of a child with type 1 diabetes (T1DM) can be stressful for parents and could have a negative effect on the parent-child interaction. It is currently unclear whether this leads to suboptimal HbA1c levels and decreased quality of life (QoL).
Latent class (LC) cluster analysis of a set of subscale lz person-fit statistics was proposed to explain person misfit on multiscale measures. The proposed explanatory LC person-fit analysis was used to analyze data of students (N = 91,648) on the nine-subscale School Attitude Questionnaire Internet (SAQI). Inspection of the class-specific lz mean...
This study investigated sex bias in the classification of borderline and narcissistic personality disorders. A sample of psychologists in training for a post-master degree (N = 180) read brief case histories (male or female version) and made a DSM classification. To differentiate sex bias due to sex stereotyping or to base rate variation, we used dif...
We applied item response theory based person-fit analysis (PFA) to data of the Outcome Questionnaire-45 (OQ-45) to investigate the prevalence and causes of aberrant responding in a sample of Dutch clinical outpatients. The l_z^p person-fit statistic was used to detect misfitting item-score patterns and the standardized residual statistic for identi...
Common mental disorders are strongly associated with long-term sickness absence, which has negative consequences for the individual employee's quality of life and leads to substantial costs for society. It is important to focus on return to work (RTW) during treatment of sick-listed employees with common mental disorders. Factors such as self-effic...
Background
About six percent of pregnant women suffer from severe fear of childbirth. These women are at increased risk of obstetric labour and delivery interventions and pre- and postpartum complications, e.g., preterm delivery, emergency caesarean section, caesarean section at maternal request, severe postpartum fear of childbirth and trauma anxi...
Purpose:
Somatoform disorders (physical symptoms without medical explanation that cause dysfunction) are prevalent in the occupational health (OH) care setting and are associated with functional impairment and absenteeism. Availability of psychometric instruments aimed at assessing somatoform disorders is limited. In the OH setting, so far only th...
Background
In young children with type 1 diabetes mellitus (T1DM), parents have complete responsibility for the diabetes-management. In toddlers and (pre)schoolers, the tasks needed to achieve optimal blood glucose control may interfere with normal developmental processes and could negatively affect the quality of parent–child interaction. Several...
This article is a rejoinder to Humphry's (2013) comment on Sijtsma (2012). Sijtsma argued that the Rasch paradox does not exist but Humphry replies that the Rasch paradox can occur provided the measurement procedure is precise enough. The rejoinder argues that the debates about the Rasch paradox mingle properties of formal psychometric models, idea...
Self-report measures are vulnerable to concentration and motivation problems, leading to responses that may be inconsistent with the respondent's latent trait value. We investigated response consistency in a sample (N = 860) of cardiac patients with an implantable cardioverter defibrillator and their partners who completed the Spielberger State-Tra...
Several authors proposed a shortened version of the State scale of the State-Trait Anxiety Inventory (S-STAI) to obtain a more efficient measurement instrument. Psychometric theory shows that test shortening makes a total score more vulnerable to measurement error, and this may result in inaccurate and biased research results and an increased risk...
To efficiently assess multiple psychological constructs and to minimize the burden on respondents, psychologists increasingly use shortened versions of existing tests. However, compared to the longer test, a shorter test version may have a substantial impact on the reliability and the validity of the test scores in psychological research and indivi...
In clinical contexts, tests and questionnaires are used to assess change at the level of the individual client. The difference between an individual client's posttreatment and pretreatment scores is used to decide about the degree to which the client benefited from a treatment. Because administration time is limited, clinicians prefer using short t...
Most person-fit statistics require long tests to reliably detect aberrant item-score vectors and are not readily applicable to noncognitive measures that consist of multiple short subscales. The authors propose combining subscale person-fit information to detect aberrant item-score vectors on noncognitive multiscale measures. They used a simulation...
The present chapter elaborates on the use of ordered latent class models (OR-LCMs) in analyzing ordinal measurement properties of psychological tests and questionnaires in the context of clinical and medical psychology. Using simulated data, we illustrate several approaches for evaluating absolute fit of the model and we show how the fitted OR-LCMs...
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of measurement...
Type D personality refers to a clustering of 2 stable personality traits, namely negative affectivity and social inhibition. Currently Type D is standardly assessed using the DS14. An experimental Type D personality scale, the DS(3), was developed to examine an avenue for assessing Type D more efficiently. The DS(3) differs from the DS14 in its...
Depression is a common complication in type 2 diabetes (DM2), affecting 10-30% of patients. Since depression is underrecognized and undertreated, it is important that reliable and validated depression screening tools are available for use in patients with DM2. The Edinburgh Depression Scale (EDS) is a widely used method for screening depression. Ho...
This article addresses three reliability issues that are problematic in the construction of scales intended for use in psychosomatic research, illustrates how these problems may lead to errors, and suggests solutions.
We used psychometric results and present five computational studies. The first, third, and fourth studies are based on the generatio...
In young children with type 1 diabetes mellitus (T1DM) parents have full responsibility for the diabetes-management of their child (e.g. blood glucose monitoring, and administering insulin). Behavioral tasks in childhood, such as developing autonomy, and oppositional behavior (e.g. refusing food) may interfere with the diabetes-management to achiev...
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000)...
This article discusses typical properties of nonparametric statistical methods for educational research. This is followed by examples of well-known nonparametric methods for estimating distributions, comparing distributions, and expressing the association between variables. In addition, the nonparametric bootstrap for estimating sampling distributi...
The Hospital Anxiety and Depression Scale (HADS) measures anxiety and depressive symptoms and is widely used in clinical and nonclinical populations. However, there is some debate about the number of dimensions represented by the HADS. In a sample of 534 Dutch cardiac patients, this study examined (a) the dimensionality of the HADS using Mokken sca...
Chronic heart failure (CHF) is a condition with a high mortality risk. Besides traditional risk factors, poor health-related quality of life (HRQoL) is also associated with poor prognosis in CHF. Immunological functioning might serve as a biological pathway underlying this association, since pro- and anti-inflammatory cytokines are independent predi...
Purpose
The aim of this paper is to examine the influence of group composition in cultural values on conflict management styles in groups.
Design/methodology/approach
A field study using data from 125 groups was conducted.
Findings
The results show that in groups where members feel they are equal and connected (horizontal collectivism) cooperatio...
For valid decision making, it is essential to both the person being measured and the person or organization that is having the person measured that the observed scores adequately represent the underlying trait. This study deals with person-fit analysis of polytomous item scores to detect unusual patterns of sum scores on subsets of items. This appr...
Two types of answer-copying statistics for detecting copiers in small-scale examinations are proposed. One statistic identifies the “copier-source” pair, and the other in addition suggests who is copier and who is source. Both types of statistics can be used when the examination has alternate test forms. A simulation study shows that the statistics...
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children...
This study investigates the usefulness of the nonparametric monotone homogeneity model for evaluating and constructing Health-Related Quality-of-Life Scales consisting of polytomous items, and compares it to the often-used parametric graded response model.
The nonparametric monotone homogeneity model is a general model of which all known parametric...
Person-fit methods are used to uncover atypical test performance as reflected in the pattern of scores on individual items in a test. Unlike parametric person-fit statistics, nonparametric person-fit statistics do not require fitting a parametric test theory model. This study investigates the effectiveness of generalizations of nonparametric person...
Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)-referred to as type-D personality-are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The objectives of this study were (a) to evaluate the relative contri...
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level, proporti...
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the...
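To make the idea of a likelihood-based person-fit statistic concrete, here is a minimal sketch of the standardized log-likelihood statistic lz for dichotomous items. It assumes a Rasch model with known item difficulties and a known trait value; the function names, item difficulties, and response patterns are illustrative assumptions and are not taken from the article, which proposes a broader methodology.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a positive response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic lz for 0/1 item scores.

    responses : list of 0/1 item scores
    probs     : model-implied probabilities of a 1 on each item
    Large negative values suggest a misfitting item-score vector.
    """
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length.")
    # Observed log-likelihood of the item-score vector.
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p)
             for x, p in zip(responses, probs))
    # Expected value and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    if variance == 0:
        raise ValueError("Variance is zero (all probabilities equal 0.5).")
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical five-item test, ordered from easy to difficult, for a person with theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [rasch_prob(0.0, b) for b in difficulties]
lz_consistent = lz_statistic([1, 1, 1, 0, 0], probs)  # passes easy items, fails hard ones
lz_aberrant = lz_statistic([0, 0, 1, 1, 1], probs)    # fails easy items, passes hard ones
print(lz_consistent > lz_aberrant)
```

The single number flags the second pattern as unlikely but says nothing about why it misfits, which is the limitation the abstract points to.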
The person-response function (PRF) relates the probability of an individual's correct answer to the difficulty of items measuring the same latent trait. Local deviations of the observed PRF from the expected PRF indicate person misfit. We discuss two new approaches to investigate person fit. The first approach uses kernel smoothing to estimate cont...
Person-fit analysis revolves around fitting an item response theory (IRT) model to respondents’ vectors of item scores on a test and drawing statistical inferences about fit or misfit of these vectors. Four person-fit measures were studied in order-restricted latent class models (OR-LCMs). To decide whether the OR-LCM fits an item score vector, a B...
The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I error rates were either too high or too low to be...
This paper discusses a nonparametric approach for testing the local fit of an item-score vector by using the person response function. Unlike most person-fit statistics, which test the fit of the complete pattern, the person response function indicates which subsets of items are misfitting. Three person-fit statistics were investigated that compare...
Summary
To give general practitioners feedback on how well they have mastered the guidelines in a standard of the Dutch College of General Practitioners (Nederlands Huisartsen Genootschap, NHG), a reference value must be available. We examined whether the Angoff method is a suitable method for determining this reference value.
Method
Using an Angoff proce...