Matthijs J Warrens

University of Groningen | RUG · Groningen Institute for Educational Research (GION)

PhD

About

96
Publications
37,550
Reads
1,809
Citations
Additional affiliations
November 2011 - present
Leiden University
Position
  • Professor (Assistant)
February 2011 - October 2011
Tilburg University
Position
  • Professor (Assistant)
February 2008 - January 2011
Leiden University
Position
  • PostDoc Position

Publications (96)
Article
Full-text available
Regression analysis makes up a large part of supervised machine learning, and consists of the prediction of a continuous target from a set of predictor variables. The difference between binary classification and regression is in the target range: in binary classification, the target can have only two values (usually encoded as 0 a...
Article
Full-text available
The aim of this study was to identify school motivation profiles of Dutch 9th grade students in a four-dimensional motivation space, including mastery, performance, social and extrinsic motivation. Multiple clustering methods (K-means, K-medoids, restricted latent profile analysis) and multiple indices for selecting the optimal number of clusters w...
Article
Full-text available
Even if measuring the outcome of binary classifications is a pivotal task in machine learning and statistics, no consensus has been reached yet about which statistical rate to employ to this end. In the last century, the computer science and statistics communities have introduced several scores summing up the correctness of the predictions with res...
Article
Full-text available
Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rat...
Chapter
In many applications of cluster analysis in educational research, the solutions found have very limited predictive power for relevant outcomes. In this paper, we explore whether the clusterings found have more predictive power (in terms of explained variance) if relevant outcomes are included in the estimation procedure, using a real-world data set...
Article
Full-text available
Success for All, a multi-tiered school reform program, has its origins in the US and has recently expanded to the Netherlands. Using a multiple case study approach, we investigated similarities and differences in the way struggling readers are supported at two Success for All schools in the US and two in the Netherlands. First- and second-grade tea...
Chapter
Malvestuto’s version of the normalized mutual information is a well-known information theoretic index for quantifying agreement between two partitions. To further our understanding of what information on agreement between the clusters the index may reflect, we study components of the index that contain information on individual clusters, using math...
Chapter
The Rand index continues to be one of the most popular indices for assessing agreement between two partitions. The Rand index combines two sources of information, object pairs put together, and object pairs assigned to different clusters, in both partitions. Via a decomposition of the Rand index into four asymmetric indices, we show that in many si...
Article
Full-text available
Two types of nominal classifications are distinguished, namely regular nominal classifications and dichotomous-nominal classifications. The first type does not include an ‘absence’ category (for example, no disorder), whereas the second type does include an ‘absence’ category. Cohen’s unweighted kappa can be used to quantify agreement between two r...
Article
Full-text available
In the transition from Dutch primary to secondary education, two indicators are used to place students in the right track: primary school teachers' track recommendations (TTR) and standardized achievement tests (SATs) at the end of primary school. Which indicator is better for placing students is a long-standing issue among educational researchers...
Article
Full-text available
The purpose of crying has recently become a topic of interest, with evidence supporting its interpersonal functions. The assumption that tears not only express a need for help, but in reaction also foster willingness to help in an observer, has received preliminary empirical support. The current study replicated previous work using a within-subject...
Article
Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data u...
Preprint
Full-text available
In unsupervised machine learning, agreement between partitions is commonly assessed with so-called external validity indices. Researchers tend to use and report indices that quantify agreement between two partitions for all clusters simultaneously. Commonly used examples are the Rand index and the adjusted Rand index. Since these overall measures g...
Article
Full-text available
Cronbach’s alpha is the most frequently used measure to investigate the reliability of measurement instruments. Despite its frequent use, many warn against misinterpretations of alpha. These claims about regular misunderstandings, however, are not based on empirical data. To understand how common such beliefs are, we conducted a survey study to test re...
Article
Full-text available
Many external validity indices for comparing different clusterings of the same set of objects are overall measures: they quantify similarity between clusterings for all clusters simultaneously. Because a single number only provides a general notion of what is going on, the values of such overall indices (usually between 0 and 1) are often difficult...
Article
Full-text available
The Gini coefficient is a measure of statistical dispersion that is commonly used as a measure of inequality of income, wealth or opportunity. Empirical research has shown that the coefficient may have a nonnegligible downward bias when data are grouped. It is unknown under which grouping conditions the downward bias occurs. In this note it is show...
Article
2 × 2 tables are encountered in various scientific disciplines, including biomedical, social and behavioral sciences, economics and ecology. In the literature many different similarity measures have been proposed that can be used to further summarize the information in a 2 × 2 table. Many of these measures are just functions of the four cells. In t...
Article
Full-text available
Students in secondary education inevitably favour some subjects more than other subjects. This appraisal may affect how motivation relates to performance in these subjects. Whereas autonomous motivation is generally linked to positive school outcomes, the effect of controlled motivation is less clear. This study specifically focused on the associat...
Article
Full-text available
Cohen’s kappa is the most widely used coefficient for assessing interobserver agreement on a nominal scale. An alternative coefficient for quantifying agreement between two observers is Bangdiwala’s B. To provide a proper interpretation of an agreement coefficient one must first understand its meaning. Properties of the kappa coefficient have been...
Article
Background: De Vet, Mokkink, Mosmuller and Terwee (2017) discovered that the Spearman-Brown formula can be used to transform certain intraclass correlation coefficients (ICCs) for single measurements into the corresponding ICCs for average measurements, without knowledge of the variance components. Methods and results: A formal proof of the disc...
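The Spearman-Brown step-up mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the classical formula only; the function names are hypothetical, and which ICC variants the result applies to is not stated in the truncated abstract, so none is assumed here.

```python
def spearman_brown(icc_single, k):
    """Step a single-measurement reliability up to the average of k measurements."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def spearman_brown_inverse(icc_average, k):
    """Recover the single-measurement value from the average-of-k value."""
    return icc_average / (k - (k - 1) * icc_average)


if __name__ == "__main__":
    # Illustrative numbers only: a single-measurement value of 0.5 and k = 3.
    stepped_up = spearman_brown(0.5, 3)
    print(stepped_up, spearman_brown_inverse(stepped_up, 3))
```

Note that neither function needs variance components, only the coefficient itself and the number of measurements k.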
Article
Full-text available
Circular classifications are classification scales with categories that exhibit a certain periodicity. Since linear scales have endpoints, the standard weighted kappas used for linear scales are not appropriate for analyzing agreement between two circular classifications. A family of kappa coefficients for circular classifications is defined. The k...
Article
Full-text available
Similarity measures are entities that can be used to quantify the similarity between two vectors with real numbers. We present inequalities between seven well known similarities. The inequalities are valid if the vectors contain non-negative real numbers.
Article
It is shown that a symmetric kappa corresponding to a c × c table with c ≥ 3 categories can be written as a function of the unweighted kappa corresponding to the same table and the c(c − 1)/2 distinct unweighted kappas associated with the (c − 1) × (c − 1) tables that are obtained by combining two categories. The result is a new MGB-type result.
Article
The kappa statistic is a widely used measure for quantifying agreement between two nominal classifications. The statistic has been extended to the case of two normalized fuzzy classifications. In this paper we define category kappas for quantifying agreement on a particular category of two normalized fuzzy classifications. The overall fuzzy kappa i...
Article
Full-text available
Student performance is related to motivation to learn. As motivation generally declines during lower secondary education, one might expect performance to decline as well during this period. Until now, however, it has been unclear whether this pattern exists. In the present study, we examined student performance during the early years of secondary ed...
Article
Accuracy assessment of classified imagery is an important task in remote sensing. A commonly used approach for assessing classification accuracy is based on the error matrix. The error matrix can be summarized by overall and category accuracy measures, e.g. the overall agreement and the user’s and producer’s accuracies. Two relatively new measures...
Article
Full-text available
This paper studies correction for chance for association measures for continuous variables. The set of linear transformations of Pearson’s product-moment correlation is used as the domain of the correction for chance function. Examples of measures in this set are Tucker’s congruence coefficient, Jobson’s coefficient, and Pearson’s correlation. An e...
Article
Full-text available
It is shown for coefficient matrices of Russell-Rao coefficients and two asymmetric Dice coefficients that ordinal information on a latent variable model can be obtained from the eigenvector corresponding to the largest eigenvalue.
Article
Cronbach’s alpha is an estimate of the reliability of a test score if the items are essentially tau-equivalent. Several authors have derived results that provide alternative interpretations of alpha. These interpretations are also valid if essential tau-equivalency does not hold. For example, alpha is the mean of all split-half reliabilities if the...
Article
An important task in remote sensing is accuracy assessment of classified imagery. The error matrix is a widely used approach for expressing the accuracy. In the remote-sensing literature, various accuracy measures have been developed for summarizing the information in an error matrix. Two relatively new measures are the so-called quantity disagreem...
Article
Full-text available
If a test consists of two parts the Spearman-Brown formula and Flanagan’s coefficient (Cronbach’s alpha) are standard tools for estimating the reliability. However, the coefficients may be inappropriate if their associated measurement models fail to hold. We study the robustness of reliability estimation in the two-part case to coefficient misspeci...
Article
Full-text available
A famous description of Cronbach’s alpha is that it is the mean of all (Flanagan–Rulon) split-half reliabilities. The result is exact if the test is split into two halves that are equal in size. This requires that the number of items is even, since odd numbers cannot be split into two groups of equal size. In this chapter it is shown that alpha is...
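The equal-halves case described in this abstract can be checked numerically. The sketch below is an illustration under stated assumptions: the score matrix is made-up data, the function names are hypothetical, and population variances are used throughout (any consistent variance estimator gives the same ratios).

```python
from itertools import combinations
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: list of persons, each a list of k item scores."""
    k = len(scores[0])
    item_vars = [pvariance([p[i] for p in scores]) for i in range(k)]
    total_var = pvariance([sum(p) for p in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def flanagan_rulon(scores, half_a):
    """Flanagan-Rulon split-half reliability for one split of the items."""
    k = len(scores[0])
    half_b = [i for i in range(k) if i not in half_a]
    a = [sum(p[i] for i in half_a) for p in scores]
    b = [sum(p[i] for i in half_b) for p in scores]
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))


def mean_split_half(scores):
    """Mean of the split-half reliabilities over all splits into equal halves."""
    k = len(scores[0])
    if k % 2 != 0:
        raise ValueError("equal halves require an even number of items")
    # Item 0 is fixed in the first half so that each split is counted once.
    splits = [(0,) + rest for rest in combinations(range(1, k), k // 2 - 1)]
    return sum(flanagan_rulon(scores, s) for s in splits) / len(splits)


# Made-up scores of 5 persons on 4 items, for illustration only.
scores = [[2, 3, 3, 4], [4, 4, 5, 5], [1, 2, 2, 1], [3, 3, 4, 4], [5, 4, 5, 5]]

if __name__ == "__main__":
    print(cronbach_alpha(scores), mean_split_half(scores))
```

The two printed values agree up to floating-point rounding, which is the equal-halves statement; the odd-item case treated in the publication is not covered by this sketch.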
Article
Full-text available
The kappa statistic is commonly used for quantifying inter-rater agreement on a nominal scale. In this review article we discuss five interpretations of this popular coefficient. Kappa is a function of the proportion of observed and expected agreement, and it may be interpreted as the proportion of agreement corrected for chance. Furthermore, kappa...
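The chance-correction reading of kappa mentioned above can be made concrete with a small sketch. The table and the function name are illustrative assumptions; the computation is the standard one, (observed agreement minus expected agreement) divided by (1 minus expected agreement).

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of units on the main diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Expected agreement under independence, from the marginal proportions.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row[i] * col[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


# Made-up 2 x 2 table: observed agreement 0.7, expected agreement 0.5.
example = [[20, 5], [10, 15]]

if __name__ == "__main__":
    print(cohens_kappa(example))  # about 0.4
```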
Article
Full-text available
A weighted version of Bennett, Alpert, and Goldstein’s S, denoted by S_r, is studied. It is shown that the special cases of S_r are often ordered in the same way. It is also shown that many special cases of S_r tend to produce values close to unity, especially when the number of categories of the rating scale is large. It is argued tha...
Article
Full-text available
Cohen’s kappa is a standard tool for the analysis of agreement in a 2 × 2 reliability study. Researchers are frequently only interested in the kappa-value of a sample. Various authors have observed that if two pairs of raters have the same amount of observed agreement, the pair whose marginal distributions are more similar to each other may have a...
Article
Full-text available
Coefficient alpha is the most commonly used internal consistency reliability coefficient. Alpha is the mean of all possible k-split alphas if the items are divided into k parts of equal size. This result gives proper interpretations of alpha: interpretations that also hold if (some of) its assumptions are not valid. Here we consider the cases where...
Article
Full-text available
Cohen’s kappa is a widely used association coefficient for summarizing interrater agreement on a nominal scale. Kappa reduces the ratings of the two observers to a single number. With three or more categories it is more informative to summarize the ratings by category coefficients that describe the information for each category separately. Examples...
Article
Full-text available
Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined the kappa value usually either increases or decreases. There is a class of agreement tables for which the value of Cohen’s kappa remains...
Article
Full-text available
It is shown that if cell weights may be calculated from the data the chance-corrected Zegers-ten Berge coefficients for metric scales are special cases of Cohen’s weighted kappa. The corrected coefficients include Pearson’s product-moment correlation, Spearman’s rank correlation and the intraclass correlation ICC(3, 1).
Article
Application of Cohen’s weighted kappa for inter-rater agreement requires the specification of a weighting matrix. An explicit formula for the determinants of the principal minors of the weighting matrix with Cicchetti–Allison weights is derived. Since all determinants are strictly positive, it follows that the Cicchetti–Allison weighting matrix is...
Article
Full-text available
Cohen’s weighted kappa is a popular descriptive statistic for summarizing interrater agreement on an ordinal scale. An agreement table with $n\in \mathbb N _{\ge 3}$ ordered categories can be collapsed into $n-1$ distinct $2\times 2$ tables by combining adjacent categories. Weighted kappa with linear weights is a weighted average of the kappa...
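The collapsing construction in this abstract can be illustrated numerically. The abstract is truncated before it states the weights of the average, so the sketch below makes an explicit assumption: each collapsed 2 × 2 kappa is weighted by its own denominator, 1 minus its expected agreement. The table and all function names are hypothetical.

```python
def proportions(table):
    n = sum(map(sum, table))
    return [[c / n for c in row] for row in table]


def linear_weighted_kappa(table):
    """Weighted kappa with linear weights w_ij = 1 - |i - j| / (k - 1)."""
    p = proportions(table)
    k = len(p)
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = lambda i, j: 1 - abs(i - j) / (k - 1)
    p_o = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    p_e = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1 - p_e)


def collapse(table, cut):
    """Combine adjacent categories into two groups: 0..cut-1 versus cut..k-1."""
    k = len(table)
    out = [[0, 0], [0, 0]]
    for i in range(k):
        for j in range(k):
            out[0 if i < cut else 1][0 if j < cut else 1] += table[i][j]
    return out


def kappa_and_weight(table2):
    """Return (kappa, 1 - expected agreement) for a 2 x 2 table."""
    p = proportions(table2)
    p_o = p[0][0] + p[1][1]
    r0, c0 = p[0][0] + p[0][1], p[0][0] + p[1][0]
    p_e = r0 * c0 + (1 - r0) * (1 - c0)
    return (p_o - p_e) / (1 - p_e), 1 - p_e


def weighted_average_of_collapsed(table):
    # Assumed weights: the denominators 1 - p_e of the collapsed kappas.
    parts = [kappa_and_weight(collapse(table, cut)) for cut in range(1, len(table))]
    return sum(kap * wt for kap, wt in parts) / sum(wt for _, wt in parts)


# Made-up 3 x 3 agreement table with ordered categories.
table = [[11, 3, 1], [4, 9, 5], [2, 3, 12]]

if __name__ == "__main__":
    print(linear_weighted_kappa(table), weighted_average_of_collapsed(table))
```

Under the assumed weights the two printed values coincide up to rounding, because both the observed and the expected linear-weighted disagreement decompose as sums over the n − 1 cut points.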
Article
Cohen's kappa and weighted kappa are two standard tools for describing the degree of agreement between two observers on a categorical scale. For agreement tables with three or more categories, popular weights for weighted kappa are the so-called linear and quadratic weights. It has been frequently observed in the literature that, when Cohen's kappa...
Article
Full-text available
Weighted kappa is a widely used statistic for summarizing inter-rater agreement on a categorical scale. For rating scales with three categories, there are seven versions of weighted kappa. It is shown analytically how these weighted kappas are related. Several conditional equalities and inequalities between the weighted kappas are derived. The anal...
Article
Full-text available
This paper studied correction for chance and correction for maximum value as functions on a space of association coefficients. Various properties of both functions are presented. It is shown that the two functions commute under composition; and that the composed function maps a coefficient and all its linear transformations given the marginal total...
Article
Full-text available
Cohen’s kappa is a popular descriptive statistic for summarizing agreement between the classifications of two raters on a nominal scale. With m ≥ 3 raters there are several views in the literature on how to define agreement. The concept of g-agreement refers to the situation in which it is decided that there is agreement if g out of m raters assign an ob...
Article
Cohen’s weighted kappa is a popular descriptive statistic for measuring the agreement between two raters on an ordinal scale. Popular weights for weighted kappa are the linear weights and the quadratic weights. It has been frequently observed in the literature that the value of the quadratically weighted kappa is higher than the value of the linear...
Article
Cohen’s kappa is a popular descriptive statistic for measuring agreement between two raters on a nominal scale. Various authors have generalized Cohen’s kappa to the case of m≥2 raters. We consider a family of multi-rater kappas that are based on the concept of g-agreement (g=2,3,…,m), which refers to the situation in which it is decided that there...
Article
Cohen’s unweighted kappa and weighted kappa are popular descriptive statistics for measuring agreement between two raters on a categorical scale. With m≥3 raters, there are several views in the literature on how to define agreement. We consider a family of weighted kappas for multiple raters using the concept of g-agreement (g=2,3,…,m)...
Article
Full-text available
An n × n agreement table F = {f_ij} with n ≥ 3 ordered categories can, for fixed m (2 ≤ m ≤ n − 1), be collapsed into $\binom{n-1}{m-1}$ distinct m × m tables by combining adjacent categories. It is shown that the components (observed and expected agreement) of Cohen’s weighted kappa with linear weights can be obtained from the m × m subtables....
Article
Full-text available
We signal and discuss common methodological errors in agreement studies and the use of kappa indices, as found in publications in the medical and behavioural sciences. Our analysis is based on a proposed statistical model that is in line with the typical models employed in metrology and measurement theory. A first cluster of errors is related to no...
Article
Cohen’s kappa is the most widely used descriptive measure of interrater agreement on a nominal scale. A measure that has repeatedly been proposed in the literature as an alternative to Cohen’s kappa is Bennett, Alpert and Goldstein’s S. The latter measure is equivalent to Janson and Vegelius’ C and Brennan and Prediger’s kappa_n. An agreement table...
Article
The κ coefficient is a popular descriptive statistic for summarizing an agreement table. It is sometimes desirable to combine some of the categories, for example, when categories are easily confused, and then calculate κ for the collapsed table. Since the categories of an agreement table are nominal and the order in which the categories of a table...
Article
Full-text available
An agreement table with n∈ℕ≥3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several conseq...
Article
Cohen's kappa is presently a standard tool for the analysis of agreement in a 2 × 2 reliability study, and weighted kappa is a standard statistic for summarizing a 2 × 2 validity study. The special cases of weighted kappa, for example Cohen's kappa, are chance-corrected measures of association. For various measures of 2 × 2 association it has been...
Article
J. Cohen’s [Psychol. Bull. 70, 213–220 (1968)] kappa and weighted kappa are two popular descriptive statistics for measuring agreement between two observers on a nominal scale. It has been frequently observed in the literature that when Cohen’s kappa and weighted kappa are applied to the same agreement table, the value of the weighted kappa is high...
Article
Full-text available
The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n it is shown that if one of the raters uses the same base...
Article
Full-text available
The paper presents inequalities between four descriptive statistics that have been used to measure the nominal agreement between two or more raters. Each of the four statistics is a function of the pairwise information. Light’s kappa and Hubert’s kappa are multi-rater versions of Cohen’s kappa. Fleiss’ kappa is a multi-rater extension of Scott’s pi...
Article
Full-text available
Suppose two judges each classify a group of objects into one of several nominal categories. It has been observed in the literature that, for fixed observed agreement between the judges, Cohen’s kappa penalizes judges with similar marginals compared to judges who produce different marginals. This paper presents a formal proof of this phenomenon. Ke...
Article
The kappa coefficient is a popular descriptive statistic for summarizing the cross classification of two nominal variables with identical categories. It has been frequently observed in the literature that combining two categories increases the value of kappa. In this note we prove the following existence theorem for kappa: For any nontrivial k×k ag...
Article
Full-text available
We study a family of n-way metrics that generalize the usual two-way metric. The n-way metrics are totally symmetric maps from E^n into $\mathbb{R}_{\geq 0}$. The three-way metrics introduced by Joly and Le Calvé (1995) and Heiser and Bennani (1997) and the n-way metrics studied in Deza and Rosenberg (2000) belong to this family. It is...
Article
Full-text available
This paper presents a simple rescaling of the odds ratio that transforms the association measure into the weighted kappa statistic for a 2×2 table. Keywords: Cohen’s kappa; 2×2 association measure
Article
Full-text available
The paper presents inequalities between four descriptive statistics that can be expressed in the form [P−E(P)]/[1−E(P)], where P is the observed proportion of agreement of a k×k table with identical categories, and E(P) is a function of the marginal probabilities. Scott’s π is an upper bound of Goodman and Kruskal’s λ and a lower bound of both Benn...
Article
Full-text available
Suppose two judges each classify a group of objects into one of several nominal categories. It has been observed in the literature that, for fixed observed agreement between the judges, Cohen's kappa penalizes judges with similar marginals compared to judges who produce different marginals. This paper presents a formal proof of this phenomenon.
Article
Full-text available
A dissimilarity measure on a set of objects is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. The Robinson property of a dissimilarity reflects an order of the objects. If a dissimilarity is not observed directly, it must be obtained from...
Article
Full-text available
k-Adic formulations (for groups of objects of size k) of a variety of 2-adic similarity coefficients (for pairs of objects) for binary (presence/absence) data are presented. The formulations are not functions of 2-adic similarity coefficients. Instead, the main objective of the paper is to present k-adic formulations that reflect certain basic...
Article
Correspondence analysis is an exploratory technique for analyzing the interaction in a contingency table. Tables with meaningful orders of the rows and columns may be analyzed using a model-based correspondence analysis that incorporates order constraints. However, if there exists a permutation of the rows and columns of the contingency table so th...
Article
Full-text available
Validity of the triangle inequality and minimality, both axioms for two-way dissimilarities, ensures that a two-way dissimilarity is nonnegative and symmetric. Three-way generalizations of the triangle inequality and minimality from the literature are reviewed and it is investigated what forms of symmetry and nonnegativity are implied by the three-...
Article
Full-text available
We discuss properties that association coefficients may have in general, e.g., zero value under statistical independence, and we examine coefficients for 2×2 tables with respect to these properties. Furthermore, we study a family of coefficients that are linear transformations of the observed proportion of agreement given the marginal probabilities...
Article
Full-text available
It is shown that one can calculate the Hubert-Arabie adjusted Rand index by first forming the fourfold contingency table counting the number of pairs of objects that were placed in the same cluster in both partitions, in the same cluster in one partition but in different clusters in the other partition, and in different clusters in both, and then c...
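The fourfold pair table described in this abstract is straightforward to build. The abstract is truncated before it names the final computation, so the sketch below rests on a labeled assumption: that the final step is Cohen's kappa on the pair table. The usual Hubert-Arabie formula is computed separately as a check; labels and function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_table(u, v):
    """Count object pairs by whether they share a cluster in partitions u and v."""
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            a += 1      # together in both partitions
        elif same_u:
            b += 1      # together in u only
        elif same_v:
            c += 1      # together in v only
        else:
            d += 1      # separated in both partitions
    return a, b, c, d


def kappa_2x2(a, b, c, d):
    """Cohen's kappa of the fourfold table (assumed final step, see lead-in)."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def adjusted_rand(u, v):
    """Hubert-Arabie adjusted Rand index from the cluster-by-cluster counts."""
    n = len(u)
    sum_cells = sum(comb(m, 2) for m in Counter(zip(u, v)).values())
    sum_u = sum(comb(m, 2) for m in Counter(u).values())
    sum_v = sum(comb(m, 2) for m in Counter(v).values())
    expected = sum_u * sum_v / comb(n, 2)
    max_index = (sum_u + sum_v) / 2
    return (sum_cells - expected) / (max_index - expected)


# Made-up cluster labels for 8 objects under two partitions.
u = [0, 0, 0, 1, 1, 1, 2, 2]
v = [0, 0, 1, 1, 1, 2, 2, 2]

if __name__ == "__main__":
    print(kappa_2x2(*pair_table(u, v)), adjusted_rand(u, v))
```

Both routes give the same value up to rounding on this example, which is consistent with the assumption stated above.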
Article
Full-text available
Bounds of association coefficients for binary variables are derived using the arithmetic-geometric-harmonic mean inequality. More precisely, it is shown which presence/absence coefficients are bounds with respect to each other. Using the new bounds it is investigated whether a coefficient is in general closer to either its upper or its lower bound.
Article
Full-text available
Many similarity coefficients for binary data are defined as fractions. For certain resemblance measures the denominator may become zero. If the denominator is zero the value of the coefficient is indeterminate. It is shown that the seriousness of the indeterminacy problem differs with the resemblance measures. Following Batagelj and Bren (1995) w...
Article
Full-text available
This paper studies correction for chance in coefficients that are linear functions of the observed proportion of agreement. The paper unifies and extends various results on correction for chance in the literature. A specific class of coefficients is used to illustrate the results derived in this paper. Coefficients in this class, e.g. the simple ma...
Article
Full-text available
Similarity coefficients play an important role in data analysis. A similarity coefficient is a measure of resemblance or association of two entities or variables. Similarity coefficients for binary data are used, for example, in biological ecology for measuring the degree of coexistence between two species types over different locations, or...
Article
Using a self-regulatory framework, this study aims to identify how couples perceive a partner's support style after myocardial infarction (MI), and whether this predicts the patient's health-related quality of life (HR-QoL) and self-management (S-M) 9 months later. This longitudinal dyadic study includes 73 couples (86% of patients were men), recru...
Article
Full-text available
In this article, the relationship between two alternative methods for the analysis of multivariate categorical data is systematically explored. It is shown that the person score of the first dimension of classical optimal scaling correlates strongly with the latent variable for the two-parameter item response theory (IRT) model. Next, under the ass...
Chapter
A square similarity matrix is called a Robinson matrix if the highest entries within each row and column are on the main diagonal and if, when moving away from this diagonal, the entries never increase. This paper formulates Robinson cubes as three-way generalizations of Robinson matrices. The first definition involves only those entries that are i...
Article
Full-text available
Individual performance was compared across three different tasks that tap into the binding of stimulus features in perception, the binding of action features in action planning, and the emergence of stimulus-response bindings ("event files"). Within a task, correlations between the size of binding effects were found within visual perception (e.g., t...
Article
Full-text available
Two test theoretical approaches to item analysis are compared, an approach based on homogeneity analysis and one based on item response theory. The literature on the relationship between the two approaches is briefly reviewed. The paper contains a contribution to the relationship between the two approaches for the case that the scores are dichotomo...
