Advice on total-score reliability issues in psychosomatic measurement

Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands.
Journal of psychosomatic research (Impact Factor: 2.74). 06/2011; 70(6):565-72. DOI: 10.1016/j.jpsychores.2010.11.002
Source: PubMed


This article addresses three reliability issues that are problematic in the construction of scales intended for use in psychosomatic research, illustrates how these problems may lead to errors, and suggests solutions.
We use psychometric results and present five computational studies. The first, third, and fourth studies are based on the generation of artificial data from psychometric models in combination with distributions for scale scores, as is common in psychometric research, whereas the second and fifth studies are analytical.
The power of Student's t test depends more on sample size than on total-score reliability, but reliability must be high when one estimates correlations involving test scores. Short scales often do not allow total scores to be significantly different from a cutoff score. Coefficient alpha is uninformative about the factorial structure of questionnaires and is one of the weakest estimators of total-score reliability.
The relationship between questionnaire length/reliability and statistical power is complex. Both in research and individual diagnostics, we recommend the use of highly reliable scales so as to reduce the chance of faulty decisions. The conclusion calls for profound statistical research producing hands-on rules for researchers to act upon. Factor analysis should be used to assess the internal consistency of questionnaires. As a reliability estimator, alpha should be replaced by better and readily available methods.
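The dependence of correlations and cutoff decisions on total-score reliability can be illustrated with standard classical test theory formulas. The following is a minimal sketch, not the computational studies reported in the article: the function names and all numeric inputs are hypothetical, and the cutoff check assumes normally distributed measurement error.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor
    (Spearman-Brown prophecy formula; assumes parallel items)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def attenuated_correlation(true_corr, rel_x, rel_y):
    """Expected observed correlation between two scores, given the correlation
    between their true scores and the reliability of each score."""
    return true_corr * math.sqrt(rel_x * rel_y)


def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement of a total score with standard deviation sd."""
    return sd * math.sqrt(1 - reliability)


def differs_from_cutoff(score, cutoff, sd, reliability, z=1.96):
    """True if the score lies more than z standard errors of measurement
    from the cutoff (normal error assumed; z=1.96 is a hypothetical default)."""
    sem = standard_error_of_measurement(sd, reliability)
    return abs(score - cutoff) > z * sem


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(spearman_brown(0.80, 0.5))               # halving a scale
    print(attenuated_correlation(0.5, 0.64, 0.81))  # unreliable scores shrink r
    print(differs_from_cutoff(13, 10, sd=5, reliability=0.84))
    print(differs_from_cutoff(13, 10, sd=5, reliability=0.96))
```

With these hypothetical inputs, the same 3-point distance from the cutoff is within measurement error at reliability 0.84 but exceeds it at reliability 0.96, which is the kind of effect that makes short scales risky for individual decisions.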

Available from: Wilco H M Emons, Apr 16, 2015
  • Source
    • "In addition, removing items 1 and 2 reduces Cronbach's alpha from 0.86 to 0.82 (results based on a simulated data set of 10,000 item-response vectors; details available from the second author). This may seem small, but it should be noted that decreasing reliability caused by test length has several adverse effects, including a reduction in the power to find group differences, additional bias in the estimated regression effects of the EDS, and higher risks of classification errors (e.g., [74]). Furthermore, removal of the items necessitates determining new cutoffs for diagnosing mild and severe levels of depression, and may unduly narrow the construct since one aspect (anhedonia) may no longer be well represented. "
    ABSTRACT: Depression is a common complication in type 2 diabetes (DM2), affecting 10-30% of patients. Since depression is underrecognized and undertreated, it is important that reliable and validated depression screening tools are available for use in patients with DM2. The Edinburgh Depression Scale (EDS) is a widely used method for screening depression. However, there is still debate about the dimensionality of the test. Furthermore, the EDS was originally developed to screen for depression in postpartum women. Empirical evidence that the EDS has comparable measurement properties in both males and females suffering from diabetes is lacking, however. In a large sample (N = 1,656) of diabetes patients, we examined: (1) dimensionality; (2) gender-related item bias; and (3) the screening properties of the EDS using factor analysis and item response theory. We found evidence that the ten EDS items constitute a scale that is essentially one dimensional and has adequate measurement properties. Three items showed differential item functioning (DIF), and two of them showed substantial DIF. However, at the scale level, DIF had no practical impact. Anhedonia (the inability to laugh or enjoy) and sleeping problems were the most informative indicators for differentiating between the diagnostic groups of mild and severe depression. The EDS constitutes a sound scale for measuring an attribute of general depression. Persons can be reliably measured using the sum score. Screening rules for mild and severe depression are applicable to both males and females.
    BMC Psychiatry 08/2011; 11(1):141. DOI:10.1186/1471-244X-11-141 · 2.21 Impact Factor
  • Source
    ABSTRACT: I address two issues that were inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is understanding the problems that test constructors and researchers who use tests have with psychometric knowledge. I argue that this understanding is important for a field like psychometrics, for which the dissemination of psychometric knowledge among test constructors and researchers in general is highly important. The second issue concerns the identification of psychometric research topics that are relevant for test constructors and test users but in my view do not receive enough attention in psychometrics. I discuss the influence of test length on decision quality in personnel selection and quality of difference scores in therapy assessment, and theory development in test construction and validity research. I also briefly mention the issue of whether particular attributes are continuous or discrete.
    Psychometrika 01/2011; 77(1):4-20. DOI:10.1007/s11336-011-9242-4 · 1.09 Impact Factor
  • ABSTRACT: Type D personality refers to a clustering of 2 stable personality traits, namely negative affectivity and social inhibition. Currently Type D is standardly assessed using the DS14. An experimental Type D personality scale, the DS(3), was developed to examine an avenue for assessing Type D more efficiently. The DS(3) differs from the DS14 in its use of a 3-point Likert scale to rate responses, use of all negatively worded items, and a rearranged presentation of items. This article examines the psychometric properties of this questionnaire by examining its dimensionality, item and scale properties, and cutoff scores to screen for Type D personality. Data from 2 clinical samples were analyzed using item response theory. The results suggest that the DS(3) is a potentially suitable instrument for Type D assessment. It has high reliability, and Type D personality classification based on this scale corresponds well with the current standard Type D assessment based on the DS14.
    Journal of Personality Assessment 03/2012; 94(2):210-9. DOI:10.1080/00223891.2011.645933 · 1.84 Impact Factor