Health and Quality of Life Outcomes
Open Access
Commentary
Minimal changes in health status questionnaires: distinction
between minimally detectable change and minimally important
change
Henrica C de Vet*, Caroline B Terwee, Raymond W Ostelo,
Heleen Beckerman, Dirk L Knol and Lex M Bouter
Address: EMGO Institute, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
Email: Henrica C de Vet* - hcw.devet@vumc.nl; Caroline B Terwee - cb.terwee@vumc.nl; Raymond W Ostelo - r.ostelo@vumc.nl;
Heleen Beckerman - h.beckerman@vumc.nl; Dirk L Knol - d.knol@vumc.nl; Lex M Bouter - lm.bouter@vumc.nl
* Corresponding author
Abstract
Changes in scores on health status questionnaires are difficult to interpret. Several methods to determine minimally important changes (MICs) have been proposed, which can broadly be divided into distribution-based and anchor-based methods. Comparisons of these methods have led to insight into essential differences between these approaches. Some authors have tried to arrive at a uniform measure for the MIC, such as 0.5 standard deviation or the value of one standard error of measurement (SEM). Others have emphasized the diversity of MIC values, which depend on the type of anchor, the definition of minimal importance on the anchor, and characteristics of the disease under study. A closer look makes clear that some distribution-based methods have in fact focused on minimally detectable changes. For assessing minimally important changes, anchor-based methods are preferred, as they include a definition of what is minimally important. Acknowledging the distinction between minimally detectable and minimally important changes is useful, not only to avoid confusion among MIC methods, but also to gain information on two important benchmarks on the scale of a health status measurement instrument. Appreciating the distinction, it becomes possible to judge whether the minimally detectable change of a measurement instrument is sufficiently small to detect minimally important changes.
Introduction
Health status questionnaires are increasingly used in medical research and clinical practice. They are attractive because they provide a self-report of patients' perceived health status. However, the meaning of (changes in) scores on these questionnaires is not intuitively apparent. The interpretation of change scores has been a topic of research for almost two decades [1,2]. It is recognized that the statistical significance of a treatment effect, because of its partial dependency on sample size, does not always correspond to the clinical relevance of the effect. Statistically significant effects are those that occur beyond some level of chance. In contrast, clinical relevance refers to the benefits derived from that treatment, its impact upon the patient, and its implications for clinical management of the patient [2,3]. As a yardstick for clinical relevance, one is interested in the minimally important change (MIC) of health status questionnaires. Changes in scores exceeding the MIC are, by definition, clinically relevant.
Published: 22 August 2006
Health and Quality of Life Outcomes 2006, 4:54 doi:10.1186/1477-7525-4-54
Received: 26 July 2006
Accepted: 22 August 2006
This article is available from: http://www.hqlo.com/content/4/1/54
© 2006 de Vet et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Different methods to determine the MIC on the scale of a
measurement instrument have been proposed. These
methods have been summarized by Lydick and Epstein
[4], and recently more extensively by Crosby et al. [5].
Both overviews distinguish distribution-based and
anchor-based methods [4,5].
Distribution-based approaches are based on statistical characteristics of the sample at issue. Most distribution-based methods express the observed change in a standardized metric. Examples are the effect size (ES) and the standardized response mean (SRM), where the numerators of both parameters represent the mean change and the denominators are, respectively, the standard deviation at baseline and the standard deviation of change for the sample at issue. Another distribution-based measure is the standard error of measurement (SEM), which links the reliability of the measurement instrument to the standard deviation of the population [5]. ES and SRM are relative representations of change (without units), whereas the SEM provides a number in the same units as the original measurement. The major disadvantage of all distribution-based methods is that they do not, in themselves, provide a good indication of the importance of the observed change.
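As an illustration (a sketch, not code from the paper), the three distribution-based indices just described could be computed as follows, assuming paired baseline and follow-up scores:

```python
import numpy as np

def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    change = np.asarray(follow_up, dtype=float) - np.asarray(baseline, dtype=float)
    return change.mean() / np.std(baseline, ddof=1)

def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    change = np.asarray(follow_up, dtype=float) - np.asarray(baseline, dtype=float)
    return change.mean() / change.std(ddof=1)

def standard_error_of_measurement(sd, reliability):
    """SEM: links the instrument's reliability to the sample SD."""
    return sd * np.sqrt(1.0 - reliability)
```

Note that the ES and SRM are unitless, whereas the SEM stays in the units of the questionnaire, which is why only the SEM can be compared directly with change scores.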
Anchor-based methods assess which changes on the measurement instrument correspond with a minimally important change defined on the anchor [4]; i.e., an external criterion is used to operationalize a relevant or an important change. The advantage is that the concept of 'minimal importance' is explicitly defined and incorporated in these methods. A limitation of anchor-based approaches is that they do not, in themselves, take measurement precision into account [4,5]. Thus, there is no information on whether an important change according to an anchor-based method lies within the measurement error of the health status measurement.
An often-used anchor-based method is the one proposed by Jaeschke et al. [2], which defined the MIC as the mean change in scores of patients categorized by the anchor as having experienced minimally important improvement or minimally important deterioration. Another anchor-based method, proposed by Deyo and Centor [6], is based on diagnostic test methodology. In this method, the change score on the measurement instrument is considered the diagnostic test, and the anchor, dividing the population into persons who are minimally importantly changed and those who are not, is considered the gold standard. At different cut-off values of the change score, sensitivity and specificity are calculated, and the MIC is set at the change value where the sum of the percentages of false positives and false negatives is minimal.
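These two anchor-based approaches could be sketched as follows (an illustrative reading of the methods, not the original authors' code; function names are my own):

```python
import numpy as np

def mic_mean_change(change_scores, minimally_changed):
    """Jaeschke et al.: MIC as the mean change score of patients whom
    the anchor classifies as minimally importantly changed."""
    change_scores = np.asarray(change_scores, dtype=float)
    return change_scores[np.asarray(minimally_changed, dtype=bool)].mean()

def mic_roc(change_scores, importantly_changed):
    """Deyo and Centor: treat the change score as a diagnostic test and
    the anchor as the gold standard; choose the cut-off that minimizes
    the sum of the false-positive and false-negative proportions."""
    change_scores = np.asarray(change_scores, dtype=float)
    truth = np.asarray(importantly_changed, dtype=bool)
    best_cutoff, best_error = None, np.inf
    for cutoff in np.unique(change_scores):
        test_positive = change_scores >= cutoff
        false_neg = np.mean(~test_positive[truth])   # missed true changers
        false_pos = np.mean(test_positive[~truth])   # flagged stable patients
        if false_neg + false_pos < best_error:
            best_cutoff, best_error = cutoff, false_neg + false_pos
    return best_cutoff
```

With a well-separated anchor the two methods can disagree: the mean-change MIC sits in the middle of the changed group, while the ROC cut-off sits at the boundary between changed and unchanged patients.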
A number of studies have compared anchor-based and distribution-based approaches. These comparisons sometimes led to surprisingly similar results; in other situations, however, different results were found. The focus of this paper is on explaining the differences between distribution-based and anchor-based approaches. We will provide arguments for the distinction between minimally detectable change and minimally important change. Appreciating and acknowledging this distinction enhances the interpretation of change scores of a measurement instrument.
Comparison of SEM with anchor-based approaches
A number of studies have compared the value of the SEM with the MIC value derived by an anchor-based approach. A SEM value is easy to calculate, based on the standard deviation (SD) of the sample and the reliability (R) of the measurement instrument: SEM = SD × √(1 − R). As the reliability parameter, test-retest reliability or Cronbach's α can be used. In the latter case, the SEM can be calculated based on one measurement, and it purely represents the variability of the instrument [7]. Test-retest reliability requires two measurements in a stable population. It represents temporal stability and is therefore more appropriate than Cronbach's α in the context of changes in health status, which are based on measurements at two different time points [8]. In classical test theory, the SEM has a rather stable value in different populations [6].
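To make the two reliability options concrete, a minimal sketch (illustrative only; the matrix shape and function names are my assumptions) could compute Cronbach's α from a single administration and then plug either reliability estimate into the SEM formula:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha from an (n_subjects, n_items) score matrix."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

def sem_from_reliability(sd, reliability):
    """SEM = SD * sqrt(1 - R); R may be Cronbach's alpha (one
    measurement) or a test-retest coefficient such as an ICC."""
    return sd * np.sqrt(1.0 - reliability)
```

As the text notes, using a test-retest coefficient for R is the more defensible choice when the SEM is meant to qualify change over time.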
Several authors showed that a MIC based on patients' global ratings as anchor was close to the value of one SEM [9,11]. Cella et al. [12] also observed similar values for SEM and MIC, using clinical parameters as anchor instead of patients' global ratings of change. However, Crosby et al. [13] showed that only for patients with moderate baseline values did the anchor-based MIC value more or less equal the SEM value (with adjustment for regression to the mean). With higher baseline values the MIC became considerably larger than one SEM, while with lower baseline values the MIC became much smaller than one SEM. A recent study [14] compared the SEM with anchor-based estimations of minimally important change using cross-sectional and longitudinal anchors. No substantial differences were found between these methods, but it should be noted that anchor-based values were only presented when effect sizes were between 0.2 and 0.5 [14].
Wyrwich [15] compared the SEM to MIC values determined by an anchor-based approach in two sets of studies which differed on several points. Set A consisted of studies on musculoskeletal disorders such as low back pain [16], neck pain [17], and lower extremity disorders [18], while set B included studies on chronic disorders such as chronic respiratory disease [10], chronic heart failure [11], and asthma [19]. In addition, the set A studies used the ROC method, whereas the set B studies applied mean change as the anchor-based method. The sets also differed with regard to the definition of 'minimally important change' on the anchor. For set A, the MIC corresponded to 2.3 or 2.6 × SEM; for set B, the MIC values were close to 1 × SEM.
In summary, it seems that the proposition that one SEM
equals the MIC is not a universal truth.
MIC is a variable concept
1. MIC depends on the definition of 'important change' on the anchor
A patient's self-reported global rating of perceived change has often been used as an anchor. Studies determining the MIC have used different definitions of 'minimal importance' with this anchor. Wyrwich et al. [10,11,19] defined a slight change on the anchor as 'minimally important', consisting of the categories "a little worse/better" and "somewhat worse/better". In their earlier studies, Wyrwich et al. even included the category "almost the same, hardly any worse/better" [10,11]. Other authors have defined 'minimal importance' as a larger change on the anchor. Binkley et al. [18] chose the category "moderate improvement" as minimally important. Stratford et al. [17] chose to lay the cut-off point for the MIC between "moderate" and "a good deal" of improvement. Others [20-24] have laid the cut-off point for the MIC between "slightly improved" and "much improved" on the patient global rating scale. In studies requiring moderate or much improvement, the MIC corresponds to about 2.5 times the SEM value. The differences between sets A and B in Wyrwich's study [15] may be partly explained by a different definition of important change on the anchor: set A consisted of studies which defined the MIC as "a good deal better" [16], whereas the studies in set B [10,11,19] defined the MIC as "a little" or "somewhat" better according to the anchor.
The MIC value thus depends to a great degree on the anchor's definition of minimal importance. The crucial question, then, is: what is a minimally important improvement or deterioration? Some authors tend to emphasize minimal, while others stress important [25]. Remarkably, the reference standard is usually based on the amount of change, and little research has focused on the importance of the change.
2. MIC depends on the type of anchor
Clinicians may have different opinions from patients about what is important. Therefore, clinician-based anchors may lead to different MIC values. Kosinski et al. [26] used five different anchors to estimate the minimally important differences for the SF-36 in a clinical trial of people with rheumatoid arthritis, and found different MIC values depending on the anchor used. Some authors [16,20-24] have asked for patients' global ratings of perceived change in overall health, while others asked patients to rate the perceived change separately for each dimension of their measurement instrument [10,19]. For example, in a study determining the MIC of the Chronic Respiratory Disease Scale, the patients' global rating was asked separately for the subscales dyspnoea, fatigue, and emotional function [10]. When rating change in overall health status, patients have to weigh the relative contributions of the different dimensions to their health status. For example, if patients with asthma judge dyspnoea to be much more important for their quality of life than emotional functioning, a small change in dyspnoea will affect the global rating of overall health, while for emotional functioning the change must be larger to be influential. The observed MIC value will be smallest for the anchor that shows the highest correlation with the health status scale under study.
3. MIC depends on baseline values and direction of change
Several studies have shown that the MIC value of a measurement instrument depends on the baseline score on that instrument. This was clearly shown by Crosby et al. [13], who compared the SEM, corrected for regression to the mean, with the anchor-based MIC for various baseline scores of obesity-specific health-related quality of life. With higher baseline values, the MIC became considerably larger than one SEM. Other authors [16,24,27,28] showed that the values of the anchor-based MIC for functional status questionnaires in patients with low back pain depended on baseline values. Patients with a high level of functional disability at baseline must change more points on the Roland Disability Questionnaire than patients with less functional disability at baseline for the change to be considered important. In addition, Van der Roer et al. [24] reported different MIC values for acute and chronic low back pain patients.
Furthermore, there has been discussion about whether the MIC for improvement is the same as that for deterioration [5]. In some studies the same MIC is reported for patients who improve and patients who deteriorate [2,29,30], but others found different MIC values for improvement and deterioration. Cella et al. [31] demonstrated that cancer patients who reported global worsening had considerably larger change scores on the Functional Assessment of Cancer Therapy (FACT) scale than those reporting comparable global improvements. Similarly, Ware et al. observed that a larger change on the SF-36 was needed for patients to feel worsened than to feel improved [32].
Thus, the MIC depends on, among other things, the type of anchor, the definition of 'minimal importance' on the anchor, and the baseline score, which might be an indicator of the severity of the disease. Therefore, various authors have suggested presenting a range of MIC values [24,26,33-35] to account for this diversity. Hays et al. recommend using different anchors and giving reasonable bounds around the MIC, rather than forcing the MIC to be a fixed value [33,34].
Distinction between minimally detectable and minimally important changes
Some authors have searched for uniform measures of minimally important change. Wyrwich and colleagues [10,11] evaluated whether the one-SEM criterion can be applied as a proxy for the MIC. Norman et al. [36] conducted a systematic review of 38 studies (including 62 effect sizes) and observed, with only a few exceptions, that the MICs for health-related quality of life instruments were close to half a standard deviation (SD). This held for generic and disease-specific measures and was not dependent on the number of response options.
Norman et al. [36] explain their finding of 0.5 SD by referring to psychophysiological evidence that the limit of people's ability to discriminate is approximately 1 part in 7, which is very close to half a SD. Thus, this criterion of 0.5 SD may be considered a threshold of detection and corresponds more to minimally detectable change than to minimally important change. The SEM, based on test-retest reliability in stable persons, is likewise merely a measure of detectable change [37]. Note that, using the formula SEM = SD × √(1 − R), one SEM equals 0.5 SD when the reliability of the instrument is 0.75. Thus, 0.5 SD and the SEM clearly point to the concept of minimally detectable changes.
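The arithmetic behind that equivalence is easy to verify (an illustrative check, not taken from the paper):

```python
import math

def sem(sd, reliability):
    """SEM = SD * sqrt(1 - R)."""
    return sd * math.sqrt(1.0 - reliability)

# With R = 0.75, sqrt(1 - 0.75) = 0.5, so one SEM equals half an SD
# regardless of the scale of the instrument.
assert all(math.isclose(sem(sd, 0.75), 0.5 * sd) for sd in (1.0, 7.5, 24.0))
```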
Wyrwich [15], comparing the two sets of studies, showed that if the cut-off point for 'minimal importance' on the anchor is laid between "no change" and "slightly changed" (i.e., the first category above no change), together with a complaint-specific anchor, the MIC is close to one SEM. In that case, however, it reflects minimally detectable change rather than minimally important change. Wyrwich [15] showed a clear dependency between the MIC and the cut-off value for 'minimal importance' on the anchor of patients' global ratings of perceived change.
Salaffi et al. [38] presented the change on a numerical rating scale for pain using two cut-off points on a patient global impression of change scale. In their opinion, a MIC using "slightly better" as the cut-off point on the anchor reflected the minimum and lowest degree of improvement that could be detected, while the cut-off point "much better" refers to a clinically important outcome. Note that the choice of anchor and cut-off point is arbitrary and cannot be based on statistical characteristics.
Interpretation and applicability
We believe that the confusion about the MIC will decrease if the distinction between minimally detectable and minimally important change is appreciated and acknowledged. In statistical terms, the minimally detectable change (MDC), also called the smallest detectable change or smallest real change [37], shows which changes fall outside the measurement error of the health status measurement (based on either internal consistency or test-retest reliability in stable persons). It is represented by the following formula: MDC = 1.96 × √2 × SEM, where 1.96 derives from the 95% confidence interval of no change, and √2 is included because two measurements are involved in measuring change [37].
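The MDC formula can be sketched in a few lines (illustrative; the function name is my own):

```python
import math

def minimal_detectable_change(sd, reliability):
    """MDC = 1.96 * sqrt(2) * SEM: the smallest individual change that
    exceeds measurement error with 95% confidence."""
    sem = sd * math.sqrt(1.0 - reliability)
    return 1.96 * math.sqrt(2.0) * sem
```

For instance, with SD = 10 and R = 0.75, the SEM is 5 points and the MDC is about 13.9 points; an anchor-based MIC smaller than that MDC could not be distinguished from measurement error at the individual level.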
As a different concept, the MIC value depicts changes which are considered minimally important by patients, clinicians, or relevant others. The SEM, the minimally detectable change, and the minimally important change are all important benchmarks on the scale of the measurement instrument, which help with the interpretation of change scores.
Appreciating the distinction, we can answer the important question of whether a health status measurement instrument is able to detect changes as small as the MIC value. This application is shown in studies on measurement instruments for low back pain [27] and for visual impairments [39].
Conclusion
Some distribution-based methods to assess the MIC have been more focused on minimally detectable changes than on minimally important changes. For assessing minimally important changes, anchor-based methods are preferred, as they include a definition of what is minimally important. Acknowledging the distinction between minimally detectable and minimally important changes is useful, not only to avoid confusion among MIC methods, but also to gain information on two important benchmarks on the scale of a health status measurement instrument. Moreover, it becomes possible to judge whether the minimally detectable change of a measurement instrument is sufficiently small to detect minimally important changes.
References
1. Jacobson N, Follette W, Revenstorf D: Toward a standard definition of clinically significant change. Behavior Therapy 1986, 17:308-311.
2. Jaeschke R, Singer J, Guyatt GH: Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989, 10:407-415.
3. Jacobson NS, Truax P: Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 1991, 59:12-19.
4. Lydick E, Epstein RS: Interpretation of quality of life changes. Qual Life Res 1993, 2:221-226.
5. Crosby RD, Kolotkin RL, Williams GR: Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 2003, 56:395-407.
6. Deyo RA, Centor RM: Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chron Dis 1986, 39:897-906.
7. Nunnally JC, Bernstein IH: Psychometric theory. New York: McGraw-Hill; 1994.
8. Schmidt FL, Le H, Ilies R: Beyond alpha: an empirical examination of the effects of different sources of measurement error on reliability estimates for measures of individual differences constructs. Psychol Methods 2003, 8:206-224.
9. Norquist JM, Fitzpatrick R, Jenkinson C: Health-related quality of life in amyotrophic lateral sclerosis: determining a meaningful deterioration. Qual Life Res 2004, 13:1409-1414.
10. Wyrwich KW, Tierney WM, Wolinsky FD: Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 1999, 52:861-873.
11. Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD: Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 1999, 37:469-478.
12. Cella D, Eton DT, Fairclough DL, Bonomi P, Heyes AE, Silberman C, Wolf MK, Johnson DH: What is a clinically meaningful change on the Functional Assessment of Cancer Therapy-Lung (FACT-L) Questionnaire? Results from Eastern Cooperative Oncology Group (ECOG) Study 5592. J Clin Epidemiol 2002, 55:285-295.
13. Crosby RD, Kolotkin RL, Williams GR: An integrated method to determine meaningful changes in health-related quality of life. J Clin Epidemiol 2004, 57:1153-1160.
14. Yost KJ, Cella D, Chawla A, Holmgren E, Eton DT, Ayanian JZ, West DW: Minimally important differences were estimated for the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches. J Clin Epidemiol 2005, 58:1241-1251.
15. Wyrwich KW: Minimal important difference thresholds and the standard error of measurement: is there a connection? J Biopharm Stat 2004, 14:97-110.
16. Stratford PW, Binkley JM, Riddle DL, Guyatt GH: Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 1. Phys Ther 1998, 78:1186-1196.
17. Stratford PW, Riddle DL, Binkley JM, Spadoni G, Westaway MD, Padfield B: Using the neck disability index to make decisions concerning individual patients. Physiother Can 1999, 51:107-112.
18. Binkley JM, Stratford PW, Lott SA, Riddle DL: The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. North American Orthopaedic Rehabilitation Research Network. Phys Ther 1999, 79:371-383.
19. Wyrwich KW, Tierney WM, Wolinsky FD: Using the standard error of measurement to identify important changes on the Asthma Quality of Life Questionnaire. Qual Life Res 2002, 11:1-7.
20. Beurskens AJ, de Vet HC, Koke AJ, Lindeman E, van der Heijden GJ, Regtop W, Knipschild PG: A patient-specific approach for measuring functional status in low back pain. J Manipulative Physiol Ther 1999, 22:144-148.
21. Davidson M, Keating JL: A comparison of five low back disability questionnaires: reliability and responsiveness. Phys Ther 2002, 82:8-24.
22. Farrar JT, Portenoy RK, Berlin JA, Kinman JL, Strom BL: Defining the clinically important difference in pain outcome measures. Pain 2000, 88:287-294.
23. Ostelo RW, de Vet HC, Knol DL, van den Brandt PA: 24-item Roland-Morris Disability Questionnaire was preferred out of six functional status questionnaires for post-lumbar disc surgery. J Clin Epidemiol 2004, 57:268-276.
24. Van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC: Minimal clinically important change for different outcome measures in patients with non-specific low back pain. Spine 2006, 31:578-582.
25. Sloan JA, Cella D, Hays RD: Clinical significance of patient-reported questionnaire data: another step toward consensus. J Clin Epidemiol 2005, 58:1217-1219.
26. Kosinski M, Zhao SZ, Dedhiya S, Osterhaus JT, Ware JE Jr: Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum 2000, 43:1478-1487.
27. Hagg O, Fritzell P, Nordwall A: The clinical importance of changes in outcome scores after treatment for chronic low back pain. Eur Spine J 2003, 12:12-20.
28. Riddle DL, Stratford PW, Binkley JM: Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 2. Phys Ther 1998, 78:1197-1207.
29. Juniper EF, Guyatt GH, Willan A, Griffith LE: Determining a minimal important change in a disease-specific Quality of Life Questionnaire. J Clin Epidemiol 1994, 47:81-87.
30. Redelmeier DA, Guyatt GH, Goldstein RS: Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol 1996, 49:1215-1219.
31. Cella D, Hahn EA, Dineen K: Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res 2002, 11:207-221.
32. Ware JE, Snow K, Kosinski M, Gandek B: SF-36 Health Survey: Manual and Interpretation Guide. Boston: The Health Institute; 1993.
33. Hays RD, Woolley JM: The concept of clinically meaningful difference in health-related quality-of-life research. How meaningful is it? Pharmacoeconomics 2000, 18:419-423.
34. Hays RD, Farivar SS, Liu H: Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD: Journal of Chronic Obstructive Pulmonary Disease 2005, 2:63-67.
35. Ostelo RW, de Vet HC: Clinically important outcomes in low back pain. Best Pract Res Clin Rheumatol 2005, 19:593-607.
36. Norman GR, Sloan JA, Wyrwich KW: Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 2003, 41:582-592.
37. Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek ALM: Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001, 10:571-578.
38. Salaffi F, Stancati A, Silvestri CA, Ciapetti A, Grassi W: Minimal clinically important changes in chronic musculoskeletal pain intensity measured on a numerical rating scale. Eur J Pain 2004, 8:283-291.
39. de Boer MR, de Vet HC, Terwee CB, Moll AC, Volker-Dieben HJ, van Rens GH: Changes to the subscales of two vision-related quality of life questionnaires are proposed. J Clin Epidemiol 2005, 58:1260-1268.
... Statistical analyses were performed with Stata 14 (StataCorp, College Station, Texas). The standard error of measurement (SEM) was calculated using correlation coefficients to find the minimal detectable change (MDC) for PSE and thresholds [61]. We conducted an unplanned post-hoc repeated measures ANCOVA to determine if visit order (two tests/session first, yes or no) influenced PSE or thresholds as the second visit for some subjects included their third exposure to this heading perception task. ...
... Here we report on a distribution based MDC which provides an initial perspective on change in inertial heading perception. MDC differs from minimally important change which largely depends on a criterion anchor [61]. No consensus exists in the literature on a meaningful anchor defining important change for inertial heading perception. ...
Article
BACKGROUND: Inertial self-motion perception is thought to depend primarily on otolith cues. Recent evidence demonstrated that vestibular perceptual thresholds (including inertial heading) are adaptable, suggesting novel clinical approaches for treating perceptual impairments resulting from vestibular disease. OBJECTIVE: Little is known about the psychometric properties of perceptual estimates of inertial heading like test-retest reliability. Here we investigate the psychometric properties of a passive inertial heading perceptual test. METHODS: Forty-seven healthy subjects participated across two visits, performing in an inertial heading discrimination task. The point of subjective equality (PSE) and thresholds for heading discrimination were identified for the same day and across day tests. Paired t-tests determined if the PSE or thresholds significantly changed and a mixed interclass correlation coefficient (ICC) model examined test-retest reliability. Minimum detectable change (MDC) was calculated for PSE and threshold for heading discrimination. RESULTS: Within a testing session, the heading discrimination PSE score test-retest reliability was good (ICC = 0. 80) and did not change (t(1,36) = –1.23, p = 0.23). Heading discrimination thresholds were moderately reliable (ICC = 0.67) and also stable (t(1,36) = 0.10, p = 0.92). Across testing sessions, heading direction PSE scores were moderately correlated (ICC = 0.59) and stable (t(1,46) = –0.44, p = 0.66). Heading direction thresholds had poor reliability (ICC = 0.03) and were significantly smaller at the second visit (t(1,46) = 2.8, p = 0.008). MDC for heading direction PSE ranged from 6–9 degrees across tests. CONCLUSION: The current results indicate moderate reliability for heading perception PSE and provide clinical context for interpreting change in inertial vestibular self-motion perception over time or after an intervention.
... pooled (moderate) [34]; and (2) the number of participants who did not demonstrate a change beyond measurement error but did make a change in state of consciousness. These analyses allowed us to evaluate the number of participants whose improvement may have been missed or misinterpreted if relying only on change into another state of consciousness. ...
... The amantadine HCl and placebo groups both had large Cohen's d effect sizes and SRMs using the CRS-R over the four-week treatment period, which is similar to the finding based on the Disability Rating Scale study outcome [10]. The amantadine HCl group's change in mean Rasch person measure was 3.6 units greater than the placebo group's, which is comparable to the 0.20 SD minimal clinically important difference (4 units), indicating that the amantadine HCl group may have made a small detectable difference in average improvement compared to the placebo group [34]. Further supporting amantadine HCl, the proportion of participants who made a true improvement was greater in the amantadine group than in the placebo group; however, this finding did not reach statistical significance based on the chi-square test. We also delineated participants in the clinical trial based on the alignment of their baseline CRS-R record to Aspen criteria (i.e., MCS or UWS) and found that both groups had large effect sizes and standardized response means, indicating that regardless of group there was a large, positive CRS-R Rasch person measure change. ...
Article
The purpose of this study was to differentiate clinically meaningful improvement or deterioration from normal fluctuations in patients with disorders of consciousness (DoC) following severe brain injury. We computed indices of responsiveness for the Coma Recovery Scale-Revised (CRS-R) using data from a clinical trial of 180 participants with DoC. We used CRS-R scores from baseline (enrollment in a clinical trial) and a four-week follow-up assessment period for these calculations. To improve precision, we transformed ordinal CRS-R total scores (0 to 23 points) to equal-interval measures on a 0-to-100-unit scale using Rasch Measurement theory. Using the 0-to-100 unit total Rasch measures, we calculated distribution-based 0.5 standard deviation (SD) minimal clinically important difference, minimal detectable change using 95% confidence intervals, and conditional minimal detectable change using 95% confidence intervals. The distribution-based minimal clinically important difference evaluates group-level changes, whereas the minimal detectable change values evaluate individual-level changes. The minimal clinically important difference and minimal detectable change are derived using the overall variability across total measures at baseline and four weeks. The conditional minimal detectable change is generated for each possible pair of CRS-R Rasch person measures and accounts for variation in standard error across the scale. We applied these indices to determine the proportions of participants who made a change beyond measurement error within each of the two sub-groups, based on treatment arm (amantadine hydrochloride or placebo) or categorization of baseline Rasch person measure to states of consciousness (i.e., unresponsive wakefulness syndrome and minimally conscious state). 
We compared the proportion of participants in each treatment arm who made a change according to the minimal detectable change and determined whether they also changed to another state of consciousness. CRS-R indices of responsiveness (using the 0-100 transformed scale) were as follows: 0.5SD minimal clinically important difference = 9 units, minimal detectable change = 11 units, and the conditional minimal detectable change ranged from 11 to 42 units. For the amantadine and placebo groups, 67% and 58% of participants showed change beyond measurement error using the minimal detectable change, respectively. For the unresponsive wakefulness syndrome and minimally conscious state groups, 52% and 67% of participants changed beyond measurement error using the minimal detectable change, respectively. Among 115 participants (64% of the total sample) who made a change beyond measurement error, 29 participants (25%) did not change state of consciousness. CRS-R indices of responsiveness can support clinicians and researchers in discerning when behavioral changes in patients with DoC exceed measurement error. Notably, the minimal detectable change can support the detection of patients who make a ‘true’ change within or across states of consciousness. Our findings highlight that continued use of ordinal scores may result in incorrect inferences about the degree and relevance of a change score.
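The distribution-based indices named above follow standard formulas: the 0.5 SD MCID is half the score standard deviation, and the MDC at 95% confidence is 1.96 × √2 × SEM, with SEM = SD × √(1 − reliability). A minimal sketch (the SD and ICC values below are illustrative placeholders, not the trial's data):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from score SD and reliability (e.g. an ICC)."""
    return sd * math.sqrt(1.0 - reliability)

def mdc95(sd: float, reliability: float) -> float:
    """Minimal detectable change at 95% confidence for an individual change score."""
    return 1.96 * math.sqrt(2.0) * sem(sd, reliability)

def mcid_half_sd(sd: float) -> float:
    """Distribution-based 0.5 SD minimal clinically important difference."""
    return 0.5 * sd

# Hypothetical values for illustration:
sd, icc = 18.0, 0.80
print(round(mcid_half_sd(sd), 1))  # → 9.0
print(round(mdc95(sd, icc), 1))    # → 22.3
```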
... Measurements were performed with the transducer at neutral inclination, using a cross-sectional view to measure muscle thickness, taking the inner limit of the epimysial borders of each muscle [20]. In addition, agreement between the novice raters' Heckmatt scores and the consensus of the experienced raters was assessed using Spearman's rho. ...
Article
Full-text available
Muscle ultrasound: reliability across experience levels in critical care physiotherapists (Rev Med Chile 2023; 151: 1153-1163). Background: Muscle ultrasound is a valid tool to monitor muscle mass loss in critically ill patients. The level of experience is essential to the accuracy of the measurements. Aim: To evaluate the interobserver reliability of experienced and novice raters measuring muscle thickness and echo intensity of the quadriceps and tibialis anterior. Material and Methods: Cross-sectional observational study. Twenty-four critical care physiotherapists participated (5 experienced and 19 novice). Following a standardized ultrasound protocol, each rater measured the thickness (centimeters) of the quadriceps and tibialis anterior of 10 healthy and young models using linear and convex probes of portable devices. The intraclass correlation coefficient and the minimal detectable change (95% confidence interval) were calculated. Additionally, the novices scored the echo intensity of 19 muscle ultrasound images of critically ill patients using the Heckmatt score (qualitative assessment). Agreement with experienced raters was evaluated (Spearman's rho). Results: 960 muscle thickness measurements were performed (experienced = 200 and novice = 760). The mean thickness of the quadriceps and tibialis anterior was 4.4 ± 0.77 and 2.4 ± 0.35 centimeters for the experienced raters and 4.2 ± 0.80 and 2.2 ± 0.39 centimeters for the novices, respectively. Reliability for the quadriceps and tibialis anterior was 0.82 and 0.86 for experienced raters and 0.76 and 0.41 for novices, respectively. The minimal detectable change ranged from 0.14-0.33 centimeters. The mean Heckmatt score was 2.6 ± 0.83 points, with a reliability of 0.68 and an agreement with the experienced raters of 0.78 (p < 0.001). Conclusions: Interobserver reliability was excellent for experienced raters and moderate to good for novice raters. The level of experience could determine the reliability of the results.
... However, we judged the risk of bias in the studies (approximately one week) using another method described by Lee et al. [32]. A suitable measurement error requires that the smallest detectable change (SDC) in the measurement instrument is less than the MIC [64]. Only one study was conducted on the SDC and MIC [54]. ...
Article
Full-text available
The Peabody Developmental Motor Scales-2 (PDMS-2) has been used to assess the gross and fine motor skills of children (0–6 years); however, the measurement properties of the PDMS-2 are inconclusive. Here, we aimed to systematically review the measurement properties of PDMS-2, and synthesize the quality of evidence using the Consensus-based Standards for the Selection of Health Measurements Instruments (COSMIN) methodology. Electronic databases, including PubMed, EMBASE, Web of Science, CINAHL and MEDLINE, were searched for relevant studies through January 2023; these studies used PDMS-2. The methodological quality of each study was assessed by the COSMIN risk-of-bias checklist, and the measurement properties of PDMS-2 were evaluated by the COSMIN quality criteria. Modified GRADE was used to evaluate the quality of the evidence. We included a total of 22 articles in the assessment. Among the assessed measurement properties, the content validity of PDMS-2 was found to be sufficient with moderate-quality evidence. The structural validity, internal consistency, test-retest reliability and interrater reliability of the PDMS-2 were sufficient for high-quality evidence, while the intrarater reliability was sufficient for moderate-quality evidence. Sufficient high-quality evidence was also found for the measurement error of PDMS-2. The overall construct validity of the PDMS-2 was sufficient but showed inconsistent quality of evidence. The responsiveness of PDMS-2 appears to be sufficient with low-quality evidence. Our findings demonstrate that the PDMS-2 has sufficient content validity, structural validity, internal consistency, reliability and measurement error with moderate to high-quality evidence. Therefore, PDMS-2 is graded as ‘A’ and can be used in motor development research and clinical settings.
... In the absence of minimal important change thresholds for the KOOS-Child 'Sport/play' subscale that are specific in terms of diagnosis, severity, intervention, and length of follow-up [47], we have performed a clinical case series accounting for these factors, providing specific and relevant estimates of change on the different KOOS-Child subscales. In this clinical case series, 16 out of 33 patients met the eligibility criteria for inclusion in the current trial at their baseline visit, received the intervention in question, and attended follow-up at 6 months. ...
Article
Full-text available
Background Osgood-Schlatter is the most frequent growth-related injury affecting about 10% of physically active adolescents. It can cause long-term pain and limitations in sports and physical activity, with potential sequela well into adulthood. The management of Osgood-Schlatter is very heterogeneous. Recent systematic reviews have found low level evidence for surgical intervention and injection therapies, and an absence of studies on conservative management. Recently, a novel self-management approach with exercise, education, and activity modification, demonstrated favorable outcomes for adolescents with patellofemoral pain and Osgood-Schlatter in prospective cohort studies. Aim The aim of this trial is to assess the effectiveness of the novel self-management approach compared to usual care in improving self-reported knee-related function in sport (measured using the KOOS-child ‘Sport/play’ subscale) after a 5-month period. Methods This trial is a pragmatic, assessor-blinded, randomized controlled trial with a two-group parallel arm design, including participants aged 10–16 years diagnosed with Osgood-Schlatter. Participants will receive 3 months of treatment, consisting of either usual care or the self-management approach including exercise, education, and activity modification, followed by 2 months of self-management. Primary endpoint is the KOOS-child ‘Sport/play’ score at 5 months. This protocol details the planned methods and procedures. Discussion The novel approach has already shown promise in previous cohort studies. This trial will potentially provide much-needed level 1 evidence for the effectiveness of the self-management approach, representing a crucial step towards addressing the long-term pain and limitations associated with Osgood-Schlatter. Trial registration Clinicaltrials.gov: NCT05174182. Prospectively registered December 30th 2021. Date of first recruitment: January 3rd 2022. Target sample size: 130 participants.
... A better understanding of the RCI and the MID is that they are different quantities (De Vet & Terwee, 2010; Terwee et al., 2009). They can provide different levels of confidence, support different inferences, rely on different data, and may both be useful (De Vet et al., 2006). There is no contradiction in finding that a change is likely clinically meaningful even if the evidence that the difference is greater than 0 is weak. ...
Article
Full-text available
The reliable change index (RCI) is a widely used statistical tool to account for measurement error when evaluating difference scores. However, there is considerable debate regarding its use. Several researchers have demonstrated ways that the RCI is insufficient or invalid, and others have defended its use for various applications. The aims of this article are to describe the formulation, rationale, and operationalization of the RCI, and critically evaluate whether it is appropriate when using self-report data, especially in clinical psychology. This evaluation finds that the RCI is rarely the best available method; is easily miscalculated, misinterpreted, and misunderstood; and produces incorrect inferences more often than alternatives, largely because it is highly insensitive to real changes. It is argued that the RCI effectively discourages the collection of appropriate data for longitudinal analysis which would benefit from more than two observations, and many applications of the RCI are inaccurate because they use inappropriate estimates of reliability. Better approaches to determining the reliability of changes are required to meet clinical needs and operationalize research questions. Several alternative methods to conceptualize and operationalize reliability of change and treatment outcomes are presented. While the RCI is easy to use, it is also easy to misuse and it fails to address the central issue: two observations of a noisy measure are insufficient data to estimate change and error.
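The Jacobson-Truax formulation of the RCI critiqued above divides the observed change by the standard error of the difference between two scores. A minimal sketch (the scores, SD, and reliability below are hypothetical):

```python
import math

def reliable_change_index(x1: float, x2: float, sd: float, rxx: float) -> float:
    """Jacobson-Truax RCI: change score divided by the SE of the difference."""
    sem = sd * math.sqrt(1.0 - rxx)   # standard error of measurement
    s_diff = math.sqrt(2.0) * sem     # SE of a difference of two observed scores
    return (x2 - x1) / s_diff

# Hypothetical pre/post scores; |RCI| > 1.96 is conventionally read as
# change beyond measurement error at the 95% level.
rci = reliable_change_index(x1=24.0, x2=15.0, sd=7.5, rxx=0.85)
print(round(rci, 2))  # → -2.19
```

Note that the result is driven as much by the reliability estimate plugged in as by the change score itself, which is one of the criticisms raised in the article.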
Article
Full-text available
Objective The purpose of this research was to design and psychometrically validate a new instrument (the Biobehavioural Pain and Movement Questionnaire/BioPMovQ), which assesses the relationship between pain and various factors related to motor behaviour from a biobehavioural perspective. Methods A mixed-method design combining a qualitative study with an observational and cross-sectional study was employed to develop (content validity) and psychometrically validate (construct validity, reliability and concurrent/discriminant validity) a new instrument. A total of 200 patients with chronic musculoskeletal pain were recruited. Results According to the exploratory factor analysis, the final version of the BioPMovQ consists of 16 items distributed across 4 subscales ((1) disability; (2) self-efficacy for physical activity; (3) movement avoidance behaviours; and (4) self-perceived functional ability), all with an eigenvalue greater than 1, explaining 55.79% of the variance. The BioPMovQ showed high internal consistency (Cronbach's α = 0.82; McDonald's ω = 0.83). The intraclass correlation coefficient was 0.86 (95% confidence interval 0.76 to 0.91), which was considered to demonstrate excellent test-retest reliability. The standard error of measurement and minimal detectable change were 3.43 and 8.04 points, respectively. No floor or ceiling effects were identified. There was a positive, significant and moderate-magnitude correlation with the Graded Chronic Pain Scale (r = 0.54), kinesiophobia (r = 0.60), pain catastrophising (r = 0.44) and chronic pain self-efficacy (r = −0.31). Conclusion The BioPMovQ showed good psychometric properties. Based on the findings of this study, the BioPMovQ can be used in research and clinical practice to assess patients with chronic musculoskeletal pain.
Article
Background Minimal clinically important differences (MCIDs) quantify the clinical relevance of quality of life results at the individual patient and group level. The aim of this study was to estimate the MCID for the Brief Fatigue Inventory (BFI) and the Worst and Usual Fatigue items in patients with brain or CNS cancer undergoing curative radiotherapy. Methods Data from a multi-site prospective registry was used. The MCID was calculated using distribution-based and anchor-based approaches. For the anchor-based approach, the fatigue item from the PROMIS-10 served as the anchor to determine if a patient improved, deteriorated, or had no change from baseline to end of treatment (EOT). We compared the unadjusted means on the BFI for the 3 groups to calculate the MCID. For the distribution-based approaches, we calculated the MCID as 0.5 SD of the scores and as 1.96 times the standard error of measurement. Results Three hundred fifty-nine patients with brain or CNS tumors undergoing curative radiotherapy filled out the 9-item BFI at baseline and EOT. The MCID was 1.33 for the BFI (ranging from 0.99 to 1.70 across the approaches), and 1.51 (ranging from 1.16 to 2.02) and 1.76 (ranging from 1.38 to 2.14) for the Usual and Worst fatigue items, respectively. Conclusions This study provides the MCID ranges for the BFI and Worst and Usual fatigue items, which will allow clinically meaningful conclusions to be drawn from BFI scores. These results can be used to select optimal treatments for patients with brain or CNS cancer or to interpret BFI scores from clinical trials.
Article
Full-text available
In recent years quality of life instruments have been featured as primary outcomes in many randomized trials. One of the challenges facing the investigator using such measures is determining the significance of any differences observed, and communicating that significance to clinicians who will be applying the trial results. We have developed an approach to elucidating the significance of changes in score in quality of life instruments by comparing them to global ratings of change. Using this approach we have established a plausible range within which the minimal clinically important difference (MCID) falls. In three studies in which instruments measuring dyspnea, fatigue, and emotional function in patients with chronic heart and lung disease were applied, the MCID was represented by a mean change in score of approximately 0.5 per item when responses were presented on a seven-point Likert scale. Furthermore, we have established ranges for changes in questionnaire scores that correspond to moderate and large changes in the domains of interest. This information will be useful in interpreting questionnaire scores, both in individuals and in groups of patients participating in controlled trials, and in the planning of new trials.
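The anchor-based approach described above estimates the MCID as the mean change score among patients whose global rating of change indicates a small but important improvement. A minimal sketch (the per-item change scores and rating labels below are hypothetical):

```python
from statistics import mean

def anchor_based_mcid(change_scores, anchor_ratings, label="minimally better"):
    """Anchor-based MCID: mean change score among patients whose global
    rating of change matches the 'minimally improved' anchor category."""
    minimal = [c for c, a in zip(change_scores, anchor_ratings) if a == label]
    return mean(minimal)

# Hypothetical per-item change scores on a 7-point Likert scale,
# paired with retrospective global ratings of change:
changes = [0.1, 0.6, 0.4, 1.2, 0.5, -0.2]
ratings = ["no change", "minimally better", "minimally better",
           "sizably better", "minimally better", "no change"]
print(anchor_based_mcid(changes, ratings))  # → 0.5
```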
Article
Background and Purpose. One purpose of this study was to determine whether the Roland-Morris Back Pain Questionnaire (RMQ) could be used to detect clinically meaningful change in individual patients. The construct that served as the basis for this study was that RMQ change scores should be greater for patients meeting their treatment goals than for patients who did not meet their goals. The second purpose of the study was to determine whether sensitivity to change (STC) varies depending on the magnitude of the initial RMQ score. Subjects and Methods. Of the 143 patients with low back pain who completed the study, 104 patients achieved their goals and 39 patients did not achieve their goals. Receiver operating characteristic (ROC) curve analysis and likelihood ratios were used to determine the RMQ change scores that best classify patients as having met or not met their goals. Results. The area under the ROC curve for the entire RMQ scale was 0.68, while the curve areas for smaller RMQ intervals varied from 0.80 to 0.97. Conclusion and Discussion. The STC for the entire RMQ scale was poor for the construct examined in this study. The likelihood ratios for smaller RMQ intervals support the construct validity of the RMQ for assessing change in disability. Initial RMQ score magnitudes must be taken into account to improve the rate of making correct predictions about whether meaningful change in disability will occur following treatment.
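ROC analyses of change scores like the one above pick the cutoff that best separates patients who met their treatment goals from those who did not, commonly by maximizing the Youden index (sensitivity + specificity − 1). A minimal sketch (the RMQ change scores below are hypothetical):

```python
def roc_optimal_cutoff(changes_improved, changes_not_improved):
    """Return the change-score cutoff maximizing the Youden index,
    treating `changes_improved` as the positive (goal-met) group."""
    candidates = sorted(set(changes_improved) | set(changes_not_improved))
    best_cut, best_youden = None, -1.0
    for cut in candidates:
        sens = sum(c >= cut for c in changes_improved) / len(changes_improved)
        spec = sum(c < cut for c in changes_not_improved) / len(changes_not_improved)
        youden = sens + spec - 1.0
        if youden > best_youden:
            best_cut, best_youden = cut, youden
    return best_cut, best_youden

# Hypothetical RMQ change scores for goal-met vs goal-not-met patients:
improved = [3, 5, 6, 8, 9, 11]
not_improved = [0, 1, 2, 2, 4]
print(roc_optimal_cutoff(improved, not_improved))
```

With these toy data the optimal cutoff is a change of 5 points; in practice the cutoff, its likelihood ratios, and the area under the curve are all reported, and as the study notes, the cutoff can depend on the initial score magnitude.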
Article
Background and Purpose. The purpose of this study was to assess the reliability, construct validity, and sensitivity to change of the Lower Extremity Functional Scale (LEFS). Subjects. The LEFS was administered to 107 patients with lower-extremity musculoskeletal dysfunction referred to 12 outpatient physical therapy clinics. Methods. The LEFS was administered during the initial assessment, 24 to 48 hours following the initial assessment, and then at weekly intervals for 4 weeks. The SF-36 (acute version) was administered during the initial assessment and at weekly intervals. A type 2,1 intraclass correlation coefficient was used to estimate test-retest reliability. Pearson correlations and one-way analyses of variance were used to examine construct validity. Spearman rank-order correlation coefficients were used to examine the relationship between an independent prognostic rating of change for each patient and change in the LEFS and SF-36 scores. Results. Test-retest reliability of the LEFS scores was excellent (R=.94 [95% lower limit confidence interval (CI)=.89]). Correlations between the LEFS and the SF-36 physical function subscale and physical component score were r=.80 (95% lower limit CI=.73) and r=.64 (95% lower limit CI=.54), respectively. There was a higher correlation between the prognostic rating of change and the LEFS than between the prognostic rating of change and the SF-36 physical function score. The potential error associated with a score on the LEFS at a given point in time is ±5.3 scale points (90% CI), the minimal detectable change is 9 scale points (90% CI), and the minimal clinically important difference is 9 scale points (90% CI). Conclusion and Discussion. The LEFS is reliable, and construct validity was supported by comparison with the SF-36. The sensitivity to change of the LEFS was superior to that of the SF-36 in this population.
The LEFS is efficient to administer and score and is applicable for research purposes and clinical decision making for individual patients.
Article
Introduction: There has been increased recent attention to the clinical meaningfulness of group change scores on health-related quality of life (HRQL) questionnaires. It has been assumed that improvements and declines of comparable magnitude have the same meaning or value. Method: We assessed 308 cancer patients with the Functional Assessment of Cancer Therapy (FACT) and a Global Rating of Change. Patients were classified into five levels of change in HRQL and its dimensions based upon their responses to retrospective ratings of change after 2 months: sizably worse, minimally worse, no change, minimally better, and sizably better. Raw score and standardized score changes on the FACT-G subscales and total score were then compared across different categories of patient-rated change. Results: The relationship between actual FACT change scores and retrospective ratings of change was modest but usually statistically significant (r = 0.07 to 0.35). Change scores associated with each retrospective rating category were evaluated to determine estimates of meaningful difference. Patients who reported global worsening of HRQL dimensions had considerably larger change scores than those reporting comparable global improvements. Although related to a ceiling effect, this remained true even after removing cases that began near the ceiling of the questionnaire. Discussion: Relatively small gains in HRQL have significant value. Comparable declines may be less meaningful, perhaps due to patients' tendency to minimize negative evaluations of their own condition. This has important implications for the interpretation of the meaningfulness of change scores in HRQL questionnaires. Factors such as adaptation to disease, response shift, dispositional optimism and the need for signs of clinical improvement may be contributing to the results and should be investigated in future studies.