Original article
Comparison of the psychometric properties of
health-related quality of life measures used in
adults with systemic lupus erythematosus: a review
of the literature
Madhura Castelino¹, Janice Abbott², Kathleen McElhone¹ and Lee-Suan Teh¹
Abstract
Objective. A review of the literature was undertaken to evaluate the development and psychometric
properties of health-related quality of life (HRQoL) measures used in adults with SLE. This information
will help clinicians make an informed choice about the measures most appropriate for research and
clinical practice.
Methods. Using the key words 'lupus' and 'quality of life', full original papers in English were identified from
six databases: OVID MEDLINE, EMBASE, Allied and Complementary Medicine, PsycINFO, Web of Science
and Health and Psychosocial Instruments. Only studies describing the validation of HRQoL measures in
adult SLE patients were retrieved.
Results. Thirteen papers were relevant; five evaluated generic instruments [QOLS-S (n = 1), EQ-5D/SF-6D
(n = 1), SF-36 (n = 3)] and eight evaluated disease-specific measures [L-QoL (n = 1), LupusQoL (UK) (n = 1),
LupusQoL (US) (n = 1), SSC (n = 2), SLEQOL (n = 3)]. For the generic measures, there is moderate evidence
of good content validity and internal consistency, whereas there is strong evidence for both these
psychometric properties in disease-specific measures. There is limited to moderate evidence to support the
construct validity and test–retest reliability for the disease-specific measures. Responsiveness and floor/
ceiling effects have not been adequately investigated in any of the measures.
Conclusions. Direct comparison of the psychometric properties was difficult because of the different
methodologies employed in the development and evaluation of the different HRQoL measures.
However, there is supportive evidence that multidimensional disease-specific measures are the most
suitable in terms of content and internal reliability for use in studies of adult patients with SLE.
Key words: quality of life, development, validation, systemic lupus erythematosus.
Introduction
SLE is a chronic inflammatory autoimmune disorder
with variable multi-system involvement that affects
primarily young women. The varied manifestations, the
unpredictable relapsing–remitting course of the disease,
side effects of potentially toxic treatments and poor
understanding of the condition by the general public all
have an impact on patients, leading to dissatisfaction
in various domains of their life [1]. Improvement in survival
[2] has not reflected a similar improvement in the quality
of life [3] for SLE patients. As this condition affects a rela-
tively younger age group, with subsequently longer dis-
ease duration, the clinical manifestations may have
far-reaching psychological and social consequences [4].
Objective assessments of disease activity and damage
are gauged by the clinician and do not capture the
patient’s perspective of their health [3]. Therefore, more
recently it has been emphasized that patient-reported in-
struments such as those measuring health-related quality
of life (HRQoL) should be one of the outcome measures in
¹Rheumatology Department, Royal Blackburn Hospital, Blackburn and ²School of Psychology, University of Central Lancashire, Preston, UK.
Correspondence to: Lee-Suan Teh, Department of Rheumatology,
Administration Block, Level 1, Royal Blackburn Hospital, Haslingden
Road, Blackburn BB2 3HH, UK. E-mail: lsteh@btinternet.com
Submitted 15 May 2012; revised version accepted 29 October 2012.
© The Author 2012. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com
RHEUMATOLOGY
Rheumatology 2013;52:684–696
doi:10.1093/rheumatology/kes370
Advance Access publication 22 December 2012
CLINICAL
SCIENCE
Downloaded from http://rheumatology.oxfordjournals.org/ at centlancs1 on April 24, 2014
clinical trials [5]. This has been advocated by both the US
Food and Drug Administration (FDA) [6] and the European
Medicines Agency (EMA) [7].
HRQoL measures, in the form of questionnaires, have
been either developed exclusively for use in SLE
(disease-specific measures) or have been used in SLE pa-
tients but developed for evaluation of quality of life in any
disease state or healthy individuals (generic measures).
An instrument with good psychometric properties will
be able to determine the HRQoL of the patient more
accurately than one without. Knowledge of acceptable
psychometric standards, the conceptual framework and
appraisal of the developmental process of HRQoL tools
would help determine the adequacy of the measure for
clinical use, for example, to determine the effectiveness
of an intervention.
The aim of this work was to compare the psychometric
properties of all published HRQoL measures (generic
and disease specific) that have been developed and/or
evaluated for use in adults with SLE. This should provide
valuable information to clinicians on the appropriateness
of the instrument for measuring HRQoL in their clinical
practice as well as in research studies.
Materials and methods
Search strategy
A literature search was carried out using the keywords
'lupus' and 'quality of life'. The search was limited to full
papers in the English language and those pertaining
to adult patients. The following databases were searched
up to November 2010: OVID MEDLINE (from 1950),
EMBASE (from 1980), Health and Psychosocial
Instruments (from 1985), Allied and Complementary
Medicine (from 1985), PsycINFO, PubMed and Web of
Science.
The papers were assessed by all the authors based on
the eligibility criteria as defined below:
Inclusion criteria were papers that described the meth-
odology of the development and validation of HRQoL
measures in SLE; linguistic translation and evaluation of
an existing HRQoL measure and papers that primarily
evaluated the measures for SLE patients.
Exclusion criteria were inadequate numbers of SLE
patients recruited for the evaluation (Fig. 1) and HRQoL
measures published only as abstracts.
Papers that fulfilled any one of the inclusion criteria
in the absence of the exclusion criteria were included in
this review.
After accounting for duplicates, a total of 374 papers
were identified, and after reviewing titles and abstracts,
20 full papers were identified as possibly suitable for this
systematic review (Fig. 1). References of these papers
were also screened for additional relevant papers and
no further papers were identified. All 20 papers were
read by all the authors independently, and using the eligi-
bility criteria, 13 were identified as suitable for inclusion in
this systematic review. The reasons for exclusion of the
seven papers [8–14] are explained in Fig. 1.
Data extraction and quality assessments
Data extraction was carried out independently by all the
authors. Demographic and clinical data, and information
on the description of the instruments and their psycho-
metric properties were extracted. The demographic and
clinical data included were age, gender, disease duration,
disease activity and damage. Descriptive information of
the scales included the number of items, domains,
ranges of score, mode of administration, time to adminis-
ter and recall period. The psychometric properties
extracted were validity: content, construct (convergent
and divergent), concurrent (criterion) and cross-cultural;
reliability: internal and test–retest; responsiveness and
the floor/ceiling effects. A template was used to assess
the psychometric properties of the instrument and the
quality of the methodology used to determine that prop-
erty. The template was based on both the numerical
criteria for the qualitative evaluation of an HRQoL instru-
ment proposed by Terwee et al. [15] and the consensus
published as the original Consensus Based Standards for
the Selection of Health Measurement Instruments
(COSMIN) checklist [16–19].
Scoring criteria for evaluation of psychometric
properties
The strength of the evidence was rated both for a psycho-
metric property and the robustness of the methodology
used to determine that property as follows: (i) strong evi-
dence to support the property and the robustness of the
methodology used to evaluate it was schematically rated
as three pluses, +++; (ii) moderate evidence was denoted
by two pluses, ++; (iii) limited evidence, one plus, +; (iv) no
evidence, a minus, −; (v) if interpretation was difficult, a
question mark, ?; or (vi) not assessed, NA. Any
discrepancies in the scores for the measurement properties
were discussed by all the authors and agreement was
achieved based on the available information.
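As an illustration only, the six-level rating scheme described above can be encoded so that ratings for several instruments are summarized consistently. The sketch below is not part of the review's method: the names Evidence, RANK and summarize are hypothetical, and the numeric ordering is an assumption introduced purely to compare ratings. An ASCII hyphen stands in for the minus symbol.

```python
from enum import Enum


class Evidence(Enum):
    """Rating symbols taken from the scoring criteria in the text."""
    STRONG = "+++"
    MODERATE = "++"
    LIMITED = "+"
    NONE = "-"           # ASCII stand-in for the minus symbol
    UNCLEAR = "?"
    NOT_ASSESSED = "NA"


# Assumed ordering, used only for comparison; "?" and "NA" are left unranked.
RANK = {
    Evidence.NONE: 0,
    Evidence.LIMITED: 1,
    Evidence.MODERATE: 2,
    Evidence.STRONG: 3,
}


def summarize(ratings):
    """Describe the span of ranked ratings, e.g. 'limited to moderate'."""
    ranked = [r for r in ratings if r in RANK]
    if not ranked:
        return "not rated"
    lowest = min(ranked, key=RANK.get)
    highest = max(ranked, key=RANK.get)
    if lowest is highest:
        return lowest.name.lower()
    return f"{lowest.name.lower()} to {highest.name.lower()}"


# Example: ratings for one property across three hypothetical instruments.
print(summarize([Evidence("+++"), Evidence("++"), Evidence("?")]))
```

Leaving "?" and NA unranked reflects the distinction drawn in the text between evidence that is absent and evidence that could not be interpreted or was not sought.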
Psychometric properties
Psychometric properties of a measurement instrument are
broadly classified into three domains: validity, reliability
and responsiveness. The validity of the instrument in-
cludes evaluation of content, construct (convergent/diver-
gent) and criterion validity. Content validity ensures that
the measure is sensible, relevant and comprehensively
covers all aspects of the condition assessed. In this
study, content validity was rated positive if there was
involvement of experts (doctors, nurses and social scien-
tists) as well as patients at the stage of questionnaire
development. Construct validity evaluates the robustness
of the structure and determines the subscales of the
questionnaire. Convergent validity was judged to be
adequately demonstrated if there were high positive
correlations between scales, and divergent validity if
correlations were low or negative. Assessment of
the instrument against the true value or against a gold
standard is termed concurrent (criterion) validity. A posi-
tive rating was given if convincing arguments were pre-
sented that the comparator questionnaire really was the
gold standard and the correlation was ≥0.7. Reliability
assesses the reproducibility and consistency of an instru-
ment. Internal reliability or internal consistency measures
the extent to which items within a subscale are concep-
tually related and the acceptable statistical value is
Cronbach’s α ≥0.7 [15]. Test–retest reliability measures
the stability of a questionnaire and is gauged by the
intraclass correlation coefficient (ICC). An ICC >0.7 is
considered adequate [15]. Responsiveness is the ability
of the scale to detect changes within the same patient
in longitudinal studies. The instrument was considered
to have floor or ceiling effects if >15% of respondents
scored at the extreme ends of the scale. The
generalizability of the questionnaire was assessed by es-
tablishing if the study population was adequately
described to help clinicians extrapolate the results to
their respective patient cohorts.
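Two of the thresholds above lend themselves to a short worked sketch: Cronbach's α (acceptable at ≥0.7) and the 15% criterion for floor or ceiling effects. The code below is a minimal illustration using the standard formula for α; the function names, the data layout (one list of item scores per respondent) and the example scores are assumptions made for the example, not material from the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Proportion of respondents at each extreme, flagged if above threshold."""
    n = len(scores)
    floor = sum(s == lowest for s in scores) / n
    ceiling = sum(s == highest for s in scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,
        "ceiling_effect": ceiling > threshold,
    }


# Hypothetical data: four respondents answering three items identically.
items = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(items) >= 0.7)

# Hypothetical domain scores on a 0-100 scale.
print(floor_ceiling([0, 0, 50, 100, 60, 70, 80, 90, 40, 30], 0, 100))
```

The sample variance is used throughout; using the population variance instead gives the same α as long as it is applied to both the items and the total.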
Results
Measures included in the review
The measures reviewed can be subdivided into generic or
disease-specific HRQoL measures. The generic measures
used in SLE were the Medical Outcomes Study Short
Form-36 (MOS SF-36 version 1) [20–22], Quality of Life
Scale, Swedish version (QOLS-S) [23], Short Form-6D
(SF-6D) [24] and EuroQoL-5D (EQ-5D) [24]. These four
measures were evaluated in five studies, one for QOLS-
S, three on the SF-36 and one study utilizing both the
SF-6D and the EuroQoL-5D. The disease-specific meas-
ures used in SLE were the Systemic Lupus Erythematosus
Symptom Checklist (SSC) [25, 26], Systemic Lupus
Erythematosus-Specific Quality of Life instrument
(SLEQOL) [27–29], LupusQoL [30, 31] and L-QoL [32].
The SSC, SLEQOL and LupusQoL have all undergone
cross-cultural evaluation [26, 28, 29, 31].
Description of patients
The demographic data of the patients are tabulated in
Table 1. The gender distribution in the studies reflects
the incidence of disease in both genders. Exceptions
were the QOLS-S and the developmental phases of
both the original LupusQoL [referred to as LupusQoL
(UK)] and L-QoL in that only females participated. For
the instruments developed in Europe, the age distribution,
disease duration and disease activity of the samples were
similar. The studies within the Chinese population
included younger patients with less disease activity and
damage and shorter disease duration. For ethnicity distri-
bution, some authors [22, 24, 32] did not mention the
ethnic profile of the samples.
Description of the questionnaires
This is summarized in supplementary Table S1, available
as supplementary data at Rheumatology Online. All
measures except the SSC and L-QoL have multiple domains.
All the questionnaires were developed in English-speaking
populations except the QOLS-S (Swedish) and the SSC
(Dutch). Cross-cultural validation was undertaken for the
SF-36 (to Chinese), SSC (to Brazilian Portuguese),
SLEQOL (to Brazilian Portuguese) and LupusQoL (UK)
(to US English).
The number of items/response options, scoring range/
interpretation, time for administration and recall time for
each measure are also summarized in supplementary
Table S1, available as supplementary data at
Rheumatology Online. The response options varied in
the different measures with most questionnaires using a
5-point or 7-point Likert scale [33]. For the SF-36, SF-6D,
QOLS-S, EQ-5D and LupusQoL, higher scores reflected
better health, while for SLEQOL, SSC and L-QoL the
reverse was true. The time for administration of the
questionnaires, when stated, was <10 min for all measures.
The measures had varying numbers of items (5–40) and
recall periods (present time to the previous 4 weeks and
up to the previous year for the general health item in the
SF-36) with the most typical recall period being the previ-
ous 4 weeks.
Psychometric properties of the questionnaire
The psychometric properties tabulated in Tables 2 and 3
include validity (content, criterion and construct), reliability
(internal consistency and test–retest reliability) and
responsiveness. These properties are further described/
analysed in the following paragraphs.
Content validity
Qualitative interviews or other ways of involving patients
and experts (rheumatologists and nurse clinicians) were
an essential part of the development of the disease-spe-
cific measures and the QOLS-S. In the case of generic
measures, the content validity was assumed for an SLE
population.
Construct validity
The construct validity was evaluated using different meth-
ods and statistical analyses in the different papers. Factor
analysis carried out for the SF-36 failed to sufficiently sup-
port the proposed domains in the studies [21, 22].
QOLS-S used the consensus model, whereas SF-6D
and EQ-5D had no factor analysis done. In comparison,
the factor analyses for all of the disease-specific meas-
ures except the SLEQOL confirmed the dimensionality
and the domain structure. The developers of SLEQOL
based the final subsections on convenience.
Correlation coefficients were used to determine conver-
gent and divergent validity to test a priori hypotheses.
Convergent validity was evaluated against different meas-
ures in the various studies. This was not assessed ad-
equately for the SF-36, SF-6D and EQ-5D. The QOLS-S
was compared against the Arthritis Impact Measurement
Scale, with moderate correlation between the psycho-
logical score and the QOLS-S for SLE patients. All the
disease-specific measures used the SF-36 as a compara-
tor measure for convergent validity, except the L-QoL,
which used the Nottingham Health Profile (NHP). The
LupusQoL (US) used the EQ-5D in addition to the
SF-36. Moderate to strong correlations were noted for
TABLE 1 Demographic data of the studies included in this systematic review

| Measure | First author and year [ref] | Number of patients in the studies | Female, n (%) | Age, mean (S.D.), years | Disease duration, mean (S.D.), years | Disease activity, median (range) | Disease damage, median (range) |
|---|---|---|---|---|---|---|---|
| SF-36 v1 | Stoll 1997 [20] | n = 150 | 143 (95) | 40 (11.0) | 10 (7) | BILAG 5 (3–7) | SLICC 1 (0–2) |
| | Thumboo 1999 [21] | n = 118 | 112 (94.5) | NS | 3.61 (0.01–16.1)ᵃ | BILAG 2 (0–15) | SLICC 0 (0–8) |
| | Thumboo 2000 [22] | n = 69 | 61 (88.4) | NS | 4.7 (0.1–25.5)ᵇ | BILAG 3 (0–12) | SLICC 0 (0–6) |
| QOLS-S | Burckhardt 1992 [23] | n = 50 | 50 (100) | 43.5 (12.7) | 13.9 (10.2) | NS | NS |
| EQ-5D/SF-6D | Aggarwal 2009 [24] | n = 167 | 156 (93.5) | 42.5 (13.0) | 9.3 (8.8) | SLEDAI 6.2 (5.7)ᶜ | SLICC 2 (2)ᵇ |
| SSC | Grootscholten 2003 [25] | Group I: n = 87 | 82 (94) | 32.5 (20–71)ᵇ | 8 (1–30)ᵇ | SLEDAI 4 (1–6) | NS |
| | | Group II: n = 33 | 29 (88) | 37.0 (18–64)ᵇ | 9.2 (1–26)ᵇ | SLEDAI 4 (1–6) | NS |
| | Freire 2007 [26] | n = 50 | 44 (93) | 34.2 (12.0) | 6.5 (7.4) | SLEDAI2K 7.2 (4.7)ᶜ | NS |
| SLEQOL | Leong 2005 [27] | Initial group: n = 100 | 89 (89) | 39.4 (13.7) | NS | NS | NS |
| | | Test group: n = 275 | 248 (90.5) | 40.1 (13.4) | NS | SLEDAI 2.74 (4.82)ᶜ | SLICC 0.67 (0–6) |
| | Kong 2007 [28] | n = 237 | 213 (89.9) | 47.63 (11.91) | NS | NS | NS |
| | Freire 2010 [29] | n = 107 | 106 (99.5) | 36.8 (12) | 5.9 (5.6) | NS | NS |
| LupusQoL | McElhone 2007 [30] | Interview: n = 30 | 30 (100) | 48.1 (13.1) | 9.2 (8.4) | NS | NS |
| | | Patient feedback on draft: n = 20 | 20 (100) | 52.0 (15.2) | 11.2 (6.1) | NS | NS |
| | | Psychometric testing version 1: n = 322 | 299 (93) | 45.1 (13.4) | NS | NS | NS |
| | | Psychometric testing version 2: n = 215 | 206 (96) | 46.2 (13.3) | NS | NS | NS |
| | | Psychometric testing version 3: n = 160; (postal survey n = 115) | 152 (95) | 45.3 (13.9) | NS | NS | NS |
| | Jolly 2010 [31] | n = 205 (complete data n = 186) | NS (94) | 42.5 (12.9) | NS | SLEDAI 4 (0–27) [6.2 (5.8)ᶜ] | SLICC 1 (0–10) [2.0 (2.1)ᶜ] |
| L-QoL | Doward 2009 [32] | Qualitative interviews: n = 50 | 47 (94.0) | 42.6 (13.4) | 8 (0.3–36)ᵇ | NS | NS |
| | | Field test interviews: n = 16 | 16 (100) | 48.7 (11.1) | 9.5 (1–22)ᵇ | NS | NS |
| | | Postal survey 1: n = 95 | 90 (94.7) | 45.3 (15.0) | 7 (1–50)ᵇ | NS | NS |
| | | Postal survey 2: n = 93 | 91 (97.8) | 43.9 (12.1) | 14 (1–37)ᵇ | NS | NS |

ᵃMean (range) as published; ᵇmedian (range); ᶜmean (S.D.). NS: not stated.
TABLE 2 Psychometric qualities of SLE measures

| Instruments | Content validity | Construct validityᵃ | Concurrent criterion validity |
|---|---|---|---|
| SF-36 (UK) | Assumed for all disease states | No factor analysis; eight domains assumed. Hypotheses: known groups validity. Discriminant validity: levels of disease severity (BILAG). Divergent validity: 0.40 to 0.27 with BILAG, 0.30 to 0.04 with SLICC/ACR-DI. Convergent validity with SF-20+: 0.31–0.89 | SF-20+ (SF-20 with fatigue) as gold standard. Correlation [0.69 (0.89–0.31)] |
| SF-36 (Singapore) | Assumed for all disease states | Factor analysis: eight domains loading onto four factors: (i) physical functioning; (ii) physical and emotional role functioning and bodily pain; (iii) mental health, vitality and social functioning; (iv) general health. Hypotheses: divergent validity: 0.37 to 0.09 with disease activity (BILAG) and 0.25 to 0.16 with damage (SLICC/ACR DI) | No gold standard |
| Chinese SF-36 | Assumed for all disease states | Factor analysis: four domains (physical functioning, role physical, social functioning, bodily pain) loaded onto one factor; the other four domains loaded onto at least two factors. Hypotheses: known groups validity. Divergent validity: 0.34 to 0.17 with disease activity (BILAG) and 0.35 to 0.19 with damage (SLICC/ACR DI) | No gold standard |
| QOLS-S | Assumed for all chronic diseases [35] + additional item (independence) [1] | No factor analysis during the original development. Consensus model: used for categorizing the items based on original five factors; item to scale correlation (t1 = 0.21–0.64; t2 = 0.44–0.70). On recent analysis: three-factor structure [34]. Hypotheses: divergent validity: 0.00 to 0.63 with VAS-pain, Ritchie Articular Index, patient SLAM and some of the AIMS subscales | No gold standard |
| SF-6D | Assumed | No factor analysis. Hypotheses: discriminant validity: SLEDAI used to differentiate subgroups based on disease severity and SLICC/ACR DI for levels of damage. Divergent validity: 0.23 with SLEDAI and 0.22 with SLICC/ACR DI | SF-36 as gold standard. Strong correlations between SF-6D and SF-36 (0.76–0.57) and PCS (0.72) but not MCS (0.30) |
| EQ-5D | Not assessed; likely to have assumed | No factor analysis. Hypotheses: discriminant validity: based on disease severity (SLEDAI) and damage (SLICC/ACR DI). Convergent validity: 0.69 to 0.55 with corresponding domains of SF-36. Divergent validity: 0.49 to 0.24 with non-corresponding domains of SF-36; 0.16 to 0.21 with SLEDAI and 0.21 to 0.20 with SLICC/ACR DI | No gold standard |
| SSC | Derived from Dutch literature; physician and patient involvement to add items | Exploratory factor analysis: unidimensional scale. Hypotheses: correlations: 0.44 to 0.66 with SF-36, 0.26 with SLEDAI and 0.54 to 0.69 with IRGL and POMS | No gold standard |
| SLEQOL | Nomination by experts; patients ascertained importance and relevance of items and item addition (n = 100) | Factor analysis: eight factors, divided into six subsections for convenience. Rasch analysis: to ascertain the item difficulty. Hypotheses: not stated. Convergent validity: 0.030–0.171 with SF-36, RAI Helplessness subscale. Divergent validity: 0.003–0.091 with SLEDAI, SLAM, SLICC/ACR-DI | No gold standard |
| LupusQoL (UK) | Expert input and patient semi-structured qualitative interviews (n = 30) | Exploratory principal component factor analysis: eight factors. Hypotheses: not stated. Concurrent validity: 0.71–0.79 with related domains of SF-36. Discriminant validity: BILAG used to differentiate levels of disease severity (seven domains) and SLICC/ACR DI for damage (five domains) | No gold standard |
| LupusQoL (US) | Assumed based on original LupusQoL | Factor analysis: exploratory: eight domains loading onto five factors; confirmatory: five-factor loading. Hypotheses: convergent validity: 0.54–0.73 with related domains of SF-36, 0.50 to 0.68 with EQ-5D. Discriminant validity: SLEDAI used to differentiate levels of disease severity (four domains) and SLICC/ACR DI for damage (six domains) | LupusQoL (UK) as gold standard |
| L-QoL | Themes and items identified from qualitative interviews reviewed by patients (n = 50) | Factor analysis: unidimensional scale. Hypothesis: convergent validity: moderate correlations with NHP and worse scores for poorer health or more severe SLE (values not published) | No gold standard |

ᵃAll values are correlation coefficient r interpreted as follows: >0.6: strong positive correlation; less than −0.6: strong negative correlation; 0.30–0.59: moderate positive correlation; −0.30 to −0.59: moderate negative correlation; <0.30 to 0: weak positive correlation; greater than −0.30 to 0: weak negative correlation. AIMS: Arthritis Impact Measurement Scale; IRGL: Influence of Rheumatic Diseases on General Health and Lifestyle; MCS: Mental Component Score; PCS: Physical Component Score; POMS: Profile of Mood States; RAI: Rheumatology Attitudes Index; SLAM: Systemic Lupus Activity Measure.
TABLE 3 Further psychometric properties of HRQoL measures used in SLE

| Instruments | Internal consistency [mean (range)] | Test–retest reliability [mean (range)] | Responsiveness | Floor/ceiling effect | Generalizability |
|---|---|---|---|---|---|
| SF-36 (UK) | Cronbach’s α (n = 150) [0.85 (0.71–0.95)] | Not assessed | Not assessed | Not assessed | Good descriptive data of population sample. Missing data included in analysis |
| SF-36 (Singapore) | Cronbach’s α (n = 118) [0.89 (0.84–0.94)] | Repeatability coeff. Bland and Altman [38.7 (21.1–69.3)]; Spearman’s rank correlation [0.76 (0.67–0.88)]; (n = 78) (t =514days) | Not assessed | Floor effect [19]: mean 10.92%, range 0–22.6%. Ceiling effect [19]: mean 24.92%, range 0.4–58.9% | Good descriptive data of population sample. Handling of missing data not explained |
| Chinese SF-36 | Cronbach’s α (n = 69) [0.83 (0.72–0.91)] | Repeatability coeff. Bland and Altman [39.19 (21.6–70.29)]; Spearman’s rank correlation [0.81 (0.65–0.90)]; (n = 47) (t =514 days) | Not assessed | Not assessed | Good descriptive data of population sample. Missing scale items were substituted |
| QOLS-S | Cronbach’s α: time 1: 0.85; time 2: 0.91 | Correlation coeff.: 0.86 (n = not specified) (t = 4 weeks) | Not assessed | Not assessed | Good descriptive data of population sample but no data on ethnicity. Handling of missing data explained |
| SF-6D | Domains derived from SF-36. No factor analysis | Not assessed | Effect sizes: small, 0.04–0.43 (n = 66). Sensitive to self-reported improvement and improvement in EQ-5D VAS | Ceiling effect: 2.6%. Floor effect: 0.67% | Good descriptive data of population sample. Handling of missing data not explained |
| EQ-5D | Not assessed | Not assessed | Effect sizes: small, 0.012–0.428 (n = 66). Sensitive to self-reported improvement and improvement in EQ-5D VAS | Ceiling effect: 12.7%. Floor effect: none | Good descriptive data of population sample. Handling of missing data not explained |
| SSC | Unidimensional (n = 87). Cronbach’s α: SSC: 0.89; TDL: 0.89 | Pearson’s correlation coefficient (n = 28) (t = 1 month): SSC: 0.78; TDL: 0.87 | (n = 17) (t = 1 year) change in SSC noted but not in subjective patient VAS | Not assessed | Good descriptive data of population sample but no data on ethnicity. Missing data stated: ‘almost no missing data’ |
| SLEQOL | Cronbach’s α: overall 0.95 (n = 275); for subsections [0.87 (0.76–0.93)] | (n = 51) (t = 2 weeks). ICC: summary score: 0.83; ICC for subsections: [0.63 (0.57–0.80)] | (n = 115 data pairs from 95 patients). More sensitive but less specific than SF-36. Multiple statistical methods used | Floor effect: mean 28.8%, range 14.9–44%. Ceiling effect: mean 0.57%, range 0–2.6% | Good descriptive data of population sample. Handling of missing data not explained |
| LupusQoL (UK) | Cronbach’s α (n = 160) [0.93 (0.88–0.96)] | (n = 83) (t = 4 weeks) ICC [0.83 (0.72–0.93)] | Not assessed | 10% floor effects: mean 5.76%, range 2.2–10.8%. 10% ceiling effect: mean 17.99%, range 6.2–28.2% | Good descriptive data of population sample. Missing responses mentioned, treated as unanswered |
| LupusQoL (US) | Cronbach’s α (n = 185) [0.91 (0.83–0.94)] | (n = 15) (t = 1 week) ICC [0.87 (0.68–0.92)] | Not assessed | Not assessed | Good descriptive data of population sample. Handling of missing data not explained |
| L-QoL | Unidimensional scale. Cronbach’s α: time 1: 0.91 (n = 93); time 2: 0.92 (n = 76) | (n = 76) (t = 2 weeks) ICC: 0.95 | Not assessed | Floor and ceiling effects: reported as relatively few scored at the extremes | Good descriptive data of population sample but no ethnicity data. Missing data stated as minimal |

n: number of patients involved; t: time interval; TDL: total distress level.
the disease-specific measures except for the SLEQOL.
The evaluation of divergent validity and discriminant val-
idity was against the disease activity measures (the BILAG
index or the SLEDAI) and the damage index [SLICC/ACR
Damage Index (SLICC/ACR-DI)] for all studies. Weak or
no correlations supported the divergent validity of both
generic and disease-specific measures. The disease-
specific measures had moderate evidence for construct
validity, but the generic measures had limited evidence
for this.
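The verbal labels used for correlations in this section (strong, moderate, weak) follow the cut-offs given in the footnote to Table 2. A minimal sketch of that classification is shown below. The footnote does not state how values falling exactly on 0.30 or between 0.59 and 0.60 are treated, so the boundary handling here is an assumption, as is the function name.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the Table 2 footnote cut-offs.

    Assumed boundary handling: |r| >= 0.60 is strong,
    0.30 <= |r| < 0.60 is moderate, anything smaller is weak.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size >= 0.60:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Example values chosen for illustration only.
for r in (0.72, -0.45, 0.10):
    print(r, classify_correlation(r))
```

Under these cut-offs a convergent hypothesis would be supported by a moderate or strong label and a divergent hypothesis by a weak one.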
Concurrent (criterion) validity
The SF-36, SF-6D and LupusQoL (US) were derived from
the SF-20, SF-36 and LupusQoL (UK), and therefore had
gold standards for determining the concurrent criterion
validity. The SF-36 and SF-6D showed strong correlation
coefficients in most domains when compared with the ori-
ginal instrument. Concurrent validity was not reported in
the LupusQoL (US), and thus this property could not be
adequately explored for this instrument.
Internal consistency
There was moderate to strong evidence for internal con-
sistency in all the measures except the SF-6D and EQ-5D.
The scoring was evaluated as moderate due to the smaller
sample sizes in the studies for the Chinese SF-36 and the
QOLS-S. For all the scales, the mean internal consistency
(Cronbach’s a) was >0.80. All of the disease-specific
measures had high mean ICC, although the individual do-
mains ranged from 0.57 to 0.93 across measures.
Test–retest reliability
Test–retest reliability varied for the different domains in the
various measures. Four of the measures [Chinese SF-36,
SSC, QOLS-S and LupusQoL (US)] used sample sizes
of fewer than 50, which is the minimum number recom-
mended for such analyses [15]. For all the instruments the
mean ICC was >0.70. The English and the Chinese ver-
sions of the SF-36 assessed in the Singapore population
employed the Bland and Altman repeatability coefficient.
Two of the domains were found to have large changes,
thus bringing into question the reliability of these domains.
The SLEQOL had good ICC for the overall score on
test–retest reliability, but the scores of the subsections
were below the accepted 0.70 in four of the six
domains. The LupusQoL (UK) had strong evidence for
good methodology and ICC, but only two of the eight do-
mains had a sample size of more than 50 patients.
From the data available it is clear that the generic measures
have limited evidence for test–retest reliability,
whereas the disease-specific measures appear to fare
marginally better with limited to moderate evidence for
this property.
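To make the two test–retest criteria concrete (an ICC >0.7 and a sample of at least 50 patients [15]), the sketch below computes a one-way random-effects ICC from paired administrations. The reviewed studies do not all state which form of the ICC they used, so the choice of the one-way model is an assumption made for illustration, and the function names and example scores are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC for n subjects measured k times each.

    measurements: list of per-subject tuples, all of the same length k.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two administrations")
    subject_means = [sum(m) / k for m in measurements]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square
    msb = k * sum((sm - grand_mean) ** 2 for sm in subject_means) / (n - 1)
    # Within-subject mean square
    msw = sum(
        (x - sm) ** 2
        for m, sm in zip(measurements, subject_means)
        for x in m
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def retest_adequate(icc, n, min_icc=0.7, min_n=50):
    """Apply both criteria cited in the text: ICC > 0.7 and n >= 50."""
    return icc > min_icc and n >= min_n


# Hypothetical scores at two time points for three patients.
pairs = [(1, 2), (3, 4), (5, 6)]
icc = icc_oneway(pairs)
print(round(icc, 3), retest_adequate(icc, n=len(pairs)))
```

In this toy example the ICC exceeds 0.7 but the check still fails on sample size, which mirrors the situation described above for several of the instruments.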
Responsiveness and floor/ceiling effect
Percentages of floor and ceiling effect were provided for
the EQ-5D, SF-6D, SLEQOL, SF-36 (Singapore) and
LupusQoL (UK). The EQ-5D and SF-6D did not show
either effect, the SLEQOL reported floor effects and the
SF-36 (Singapore) and LupusQoL (UK) were noted to have
ceiling effects in some of the domains. The responsive-
ness or sensitivity to change was assessed in four of the
HRQoL measures (SLEQOL, SSC, EQ-5D and SF-6D).
Although the SSC scores improved statistically, the pa-
tients (on treatment with cyclophosphamide) did not per-
ceive any change as per the patient’s visual analogue
scale (VAS). The authors attributed this to the psycho-
logical adaptation in patients with chronic illness and to
the small sample size. For the EQ-5D and SF-6D, only
small effect sizes were demonstrated.
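For readers unfamiliar with effect sizes as an index of responsiveness, the sketch below shows one common formulation: the mean change divided by the standard deviation of the baseline scores, labelled with Cohen's conventional thresholds (0.2, 0.5 and 0.8). The review does not state which formula the original study used, so this is an assumed illustration with hypothetical function names and data.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Standardized effect size: mean change / SD of baseline scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def label_effect_size(es):
    """Cohen's conventional labels, applied to the absolute value."""
    size = abs(es)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical HRQoL scores before and after an intervention.
before = [50, 60, 70, 80]
after = [54, 64, 74, 84]
es = effect_size(before, after)
print(round(es, 2), label_effect_size(es))
```

On these thresholds the reported ranges for the EQ-5D and SF-6D (up to about 0.43) fall in the trivial to small band, consistent with the description in the text.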
Table 4 summarizes schematically the level of evidence
for the main psychometric properties of the HRQoL meas-
ures that have been evaluated in this review. There is
strong to moderate evidence for good reliability, internal
TABLE 4 Level of evidence for psychometric properties of HRQoL measures evaluated in patients with SLE

| Instrument | Content validity | Internal consistency | Test–retest reliability | Construct validity | Responsiveness | Floor/ceiling effect |
|---|---|---|---|---|---|---|
| SF-36 (UK) | NA | +++ | NA | + | NA | NA |
| SF-36 (Singapore) | NA | +++ | ? | + | NA | − |
| Chinese SF-36 | NA | ++ | ? | + | NA | NA |
| QOLS-S | ++ | ++ | ? | + | NA | NA |
| SF-6D | NA | NA | NA | + | + | ++ |
| EQ-5D | NA | NA | NA | + | + | ++ |
| SSC | ++ | ++ | + | ++ | − | NA |
| SLEQOL | +++ | +++ | ++ | ++ | + | − |
| LupusQoL (UK) | +++ | +++ | ++ | ++ | NA | − |
| LupusQoL (US) | NA | +++ | ? | ++ | NA | NA |
| L-QoL | +++ | +++ | ++ | ++ | NA | ? |

Levels of evidence: +++: strong evidence for measurement property (excellent evidence of methodological quality); ++: moderate evidence for measurement property (good evidence of methodological quality); +: limited evidence for measurement property (study of fair methodological quality); −: no evidence for measurement property; ?: interpretation difficult (poor methodological quality); NA: not assessed.
www.rheumatology.oxfordjournals.org 693
HRQoL measures used in adults with SLE
at centlancs1 on April 24, 2014http://rheumatology.oxfordjournals.org/Downloaded from
consistency and validity for the disease-specific meas-
ures. The internal consistency was strong to moderate
for both generic and disease-specific measures.
Generalizability
A description of the study sample was available for all the
measures except the QOLS-S, SSC and L-QoL. This
important omission makes it difficult for the reader to determine
the population for which the measure is best suited.
Discussion
This review highlights some deficiencies in the HRQoL
measures used in SLE patients. Comparison of the
studies was difficult because of the varied methodology
employed by the different authors. This is partly because the
studies were undertaken at different times (between 1997
and 2010) and reflected the trend in the development of
quality of life measures over the years. Despite this, all the
HRQoL measures developed/evaluated for use in SLE
patients have moderate to strong evidence for content
validity and internal consistency. The structure of a good
measure should emphasize and evaluate the domains of
importance to the patients. All the disease-specific
measures have addressed this adequately. However, there was
limited evidence for construct validity for the generic
measures.
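Internal consistency is usually summarized with Cronbach's alpha. The sketch below shows the standard formula applied to a small table of item responses; the function name and the example data are hypothetical and are not drawn from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five respondents answering a three-item domain.
example = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(example)
```

Values of alpha closer to 1 indicate that the items of a domain vary together. Very high values can also signal redundant items, so alpha is normally read alongside the content of the scale.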
Only a stable measure can be used with confidence in
clinical studies. Accuracy in determining the changes of
HRQoL in a clinical situation is crucial if the measure is
to evaluate any improvement, deterioration or lack of
change in the patients’ quality of life. Only some of the
disease-specific measures have moderate evidence
for test-retest reliability and this was not demonstrated
adequately in the generic measures.
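Test-retest reliability is often quantified with an intraclass correlation coefficient (ICC). The sketch below computes one common form, the two-way random-effects, absolute-agreement, single-measurement ICC, from scores collected on the same patients at two or more occasions. The choice of this ICC form and the function name are assumptions for illustration; the reviewed studies may have reported other reliability statistics.

```python
def icc_two_way_random(scores):
    """ICC (two-way random effects, absolute agreement, single measurement).

    scores: list of rows, one per patient, each holding that patient's
    score at every assessment occasion (same number of occasions per row).
    """
    n = len(scores)          # patients
    k = len(scores[0])       # occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores for five patients completing a measure twice.
test_retest = [[62, 65], [40, 38], [75, 80], [55, 52], [90, 88]]
icc = icc_two_way_random(test_retest)
```

An ICC near 1 means that differences between patients dominate over differences between occasions, which is what a stable measure should show when the patients' condition has not changed.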
In addition, responsiveness and floor and ceiling effects
were not evaluated in all measures. In those measures
where they were evaluated, there was limited evidence
for responsiveness of the SLEQOL, EQ-5D and SF-6D,
while the evidence for the SSC was inconclusive. For a
clinician to benefit from using an HRQoL measure,
it is essential that responsiveness is evaluated, so that
data from interventional studies can be interpreted more
accurately.
The settings in which the surveys were administered
(outpatient clinics, inpatient, postal surveys or a mixture
of these) may have introduced bias. Although this makes
comparison between studies difficult, it would reflect the
bias likely to arise when the measures are subsequently
administered in similar settings.
Appropriate samples are important in terms of both
composition and size. Sample size calculations and the
reasoning behind the numbers recruited would have
added to the robustness of the studies. Acknowledging
missing data in any study is also important, as it not only
highlights the possible areas of difficulties for the subjects,
but also gives the researchers the opportunity to scrutinize
any deficiencies of the questionnaire. Missing data were
described in all the studies, but how they subsequently
affected the developmental process of each questionnaire
was not described in any of the papers.
The selection of the most appropriate instrument to
use in a study will be determined by the aims of that
study. A unidimensional measure lends itself more to
economic evaluation. On the other hand, a multidimensional
measure addresses the various aspects vital to the
concept of quality of life to an individual. Multidimensional
measures help to identify areas that need to be targeted
for intervention to improve the quality of life for an adult
with SLE.
Guidelines from the FDA and EMA encourage the use
of patient-reported outcome evaluations in studies on
the development of interventions, especially new
medications. The validation of HRQoL instruments is essential
to ascertain that these tools are robust and can thus
be confidently used by clinicians. In this review, it is
evident that the methodologies employed in the process
of HRQoL measure development have not been uniform
across the measures. The recently published COSMIN
checklist [16-19], developed by international consensus,
will hopefully inform future research and lead to a
more uniform approach that would aid comparison.
However, all the measures that have been discussed
in this review were developed prior to the publication of
this guidance.
In conclusion, based on the published studies reviewed,
the disease-specific multidimensional measures have
the strongest evidence for content and construct validity
as well as internal consistency. More studies would be
required to support the stability of these measures and
their sensitivity/responsiveness. If these properties are
supported, this would make the disease-specific
measures strong contenders for use in clinical practice and
interventional studies in adult SLE populations.
Rheumatology key messages
• Stronger evidence exists for reliability and validity
of SLE disease-specific HRQoL measures than for
generic measures.
• HRQoL measures used in SLE should be evaluated
for responsiveness to aid clinical interpretation.
Disclosure statement: The authors have declared no
conflicts of interest.
Supplementary data
Supplementary data are available at Rheumatology
Online.
References
1 Archenholtz B, Burckhardt CS, Segesten K. Quality of life
of women with systemic lupus erythematosus or
rheumatoid arthritis: domains of importance and
dissatisfaction. Qual Life Res 1999;8:411-6.
2 Urowitz MB, Gladman DD, Tom BDM, Ibanez D,
Farewell VT. Changing patterns in mortality and disease
outcomes for patients with systemic lupus erythematosus.
J Rheumatol 2008;35:2152-8.
3 McElhone K, Abbott J, Teh L-S. A review of health related
quality of life in systemic lupus erythematosus. Lupus
2006;15:633-4.
4 Seawell AH, Danoff-Burg S. Psychosocial research on
systemic lupus erythematosus: a literature review. Lupus
2004;13:891-9.
5 Strand V, Gladman D, Isenberg D, Petri M, Smolen J,
Tugwell P. Endpoints: consensus recommendations from
OMERACT IV. Lupus 2000;9:322-7.
6 Guidance for industry patient-reported outcome
measures: use in medical product development to support
labeling claims. 2009. http://www.fda.gov/downloads/
Drugs/GuidanceComplianceRegulatoryInformation/
Guidances/UCM193282.pdf (29 November 2011, date last
accessed).
7 Reflection paper on the regulatory guidance for the use of
health related quality of life (HRQL) measures in the
evaluation of medicinal products. 2005. http://www.ispor.
org/workpaper/EMEA-HRQL-Guidance.pdf (29 November
2011, date last accessed).
8 Moore AD, Clare AE, Danoff DS et al. Can health utility
measures be used in lupus research? A comparative
validation and reliability study of 4 utility indices.
J Rheumatol 1999;26:1285-90.
9 Gladman DD, Urowitz MB, Ong A et al. A comparison
of five health status instruments in patients with systemic
lupus erythematosus (SLE). Lupus 1996;5:190-5.
10 Luo N, Chew LH, Fong KY et al. Validity and reliability of
the EQ-5D self-report questionnaire in Chinese-speaking
patients with rheumatic diseases in Singapore. Ann Acad
Med Singapore 2003;32:685-90.
11 Luo N, Chew LH, Fong KY et al. Validity and reliability of
the EQ-5D self-report questionnaire in English-speaking
Asian patients with rheumatic diseases in Singapore. Qual
Life Res 2003;12:87-92.
12 Ariza-Ariza R, Hernandez-Cruz B, Navarro-Sarabia F.
EuroQol is a useful instrument for assessing the
health-related quality of life of the patients with systemic
lupus erythematosus. Lupus 2005;14:334-5.
13 Thumboo J, Fong KY, Ng TP et al. Initial construct
cross-cultural validation of the Short Form 36 for quality of
life assessment of systemic lupus erythematosus patients
in Singapore. Ann Acad Med Singapore 1997;26:282-4.
14 Rood MJ, Borggreve SE, Huizinga TWJ. Sensitivity to
change of the MOS SF-36 quality of life assessment
questionnaire in patients with systemic lupus
erythematosus taking immunosuppressive therapy. J Rheumatol
2000;27:2057-9.
15 Terwee CB, Bot SD, de Boer MR et al. Quality criteria were
proposed for measurement properties of health status
questionnaires. J Clin Epidemiol 2007;60:34-42.
16 Mokkink LB, Terwee CB, Patrick DL et al. The COSMIN
study reached international consensus on taxonomy,
terminology, and definitions of measurement properties for
health-related patient-reported outcomes. J Clin
Epidemiol 2010;63:737-45.
17 Mokkink LB, Terwee CB, Patrick DL et al. The COSMIN
checklist for assessing the methodological quality of
studies on measurement properties of health status
measurement instrument: an international Delphi study.
Qual Life Res 2010;19:539-49.
18 Mokkink LB, Terwee CB, Knol DL et al. The
COSMIN checklist for evaluating the methodological
quality of studies on measurement properties: a
clarification of its content. BMC Med Res Methodol 2010;10:22.
19 Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM,
de Vet HC. Rating the methodological quality in
systematic reviews of studies on measurement properties: a
scoring system for the COSMIN checklist. Qual Life Res
2011. http://www.springerlink.com/content/
4555127664034526/fulltext.pdf (29 November 2011, date
last accessed).
20 Stoll T, Gordon C, Seifert B et al. Consistency and validity
of patient administered assessment of quality of life by the
MOS SF-36; its association with disease activity and
damage in patients with systemic lupus erythematosus.
J Rheumatol 1997;24:1608-14.
21 Thumboo J, Fong KY, Leong KH, Feng PH, Thio ST,
Boey ML. Validation of the MOS SF-36 for quality of life
assessment of patients with systemic lupus
erythematosus in Singapore. J Rheumatol 1999;26:
97-102.
22 Thumboo J, Feng PH, Boey ML, Soh CH, Thio S, Fong KY.
Validation of the Chinese SF-36 for quality of life
assessment in patients with systemic lupus erythematosus.
Lupus 2000;9:708-12.
23 Burckhardt CS, Archenholtz B, Bjelle A. Measuring
the quality of life of women with rheumatoid arthritis or
systemic lupus erythematosus: a Swedish version of the
quality of life scale (QOLS). Scand J Rheumatol 1992;21:
190-5.
24 Aggarwal R, Wilke CT, Pickard AS et al. Psychometric
properties of EuroQol 5D and Short Form 6D in patients
with SLE. J Rheumatol 2009;36:1209-16.
25 Grootscholten C, Ligtenberg G, Derksen RH et al. Health
related quality of life in systemic lupus erythematosus:
development and validation of a lupus specific symptom
checklist. Qual Life Res 2003;12:635-44.
26 Freire EA, Guimaraes E, Maia I, Ciconelli RM. Systemic
lupus erythematosus symptom checklist: cross-cultural
adaptation to Brazilian Portuguese language and reliability
evaluation. Acta Reumatol Port 2007;32:341-4.
27 Leong KP, Kong KO, Thong BY et al. Development and
preliminary validation of a systemic lupus
erythematosus-specific quality-of-life instrument
(SLEQOL). Rheumatology 2005;44:1267-76.
28 Kong KO, Ho HJ, Thong BY et al. Cross-cultural
adaptation of the Systemic Lupus Erythematosus Quality of Life
Questionnaire into Chinese. Arthritis Rheum 2007;57:
980-5.
29 Freire EA, Bruscato A, Leite DR, Sousa TT, Ciconelli RM.
Translation into Brazilian Portuguese, cultural adaptation
and validation of the systemic lupus erythematosus quality
of life questionnaire (SLEQOL). Acta Reumatol Port 2010;
35:334-9.
30 McElhone K, Abbott J, Shelmerdine J et al.
Development and validation of a disease-specific
health-related quality of life measure, the LupusQol, for
adults with systemic lupus erythematosus. Arthritis Care
Res 2007;57:972-9.
31 Jolly M, Pickard AS, Wilke C et al. Lupus-specific health
outcome measure for US patients: the LupusQoL-US
version. Ann Rheum Dis 2010;69:29-33.
32 Doward LC, McKenna SP, Whalley D et al. The
development of the L-QoL: a quality-of-life instrument specific to
systemic lupus erythematosus. Ann Rheum Dis 2009;68:
196-200.
33 Likert R. A technique for the development of attitude
scales. Educ Psychol Measure 1952;12:313-5.
34 Burckhardt CS, Anderson KL, Archenholtz B, Hagg O.
The Flanagan Quality of Life Scale: evidence of
construct validity. Health Qual Life Outcomes 2003;
1:59.
35 Burckhardt CS, Woods SL, Schultz AA, Ziebarth DM.
Quality of life of adults with chronic illness: a psychometric
study. Res Nurs Health 1989;12:347-54.