
Measuring the sensitivity and construct validity of six utility instruments in seven disease areas


Jeff Richardson, PhD, Angelo Iezzi, MSc, Munir A. Khan, PhD, Gang Chen, PhD,
Aimee Maxwell, BBNSc (Hons)
Background. Health services that affect quality of life (QoL) are increasingly evaluated using cost utility analyses (CUA). These commonly employ one of a small number of multiattribute utility instruments (MAUI) to assess the effects of the health service on utility. However, the MAUI differ significantly, and the choice of instrument may alter the outcome of an evaluation. Aims. The present article has 2 objectives: 1) to compare the results of 3 measures of the sensitivity of 6 MAUI and the results of 6 tests of construct validity in 7 disease areas and 2) to rank the MAUI by each of the test results in each disease area and by an overall composite index constructed from the tests. Methods. Patients and the general public were administered a battery of instruments, which included the 6 MAUI, disease-specific QoL instruments (DSI), and 6 other comparator instruments. In each disease area, instrument sensitivity was measured 3 ways: by the unadjusted mean difference in utility between public and patient groups, by the value of the effect size, and by the correlation between MAUI and DSI scores. Content and convergent validity were tested by comparison of MAUI utilities and scores from the 6 comparator instruments. These included 2 measures of health state preferences, measures of subjective well-being and capabilities, and generic measures of physical and mental QoL derived from the SF-36. Results. The apparent sensitivity of instruments varied significantly with the measurement method and by disease area. Validation test results varied with the comparator instruments. Notwithstanding this variability, the 15D, AQoL-8D, and the SF-6D generally achieved better test results than the QWB and EQ-5D-5L.
Key words: multiattribute utility (MAU); sensitivity; validity; cost utility analysis. (Med Decis Making 2016;36:147–159)
The economic evaluation of health services that alter the quality of life (QoL) commonly employs cost utility analyses (CUA) to rank alternative services according to the cost of obtaining an additional quality-adjusted life-year (QALY) from the use of the service. QALYs are defined as life-years multiplied by an index of the QoL, measured as utility (i.e., the strength of preference for a health state). This index has increasingly been assessed using one of a small number of multiattribute utility instruments (MAUI). The validity of the economic evaluations that use these instruments, and therefore of the comparisons of services made with them, depends on the sensitivity and validity of the MAUI.
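As a minimal illustration of the QALY definition above, the sketch below multiplies life-years by a utility weight and sums across periods. The function name and the example utilities are hypothetical and are not taken from the study.

```python
def qalys(periods):
    """Return QALYs for a sequence of (life_years, utility) periods.

    Each period contributes life-years multiplied by the utility of the
    health state lived in it (0.00 = death, 1.00 = best health).
    """
    return sum(years * utility for years, utility in periods)


# Hypothetical patient: 2 years at utility 0.70, then 3 years at 0.85.
without_service = qalys([(2, 0.70), (3, 0.85)])
# The same 5 years at higher utility with a hypothetical service.
with_service = qalys([(2, 0.80), (3, 0.90)])
print(round(with_service - without_service, 2))  # QALYs gained
```

The cost per additional QALY used to rank services is then the incremental cost of the service divided by this gain.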
A review of the literature between 2000 and 2010 identified 1663 studies that had employed 1 of the major MAUI. These contained 392 head-to-head comparisons of the utilities obtained from the instruments. All found a significant correlation between MAUI scores and, in some cases, demonstrated the existence of common latent dimensions. However, most studies also identified significant differences between estimated utilities, with the largest US-based comparison reporting an average correlation between 4 MAUI of 0.71. Most authors in the reviewed studies concluded that instruments "are not equivalent," that they are imprecisely related, and that their comparison warrants caution. Similarly, a more recent comparison using 5 MAUI in a prospective and follow-up study of cataract and heart failure patients concluded that there is a "lack of interchangeability among different preference-based measures."
Received 22 December 2014 from the Centre for Health Economics, Monash Business School, Monash University, Australia (JR, AI, MAK, AM), and Flinders Health Economics Group, Flinders University, Australia (GC). This research was funded by the National Health and Medical Research Council (NHMRC) grant 1006334 and National Health and Medical Research Council grant 1006334. Revision accepted for publication 20 September 2015.
Supplementary material and appendices for this article are available on the Medical Decision Making Web site at
Address correspondence to Jeff Richardson, Centre for Health Economics, Monash Business School, Monash University, Rm 284, Level 2, Bldg 75, Clayton Campus, Clayton, Vic 3800 Australia; telephone: +61 3 9905 0754; fax: +61 3 9905 8344; e-mail: jeffrey.richardson
© The Author(s) 2015
Reprints and permission:
DOI: 10.1177/0272989X15613522
Downloaded from mdm.sagepub.com at Monash University on May 18, 2016
Between 2010 and 2015, searches of the Web of Science and Ovid Medline identified 56 articles that had included 3 or 4 MAUI. With few exceptions, these focused on individual illnesses or health states: arthritis, multiple sclerosis, disc displacement, chronic obstructive pulmonary disease, and urinary incontinence. The majority of these studies note the differences obtained by the use of different instruments and, generally, their capacity to discriminate between the public and differing classes of patient. Several studies explicitly consider instrument validity in the context of the single illness studied. Using item response theory, one study derived test-retest standard deviations and structural deviations for 4 MAUI. Results were derived from the general public, and the validity of extrapolation to particular patient groups is not reported.
In sum, the empirical literature has demonstrated differences between the utilities obtained from different MAUI but has generally refrained from recommending one instrument in preference to another. Notwithstanding the difficulty of comparing multidimensional instruments, the choice between them is an important question. As recognized in the literature, the differences between MAUI have important consequences for CUA. An economic evaluation is more likely to find a service to be cost effective if it employs a MAUI that predicts a larger improvement in utility than alternative MAUI, and the funding of a service by a health authority may depend on the instrument used in its evaluation. This highlights the need for more evidence on the comparative strengths and weaknesses of different MAUI.
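To make the point concrete, the sketch below computes the cost per QALY for one hypothetical service whose utility gain is measured by 2 different MAUI. The gains are the arthritis public-patient differences for the HUI3 (0.26) and SF-6D (0.13) reported in Table 2. The extra cost, the duration of benefit, and the funding threshold are hypothetical, and survival is assumed to be unchanged.

```python
def cost_per_qaly(extra_cost, utility_gain, years):
    """Incremental cost per QALY when a service raises utility by
    `utility_gain` for `years` life-years with no change in survival."""
    return extra_cost / (utility_gain * years)


EXTRA_COST = 10_000   # hypothetical cost per patient
YEARS = 5             # hypothetical duration of benefit
THRESHOLD = 10_000    # hypothetical willingness to pay per QALY

# Arthritis public-patient utility differences (Table 2).
gains = {"HUI3": 0.26, "SF-6D": 0.13}

for mau, gain in gains.items():
    icer = cost_per_qaly(EXTRA_COST, gain, YEARS)  # incremental cost-effectiveness ratio
    verdict = "fund" if icer <= THRESHOLD else "reject"
    print(f"{mau}: {icer:,.0f} per QALY -> {verdict}")
```

Halving the measured utility gain doubles the cost per QALY. The same service can therefore fall on either side of a funding threshold depending only on the instrument used.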
The present article is a response to this need. It examines 2 of the key properties of a MAUI. The first is its sensitivity, the extent to which predicted utilities differ with a change in a person's QoL. The second is the instrument's validity: evidence that the instrument measures the theoretical construct that it purports to measure, which, for a MAUI, is the strength of people's preferences for different health states. In practice, demonstrating validity is problematic, as an instrument will, typically, be neither (completely) valid nor (completely) invalid. As argued by Streiner and Norman, "validation" is "a process of hypothesis testing . . . to determine the degree of confidence we can place on inferences we make . . . based on scores" (p 174). Confidence increases when the scores correlate as expected with scores from other instruments. However, in the absence of a gold standard, the choice of comparator instruments is contestable. Consequently, the process of validation (in Cronbach and Meehl's terminology, the creation of a "nomological" network of supporting evidence and theory) evolves with the addition of new test results. The present article contributes to this process by reporting the results of 6 tests of validity for 6 MAUI in 7 disease areas. Three tests of sensitivity are additionally reported. Final comparisons are therefore based on 9 tests.
Tests of validity have been variously classified. The present study is concerned with the generic category of "construct validity" and, more specifically, with 1) convergent validity, the strength of the association with variables that seek to measure a similar or the same construct (i.e., the strength of preferences), and 2) content validity, the extent to which an instrument describes or represents the full range of attributes needed to draw correct inferences with respect to the construct. Here, content validity is assessed by the correlation of MAUI scores with measures of physical and mental QoL.
The sensitivity of an instrument reflects the correspondence between the attributes of a health state and the domains included in the instrument. Its measurement is affected by the instrument's reliability and by whether items are appropriately worded to identify changes in attributes. However, the change in utility predicted by a MAUI also depends on the effective measurement scale that was employed when utilities were assigned to the instrument's health state descriptions. Although each MAUI seeks to anchor its scale at 0.00 = death and 1.00 = best health, the effective measurement scales differ significantly between instruments. Consequently, an increment of utility may have a different meaning when it has been derived from a different MAUI. This may mask the sensitivity attributable to instrument content and reliability and make evaluation of sensitivity problematic.
The present article has 2 specific objectives. The first is to compare 3 methods for measuring sensitivity and to compare results from 6 tests of construct validity. The second objective is to rank instruments by the results of each test. However, as a choice of instrument requires a single, not multiple, ranking, results are also combined into a single indicative index of sensitivity and validity, which is used to rank the MAUI in each of the 7 disease areas analyzed. The study is based on a large multi-instrument comparison (MIC) survey, which is described in the second section below along with the methods used for measuring sensitivity and testing construct validity. The third section presents the results, which are discussed in the fourth section. Abbreviations used throughout the paper are defined in Box 1.
The MIC Survey
An online survey was conducted in 6 countries: Australia, Canada, Germany, Norway, the United Kingdom, and the United States. Respondents were initially asked to indicate whether they had a chronic disease and to rate their overall health on a visual analogue scale (VAS), where 0 represented death and 100 represented "best possible health (physical, mental, social)." The "healthy" public was defined by the absence of chronic disease and by a score greater than 70 on the VAS. Quotas were used to achieve demographically representative samples of the public in each country; that is, public respondents were recruited until the predetermined target for each demographic cell was reached. Quotas were also applied to obtain a target number of respondents in each of 7 chronic disease areas, namely, arthritis, asthma, cancer, depression, diabetes, hearing loss, and heart disease.
Instruments used in the study are listed in Table 1, which also reports the balance between physical and psychosocial items in the 6 MAUI. Utilities for 5 of the MAUI were assigned using algorithms provided by the instruments' authors; EQ-5D-5L utilities were obtained from the crosswalks published by the EuroQol Group. Both the MAUI and the comparator instruments, described below, were administered in random order. The full dataset is freely available on the AQoL website.
Comparator Instruments
Disease-specific instruments (DSI) listed in Table 1 were selected from the literature upon advice from leading researchers in each area. Their correlation with the MAUI is presented in bar charts in Appendix A. Two preference measures, the VAS and the Self Time Tradeoff (TTO), are described in Appendices B and C. The latter, an experimental instrument, asked respondents to trade off time in their present health state against length of life. Physical health–related QoL was measured with the SF-36 physical component score (PCS). Psychosocial health was measured with both the SF-36 mental component score (MCS) and a new subjective well-being (SWB) instrument developed by the UK Office of National Statistics (ONS). The ONS instrument was preferred to other measures of SWB in the MIC survey because it achieved a higher correlation with all other instruments. The ICECAP capabilities instrument focuses on a specific subset of capabilities, namely, the ability to function as an independent and emotionally fulfilled individual, a concept close to the notion of eudaimonia. Irrespective of its nomenclature, it has been nominated in the recent literature as an important measure of well-being, which supports its use as a test of the content validity of the MAUI.
Box 1 Glossary
Utility Instruments
MAUI: Multiattribute utility instrument
AQoL-8D: Assessment of QoL–8 dimensions
EQ-5D-5L: Five (response) level EQ-5D
HUI3: Health Utilities Index, version 3
QWB: Quality of Well-Being
SF-6D: Short Form (from the SF-36), 6 dimensions
15D: 15 dimensions (no title)
Nonutility Instruments
ICECAP: Capabilities Instrument
ONS: Subjective Well-Being Instrument of the UK Office of National Statistics
Self TTO: Time tradeoff on own health state
SF-36: Short Form; 36 items
PCS: Physical component score of SF-36
MCS: Mental component score of SF-36
VAS: Visual analogue scale
Other
Content validity: There are sufficient items describing the construct to permit valid inferences
Construct validity: A general term for successful testing of a construct
Convergent validity: A close relationship (correlation) with other measures of the construct
Disease-specific instrument (DSI): A disease-specific (quality-of-life) instrument; see Table 1 for the 7 DSI
Effect size (ES): Effect size calculated with the standard deviation of the public sample
Eight edit criteria were employed. Six of these were based on 6 questions that were repeated at some stage of the survey. Results were not used when responses to these questions differed by 2 or more response categories or when 2 or more of the answers differed by more than 1 response category. Surveys completed in less than 20 min, which was deemed to be the minimum time for the provision of more than 300 items of information, were removed. The edit procedures, the questionnaire, and its administration are described in Richardson and others.
The online survey was administered by a global panel company, CINT Pty Ltd. The survey was approved by the Monash University Human Research Ethics Committee, Monash University, Melbourne, Australia, reference number CF11/3192-
In each of the 7 disease areas, 3 measures of sensitivity were estimated and compared. These were 1) unadjusted differences between public and patient mean utilities, 2) the Cohen effect size (the mean difference divided by the standard deviation of the public sample), and 3) the Pearson correlation between MAUI utilities and scores from the relevant DSI. The latter 2 measures are independent of linear scale effects and were combined into a single index of sensitivity.
Convergent and content validity were tested and compared using the correlation between each of the MAUI and the comparator (criterion) variables. Based on the correlation between them, the criterion variables were collapsed into 3 broad categories: 1) preference, 2) physical, and 3) psychosocial variables. The utility that MAUI seek to measure refers to the strength of a person's preferences, so convergent validity was tested by the correlation between MAUI and an index of the 2 preference measures. Content validity was tested by the correlation between MAUI and indices of physical and psychosocial variables.
Table 1 Instruments by Role in the Analysis
Subject of analysis: multiattribute utility instruments (number of items: physical/pain, psychosocial)
EQ-5D-5L: 4, 1
SF-6D: 3, 3
HUI3: 6, 2
QWB: 66 clustered problems; 3 items
15D: 10, 5
AQoL-8D: 10, 25
Analysis of sensitivity: disease-specific instruments (DSI)
Arthritis: AIMS2-SF
Asthma: AQLQ
Cancer: QLQ C-30
Depression: DASS21
Depression: K10
Diabetes: Diabetes-39
Hearing loss: APHAB
Heart disease: MacNew Instrument
Analysis of validity: criterion (comparator) instruments (generic well-being)
Capabilities (ICECAP)
Physical component score (PCS) of SF-36
Mental component score (MCS) of SF-36
Subjective Well-Being (ONS)
Preferences: VAS (see Appendix A)
Preferences: Self TTO (see Appendix B)
Note: Instruments are described in detail in Richardson and others. Superscript numbers indicate the reference source.
In each case in which an index was constructed, there was no overarching criterion for weighting the importance of each test. Results are also measured on different scales. To combine test results, each result used in an index was therefore converted to a percentage of the highest test result for that test in a disease group, and indices were constructed as the unweighted average of the converted test scores. The procedure gives equal importance to each test in the index and equal importance to the same relative differences that occur on each of the component scales.
There is also no overarching criterion for combining the indices of sensitivity and validity, but the ranking of instruments requires their implicit or explicit combination. Indicative summary scores were therefore calculated using the methodology described above; that is, the indices of sensitivity and validity for each instrument were converted to a percentage of the highest index in the disease group, and the indicative score was calculated as the average of the 2 converted indices.
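The conversion to a percentage of the best result, followed by an unweighted average, can be sketched as below. The instrument labels and scores are hypothetical. The same function serves both to combine test results into an index and to combine the sensitivity and validity indices into an indicative summary score.

```python
def percent_of_max(results):
    """Express each instrument's result as a percentage of the best result."""
    best = max(results.values())
    return {name: 100 * value / best for name, value in results.items()}


def unweighted_index(*tests):
    """Average the percentage-converted results of several tests,
    giving each test equal weight."""
    converted = [percent_of_max(test) for test in tests]
    names = converted[0].keys()
    return {name: sum(c[name] for c in converted) / len(converted) for name in names}


# Hypothetical results for 3 instruments on 2 tests within one disease group.
effect_size = {"A": 2.0, "B": 1.0, "C": 1.5}
dsi_correlation = {"A": 0.40, "B": 0.50, "C": 0.45}

print(unweighted_index(effect_size, dsi_correlation))
# {'A': 90.0, 'B': 75.0, 'C': 82.5}
```

Each test is divided by its own maximum, so a 10% shortfall counts the same on every test. This is the equal-relative-difference property described in the text.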
Data were obtained from 9665 individuals. Edit procedures described above resulted in the removal of 17% of the total: 9.3% as a result of inconsistencies on individual repeated questions, 4.8% because of multiple inconsistencies, and 2.9% because of the minimum time requirement. Numbers removed by edit criteria are given in country-specific reports available on the AQoL website. The remaining 7933 respondents are classified by age, gender, education, and disease area in Supplementary Table S1. Patient numbers varied from 772 for cancer to 943 for heart disease. The total sample contained a similar number of respondents by country, varying from 1177 from Norway to 1460 from the United States. The use of quotas led to a demographic profile for the healthy public that closely resembled the public in each country.
Summary statistics for the full survey population are reported in Supplementary Table S2. Mean utilities predicted by the MAUI for the public and total samples vary by 27% and 33%, respectively. For the total sample, the standard deviation varies by a factor of 2.08. The range of scores varies by a factor of 2.16, with minimum utilities varying from 0.3 for the SF-6D to –0.51 for the EQ-5D-5L. The EQ-5D-5L, AQoL-8D, and SF-6D assign the maximum utility of 1.00 to 19.1%, 0.35%, and 1.3% of the total respondents, respectively; 0.3%, 1.3%, and 14.7% of utilities predicted by the 15D, SF-6D, and AQoL-8D fall below
Differences are summarized in Figure 1. This was constructed by ranking the utilities predicted by each MAUI from highest to lowest and dividing them into 100 percentile groups. The figure plots the average utility in each percentile on the vertical axis against its rank order on the horizontal axis. The 15D compresses utilities into a narrower range. Utilities from the SF-6D, and to a greater extent the Quality of Well-Being (QWB), initially decline very quickly, but the curves subsequently flatten and these utilities become greater than those of the other instruments, with the exception of the 15D.
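The construction of Figure 1 can be sketched in 3 steps: sort one instrument's utilities from highest to lowest, split them into 100 equal-sized percentile groups, and take each group's mean. The treatment of sample sizes not divisible by 100 (groups differing by at most one observation) is an assumption, as the article does not describe it. The data are hypothetical.

```python
def percentile_curve(utilities, groups=100):
    """Mean utility in each of `groups` rank-ordered groups, highest first."""
    if len(utilities) < groups:
        raise ValueError("need at least one observation per group")
    ranked = sorted(utilities, reverse=True)
    n = len(ranked)
    curve = []
    for g in range(groups):
        # Integer boundaries spread any remainder across the groups.
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        curve.append(sum(chunk) / len(chunk))
    return curve


# Hypothetical utilities for one MAUI, spread evenly over [0, 1].
utilities = [i / 999 for i in range(1000)]
curve = percentile_curve(utilities)
print(len(curve), round(curve[0], 3), round(curve[-1], 3))
```

Plotting one such curve per MAUI against group rank gives a figure of the kind described. Its flat segments reveal ceiling effects and compression of the utility scale.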
Instrument Sensitivity
Table 2 reports unadjusted differences between mean public and patient utilities. These vary from an average difference of 0.27 for depression to 0.10 for hearing loss. Differences vary significantly when they are measured by different MAUI. From the final column, the difference varies by a factor of 3.62 for hearing loss and by a factor of 2.11 for all patients; that is, the estimated effect of curing a disease varies, on average, by more than 100% depending on the choice of instrument.
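The Max/Min column of Table 2 is the ratio of the largest to the smallest instrument difference in a row. Recomputing it from the rounded values printed in the table gives slightly different figures, such as 3.6 rather than 3.62 for hearing loss. This presumably occurs because the published ratios were calculated before rounding.

```python
# Public-patient mean differences for hearing loss, as printed in Table 2.
hearing_loss = {
    "EQ-5D": 0.09, "SF-6D": 0.05, "HUI3": 0.18,
    "15D": 0.06, "QWB": 0.11, "AQoL-8D": 0.11,
}


def max_min_ratio(differences):
    """Largest instrument difference divided by the smallest."""
    return max(differences.values()) / min(differences.values())


ratio = max_min_ratio(hearing_loss)
print(round(ratio, 2))  # 3.6
```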
The bottom 2 rows of Table 2 indicate that, as a measure of sensitivity, the unadjusted difference in utility between patients and the public is likely to be confounded and possibly dominated by differences in the measurement scale of the MAUI. The rank order of the average difference in utilities (HUI3 first; SF-6D last) is almost identical to the rank order of the range of observations. Consistent with this, independent analysis of the same data found, on average, that 30.3% of the difference between instrument utilities could be explained by linear scale effects. The alternative measures of sensitivity (the Cohen effect size and the correlation with the DSI) take account of linear differences in the scale.
The 3 measures of sensitivity are reported and compared in Figure 2a–h. For each disease area, the 6 MAUI are rank ordered in the 3 columns by the score from the 3 measures of sensitivity, which is reported in brackets below the MAUI name. The location of the MAUI name in each column visually indicates the relative numerical value of the measure on a 100-point scale. Thus, in Figure 2h, the difference in mean public and patient utilities for all disease areas varies from 0.22 for HUI3 to 0.11 for SF-6D. Therefore, these 2 instruments are located on the standardized scale at 100 and 0, respectively. From column 2, the average effect size varies from 1.93 for the 15D to 1.02 for SF-6D, and these 2 instruments are located at 100 and 0 on the scale, respectively. In the final column, the Pearson correlation between the DSI and MAUI varies from 0.56 for AQoL-8D to 0.37 for the QWB, and these instruments are located on the scale at 100 and 0, respectively. These average results indicate a difference of 100%, 89.2%, and 51% between the maximum and minimum scores for the 3 measures. The rank order of the scores varies substantially, and the order varies by disease area.
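The placement of instruments in Figure 2 and the percentage differences quoted above can be reproduced from the extreme values given in the text. The sketch below rescales values so that the highest maps to 100 and the lowest to 0, and it expresses the maximum as a percentage above the minimum. Only the 2 extreme instruments per measure are reported in the text, so intermediate positions are not reproduced.

```python
def place_on_scale(values):
    """Map values linearly so the highest is 100 and the lowest is 0."""
    hi, lo = max(values.values()), min(values.values())
    return {name: 100 * (v - lo) / (hi - lo) for name, v in values.items()}


def percent_gap(values):
    """How far the maximum exceeds the minimum, as a percentage of the minimum."""
    return 100 * (max(values.values()) / min(values.values()) - 1)


# Extreme values for all disease areas quoted in the text (Figure 2h).
measures = {
    "mean difference": {"HUI3": 0.22, "SF-6D": 0.11},
    "effect size": {"15D": 1.93, "SF-6D": 1.02},
    "DSI correlation": {"AQoL-8D": 0.56, "QWB": 0.37},
}

for label, values in measures.items():
    print(label, place_on_scale(values), f"gap = {percent_gap(values):.1f}%")
```

The 0 to 100 placement shows rank and relative spacing but discards the absolute size of the gap. This is why the percentage differences are reported separately.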
Figure 1 Mean multiattribute utility by ranked percentile. Source: Richardson and others.
Table 2 Difference between Public and Patient Mean Utilities by Disease and Multiattribute Utility Instrument
EQ-5D SF-6D HUI3 15D QWB AQoL-8D Average Max/Min
Public (mean) 0.88 0.80 0.88 0.94 0.74 0.83 0.84
Public-patient mean difference
Arthritis 0.24 0.13 0.26 0.13 0.17 0.19 0.19 2.02
Asthma 0.12 0.09 0.12 0.09 0.12 0.14 0.11 1.45
Cancer 0.18 0.11 0.20 0.12 0.14 0.16 0.15 1.85
Depression 0.29 0.20 0.35 0.18 0.21 0.38 0.27 2.04
Diabetes 0.17 0.10 0.20 0.11 0.13 0.17 0.15 1.96
Hearing loss 0.09 0.05 0.18 0.06 0.11 0.11 0.10 3.62
Heart disease 0.16 0.10 0.18 0.11 0.14 0.15 0.14 1.84
Average 0.18 0.11 0.22 0.12 0.14 0.18 0.16 2.11
Rank order
Average difference 2 6 1 5 4 2
Range 1 6 2 5 4 3
Figure 2 Three measures of sensitivity: each figure ranks the multiattribute utility instruments (MAUIs) by (1) the difference between mean utilities of the patient and public groups, (2) the effect size calculated using the standard deviation of the public, and (3) the Pearson correlation between the MAUI and the disease-specific quality-of-life instrument (DSI). Numbers in brackets are the absolute differences (column 1), effect sizes (column 2), and correlation coefficients (column 3).
Construct Validity
The correlation between criterion instrument scores is reported in Supplementary Table S3. The comparatively high correlation between the MCS, ONS, and ICECAP contrasts with their low correlation with the PCS and indicates that these 3 instruments are more closely related to psychosocial than to physical attributes. Consequently, in the summary Table 3, the correlations between MAUI and these 3 variables were averaged to obtain a single index of psychosocial content. Only a single index of physical health–related QoL was available, namely, the PCS of the SF-36, and it was employed as the index of physical content. Correlations between the MAUI and the 2 preference instruments, the VAS and Self TTO, were also averaged to obtain a single index.
A relatively clear pattern emerges in Table 3. The 15D has the highest and the AQoL-8D the lowest average correlation with the PCS. However, with the exception of AQoL-8D and QWB, the average correlation across all disease areas is similar, differing by only 0.07. AQoL-8D has the highest average correlation with psychosocial variables and QWB the lowest, followed by EQ-5D-5L. Differences are large, with the average correlation varying from 0.43 (QWB) and 0.48 (EQ-5D-5L) to 0.72 (AQoL-8D). AQoL-8D also has the largest average correlation with the preferences index across all disease areas (0.51), with QWB and EQ-5D-5L again having the lowest scores (0.38 and 0.43, respectively). The correlations between the MAUI utilities and scores from individual criterion variables are reported in Appendix D for each disease area.
Indicative Summary Scores
The indices for instrument sensitivity, construct validity, and the indicative summary scores are reported in Table 4. While results vary by disease, several patterns emerge. In the majority of disease areas, the 15D obtains the highest overall composite score (Table 4c), followed in each case by the AQoL-8D or HUI3. Exceptions to this pattern are hearing loss and depression, for which the HUI3 and AQoL-8D have the highest average scores, respectively. In contrast, the QWB consistently achieves the lowest scores on every criterion for every disease except hearing loss. The EQ-5D obtains the second lowest score for 5 of the 7 diseases.
For reasons outlined below, the effect size, and therefore the indicative summary scores, of the 15D and SF-6D are inflated and deflated, respectively, by nonlinear scale effects. Omitting the effect size from the indices (Table 4d) improves the ranking of the SF-6D for every disease category and reduces the ranking of the 15D in 5 and the HUI3 in 4 of the 7 disease categories. The ranking of the remaining instruments remains relatively stable.
Table 3 Correlations of Multiattribute Utility Instruments with Indices of Physical and Psychosocial Content and Preferences by Patient Groups
Physical content (index: PCS)
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.67 0.69 0.68 0.69 0.61 0.54
Asthma 0.64 0.61 0.61 0.68 0.52 0.49
Cancer 0.66 0.71 0.63 0.70 0.61 0.57
Depression 0.57 0.62 0.53 0.66 0.51 0.48
Diabetes 0.70 0.69 0.66 0.71 0.62 0.58
Hearing loss 0.70 0.69 0.66 0.71 0.62 0.58
Heart disease 0.67 0.70 0.66 0.72 0.63 0.59
Average correlation 0.66 0.67 0.63 0.70 0.59 0.55
Psychosocial content (index: average of MCS, ICECAP, and ONS)
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.48 0.63 0.55 0.58 0.44 0.74
Asthma 0.45 0.59 0.52 0.52 0.40 0.70
Cancer 0.50 0.63 0.56 0.57 0.47 0.74
Depression 0.48 0.57 0.57 0.51 0.42 0.69
Diabetes 0.46 0.59 0.54 0.54 0.41 0.71
Hearing loss 0.44 0.57 0.52 0.52 0.37 0.72
Heart disease 0.54 0.64 0.57 0.59 0.48 0.75
Average correlation 0.48 0.60 0.55 0.55 0.43 0.72
Preference measures (index: average of VAS and Self TTO)
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.45 0.50 0.47 0.51 0.43 0.53
Asthma 0.46 0.44 0.45 0.52 0.40 0.54
Cancer 0.43 0.4 0.44 0.49 0.39 0.52
Depression 0.41 0.41 0.44 0.44 0.35 0.48
Diabetes 0.42 0.43 0.45 0.49 0.37 0.52
Hearing loss 0.41 0.38 0.42 0.44 0.30 0.48
Heart disease 0.45 0.49 0.48 0.50 0.44 0.52
Average correlation 0.43 0.45 0.45 0.49 0.38 0.51
a. Bolded numbers indicate the instrument ranked first in the category. Underlining indicates an instrument ranked fifth or sixth in the category.
b. Correlations between MAUI and each criterion variable are reported as supplementary material.
c. Correlation between MAUI and PCS.
d. Correlation between MAUI and average of MCS, ICECAP, and ONS after rescaling to a 0 to 1 scale.
e. Correlation between MAUI and average of the VAS and Self TTO after rescaling to a 0 to 1 scale.
Table 4 Indices of Sensitivity, Validity, and Indicative Summary Score
4a Index of Sensitivity = Average of Effect Size + Correlation with DSI
4b Index of Validity = Average of Rescaled Indices of Psychosocial Content, Physical Content, and Preferences
Disease area: 4a values; 4b values
Arthritis: 0.88 0.68 0.89 0.94 0.56 0.83; 0.57 0.65 0.60 0.63 0.53 0.66
Asthma: 0.68 0.69 0.68 1.00 0.52 0.71; 0.40 0.42 0.42 0.44 0.32 0.44
Cancer: 0.78 0.72 0.81 1.00 0.62 0.72; 0.66 0.59 0.63 0.53 0.67
Depression: 0.67 0.71 0.77 0.84 0.46 0.94; 0.53 0.58 0.57 0.58 0.47 0.61
Diabetes: 0.77 0.83 0.98 0.65 0.82; 0.57 0.62 0.60 0.63 0.50 0.66
Hearing loss: 0.51 0.47 1.00 0.82 0.64 0.69; 0.55 0.58 0.57 0.60 0.46 0.64
Heart disease: 0.70 0.71 0.75 0.95 0.57 0.78; 0.60 0.66 0.62 0.65 0.56 0.68
Average: 0.71 0.68 0.82 0.93 0.57 0.78; 0.54 0.60 0.57 0.59 0.48 0.62
Rank order
4c Overall Composite A = Average of 4a and 4b
4d Overall Composite B = Removes Effect Size from 4a
Highest score; lowest score excluding QWB:
Arthritis: 15D; EQ-5D
Asthma: 15D
Cancer: 15D; SF-6D
Depression: AQoL-8D; EQ-5D
Diabetes: 15D; EQ-5D
Hearing loss: HUI3; SF-6D
Heart disease: 15D; SF-6D
a. Averages were calculated by first dividing the component indices for each MAUI by the maximum component index. The indices of sensitivity and validity were then obtained.
b. Bold indicates highest score.
c. Lowest score excluding QWB. Index numbers are subscripts presented in parentheses.
In CUA, the comparison of services is based on the assumption that the QALYs they produce, and therefore the utilities used to calculate the QALYs, have the same cardinal property: an increment of utility produced by service A has the same importance as the same increment produced by service B. This assumption may be violated by the use of different MAUI, which have different scale properties and differing sensitivities to health states. For example, from Table 2, the substitution of the SF-6D for the HUI3 would halve the number of QALYs apparently gained from returning the "average" arthritis patient to the average health state of the public, and the priority of arthritis services would fall.
Inconsistency may be avoided by the use of a single MAUI. However, this solution is problematic because of variation in the content of instruments and therefore differing sensitivities to different health states. From Table 2, the "cure" of depression (i.e., a return from patient to public average utility), as measured by the EQ-5D, HUI3, SF-6D, and AQoL-8D, would generate 21%, 35%, 54%, and 100% more QALYs, respectively, than the cure of arthritis. This implies that the result of CUA and the funding of services may depend on the choice of the MAUI.
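The percentage comparisons above rest on simple arithmetic: the QALY gain from a "cure" is the patient-to-public difference in mean utility multiplied by its duration, and two cures are compared by the ratio of their gains. The minimal Python sketch below assumes a one-year, undiscounted gain; the utility values are hypothetical placeholders (not the Table 2 figures), and the function names are illustrative only.

```python
def qaly_gain(patient_mean: float, public_mean: float, years: float = 1.0) -> float:
    """QALYs gained by raising the average patient's utility to the public
    average for `years` years (undiscounted; an illustrative simplification)."""
    return (public_mean - patient_mean) * years


def percent_more(gain_a: float, gain_b: float) -> float:
    """Percentage by which QALY gain A exceeds QALY gain B."""
    return (gain_a / gain_b - 1.0) * 100.0


# Hypothetical mean utilities for a single MAUI; these are NOT the Table 2 values.
PUBLIC_MEAN = 0.85
DEPRESSION_MEAN = 0.55
ARTHRITIS_MEAN = 0.65

depression_gain = qaly_gain(DEPRESSION_MEAN, PUBLIC_MEAN)  # about 0.30 QALYs
arthritis_gain = qaly_gain(ARTHRITIS_MEAN, PUBLIC_MEAN)    # about 0.20 QALYs

print(f"Depression 'cure' yields {percent_more(depression_gain, arthritis_gain):.0f}% "
      "more QALYs than arthritis 'cure'")
```

Repeating the calculation with each MAUI's own patient and public means is what makes the relative priority of depression and arthritis services depend on the instrument chosen.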
The first objective of the present article was to compare alternative methods for measuring instrument sensitivity and construct validity. Results indicate that common approaches to measuring sensitivity do not yield similar results. In 5 of the 7 disease areas, the mean difference between public and patient utilities is greatest when it is measured by the HUI3. Using the 15D, the difference is smallest or next to smallest in every disease area. In contrast, the effect size using the 15D is largest in every case except hearing loss, while the HUI3 retains the second highest average rank order. However, using the correlation with a DSI as the criterion, the average rank order of the HUI3 falls to fourth, and the AQoL-8D replaces the 15D as having the highest average ranking. The SF-6D, which has the lowest average ranking under both the mean-difference and the effect-size criteria, rises to the third highest average ranking.
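The three sensitivity measures contrasted above (the unadjusted mean difference between public and patient utilities, the effect size, and the correlation between MAUI and DSI scores) can be sketched as follows. This is an illustrative sketch, not the MIC analysis code: the pooled-SD denominator for the effect size is an assumption, and the utilities and DSI scores are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def mean_difference(public: Sequence[float], patients: Sequence[float]) -> float:
    """Unadjusted difference in mean utility (public minus patients)."""
    return mean(public) - mean(patients)


def effect_size(public: Sequence[float], patients: Sequence[float]) -> float:
    """Mean difference divided by the pooled sample standard deviation.

    Assumption: a pooled-SD (Cohen's d style) denominator; the article's
    exact definition of the effect size may differ.
    """
    n1, n2 = len(public), len(patients)
    pooled_var = ((n1 - 1) * stdev(public) ** 2
                  + (n2 - 1) * stdev(patients) ** 2) / (n1 + n2 - 2)
    return mean_difference(public, patients) / math.sqrt(pooled_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. MAUI utility vs. DSI score among patients."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for one MAUI in one disease area (illustrative only).
public_utils = [0.95, 0.90, 0.85, 1.00, 0.88]
patient_utils = [0.70, 0.62, 0.75, 0.58, 0.66]
patient_dsi = [55, 40, 68, 35, 50]  # assumed: higher = better disease-specific QoL

print(f"mean difference: {mean_difference(public_utils, patient_utils):.3f}")
print(f"effect size:     {effect_size(public_utils, patient_utils):.2f}")
print(f"r with DSI:      {pearson_r(patient_utils, patient_dsi):.2f}")
```

Because the effect size divides by a standard deviation while the mean difference does not, the two measures can rank the same instruments differently; the DSI correlation, computed among patients only, answers yet another question.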
A limitation of these observations is that they document the problem of obtaining an unambiguous ranking of instrument sensitivities, but they do not fully identify the reason for the differences. The change in apparent sensitivity that occurs when it is measured by the effect size rather than by differences in mean utilities is attributable to differences in the standard deviations and the scales employed in the assignment of utilities. As shown in Figure 1, ceiling effects differ. The EQ-5D-5L assigned a utility of 1.00 to 41% of the healthy public. In contrast, the SF-6D and AQoL-8D assigned a utility of 1.00 to 4.2% and 4.9% of the healthy public, respectively. Frequency distributions in Figure 1 also vary with the severity of the health state. Near full health, 15D utilities are highly compressed, resulting in a low standard deviation and a large effect size. The converse is true for the SF-6D. The present results do not indicate whether these differences are a result of health state descriptions, the assignment of utilities to items, or the reliability of the items; establishing this would require a separate analysis using a technique such as item response theory. In sum, interpretation and comparison of effect sizes are problematic, and, as shown in Tables 4c and 4d, including or excluding the effect size significantly alters the indicative ranking of instruments.
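The role of ceiling effects and score compression can be illustrated with two hypothetical instruments that register the same 0.10 gap between public and patients but distribute utilities differently near full health. The sketch below assumes the effect size is scaled by the public group's standard deviation, and all data are invented for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def ceiling_share(utilities: Sequence[float], ceiling: float = 1.0) -> float:
    """Proportion of respondents assigned the instrument's maximum utility."""
    return sum(1 for u in utilities if u >= ceiling) / len(utilities)


def effect_size_public_sd(public: Sequence[float], patients: Sequence[float]) -> float:
    """Mean difference scaled by the public group's SD (assumed denominator)."""
    return (mean(public) - mean(patients)) / stdev(public)


# Two hypothetical instruments with the SAME 0.10 public-patient gap.
compressed_public = [1.00, 0.98, 1.00, 0.98, 0.96]  # bunched near full health
spread_public = [0.80, 0.90, 1.00, 0.95, 0.85]      # spread across the scale
compressed_patients = [u - 0.10 for u in compressed_public]
spread_patients = [u - 0.10 for u in spread_public]

for name, pub, pat in [("compressed", compressed_public, compressed_patients),
                       ("spread", spread_public, spread_patients)]:
    print(f"{name:>10}: ceiling {ceiling_share(pub):.0%}, "
          f"effect size {effect_size_public_sd(pub, pat):.2f}")
```

The compressed instrument reports a larger effect size even though the mean difference is identical, which is the kind of pattern attributed to the 15D above.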
As DSI are dedicated to particular diseases, they might be seen as a gold standard for measuring sensitivity. However, their use is also subject to a number of caveats. First, DSI are typically administered only to patients in a disease category (as occurred in the present MIC survey). Consequently, their correlation with a MAUI indicates sensitivity within the disease area and not between the average patient and full health. Because the frequency distributions of utilities vary across instruments, these two forms of sensitivity may differ. Second, there are a large number of DSI, and none can claim to be a gold standard for assessing utility. As in the MAUI literature, there is an unresolved question concerning the appropriate content of the instruments and the appropriate composition of the descriptive system. The MIC survey included instruments that, from the literature and from clinical advice, appeared most able to evaluate patient QoL. However, the choice of instrument is
The analysis of content and convergent validity
further demonstrated the complexity of comparisons
between MAUI. Evidence for construct validity
varies with the criterion and disease area. As judged
by the instruments in the MIC survey, the 15D scores
most highly with respect to physical content and the
AQoL-8D with respect to both psychosocial content
and preference scores. Tests were necessarily limited to comparisons with a small number of criterion instruments, and alternative tests are possible. However, as noted, validation is an ongoing process of testing the properties and not a "one-shot" definitive test.
The second objective of the article was to determine whether, given the ambiguities in the test results, conclusions could be drawn with respect to the preferred instrument in each disease area. This raises the contentious issue of how to combine results from different tests. However, different weightings of the 3 broad sets of results are unlikely to significantly alter the ranking of the overall index in Table 4. Increasing the importance of physical content would favor the 15D; increasing the weighting on psychosocial or preference scores would favor AQoL-8D.
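Footnote a to Table 4 states that component indices were divided by the maximum component index before averaging. A minimal sketch of that kind of normalized, weighted composite follows. It assumes normalization by the best score on each component across instruments, and it uses hypothetical instruments (A, B, C), component scores, and weights rather than the Table 4 values.

```python
from typing import Dict, List

Scores = Dict[str, float]  # instrument -> score on one component


def normalize(scores: Scores) -> Scores:
    """Divide each instrument's score by the best score on that component."""
    top = max(scores.values())
    return {inst: s / top for inst, s in scores.items()}


def composite(components: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Weighted average of normalized component scores for each instrument."""
    normed = {c: normalize(s) for c, s in components.items()}
    total_w = sum(weights[c] for c in components)
    instruments = list(next(iter(components.values())))
    return {i: sum(weights[c] * normed[c][i] for c in components) / total_w
            for i in instruments}


def rank(index: Scores) -> List[str]:
    """Instruments ordered from highest to lowest composite score."""
    return sorted(index, key=index.get, reverse=True)


# Hypothetical component scores for three unnamed instruments (not Table 4 data).
components = {
    "physical":     {"A": 0.90, "B": 0.70, "C": 0.80},
    "psychosocial": {"A": 0.60, "B": 0.90, "C": 0.75},
    "preference":   {"A": 0.70, "B": 0.80, "C": 0.78},
}
equal = {"physical": 1, "psychosocial": 1, "preference": 1}
physical_heavy = {"physical": 4, "psychosocial": 1, "preference": 1}

print("equal weights:", rank(composite(components, equal)))
print("physical x4:  ", rank(composite(components, physical_heavy)))
```

With these made-up scores, equal weights put B first, while quadrupling the physical weight moves A to the top. Whether a given reweighting changes the ranking in practice depends on how far apart the instruments' normalized scores are.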
The final summary scores in Table 4 do not purport to identify the best instrument in all contexts: they summarize a limited number of criteria combined as described. In selecting the appropriate instrument for use in a study, other considerations may also be relevant. In particular, the cross-sectional MIC survey does not allow the estimation of test-retest reliability scores for instruments, as was done, for example, in the 5-instrument comparison by Palta and others.
may also have a focus on a particular aspect of health due, for example, to the presence or absence of particular comorbidities in their patients' population. This could lead to a preference for an instrument with a disproportionate focus on physical or psychosocial health, or on a property or content omitted from the present study.
Caveats also arise from the data used in the analysis. A web-based survey has less control over the quality of the data at the point of collection. However, the stringent edit procedures described earlier resulted in a data set with a high degree of internal consistency. The correlations between instruments were generally higher than those found in other studies, which increases confidence in the integrity of the data. The respondents were nevertheless self-selected members of a survey panel. The analysis in the present article, however, did not require a fully representative sample of the general or patient populations, as its purpose was to compare the properties of instruments. This required comparing results from the same patients and members of the public, rather than from representative samples.
A further caveat is that the same MAUI algorithms were applied to all individuals irrespective of their country of origin. Ideally, utility weights should be derived from people with similar preferences (implying not simply a common nationality but similar socioeconomic and health-related characteristics). In practice, it is unlikely that the present results would be affected significantly by the use of a nationally derived algorithm. Analysis of MIC data has demonstrated that the differences between MAUI utilities in the study are overwhelmingly attributable to instrument content and scale effects. The residual role of the preferences of those sampled explains less than 4% of the variation in these differences. Even if this figure were larger, there is no clear way in which differing preferences would explain the pattern of results reported here, which is more readily explained by the varying MAUI scale effects shown in Figure 1 and the dissimilar descriptive systems summarized in Table 1.
Irrespective of the ambiguities arising from differing tests, researchers using CUA must select an instrument for measuring utility. However, the choice is problematic, as there is no simple gold standard for evaluating instruments. This article has attempted to demonstrate a number of these problems, but also to report results that may assist with the selection of a MAUI.
Results indicate that each of the measures of sensitivity considered here is subject to a bias that is difficult to quantify. Tests of both the content and convergent validity of the MAUI give differing results. The final choice of instrument therefore requires consideration of several sources of imperfect evidence. Despite these problems, some conclusions are well supported by the results. The optimal instrument varies with the disease area: HUI3 achieves the highest average scores in the context of hearing loss, the AQoL-8D in the context of psychosocial problems, and the 15D in the context of physical problems. Overall, the evidence is favorable to the more widespread use of the 15D and AQoL-8D and the more limited use of the EQ-5D-5L.
The authors would like to thank the reviewers for helpful comments.
1. Wisloff T, Hagen G, Hamidi V, Movik E, Klemp M, Olsen JA. Estimating QALY gains in applied studies: a review of the cost-utility analyses published in 2010. Pharmacoeconomics.
2. Neumann PJ, Thorat T, Shi J, Saret CJ, Cohen JT. The changing face of the cost-utility literature, 1990-2012. Value Health. 2015;
3. Richardson J, McKie J, Bariola E. Multi attribute utility instruments and their use. In: Culyer AJ, ed. Encyclopedia of Health Economics. San Diego: Elsevier Science; 2014. p 341–57.
4. Cherepanov D, Palta M, Fryback DG. Underlying dimensions of the five health-related quality of life measures used in utility assessment. Med Care. 2010;48(8):718–25.
5. Fryback DG, Palta M, Cherepanov D, Bolt D, Kim J. Comparison of 5 health related quality of life indexes using item response theory analysis. Med Decis Making. 2010;30(1):5–15.
6. Moock J, Kohlmann T. Comparing preference-based quality-of-life measures: results from rehabilitation patients with musculoskeletal, cardiovascular, or psychosomatic disorders. Qual Life Res. 2008;17:485–95.
7. McDonough CM, Grove MR, Tosteson TD, Lurie JD, Hilibrand AS. Comparison of EQ-5D, HUI, and SF-36-derived societal health state values among Spine Patient Outcomes Research Trial (SPORT) participants. Qual Life Res. 2005;14:1321–32.
8. Feeny D, Spritzer K, Hays R, et al. Agreement about identifying patients who change over time: cautionary results in cataract and heart failure patients. Med Decis Making. 2012;32(2):273–86.
9. Khanna D, Maranian P, Palta M, et al. Health-related quality of life in adults reporting arthritis: analysis from the National Health Measurement Study. Qual Life Res. 2011;20(7):1131–40.
10. Lillegraven S, Kristiansen S, Kvien T. Comparison of utility measures and their relationship with other health status measures in 1041 patients with rheumatoid arthritis. Ann Rheum Dis. 2010;
11. Sorensen J, Linde L, Ostergaard M, Hetland M. Quality-adjusted life expectancies in patients with rheumatoid arthritis: comparison of index scores from EQ-5D, 15D, and SF-6D. Value Health.
12. Kontodimopoulos N, Pappa E, Chadjiapostolou Z, Arvanitaki E, Papadopoulos A, Niakas D. Comparing the sensitivity of EQ-5D, SF-6D and 15D utilities to the specific effect of diabetic complications. Eur J Health Econ. 2012;13(1):111–20.
13. Kuspinar A, Mayo N. Do generic utility measures capture what is important to the quality of life of people with multiple sclerosis? Health Qual Life Outcomes. 2013;11:71.
14. Kuspinar A, Mayo N. A review of the psychometric properties of generic utility measures in multiple sclerosis. Pharmacoeconomics. 2014;32(8):759–73.
15. Teckle P, Peacock S, McTaggart-Cowan H, et al. The ability of cancer-specific and generic preference-based instruments to discriminate across clinical and self-reported measures of cancer severities. Health Qual Life Outcomes. 2011;9:106.
16. McDonough CM, Tosteson TD, Tosteson A, Jette A, Grove MR, Weinstein J. A longitudinal comparison of 5 preference-weighted health state classification systems in persons with intervertebral disk herniation. Med Decis Making. 2011;31(2):270–80.
17. Tosh J, Brazier J, Evans P, Longworth L. A review of generic preference based measures of health related quality of life in visual disorders. Value Health. 2012;15:118–27.
18. Groessl RJ, Liu L, Sklar M, Tally S, Kaplan R, Ganiats TG. Measuring the impact of cataract surgery on generic and vision-specific quality of life. Qual Life Res. 2013;22:1405–14.
19. Yang Y, Longworth L, Brazier J. An assessment of validity and responsiveness of generic measures of health-related quality of life in hearing impairment. Qual Life Res. 2013;22(10):2813–28.
20. Pinto AM, Kupperman M, Nakagawa S, et al. Comparison and correlates of three preference-based health-related quality of life measures among overweight and obese women with urinary incontinence. Qual Life Res. 2011;20(10):1655–62.
21. Palta M, Chen HY, Kaplan R, Feeny D, Cherepanov D, Fryback DG. Standard error of measurement of 5 health utility indexes across the range of health for use in estimating reliability and responsiveness. Med Decis Making. 2011;31(2):260–9.
22. Drummond M, Brixner D, Gold M, Kind P, McGuire A, Nord E. Toward a consensus on the QALY. Value Health. 2009;12(1):S31–5.
23. Streiner D, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford: Oxford University Press; 2003.
24. Cronbach J, Meehl P. Construct validity in psychological tests. Psychol Bull. 1955;52:281–302.
25. Richardson J, Iezzi A, Khan MA. Why do multi attribute utility instruments produce different utilities: the relative importance of the descriptive systems, scale and ‘micro utility’ effects. Qual Life Res. 2015;24(8):2045–53.
26. Richardson J, Iezzi A, Khan MA. The Self Time t (TTO) Instrument: Reliability and Survey Results. Research Paper 86. Melbourne (Australia): Centre for Health Economics, Monash University; 2014.
27. Meenan RF, Gertman PM, Mason JH. Measuring health status in arthritis: the Arthritis Impact Measurement Scales. Arthritis Rheum. 1980;23:146–52.
28. Ware J, Sherbourne D. The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–83.
29. Marks GB, Dunn SM, Woolcock AJ. A scale for the measurement of quality of life in adults with asthma. J Clin Epidemiol.
30. Al-Janabi H, Flynn T, Coast J. Development of a self-report measure of capability wellbeing for adults: the ICECAP-A. Qual Life Res. 2012;21:167–76.
31. Rabin R, Oemar M, Oppe M, Janssen B, Herdman M. EQ-5D-5L User Guide: Basic Information on How to Use the EQ-5D-5L Instrument. Rotterdam (the Netherlands): EuroQol Group. Available from: URL: user-guide.html. Accessed 6 July 2015.
32. van Hout B, Janssen MF, Feng Y, et al. Interim scoring for the EQ-5D-5L: mapping the EQ-5D-5L to EQ-5D-3L value sets. Value Health. 2012;15:708–15.
33. Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst. 1993;85(5):365–76.
34. Hicks S. Case study: ONS measuring subjective wellbeing: The UK Office for National Statistics Experience. In: Helliwell J, Layard R, Sachs J, eds. World Happiness Report 2012. New York: The Earth Institute, Columbia University; 2012. Available from: http:// Accessed 30 April 2012.
35. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002;21:
36. Lovibond SH, Lovibond PF. Manual for the Depression Anxiety Stress Scales. 2nd ed. Sydney: Psychology Foundation; 1995.
37. Richardson J, Khan MA, Iezzi A, Maxwell A. Cross-National Comparison of Twelve Quality of Life Instruments: MIC Paper 1: Background, Questions, Instruments. Research Paper 76. Melbourne (Australia): Centre for Health Economics, Monash University; 2012. Available from: URL: Accessed 29 July
38. Feeny D, Furlong W, Torrance G, et al. Multi attribute and single attribute utility functions for the Health Utilities Index Mark 3 System. Med Care. 2002;40(2):113–28.
39. Kessler RC, Barker PR, Colpe LJ, et al. Screening for serious mental illness in the general population. Arch Gen Psychiatry.
40. Kaplan R, Bush J, Berry C. Health status: types of validity and the index of wellbeing. Health Serv Res. 1976;11(4):478–507.
41. Boyer JG, Earp JAL. The development of an instrument for assessing the quality of life of people with diabetes. Med Care.
42. Sintonen H, Pekurinen M. A fifteen-dimensional measure of health related quality of life (15D) and its applications. In: Walker S, Rosser R, eds. Quality of Life Assessment. Dordrecht (the Netherlands): Kluwer Academic; 1993.
43. Cox RM, Alexander GC. The abbreviated profile of hearing aid benefit. Ear Hear. 1995;16(2):176–86.
44. Richardson J, Sinha K, Iezzi A, Khan MA. Modelling utility weights for the Assessment of Quality of Life (AQoL) 8D. Qual Life Res. 2014;23(8):2395–404.
45. Höfer S, Lim L, Guyatt G, Oldridge N. The MacNew Heart Disease health-related quality of life instrument: a summary. Health Qual Life Outcomes. 2004;2(3):1–8.
46. Richardson J, Khan MA, Iezzi A, Maxwell A. Comparing and explaining differences in the content, sensitivity and magnitude of incremental utilities predicted by the EQ-5D, SF-6D, HUI 3, 15D, QWB and AQoL-8D multi attribute utility instruments. Med Decis Making. 2015;35(3):276–91.
47. AQoL. Assessment of Quality of Life (AQoL). 2015. Available from: URL: Accessed 28 May 2015.
at Monash University on May 18, 2016mdm.sagepub.comDownloaded from
... The single index score (15D score), representing the overall HRQoL on a scale of 0-1 (1 = full health, 0 = dead) is calculated from the questionnaire using a set of population-based preference or utility weights (14). In the important properties (reliability, validity, discriminatory power, responsiveness to change), the 15D is at least equally effective as the other preference-based generic instruments (15,16). The 15D has been previously used as a measure of HRQoL in the context of depressive disorders in the national Finnish Health 2000 Survey (17) and as an outcome in a randomized clinical antidepressant pharmacotherapy trial (18). ...
... However, our patients adequately represented the actual primary care patients in the City of Vantaa. Furthermore, different generic HRQoL instruments may produce different distributions of HRQoL scores for the same population (15,16). Therefore, our findings are specific to the 15D and the other instruments should undergo the same analyses to establish the cut-off points based on their results. ...
Full-text available
Background Depression undermines health-related quality of life (HRQoL). Remission is the central aim of all treatments for depression, but the degree of remission necessary for depressive patients’ HRQoL to correspond to the normal range of the general population remains unknown. Methods The Vantaa Primary Care Depression Study prospectively followed-up a screening-based cohort of depressive primary care patients for 5 years. The Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) was used to diagnose major depressive disorder. HRQoL was measured by the generic 15D instrument at baseline and at 5 years ( N = 106, 77% of baseline patients), and compared with the 15D results of an age-standardized general population sample from the Finnish Health 2011 Survey ( N = 4,157). Receiver operating characteristic analyses determined the optimal Hamilton Depression Rating Scale (HAMD), Beck Depression Inventory (BDI), and Beck Anxiety Inventory (BAI) cut-offs for remission, using the 15D score as the construct validator. Remission was defined as the score at which HRQoL reached the general population range (minimum mean − 1 SD). As age may influence HRQoL, patients older and younger than the median 52 years were investigated separately. Results For HAMD, the optimal cut-off point score was 8.5, for BDI 10.5, and for BAI 11.5. The differences between the findings of the younger and older patients were small. Limitations Cross-sectional analysis, small number of patients in the cohort. Conclusion Depressive primary care patients’ HRQoL reaches the normal variation range of the general population when their depression and anxiety scores reach the conventional clinical cut-offs for remission.
... One of the possible reasons is that the SF-6Dv2 has one more dimension, resulting in a larger descriptive system than EQ-5D-5L (18,750 vs. 3,125 health states). This result was consistent with one previous study, which found that the SF-6D in general showed better sensitivity and construct validity than the EQ-5D-5L in seven diseases [62]. Moreover, although the hypotheses for known-group validity were fulfilled in all tested groups, this study found that both instruments were not sensitive enough (ES < 0.8) to Table 4 Sensitivity of EQ-5D-5L and SF-6Dv2 to detect differences in different self-reported health status groups (N = 1,000) differentiate overweight and obesity respondents in different degrees of severity. ...
Full-text available
Objective To evaluate and compare the measurement properties of the EQ-5D-5L and SF-6Dv2 among Chinese overweight and obesity populations. Methods A representative sample of Chinese overweight and obesity populations was recruited stratified by age, gender, body mass index (BMI), and area of residence. Social-demographic characteristics and self-reported EQ-5D-5L and SF-6Dv2 responses were collected through the online survey. The agreement was assessed using intraclass correlation coefficients (ICC). Convergent validity and known-group validity were examined using Spearman’s rank correlation and effect sizes, respectively. The test-retest reliability was assessed using among a subgroup of the total sample. Sensitivity was compared using relative efficiency and receiver operating characteristic. Results A total of 1000 respondents (52.0% male, mean age 51.7 years, 67.7% overweight, 32.3% obesity) were included in this study. A higher ceiling effect was observed in EQ-5D-5L than in SF-6Dv2 (30.6% vs. 2.1%). The mean (SD) utility was 0.851 (0.195) for EQ-5D-5L and 0.734 (0.164) for SF-6Dv2, with the ICC of the total sample was 0.639 (p < 0.001). The Spearman’s rank correlation (range: 0.186–0.739) indicated an acceptable convergent validity between the dimensions of EQ-5D-5L and SF-6Dv2. The EQ-5D-5L showed basically equivalent discriminative capacities with the SF-6Dv2 (ES: 0.517–1.885 vs. 0.383–2.329). The ICC between the two tests were 0.939 for EQ-5D-5L and 0.972 for SF-6Dv2 among the subgroup (N = 150). The SF-6Dv2 had 3.7–170.1% higher efficiency than the EQ-5D-5L at detecting differences in self-reported health status, while the EQ-5D-5L was found to be 16.4% more efficient at distinguishing between respondents with diabetes and non-diabetes. Conclusions Both the EQ-5D-5L and SF-6Dv2 showed comparable reliability, validity, and sensitivity when used in Chinese overweight and obesity populations. 
The two measures may not be interchangeable given the systematic difference in utility values between the EQ-5D-5L and SF-6Dv2. More research is needed to compare the responsiveness.
... It has been proved that the different approaches of utility estimation yield different values, due to the difference in the dimensions described, the number of levels, and the severity range [21]. The quality of the economic evaluation is dependent on the sensitivity and validity of the approaches. ...
Full-text available
Background The increased prevalence of myopia creates and earlier age of onset has created public health concerns for the long-term eye health, vision impairment and carries with it a significant economic burden. The quality of the economic evaluation is dependent on the sensitivity and validity of the approaches. Nowadays, there are many approaches to measure patients’ health state utility (HSU). However, little is known regarding the performance of direct approach and indirect approach in people with myopia. This study is aimed to compare the psychometric properties of four HSU approaches among patients with myopia in mainland China, including two direct approaches (TTO and SG), the generic preference-based measures (PBM) (AQoL-7D) and the disease-specific PBM (VFQ-UI). Methods A convenience sampling framework was used to recruit patients with myopia who attended a large ophthalmic hospital in Jinan, China. Spearman’s rank correlations coefficient was used to assess concurrent validity. Known-group validity was analyzed by: (1) whether the patients wear corrective devices; (2) severity of myopia as low or moderate to high of the better eye; (3) duration of myopia as ≤ 10 years or > 10 years. Effect size (ES), relative efficiency (RE) statistic and the largest area under the receiver operating characteristic curve (AUC) were used to assess sensitivity. The intra-class correlation coefficient (ICC) and Bland–Altman plots were used to assess agreement. Results A valid sample size of 477 myopia patients was analyzed (median duration: 10 years). The mean HSU scores between TTO and SG were similar (0.95) and higher than AQoL-7D (0.89) and VFQ-UI (0.83). Overall, the VFQ-UI had the best performance based on the psychometric analysis. The agreement indicated that there was no pair of approaches that could be used interchangeably. 
Conclusions The VFQ-UI showed better psychometric properties than other three approaches for providing health state utility in Chinese myopia patients. Given the widespread use and its generic nature of the AQoL-7D, it could be used alongside with VFQ-UI to provide complementary health state utility from a generic and disease-specific perspective for economic evaluation. More evidence on the responsiveness of four health utility approaches in myopia patients is required.
... Previous studies have assessed the construct validity of AQoL-6D and SF-6D in different patient populations [22,23], but no studies have yet compared them to the Expanded Prostate Cancer Index Composite Instrument (EPIC-26), one of the most widely validated prostate cancer-specific measures of QoL [24]. This present study aimed to explore the construct (convergent and known groups validity and responsiveness of the generic preference-based HRQoL measures, SF-6D and AQoL-6D when compared to EPIC-26 in men with prostate cancer. ...
Full-text available
Purpose: To assess construct validity and responsiveness of the Expanded Prostate Cancer Index Composite Instrument (EPIC-26) relative to the Short-Form Six-Dimension (SF-6D) and Assessment of Quality of Life 6-Dimension (AQoL-6D) in patients following treatment for prostate cancer. Methods: Retrospective prostate cancer registry data were used. The SF-6D, AQoL-6D, and EPIC-26 were collected at baseline and one year post treatment. Analyses were based on Spearman's correlation coefficient, Bland-Altman plots and intra-class correlation coefficient, Kruskal Wallis, and Effect Size and the Standardised Response Mean for responsiveness. Results: The study sample was comprised of 1915 patients. Complete case analysis of 3,697 observations showed moderate evidence of convergent validity between EPIC-26 vitality/hormonal domain and AQoL-6D (r = 0.45 and 0.54) and SF-6D (r = 0.52 and 0.56) at both timepoints. Vitality/hormonal domain also showed moderate convergent validity with coping domain of AQoL-6D (r = 0.45 and 0.54) and with role (r = 0.41 and 0.49) and social function (r = 0.47 and 0.50) domains of SF-6D at both timepoints, and with independent living (r = 0.40) and mental health (r = 0.43) of AQoL-6D at one year. EPIC-26 sexual domain had moderate convergent validity with relationship domain (r = 0.42 and 0.41) of AQoL-6D at both timepoints. Both AQoL-6D and SF-6D did not discriminate between age groups and tumour stage at both timepoints but AQoL-6D discriminated between outcomes for different treatments at one year. All EPIC-26 domains discriminated between age groups and treatment at both timepoints. The EPIC-26 was more responsive than AQoL-6D and SF-6D between baseline and one year following treatment. Conclusions: AQoL-6D can be used in combination with EPIC-26 in place of SF-12. 
Although EPIC-26 is not utility based, its popularity amongst clinicians and ability to discriminate between disease-specific characteristics and post-treatment outcomes in clinical trials makes it a candidate for use within cost-effectiveness analyses. The generic measure provides a holistic assessment of quality of life and is suitable for generating quality adjusted life years (QALYs).
... Data were obtained from a Multi-instrument Comparison Study of 8022 individuals in 6 countries [13,14]. The survey was administered to patients diagnosed with 1 of 7 chronic illnesses. ...
Full-text available
Different wellbeing measures have been used among cancer patients. This study aimed to first investigate the sensitivity of health state utility (HSU), capability, and subjective wellbeing (SWB) instruments in cancer. A cancer-specific instrument (QLQ-C30) was included and transferred onto the cancer-specific HSU scores. Furthermore, it examined the relative importance of key life domains explaining overall life satisfaction. Data were drawn from the Multi-instrument Comparison survey. Linear regression was used to explore the extent to which the QLQ-C30 sub-scales explain HSU and SWB. Kernel-based Regularized Least Squares (KRLS), a machine learning method, was used to explore the life domain importance of cancer patients. As expected, the QLQ-C30 sub-scales explained the vast majority of the variance in its derived cancer-specific HSU (R2 = 0.96), followed by generic HSU instruments (R2 of 0.65–0.73) and SWB and capability instruments (R2 of 0.33–0.48). The cancer-specific measure was more closely correlated with generic HSU than SWB measures, owing to the construction of these instruments. In addition to health, life achievements, relationships, the standard of living, and future security all play an important role in explaining the overall life satisfaction of cancer patients.
... Studies have compared the performance of the EQ-5D (both 3-level and 5-level versions) with the SF-6D extensively in different patient groups, which cumulated relatively sufficient evidence on the comparative performance from different settings and types of study [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. There have also been some head-to-head comparison studies in the general population [29][30][31][32][33]. ...
Full-text available
Objectives To explore the comparative performance and develop the mapping algorithms between EQ-5D-5L and SF-6Dv2 in China. Methods Respondents recruited from the Chinese general population completed both EQ-5D-5L and SF-6Dv2 during face-to-face interviews. Ceiling/floor effects were reported. Discriminative validity in self-reported chronic conditions was investigated using the effect sizes (ES). Test–retest reliability was evaluated using intra-class correlation coefficient (ICC) and Bland–Altman plots in a subsample. Correlation and absolute agreements between the two measures were estimated with Spearman’s rank correlation coefficient and ICC, respectively. Ordinary least squares (OLS), generalized linear model, Tobit model, and robust MM-estimator were explored to estimate mapping equations between EQ-5D-5L and SF-6Dv2. Results 3320 respondents (50.3% males; age 18–90 years) were recruited. 51.1% and 12.2% of respondents reported no problems on all EQ-5D-5L and SF-6Dv2 dimensions, respectively. The mean EQ-5D-5L utility was higher than SF-6Dv2 (0.947 vs. 0.827, p < 0.001). Utilities were significantly different across all chronic conditions groups for both measures. The mean absolute difference of utilities between the two tests for EQ-5D-5L was smaller (0.033 vs. 0.043) than SF-6Dv2, with a slightly higher ICC (0.859 vs. 0.827). Fair agreement (ICC = 0.582) was observed in the utilities between the two measures. Mapping algorithms generated by the OLS models performed the best according to the goodness-of-fit indicators. Conclusions Both measures showed comparable discriminative validity. Systematic differences in utilities were found, and on average, the EQ-5D-5L generates higher values than the SF-6Dv2. Mapping algorithms between the EQ-5D-5L and SF-6Dv2 are reported to enable transformations between these two measures in China.
Objective: Health-related quality of life (HRQoL) is a multidimensional patient-related outcome. Less is known about the role of depressive symptoms in HRQoL in chronic diseases. This follow-up study analyzed the association between depressive symptoms and change in HRQoL, measured with the 15D, in patients with chronic diseases.
Design and setting: A total of 587 patients from the Siilinjärvi Health Center, Finland, were followed up during treatment for hypertension (HA), coronary artery disease (CAD) or diabetes (DM). Depressive symptoms were defined as a Beck Depression Inventory (BDI) score ≥ 10. HRQoL was assessed at baseline and after 12 months.
Results: There were 244 patients with HA (mean age 70 years, 59% women), 103 with CAD (mean age 72 years, 38% women) and 240 with DM (mean age 67 years, 52% women). The change in 15D score from baseline to the 12-month follow-up differed significantly between patients without and with depressive symptoms in CAD (p < 0.001) and DM (p = 0.024). In CAD patients with depressive symptoms, the change was -0.064 (95% CI: -0.094 to -0.035); in DM it was -0.018 (95% CI: -0.037 to 0.001). Across the 15 HRQoL dimensions of the 15D, a decrease related to depressive symptoms was found in three dimensions in HA, nine in CAD and seven in DM. As a function of baseline BDI, the 15D score decreased significantly among patients with CAD and DM.
Conclusions: Depressive symptoms have a negative impact on future HRQoL among primary care patients with coronary artery disease and diabetes, which emphasizes that mood should be acknowledged in their care and follow-up.
Trial registration: Clinical Trials registration number NCT02992431, registered 14 December 2016.
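The grouping rule (BDI ≥ 10) and the 12-month change score described above can be sketched as follows. This is a hypothetical, unadjusted illustration. It assumes that change is follow-up minus baseline and uses a normal-approximation 95% CI, whereas the study's reported intervals may come from a different model (for example, a covariate-adjusted one). The names `split_by_bdi` and `mean_change_ci` are invented for this example.

```python
import math
from statistics import mean, stdev

BDI_CUTOFF = 10  # BDI >= 10 defines "depressive symptoms" in the cited study


def split_by_bdi(records):
    """Group (bdi, baseline_15d, followup_15d) records by the BDI cutoff.

    Returns {group: (baseline_scores, followup_scores)}.
    """
    groups = {"no_symptoms": ([], []), "symptoms": ([], [])}
    for bdi, base, follow in records:
        key = "symptoms" if bdi >= BDI_CUTOFF else "no_symptoms"
        groups[key][0].append(base)
        groups[key][1].append(follow)
    return groups


def mean_change_ci(baseline, follow_up, z=1.96):
    """Mean within-patient change (follow-up minus baseline) with a
    normal-approximation CI. Needs at least two patients.

    Assumption: unadjusted estimate, unlike a model-based analysis.
    """
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return m, m - z * se, m + z * se
```

A negative mean change indicates declining HRQoL. Applying `mean_change_ci` separately to each BDI group within a disease area gives the kind of per-group comparison the abstract reports, without any of the study's own adjustments.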
CADTH reimbursement reviews are comprehensive assessments of the clinical effectiveness and cost-effectiveness, as well as patient and clinician perspectives, of a drug or drug class. The assessments inform non-binding recommendations that help guide the reimbursement decisions of Canada's federal, provincial, and territorial governments, with the exception of Quebec. This review assesses tebentafusp (Kimmtrak), 100 mcg/0.5 mL, solution, IV infusion. Indication: For the treatment of human leukocyte antigen (HLA)-A*02:01-positive adults with unresectable or metastatic uveal melanoma.
CADTH reimbursement reviews are comprehensive assessments of the clinical effectiveness and cost-effectiveness, as well as patient and clinician perspectives, of a drug or drug class. The assessments inform non-binding recommendations that help guide the reimbursement decisions of Canada's federal, provincial, and territorial governments, with the exception of Quebec. This review assesses pembrolizumab (Keytruda), solution for infusion, 100 mg/4 mL vial. Indication: Indicated for the adjuvant treatment of adult and pediatric (12 years and older) patients with stage IIB or IIC melanoma following complete resection.
Health state utilities measured by the major multi-attribute utility instruments differ. Understanding the reasons for this is important for the choice of instrument and for research designed to reconcile these differences. This paper investigates these reasons by explaining pairwise differences between utilities derived from six multi-attribute utility instruments in terms of (1) their implicit measurement scales; (2) the structure of their descriptive systems; and (3) 'micro-utility effects', i.e. scale-adjusted differences attributable to their utility formulae. The EQ-5D-5L, SF-6D, HUI 3, 15D and AQoL-8D were administered to 8,019 individuals. Utilities and unweighted values were calculated for each instrument. Scale effects were determined by the linear relationship between utilities, the effect of the descriptive system by comparing scale-adjusted values, and 'micro-utility effects' by the unexplained difference between utilities and values. Overall, 66% of the differences between utilities were attributable to the descriptive systems, 30.3% to scale effects and 3.7% to micro-utility effects. The results imply that revising utility algorithms will not reconcile differences between instruments. The dominant importance of the descriptive system highlights the need for researchers to select the instrument best able to describe the health states relevant to a study. Reconciling the inconsistent utilities produced by different instruments must focus primarily on the content of the descriptive system. Utility weights primarily determine the measurement scale. Other differences, attributable to the utility formulae, are comparatively unimportant.
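The "scale effect" step, meaning the linear relationship between two instruments' utilities, can be illustrated with an ordinary least squares fit. This is a simplified sketch under stated assumptions and not the authors' procedure. It splits each pairwise difference into the part explained by a linear rescaling and a residual. In the paper, that residual is further divided between descriptive-system and micro-utility effects using unweighted values, which this sketch does not attempt. The function names are hypothetical.

```python
import numpy as np


def scale_adjust(u_a, u_b):
    """Fit u_b ~ intercept + slope * u_a by OLS and return the rescaled u_a.

    Assumption: a simplified reading of the 'scale effect' step.
    """
    u_a = np.asarray(u_a, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    X = np.column_stack([np.ones_like(u_a), u_a])  # design matrix [1, u_a]
    (intercept, slope), *_ = np.linalg.lstsq(X, u_b, rcond=None)
    return intercept + slope * u_a, intercept, slope


def split_difference(u_a, u_b):
    """Split each pairwise difference u_b - u_a into a scale part (explained
    by the linear rescaling) and a residual (non-scale) part."""
    u_a = np.asarray(u_a, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    adjusted, _, _ = scale_adjust(u_a, u_b)
    scale_part = adjusted - u_a      # difference removed by rescaling alone
    residual_part = u_b - adjusted   # what rescaling cannot explain
    return scale_part, residual_part
```

By construction, `scale_part + residual_part` equals `u_b - u_a` for every respondent. If one instrument's utilities were an exact linear transformation of the other's, the residual would be zero and the whole difference would count as a scale effect.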
Purpose: This review examines generic preference-based measures and their ability to reflect health-related quality of life in patients with visual disorders. Methods: A systematic search was undertaken to identify clinical studies of patients with visual disorders in which health state utility values (HSUVs) were measured and reported. Data were extracted to assess the validity and responsiveness of the measures. Because of the heterogeneity between studies, a narrative synthesis of the data was undertaken. Results: There was considerable heterogeneity among the 31 studies identified in terms of patient characteristics, visual disorders and outcomes reported. Vision loss was associated with a reduction in scores across the preference-based measures, but the evidence on validity and responsiveness was mixed. The EQ-5D's performance differed by condition, with poor performance in age-related macular degeneration (AMD) and diabetic retinopathy. The more limited evidence on the HUI-3 found that it performed best in differentiating between severity groups of patients with glaucoma, AMD, cataracts and diabetic retinopathy. One study reported data on the SF-6D and showed that it was able to differentiate between patients with AMD. Conclusion: The performance of the EQ-5D in visual disorders was mixed. The HUI-3 seemed to perform better in some conditions, but the evidence on it and on the SF-6D is limited. More head-to-head comparisons of these three measures are required. The new 5-level version of the EQ-5D may do better at the milder end of visual function, and research is under way into adding a vision-relevant dimension.
The rise of medical technology assessment and QALY ideology have intensified the need and demand for a generic (disease-independent), sensitive, valid, reliable and easy-to-use measure of health-related quality of life (HRQOL). However, none of the measures and approaches suggested and developed over the years can claim to have established a position as the measure, either as a way of classifying and describing states of HRQOL or for valuing them. In addition, most of them have problems in meeting more than one of the above criteria.
Health-related multiattribute utility (MAU) instruments are questionnaires that measure an individual's health-related quality of life. They provide a formula for calculating a utility score from every combination of answers, i.e. for every health state defined by the questionnaire. The utility scores are designed primarily for the economic evaluation of health-related programs, although their use is not limited to this.
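The idea of "a formula for every combination of answers" can be made concrete with a toy scoring rule. The sketch below assumes an additive model in which utility equals 1 minus the sum of per-dimension decrements. The three dimensions and all decrement values are invented for illustration and do not come from any real instrument.

```python
from itertools import product

# Hypothetical decrements (NOT from any real instrument); level 0 = no problems.
DECREMENTS = {
    "mobility": (0.00, 0.05, 0.20),
    "pain": (0.00, 0.08, 0.25),
    "mood": (0.00, 0.06, 0.18),
}


def utility(state):
    """Additive scoring rule: 1 minus the decrements for each answer.

    state: dict mapping every dimension to a level index.
    """
    if set(state) != set(DECREMENTS):
        raise ValueError("state must give a level for every dimension")
    return 1.0 - sum(DECREMENTS[dim][level] for dim, level in state.items())


def score_all_states():
    """Enumerate every combination of answers and return {levels_tuple: utility}."""
    dims = list(DECREMENTS)
    levels = [range(len(DECREMENTS[d])) for d in dims]
    return {combo: utility(dict(zip(dims, combo))) for combo in product(*levels)}
```

Three dimensions with three levels each define 3 × 3 × 3 = 27 health states, and the formula assigns a utility to each of them. Real instruments often use other functional forms. The HUI 3, for example, uses a multiplicative model, so the additive rule here is only the simplest case.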
Cost-utility analyses (CUAs) have been published widely over the years to measure the value of health care interventions. We investigated the growth and characteristics of CUAs in the peer-reviewed English-language literature through 2012. We analyzed data from the Tufts Medical Center Cost-Effectiveness Analysis (CEA) Registry, a database containing more than 3700 English-language CUAs published through 2012. We summarized various study characteristics (e.g., intervention type, funding source, and journal of publication) and methodological practices (e.g., use of probabilistic sensitivity analysis) over three time periods: 1990 to 1999, 2000 to 2009, and 2010 to 2012. We also examined CUAs by country, region, and the degree to which diseases studied correlate with disease burden. The number of published CUAs rose from 34 per year from 1990 to 1999 to 431 per year from 2010 to 2012. The proportion of studies focused on the United States declined from 61% during 1990 to 1999 to 35% during 2010 to 2012 (P < 0.0001). Although still small compared with CUAs in higher income countries, the number of CUAs focused on lower and middle-income countries has risen sharply. A large fraction of studies pertain to pharmaceuticals (46% during 2010-2012). In recent years, most studies included probabilistic sensitivity analysis (67% during 2010-2012). Journals publishing CUAs vary widely in the percentage of their studies funded by drug companies. Some conditions, such as injuries, have high burden but few CUAs. Our review reveals considerable growth and some change in the cost-utility literature in recent years. The data suggest growing interest in cost-utility methodology, particularly in non-Western countries. Copyright © 2015 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
"Construct validation was introduced in order to specify types of research required in developing tests for which the conventional views on validation are inappropriate. Personality tests, and some tests of ability, are interpreted in terms of attributes for which there is no adequate criterion. This paper indicates what sorts of evidence can substantiate such an interpretation, and how such evidence is to be interpreted."