Measuring the Sensitivity and Construct Validity of 6 Utility Instruments in 7 Disease Areas
Jeff Richardson, PhD, Angelo Iezzi, MSc, Munir A. Khan, PhD, Gang Chen, PhD,
Aimee Maxwell, BBNSc (Hons)
Background. Health services that affect quality of life (QoL) are increasingly evaluated using cost utility analyses (CUA). These commonly employ one of a small number of multiattribute utility instruments (MAUI) to assess the effects of the health service on utility. However, the MAUI differ significantly, and the choice of instrument may alter the outcome of an evaluation.
Aims. The present article has 2 objectives: 1) to compare the results of 3 measures of the sensitivity of 6 MAUI and the results of 6 tests of construct validity in 7 disease areas and 2) to rank the MAUI by each of the test results in each disease area and by an overall composite index constructed from the tests.
Methods. Patients and the general public were administered a battery of instruments, which included the 6 MAUI, disease-specific QoL instruments (DSI), and 6 other comparator instruments. In each disease area, instrument sensitivity was measured 3 ways: by the unadjusted mean difference in utility between public and patient groups, by the value of the effect size, and by the correlation between MAUI and DSI scores. Content and convergent validity were tested by comparison of MAUI utilities and scores from the 6 comparator instruments. These included 2 measures of health state preferences, measures of subjective well-being and capabilities, and generic measures of physical and mental QoL derived from the SF-36.
Results. The apparent sensitivity of instruments varied significantly with the measurement method and by disease area. Validation test results varied with the comparator instruments. Notwithstanding this variability, the 15D, AQoL-8D, and the SF-6D generally achieved better test results than the QWB and EQ-5D-5L.
Key words: multiattribute utility (MAU); sensitivity; validity; cost utility analysis. (Med Decis Making 2016;36:147–159)
The economic evaluation of health services that alter the quality of life (QoL) commonly employs cost utility analyses (CUA) to rank alternative services according to the cost of obtaining an additional quality-adjusted life-year (QALY) from the use of the service. QALYs are defined as life-years times an index of the QoL, measured as utility (i.e., the strength of preference for a health state). Utility has been assessed, increasingly, using one of a small number of multiattribute utility instruments (MAUI),[1,2] and the validity of the economic evaluations that use these instruments is therefore dependent on the sensitivity and validity of the instruments. The validity of the comparison of services therefore depends on the validity of the comparisons made using these MAUI.
A review of the literature between 2000 and 2010 identified 1663 studies that had employed 1 of the major MAUI.[3] These contained 392 head-to-head comparisons of the utilities obtained from the instruments. These all found a significant correlation between MAUI scores and, in some cases, demonstrated the existence of common latent dimensions.[4] However, most studies also identified significant differences between estimated utilities, with the largest US-based comparison reporting an average correlation between 4 MAUI of 0.71.[5] Most authors in the reviewed studies concluded that instruments “are not equivalent,”[6] that they are imprecisely related,[5] and that their comparison warrants caution.[7] Similarly, a more recent comparison using 5 MAUI in a prospective and follow-up study of cataract and heart failure patients concluded that there is a “lack of interchangeability among different preference-based measures.”[8]
Received 22 December 2014 from the Centre for Health Economics, Monash Business School, Monash University, Australia (JR, AI, MAK, AM), and Flinders Health Economics Group, Flinders University, Australia (GC). This research was funded by National Health and Medical Research Council (NHMRC) grant 1006334. Revision accepted for publication 20 September 2015.
Supplementary material and appendices for this article are available on the Medical Decision Making Web site at http://mdm.sagepub.com/supplemental.
Address correspondence to Jeff Richardson, Centre for Health Economics, Monash Business School, Monash University, Rm 284, Level 2, Bldg 75, Clayton Campus, Clayton, Vic 3800 Australia; telephone: +61 3 9905 0754; fax: +61 3 9905 8344; e-mail: jeffrey.richardson@monash.edu.
© The Author(s) 2015
Reprints and permission: http://www.sagepub.com/journalsPermissions.nav
DOI: 10.1177/0272989X15613522
MEDICAL DECISION MAKING/FEBRUARY 2016
ORIGINAL ARTICLE
Downloaded from mdm.sagepub.com at Monash University on May 18, 2016
Between 2010 and 2015, the Web of Science and Ovid Medline identified 56 articles as having included 3 or 4 MAUI. With few exceptions, these focused on individual illnesses or health states: arthritis,[9–11] diabetes,[12] multiple sclerosis,[13,14] cancer,[15] disc displacement,[16] vision,[17,18] chronic obstructive pulmonary disease,[8] hearing,[19] and urinary incontinence.[20] The majority of these studies note the differences obtained by the use of different instruments and, generally, their capacity to discriminate between the public and differing classes of patient. Several studies explicitly consider instrument validity in the context of the single disease.[14] Using item response theory, one study derived test-retest standard deviations and structural deviations for 4 MAUI.[21] Results were derived from the general public, and the validity of extrapolation to particular patient groups is not reported.
In sum, the empirical literature has demonstrated differences between the utilities obtained by MAUI but has generally refrained from recommending one instrument in preference to another. Notwithstanding the difficulty of comparing multidimensional instruments, the choice between them is an important question. As recognized in the literature, the difference between MAUI has important consequences for CUA.[22] An economic evaluation is more likely to find a service to be cost effective if it employs a MAUI that predicts a larger improvement in utility than alternative MAUI, and the funding of a service by a health authority may depend on the instrument used in its evaluation. This highlights the need for more evidence on the comparative strengths and weaknesses of different MAUI.
The present article is a response to this need. It examines 2 of the key properties of a MAUI. The first is its sensitivity, the extent to which predicted utilities differ with a change in a person’s QoL. The second is the instrument’s validity: evidence that the instrument measures the theoretical construct that it purports to measure, which, for a MAUI, is the strength of people’s preferences for different health states.
In practice, demonstrating validity is problematic, as an instrument will, typically, be neither (completely) valid nor (completely) invalid. As argued by Streiner and Norman,[23] “validation” is “a process of hypothesis testing . . . to determine the degree of confidence we can place on inferences we make . . . based on scores” (p 174). Confidence increases when the scores correlate as expected with scores from other instruments. However, in the absence of a gold standard, the choice of comparator instruments is contestable. Consequently, the process of validation (in Cronbach and Meehl’s terminology, the creation of a “nomological” network of supporting evidence and theory) evolves with the addition of new test results.[24] The present article contributes to this process by reporting the results of 6 tests of validity for 6 MAUI in 7 disease areas. Three tests of sensitivity are additionally reported. Final comparisons are therefore based on 9 tests.
Tests of validity have been variously classified.[23] The present study is concerned with the generic category of “construct validity” and, more specifically, 1) convergent validity, the strength of the association with variables that seek to measure a similar or the same construct (i.e., the strength of preferences), and 2) content validity, the extent to which an instrument describes or represents the full range of attributes needed to draw correct inferences with respect to the construct. Content validity is tested here by the correlation of MAUI scores with measures of the physical and mental QoL.
The sensitivity of an instrument reflects the correspondence between the attributes of a health state and the domains included in the instrument. Its measurement is affected by the instrument’s reliability and whether items are appropriately worded to identify changes in attributes. However, the change in utility predicted by a MAUI also depends on the effective measurement scale that was employed when utilities were assigned to the instrument’s health state descriptions. Despite each MAUI seeking to anchor its scale at 0.00 = death and 1.00 = best health, the effective measurement scales differ significantly between instruments.[25] Consequently, an increment of utility may have a different meaning when it has been derived from a different MAUI. This may mask the sensitivity attributable to instrument content and reliability and make evaluation of sensitivity problematic.
The present article has 2 specific objectives. The first is to compare 3 methods for measuring sensitivity and to compare results from 6 tests of construct validity. The second objective is to rank instruments by the results of each test. However, as a choice of instrument requires a single, not multiple, ranking, results are also combined into a single indicative index of sensitivity and validity, which is used to rank the MAUI in each of the 7 disease areas analyzed. The study is based on a large multi-instrument comparison (MIC) survey, which is described in the second section below along with the methods used for measuring sensitivity and testing construct validity. The third section presents the results, which are discussed in the fourth section. Abbreviations used throughout the paper are defined in Box 1.
DATA AND METHODS
The MIC Survey
An online survey was conducted in 6 countries: Australia, Canada, Germany, Norway, the United Kingdom, and the United States. Respondents were initially asked to indicate if they had a chronic disease and to rate their overall health on a visual analogue scale (VAS), where 0 represented death and 100 represented “best possible health (physical, mental, social).” The “healthy” public was defined by the absence of chronic disease and by a score greater than 70 on the VAS. Quotas were used to achieve demographically representative samples of the public in each country; that is, public respondents were recruited until the predetermined target for each demographic cell was reached. Quotas were also applied to obtain a target number of respondents in each of 7 chronic disease areas, namely, arthritis, asthma, cancer, depression, diabetes, hearing loss, and heart disease.
Instruments used in the study are listed in Table 1, which also reports the balance between physical and psychosocial items in the 6 MAUI. Utilities for 5 of the MAUI were assigned using algorithms provided by the instruments’ authors; the EQ-5D-5L utilities were obtained from the crosswalks published by the EuroQoL Group. Both the MAUI and comparator instruments, described below, were administered in random order. The full dataset is freely available on the AQoL website.[47]
Comparator Instruments
Disease-specific instruments (DSI) listed in Table 1 were selected from the literature upon advice from leading researchers in each area. Their correlation with the MAUI is presented in bar charts in Appendix A. Two preference measures, the VAS and Self Time Tradeoff (TTO), are described in Appendices B and C. The latter experimental instrument asked respondents to trade off time in their present health state against length of life.[26] Physical health–related QoL was measured with the SF-36 physical component score (PCS) and psychosocial health by both the SF-36 mental component score (MCS) and a new subjective well-being (SWB) instrument developed by the UK Office for National Statistics (ONS). The ONS instrument was preferred to other measures of SWB in the MIC survey as it achieved a higher correlation with all other instruments. The Capabilities Instrument (ICECAP) focuses on a specific subset of capabilities, namely, the ability to function as an independent and emotionally fulfilled individual, a concept close to the notion of eudemonia. Irrespective of its nomenclature, in the recent literature it has been nominated as an important measure of well-being, which supports its use as a test of the content validity of the MAUI.
Box 1 Glossary
Utility Instruments
MAUI: Multiattribute utility instrument
AQoL-8D: Assessment of QoL–8 dimensions
EQ-5D-5L: Five (response) level EQ-5D
HUI3: Health Utilities Index, version 3
QWB: Quality of well-being
SF-6D: Short Form (from the SF-36), 6 dimensions
15D: 15 dimensions (no title)
Nonutility Instruments
ICECAP: Capabilities Instrument
ONS: Subjective Well-Being Instrument of the UK Office for National Statistics
Self TTO: Time tradeoff on own health state
SF-36: Short form; 36 items
PCS: Physical component score of SF-36
MCS: Mental component score of SF-36
VAS: Visual analogue scale
Other
Content validity: There are sufficient items describing the construct to permit valid inferences
Construct validity: A general term for successful testing of a construct
Convergent validity: A close relationship (correlation) with other measures of the construct
Disease-specific instrument (DSI): A disease-specific (quality-of-life) instrument. See Table 1 for the 7 DSI
Effect size (ES): Effect size calculated with the standard deviation of the public sample
Eight edit criteria were employed. Six of these were based on 6 questions, which were repeated at some stage of the survey. Results were not used when responses to these questions differed by 2 or more response categories or when 2 or more of the answers differed by more than 1 response category. Surveys were removed that were completed in less than 20 min, which was deemed to be the minimum time for the provision of more than 300 items of information. The edit procedures, the questionnaire, and its administration are described in Richardson and others.[46] The online survey was administered by a global panel company, CINT Pty Ltd. The survey was approved by Monash University Human Research Ethics Committee, Monash University, Melbourne, Australia, reference number CF11/3192-2011001748.
Methods
In each of the 7 disease areas, 3 measures of sensitivity were estimated and compared. These were 1) unadjusted differences between public and patient mean utilities, 2) the Cohen effect size (the mean differences divided by the standard deviation of the public sample), and 3) the Pearson correlation between MAUI utilities and scores from the relevant DSI. The latter 2 measures are independent of scale effects and were combined into a single index of sensitivity.
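The 3 measures can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the small data vectors are hypothetical and are not drawn from the MIC survey, and the correlation is computed among patients on the assumption that DSI scores exist only for the relevant patient group.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference(public, patients):
    """Measure 1: unadjusted difference between public and patient mean utilities."""
    return mean(public) - mean(patients)


def effect_size(public, patients):
    """Measure 2: mean difference divided by the SD of the public sample."""
    return mean_difference(public, patients) / stdev(public)


def pearson(x, y):
    """Measure 3: Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical utilities from one MAUI and hypothetical DSI scores.
public_utilities = [0.95, 0.90, 0.85, 0.80, 0.90]
patient_utilities = [0.70, 0.60, 0.75, 0.55, 0.65]
patient_dsi_scores = [62, 48, 70, 40, 55]

difference = mean_difference(public_utilities, patient_utilities)
cohen_es = effect_size(public_utilities, patient_utilities)
dsi_correlation = pearson(patient_utilities, patient_dsi_scores)
```

Dividing by the public standard deviation, or taking a correlation, removes the dependence on the units of the utility scale, which is why the last 2 measures are unaffected by a linear rescaling of an instrument.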
Convergent and content validity were tested and compared using the correlation between each of the MAUI and the comparator (criterion) variables. Based on the correlation between them, the criterion variables were collapsed into 3 broad categories: 1) preference, 2) physical, and 3) psychosocial variables. The utility that MAUI seek to measure refers to the strength of a person’s preferences, so convergent validity was tested by the correlation between MAUI and an index of the 2 preference measures. Content validity was tested by the correlation between MAUI and indices of physical and psychosocial variables.
In each case in which an index was constructed, there was no overarching criterion for weighting the importance of each test. Results are also measured on different scales. To combine test results, each result used in an index was therefore converted to a percentage of the highest test result for that test in a disease group, and indices were constructed as the unweighted average of the converted test scores. The procedure gives equal importance to each test in the index and equal importance to the same relative differences that occur on each of the component scales.
Table 1 Instruments by Role in the Analysis
Subject of Analysis: Multiattribute Utility Instruments (number of physical/pain items; number of psychosocial items)
EQ-5D-5L[31,32]: 4; 1
SF-6D[35]: 3; 3
HUI3[38]: 6; 2
QWB[40]: 66 clustered problems; 3 items
15D[42]: 10; 5
AQoL-8D[44]: 10; 25
Analysis of Sensitivity: Disease-Specific Instruments
Arthritis: AIMS2-SF[27]
Asthma: AQLQ[29]
Cancer: QLQ C-30[33]
Depression: DASS21[36]
Depression: K10[39]
Diabetes: Diabetes-39[41]
Hearing loss: APHAB[43]
Heart: MacNew Instrument[45]
Analysis of Validity: Criterion (Comparator) Instruments (Generic Well-Being)
SF-36[28]: Physical component score (PCS) of SF-36; Mental component score (MCS) of SF-36
Capabilities (ICECAP)[30]
Subjective Well-Being (ONS)[34]
Preferences: VAS (see Appendix A)[37]
Preferences: Self TTO[26] (see Appendix B[26])
Note: Instruments are described in detail in Richardson and others.[37] Bracketed numbers indicate the reference source.
There is also no overarching criterion for combining the indices of sensitivity and validity, but the ranking of instruments requires their implicit or explicit combination. Indicative summary scores were therefore calculated using the methodology described above; that is, the indices of sensitivity and validity for each instrument were converted to a percentage of the highest index in the disease group and the indicative score calculated as the average of the 2 converted indices.
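The rescaling and averaging rule described above can be sketched as follows. The function names are hypothetical. The input values are the arthritis correlations reported in Table 3, used here only as convenient example input; the output is an illustration of the stated rule and is not expected to reproduce the published figures in Table 4, whose exact computation may differ in detail.

```python
def to_share_of_best(results):
    """Express each instrument's result as a share of the highest result."""
    best = max(results.values())
    return {name: value / best for name, value in results.items()}


def unweighted_index(tests):
    """Average the rescaled results of several tests, one dict per test."""
    rescaled = [to_share_of_best(test) for test in tests]
    instruments = list(tests[0])
    return {
        name: sum(test[name] for test in rescaled) / len(rescaled)
        for name in instruments
    }


# Example input: arthritis correlations from Table 3.
physical = {"EQ-5D": 0.67, "SF-6D": 0.69, "HUI3": 0.68,
            "15D": 0.69, "QWB": 0.61, "AQoL-8D": 0.54}
psychosocial = {"EQ-5D": 0.48, "SF-6D": 0.63, "HUI3": 0.55,
                "15D": 0.58, "QWB": 0.44, "AQoL-8D": 0.74}
preference = {"EQ-5D": 0.45, "SF-6D": 0.50, "HUI3": 0.47,
              "15D": 0.51, "QWB": 0.43, "AQoL-8D": 0.53}

validity_index = unweighted_index([physical, psychosocial, preference])
lowest_ranked = min(validity_index, key=validity_index.get)
```

The same `unweighted_index` call, applied to a sensitivity index and a validity index, gives an indicative summary score of the kind described in the preceding paragraph. Because every test is divided by its own maximum, each test contributes on a common 0 to 1 scale and no test dominates merely because of its units.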
RESULTS
Data were obtained from 9665 individuals. Edit procedures described above resulted in the removal of 17% of the total: 9.3% as a result of inconsistencies on individual repeated questions, 4.8% because of multiple inconsistencies, and 2.9% because of the minimum time requirement. Numbers removed by edit criteria are given in country-specific reports available on the AQoL website.[47] The remaining 7933 respondents are classified by age, gender, education, and disease area in Supplementary Table S1. Patient numbers varied from 772 for cancer to 943 for heart disease. The total sample contained a similar number of respondents by country, varying from 1177 from Norway to 1460 from the United States. The use of quotas led to a demographic profile for the healthy public, which closely resembled the public in each country.
Summary statistics for the full survey population are reported in Supplementary Table S2. Mean utilities predicted by the MAUI for the public and total samples vary by 27% and 33%, respectively. For the total sample, the standard deviation varies by a factor of 2.08. The range of scores varies by a factor of 2.16, with minimum utilities varying from 0.3 for the SF-6D to –0.51 for the EQ-5D-5L. The EQ-5D-5L, AQoL-8D, and SF-6D assign the maximum utility of 1.00 to 19.1%, 0.35%, and 1.3% of the total respondents, respectively; 0.3%, 1.3%, and 14.7% of utilities predicted by the 15D, SF-6D, and AQoL-8D fall below 0.4.
Differences are summarized in Figure 1. This was constructed by ranking the utilities predicted by each MAUI from highest to lowest and dividing them into 100 percentile groups. The figure plots the average utility in each percentile on the vertical axis by its rank order on the horizontal axis. The 15D compresses utilities. SF-6D, and to a greater extent Quality of Well-Being (QWB), utilities initially decline very quickly but, subsequently, the curves flatten and utilities become greater than other utilities, with the exception of the 15D.
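The construction of one curve of this kind can be sketched as below. This is an assumption-laden illustration: `percentile_means` is a hypothetical helper, the grouping into equal-sized rank bands is one plausible reading of "100 percentile groups", and the utilities are synthetic rather than survey data.

```python
def percentile_means(utilities, groups=100):
    """Rank utilities from highest to lowest, split them into equal-sized
    rank groups, and return the mean utility of each group in rank order."""
    ranked = sorted(utilities, reverse=True)
    n = len(ranked)
    means = []
    for g in range(groups):
        start = g * n // groups
        end = (g + 1) * n // groups
        chunk = ranked[start:end]
        if chunk:  # skip empty groups when there are fewer values than groups
            means.append(sum(chunk) / len(chunk))
    return means


# Synthetic utilities for a single hypothetical instrument.
synthetic_utilities = [i / 1000 for i in range(1000)]
curve = percentile_means(synthetic_utilities)
```

Plotting `curve` against its index for each instrument on common axes would give a figure of the same form, in which a compressed scale appears as a flatter line.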
Instrument Sensitivity
Table 2 reports unadjusted differences between
mean public and patient utilities. These vary from
an average difference of 0.27 for depression to 0.10
for hearing loss. Differences vary significantly when
they are measured by different MAUIs. From the final
column, the difference varies by a factor of 3.62 for
hearing loss and by a factor of 2.11 for all patients;
that is, the estimated effect of curing a disease varies,
on average, by more than 100% depending on the
choice of instrument.
The bottom 2 rows of Table 2 indicate that, as a measure of sensitivity, the unadjusted difference in utility between patients and the public is likely to be confounded and possibly dominated by differences in the measurement scale of the MAUI. The rank order of the average difference in utilities (HUI3 first; SF-6D last) is almost identical to the rank order of the range of observations. Consistent with this, independent analysis of the same data found, on average, that 30.3% of the difference between instrument utilities could be explained by linear scale effects.[25] The alternative measures of sensitivity, the Cohen effect size and the correlation with the DSI, take account of linear differences in the scale.
The 3 measures of sensitivity are reported and compared in Figure 2a–h. For each disease area, the 6 MAUI are rank ordered in the 3 columns by the score from the 3 measures of sensitivity, which is reported in brackets below the MAUI name. The location of the MAUI name in each column visually indicates the relative numerical value of the measure on a 100-point scale. Thus, in Figure 2h, the difference in mean public and patient utilities for all disease areas varies from 0.22 for HUI3 to 0.11 for SF-6D. Therefore, these 2 instruments are located on the standardized scale at 100 and 0.0, respectively. From column 2, the average effect size varies from 1.93 for the 15D to 1.02 for SF-6D, and these 2 instruments are located at 100 and 0.0 on the scale, respectively. In the final column, the Pearson correlation between the DSI and MAUI varies from 0.56 for AQoL-8D to 0.37 for the QWB, and these instruments are located on the scale at 100 and 0.0, respectively. These average results indicate a difference of 100%, 89.2%, and 51% between the maximum and minimum scores for the 3 measures. The rank order of the scores varies substantially, and the order varies by disease area.
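The arithmetic behind the standardized scale and the percentage gaps can be checked with a short sketch. The helper names are hypothetical, and the min-max formula for an instrument's position on the 100-point scale is an assumption consistent with the maximum being placed at 100 and the minimum at 0.0; the input values are the extremes quoted in the text.

```python
def scale_position(value, lowest, highest):
    """Assumed min-max position of a score on a 0 to 100 scale."""
    return 100.0 * ((value - lowest) / (highest - lowest))


def percent_gap(highest, lowest):
    """Percentage by which the highest score exceeds the lowest."""
    return 100.0 * (highest / lowest - 1.0)


# (highest, lowest) average scores quoted for the 3 measures.
reported_extremes = {
    "mean difference": (0.22, 0.11),
    "effect size": (1.93, 1.02),
    "correlation with DSI": (0.56, 0.37),
}
gaps = {name: percent_gap(hi, lo) for name, (hi, lo) in reported_extremes.items()}
```

Under these assumptions the gaps come to roughly 100%, 89.2%, and 51%, in line with the figures reported above.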
Figure 1 Mean multiattribute utility by ranked percentile. Source: Richardson and others.[46]
Table 2 Difference between Public and Patient Mean Utilities by Disease and Multiattribute Utility Instrument
EQ-5D SF-6D HUI3 15D QWB AQoL-8D Average Max/Min
Public (mean) 0.88 0.80 0.88 0.94 0.74 0.83 0.84
Public-patient mean difference
Arthritis 0.24 0.13 0.26 0.13 0.17 0.19 0.19 2.02
Asthma 0.12 0.09 0.12 0.09 0.12 0.14 0.11 1.45
Cancer 0.18 0.11 0.20 0.12 0.14 0.16 0.15 1.85
Depression 0.29 0.20 0.35 0.18 0.21 0.38 0.27 2.04
Diabetes 0.17 0.10 0.20 0.11 0.13 0.17 0.15 1.96
Hearing loss 0.09 0.05 0.18 0.06 0.11 0.11 0.10 3.62
Heart disease 0.16 0.10 0.18 0.11 0.14 0.15 0.14 1.84
Average 0.18 0.11 0.22 0.12 0.14 0.18 0.16 2.11
Rank order
Average difference 2 6 1 5 4 2
Range 1 6 2 5 4 3
Figure 2 Three measures of sensitivity: each figure ranks the multiattribute utility instruments (MAUIs) by (1) differences between mean
utilities of the patient and public group; (2) by the effect size calculated using the standard deviation of the public; and (3) the Pearson
correlation between the MAUI and the disease-specific, quality-of-life instrument (DSI). Numbers in brackets are the absolute differences
(column 1), effect sizes (column 2) and correlation coefficients (column 3).
Construct Validity
The correlation between criterion instrument scores is reported in Supplementary Table S3. The comparatively high correlation between the MCS, ONS, and ICECAP contrasts with their low correlation with the PCS and indicates that the 3 instruments are more closely related to psychosocial than to physical attributes. Consequently, in the summary Table 3, the correlations between MAUI and these 3 variables were averaged to obtain a single index of psychosocial content. Only a single index of physical health–related QoL was available, namely, the PCS of the SF-36, and it was employed as the index of physical content. Correlations of the MAUI with the 2 preference instruments, the VAS and Self TTO, were also averaged to obtain a single index.
A relatively clear pattern emerges in Table 3. The 15D has the highest and the AQoL-8D the lowest average correlation with the PCS. However, with the exception of AQoL-8D and QWB, the average correlation across all disease areas is similar, differing by only 0.07. AQoL-8D has the highest average correlation with psychosocial variables and QWB the lowest, followed by EQ-5D-5L. Differences are large, with the average correlation varying from 0.43 (QWB) and 0.48 (EQ-5D-5L) to 0.72 (AQoL-8D). AQoL-8D also has the largest average correlation with the preferences index across all disease areas (0.51), with QWB and EQ-5D-5L again having the lowest scores (0.38 and 0.43, respectively). The correlations between the MAUI utilities and scores from individual criterion variables are reported in Appendix D for each disease area.
Indicative Summary Scores
The indices for instrument sensitivity, construct validity, and the indicative summary scores are reported in Table 4. While results vary by disease, several patterns emerge. In the majority of disease areas, the 15D obtains the highest overall composite score (Table 4c) followed in each case by the AQoL-8D or HUI3. Exceptions to this pattern are hearing loss and depression, for which the HUI3 and AQoL-8D have the highest average scores, respectively. In contrast, the QWB consistently achieves the lowest scores on every criterion for every disease except hearing loss. The EQ-5D obtains the second lowest score for 5 of the 7 diseases.
For reasons outlined below, the effect size and
therefore the indicative summary scores of the 15D
and SF-6D are inflated and deflated, respectively,
by nonlinear scale effects. Omitting the effect size
Table 3 Correlations of Multiattribute Utility Instruments with Indices of Physical and Psychosocial Content and Preferences by Patient Groups (a,b)
Physical Content (c). Index: PCS
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.67 0.69 0.68 0.69 0.61 0.54
Asthma 0.64 0.61 0.61 0.68 0.52 0.49
Cancer 0.66 0.71 0.63 0.70 0.61 0.57
Depression 0.57 0.62 0.53 0.66 0.51 0.48
Diabetes 0.70 0.69 0.66 0.71 0.62 0.58
Hearing loss 0.70 0.69 0.66 0.71 0.62 0.58
Heart disease 0.67 0.70 0.66 0.72 0.63 0.59
Average correlation 0.66 0.67 0.63 0.70 0.59 0.55
Psychosocial Content (d). Index: Average of MCS+ICECAP+ONS
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.48 0.63 0.55 0.58 0.44 0.74
Asthma 0.45 0.59 0.52 0.52 0.40 0.70
Cancer 0.50 0.63 0.56 0.57 0.47 0.74
Depression 0.48 0.57 0.57 0.51 0.42 0.69
Diabetes 0.46 0.59 0.54 0.54 0.41 0.71
Hearing loss 0.44 0.57 0.52 0.52 0.37 0.72
Heart disease 0.54 0.64 0.57 0.59 0.48 0.75
Average correlation 0.48 0.60 0.55 0.55 0.43 0.72
Preference Measures (e). Index: Average of VAS and Self TTO
Disease area EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.45 0.50 0.47 0.51 0.43 0.53
Asthma 0.46 0.44 0.45 0.52 0.40 0.54
Cancer 0.43 0.40 0.44 0.49 0.39 0.52
Depression 0.41 0.41 0.44 0.44 0.35 0.48
Diabetes 0.42 0.43 0.45 0.49 0.37 0.52
Hearing loss 0.41 0.38 0.42 0.44 0.30 0.48
Heart disease 0.45 0.49 0.48 0.50 0.44 0.52
Average correlation 0.43 0.45 0.45 0.49 0.38 0.51
a. Bolded numbers indicate the instrument ranked first in the category. Underlining indicates an instrument ranked fifth or sixth in the category.
b. Correlations between MAUI and each criterion variable are reported as supplementary material.
c. Correlation between MAUI and PCS.
d. Correlation between MAUI and average of MCS, ICECAP, and ONS after rescaling to a 0 to 1 scale.
e. Correlation between MAUI and average of the VAS and Self TTO after rescaling to a 0 to 1 scale.
Table 4 Indices of Sensitivity, Validity, and Indicative Summary Score
4a Index of Sensitivity = Average of Effect Size + Correlation with DSI (a,b)
MAUI EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.88 0.68(c) 0.89 0.94 0.56 0.83
Asthma 0.68(c) 0.69 0.68 1.00 0.52 0.71
Cancer 0.78 0.72(c) 0.81 1.00 0.62 0.72(c)
Depression 0.67(c) 0.71 0.77 0.84 0.46 0.94
Diabetes 0.77(c) 0.77(c) 0.83 0.98 0.65 0.82
Hearing loss 0.51 0.47(c) 1.00 0.82 0.64 0.69
Heart disease 0.70(c) 0.71 0.75 0.95 0.57 0.78
Average 0.71 0.68(c) 0.82 0.93 0.57 0.78
4b Index of Validity = Average of Rescaled Indices of Psychosocial Content, Physical Content, and Preferences (a,b)
MAUI EQ-5D SF-6D HUI3 15D QWB AQoL-8D
Arthritis 0.57(c) 0.65 0.60 0.63 0.53 0.66
Asthma 0.40(c) 0.42 0.42 0.44 0.32 0.44
Cancer 0.57(c) 0.66 0.59 0.63 0.53 0.67
Depression 0.53(c) 0.58 0.57 0.58 0.47 0.61
Diabetes 0.57(c) 0.62 0.60 0.63 0.50 0.66
Hearing loss 0.55(c) 0.58 0.57 0.60 0.46 0.64
Heart disease 0.60(c) 0.66 0.62 0.65 0.56 0.68
Average 0.54(c) 0.60 0.57 0.59 0.48 0.62
4c Overall Composite A = Average of 4a and 4b (rank order 1 to 6, left to right; "=" marks a tie)
Arthritis: 15D (0.79); HUI3 (0.75); AQoL-8D (0.74); EQ-5D (0.72); SF-6D (0.67); QWB (0.54)
Asthma: 15D (0.72); AQoL-8D (0.57); SF-6D (0.56); HUI3 (0.55); EQ-5D (0.54); QWB (0.42)
Cancer: 15D (0.82); HUI3 (0.70); AQoL-8D (0.69) = SF-6D (0.69); EQ-5D (0.67); QWB (0.57)
Depression: AQoL-8D (0.77); 15D (0.71); HUI3 (0.67); SF-6D (0.65); EQ-5D (0.60); QWB (0.46)
Diabetes: 15D (0.80); AQoL-8D (0.74); HUI3 (0.72); SF-6D (0.69); EQ-5D (0.67); QWB (0.58)
Hearing loss: HUI3 (0.79); 15D (0.71); AQoL-8D (0.66); EQ-5D (0.53); SF-6D (0.53); QWB (0.55)
Heart disease: 15D (0.80); AQoL-8D (0.73); HUI3 (0.68) = SF-6D (0.68); EQ-5D (0.65); QWB (0.56)
4d Overall Composite B: Removes Effect Size from 4a (rank order 1 to 6, left to right; "=" marks a tie)
Arthritis: AQoL-8D (0.83); 15D (0.76); HUI3 (0.54); SF-6D (0.72) = EQ-5D (0.72); QWB (0.54)
Asthma: 15D (0.72); SF-6D (0.63); AQoL-8D (0.62); HUI3 (0.61); EQ-5D (0.58); QWB (0.41)
Cancer: 15D (0.82); SF-6D (0.80); AQoL-8D (0.77); HUI3 (0.74); EQ-5D (0.72); QWB (0.63)
Depression: AQoL-8D (0.80); SF-6D (0.71); HUI3 (0.64); 15D (0.63); EQ-5D (0.58); QWB (0.45)
Diabetes: AQoL-8D (0.82); SF-6D (0.81); 15D (0.79); HUI3 (0.72); EQ-5D (0.69); QWB (0.63)
Hearing loss: HUI3 (0.79); AQoL-8D (0.72); 15D (0.71); SF-6D (0.58); QWB (0.58); EQ-5D (0.52)
Heart disease: AQoL-8D (0.84); SF-6D (0.79); 15D (0.78); HUI3 (0.71); EQ-5D (0.67); QWB (0.58)
a. Averages were calculated by first dividing the component indices for each MAUI by the maximum component index. The indices of sensitivity and validity were then obtained.
b. Bold indicates highest score.
c. Lowest score excluding QWB. Index numbers are subscripts presented in parentheses.
from the indices (Table 4d) improves the ranking of
the SF-6D for every disease category and reduces
the ranking of the 15D in 5 and the HUI3 in 4 of the
7 disease categories. The ranking of the remaining
instruments remains relatively stable.
DISCUSSION
In CUA, the comparison of services is based on the assumption that the QALYs they produce, and therefore the utilities used to calculate the QALYs, have the same cardinal property: an increment of utility produced by service A has the same importance as the same increment produced by service B. This assumption may be violated by the use of different MAUI, which have different scale properties and differing sensitivities to health states. For example, from Table 2, the substitution of the SF-6D for the HUI3 would halve the number of QALYs apparently gained from returning the “average” arthritis patient to the average health state of the public and the priority of arthritis services would fall.
Inconsistency may be avoided by the use of a single
MAUI. However, this solution is problematic because
of variation in the content of instruments and there-
fore differing sensitivities to different health states.
From Table 2, the ‘‘cure’’ of depression (i.e., a return
from patient to public average utility), as measured by
the EQ-5D, HUI3, SF-6D, and AQoL-8D, would generate
21%, 35%, 54%, and 100% more QALYs, respectively, than the
cure of arthritis. This implies that the result of CUA
and the funding of services may depend on the choice
of the MAUI.
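The percentages in the preceding paragraph are ratios of mean utility differences between patients and the public. The arithmetic can be sketched as follows; the utility gaps below are hypothetical placeholders and are not the Table 2 values, and the function name is an assumption introduced for illustration:

```python
def extra_qalys_percent(gap_condition, gap_reference):
    """Percentage by which one 'cure' yields more QALYs than another, per year of life.

    Each gap is the public mean utility minus the patient mean utility.
    """
    return 100.0 * (gap_condition - gap_reference) / gap_reference


# Hypothetical utility gaps for two unnamed instruments (illustrative only).
gaps = {
    "instrument_X": {"depression": 0.30, "arthritis": 0.25},
    "instrument_Y": {"depression": 0.40, "arthritis": 0.20},
}
extra = {
    name: extra_qalys_percent(g["depression"], g["arthritis"])
    for name, g in gaps.items()
}
```

With these placeholder inputs the same pair of diseases differs by roughly 20% under one instrument and 100% under the other, which is the kind of divergence the paragraph describes.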
The first objective of the present article was to
compare alternative methods for measuring instru-
ment sensitivity and construct validity. Results
indicate that common approaches to measuring sen-
sitivity do not yield similar results. In 5 of the 7 dis-
ease areas, the mean difference between public and
patient utilities is greatest when it is measured by
the HUI3. Using the 15D, the difference is smallest
or next to smallest in every disease area. In contrast,
the effect size using the 15D is largest in every case,
except for hearing loss, while the HUI3 retains the
second highest average rank order. However, using
the correlation with a DSI as the criterion, the
average rank order of HUI3 falls to fourth and
AQoL-8D replaces 15D as having the highest average
ranking. SF-6D, which has the lowest average ranking
using both mean differences and the effect size
as the criterion, rises to have the third highest average
ranking.
A limitation of these observations is that they
document the problem of obtaining an unambiguous
ranking of instrument sensitivities, but they do not
fully identify the reason for the differences. The
change in apparent sensitivity that occurs when it is
measured by the effect size rather than by differences
in mean utilities is attributable to differences in the
standard deviations and the scales employed in the
assignment of utilities. As shown in Figure 1, ceiling
effects differ. Using the EQ-5D-5L, 41% of the healthy
public assigned a utility of 1.00 to their health state.
In contrast, the SF-6D and AQoL-8D assigned a utility
of 1.00 to 4.2% and 4.9% of the healthy public,
respectively.46 Frequency distributions in Figure 1
also vary with the severity of the health state. Near
full health, there is a high compression of 15D utili-
ties resulting in a low standard deviation and large
effect size. The converse is true for the SF-6D. The
present results do not indicate whether these differ-
ences are a result of health state descriptions, the
assignment of utilities to items, or the reliability of
the items, an investigation that would require a sepa-
rate analysis using a technique such as item response
theory. In sum, interpretation and comparison of
effect sizes are problematic, and as shown in Tables 4c
and 4d, the inclusion or exclusion of the effect size
significantly alters the indicative ranking of instruments.
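The dependence of the effect size on the spread of utilities can be illustrated with a short Python sketch. It assumes that the effect size is the mean difference divided by a pooled standard deviation (Cohen's d), which may differ from the exact definition used in the analysis, and the utilities are hypothetical values chosen so that both cases have the same mean difference of 0.10:

```python
from statistics import mean, stdev


def cohens_d(public, patients):
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(public), len(patients)
    pooled_var = (
        (n1 - 1) * stdev(public) ** 2 + (n2 - 1) * stdev(patients) ** 2
    ) / (n1 + n2 - 2)
    return (mean(public) - mean(patients)) / pooled_var ** 0.5


# Hypothetical utilities (illustrative only): identical mean gap, different spread.
compressed_public = [0.94, 0.95, 0.96]
compressed_patients = [0.84, 0.85, 0.86]
spread_public = [0.70, 0.80, 0.90]
spread_patients = [0.60, 0.70, 0.80]

d_compressed = cohens_d(compressed_public, compressed_patients)
d_spread = cohens_d(spread_public, spread_patients)
```

Here `d_compressed` is about ten times `d_spread` although the mean difference is identical, so an instrument whose utilities are compressed can appear more sensitive on this measure alone.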
As DSI are dedicated to particular diseases, they
might be seen as a gold standard for measuring sensi-
tivity. However, their use is also subject to a number
of caveats. First, DSI are typically administered only
to patients in a disease category (as occurred in the
present MIC survey). Consequently, their correlation
with a MAUI indicates sensitivity within the disease
area and not between average patient and full health.
With the varying frequency distributions of utilities,
these may differ. Second, there are a large number
of DSI, and none can claim to be a gold standard for
assessing utility. As in the MAUI literature, there is
an unresolved question concerning the appropriate
content of the instruments and the appropriate com-
position of the descriptive system. The MIC survey
included instruments that, from the literature and
from clinical advice, appeared most able to evaluate
patient QoL. However, the choice of instrument is
contestable.
The analysis of content and convergent validity
further demonstrated the complexity of comparisons
between MAUI. Evidence for construct validity
varies with the criterion and disease area. As judged
by the instruments in the MIC survey, the 15D scores
most highly with respect to physical content and the
AQoL-8D with respect to both psychosocial content
and preference scores. Tests were necessarily limited
to comparisons with a small number of criterion
instruments, and alternative tests are possible. How-
ever, as noted, validation is an ongoing process of
testing the properties and not a ‘‘one-shot’’ definitive
test.
The second objective of the article was to deter-
mine whether, given the ambiguities in the test
results, conclusions could be drawn with respect to
the preferred instrument in each disease area. This
raises the contentious issue of how to combine results
from different tests. However, different weightings of
the 3 broad sets of results are unlikely to significantly
alter the ranking of the overall index in Table 4.
Increasing the importance of physical content would
favor the 15D; increasing the weighting on psychoso-
cial or preference scores would favor AQoL-8D.
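Whether a particular reweighting changes a ranking can be checked directly. The sketch below is a hypothetical illustration of such a check: the criterion names, the two unnamed instruments, and all scores and weights are assumptions chosen to show the direction of the effect, not results from the study.

```python
def composite(scores, weights):
    """Weighted mean of each instrument's criterion scores."""
    total = sum(weights.values())
    return {
        inst: sum(weights[k] * vals[k] for k in weights) / total
        for inst, vals in scores.items()
    }


# Hypothetical normalized scores for two unnamed instruments (illustrative only).
scores = {
    "physical_leaning": {"sensitivity": 0.80, "physical": 0.95, "psychosocial": 0.70},
    "psychosocial_leaning": {"sensitivity": 0.80, "physical": 0.75, "psychosocial": 0.95},
}

equal = composite(scores, {"sensitivity": 1, "physical": 1, "psychosocial": 1})
physical_heavy = composite(scores, {"sensitivity": 1, "physical": 3, "psychosocial": 1})

best_equal = max(equal, key=equal.get)
best_physical_heavy = max(physical_heavy, key=physical_heavy.get)
```

Increasing the weight on physical content moves the composite toward the instrument with the stronger physical score, as the text describes. How far the weight must move before a ranking changes depends on the size of the gaps between instruments.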
The final summary scores in Table 4 do not purport
to identify the best instrument in all contexts: they
summarize a limited number of criteria combined as
described. In selecting the appropriate instrument
for use in a study, other considerations may also be
relevant. In particular, the cross-sectional MIC survey
does not allow the estimation of test-retest reliability
scores for instruments as was done, for example, in the
5-instrument comparison by Palta and others.21 Researchers
may also focus on a particular aspect of health
because of, for example, the presence or absence of particular
comorbidities in their patient population. This
could lead to the preference for an instrument with
a disproportionate focus on physical or psychosocial
health or a property or content omitted from the pres-
ent study.
Caveats also arise from the data used in the analy-
sis. A web-based survey has less control over the
quality of the data at the point of collection. However,
the stringent edit procedures described earlier
resulted in a data set with a high degree of internal
consistency. The correlation between instruments
was generally higher than found in other studies,
which increases confidence in the integrity of
the data. The respondents were nevertheless self-
selected members of a survey panel. But the analysis
in the present article did not require a fully represen-
tative sample of the general or patient populations as
its purpose was the comparison of the properties of
instruments. This required a comparison of results
from the same—not representative—patients and
members of the public.
A further caveat is that the same MAUI algorithms
were used with all individuals irrespective of
their country of origin. Ideally, utility weights should
be derived from representatives with similar
preferences (implying not simply a common nation-
ality but similar socioeconomic and health-related
characteristics). In practice, it is unlikely that the
present results would be affected significantly by
the use of a nationally derived algorithm. Analysis
of MIC data has demonstrated that the differences
between MAUI utilities in the study are overwhelm-
ingly attributable to instrument content and scale
effects. The residual role of the preferences of those
sampled explains less than 4% of the variation in
preferences.25 Even if this figure were larger, there
is no clear way in which differing preferences would
explain the pattern of results reported here, which are
more readily explained by the varying MAUI scale
effects shown in Figure 1 and the dissimilar descrip-
tive systems summarized in Table 1.
CONCLUSIONS
Irrespective of the ambiguities arising from differ-
ing tests, researchers using CUA must select an
instrument for measuring utility. However, the
choice is problematic as there is no simple gold stan-
dard for evaluating instruments. This article has
attempted to demonstrate a number of these problems
but also to report results that may assist with the selection
of an MAUI.
Results indicate that each of the measures of sensi-
tivity considered here is subject to a bias that is
difficult to quantify. Tests of both the content and
convergent validity of the MAUI give differing
results. The final choice of instrument therefore
requires consideration of several sources of imperfect
evidence. Despite these problems, some conclusions
are well supported by the results. The optimal instru-
ment varies with the disease area: HUI3 achieves the
highest average scores in the context of hearing loss,
the AQoL-8D in the context of psychosocial prob-
lems, and the 15D in the context of physical prob-
lems. Overall, the evidence is favorable to the more
widespread use of the 15D and AQoL-8D and the
more limited use of the EQ-5D-5L.
ACKNOWLEDGMENTS
The authors would like to thank the reviewers for helpful
suggestions.
REFERENCES
1. Wisloff T, Hagen G, Hamidi V, Movik E, Klemp M, Olsen JA.
Estimating QALY gains in applied studies: a review of the
cost-utility analyses published in 2010. Pharmacoeconomics.
2014;32(4):367–75.
2. Neumann PJ, Thorat T, Shi J, Saret CJ, Cohen JT. The changing
face of the cost-utility literature, 1990-2012. Value Health. 2015;
18(2):271–7.
3. Richardson J, McKie J, Bariola E. Multi attribute utility instru-
ments and their use. In: Culyer AJ, ed. Encyclopedia of Health Eco-
nomics. San Diego: Elsevier Science; 2014. p 341–57.
4. Cherepanov D, Palta M, Fryback DG. Underlying dimensions of
the five health-related quality of life measures used in utility
assessment. Med Care. 2010;48(8):718–25.
5. Fryback DG, Palta M, Cherepanov D, Bolt D, Kim J. Comparison
of 5 health related quality of life indexes using item response the-
ory analysis. Med Decis Making. 2010;30(1):5–15.
6. Moock J, Kohlmann T. Comparing preference-based quality-of-life
measures: results from rehabilitation patients with musculoskeletal,
cardiovascular, or psychosomatic disorders. Qual Life
Res. 2008;17:485–95.
7. McDonough CM, Grove MR, Tosteson TD, Lurie JD, Hilibrand
AS. Comparison of EQ-5D, HUI, and SF-36-derived societal health
state values among Spine Patient Outcomes Research Trial
(SPORT) participants. Qual Life Res. 2005;14:1321–32.
8. Feeny D, Spritzer K, Hays R, et al. Agreement about identifying
patients who change over time: cautionary results in cataract and
heart failure patients. Med Decis Making. 2012;32(2):273–86.
9. Khanna D, Maranian P, Palta M, et al. Health-related quality of
life in adults reporting arthritis: analysis from the National Health
Management Study. Qual Life Res. 2011;20(7):1131–40.
10. Lillegraven S, Kristiansen S, Kvien T. Comparison of utility
measures and their relationship with other health status measures
in 1041 patients with rheumatoid arthritis. Ann Rheum Dis. 2010;
69(10):1762–7.
11. Sorensen J, Linde L, Ostergaard M, Hetland M. Quality-ad-
justed life expectancies in patients with rheumatoid arthritis: com-
parison of index scores from EQ-5D, 15D, and SF-6D. Value Health.
2012;15(2):334–9.
12. Kontodimopoulos N, Pappa E, Chadjiapostolou Z, Arvanitaki
E, Papadopoulos A, Niakas D. Comparing the sensitivity of EQ-5D,
SF-6D and 15D utilities to the specific effect of diabetic complica-
tions. Eur J Health Econ. 2012;13(1):111–20.
13. Kuspinar A, Mayo N. Do generic utility measures capture what
is important to the quality of life of people with multiple sclerosis?
Health Qual Life Outcomes. 2013;11:71.
14. Kuspinar A, Mayo N. A review of the psychometric properties
of generic utility measures in multiple sclerosis. Pharmacoeconomics.
2014;32(8):759–73.
15. Teckle P, Peacock S, McTaggart-Cowan H, et al. The ability of
cancer-specific and generic preference-based instruments to dis-
criminate across clinical and self-reported measures of cancer
severities. Health Qual Life Outcomes. 2011;9:106.
16. McDonough CM, Tosteson TD, Tosteson A, Jette A, Grove MR,
Weinstein J. A longitudinal comparison of 5 preference-weighted
health state classification systems in persons with intervertebral
disk herniation. Med Decis Making. 2011;31(2):270–80.
17. Tosh J, Brazier J, Evans P, Longworth L. A review of generic
preference based measures of health related quality of life in visual
disorders. Value Health. 2012;15:118–27.
18. Groessl RJ, Liu L, Sklar M, Tally S, Kaplan R, Ganiats TG. Mea-
suring the impact of cataract surgery on generic and vision-specific
quality of life. Qual Life Res. 2013;22:1405–14.
19. Yang Y, Longworth L, Brazier J. An assessment of validity and
responsiveness of generic measures of health-related quality of life
in hearing impairment. Qual Life Res. 2013;22(10):2813–28.
20. Pinto AM, Kupperman M, Nakagawa S, et al. Comparison and
correlates of three preference-based health-related quality of life
measures among overweight and obese women with urinary incon-
tinence. Qual Life Res. 2011;20(10):1655–62.
21. Palta M, Chen HY, Kaplan R, Feeny D, Cherepanov D, Fryback
DG. Standard error of measurement of 5 health utility indexes
across the range of health for use in estimating reliability and
responsiveness. Med Decis Making. 2011;31(2):260–9.
22. Drummond M, Brixner D, Gold M, Kind P, McGuire A, Nord E.
Toward a consensus on the QALY. Value Health. 2009;12(1):S31–5.
23. Streiner D, Norman GR. Health Measurement Scales: A Practi-
cal Guide to Their Development and Use. Oxford: Oxford Univer-
sity Press; 2003.
24. Cronbach J, Meehl P. Construct validity in psychological tests.
Psychol Bull. 1955;52:281–302.
25. Richardson J, Iezzi A, Khan MA. Why do multi attribute utility
instruments produce different utilities: the relative importance of
the descriptive systems, scale and ‘micro utility’ effects. Qual
Life Res. 2015;24(8):2045–53.
26. Richardson J, Iezzi A, Khan MA. The Self Time t (TTO) Instru-
ment: Reliability and Survey Results. Research Paper 86. Mel-
bourne (Australia): Centre for Health Economics, Monash
University; 2014.
27. Meenan RF, Gertman PM, Mason JH. Measuring health status
in arthritis: the Arthritis Impact Measurement Scales. Arthritis
Rheum. 1980;23:146–52.
28. Ware J, Sherbourne D. The MOS 36-Item Short-Form Health
Survey (SF-36). I. Conceptual framework and item selection.
Med Care. 1992;30(6):473–83.
29. Marks GB, Dunn SM, Woolcock AJ. A scale for the measure-
ment of quality of life in adults with asthma. J Clin Epidemiol.
2006;45(5):461–72.
30. Al-Janabi H, Flynn T, Coast J. Development of a self-report
measure of capability wellbeing for adults: the ICECAP-A. Qual
Life Res. 2012;21:167–76.
31. Rabin R, Oemar M, Oppe M, Janssen B, Herdman M. EQ-5D-5L
User Guide: Basic Information on How to Use the EQ-5D-5L Instru-
ment. Rotterdam (the Netherlands): EuroQoL Group. Available
from: URL: http://www.euroqol.org/about-eq-5d/publications/
user-guide.html. Accessed 6 July 2015.
32. van Hout B, Janssen MF, Feng Y, et al. Interim scoring for the
EQ-5D-5L: mapping the EQ-5D-5L to EQ-5D-3L value sets. Value
Health. 2012;15:708–15.
33. Aaronson NK, Ahmedzai S, Bergman B, et al. The European
Organization for Research and Treatment of Cancer QLQ-C30:
a quality-of-life instrument for use in international clinical trials
in oncology. J National Cancer Inst. 1993;85(5):365–76.
34. Hicks S. Case study: ONS measuring subjective wellbeing: The
UK Office for National Statistics Experience. In: Helliwell J, Layard
R, Sachs J, eds. World Happiness Report 2012. New York: The Earth
Institute, Columbia University; 2012. Available from:
http://www.earthinstitute.columbia.edu/articles/view/2960. Accessed 30
April 2012.
35. Brazier J, Roberts J, Deverill M. The estimation of a preference-
based measure of health from the SF-36. J Health Econ. 2002;21:
271–92.
36. Lovibond SH, Lovibond PF. Manual for the Depression Anxi-
ety Stress Scales. 2nd ed. Sydney: Psychology Foundation; 1995.
37. Richardson J, Khan MA, Iezzi A, Maxwell A. Cross-National
Comparison of Twelve Quality of Life Instruments: MIC Paper 1:
Background, Questions, Instruments. Research Paper 76.
Melbourne (Australia): Centre for Health Economics, Monash University;
2012. Available from: URL: http://www.buseco.monash.edu.au/centres/che/pubs/researchpaper76.pdf.
Accessed 29 July
2013.
38. Feeny D, Furlong W, Torrance G, et al. Multi attribute and sin-
gle attribute utility functions for the Health Utilities Index Mark 3
System. Med Care. 2002;40(2):113–28.
39. Kessler RC, Barker PR, Colpe LJ, et al. Screening for serious
mental illness in the general population. Arch Gen Psychiatry.
2003;60(2):184–89.
40. Kaplan R, Bush J, Berry C. Health status: types of validity and
the index of wellbeing. Health Serv Res. 1976;11(4):478–507.
41. Boyer JG, Earp JAL. The development of an instrument for
assessing the quality of life of people with diabetes. Med Care.
1997;35(5):440–53.
42. Sintonen H, Pekurinen M. A fifteen-dimensional measure of
health related quality of life (15D) and its applications. In: Walker
S, Rosser R, eds. Quality of Life Assessment. Dordrecht (the Neth-
erlands): Kluwer Academic; 1993.
43. Cox RM, Alexander GC. The abbreviated profile of hearing aid
benefit. Ear Hear. 1995;19(2):176–86.
44. Richardson J, Sinha K, Iezzi A, Khan MA. Modelling utility
weights for the Assessment of Quality of Life (AQoL) 8D. Qual
Life Res. 2014;23(8):2395–404.
45. Höfer S, Lim L, Guyatt G, Oldridge N. The MacNew Heart Disease
health-related quality of life instrument: a summary. Health
Qual Life Outcomes. 2004;2(3):1–8.
46. Richardson J, Khan MA, Iezzi A, Maxwell A. Comparing and
explaining differences in the content, sensitivity and magnitude
of incremental utilities predicted by the EQ-5D, SF-6D, HUI 3,
15D, QWB and AQoL-8D multi attribute utility instruments.
Med Decis Making. 2015;35(3):276–91.
47. AQoL. Assessment of Quality of Life (AQoL). 2015. Available
from: URL: http://www.aqol.com.au. Cited 28 May 2015.