Items from patient-oriented instruments can be integrated into interval
scales to operationalize categories of the International Classification of
Functioning, Disability and Health
Alarcos Cieza (a,b), Roger Hilfiker (b), Annelies Boonen (c), Somnath Chatterji (d), Nenad Kostanjsek (e),
Bedirhan T. Üstün (e), Gerold Stucki (a,b,f,*)
(a) ICF Research Branch of the WHO Collaborating Center for the Family of International Classifications at the German Institute of Medical Documentation and Information (DIMDI), IHRS, Ludwig-Maximilian University, Munich, Germany
(b) Human Functioning Sciences Division, Swiss Paraplegic Research, Nottwil, Switzerland
(c) Division of Rheumatology, Department of Internal Medicine, University Hospital Maastricht, Maastricht, The Netherlands
(d) Department of Measurement and Health Information Systems, World Health Organization, Switzerland
(e) Classification, Terminology and Standards Team, World Health Organization, Switzerland
(f) Department of Physical Medicine and Rehabilitation, University Hospital Munich, Ludwig-Maximilian University, Munich, Germany
Accepted 28 April 2008
Objective: To exemplify the construction of interval scales for specified categories of the International Classification of Functioning,
Disability and Health (ICF) by integrating items from a variety of patient-oriented instruments.
Study Design and Setting: Psychometric study using data from a convenience sample of 122 patients with rheumatoid arthritis. Pa-
tients completed six different patient-oriented instruments. The contents of the instrument items were linked to the ICF. Rasch analyses for
ordered-response options were used to examine whether the instrument items addressing the ICF category b130: Energy and drive functions
constitute a psychometrically sound interval scale.
Results: Nineteen items were linked to b130: Energy and drive functions. Sixteen of the 19 items fit the Rasch model according to the
chi-square (χ²) statistic (χ²(df=32) = 38.25, P = 0.21) and the Z-fit statistic (Z mean = 0.451, Z SD = 1.085 and Z mean = −0.223, Z SD = 1.132 for
items and persons, respectively). The Person Separation Index rβ was 0.93.
Conclusion: ICF category interval scales that operationalize single ICF categories can be constructed. The original format of the
items included in the interval scales remains unchanged. This study represents a step forward in the operationalization and future
implementation of the ICF. © 2009 Elsevier Inc. All rights reserved.
Keywords: Outcome; assessment; Health; Psychometrics; Classification; Questionnaires; Fatigue
Functioning and disability are universal human
experiences [1,2] that are at the core of medicine and public
health. They are also of essential relevance in sectors
such as labor, education, and social affairs.
In medicine, the management of limitations in function-
ing complements medical and surgical care throughout the
service continuum, from the acute to the community health
care situation. Improving or maintaining functioning, or
preventing disability, is becoming one of the most
urgent outcomes in public health. In the labor, education,
and social affairs sectors, the planning and implementation
of preventive actions are only viable if the needs of
people experiencing, or likely to experience, disability are
considered. Accordingly, concepts, classifications, and
measures of functioning and disability are of great interest
and importance across professional disciplines and sectors.
With the International Classification of Functioning,
Disability and Health (ICF), the WHO, for the first time,
provides a universal and globally accepted framework and
classification to describe the full range of human functioning
and disability that may be affected by a health condition.
The ICF model identifies three components of the dimension
functioning, namely body functions and structures, activities,
and participation. Problems in these components are called
impairments, activity limitations, and participation restrictions,
respectively, and are summarized under the umbrella term
disability. Dimensions of functioning and of disability are both
understood as the result of an interaction between a health
condition and contextual factors (environmental and personal).

* Corresponding author. Department of Physical Medicine and
Rehabilitation, University Hospital Munich, Ludwig-Maximilian University,
Marchioninistr. 15, 81377 Munich, Germany. Tel.: +49-89-7095-4050;
E-mail address: Gerold.email@example.com (G. Stucki).
0895-4356/09/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
Journal of Clinical Epidemiology 62 (2009) 912-921
The components of body functions and structures, activ-
ities and participation, and environmental factors (a list of
personal factors awaits further research and development)
are classified based on the ICF categories. The ICF contains
a total of 1,424 categories that are mutually exclusive and
organized within a hierarchically nested structure with up
to four different levels. The ICF categories are denoted
by unique alphanumeric codes with which it is possible
to classify functioning and disability, both on the
individual and population levels.
Because the ICF categories are always accompanied by
a short definition and inclusions and exclusions, as appro-
priate, the information on the aspects of functioning can
be reported unambiguously and compared based on ICF
categories [10,11]. Examples of ICF categories, with their
definitions, inclusions, and exclusions, can be found in
Table A1 (available on the journal’s website at
www.elsevier.com). An example of the hierarchically nested
structure is presented in the following:
‘‘b1 Mental functions’’ (first/chapter level)
‘‘b130 Energy and drive functions’’ (second level)
‘‘b1301 Motivation’’ (third level)
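The nesting can be read directly off the codes: after the component letter (b, s, d, or e), one digit marks the chapter (first) level, three digits the second level, four the third, and five the fourth. A minimal Python sketch under that assumption; the helper names `icf_level` and `ancestors` are hypothetical and not part of any ICF tooling:

```python
import re
from typing import List

# Component letter (b, s, d, e) followed by the digits of the category code
ICF_PATTERN = re.compile(r"^([bsde])(\d{1,5})$")
# Number of digits -> hierarchical level (1 = chapter level)
DIGITS_TO_LEVEL = {1: 1, 3: 2, 4: 3, 5: 4}

def icf_level(code: str) -> int:
    """Hierarchical level of a well-formed ICF category code."""
    match = ICF_PATTERN.match(code)
    if match is None or len(match.group(2)) not in DIGITS_TO_LEVEL:
        raise ValueError(f"not a well-formed ICF code: {code!r}")
    return DIGITS_TO_LEVEL[len(match.group(2))]

def ancestors(code: str) -> List[str]:
    """Broader categories above a code, e.g. 'b1301' -> ['b1', 'b130']."""
    digits = code[1:]
    return [code[0] + digits[:n] for n in (1, 3, 4) if n < len(digits)]

for code in ("b1", "b130", "b1301"):
    print(code, icf_level(code), ancestors(code))
```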
In principle, there are two approaches to measure a spec-
ified ICF category, that is, to quantify the extent of variation
therein. The first is to use the so-called ICF qualifier as an
expert rating scale ranging from 0 to 4 (Table A2 [available
on the journal’s website at www.elsevier.com]). With this
approach, impairments, activity limitations, participation
restrictions, and contextual factors are directly rated
according to established coding guidelines. However,
as with any rating scale, the expert can draw on whatever
sources of information are available.
The second approach is to use information obtained with
a clinical test that includes standardized expert and techni-
cal examinations, or a patient-oriented instrument that
includes patient- and proxy-reported, self-administered, or
interview-administered questionnaires, and to transform
this information into the ICF qualifier. In a first step, a clin-
ical test or patient-oriented instrument is linked to the ICF
based on established linking rules. In a second step, the
scores obtained with a clinical test or a patient-oriented in-
strument are transformed to the ICF qualifier.
This second approach has the advantage that information
already available can be transformed into the standard lan-
guage of the ICF, to be understood by all interested profes-
sionals irrespective of their disciplines or the sectors (e.g.,
health, labor, or education) in which they are involved.
The transformation to the ICF qualifier is straightfor-
ward when interval-scaled clinical tests or patient-oriented
instruments, which comprehensively and uniquely cover
the content of a respective ICF category, are readily
available. For example, the visual analog scale (VAS) for
the assessment of pain can be linked, in a first step, to
the ICF category b280: Sensation of pain. In a second step,
the values of the VAS-Pain can be transformed into an ICF
qualifier in a straightforward manner, because it represents
a 100-mm interval scale marked as ‘‘no pain’’ at one end
and as ‘‘worst pain’’ at the other. Considering the
percentage values of the ICF qualifier of Table A2 (available
on the journal’s website at www.elsevier.com), a person
marking a level of pain between 0 and 4 mm would receive
qualifier 0 in the ICF category b280: Sensation of pain;
between 5 and 24 mm, qualifier 1; between 25 and
49 mm, qualifier 2; between 50 and 95 mm, qualifier 3;
and between 96 and 100 mm, qualifier 4.
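The VAS-Pain cut-offs above amount to a step function over the 100-mm line. A minimal sketch, assuming readings recorded in whole millimetres; `vas_to_qualifier` is a hypothetical name:

```python
def vas_to_qualifier(mm: int) -> int:
    """ICF qualifier (0-4) for a VAS-Pain reading in whole millimetres,
    using the bands 0-4, 5-24, 25-49, 50-95 and 96-100 mm."""
    if not 0 <= mm <= 100:
        raise ValueError("VAS readings must lie between 0 and 100 mm")
    lower_bounds = (5, 25, 50, 96)  # first millimetre of qualifiers 1, 2, 3, 4
    return sum(mm >= bound for bound in lower_bounds)

print([vas_to_qualifier(v) for v in (0, 4, 5, 30, 95, 96)])  # [0, 0, 1, 2, 3, 4]
```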
If there are no readily available clinical tests or patient-
oriented instruments, a third approach can be developed.
One may consider using parts of clinical test batteries or
selected items of patient-oriented instruments that cover
a specified ICF category. Thus, an ICF category interval
scale can be constructed to serve as an interface between
the clinical test or patient-oriented instrument and the
ICF qualifier.
Established linkage rules [13,15] can be used to identify
suitable parts of clinical tests and items of patient-oriented
instruments. Rasch or Item Response Theory (IRT) models
can be applied to construct interval scales. However,
the complete process of how to develop ICF category interval
scales and how they can serve as an interface has not
been described to date.
The objective of this article was to exemplify the con-
struction of interval scales for specified ICF categories by
integrating items from a variety of widely used patient-
oriented instruments, which were filled in by a convenience
sample of rheumatoid arthritis (RA) patients. The specific
aims are to:
1. identify candidate items from a range of patient-
oriented instruments that address specific ICF categories;
and
2. estimate the extent to which selected items that
address the ICF category ‘‘Energy and drive functions’’
form a unidimensional, ordered interval scale.
2. Materials and methods
2.1. Study design
The psychometric study used data from a convenience
sample of patients with RA that was collected in a cross-
sectional study conducted at the University Hospital Maas-
tricht, The Netherlands. Patients were asked to fill in a num-
ber of patient-oriented instruments for reasons other than
this psychometric study.
The study protocol and informed consent forms of the
cross-sectional study were approved by the Ethics Committee
of the University Hospital in Maastricht. Inclusion criteria for
patients were: a diagnosis of RA according to the revised
American College of Rheumatology (ACR) criteria, at least
18 years of age, sufficient knowledge of the Dutch language,
and signed, informed consent.
Patients completed the following patient-oriented
instruments: the Rheumatoid Arthritis Quality of Life (RAQoL)
Questionnaire, the Health Assessment Questionnaire
(HAQ), the Medical Outcomes Study Short Form-36
(SF-36), the European Quality of Life Instrument
(EQ-5D), the Multidimensional Fatigue Inventory
(MFI), and the Center for Epidemiological Studies
Depression Scale (CES-D).
2.3. Identification of candidate items
To identify the candidate items for integration into an
ICF category interval scale, the contents of the items from
the instruments as completed by patients were indepen-
dently linked to the ICF by two health professionals using
established linkage rules [13,15]. According to these rules,
all concepts contained in an item will be identified and
linked to the most precise ICF categories. For example,
in the item ‘‘It’s too much effort to go out and see people’’
of the RAQoL, two different concepts were identified,
namely ‘‘it’s too much effort’’ and ‘‘go out and see people.’’
They were linked to the ICF categories b1301: Motivation,
defined as ‘‘mental functions that produce the incentive to
act; the conscious or unconscious driving force of action’’
and to d9205: Socializing, defined as ‘‘engaging in informal
or casual gatherings with others, such as visiting friends or
relatives or meeting informally in public places.’’ The
SF-36 and the EQ-5D had already been linked in another
study. However, they were linked again for this investigation
by the two health professionals who linked all other patient-
oriented instruments. Where disagreement regarding the
identified ICF categories arose, a third person (A.C.) made
an informed decision.
To evaluate the reliability of the linkage process, the per-
centage of observed agreement was calculated based on the
two independent linkage versions of the patient-oriented
instruments. In addition, kappa coefficients and
nonparametric bootstrapped confidence intervals (CIs) [25,26]
were calculated to investigate whether the agreement was
greater than might occur by chance. The reliability of
the linkage process was studied at all levels of
the ICF, that is, the component, first, second, and third levels.
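To illustrate the agreement statistics, the sketch below computes Cohen's kappa for two linkers' category assignments and a percentile bootstrap CI obtained by resampling the linked concepts with replacement. The percentile method, the number of resamples, the seed, and the example codes are assumptions for illustration only; the SAS procedure used in the study may differ.

```python
import random
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' paired nominal codes."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: products of marginal counts, summed over categories
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined, and this sketch reports perfect agreement by convention
        return 1.0
    return (observed - expected) / (1 - expected)

def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=2008):
    """Percentile bootstrap CI for kappa, resampling paired codes."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical second-level codes assigned by two linkers to eight concepts
rater1 = ["b130", "b130", "d920", "b280", "b130", "d450", "b130", "d920"]
rater2 = ["b130", "b130", "d920", "b130", "b130", "d450", "b280", "d920"]
print(round(cohen_kappa(rater1, rater2), 3))  # 0.619
print(bootstrap_ci(rater1, rater2))
```

Agreement at coarser ICF levels can be obtained by truncating the codes before calling the same functions.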
The number of items linked to the different second-level
ICF categories was counted; when third- or fourth-level
categories had been selected in the linking process, the
corresponding second-level category was considered. The
second-level ICF category to which most instrument items
could be linked was selected as an exemplary ICF category
to demonstrate the construction of an ICF category interval
scale.
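The roll-up to second-level categories and the item count can be sketched as follows. Only the RAQoL entry reproduces a linkage from the example in Section 2.3; the other items and codes are hypothetical. A set per item ensures that an item linked to several subcategories of the same second-level category is counted once.

```python
from collections import Counter

def second_level(code: str) -> str:
    """Truncate a second-, third- or fourth-level ICF code to its
    second-level category (component letter plus three digits)."""
    if len(code) < 4:
        raise ValueError(f"{code!r} is a chapter-level code")
    return code[:4]

# Item -> ICF codes assigned during linking (entries after the first are hypothetical)
linked = {
    "RAQoL: It's too much effort to go out and see people": ["b1301", "d9205"],
    "item A (hypothetical)": ["b1300", "b1301"],
    "item B (hypothetical)": ["b130"],
    "item C (hypothetical)": ["b280"],
}

counts = Counter(cat for codes in linked.values()
                 for cat in {second_level(c) for c in codes})
print(counts.most_common(1))  # [('b130', 3)]
```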
2.4. Validation of candidate items
The validation of candidate items for the selected ICF
category as identified in the content linkage was based on
the Rasch model for ordered-response options, which is
an extension of the original dichotomous model to account
for ordinal scale-type data. The candidate
items were expected to fulfill the criteria for unidimension-
ality according to the Rasch model, which is described in
detail in the following section.
2.5. Construction of an International Classification of
Functioning, Disability and Health category interval scale
The integration of the candidate items into an interval
scale for the selected ICF category was also based on the
Rasch model for ordered-response options. The Rasch
model implies that the parameters of a person’s ability
and an item’s difficulty are both placed along one and the
same single dimension, which indicates the latent trait to
be measured. The units of this dimension, as defined by
the model, are logits (the natural log odds of success vs.
failure), which make up an equidistant scale.
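For reference, the Rasch model for ordered-response options gives each response option a probability proportional to the exponentiated cumulative sum of (θ − δ − τ_k), where θ is the person location, δ the item location, and τ_k the thresholds, all in logits. A minimal sketch in the partial credit form; this parameterization is an assumption, since the one used internally by RUMM2020 is not described here.

```python
import math

def category_probabilities(theta, delta, taus):
    """P(X = 0..m) for one item under the Rasch model for ordered responses.

    theta: person location, delta: item location, taus: thresholds relative
    to delta, all in logits; len(taus) = number of response options - 1.
    """
    # Log-numerators: 0 for x = 0, then cumulative sums of (theta - delta - tau_k)
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)  # shift before exponentiating, for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# At theta = delta + tau_1 the two lowest options are equally likely,
# which is how a threshold is defined
probs = category_probabilities(theta=0.5, delta=0.0, taus=[0.5, 1.5])
print([round(p, 3) for p in probs])
```

With a single threshold of 0, the function reduces to the dichotomous Rasch model.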
The following properties were studied: unidimensional-
ity and reliability of the scale, item fit, functioning of the
response options of the single items, and the targeting
between the items and the persons’ abilities.
The unidimensionality of the suitable items was checked
by the item-trait interaction chi-square (χ²) and the
Z-fit statistics. A significant χ² probability of less than
0.05 indicates misfit of the scale to the model.
Reliability was studied with the Person Separation Index
rβ, which is analogous to the traditional test theory indices
Kuder-Richardson Formula 21 or Cronbach’s alpha and
ranges between 0 and 1, where a value of 1 indicates
perfect reproducibility of person placements [27,30]. The
number of distinct ability strata Hp that can be reliably
identified by the scale scores was calculated by the formula
$$H_p = \frac{4G_p + 1}{3},$$
where $G_p$ is a measure of the sample standard deviation
expressed in standard error units.
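If rβ is treated as a separation reliability R, the separation ratio follows from G_p = sqrt(R/(1 − R)), and the strata formula can then be evaluated directly. This mapping from rβ to G_p is an assumption of the sketch; with rβ = 0.93 it gives about 5.2, in line with the five strata reported in the Results.

```python
import math

def separation_from_reliability(r: float) -> float:
    """Person separation ratio G_p implied by a reliability r (assumed relation)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(r / (1.0 - r))

def strata(r: float) -> float:
    """Number of distinct ability strata, H_p = (4 * G_p + 1) / 3."""
    return (4.0 * separation_from_reliability(r) + 1.0) / 3.0

print(round(strata(0.93), 2))  # 5.19
```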
The fit of the individual items was also tested
based on χ² statistics. Additionally, the standardized fit
residual values (z) were considered. They should range
between −2.5 and +2.5 to indicate model fit. These
values provide information about the direction of the
deviation of the observed item data from the model.
The functioning of the response options was studied based
on the threshold estimates for each item. The
threshold corresponds to the location on the latent continuum at
which it is equally likely that a person will be classified into
adjacent response options, and therefore, to obtain one of
two successive scores. The number of thresholds equals the
number of response options minus one, and they should have
increasing values, because disordered threshold estimates in-
dicate a failure to construct a measure in which successive
scores reflect increasing levels of the dimension being mea-
sured [33,34]. When threshold values are disordered, the re-
sponse options are collapsed, that is, the item is rescaled,
taking into consideration frequency distributions and their
probability curves. After rescaling the item, the thresholds
should display the intended increasing order.
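A disordered set of threshold estimates can be flagged, and an item rescored, with two small helpers. The rescoring map below is the collapsing strategy listed in Table 3 for ‘‘Did you feel tired?’’ (SF-36); the threshold values are hypothetical, and the helper names are illustrative only.

```python
def thresholds_ordered(thresholds):
    """True if successive threshold estimates strictly increase."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def collapse(responses, rescoring):
    """Rescore raw responses with a mapping {original option: new option}."""
    return [rescoring[r] for r in responses]

# Collapsing strategy listed in Table 3 for "Did you feel tired?" (SF-36):
# original options 1-6 are rescored 3, 2, 2, 2, 2, 1
SF36_TIRED = {1: 3, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1}

print(thresholds_ordered([-1.2, 0.4, 0.1]))   # False (hypothetical estimates)
print(collapse([1, 4, 6, 2], SF36_TIRED))     # [3, 2, 1, 2]
```

After rescoring, the model would be recalibrated and the new thresholds checked again with `thresholds_ordered`.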
Misfit of items and disordered response-option thresholds
call into question the validity of all further conclusions.
The data were therefore iteratively revised, the model
recalibrated, and unidimensionality and item fit checked
until the item-trait interaction χ² no longer showed a
significant result. We revised the data in two steps: by
collapsing response options and by stepwise deletion of
items with the smallest χ² probabilities.
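The stepwise deletion step can be sketched as a loop around a fit routine. The `fit` callback and the toy fit function are hypothetical stand-ins: in practice, the overall and item-level χ² probabilities would come from Rasch software such as RUMM2020. The sketch covers only item deletion, not the collapsing of response options.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback type: given an item set, return the overall item-trait
# chi-square probability and a per-item chi-square probability.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]

def stepwise_delete(items: List[str], fit: FitFn, alpha: float = 0.05):
    """Drop the item with the smallest chi-square probability, refit,
    and repeat until the overall chi-square is no longer significant."""
    kept = list(items)
    removed: List[str] = []
    overall_p, item_p = fit(kept)
    while overall_p < alpha and len(kept) > 1:
        worst = min(kept, key=lambda item: item_p[item])
        kept.remove(worst)
        removed.append(worst)
        overall_p, item_p = fit(kept)  # recalibrate on the reduced item set
    return kept, removed

# Toy stand-in for a Rasch fit: items "X" and "Y" misfit, the rest fit well
def toy_fit(items):
    misfit = {"X": 0.001, "Y": 0.01}
    overall = 0.01 if any(i in misfit for i in items) else 0.21
    return overall, {i: misfit.get(i, 0.5) for i in items}

print(stepwise_delete(["A", "X", "B", "Y"], toy_fit))  # (['A', 'B'], ['X', 'Y'])
```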
The targeting of the items was studied by examining the
respective distribution of persons’ abilities and items’ diffi-
culties along the latent-trait continuum. Comparison of the
mean person location with the mean item location, which is
set to 0 by definition, indicates domain targeting. The
smaller the difference, the better the targeting.
The logit scale with the arbitrary mean of 0 was
transformed to a more meaningful scale, which we called the
ICF category interval scale, ranging from 0 to 100, using
the following formula:
$$y = m + s\,x,$$
where $x$ is a location on the logit scale, $y$ is the corresponding
value on the ICF category interval scale,
$s = (\text{desired range})/(\text{current range})$, and
$m = (\text{lowest desired value}) - (\text{current lowest value} \times s)$. The
thresholds of the response options of each item were also
transformed.
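The linear transformation is a one-line function. The logit range used in the usage line is hypothetical; the actual current range depends on the calibrated scale.

```python
def rescale(x, current_min, current_max, desired_min=0.0, desired_max=100.0):
    """Linear map y = m + s * x from a logit range onto [desired_min, desired_max]."""
    s = (desired_max - desired_min) / (current_max - current_min)
    m = desired_min - current_min * s
    return m + s * x

# Hypothetical logit range of -5 to +5 (not the study's values)
print(rescale(-5.0, -5.0, 5.0), rescale(0.0, -5.0, 5.0), rescale(5.0, -5.0, 5.0))
```

By construction, the lowest current value maps to the lowest desired value, and the same function is applied to the item thresholds.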
To illustrate how the patients’ scores on the ICF category
interval scale can be used to estimate the value on the ICF
qualifier (from 0 to 4), the response options of the ICF
qualifier were located on the ICF category interval scale:
qualifier 0 corresponds to a score of up to 5 on the ICF category
interval scale; qualifier 1, up to 25; qualifier 2, up to 50;
qualifier 3, up to 95; and qualifier 4, up to 100.
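Assigning a qualifier from an ICF category interval score is a lookup on four cut-points. A minimal sketch, assuming that a score falling exactly on a cut-point (e.g., 25) belongs to the lower qualifier, as ‘‘up to’’ suggests; `score_to_qualifier` is a hypothetical name.

```python
from bisect import bisect_left

# Upper limits of qualifiers 0, 1, 2 and 3 on the 0-100 ICF category interval scale;
# scores above 95 receive qualifier 4.
UPPER_LIMITS = (5, 25, 50, 95)

def score_to_qualifier(score: float) -> int:
    if not 0 <= score <= 100:
        raise ValueError("scores must lie between 0 and 100")
    # bisect_left counts the limits strictly below the score, so a score that
    # falls exactly on a limit stays in the lower qualifier ("up to")
    return bisect_left(UPPER_LIMITS, score)

print(score_to_qualifier(29))  # 2
```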
2.6. Statistical programs
Descriptive statistics were computed with SPSS Version
14.0 (SPSS Inc, Chicago, IL). The kappa analyses
were performed with SAS 9.0 (SAS Institute Inc, Cary, NC).
Rasch analyses were conducted using RUMM2020.

3. Results

3.1. Study population
The demographic and RA-related characteristics of the
convenience sample of 122 patients are shown in Table 1.
3.2. Identification of candidate items
Table 2 shows the results of the evaluation of the linkage
procedure as the percentage of observed agreement, kappa
statistics, and bootstrapped CIs for all different levels of the
ICF. None of the 95% CIs encloses 0, indicating that the
agreement exceeded chance.
The second-level ICF category b130: Energy and drive
functions was the ICF category to which the largest number
of items were linked, that is, a total of 19 items. It was,
therefore, the ICF category selected to exemplify the con-
struction of an ICF category interval scale. The definition
of b130: Energy and drive functions is presented in Table
A1 (available on the journal’s website at www.elsevier.com).
The 19 items, with their corresponding response
options, are presented in Table 3.
The notion reflected in the ICF category b130: Energy
and drive functions, according to the ICF, is a neutral
notion. However, based on the selected items and their
corresponding response options, as presented in Table 3, the
addressed notion is ‘‘impairment in energy and drive
functions,’’ which is also the concept denoted by the ICF
qualifier, namely the extent or magnitude of the impairment (see
also Table A2 [available on the journal’s website at
www.elsevier.com]).
3.3. Validation of candidate items and construction of
the International Classification of Functioning,
Disability and Health category interval scale
The overall-fit statistics of the 19 items according to the
χ² statistic showed a significant value, indicating that not
all the items measured the dimension Energy and drive
functions (χ²(df=38) = 69.401, P < 0.01). A high variability of the
Z-fit statistic was also found (Z mean = 0.32, Z SD = 1.22
and Z mean = −0.15, Z SD = 1.24 for items and persons,
respectively). The Person Separation Index rβ was 0.94,
indicating high reliability.

Table 1
Demographic and disease characteristics of 122 patients with RA
included in this study
Male patients; n (%)
Age, yr; mean (SD)
Current work status
Paid employment; %
Unemployed (because of RA); %
Unemployed (because of some other reason); %
Keeping house/homemaker; %
Duration of disease, yr; mean (SD)
Number of comorbidities (a); mean (SD); median (0-11)
Rheumatoid Arthritis Disease Activity Index
(RADAI) (0-10); mean (SD)
4.3 (1.98); 4
Abbreviations: RA, rheumatoid arthritis; SD, standard deviation.
(a) Number of comorbidities according to the Self-Administered
Comorbidity Questionnaire (SCQ): 10 patients had heart disease, 32
had high blood pressure, 11 had lung disease, 8 had diabetes, 12 had ulcers
or stomach disease, 3 had kidney disease, 2 had liver disease, 4 had anemia
or other blood disease, 3 had cancer, 39 had osteoarthritis, and 40 had back
pain. In addition, 81 patients indicated that they had some other disease not
explicitly mentioned in the SCQ.
According to the fit of the individual items to the model,
three items showed significant misfit according to χ²
probability (χ²(df=2) = 8.78, P = 0.01; χ²(df=2) = 8.78, P = 0.01;
and χ²(df=2) = 6.49, P = 0.04).
Four items showed disordered threshold parameters.
Therefore, the response options of these four items were
collapsed, and model fit and unidimensionality were
checked again. The collapsing strategy for each of the four
items presenting disordered-response options is also
presented in Table 3.
After collapsing the response options, only one of the
three misfitting items still showed significant misfit (‘‘I
have to go to bed earlier than I would like to’’ [RAQoL
1]). However, two other items that were not originally
misfitting now showed significant misfit (‘‘I did not feel like
eating’’ [CES-D 2] and ‘‘Physically, I feel I am in bad
condition’’ [MFI 14]). These three misfitting
items were deleted stepwise from the analyses.
The overall-fit statistics according to the χ² statistic of
the remaining 16 suitable items did not show a significant
value (χ²(df=32) = 38.25, P = 0.21). The results of the Z-fit
statistic were Z mean = 0.451, Z SD = 1.085 and
Z mean = −0.223, Z SD = 1.132 for items and persons,
respectively. The Person Separation Index rβ was now 0.93.
Five distinct ability strata can be reliably identified based
on the scores of the 16-item scale.
The fit of the individual items and the estimated threshold
parameters of the model before and after accounting for
disordered thresholds and item misfit are presented in Table A3
(available on the journal’s website at www.elsevier.com).
Figure A1 (available on the journal’s website at
www.elsevier.com) shows the distribution of persons and items
along the measurement continuum. The mean person
location was 0.15, which, compared with the mean item
location of 0.0, reveals appropriate targeting. The response
options of the items span 8.9 logits, covering a large
proportion of RA patients presenting moderate to severe
problems in energy and drive functions.
Figure 1 represents the ICF category interval scale of
the continuum impairment in energy and drive with values
ranging from 0 to 100. The items, with their corresponding
descriptions, are represented in the rows of the figure. The
value beside the description represents the average value
for that item across all thresholds and refers to the item
location or level of difficulty of the item. The thresholds
correspond to the values at which the gray tones of the rows
change. The two easiest items are ‘‘Did you feel tired?’’
(SF-36 9i) and ‘‘I feel I am in excellent condition’’ (MFI
20). Here, the term ‘‘easiest’’ means that a person with even
a very low level of impairment in energy and drive would
endorse that item. In other words, easy items allow for a dif-
ferentiation of persons with no to minor impairment in
Energy and drive functions.
The two most difficult items are ‘‘I could not get ‘going’’’
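(RAQoL 25). The term ‘‘difficult’’ means that only persons
with a high level of impairment in energy and drive would
endorse that item. Difficult items have a positive logit value,
and easy items have a negative logit value.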
Based on a concrete example, Fig. 1 can be explained as
follows: ‘‘Did you feel tired?’’ (SF-36 9i), the easiest item,
was rescaled (i.e., its response options collapsed) from the
original six response options to three response options
(1 = none of the time, 2 = some time, 3 = all of the time).
After collapsing the response options, the item has two
thresholds (threshold 1 = 7.9 and threshold 2 = 78.8), which
correspond to the values at which the gray tone of the lowest
row changes. Thus, a person with a very low level of impairment
in energy and drive (7.9% impairment) would endorse that item
(i.e., select the response option 2 = some time), whereas a person
with 78.8% impairment would select the next response option
3 = all of the time (threshold 2 = 78.8). The position of each of the
response options of the ICF qualifier in the ICF category
interval scale is indicated by vertical arrows in Fig. 1.
All persons obtaining a score of up to 5 on the ICF
category interval scale would be assigned qualifier 0 in
the ICF category b130: Energy and drive functions. Those
persons obtaining a score between 5 and 25 would be as-
signed qualifier 1; those with a score between 25 and 50,
qualifier 2; those with a score between 50 and 95, qualifier
3; and those with a score between 95 and 100, qualifier 4.
Table 2
Percentage of observed agreement, estimated kappa coefficients, and
nonparametric bootstrapped 95% confidence intervals at the
component, first (chapter), second, and third levels of the International
Classification of Functioning, Disability and Health

The raw scores that can be obtained by adding the answers
to the 16 items were transformed to the logit scale,
to the ICF category interval scale, and to the ICF qualifier.
A part of the transformation table is presented in Table 4.
For example, if a person obtains a raw score of 4, his or
her position on the logit scale would be −2.94, his or her
score on the ICF category interval scale would be 29,
and s/he would be assigned the qualifier of 2, that is,
moderate problem in Energy and drive functions.
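Because the raw score is a sufficient statistic under the Rasch model, each non-extreme raw score maps to a single logit location, obtained by solving Σ_i E[X_i | θ] = r. The sketch below finds that maximum-likelihood location by bisection. The item thresholds are hypothetical, and the person-estimation method RUMM2020 used for Table 4 is not stated in the text, so values from this sketch need not match the published table.

```python
import math

def expected_item_score(theta, taus):
    """Expected score (categories 0..m) on one item under the Rasch model for
    ordered responses; taus are absolute threshold locations in logits."""
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - tau))
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    return sum(x * w for x, w in enumerate(weights)) / sum(weights)

def raw_to_logit(raw, items, lo=-10.0, hi=10.0, tol=1e-9):
    """Maximum-likelihood location for a raw total score: solve
    sum_i E[X_i | theta] = raw by bisection (the left side increases in theta)."""
    max_score = sum(len(taus) for taus in items)
    if not 0 < raw < max_score:
        raise ValueError("extreme scores (0 or maximum) have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(expected_item_score(mid, taus) for taus in items) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical thresholds for three short items (maximum raw score 5)
ITEMS = [[-1.5, 0.5], [-0.5, 1.0], [0.0]]
for r in range(1, 5):
    print(r, round(raw_to_logit(r, ITEMS), 2))
```

The resulting logits would then be passed through the linear 0-100 transformation and the qualifier cut-points to fill the remaining columns of a conversion table.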
4. Discussion

We have illustrated how the value on the ICF qualifier can
be estimated based on the interval scales developed for
specified ICF categories by integrating the items from
patient-oriented instruments. The original format of the items
used to construct the ICF category interval scale remained
[...] the same time, within the context of the ICF. This application
can be extremely useful, given the increasing use of the ICF
and the ICF qualifier as references when documenting and
reporting [...] functions can be applied to any ICF category.
Table 3
The 19 items linked to the ICF category b130: Energy and drive functions, the questionnaire from which they proceeded with their instructions, their
response options, and the collapsing strategy followed when presenting disordered-response thresholds
Items | Questionnaire | Response options | Collapsing strategy
Instructions for the MFI 20: By means of the following statements we would like to get an idea of how you have been feeling lately. There is, for example, the
statement ‘‘I FEEL RELAXED.’’ If you think that this is entirely true, that indeed you have been feeling relaxed lately, please place an X in the extreme
left box. The more you disagree with the statement, the more you can place an X in the direction ‘‘no, that is not true.’’ Please do not miss out a statement
and place one X next to each statement.
I feel fit. | MFI 20 | From 0 = yes, that is true, to 4 = no, that is not true |
I feel very active. | MFI 20 | " |
I feel tired. | MFI 20 | " |
I am rested. | MFI 20 | " |
Physically, I feel only able to do a little. | MFI 20 | " |
Physically, I can take on a lot. | MFI 20 | " |
Physically, I feel I am in bad condition. | MFI 20 | " |
I tire easily. | MFI 20 | " |
Physically, I feel I am in excellent condition. | MFI 20 | " |
0 1 1 2 3
3 2 2 1 0
2 1 1 1 0
Instructions of the CES-D: Below is a list of the ways you might have felt or behaved. Please tell me how often you have felt this way during the past week.
I did not feel like eating; my appetite was poor. | CES-D | 0 = RARELY or NONE of the time (less than 1 day); 1 = SOME or a LITTLE of the time (1-2 days); 2 = OCCASIONALLY or a MODERATE amount of the time (3-4 days); 3 = MOST or ALL of the time (5-7 days) |
I felt that everything I did was an effort. | CES-D | " |
I could not get ‘‘going.’’ | CES-D | " |
Instructions for the SF-36: These questions are about how you feel and how things have been with you during the past 4 weeks. For each question, please give
the one answer that comes closest to the way you have been feeling. How much of the time during the past 4 weeks (circle one number on each line)
Did you have a lot of energy? | SF-36 | 1 = All of the Time; 2 = Most of the Time; 3 = A Good Bit of the Time; 4 = Some of the Time; 5 = A Little of the Time; 6 = None of the Time |
Did you feel worn out? | SF-36 | " |
Did you feel tired? | SF-36 | " | 3 2 2 2 2 1
Instructions for the RAQoL: Below you will find some statements which have been made by people who have rheumatoid arthritis. Please read each statement
carefully. We would like you to tick ‘‘yes’’ if you feel the statement applies to you, and tick ‘‘no’’ if it does not.
I have to go to bed earlier than I would like to. | RAQoL | |
It’s too much effort to go out and see people. | RAQoL | |
I have to keep stopping what I am doing, [...] | RAQoL | |
I feel tired whatever I do. | RAQoL | " |
Abbreviations: ICF, International Classification of Functioning, Disability and Health; MFI, Multidimensional Fatigue Inventory; CES-D, Center for
Epidemiological Studies Depression Scale; SF-36, Short Form-36; RAQoL, Rheumatoid Arthritis Quality of Life.

The construction of ICF category interval scales relies on
linking instrument items to ICF categories [13,15]. The
accuracy [...] statistics to assess the reliability between the two raters.
[...] linkage is an appropriate method to identify items addressing
the same ICF category. The fact that 16 of the 19 items
identified from the content linkage fitted the Rasch model, that is,
[...] of Energy and drive functions. The three misfitting items that
were consequently deleted were ‘‘I have to go to bed earlier
than I would like to’’ (RAQoL 1), ‘‘I did not feel like eating’’
(CES-D 2), and ‘‘Physically, I feel I am in bad condition’’
(MFI 14). One could argue that all three items refer to an
ac[...]rectly related to the level of energy and drive. In other words,
the fact that one goes to bed earlier than one would like to can
[...] the morning. Furthermore, the fact that one does not feel like
[...]tuitively say that there are many people who know they are in
[...] and drive related to it.
Factor analysis could also have been performed to study
whether the items address a common, single dimension.
However, we opted for Rasch analyses to obtain not only
information on unidimensionality but also on additional
properties of the items, such as performance of the response
options, targeting, and items’ difficulty.
We evaluated the linkage process by calculating kappa
coefficients, which showed satisfactory results for linker
agreement. Kappa is an often-used indicator of agreement
that accounts for chance. One can argue, however, that
unsystematic error due to chance is of secondary relevance
for the linkage procedure. Therefore, modeling
methods, such as many-facet, latent-class, and
latent-trait analyses, would be useful in the future to
explain any disagreement between the linkers (e.g., owing to
experience or profession).
The overall-fit statistics and the fit of the individual
items to the model support the construct validity of the
created ICF category interval scale for Energy and drive
functions. The high Person Separation Index of 0.93
shows that high precision of measurement can be achieved
with its use. Thus, using the scores obtained with it, persons
can be reliably distinguished into at least five separate strata
in the dimension Energy and drive functions.
Fig. 1. The x and the y axes represent the International Classification of Functioning, Disability and Health (ICF) category interval scale of the continuum
energy and drive, with values ranging from 0 to 100. Not all values from 0 to 100 are represented on the y axis because of space constraints. The 16 items in
order of difficulty from the easiest item (bottom) to the most difficult item (top) are presented on the y axis. The value corresponding to the position of the
items is presented next to them. The positions of the thresholds of the response options of the items are represented by the bars in the diagram. The different
gray tones represent the different response options for each individual item. The vertical arrows represent the position of each of the response options of the
ICF qualifier.

Table 4
Part of the conversion table from raw scores to logit scale, to the ICF
category interval scale and to the ICF qualifier
Raw score (16 items) | Logit scale | ICF category interval scale | ICF qualifier
[...] the 16 items. The complete table can be obtained from the authors on request.

Only four items presented disordered thresholds. Reversal
of thresholds occurs when persons’ choices of response
options are not in accordance with the expectations from
their estimated level of Energy and drive functions. This
might be attributed to the perceived ambiguity of the
response options or the narrow range of the response
options. Interestingly, three of the four items presenting
disordered thresholds were from the MFI, in which the an-
swers are not specified, but are defined by anchors from
‘‘yes, that is true’’ to ‘‘no, that is not true.’’ It would be
worthwhile to study the psychometric properties of this in-
strument using Rasch or other IRT models to investigate the
extent to which the response options perform as expected.
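To make "disordered thresholds" concrete, the sketch below uses the Rasch partial credit model, in which the probability of category x is proportional to exp(Σ_{k≤x}(θ − b_k)), where θ is the person location and b_k are absolute threshold locations in logits. It flags adjacent thresholds that are not increasing and lists which categories are ever the most probable along the continuum. The threshold values are invented for illustration (they are not the MFI or SF-36 estimates), the helper names are hypothetical, and the parameterization in RUMM2020 may differ in detail (e.g., thresholds centred on the item location).

```python
import math


def category_probabilities(theta, thresholds):
    """Partial credit model probabilities for categories 0..m of one item.

    `thresholds` are absolute locations b_1..b_m in logits, so P(x) is
    proportional to exp(sum_{k<=x} (theta - b_k)); category 0 has an empty sum.
    """
    exponents = [0.0]
    for b in thresholds:
        exponents.append(exponents[-1] + (theta - b))
    top = max(exponents)  # subtract the maximum so exp() cannot overflow
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def disordered_pairs(thresholds):
    """Adjacent threshold pairs (k, k + 1) whose estimates are not increasing."""
    return [(k, k + 1) for k in range(len(thresholds) - 1)
            if thresholds[k] >= thresholds[k + 1]]


def modal_categories(thresholds, thetas):
    """Categories that are the most probable response at some theta in `thetas`."""
    modal = set()
    for theta in thetas:
        probs = category_probabilities(theta, thresholds)
        modal.add(max(range(len(probs)), key=probs.__getitem__))
    return modal


if __name__ == "__main__":
    grid = [i / 10 for i in range(-60, 61)]  # theta from -6 to 6 logits
    ordered = [-1.5, 0.0, 1.5]          # hypothetical thresholds
    disordered = [-1.0, 0.8, 0.2, 1.5]  # hypothetical: b_2 lies above b_3
    print(disordered_pairs(ordered), sorted(modal_categories(ordered, grid)))
    print(disordered_pairs(disordered), sorted(modal_categories(disordered, grid)))
```

With ordered thresholds every category is the most likely response somewhere on the continuum; with the reversed pair, category 2 never is. A category that is never modal is one practical symptom of disordered thresholds, and collapsing adjacent categories is one common way of rescaling such an item.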
The fourth item showing disordered thresholds was from
the SF-36. The psychometric properties of the SF-36 have
been examined with Rasch analyses and other IRT models in
several investigations, and the items addressing physical
health have received special attention [45-48]. However, to
our knowledge, no study has examined the functioning of
the response options of the items included in the vitality
subscale.
The created ICF category interval scale also proved to
be well targeted for a large proportion of RA patients pre-
senting moderate to severe problems in Energy and drive
functions. However, it also displayed floor effects: RA
patients with a low level of impairment in energy and drive,
located at the lower end of the measurement continuum,
could not be adequately covered by the items included in
the scale.
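Targeting and floor effects can be checked with two simple summaries: the share of respondents at the extreme possible raw scores, and the offset between the mean person location and the mean item location in logits. The sketch below is a hypothetical illustration; which extreme counts as the "floor" depends on how the scale is oriented, and the example scores and locations are invented, not data from this study.

```python
from statistics import mean


def extreme_score_percentages(raw_scores, min_score, max_score):
    """Percentages of respondents at the lowest and at the highest possible raw score."""
    if not raw_scores:
        raise ValueError("no scores given")
    n = len(raw_scores)
    at_min = sum(score == min_score for score in raw_scores)
    at_max = sum(score == max_score for score in raw_scores)
    return 100.0 * at_min / n, 100.0 * at_max / n


def targeting_offset(person_locations, item_locations):
    """Mean person location minus mean item location, both in logits.

    Values far from 0 indicate that the items sit mostly above or below the persons.
    """
    return mean(person_locations) - mean(item_locations)


if __name__ == "__main__":
    scores = [0, 0, 3, 5, 7, 10, 0, 2]  # hypothetical raw scores on a 0..10 scale
    print(extreme_score_percentages(scores, 0, 10))
    print(targeting_offset([-1.0, 0.5, 1.5], [-0.5, 0.0, 0.5]))
```

Persons at an extreme raw score are exactly those the items cannot locate: their maximum-likelihood Rasch estimate does not exist, so a sizeable percentage at one end is a direct signal of the floor or ceiling effect described above.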
The developed ICF category interval scale may be con-
assessment instrument for Energy and drive functions. How-
ever, it is important to emphasize that the purpose of this
article is to illustrate the methodology and not to create
a new instrument. We are aware that, for the construction
functions, a number of issues need to be addressed. First, the
selection of items to be included in the scale should be done
before performing the study (and not a posteriori, as in this
study), taking into consideration a large number of instru-
ments, including generic-, condition-, and domain-specific
instruments. This would facilitate the inclusion of easy
comparison of health-status measures with the ICF could be
a very valuable source of information to achieve this aim
[23,50-52]. Researchers embarking on the construction of
ICF category interval scales to measure specified ICF cate-
gories may also use the results of the National Institutes of
Health (NIH) Patient-Reported Outcomes Measurement In-
formation System (PROMIS) initiative, which develops,
validates, and standardizes item banks to measure patient-
reported outcomes (PROs) [53,54]. In the future, PROMIS
will have the most comprehensive source of candidate items.
Second, a larger sample size would be necessary to per-
form the analyses, especially when including additional
items. Even though there are no standard criteria to define
how large the sample should be to obtain usefully stable
item calibrations, the estimation by Linacre (2002)
provides a frame of reference: at least 10 observations are
needed for each response option of each item. In addition,
the person-item deviation residuals should be examined by
principal components analysis (PCA) to ensure that the
assumption of local independence holds. The independence
of person-item deviation residuals, taken with adequate fit
to the Rasch model, supports
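Both checks in the paragraph above can be sketched in a few lines, assuming complete data (no missing responses) and the same number of response categories for every item. `sparse_cells` flags response categories used fewer than 10 times, following the Linacre frame of reference. `residual_pca_eigenvalues` returns the eigenvalues of the item correlation matrix of standardized person-item residuals; the model-implied expected values and variances must come from a previously fitted Rasch model, which is not shown. Both helpers are hypothetical, not functions of RUMM2020 or any other package.

```python
import numpy as np


def sparse_cells(responses, n_categories, minimum=10):
    """(item, category, count) for every response category used fewer than `minimum` times.

    `responses` is a persons x items integer array with categories coded
    0..n_categories - 1 and no missing values (an assumption of this sketch).
    """
    flagged = []
    for item in range(responses.shape[1]):
        counts = np.bincount(responses[:, item], minlength=n_categories)
        for category, count in enumerate(counts):
            if count < minimum:
                flagged.append((item, category, int(count)))
    return flagged


def residual_pca_eigenvalues(observed, expected, variance):
    """Eigenvalues (largest first) of the item correlation matrix of standardized residuals.

    `expected` and `variance` hold the model-implied mean and variance of each
    person-item response, taken from a previously fitted Rasch model.
    """
    z = (observed - expected) / np.sqrt(variance)
    corr = np.corrcoef(z, rowvar=False)  # columns (items) are the variables
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending, so reverse
```

A first eigenvalue that stands well above the others suggests a secondary dimension in the residuals, i.e., a possible violation of local independence; how large is "large" remains a judgment call that the text does not settle.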
Third, data from different countries would be necessary
to evaluate the cross-cultural validity of the measure. As the
ICF was developed for international application, this
consideration is of special relevance.
Fourth, additional analyses should also be performed
from the perspective of Traditional Test Theory (TTT),
such as factor analysis and analyses to study the con-
vergent and discriminant validity.
Last, but not least, if this methodology is used to create
new interval scales for ICF categories, special attention must
be paid to the content and level of difficulty of the items. To
ficulty should not be included in the same interval scale.
The use of the methodology presented in this article as
a starting point for the construction of new instruments is
of special interest for ICF categories for which no existing
instrument comprehensively and uniquely covers the
content of the respective category. Instruments developed
from ICF category interval scales have the immediate
advantages that the scores obtained are intuitive (for
example, ranging from 0 to 100) and that the ICF
qualifier can be estimated from them.
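One simple way to go from a person location to a 0 to 100 score and then to an ICF qualifier is sketched below. It assumes a linear rescaling between two anchor logits (for example, the locations attached to the lowest and highest raw scores in a conversion table) and the generic WHO percentage bands for the qualifier (0-4% no problem, 5-24% mild, 25-49% moderate, 50-95% severe, 96-100% complete). The anchor values, the function names, and the orientation (100 = greatest problem) are assumptions of this sketch; the authors' own conversion table may be constructed differently.

```python
def logit_to_interval(logit, logit_min, logit_max):
    """Linearly map a person location in logits onto 0..100 (clipped at the ends)."""
    if logit_max <= logit_min:
        raise ValueError("logit_max must exceed logit_min")
    value = 100.0 * (logit - logit_min) / (logit_max - logit_min)
    return min(100.0, max(0.0, value))


# Generic ICF qualifier bands published by WHO (ICF, 2001):
# upper bound of the "percentage of problem" for each qualifier value.
QUALIFIER_BANDS = (
    (4.0, 0, "NO problem"),
    (24.0, 1, "MILD problem"),
    (49.0, 2, "MODERATE problem"),
    (95.0, 3, "SEVERE problem"),
    (100.0, 4, "COMPLETE problem"),
)


def icf_qualifier(problem_percent):
    """Return (qualifier, label) for a 0..100 score oriented so that 100 = greatest problem.

    Fractional scores between bands (e.g. 4.5) go to the next band up; this
    tie-breaking rule is a choice made for this sketch, not a WHO rule.
    """
    if not 0.0 <= problem_percent <= 100.0:
        raise ValueError("score must lie in [0, 100]")
    for upper, code, label in QUALIFIER_BANDS:
        if problem_percent <= upper:
            return code, label
    raise AssertionError("unreachable: the last band ends at 100")


if __name__ == "__main__":
    score = logit_to_interval(0.0, -5.0, 5.0)  # hypothetical anchors at -5 and +5 logits
    print(score, icf_qualifier(score))
```

If a scale is oriented the other way (higher score = better energy and drive), apply the bands to `100 - score` instead.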
In conclusion, this study demonstrates how items from
different patient-oriented instruments can be integrated into
a psychometrically sound ICF category interval scale to oper-
scores on this scale can be easily transformed to the response
options of the ICF qualifier. It represents a step forward for
the operationalization and future implementation of the ICF.
The authors thank Heinrich Gall, Alicia Garza, Andrea
Glässel, and Michaela Kirschneck for their support in
conducting this study, and Pieter Lozekoot and Jos Ramaker
for collecting the data.
This study was partially supported by a grant from the
European League Against Rheumatism (EULAR).
Zola IK. Towards the necessary universalizing of disability policy. Part 2: disability policy: restoring socioeconomic independence. Milbank Q 1989;67(Suppl 2):401-28.
Bickenbach JE, Chatterji S, Bradley EM, Üstün TB. Models of disablement, universalism and the international classification of impairments, disabilities and handicaps. Soc Sci Med 1999;48:1173-87.
Cieza A, Stucki G. New approaches to understanding the impact of
Pract Res Clin Rheumatol
Lollar DJ. Public health and disability: emerging opportunities. Public Health Rep 2002;117:131-6.
Stucki G. International Classification of Functioning, Disability, and Health (ICF): a promising framework and classification for rehabilitation medicine. Am J Phys Med Rehabil 2005;84:733-40.
Stucki G, Stier-Jarmer M, Grill E, Melvin J. Rationale and principles of early rehabilitation care after an acute injury or illness. Disabil
Stucki G, Cieza A, Melvin J. The International Classification of Functioning, Disability and Health (ICF): a unifying model for the conceptual description of the rehabilitation strategy. J Rehabil Med
World Health Organization. International Classification of Functioning, Disability and Health: ICF. Geneva: WHO; 2001.
Stucki G, Ewert T, Cieza A. Value and application of the ICF in rehabilitation medicine. Disabil Rehabil 2002;24:932-8.
Finger ME, Cieza A, Stoll J, Stucki G, Huber EO. Identification of intervention categories for physical therapy, based on the international classification of functioning, disability and health: a Delphi exercise. Phys Ther 2006;86:1203-20.
Rentsch HP, Bucher P, Dommen-Nyffeler I, Wolf C, Hefti H, Fluri E, et al. The implementation of the “International Classification of Functioning, Disability and Health” (ICF) in daily practice of neurorehabilitation: an interdisciplinary project at the Kantonsspital of Lucerne, Switzerland. Disabil Rehabil 2003;25:411-21.
McDowell I. Measuring health: a guide to rating scales and questionnaires. 3rd edition. New York: Oxford University Press; 2006.
Cieza A, Geyh S, Chatterji S, Kostanjsek N, Üstün B, Stucki G. ICF linking rules: an update based on lessons learned. J Rehabil Med
Wallerstein SL. Scaling clinical pain and pain relief. In: Bromm B, editor. Pain measurement in man: neurophysiological correlates of pain. New York: Elsevier; 1984.
Cieza A, Brockow T, Ewert T, Amman E, Kollerits B, Chatterji S, et al. Linking health-status measurements to the international classification of functioning, disability and health. J Rehabil Med 2002;34:
Andrich D. Controversy and the Rasch model: a characteristic of incompatible paradigms? Med Care 2004;42(1 Suppl):I7-16.
Tijhuis GJ, de Jong Z, Zwinderman AH, Zuijderduin WM, Jansen LM, Hazes JM, et al. The validity of the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire. Rheumatology 2001;40:
Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980;23:137-45.
Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care
The EuroQol Group. EuroQol: a facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
Smets EM, Garssen B, Bonke B, De Haes JC. The Multidimensional Fatigue Inventory (MFI) psychometric qualities of an instrument to assess fatigue. J Psychosom Res 1995;39:315-25.
Center for Epidemiologic Studies, National Institute of Mental Health. Center for Epidemiologic Studies Depression Scale (CES-D). Rockville, MD: National Institute of Mental Health; 1971.
tioning, disability and health (ICF). Qual Life Res 2005;14:1225-37.
Cohen J. A coefficient of agreement for nominal scales. Educ Psychol
Efron B. The jackknife, the bootstrap and other resampling plans. Philadelphia: SIAM; 1982.
Vierkant RA. A SAS macro for calculating bootstrapped confidence intervals about a kappa coefficient. In: SAS Users Group International Online Proceedings. Available at: http://www2.sas.com/proceedings/sugi22/STATS/PAPER295.PDF. Accessed July 23, 2004.
Andrich D. Application of a psychometric rating model to ordered categories, which are scored with successive integers. Appl Psychol
Andrich D. Rasch models for measurement. In: Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-068. Newbury Park, CA: Sage; 1988.
Styles I, Andrich D. Linking the standard and advanced forms of the Raven’s progressive matrices in both the pencil-and-paper and computer-adaptive-testing formats. Educ Psychol Meas 1993;53:
Andrich D. An index of person separation in latent trait theory, the traditional KR.20 index, and the Guttman scale response pattern. Educ Res Perspect 1982;9:95-104.
Fisher WP. Reliability statistics. Rasch Meas Trans 1992;6:238.
Wright BD, Masters GN. Rating scale analysis. Chicago: MESA;
Linacre JM. Optimizing rating scale category effectiveness. J Appl
Andrich D. The Rasch model explained. In: Alagumalai S, Curtis DD, Hungi N, editors. Applied Rasch measurement: a book of exemplars. Dordrecht, The Netherlands: Springer-Kluwer; 2005. p. 308-28. Chapter 3.
Andrich D, Sheridan BS, Luo G. RUMM2020: Rasch unidimensional models for measurement. Perth, Western Australia: RUMM Laboratory
SPSS Inc. SPSS Release 14.0.2. Chicago, Illinois; 2006.
SAS Institute Inc. The SAS System for Windows, Version 8.2. Cary, NC: SAS Institute Inc; 2001.
Stucki G, Liang MH, Stucki S, Brühlmann P, Michel BA. A self-administered rheumatoid arthritis disease activity index (RADAI) for epidemiologic research. Arthritis Rheum 1995;38:795-8.
Sangha O, Stucki G, Liang MH, Fossel AH, Katz JN. The Self-Administered Comorbidity Questionnaire: a new method to assess comorbidity for clinical and health services research. Arthritis
Stucki G, Melvin J. The International Classification of Functioning, Disability and Health: a unifying model for the conceptual description of physical and rehabilitation medicine. J Rehabil Med
Jette AM. Toward a common language for function, disability, and health. Phys Ther 2006;86:726-34.
Linacre JM. Many-Facet Rasch measurement. Chicago: MESA Press;
Dillon WR, Mulani N. A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behav Res 1984;19:
Uebersax JS, Grove WM. A latent trait finite mixture model for the analysis of rating agreement. Biometrics 1993;49:823-35.
Stucki G, Daltroy L, Katz JN, Johannesson M, Liang MH. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not equal the sum of the parts. J Clin Epidemiol
Haley SM, McHorney CA, Ware JE. Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol 1994;47:671-84.
McHorney CA, Haley SM, Ware JE. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol
metric properties of the Short Form 36 physical function score and the Health Assessment Questionnaire disability index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis Rheum 2007;57:723-9.
Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622-9.
health-related quality of life measures used in stroke based on the international classification of functioning, disability and health (ICF): a systematic review. Qual Life Res 2007;16:833-51. Epub February
Stucki A, Stucki G, Cieza A, Schuurmans MM, Kostanjsek N, Ruof J. Content comparison of health-related quality of life instruments for COPD. Respir Med 2007;101:1113-22. Epub January 9, 2007.
Grill E, Stucki G, Scheuringer M, Melvin J. Validation of International Classification of Functioning, Disability, and Health (ICF) Core Sets for early postacute rehabilitation facilities: comparisons with three other functional measures. Am J Phys Med Rehabil 2006;85:640-9.
Fries JF, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol 2005;23(5 Suppl 39):S53-7.
Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, et al; PROMIS Cooperative Group. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care 2007;45(5 Suppl 1):S3-S11.
Smith EVJ. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas 2002;3:205-31.
World Health Organization. Towards a common language for functioning, Disability and Health, ICF. Geneva: World Health Organization
Nunnally JC, Bernstein I. Psychometric theory. 3rd edition. New York: McGraw Hill; 1994.
[Figure: axis ticks from 0 to 100; items listed along the other axis, in printed order: MFI 2, SF-36 9e, MFI 3, MFI 5, RAQoL 25, CES-D 20, CES-D 07, SF-36 9g, MFI 8, MFI 1, MFI 16, MFI 12, RAQoL 10, RAQoL 21, MFI 20, SF-36 9i]
Fig. A1. Person-item threshold distribution.
ICF qualifier with percentage values provided by WHO
Qualifier: percentage of problem(a)
0 NO problem (none, absent, negligible, ...): 0-4%
1 MILD problem (slight, low, ...): 5-24%
2 MODERATE problem (medium, fair, ...): 25-49%
3 SEVERE problem (high, extreme, ...): 50-95%
4 COMPLETE problem (total, ...): 96-100%
Abbreviation: WHO, World Health Organization.
a “Having a problem may mean an impairment, limitation, restriction or barrier, depending on the construct” [8, p. 222], i.e., depending on whether we
are classifying body functions and structures (impairments), activity and participation (limitations or restrictions), or environmental factors (barriers or
Examples of ICF categories with their corresponding code, title, and definition(a)
b130: Energy and drive functions
  General mental functions of physiological and psychological mechanisms that cause the individual to move toward satisfying specific needs and general goals in a persistent manner.
  Inclusions: functions of energy level, motivation, appetite, craving (including craving for substances that can be abused), and impulse control.
  Exclusions: consciousness functions (b110), temperament and personality functions (b126), sleep functions (b134), psychomotor functions (b147), emotional functions (b152).
b280: Sensation of pain
  Sensation of unpleasant feeling indicating potential or actual damage to some body structure.
  Inclusions: sensations of generalized or localized pain, in one or more body part, pain in a dermatome, stabbing pain, burning pain, dull pain, aching pain; impairments such as myalgia, analgesia, and hyperalgesia.
s730: Structure of upper extremity
d450: Walking
  Moving along a surface on foot, step by step, so that one foot is always on the ground, such as when strolling, sauntering, walking forward, backward, or sideways.
  Inclusions: walking short or long distances, walking on different surfaces, walking around
  Exclusions: transferring oneself (d420), moving around (d455).
d920: Recreation and leisure
  Engaging in any form of play, recreational or leisure activity, such as informal or organized play and sports, programs of physical fitness, relaxation, amusement or diversion, going to art galleries, museums, cinemas or theaters; engaging in crafts or hobbies, reading for enjoyment, playing musical instruments; sightseeing, tourism, and traveling for pleasure.
  Inclusions: play, sports, arts and culture, crafts, hobbies, and socializing.
  Exclusions: riding animals for transportation (d480), remunerative and nonremunerative work (d850 and d855), religion and spirituality (d930), political life and citizenship (d950).
Environmental factors example:
  Any natural or human-made object or substance gathered, processed, or manufactured for medicinal purposes, such as allopathic and naturopathic medication.
Abbreviation: ICF, International Classification of Functioning, Disability and Health.
a The letter b refers to body functions; s, body structures; d, activities and participation domains; and e, environmental factors.
Table A3
Fit of individual items to the model before and after accounting for disordered thresholds and item misfit
Columns (reported once before and once after accounting for disordered thresholds and item misfit): D, R, SE, FR, DF, χ2, DF, P, T1, T2, T3, T4, T5
Items:
MFI 1: I feel fit.
MFI 3: I feel very active.
MFI 5: I feel tired.
MFI 12: I am rested.
MFI 2: Physically, I feel only able to do a little.
MFI 8: Physically, I can take on a lot.
MFI 14: Physically, I feel I am in bad condition.
MFI 16: I tire easily.
MFI 20: Physically, I feel I am in an excellent
CES-D 02: I did not feel like eating; my appetite was
CES-D 07: I felt that everything I did was an effort.
CES-D 20: I could not get “going.”
SF-36 9e: Did you have a lot
SF-36 9g: Did you feel worn out?
SF-36 9i: Did you feel tired?
RAQoL 1: I have to go to bed earlier than I would like to.
RAQoL 25: It’s too much effort to go out and see people.
RAQoL 10: I have to keep stopping what I am doing, to rest.
RAQoL 21: I feel tired whatever I do.
[Item-level numerical values not recoverable from the extracted layout.]
The items are presented according to the instrument to which they belong, as in Table 3.
Items MFI 2, MFI 3, MFI 14, and SF-36 9i presented disordered response options and were rescaled.
The disordered thresholds are presented in bold.
Abbreviations: D, estimate of item difficulty; R, rank order according to item-difficulty estimation; SE, standard error associated with item-difficulty estimation; FR, standardized fit residual z; χ2, chi square; DF, degrees of freedom; P, probability associated with χ2; T1, first threshold estimate; T2, second threshold estimate; T3, third threshold estimate; T4, fourth threshold estimate (there is only one threshold when the items are dichotomous).