Getting the measure of spasticity in multiple
sclerosis: the Multiple Sclerosis Spasticity Scale
Brain (2005) Page 1 of 11
J. C. Hobart,1,2,3 A. Riazi,2 A. J. Thompson,2 I. M. Styles,3 W. Ingram,1 P. J. Vickery,1 M. Warner,1
P. J. Fox1 and J. P. Zajicek1
1Peninsula Medical School, Plymouth, 2Neurological Outcome Measures Unit, Institute of Neurology, London, UK and
3School of Education, Murdoch University, Perth, Western Australia
Correspondence to: Dr Jeremy Hobart, Senior Lecturer and Honorary Consultant Neurologist,
Department of Clinical Neuroscience, Peninsula Medical School, Room N16 ITTC Building, Tamar Science Park,
Davy Road, Plymouth, Devon PL6 8BX, UK
Spasticity is most commonly defined as an inappropriate, velocity dependent, increase in muscle tonic stretch
reflexes, due to the amplified reactivity of motor segments to sensory input. It forms one component of the
upper motor neuron syndrome and often leads to muscle stiffness and disability. Spasticity can, therefore, be
measured through electrophysiological, biomechanical and clinical evaluation, the last most commonly using
the Ashworth scale. None of these techniques incorporates the patient’s experience of spasticity, or how it
affects people’s daily lives. Consequently, we set out to construct a rating scale to quantify, from the patient’s
perspective, the impact of spasticity on people with multiple sclerosis. Qualitative methods (in-depth patient interviews and
focus groups, expert opinion and literature review) were used to develop a conceptual framework of spasticity
impact, and to generate a pool of items with the potential to convert this framework into a rating scale. The items
were administered, as a questionnaire, to a sample of people with
multiple sclerosis and spasticity. Guided by Rasch analysis, we constructed and validated a rating scale for each
component of the conceptual framework. Decisions regarding item selection were based on the integration and
assimilation of seven specific analyses including clinical meaning, ordering of thresholds, fit statistics
and differential item functioning. The qualitative phase (17 patient interviews, 3 focus groups) generated
144 potential scale items and a conceptual model with eight components addressing symptoms (muscle stiffness,
pain and discomfort, and muscle spasms), physical impact (activities of daily living, walking and body
movements) and psychosocial impact (emotional health, social functioning). The first postal survey was sent to
272 people with multiple sclerosis and had a response rate of 88%. Findings supported the development of scales
for each component but demonstrated that five item response options were too many. The 144-item ques-
tionnaire, reformatted with four-item response options, was administered with four validating instruments to
an independent sample of 259 people with multiple sclerosis (response rate 78%). From the responses, an 88-
item instrument with eight subscales was developed that satisfied criteria for reliable and valid measurement.
Correlations with other measures were consistent with predictions. The 88-item Multiple Sclerosis Spasticity
Scale (MSSS-88) is a reliable and valid, patient-based, interval-level measure of the impact of spasticity in
multiple sclerosis. It has the potential to advance outcomes measurement in clinical trials and clinical practice,
and provides a new perspective in the clinical evaluation of spasticity.
Abbreviations: MSSS-88 = Multiple Sclerosis Spasticity Scale; MSIS-29 = Multiple Sclerosis Impact Scale
Received March 30, 2005. Revised September 30, 2005. Accepted October 4, 2005
© The Author (2005). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: firstname.lastname@example.org
Brain Advance Access published November 9, 2005
Spasticity is common, clinically and pathophysiologically
complex, and disabling. It affects at least 35% of people
post-stroke (Watkins et al., 2002) and up to 90% of people
with multiple sclerosis at some point (Paty and Ebers, 1998).
A range of treatments is available including spasticity
reduction strategies, specialist rehabilitation therapy, oral
intrathecal infusions and surgery. Problematic spasticity
typically requires a combination of treatments (Crayton
et al., 2004), and should involve a patient-focused, co-ordi-
nated, multidisciplinary team approach (Thompson et al.,
2005). These facts emphasize that scientifically sound and
clinically meaningful spasticity measurement is indispensable
to clinical practice and research in this area (Voerman
et al., 2005).
Spasticity measurement, like spasticity management, is
complicated. In broad terms, measurement instruments can
be categorized into neurophysiological methods (Voerman
et al., 2005), biomechanical techniques (Wood et al., 2005)
and clinical scales (Platz et al., 2005). The clinical meaning-
fulness of neurophysiological and biomechanical approaches
has been questioned, as they focus on highly specific exam-
inations (e.g. H-reflex or single joint analysis), correlate
poorly with clinical indicators of spasticity and have problems
with reliability and sensitivity (Voerman et al., 2005; Wood
et al., 2005). Clinical scales used in the measurement of
spasticity have also been found wanting (Platz et al., 2005).
Of the 24 scales recently reviewed (Platz et al., 2005), 18 were
single item measures and, as a consequence, have poor
reliability (McHorney et al., 1992), validity (Manning et al.,
1982; Hobart, 2003) and responsiveness (Sloan et al., 2002).
Only three scales had more than three items: two of these
assessed resistance to passive movement, the third measured
the extensor toe sign. No scale had been developed to address
the broader consequences of spasticity for the patient.
If spasticity management is to be patient-focused, clinical
trials and clinical practice need rigorous measurement
methods that capture patients’ experiences and perceptions
of spasticity, and complement the existing range of measures.
That challenge, which has not been met by existing scales
(Platz et al., 2005), was the aim of this study.
There were three stages. First, we used a range of qualitative studies
to develop a conceptual framework of spasticity impact, and a pool
of potential items hypothesized to convert this framework into a
scale. Second, we administered the items, as a questionnaire, to a
sample of people with multiple sclerosis and spasticity and, using
Rasch analysis, undertook the preliminary steps of constructing a
subscale for each component of the conceptual framework. Third,
we undertook a second survey to finalize and validate the instru-
ment. The research ethics committees of Derriford Hospital and the
National Hospital for Neurology and Neurosurgery (NHNN)
approved the study.
Stage 1: conceptual model formation and
Four pieces of qualitative work were undertaken to develop a con-
ceptual framework of spasticity impact and to generate a pool of
items with the potential to convert (operationalize) this framework
into a scale with multiple subscales. First, in-depth, semi-structured
interviews were conducted with individual multiple sclerosis
patients from NHNN, until no new themes emerged. Second, three
focus groups were conducted with multiple
sclerosis patients from Derriford Hospital. Patients were chosen to
and disease type. Interviews and focus groups were tape-recorded,
transcribed and content analysed (WINMAX; Kuckartz, 1996).
Third, a comprehensive literature review was undertaken to identify
relevant health areas and potential items. Lastly, expert opinion on
the impact of spasticity was sought from neurologists, spasticity
nurses, multiple sclerosis nurses and rehabilitation staff.
A preliminary questionnaire was formatted and pre-tested in a
small group of patients with multiple sclerosis and variable degrees
of spasticity.
Stage 2: first postal survey
The questionnaire was posted to a random half-sample of the 544
patients from the Cannabinoids in Multiple Sclerosis study (CAMS;
Zajicek et al., 2003) who had commenced trial medication and were
still under follow-up. To encourage high response rates we used
personalized letters, standardized instructions and reminders for
non-responders at 3 and 5 weeks.
Scale development was guided by Rasch measurement principles
(Rasch, 1960) and analyses (Andrich et al., 1997–2004). The key
principle is that the mathematical (Rasch) model articulates a set
of requirements that must be met for rating scale data to generate
internally valid, equal-interval measurements that are stable (invar-
iant) across items and people. In contrast, scales whose development
is guided by traditional psychometric methods generate ordinal
scores whose invariance is unknown (Wright and Linacre, 1989).
We constructed a scale for each area defined as important to
patients by the qualitative studies. The aim was that each scale con-
sisted of a set of clinically meaningful items that satisfied require-
ments for measurement. This goal was achieved by choosing a set of
items hypothesized to constitute a scale for each area, analysing the
observed data against measurement criteria and making decisions
on item selection and deletion. Appraisals according to these criteria
were not conducted singularly and sequentially, but simultaneously
and interactively within the specific context of the item set being
examined. The seven measurement criteria were:
Clinical meaning. We examined all items in each set to judge the
extent to which they were clinically cohesive. Items deemed non-
specific were considered for deletion.
Thresholds for item response options. For each item, the use of
response categories scored with successive integer scores (1 = not
at all to 5 = extremely) implies a continuum of increasing impact.
We tested this by examining the ordering of thresholds (the points of crossover
between two adjacent response categories) ascertained by the
Rasch analysis (Andrich, 1978). A threshold is the point on the
measurement continuum defined by a scale (e.g. degree of muscle
stiffness) at which the probability of responding to adjacent cate-
gories (e.g. ‘not at all’ and ‘a little’) is equal. Disordered thresholds
imply scoring functions that are not working as intended (Andrich,
1978). Such items were considered for deletion.
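To illustrate what a threshold is, the following minimal Python sketch computes category probabilities under a polytomous Rasch model and checks whether thresholds are ordered. The function names and threshold values are hypothetical, chosen for illustration only; they are not estimates from this study, nor the software used for it. Categories are indexed from 0 here.

```python
import math

def category_probabilities(theta, thresholds):
    """Probability of each response category (0..m) for a person at `theta`.

    thresholds[k] is the location at which categories k and k+1
    are equally likely (polytomous Rasch model).
    """
    # Exponent for category k is the running sum of (theta - threshold).
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(thresholds):
    """True when each threshold lies above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def modal_category(theta, thresholds):
    """Index of the most probable category at `theta`."""
    probs = category_probabilities(theta, thresholds)
    return probs.index(max(probs))

# Hypothetical thresholds for a five-category item (illustrative values).
ORDERED = [-2.0, -0.5, 0.5, 2.0]
DISORDERED = [-2.0, 0.6, -0.4, 2.0]  # second and third thresholds reversed
```

With the disordered example, the middle category is never the most probable response at any point on the continuum, which is one way of seeing that the scoring function is not working as intended.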
Item fit statistics. Rasch analysis tests the extent to which the
observed data (patients’ responses to items) accord with (fit) the
responses expected by a mathematical (Rasch) model. Misfit implies
an item is not working as intended in a scale, and may be regarded as
not measuring the construct under consideration. There are many
methods of examining the fit of data to the model, and no single method
is sufficient to make a judgement about fit. We examined three
indicators. First, log residuals that summarize the difference between
observed and expected responses to an item across all people
(item–person interaction). Second, chi square values that summarize
the difference between observed and expected responses to an item
for class intervals of people who have relatively similar levels of
disability (item–trait interaction). Third, item characteristic curves
(ICC) that display graphically the expected responses across the
continuum of person scores and the observed values for each
class interval of person scores. There are no absolute criteria for
interpreting fit statistics. It is more meaningful to interpret them
together, and in light of the clinical usefulness of an item set.
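The two numerical fit indicators can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact statistics reported by the Rasch software used in the study: the residual is standardized by the model variance of the item score, the chi square sums squared observed-minus-expected totals over class intervals of equal size, and all names and values are hypothetical. Categories are indexed from 0.

```python
import math

def expected_and_variance(theta, thresholds):
    """Model-expected item score and its variance for a person at `theta`."""
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def standardized_residual(response, theta, thresholds):
    """(observed - expected) / model standard deviation, for one person and item."""
    mean, var = expected_and_variance(theta, thresholds)
    return (response - mean) / math.sqrt(var)

def item_trait_chi_square(responses, thetas, thresholds, n_groups=3):
    """Compare observed and expected item totals within class intervals.

    People are sorted by location and cut into roughly equal groups;
    each group contributes (observed - expected)^2 / variance.
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    size = math.ceil(len(order) / n_groups)
    chi_square = 0.0
    for start in range(0, len(order), size):
        group = order[start:start + size]
        stats = [expected_and_variance(thetas[i], thresholds) for i in group]
        observed = sum(responses[i] for i in group)
        expected = sum(mean for mean, _ in stats)
        variance = sum(var for _, var in stats)
        chi_square += (observed - expected) ** 2 / variance
    return chi_square
```

Large residuals or a large chi square would flag an item for closer inspection alongside its item characteristic curve, in keeping with the point that no single indicator is sufficient.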
Item locations. The items of a scale define the continuum on which
people are measured. Rasch analysis locates items and people on this
continuum. Ideally, and logically, items should be evenly spread over
a reasonable range and targeted to the people they are measuring.
Items with similar locations on the continuum indicate that one of
them might be redundant.
Differential item functioning (DIF). Stable measurement rulers are
required for people to be measured precisely and validly (Linacre
et al., 1994). That is, the items of a scale are required to perform
similarly across different groups of people. More specifically, for any
given level of disability the expected value of an item is required to be
the same irrespective of which group a person belongs to. We exam-
ined all items for the extent to which their functioning was differ-
entially affected by gender, age, mobility level (unaided, with aid and
wheelchair user) and degree of spasticity (self-reported as minimal,
mild, moderate or severe). Items demonstrating DIF, determined by
statistical (ANOVA) and visual (ICC) tests, were considered for
deletion (Hagquist and Andrich, 2004).
Correlations between standardized residuals. A residual is the dif-
ference between the observed and expected response for a person to
an item. A standardized residual is computed by squaring and sum-
ming all residuals for an item and dividing this value by its standard
deviation. Correlations between residuals assess the extent to which
the response to one item is biased by the response to another.
Values of >0.30 imply dependency among items and were used to
identify items for evaluation (Andrich, 1988).
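The dependency check can be sketched in a few lines of Python. This is an illustrative sketch only: the residual values and item labels are invented, and the helper names are hypothetical. It computes Pearson correlations between the residuals of each pair of items and flags pairs above the 0.30 cut-off.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dependent_pairs(residuals, cutoff=0.30):
    """Return (item, item, r) for item pairs whose residuals correlate above `cutoff`.

    `residuals` maps an item label to one residual per person.
    """
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, round(r, 2)))
    return flagged

# Invented residuals for five people on three items (illustration only).
EXAMPLE = {
    "item_a": [1.0, -0.5, 0.3, -1.2, 0.8],
    "item_b": [0.9, -0.4, 0.2, -1.0, 0.7],
    "item_c": [-0.2, 0.9, -1.1, 0.1, 0.4],
}
flagged = dependent_pairs(EXAMPLE)
```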
Person separation index (PSI). This reliability statistic, analogous
to Cronbach’s alpha (Andrich, 1982), quantifies the error associated
with the measurements of people in this sample. Higher values indi-
cate greater reliability. When items were deleted the impact on relia-
bility was determined.
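As a worked illustration, the person separation index can be computed as the proportion of the observed variance in person estimates that is not error variance. The sketch below assumes the error variance is taken as the mean squared standard error of the person estimates; the function name and example values are hypothetical.

```python
def person_separation_index(estimates, standard_errors):
    """(total variance - error variance) / total variance of person estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    total_variance = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    error_variance = sum(se ** 2 for se in standard_errors) / n
    return (total_variance - error_variance) / total_variance

# Invented person locations (logits) and standard errors, for illustration.
psi = person_separation_index([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5)
```

In this invented example the total variance is 2.5 and the error variance 0.25, so the index is 0.9.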
Stage 3: second postal survey
The remaining half-sample of CAMS patients was
surveyed, excluding 13 local people participating in another study.
This sample was divided into random half samples that received
booklets containing the new spasticity scale, a self-report spasticity
grading (0 = minimal; 1 = mild; 2 = moderate; 3 = severe), demo-
the Multiple Sclerosis Impact Scale (MSIS-29; Hobart et al., 2001b)
and the physical functioning (SF36PF) and mental health (SF36MH)
subscales from the Short Form Health Survey (SF-36; Ware et al.,
1993). Booklet 2 contained the mobility (FAMSmob) and emotional
well-being (FAMSewb) subscales of the Functional Assessment of
Gompertz et al., 1994) and General Health Questionnaire (GHQ-12;
Goldberg and Hillier, 1979). Standard survey methods were used.
All analyses described above were repeated. In addition, internal
construct validity was examined by computing intercorrelations
among subscales of the new spasticity instrument and by determin-
ing the ability of the subscales to detect differences between groups
defined by their self-report spasticity grading. Convergent and dis-
criminant construct validity was examined by determining the
extent to which correlations between the new spasticity instrument
and validating variables were consistent with expectation. These
methods are described elsewhere (Hobart et al., 1996; Hobart et al.,
2001a; Scientific Advisory Committee of the Medical Outcomes
Stage 1: conceptual model formation and
Seventeen interviews (75% female; mean age 47 years) were
conducted until no new information was extracted. There
were three focus groups (71% female, mean age 54 years)
that included a total of 14 people. Expert opinion was can-
vassed from neurologists, multiple sclerosis nurses, spasticity
nurses and rehabilitation therapists. Content analysis of the
interview and focus group transcripts generated ?2000
statements concerning the impact of spasticity. These were
extracted, grouped into main themes and examined for
This qualitative work generated a preliminary conceptual
model of spasticity impact and, on the basis of that model,
a preliminary questionnaire with 144 items was developed.
Three main domains (symptoms, physical functioning and
psychosocial functioning) were identified, with a total of
8 subscales: muscle stiffness (19 items); pain and discomfort
(10 items); muscle spasms (23 items); activities of daily living
(ADL) (14 items); body movements (21 items); walking
(16 items). All items were given the same five-point response
options: 1 = not at all bothered; 2 = a little bothered; 3 =
moderately bothered; 4 = quite a bit bothered and 5 = extremely bothered.
Items were pre-tested in an independent sample of 17 out-
patients and in-patients (NHNN) with varying levels of spas-
ticity. Appropriate modifications were made and demo-
graphic questions were included in the booklet. At this
early stage, all 144 items were retained and put into the
most clinically appropriate grouping even though a number
of items were considered non-specific indicators of that con-
struct. For example, we put the item ‘bothered by heaviness
anywhere in your limbs’ in the subscale concerning muscle
stiffness, although we were unsure that it would be part of the
final operationalization of that construct.
Stage 2: first postal survey
Questionnaire booklets were sent to 272 people, and 240 were
returned completed (conservative response rate 88%). Table 1
shows the respondents’ characteristics.
The main finding was that empirical analysis using the Rasch
measurement model did not support the five-point item
response option. Most items (132 of 144) had disordered
thresholds implying the scoring function was not working
as anticipated. The category probability curves (CPC), which plot the
measurement continuum on the x-axis and the probability
of endorsing each item response category on the y-axis, suggested
the main reason for disordering was that patients
could not discriminate reliably between the five response
options. In particular, people appeared unable to reliably
distinguish ‘a little’ from ‘moderately’, and ‘moderately’
from ‘quite a bit’.
Given this finding we undertook a post hoc analysis. First,
we examined the effect of reducing the response options from
five to four by combining ‘moderately’ with either ‘a little’ or
‘quite a bit’, as suggested by each item’s CPC. This left seven
items with disordered thresholds. With items re-scored in
this manner, preliminary Rasch analyses of the hypothesized
item groups were performed and supported the feasibility
of constructing valid subscales for the eight components.
However, post hoc analyses make assumptions about how
people would have responded if a category had not been
available. Therefore, in stage 3, we repeated the 144-item
survey in an independent sample with a four-point item
response option (1 = not at all; 2 = a little; 3 = moderately
and 4 = extremely).
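The post hoc re-scoring described above can be sketched as a simple recoding step. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the neighbour to merge with is assumed to be chosen per item from its CPC.

```python
def collapse_moderately(score, merge_with):
    """Rescore a five-point response (1-5) on a four-point scale (1-4).

    'moderately' (3) is merged with either 'a little' (2) or
    'quite a bit' (4); higher categories shift down to stay consecutive.
    """
    if merge_with == "a little":
        mapping = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}
    elif merge_with == "quite a bit":
        mapping = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4}
    else:
        raise ValueError("merge_with must be 'a little' or 'quite a bit'")
    if score not in mapping:
        raise ValueError("score must be an integer from 1 to 5")
    return mapping[score]

merged_low = [collapse_moderately(s, "a little") for s in range(1, 6)]
merged_high = [collapse_moderately(s, "quite a bit") for s in range(1, 6)]
```

Such a recoding shows why a fresh survey was still needed: it assumes that people who chose a merged category would have chosen the combined category had only four options been offered.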
Stage 3: second postal survey
Questionnaire booklets were sent to 259 people. Random
half samples received booklets 1 and 2. A total of 202 people
returned completed questionnaire booklets (78% response
rate). Table 1 shows their characteristics. In essence, this
was an older sample of people with multiple sclerosis with
moderate-long disease duration, half were wheelchair users,
and most reported their spasticity to be moderate (32%) or
The final decision as to which items should remain in each
were considered for inclusion in the muscle spasms subscale.
Four of these items were eliminated because they were con-
to more, of the degree of muscle spasms. These items were:
‘juddering/jolting related to spasms’; ‘feet or legs bouncing up
handshake’ and ‘spasms leading to difficulty giving hugs’.
The remaining 19 items were entered into a Rasch analysis.
Four of these items had reversed thresholds (‘spasms waking
up your partner’; ‘spasms that are difficult to stop’; ‘spasms
provoked by temperature change’ and ‘feeling that your
knees are stuck together’) indicating that the four-point
response option was not working as intended for these
items. Although these items appear clinically important,
they were removed because of the reversed thresholds and
because other items in the set had similar locations and, therefore,
they could be regarded as redundant in measurement
terms. Another item, ‘spasms when transferring’, demon-
strated DIF in different mobility groups. That is, this item
had a different meaning for people with different levels of
mobility (even though they had the same total score on the
subscale) and was therefore unstable in measurement terms.
This item was removed because of this problem, and also
because it had a location similar to other items in the set,
Table 1 Characteristics of survey samples
Characteristic First postal
returned: n (%)
Female: n (%)
Mean (SD); range
Duration of multiple sclerosis (since onset, patient report)
Mean (SD); range18 (8.9); 5–50
Self-reported degree of spasticity
Minimal 10 (7.6%)
Mild 7 (5.3%)
Moderate 35 (26.5%)
Mean (SD); range22.2 (9.9); 2–52
164 (68%) 121 (63%)
53 (7.6); 32–67 54 (7.0) 35–68
21 (8.6); 5–44
19.4 (11.1); 1–57
last score recorded in the CAMS study for that patient.
and could be regarded as redundant. The remaining 14 items
appeared to constitute a clinically meaningful set, relating to
muscle spasms, and satisfied the pre-determined criteria as a
measurement instrument. Details of the complete instrument
development process are available from the authors.
Tables 2–5 show for all subscales the item locations, standard
errors and fit statistics (fit residuals and chi square values),
and the subscale reliabilities. For each subscale, the item loca-
tions spread across a reasonable range of their continua, the
standard errors were small, almost all log residuals lay within
the recommended range of −2.5 to +2.5 and chi squared
statistics were small. All person separation indices were
high (≥0.92). These findings support the reliability and
validity of each MSSS-88 subscale. Table 8 shows the distributions
full subscale range and floor and ceiling effects were less than
the recommended maximum of 20% (McHorney and Tarlov,
1995). However, the three physical functioning subscales had
larger floor effects than the other subscales (range 11–19.8%).
Correlations among subscales, and with other measures and
variables. Tables 6 and 7 show correlations among MSSS-88
subscales, and between MSSS-88 subscales, validating instru-
ments and descriptive variables. The magnitude and pattern
of these correlations was generally consistent with expecta-
tions based on the constructs perceived to be measured by
the instruments. This provides further evidence, albeit
circumstantial, for the validity of the MSSS-88. For example,
correlations among MSSS-88 subscales range from 0.35 to
0.83 (12–69% shared variation), implying the eight subscales
measured related but discrete constructs. Correlations
between MSSS-88 subscales and the seven validating instru-
ments collected at the same point in time were broadly con-
health and social functioning subscales correlated most highly
with the MSIS-29 psychological impact subscale, SF-36 MH
subscale and GHQ-12.
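The shared variation quoted for the inter-subscale correlations is the squared correlation expressed as a percentage. A short check of that arithmetic (the function name is illustrative):

```python
def shared_variance_percent(r):
    """Percentage of variance shared by two measures with correlation r."""
    return 100 * r ** 2

lowest = shared_variance_percent(0.35)   # about 12%
highest = shared_variance_percent(0.83)  # about 69%
```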
Table 6 also shows the correlations between MSSS-88 sub-
scales, self-report degree of spasticity, indoor mobility level,
correlations were consistent with expectation. For example,
correlations with age and gender were low (range −0.15 to
+0.09), whereas those with a four-point patient-reported
grading of spasticity severity (minimal, mild, moderate and
severe) were moderate (0.35–0.51).
Group differences validity. Table 8 reports the mean
MSSS-88 locations for people who graded their spasticity as
a stepwise and statistically significant increase in mean value
associated with increasing self-reported spasticity severity.
The aim of this study was to develop a scale for measuring
patients’ perceptions of the impact of spasticity in multiple
sclerosis. The resulting instrument, the 88-item Multiple
Table 2 MSSS-88: muscle stiffness and pain and discomfort scales
Item Abbreviated item label Location SE Fit statistics
Muscle stiffness (reliability* = 0.95)
1 When walking
2 Anywhere—lower limbs
3 Same position long time
4 First thing in morning
5 Tightness anywhere in lower limbs
6 Lower limbs feeling rigid
7 When standing up
8 Tightness in muscles
9 That is unpredictable
10 Feeling muscles pulling
11 In your whole body
12 Whole body feeling rigid
Pain and discomfort (reliability 0.95)
1 Restricted and uncomfortable
2 Uncomfortable sitting-long time
4 Pain—when in same position for too long
5 Uncomfortable lying down for long time
6 Difficulties—comfortable position to sleep
7 Pain—muscles—getting out of bed-morning
8 Pain—muscles provoked by movement
9 Constant pain in muscles
*Person separation index = true variance (total variance − error variance)/total variance.
Table 3 MSSS-88: muscle spasms and ADL scales
Item Abbreviated item label Location SE Fit statistics
Residual Chi square
Muscle spasms (reliability 0.93)
Spasms—come on unpredictably
Powerful or strong spasms
Spasms—first getting out of bed—morning
Spasms provoked by changing positions
Provoked by movement
Where your legs kick out in front of you
Provoked by certain positions
Spasms disturbing sleep
When doing certain tasks
When travelling over bumps or cobbles
Where your knees pull up
Causing legs to hit things
Provoked by touch
Pushing you out of chair or wheelchair
Activities of Daily Living (ADL; reliability 0.95)
1 Putting on your socks or shoes
2 Housework such as cooking/cleaning
3 Getting in and out of a car
4 Getting in and out of shower/bath
5 Sitting up in bed
6 Getting into or out of bed
7 Turning over in bed
8 Getting into or out of a chair
9 Getting dressed or undressed
10 Getting on or off the toilet seat
11 Drying yourself with a towel
Table 4 MSSS-88: walking and body movements scales
Item Abbreviated item label Location SE Fit statistics
Residual Chi square
Walking (reliability 0.96)
Difficulties walking smoothly
Being slow when walking
Having to concentrate on your walking
Having to increase effort to walk
Being slow going up and down stairs
Being clumsy when walking
Tripping over/stumbling when walking
Feeling like walking through treacle
Losing your confidence to walk
Feeling embarrassed to walk
Body movement (reliability 0.96)
Difficulties moving freely
Difficulties moving smoothly
Limited range of movement
Difficulties moving parts of your body
Difficulties bending your limbs
Your body being resistant to movement
Your body or limbs feeling locked
Awkward or jerky movement
Difficulties straightening your limbs
Difficulties relaxing parts of your body
No control over your body
Sclerosis Spasticity Scale (MSSS-88; see Appendix, supplementary
material), attempts to quantify the impact of
spasticity in eight clinically relevant areas: three spasticity
specific symptoms (muscle stiffness, pain and discomfort
and muscle spasms), three areas of physical functioning
(ADL, walking, body movements), emotional health and
social functioning. This patient-derived model of the impact
of spasticity highlights the complexity of an apparently
unidimensional clinical concept. It can also be viewed as a
framework for evidence-based management in the develop-
ment of integrated care pathways, guidelines for care
(Multiple Sclerosis Council for Clinical Practice Guidelines,
2003) and service development.
Does the MSSS-88 have too many items to be clinically
useful and are the specialized skills required for Rasch
analysis justified? Three questions underpin these concerns.
Table 5 MSSS-88: emotional health and social functioning scales
Item Abbreviated item label Location SE Fit statistics
Residual Chi square
Emotional health (reliability 0.96)
Social functioning (reliability 0.95)
Feeling less confident in yourself
Loss of self-worth
Feeling like a failure
Difficulties going out
Difficulties finding energy for others
Feeling reluctant to go out
Feeling less sociable
Difficult family relationships
Difficulties interacting with people
Table 6 Correlations among MSSS-88 scales, and between MSSS-88 scales and other variables
MSSS-88 scaleMSSS-88 scale
Pain and discomfort
Degree of spasticity*
Duration of multiple sclerosis
*Grading of degree of spasticity: 0 = minimal; 1 = mild; 2 = moderate; 3 = severe. †Indoor mobility grading: 1 = walks unaided; 2 = walks
with an aid; 3 = uses a wheelchair.
First, why are there so many items? Second, what evidence is
available and what mechanisms are in place to ensure clinical
usability? Third, do the clinical advantages of Rasch analysis
outweigh the necessity for specialized knowledge and software?
First, the reason there are so many items is the need for
breadth, range and precision of measurement. The qualitative
phase of the study identified that the clinically appropriate
breadth of measurement was eight subscales. For each of
these eight subscales, adequate measurement range requires
the two most extreme scale items to be well separated. Mea-
surement precision is determined by the number of units into
which the range is divided and is defined mainly by the
number of items of the scale. In addition, the number of
items in a scale determines the confidence intervals around
an individual patient’s estimate (Wright and Masters, 1982),
and a reasonable number of items are required to ‘anchor’ the
construct measured by any scale. As clinicians and clinical
trials require scales that give precise estimates of people’s
locations on the continuum that are also able to detect
clinically significant change, scales need to have a reasonable
number of items located at regular intervals across a substan-
tial range. Thus, at this stage of scale development, we were
reluctant to reduce the number of items further.
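The link between the number of items and measurement precision can be illustrated with a short sketch. Under the Rasch model, the standard error of a person estimate is the reciprocal of the square root of the summed item information, so more well-targeted items give narrower confidence intervals. The item calibrations below are invented (evenly spaced locations with a fixed threshold spread), and the function names are hypothetical; this is not a calculation from the study data.

```python
import math

def score_variance(theta, thresholds):
    """Item information at `theta`: the model variance of the item score."""
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def standard_error(theta, items):
    """Standard error of the person estimate at `theta` for a set of items."""
    information = sum(score_variance(theta, t) for t in items)
    return 1.0 / math.sqrt(information)

def make_items(n_items, low=-2.0, high=2.0, spread=(-1.5, 0.0, 1.5)):
    """Invented four-category items with locations evenly spaced from low to high."""
    step = (high - low) / (n_items - 1)
    return [[low + i * step + s for s in spread] for i in range(n_items)]

se_7 = standard_error(0.0, make_items(7))
se_14 = standard_error(0.0, make_items(14))
# Approximate 95% confidence interval around a person located at 0 logits.
interval_14 = (0.0 - 1.96 * se_14, 0.0 + 1.96 * se_14)
```

Doubling the number of items over the same range shrinks the standard error, and therefore the confidence interval, for a person in the middle of the continuum.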
Second, the available evidence suggests
that 88 items is not too many for clinical usefulness. The two
postal surveys in this study, our surveys in non-trial samples
of multiple sclerosis and other neurological disorders, and
the work of others in health measurement have consistently
demonstrated high response rates and data completeness
despite large numbers of items. Nevertheless, we are aware
that the MSSS-88 may be used as one of many outcomes and
we have, therefore, constructed it to be flexible and adaptable
to meet different measurement needs. For example, a clinical
trial may be interested primarily in the impact of a treat-
ment on spasticity symptoms. In contrast, a clinical service
may be more interested in functional outcomes. Here, we
recommend the use of the most appropriate MSSS-88
subscale/s to address the measurement question. This is pos-
sible because each subscale is a stand-alone measurement
In other situations, such as the evaluation of a busy multi-
disciplinary spasticity service, clinicians or researchers may
wish to measure all eight areas but feel that 88 items is too
the MSSS-88. That is, a selection of the most clinically appro-
priate items from each subscale. For example, users could
choose items 1, 3, 5, 8, 10, 13 and 14 from the Muscle Spasms
Table 7 MSSS-88: correlations with validating scales
Scale MSSS-88 scale
mob = mobility; ewb = emotional well-being.
Table 8 MSSS-88: Group difference and relative validity
Scale Person measures
mean (SD); range
Patient-reported spasticity severity F(p)
n = 43–46
n = 59–60
n = 81–83
+0.69 (2.04); −4.63 to +5.12
+0.11 (2.17); −4.55 to +4.23
−0.46 (1.77); −4.38 to +4.31
−1.45 (2.74); −4.92 to +5.21
+1.72 (2.34); −5.35 to +4.61
+1.12 (2.45); −5.16 to +5.05
−0.53 (2.18); −4.93 to +4.28
−0.59 (1.72); −3.54 to +3.66
F = ratio of between-groups to within-groups variance; ceiling = % scoring minimum value = minimum disability; floor = % scoring maximum
value = maximum disability. *P < 0.001. †Mean person measure for subgroup defining their spasticity as minimal or mild. ‡Walking scale
analyses only involved people who could walk, hence the smaller n: 26, 45 and 39, respectively.
subscale to give a seven-item short form whose item locations
are evenly spread across the continuum. This approach is
possible because the item locations of each subscale are cali-
brated with respect to each other. Consequently, investigators
can use any subset of items from any subscale and generate
results that are referable to the long form version of that scale
(Choppin, 1968; Wright, 1977). It is, however, important to
be aware that scales with few items have limited precision
(unless their range is very restricted) and are less able to
detect small but clinically meaningful change.
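The trade-off between short forms and precision can be illustrated with a minimal sketch. It assumes the dichotomous Rasch model for simplicity (MSSS-88 items have ordered response categories, which need the rating formulation cited elsewhere in this paper), and the item locations, responses and function names below are hypothetical, not MSSS-88 calibrations. Given fixed (anchored) item locations, the person measure is estimated by maximum likelihood and its standard error shrinks as more items are used:

```python
import math
from typing import List, Sequence, Tuple


def _probabilities(theta: float, item_locations: Sequence[float]) -> List[float]:
    # Probability of a positive response to each item under the dichotomous Rasch model.
    return [1.0 / (1.0 + math.exp(-(theta - b))) for b in item_locations]


def rasch_person_measure(
    responses: Sequence[int],
    item_locations: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-8,
) -> Tuple[float, float]:
    """Return (person measure in logits, standard error) for anchored item locations."""
    if len(responses) != len(item_locations):
        raise ValueError("one response per item is required")
    raw = sum(responses)
    n = len(responses)
    if raw == 0 or raw == n:
        raise ValueError("extreme score: the ML estimate is not finite")
    theta = math.log(raw / (n - raw))  # starting value from the raw-score log-odds
    for _ in range(max_iter):
        probs = _probabilities(theta, item_locations)
        info = sum(p * (1.0 - p) for p in probs)   # test information at theta
        step = (raw - sum(probs)) / info           # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    info = sum(p * (1.0 - p) for p in _probabilities(theta, item_locations))
    return theta, 1.0 / math.sqrt(info)


# Hypothetical item locations (logits): a 4-item short form and an 8-item longer form.
short_items = [-1.5, -0.5, 0.5, 1.5]
long_items = short_items * 2
theta_short, se_short = rasch_person_measure([1, 1, 0, 0], short_items)
theta_long, se_long = rasch_person_measure([1, 1, 0, 0] * 2, long_items)
```

In this sketch both forms place the person at the same location on the same metric, but the longer form has the smaller standard error, which is the sense in which fewer items limit the ability to detect small changes.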
Third, Rasch analysis offers four clinically meaningful sci-
entific advantages that we believe far outweigh concerns
about the necessity for specialized knowledge and software:
(i) it offers clinicians the ability to construct interval-level
measurements from ordinal-level rating scale data, thereby
addressing a major concern of using rating scales as outcome
measures (Whitaker et al., 1995; Platz et al., 2005); (ii) it
enables clinicians to obtain estimates suitable for individual
person analyses rather than only for group comparison studies;
(iii) it enables clinicians to use subsets of items from each
subscale rather than all items from the scale (detailed above),
yet still be able to compare scores using different sets of items;
(iv) missing item data can be handled scientifically, rather
than on the basis of assumption, because Rasch analysis com-
missing data to be imputed.
Nevertheless, Rasch analysis appears complicated, is not
widely used, and there are few clinicians and researchers
trained in its use and interpretation. For this reason we
offer three methods for computing MSSS-88 subscale scores.
In the first method, item scores can be summed, without
weighting or standardization, to generate ordinal-level total
scores just as any other Likert-type scale. Missing responses to
items can be replaced with the mean score of the items com-
pleted (person-specific item mean score) provided that 50%
or more of the items in a scale have been completed (Ware
et al., 1993). In the second method of computing MSSS-88
subscale scores, the ordinal summed scores generated above
can be transformed into interval-level measurements using
conversion tables that can be made available with the scales.
In the third method of computing MSSS-88 scores, investi-
gators can Rasch analyse their own data. Furthermore, if these
analyses use (anchor) the item and threshold locations from
our dataset, available on request, people in the new sample
will be measured on an identical interval-level metric to the
one we have constructed.
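The first scoring method can be sketched as follows. This is an illustrative sketch only: the function name and the example responses are hypothetical, missing responses are represented as `None`, and the 50% completion rule is applied as stated above.

```python
from typing import Optional, Sequence


def subscale_sum_score(
    responses: Sequence[Optional[int]],
    min_completed_fraction: float = 0.5,
) -> Optional[float]:
    """Ordinal summed score for one subscale, or None if too few items were completed.

    Missing responses (None) are replaced with the person-specific mean of the
    completed items, provided at least `min_completed_fraction` of the items
    in the subscale were completed.
    """
    if not responses:
        return None
    completed = [r for r in responses if r is not None]
    if len(completed) / len(responses) < min_completed_fraction:
        return None  # too much missing data to impute
    person_mean = sum(completed) / len(completed)
    n_missing = len(responses) - len(completed)
    # Unweighted sum of observed scores plus the imputed value for each missing item.
    return sum(completed) + n_missing * person_mean


# Hypothetical example: four items, one missing response.
score = subscale_sum_score([1, 2, None, 3])
```

The resulting total is still ordinal; converting it to an interval-level measurement requires the conversion tables or a Rasch analysis, as described for the second and third methods.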
Is the use of ordinal summed MSSS-88 scores justified and
an advance over another ordinal scale? It is justified because
the very use of integer and total scores depends on the data
conforming to the Rasch model (Andrich, 1978). Whether or
not they do is an empirical question that is answered by
a Rasch analysis. In many situations scores on items are
simply summed, but it is usually done by assumption without
checking thoroughly that it is a valid procedure and whether
it is justified to use integer scores for the successive item res-
ponse categories. Such thorough checks cannot be achieved
using traditional psychometric methods (Massof, 2002). This
is one advance over other ordinal scales; another advance is
the linearization of scores that follows directly from the
The evidence that Rasch analysis transforms ordinal
summed scores into interval-level measurements lies in the
properties of the mathematical model (Rasch, 1961; Wright
and Stone, 1979; Andrich, 1988). Effectively, the difference
between (comparison of) any two people, any two items,
or any one person and one item, is defined by the logarithm
of the relative probabilities. In essence, albeit an oversimplification,
the observed scores in the data matrix are replaced by
the expected probabilities of occurrence, and relative differ-
ences computed as ratios of the relative probabilities (as these
are consistent indicators of relative differences). This ratio of
the relative probabilities is then expressed on a linear scale in
an additive form by taking logarithms. In addition, it can be
proven mathematically that the summed score is the sufficient
statistic for estimating the item and person locations, and the
estimation of these locations is independent of each other. As
such, the Rasch model is able to transform summed scores
into linear measures of persons and items that are on the same
scale with a common unit and freed up from the distributional
properties of each other. Thus the Rasch model realizes,
mathematically, the requirements for scientific measurement
(Rasch, 1961; Massof, 2002; Andrich, 2004): invariant com-
parisons of people, and items, on the same linear scale.
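The argument above can be written out for the simplest case. The following is a sketch for the dichotomous Rasch model only, with notation chosen here for illustration (person location \beta_n, item location \delta_i, response x_{ni}); rating scales with ordered categories require the extended formulation (Andrich, 1978).

```latex
% Dichotomous Rasch model: person n with location \beta_n, item i with location \delta_i
\begin{align}
  % Probability of a positive response
  \Pr(X_{ni}=1) &= \frac{\exp(\beta_n-\delta_i)}{1+\exp(\beta_n-\delta_i)} \\
  % Taking logarithms of the ratio of probabilities gives an additive, linear form
  \ln\frac{\Pr(X_{ni}=1)}{\Pr(X_{ni}=0)} &= \beta_n-\delta_i \\
  % Comparing two persons n and m on the same item: the item location cancels
  \ln\frac{\Pr(X_{ni}=1)}{\Pr(X_{ni}=0)} - \ln\frac{\Pr(X_{mi}=1)}{\Pr(X_{mi}=0)} &= \beta_n-\beta_m \\
  % The summed score is the sufficient statistic for the person location
  r_n &= \sum_{i} x_{ni}
\end{align}
```

The third line shows the invariance property in this simple case: the comparison of two people does not depend on which item is used to make it.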
Unfortunately, Rasch analysis is applicable only to multiple
item scales, not single item scales such as the Expanded Dis-
ability Status Scale (EDSS), Rankin scale, Hauser Ambulation
Index, EDMUS and Ashworth scale. It may be possible to
Systems if data satisfy the requirements of the Rasch model,
and it is considered clinically meaningful to combine scores
across the eight systems. Of note, Kurtzke did not think this
was clinically appropriate (Kurtzke, 1961).
It is important to clarify what is implied by scores and score
differences. In estimating the linear person measurements, it
is not implied that, say, pain at one part of the continuum
is twice that at another part of the continuum. That would
require a natural zero point. Second, it is entirely possible that
at different parts of the same continuum, there may be some
qualitatively different responses or reactions. By analogy, in
heating water more and more quantitatively, there is a qua-
litative reaction as it starts to boil. The investigation of pos-
sible qualitative differences on the continuum is for further
clinical study having constructed a quantitative scale.
In this study we took an approach to scale development
that is recommended strongly (Andrich, 2002a, b) but differs
somewhat from the approach adopted by others. Specifically,
we first developed and used a conceptual model to define the
areas for subscale development and then used an explicit
mathematical model to guide subscale development in each
of these areas. It is more typical for developers of health rating
scales to use statistical techniques, such as factor analysis of an
item pool, to define the areas for subscale development, and
then traditional psychometric methods of testing reliability
and validity to refine those subscales. Using factor analysis
ing to their intercorrelations and, thus, makes the assumption
that correlations between items indicate the extent to which
they measure the same thing. This is an oversimplification
(Duncan, 1985). Furthermore, factor analysis is strongly
influenced by sample sizes (Nunnally and Bernstein, 1994),
the number and type of items analysed, and the targeting of
items and persons. The advantage of using an explicit math-
ematical model to guide scale construction is that it enables
sophisticated checks on the internal validity and consistency
of the scores, as well as the construction of stable linear
measurement systems (Wright, 1977; Linacre et al., 1994;
Massof, 2002; Andrich, 2004).
The three physical subscales of the MSSS-88 have larger
floor effects than the other subscales, implying that it may be
beneficial to extend their range of measurement in the future.
This can be achieved without affecting the subscales as they
other. Furthermore, future scale developments can be empiri-
cally driven. The distribution of the relative item locations
(Tables 2–5) shows the ‘gaps’ in each subscale (notable dis-
tances between item locations), which developers may wish to
fill, and the distribution of person measurements shows when
it may be valuable to extend the continuum in either direc-
tion. Tables 2–5 indicate the nature of items either side of the
gaps, and those that define upper and lower limits of mea-
surement. Consequently, potential items that are appropriate
to those locations can be generated and examined in small
samples to get reasonably accurate preliminary calibrations
before needing to undertake more definitive surveys.
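Locating such gaps is straightforward once item locations are available. The sketch below is illustrative only: the function name, the example locations and the gap threshold (in logits) are hypothetical and are not taken from Tables 2–5.

```python
from typing import List, Sequence, Tuple


def find_location_gaps(
    item_locations: Sequence[float],
    min_gap: float = 0.5,
) -> List[Tuple[float, float, float]]:
    """Return (lower, upper, width) for adjacent item locations more than min_gap apart."""
    ordered = sorted(item_locations)
    gaps = []
    # Compare each item location with its nearest neighbour higher on the continuum.
    for lower, upper in zip(ordered, ordered[1:]):
        width = upper - lower
        if width > min_gap:
            gaps.append((lower, upper, width))
    return gaps


# Hypothetical item locations (logits) for one subscale.
example_gaps = find_location_gaps([-2.0, -1.75, 0.0, 0.25, 2.0], min_gap=1.0)
```

Each reported interval marks a region of the continuum where new candidate items could be targeted; what counts as a notable distance remains a judgement for the developer.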
This study has limitations. The use of a controlled clinical
study population, perhaps more motivated than the general
multiple sclerosis population, might give a false impression
of the utility of an 88-item scale in everyday practice. We
discussed earlier circumstantial evidence that this may not
be the case. Other limitations are that this work contributes
little to our understanding of the relationship between self-
potential implications of study medications on the findings.
The CAMS study finished 12–18 months before MSSS-88
measurements were collected, so concurrent Ashworth
measurements were not available and all patients were off
study medication. Nevertheless, we predict low correlations
between these two instruments (e.g. <0.50) because they
quantify very different clinical manifestations of spasticity
and capture different people’s perspectives. The Ashworth
scale is a clinician-based evaluation of a clinical sign (muscle
tone) at rest, whereas the MSSS-88 is a subjective assessment
of day-to-day spasticity symptoms and functional impact in
eight clinically separate areas. Consequently, the two instru-
ments should be viewed as complementary, not competing,
outcome measures for clinical trials.
Measurement of different manifestations also underpins
the finding of low correlations between neurophysiological,
biomechanical and clinical indicators of spasticity. Some
interpret these findings with surprise or disappointment
(Wood et al., 2005), or question the validity of one or
other or both measurement methods. However, low correla-
tions between different indicators of spasticity are predictable
and appropriate, and have a number of important implica-
tions. They highlight that the selection of outcome measures
underpins the meaningful interpretation of studies, that a
range of carefully selected and complementary outcomes
may need to be measured, that measures of one entity are
unlikely to be adequate surrogate markers of another (e.g.
MRI and disability) and that relying on correlations to vali-
date scales can be limited.
The MSSS-88 represents an attempt to conceptualize and
measure the impact of a complex neurological problem from
the patients’ perspective. Qualitative research, clinical experi-
ence and sophisticated measurement methods have been
integrated to develop a scale that complements existing
methods of evaluating spasticity in multiple sclerosis. It has
great potential to advance the measurement and, thereby,
management of this disabling clinical problem. Further examination
and responsiveness testing are now required to understand
the clinical meaning of MSSS-88 scores and score
changes, and evaluations of the instrument in people with
non-multiple sclerosis spasticity will determine its applicabil-
ity as a generic instrument.
For a copy of the MSSS-88 scale, see Supplementary material
at Brain online.
The authors wish to thank the patients who participated and
Professor David Andrich for his input. Dr Hobart’s recent
sabbatical in Australia was supported by the Royal Society of
Medicine (Ellison-Cliffe Travelling Fellowship), the Peninsula
Medical School, the Multiple Sclerosis Society of Great
Britain and Northern Ireland, and the NHS Health Technol-
ogy Assessment Programme. However, the views and opi-
nions expressed are not necessarily those of the NHS
Executive. The Neurological Outcome Measures Unit is
supported by the De Lazlo Foundation.
Andrich D. A rating formulation for ordered response categories.
Psychometrika 1978; 43: 561–73.
KR20 index, and the Guttman scale response pattern. Educ Psychol Res
1982; 9: 95–104.
Andrich D. Rasch models for measurement. Beverley Hills, CA: Sage
Andrich D. A framework relating outcomes based education and the
taxonomy of educational objectives. Stud Educ Eval 2002a; 28: 35–59.
Andrich D. Implication and applications of modern test theory in the context
of outcomes based research. Stud Educ Eval 2002b; 28: 103–21.
Andrich D. Controversy and the Rasch model: a characteristic of
incompatible paradigms? Med Care 2004; 42: I7–I16.
Andrich D, Sheridan B, Luo G. RUMM 2020. Perth, WA: RUMM Laboratory
Pty Ltd; 1997–2004.
Cella DF, Dineen K, Arnason B, Reder A, Webster KA, Karabatsos G, et al.
Validation of the functional assessment of multiple sclerosis quality of life
instrument. Neurology 1996; 47: 129–39.
Choppin B. An item bank using sample free calibration. Nature 1968; 219:
Crayton H, Heyman R, Rossman H. A multimodal approach to managing the
symptoms of multiple sclerosis. Neurology 2004; 63: S12–S18.
Duncan OD. Probability, disposition and the inconsistency of attitudes and
behaviours. Synthese 1985; 42: 21–34.
Goldberg DP, Hillier VF. A scaled version of the General Health Question-
naire. Psychol Med 1979; 9: 139–45.
Gompertz P, Pound P, Ebrahim S. A postal version of the Barthel Index.
Clin Rehabil 1994; 8: 233–9.
Hagquist C, Andrich D. Is the sense of coherence instrument applicable on
adolescents? A latent trait analysis using Rasch modelling. Pers Individ Dif
2004; 36: 955–68.
Hobart JC. Rating scales for neurologists. J Neurol Neurosurg Psychiatr 2003;
Hobart JC, Lamping DL, Thompson AJ. Evaluating neurological outcome
measures: the bare essentials. J Neurol Neurosurg Psychiatr 1996; 60:
in multiple sclerosis: why basic assumptions must be tested. J Neurol
Neurosurg Psychiatr 2001a; 71: 363–70.
Hobart JC, Lamping DL, Fitzpatrick R, Riazi A, Thompson AJ. The multiple
sclerosis impact scale (MSIS-29): a new patient-based outcome measure.
Brain 2001b; 124: 962–73.
Kurtzke JF. On the evaluation of disability in multiple sclerosis. Neurology
1961; 11: 686–94.
Kuckartz U. WINMAX pro ’96 - scientific text analysis. Berlin: Scolari Sage
Linacre JM, Heinemann AW, Wright BD, Granger CV, Hamilton BB. The
structure and stability of the functional independence measure. Arch Phys
Med Rehabil 1994; 75: 127–32.
Manning W, Newhouse J, Ware JE Jr. The status of health in demand estima-
tion: or, beyond excellent, good, fair, and poor. In: Fuchs V, editor.
Economic aspects of health. Chicago: The University of Chicago Press;
1982. p. 143–84.
Massof R. The measurement of vision disability. Optom Vis Sci 2002; 79:
McHorney CA, Tarlov AR. Individual-patient monitoring in clinical
practice: are available health status surveys adequate? Qual Life Res
1995; 4: 293–307.
McHorney CA, Ware JE Jr, Rogers W, Raczek AE, Lu JFR. The validity
and relative precision of MOS short- and long-form health status
scales and Dartmouth COOP
Multiple Sclerosis Council for Clinical Practice Guidelines. Spasticity
management in multiple sclerosis. Consortium of multiple sclerosis
Nunnally JC, Bernstein IH. Psychometric theory. New York: McGraw-Hill;
Paty DW, Ebers GC. Clinical features. In: Paty DW, Ebers GC, editors.
Multiple sclerosis. Philadelphia: F.A. Davis company; 1998.
Platz T, Eickhof C, Nuyens G, Vuadens P. Clinical scales for the assessment
of spasticity, associated phenomena, and function: a systematic review of
the literature. Disabil Rehabil 2005; 27: 7–18.
Rasch G. On general laws and the meaning of measurement in psychology.
In: Neyman J, editor. Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability IV. Berkeley CA: University of
California Press; 1961. p. 321–34.
Rasch G. Probabilistic models for some intelligence and attainment tests.
Copenhagen: Danish Institute for Education Research; 1960. Reprinted
Chicago: University of Chicago Press; 1980.
Scientific Advisory Committee of the Medical Outcomes Trust. Assessing
health status and quality of life instruments: attributes and review criteria.
Qual Life Res 2002; 11: 193–205.
Sloan JA, Aaronson N, Cappelleri JC, Fairclough DL, Varricchio C, The
Clinical Significance Consensus Meeting Group. Assessing the clinical significance
of single items relative to summated scores. Mayo Clin Proc 2002;
Thompson AJ, Jarrett L, Lockley L, Marsden J, Stevenson V. Clinical
management of spasticity. J Neurol Neurosurg Psychiatr 2005; 76:
Voerman GE, Gregoric M, Hermens HJ. Neurophysiological methods for the
assessment of spasticity: the Hoffmann reflex, the tendon reflex, and the
stretch reflex. Disabil Rehabil 2005; 27: 33–68.
Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey manual
and interpretation guide. Boston, MA: Nimrod Press; 1993.
Watkins C, Leathley M, Gregson JM, Moore AP, Smith TL, Sharma A.
Prevalence of spasticity post stroke. Clin Rehabil 2002; 16: 515–22.
Whitaker JN, McFarland HF, Rudge
assessment in multiple sclerosis trials: a critical analysis. Mult Scler
1995; 1: 37–47.
Wood D, Burridge J, van Wijck F, McFadden C, Hitchcock RA, Pandyan AD,
et al. Biomechanical approaches applied to the lower and upper limb
for the measurement of spasticity: a systematic review of the literature.
Disabil Rehabil 2005; 27: 19–32.
Wright BD. Solving measurement problems with the Rasch model. J Educ
Meas 1977; 14: 97–116.
Wright BD, Linacre JM. Observations are always ordinal: measurements,
however must be interval. Arch Phys Med Rehabil 1989; 70: 857–60.
Wright BD, Masters GN. Rating scale analysis. Chicago: MESA Press; 1982.
Wright BD, Stone MH. Best test design. Chicago: MESA Press; 1979.
Zajicek J, Fox P, Sanders H, Wright D, Vickery J, Nunn A, et al. Cannabinoids
for treatment of spasticity and other symptoms related to multiple sclerosis
(CAMS study): multi-centre randomised placebo-controlled trial. Lancet
2003; 362: 1517–26.