Assessing reliable and clinically significant
change on Health of the Nation Outcome Scales:
method for displaying longitudinal data
Alberto Parabiaghi, Angelo Barbato, Barbara D’Avanzo, Arcadio Erlicher, Antonio Lora
Objective: Many authors recommended that reliable and clinically significant change
(RCSC) should be calculated when reporting results of interventions. To test the reliability
of the Health of the Nation Outcome Scales (HoNOS) in identifying RCSC, we applied
the Jacobson and Truax model to two HoNOS assessments in a large group of people
evaluated in 10 community mental health services in Lombardy, Italy, in 2000.
Method: The HoNOS was administered to 9817 patients; of these, 4759 (48%) were
re-assessed. Reliable change (RC) was calculated using Cronbach’s alpha (α), as a
parameter of the reliability of the measure. Clinical significance cut-offs were calculated
using a classification of severity based on HoNOS items.
Results: In the whole sample, the clinical improvement cut-off was 11 and the remission
cut-off was 5. The RC index calculated on the whole group and on the subgroup of severe patients indicated
that eight-point and seven-point changes, respectively, were needed to be confident that a
real change had occurred. As an example of reporting RCSC on HoNOS total scores in a routine data collection, 91.6%
of the whole sample (4361) was stable, 5.6% (269) improved and 2.7% (129) worsened.
Conclusion: Our study proposes a methodological framework for computing RCSC normative
data on a widely used outcome scale and for identifying different degrees of clinical change.
Key words: clinical significance, HoNOS, longitudinal study, measuring change, reliable change.
Australian and New Zealand Journal of Psychiatry 2005; 39:719–725
Alberto Parabiaghi, Researcher (Correspondence); Angelo Barbato, Senior
Scientist; Barbara D’Avanzo, Unit Chief
Unit of Epidemiology and Social Psychiatry, ‘Mario Negri’ Institute
for Pharmacological Research, Via Eritrea 62, 20157 Milan, Italy.
Arcadio Erlicher, Senior Consultant Psychiatrist
Dipartimento di Salute Mentale, Ospedale Niguarda Ca’ Granda,
Antonio Lora, Senior Consultant Psychiatrist
Psychometric scales are a valid means of assessing
routine outcome in mental health. Where outcomes are
represented by dichotomous variables it is easy to assess
them but, given the dimensional nature of psychiatric
signs and symptoms, professionals often find themselves
dealing with continuous scales with no clear border be-
tween ‘illness’ and ‘wellbeing’. Psychopharmacological
and psychotherapeutic clinical trials typically rely on
statistical significance tests that are useful in identifying
group effects but give no information on how meaningful
the observed changes are. Statistical significance
between test scores can sometimes be achieved even when
the actual difference between scores is very small, and it
is less useful when we try to understand the outcome of
an individual (or a group of service users) in terms of
return to a previous condition of ‘normality’ or adjustment.
720CLINICALLY SIGNIFICANT CHANGE ON HoNOS
Moreover, statistical and methodological problems limit
efforts to understand individual changes in the context of
a whole service or outcome study.
Health of the Nation Outcome Scales and
Jacobson and Truax [3] proposed a method to determine
reliable and clinically significant change (RCSC).
This concept is meant to assure that a change observed
in one individual is: (i) beyond what could be attributed
to measurement error or to chance (reliable change); and
(ii) such as to bring the person from a score typical of a
problematic or suffering patient to a score typical of the
‘normal’ population (clinically significant change).
Normative data for analysing clinically significant change
have been provided for widely used outcome scales
such as the Hamilton Rating Scale for Depression [4],
the Symptom Checklist-90-R [5], the Brief Psychiatric
Rating Scale [1] and the Edinburgh Postnatal Depression
Scale [6].
The Health of the Nation Outcome Scales (HoNOS)
were developed as a standardized assessment tool for
routine use in mental health services [7,8]. It consists
of 12 scales, each using five points, from 0 (no prob-
lem) to 4 (severe/very severe), yielding a total score
from 0 to 48. It has been translated into Italian [9,10].
Independent studies have evaluated its reliability [11],
subscale structure [12], sensitivity to change [13] and
appropriateness for routine clinical use in busy psychiatric
services [14,15]. However, the application of the RCSC
model to HoNOS scores is still controversial. In Audin
et al. [16], the lack of normative data in non-clinical
populations prevented the use of appropriate methods
of determining clinical cut-off scores. In Rees et al.
[17], clinical change was measured using a statistical
significance test (Greenhouse–Geisser test) indicating
that a difference of three to four points was clinically
To apply the RCSC model to two subsequent HoNOS
assessments in a large group of people evaluated in
10 community mental health services (CMHS) in
Lombardy, Italy.
To explore changes and to test the HoNOS total score
reliability and feasibility in identifying RCSC.
To display longitudinal changes on a two-dimensional
graph.
into contact with the staff of one of the collaborating CMHS during the
months of January, May and November in 2000.
We collected 16738 complete HoNOS assessments, concerning
9817 patients. Of these subjects, 4759 (48%) were evaluated at least
twice in 2000. Data from the first assessment were used to calculate a
reliable change (RC) index and clinically significant (CS) change; sub-
sequently longitudinal changes were explored by applying the RCSC
model to the patients with at least two assessments. Further details are
given elsewhere [18,19].
Classification of severity
A simple method for classifying patients’ severity was applied. This
was proposed by Lelliott [20], who defined severe patients as having
higher scores in at least one item. We adopted a similar classification
based on a score of ≥3 in at least one item to discriminate
between severe and non-severe patients. We further distinguished severity
by taking the group of very severe subjects with a score of ≥3 in at
least two items. The group of subclinical subjects, by contrast, had a score
<2 in all items. Therefore, the criterion discriminating: (i) between
‘very severe’ and ‘moderately severe’ patients is having more
than one item with a score of ≥3; and (ii) between ‘mild’ and
‘subclinical’ patients is having at least one item with a score of 2 (see Results and
Fig. 1).
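Very severe: 22% of patients, mean (SD) 15.9 (4.6)
Moderately severe: 21% of patients, mean (SD) 9.7 (3.3)
Mild: 40% of patients, mean (SD) 6.8 (3.1)
Subclinical: 17% of patients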
Figure 1. Classification of severity based on the
frequency of HoNOS scores (severe, at least one item
≥3; non-severe, each item <3; very severe, at least two
items ≥3; moderately severe, one item ≥3; mild, at least
one item = 2; subclinical, each item <2). Distribution
and mean total HoNOS scores (mean (SD); 0 = no
distress; 48 = highest distress). For 9817 patients who
came into contact with community mental health
services (CMHS) in 2000.
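The four-way classification in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify_severity` and the input format (a list of the 12 item scores) are assumptions made here, not part of the original study, which used SPSS.

```python
def classify_severity(items):
    """Return the severity category for one HoNOS assessment.

    `items` is a sequence of the 12 HoNOS item scores, each from 0 to 4.
    The function name and input format are hypothetical.
    """
    if len(items) != 12 or any(s not in range(5) for s in items):
        raise ValueError("expected 12 item scores, each between 0 and 4")

    n_high = sum(1 for s in items if s >= 3)  # number of items scored 3 or 4
    if n_high >= 2:
        return "very severe"        # at least two items >= 3
    if n_high == 1:
        return "moderately severe"  # exactly one item >= 3
    if any(s == 2 for s in items):
        return "mild"               # no item >= 3, at least one item = 2
    return "subclinical"            # every item < 2
```

Under these rules, ‘severe’ is the union of the first two categories and ‘non-severe’ the union of the last two.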
A. PARABIAGHI, A. BARBATO, B. D’AVANZO, A. ERLICHER, A. LORA 721
Reliable change refers to the extent to which an observed change
falls beyond the range attributable to measurement error.
Reliable change (RCindex; for formula see Appendix) is assessed using
a variation on the standard error (SEdiff; for formula see Appendix)
of the difference between two measurements (baseline and follow-up) [21–23].
We calculated RC both on the whole population of service users
(9817) and on the subgroup of patients with a score of ≥3 in at least one item (i.e.
very severe and moderately severe) (4179).
Clinically significant change is when a person’s score moves from
the ‘dysfunctional population’ range into the ‘functional population’
range.
This requires determination of the cut-off point where the chance of
belonging to either distribution is the same (CScut-off; for formula see
Appendix).
Tingey et al. [24] proposed using multiple clinical groups (e.g. inpatients
vs outpatients) to determine cut-off points, aiming at a more
realistic determination of ‘stepwise’ changes. We calculated the cut-off
(cut-off1) that separated the group of ‘very severe’ patients from the
other service users and the cut-off (cut-off2) that separated the group of
subclinical subjects from the group of clinical subjects (mild, moder-
referring to clinically different subgroups.
Considering only the subgroup of severe patients (4179), we calcu-
Taking patients who were evaluated at least twice in 2000 (4759),
we plotted longitudinal changes on a two-dimensional graph with base-
line assessment on the x-axis and follow-up assessment on the y-axis.
We also explored longitudinal changes in the subgroup of ‘moderately
severe’ and ‘very severe’ patients (2146).
We tested whether cut-off scores could be assumed as a proxy of the
category change for each individual. Each subject was classified in two
different ways, according to: (i) RC and CS change; and (ii) RC and
the real-shift to a different category of severity. In order to explore the
the variables and analysed their correlation (Spearman rho; p<0.01).
All analyses were carried out using SPSS for Windows Release 11.
Classification of severity
into four categories of severity. The mean HoNOS scores are reported.
The ‘very severe’ patients’ score (mean=15.9) was more than seven
points higher than the whole population’s score (mean=8.7).
Reliable and clinically significant change
Cronbach’s α for the 12 items, as calculated in the total population
being needed to give 95% confidence that a real change had occurred
in the individual (RC); cut-off1 was 11 and cut-off2 was 5.
Figure 2 shows longitudinal changes of the 4759 patients evaluated
twice. The central diagonal line indicates the points where absolutely
no change was observed in at least 6 months (y=x). The ‘rails’ on
each side of the diagonal show the limits of the RC area; for anyone
falling within this area a change could be attributed to chance and
measurement error. Subjects with a point falling above the upper rail
showed a reliable worsening of their clinical condition as measured
by HoNOS, and those below showed reliable improvement. The two
horizontal lines on the y-axis (follow-up) indicate the limits of clinical
improvement and remission. Those falling below these lines can be
considered clinically improved or recovered. The two vertical lines on
the x-axis (baseline) represent the limits of clinical deterioration and
recurrence; subjects falling to the left of these lines showed significant
worsening of their clinical condition or a recurrence, being previously
A total of 91.6% of the sample (4361) was stable, 5.6% (269) improved and
2.7% (129) worsened.
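The reading of Figure 2 described above can be expressed as a small decision rule. The sketch below uses the whole-sample values reported in the text (RC = 8, cut-off1 = 11, cut-off2 = 5). Several details are assumptions made for illustration: that a change exactly equal to RC counts as reliable, that ‘below’ a cut-off means strictly less than it, and the category labels themselves. The function name `classify_change` is hypothetical.

```python
RC = 8          # reliable change threshold reported for the whole sample
CUT_OFF_1 = 11  # clinical improvement / deterioration
CUT_OFF_2 = 5   # remission / recurrence


def classify_change(baseline, follow_up, rc=RC, cut1=CUT_OFF_1, cut2=CUT_OFF_2):
    """Classify one patient's change between two HoNOS total scores (0-48).

    Higher scores mean more distress, so a negative difference is an
    improvement. Boundary handling (>= rc, < cut-off) is an assumption.
    """
    diff = follow_up - baseline
    if abs(diff) < rc:
        # Inside the 'rails': change may be chance or measurement error.
        return "stable"

    if diff < 0:  # reliable improvement (point below the lower rail)
        if baseline >= cut2 and follow_up < cut2:
            return "remission"
        if baseline >= cut1 and follow_up < cut1:
            return "clinical improvement"
        return "reliable improvement, not CS"

    # Reliable worsening (point above the upper rail).
    if baseline < cut2 and follow_up >= cut2:
        return "recurrence"
    if baseline < cut1 and follow_up >= cut1:
        return "clinical deterioration"
    return "reliable worsening, not CS"
```

For example, a patient moving from 20 to 9 crosses cut-off1 with an 11-point drop and is labelled as clinically improved, whereas a patient moving from 30 to 20 changes reliably but stays above both cut-offs.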
Figure 3 shows longitudinal changes in the subgroup of severe patients
(2146), where only cut-off3 was considered; 82.5% of the sample
was stable, 14.4% improved and 3.2% worsened.
Each point plotted in Figs 2 and 3 may represent more than one
subject, because patients with identical baseline and follow-up scores overlap.
Table 1 presents a cross-tabulation of the two outcome variables on
which the total sample was classified. Agreement was good, because
82% and 74% of patients were clinically improved and worsened on
both classifications, respectively. Only one patient was designated as
improved with one classification and deteriorated with the other. In
bivariate analysis, the two outcome variables significantly correlated,
with a Spearman’s rho of 0.9.
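A cross-tabulation of two outcome classifications, and the share of patients placed in the same category by both, can be sketched as follows. The function names and the toy labels in the usage note are hypothetical, and the denominator used for agreement (patients in the category on the first classification) is an assumption, since the text does not state it.

```python
from collections import Counter


def cross_tabulate(outcome_1, outcome_2):
    """Count patients for every (outcome 1, outcome 2) pair of categories."""
    if len(outcome_1) != len(outcome_2):
        raise ValueError("both classifications must cover the same patients")
    return Counter(zip(outcome_1, outcome_2))


def agreement(table, category):
    """Share of patients in `category` on outcome 1 who are also in it on outcome 2.

    The choice of denominator is an assumption made for this sketch.
    """
    row_total = sum(n for (first, _), n in table.items() if first == category)
    if row_total == 0:
        return float("nan")
    return table[(category, category)] / row_total
```

With toy labels such as `cross_tabulate(["improved", "improved", "stable"], ["improved", "stable", "stable"])`, the agreement for "improved" is one patient out of two.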
Although the model proposed by Jacobson and Truax
[3] was designed to translate psychotherapy research
results into clinical practice, it is applicable to the
measurement of change on any continuous scale for any
clinical problem. Crosby et al. [25] reviewed current
approaches to define and identify clinically meaning-
ful change in health-related quality of life (QoL). Two
broad methods are available: (i) anchor-based methods;
and (ii) distribution-based methods. The first approach
has been used to determine clinically meaningful change
by comparing QoL measures to other phenomena with
clinical relevance. The second approach is based on the
statistical characteristics of the sample and measurement
precision. Jacobson and Truax [3] proposed that individuals
should be considered improved or deteriorated only
when they fulfilled both the anchor-based (i.e. CS) and
the distribution-based (i.e. RC) criteria for change.
Figure 2. Longitudinal changes of the 4759 patients evaluated at least twice in 2000: total HoNOS score (0 = no
distress; 48 = highest distress). Plot of reliable and clinically significant change parameters.
For Crosby et al. [25], an integrated model to determine
clinically meaningful change should also consider
the initial severity of impairment: an outcome of RC is
different for a patient who showed marked impairment at
ditions (i.e., HoNOS<12) (Fig. 3). An integrated model
should consequently have some means of classifying and
quantifying baseline severity.
for HoNOS as the mean baseline score plus the mean dis-
charge score, halved. We used a classification of severity
to single out the group of most severe and the group
of subclinical patients so as to bypass the collection of
normative data from a general population sample and
to establish cut-off scores which were more sensitive to
change and more able to pick up improvement or wors-
ening in a severe service-user population.
Schauenburg and Strack [5] tried to apply the strategy
proposed by Tingey et al. [24] to multiple clinical
groups (inpatients and outpatients) using the
Symptom Checklist-90-R; baseline scores did not differ-
to be able to establish a cut-off. The strategy adopted in
the present study may be preferable, because it takes into
account clinical groups that have been differentiated on a
Matthey [6] calculated RCSC in postnatal depression
using the Edinburgh Postnatal Depression Scale; he pro-
posed using the RC index to detect improvement or dete-
rioration and both the RC index and CS change to estab-
lish recovery. In the present study, we took a step forward
by adopting RC to indicate improvement or deterioration
or deterioration (cut-off1) and remission or recurrence
(cut-off2). The method takes into account different de-
grees of change in the direction of either improvement or
Another important issue is the method of visualizing
clinical significance (Fig. 2); horizontal and vertical cut-off
scores are useful to place patients’ current health
state into ranges related to their baseline condition.
As shown in Fig. 3, RCSC in a subgroup of severe pa-
tients was also explored. The advantage of evaluating
improvement or worsening in a more homogeneous sam-
ple of subjects is that the RCSC model becomes more
sensitive to change (RC is smaller and the CS cut-off is
The overall tendency toward stability shown by our
analysis is not surprising. The RC was high both in the
total population and in the subgroup of most severe pa-
Figure 3. Longitudinal changes of the 2146 ‘very severe’ and ‘moderately severe’ patients evaluated at least twice in
2000: total HoNOS score (0 = no distress; 48 = highest distress). Plot of reliable and clinically significant change
Table 1. Distribution of patients evaluated at least twice in 2000 (4759) in two different categories of outcome: (i)
one based on RCSC; and (ii) one based on RC and on the real shift to a different category of severity
Outcome 1 and Outcome 2 categories: Stability, Clinical improvement, Deterioration, Remission, Recurrence, Total
to detect specific changes in a cohort of patients; we
just applied the RCSC method to routine data collec-
tion. Moreover, we cannot assume that the changes in
the patients with follow-up data were the same as those
without; patients who recovered, for example, might not
have been evaluated twice in 2000, so remissions may be
Psychometric scales summarize results in total scores
which are useful not only in experimental and epidemi-
ological studies but also for clinical purposes. However,
HoNOS is a group of scales that covers different dimen-
sions of mental illness and the use of a total score to
detect RCSC could be questionable.
The external validity of the classification of severity
proposed by Lelliott  has still to be shown, so the
validity of CS cut-offs as anchor-based criteria must be
confirmed in order to adopt the normative data in future
This research was carried out in the everyday envi-
ronment of CMHS, thus the findings may be considered
‘practice-based’ evidence. Although the aim was to identify
criteria for RCSC, the method we applied and its
visual representation are helpful in illustrating the overall
pattern of a service-user population in order to draw at-
who showed a deterioration that was reliable but not CS).
The methodological framework we propose allows the
computation of outcome data that take account of the actual
change of individual patients and that could be used both
to monitor services’ performances and to evaluate the
effectiveness of psychiatric interventions.
Future research should be aimed at analysing the bal-
ance between the specificity and sensitivity of this model
and at testing the external validity of the severity criteria
1. Hafkenscheid A. Psychometric measures of individual change:
an empirical comparison with the Brief Psychiatric Rating Scale
(BPRS). Acta Psychiatrica Scandinavica 2000; 101:235–242.
2. Evans C, Margison F, Barkham M. The contribution of reliable
and clinically significant change methods to evidence-based
mental health. Evidence-Based Mental Health 1998; 1:70–72.
3. Jacobson NS, Truax P. Clinical significance: a statistical
approach to defining meaningful change in psychotherapy
research. Journal of Consulting and Clinical Psychology 1991;
4. Grundy CT, Lambert MJ, Grundy EM. Assessing clinical
significance: Application to the Hamilton Rating Scale for
Depression. Journal of Mental Health 1996; 5:25–33.
5. Schauenburg H, Strack M. Measuring psychotherapeutic change
with the symptom checklist SCL-90-R. Psychotherapy and
Psychosomatics 1999; 68:199–206.
6. Matthey S. Calculating clinically significant change in postnatal
depression studies using the Edinburgh Postnatal Depression
Scale. Journal of Affective Disorders 2004; 78:269–272.
7. Wing JK, Beevor AS, Curtis RH, Park SB, Hadden S, Burns A.
Health of the Nation Outcome Scales (HoNOS). Research and
development. British Journal of Psychiatry 1998; 172:11–18.
8. Wing JK, Curtis RH, Beevor AS. HoNOS: Health of the Nation
Outcome Scales. Trainers’ guide. London: College Research
9. Rossi R, Blaco R, Castelli C et al. Il costo dei pazienti
psichiatrici per classi di gravità. Epidemiologia e Psichiatria
Sociale 1999; 8:198–208.
10. Lora A, Bai G, Bianchi S et al. La versione italiana della
HoNOS (Health of the Nation Outcome Scales), una scala per la
valutazione della gravità e dell’esito nei servizi di salute
mentale. Epidemiologia e Psichiatria Sociale 2001; 10:198–208.
11. Orrell M, Yard P, Handysides J, Schapira R. Validity and
reliability of the Health of the Nation Outcome Scales in
psychiatric patients in the community. British Journal of
Psychiatry 1999; 174:409–412.
12. Trauer T. The subscale structure of the Health of the Nation
Outcome Scales (HoNOS). Journal of Mental Health 1999;
13. Trauer T, Callaly T, Hantz P, Little J, Shields R, Smith J. Health
of the Nation Outcome Scales. Results of the Victorian field
trial. British Journal of Psychiatry 1999; 174:380–388.
14. Bebbington P, Brugha T, Hill T, Marsden L, Window S.
Validation of the Health of the Nation Outcome Scales. British
Journal of Psychiatry 1999; 174:389–394.
15. Sharma VK, Wilkinson G, Fear S. Health of the Nation
Outcome Scales: a case study in general psychiatry. British
Journal of Psychiatry 1999; 174:395–398.
16. Audin K, Margison FR, Clark JM, Barkham M. Value of
HoNOS in assessing patient change in NHS psychotherapy and
psychological treatment services. British Journal of Psychiatry
17. Rees A, Richards A, Shapiro DA. Utility of the HoNOS in
measuring change in a Community Mental Health Care
population. Journal of Mental Health 2004; 13:295–304.
18. Cavazza M, Civenti G, Ravasio R. Servizi e pazienti
reclutati. Epidemiologia e Psichiatria Sociale 2002; 11 (Suppl
19. Lora A, Cavazza M, Mapelli V. Diagnosi e gravità.
Epidemiologia e Psichiatria Sociale 2002; 11 (Suppl 5):38–52.
20. Lelliott P. Definition of severe mental illness. In: Charlwood P,
Mason A, Goldacre M, Cleary R, Wilkinson E, eds. Health
outcome indicators: severe mental illness. Report of a working
group to the Department of Health. Oxford: National Centre for
Health Outcomes Development, 1999:87–93.
21. Christensen L, Mendoza JL. A method of assessing change in a
single subject: an alteration of the RC index. Behavior Therapy
22. Jacobson NS, Follette WC, Revenstorf D. Psychotherapy
outcome research: methods for reporting variability and
evaluating clinical significance. Behavior Therapy 1984;
23. Jacobson NS, Revenstorf D. Statistics for assessing the clinical
significance of psychotherapy techniques: issues, problems and
new developments. Behavioural Assessment 1988; 10:133–145.
24. Tingey RC, Lambert MJ, Burlingame GM, Hansen NB.
Assessing clinical significance: proposed extensions to method.
Psychotherapy Research 1996; 6:109–123.
25. Crosby RD, Kolotkin RL, Williams GR. Defining clinically
meaningful change in health-related quality of life. Journal of
Clinical Epidemiology 2003; 56:395–407.
Appendix
The SE of a measurement of a difference is calculated as:
\[ SE_{\mathrm{diff}} = SD_1 \sqrt{2} \sqrt{1 - \alpha}, \]
where SD1 is the standard deviation of the baseline
observations and α is Cronbach’s coefficient.
The RC index is calculated as:
\[ RC_{\mathrm{index}} = 1.96 \times SE_{\mathrm{diff}}. \]
The CS ‘cut-off’ point is calculated as:
\[ CS_{\text{cut-off}} = \frac{(\mathrm{mean}_{\mathrm{clin}} \times SD_{\mathrm{norm}}) + (\mathrm{mean}_{\mathrm{norm}} \times SD_{\mathrm{clin}})}{SD_{\mathrm{norm}} + SD_{\mathrm{clin}}}, \]
where meanclin and meannorm are the mean scores of the
‘dysfunctional population’ and the ‘functional population’,
respectively, and SDnorm and SDclin are the standard
deviations of the scores in these two groups.
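The three Appendix quantities can be computed with a few lines of Python. This is a minimal sketch, not the authors' SPSS procedure. It assumes the standard Jacobson and Truax forms: SEdiff = SD1 × √2 × √(1 − α), and a CS cut-off equal to the two group means weighted by the other group's SD, divided by the sum of the SDs. The function names and the input values in the usage note are hypothetical and are not taken from the study data.

```python
import math


def se_diff(sd_baseline, alpha):
    """Standard error of the difference between two measurements.

    sd_baseline: SD of the baseline scores (SD1).
    alpha: Cronbach's alpha, used as the reliability of the measure.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    return sd_baseline * math.sqrt(2) * math.sqrt(1 - alpha)


def rc_index(sd_baseline, alpha, z=1.96):
    """Smallest change exceeding measurement error at 95% confidence (z = 1.96)."""
    return z * se_diff(sd_baseline, alpha)


def cs_cutoff(mean_clin, sd_clin, mean_norm, sd_norm):
    """Score at which membership of either distribution is equally likely."""
    return (mean_clin * sd_norm + mean_norm * sd_clin) / (sd_norm + sd_clin)
```

With hypothetical inputs, `rc_index(6.0, 0.75)` is roughly 8.3 points, and `cs_cutoff(16, 4, 6, 4)` gives 11.0, the midpoint of the two means, because the two SDs are equal. When the SDs differ, the cut-off shifts towards the mean of the group with the smaller SD.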