The GRID-HAMD: Standardization of the Hamilton depression rating scale

Authors:
  • Center for Telepsychology
  • CROnos Clinical Consulting Services

120 Original article
The GRID-HAMD: standardization of the Hamilton
Depression Rating Scale
Janet B.W. Williams [a,c], Kenneth A. Kobak [c], Per Bech [g], Nina Engelhardt [c],
Ken Evans [f], Joshua Lipsitz [a,c], Jason Olin [b], Jay Pearson [d] and Amir Kalali [e]
This report describes the GRID-Hamilton Depression
Rating Scale (GRID-HAMD), an improved version of the
Hamilton Depression Rating Scale that was developed
through a broad-based international consensus process.
The GRID-HAMD separates the frequency of the symptom
from its intensity for most items, refines several
problematic anchors, and integrates both a structured
interview guide and consensus-derived conventions for all
items. Usability was established in a small three-site
sample of convenience, evaluating 29 outpatients, with
most evaluators finding the scale easy to use. Test–retest
(4-week) and interrater reliability were established in 34
adult outpatients with major depressive disorder, as part of
an ongoing clinical trial. In a separate study, interrater
reliability was found to be superior to the Guy version of
the HAMD, and as good as the Structured Interview Guide
for the Hamilton Depression Rating Scale (SIGH-D), across
30 interview pairs. Finally, using the SIGH-D as the criterion
standard, the GRID-HAMD demonstrated high concurrent
validity. Overall, these data suggest that the GRID-HAMD is
an improvement over the original Guy version as well as
the SIGH-D in its incorporation of innovative features
and preservation of high reliability and validity. Int Clin
Psychopharmacol 23:120–129 © 2008 Wolters Kluwer
Health | Lippincott Williams & Wilkins.
International Clinical Psychopharmacology 2008, 23:120–129
Keywords: clinical trials, depressive disorder, Psychiatric Status Rating
Scales, treatment outcome
[a] Columbia University, New York, New York,
[b] Novartis Pharmaceuticals, East Hanover, New Jersey, USA,
[c] MedAvante Inc., Hamilton, Ontario, Canada,
[d] Merck Research Laboratories, Whitehouse Station, New Jersey,
[e] Quintiles and the University of California, San Diego, California, USA,
[f] Ontario Cancer Biomarker Network, Toronto, Ontario, Canada and
[g] Psychiatric Research Institute and Frederiksborg General Hospital, Copenhagen, Denmark
Correspondence to Janet B.W. Williams, DSW, MedAvante Inc., 100 American
Metro Blvd., Suite 106, Hamilton, NJ 08619, USA
Tel: + 609 528 9472; fax: + 609 528 9405;
e-mail: jbw5@columbia.edu or jwilliams@medavante.net
Received 29 April 2007 Accepted 21 January 2008
The Hamilton Depression Rating Scale (HAMD) (Hamilton,
1960) was introduced in 1960 to measure severity
of depression primarily in a hospitalized depressed
population, though it is now used most frequently in
outpatient settings. It is the most widely used clinician-
administered depression rating scale, and is used in most
clinical trials of antidepressant medications. However,
although the scale has been used successfully to assess
the effectiveness of antidepressants, it has been criticized
for poor item reliabilities, and, for some items, poor
discriminative properties across the range of depressive
severity (Faries et al., 2000; Santor and Coyne, 2001;
Evans et al., 2004).
The number of versions of the HAMD is legendary; in
fact, there are so many that researchers and clinicians
have lost track of the characteristics of each version.
Williams (2001) reviewed and summarized 11 published
versions, which differ widely in the number (ranging from
17 to 29), sequence, and wording of items. Few reports
provide a reference to the version of the HAMD used in a
trial (Zitman et al., 1990), and no single version of the
HAMD or single set of conventions has been universally
accepted. Over time, different aids for administering the
HAMD and modifications to the scale have been
proposed: self-report (Reynolds and Kobak, 1995) and
computerized versions (Kobak et al., 1990, 2000),
structured interview guides (Williams, 1988; Potts et al.,
1990; Whisman et al., 1999), and reduced (Maier and
Philipp, 1985; Bech et al., 1986; Gibbons et al., 1993) and
expanded (Thase, 1984; Gelenberg et al., 1990) item sets.
Although some of these modifications and aids claim to
have improved the reliability and sensitivity of the scale,
others have not, and interrater and intrarater item
reliability has remained problematic.
Despite its limitations, the scale remains a useful and
popular instrument, and there are good arguments for
continuing its use, at least until there is a superior
alternative with better psychometric characteristics.
Zimmerman has highlighted several advantages to the
HAMD, such as its provision of long-term continuity in
assessment methodology that allows comparison across
decades of studies, and its ability to discriminate drug
from placebo in clinical trials and to measure change
(Zimmerman et al., 2005). However, half a century after
0268-1315 © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins
Copyright © Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.
its development, most would agree that a revision and
standardization of the scale is timely (Bagby et al., 2004).
The Depression Rating Scale Standardization Team
A proposal was made at the 1999 meeting of the National
Institute of Mental Health sponsored New Clinical Drug
Evaluation Unit to establish a common set of standards
for scoring and administering the HAMD. This proposal
led to the formation of the Depression Rating Scale
Standardization Team (DRSST), a collaboration of
individuals from academia, clinical practice, the pharmaceutical
industry, and government. A core working group
was selected (with representatives from all four groups).
The mission of the core group was to establish a process
whereby individuals from different and sometimes competing
disciplines could work cooperatively to develop a
standard approach to administering and scoring the
HAMD that would be used by academic and pharmaceutical
industry researchers in academic and clinical
settings.
A 3-day meeting was held in October 2000 to draft a
standardized version of the scale. At the outset, it was
agreed that the goal was to standardize the administration
and scoring of the HAMD without significantly altering
the original intent of Hamilton’s items or the scoring
profile. Problems with this approach were identified so
that the work of the group could focus on correcting these
deficiencies.
Several issues were addressed. First, most items of the
HAMD require an assessment of both the intensity and
frequency of a given symptom (or set of symptoms). This
creates a challenge for scoring because both frequency
and intensity must be taken into account before an
overall severity score can be assigned. No generally
accepted guidelines to help the rater determine the
contribution of both to symptom severity exist. For this
reason, the group developed a grid format, described
below.
Second, the possible responses for severity within many
of the items, that is, options 0–4, are ambiguously worded
and lack useful clinical examples. Work focused on
clarifying wording and adding examples. In the Depressed
Mood item of the GRID-HAMD, for example, a score of 1
(mild) was changed from ‘These feeling states indicated
only on questioning.’ to ‘Feelings of sadness, discouragement,
low self-esteem, pessimism.’
Third, a consistent observation across many items was
that the description for a score of 4 (‘very severe’)
pertained to the hospitalized patient and not to the
depressed outpatient with whom the scale is most
frequently used. In the GRID-HAMD, item descriptions
were modified to enhance the reliability of the scale and
its relevance to depressed outpatients. For example, a
score of 4 for Work and Activities was changed from
‘Stopped working because of present illness. In hospital,
rate 4 if patient engages in no activities except ward
chores or if patient fails to perform ward chores
unassisted.’ to, in the GRID-HAMD, ‘Unable to work;
needs help performing self-care activities; unable to
function without assistance.’
Fourth, the group focused on several HAMD items that
were regarded as especially problematic. For example,
insight is nearly always scored ‘0’ in outpatient studies
(Evans et al., 2004), and hypochondriasis has shown poor
item-total correlation and low interrater reliability. As a
result, these items add limited value to the assessment of
change in condition. These items were revised, not
discarded.
Fifth, for several severity levels, item descriptions have
been difficult for raters to interpret. For example, in the
Feelings of Guilt item, the third severity anchor states,
‘present illness is a punishment. Delusions of guilt.’ It is
not clear if delusional thinking is sufficient, or is also
necessary for this severity level.
An additional problem is that some commonly used item
descriptions have become outdated and confusing in light
of post-Diagnostic and Statistical Manual of Mental
Disorders-IV diagnostic criteria; for example, requiring
‘heaviness in limbs, back, or head’ for a positive rating of
Somatic Symptoms General. These confusing anchor and
item descriptions were clarified and updated.
The group concluded that many of these factors have
contributed to poor item reliabilities. Any revision to the
scale would need to: (1) be more straightforward to
use; (2) more clearly operationalize the anchor points;
(3) simplify scoring by allowing the rater to consider the
dimensions of intensity and frequency independently to
arrive at an overall severity rating; and (4) adopt a
standardized scale and conventions to obviate the need
for raters to learn different guidelines and conventions for
different simultaneous studies. Revision and standardization
are particularly needed when one considers the great
variety in the backgrounds and experience of raters
administering the HAMD in clinical trials today, ranging
from psychiatrists to study coordinators with little, if any,
clinical experience.
The group agreed that its goal was to develop a standard
method for administering and scoring the HAMD rather
than to develop a new instrument. Therefore, it was
crucial to improve the instrument while avoiding
significant changes to Hamilton’s original intent or
scoring profile. Thus, a patient scoring 18 on the original
GRID-HAMD Williams et al. 121
HAMD-17 version would ideally score approximately 18
on the GRID-HAMD. Hamilton’s original guidelines for
administration and scoring were frequently consulted to
ensure fidelity to his original intent (Hamilton, 1960,
1967).
Follow-up conference calls and e-mail correspondence
resulted in a complete draft of the GRID-HAMD that
was distributed in April 2001 for feedback to approximately
200 depression researchers and clinicians worldwide.
The working group reviewed responses, blind to
authorship, and items were revised based on feedback
and then redistributed. Once consensus was achieved,
the probes and conventions were developed by the
working group and reviewed in a manner similar to the
process outlined above.
The GRID-HAMD
The complete GRID-HAMD has three components: the
GRID scoring system, the manual of scoring conventions,
and a semistructured interview guide. This paper
describes the final GRID system, and presents data from
two pilot studies as well as a study of the reliability and
validity of the system.
The GRID scoring system
The DRSST proposed a ‘grid’ scoring system in which
the dimensions of intensity and frequency of a symptom
are rated independently for each relevant item (Fig. 1),
with greater frequency generally resulting in a higher
score for a given intensity. Such scoring methods are used
in other fields (Chouinard and Miller, 1999) and can
simplify the assignment of a given rating. Symptom
intensity is considered on the vertical axis and symptom
frequency on the horizontal axis. The intersection of
these points reveals the patient’s score for the item.
Although the grid scoring system is different in appearance
from earlier HAMD scoring forms, the group’s
intention was to preserve the underlying scoring profile of
the original scale. It is hoped that the added clarity will
make each item easier to rate in a consistent manner.
To increase the reliability of individual items, ratings of
intensity and frequency were formalized using the grid
structure, item content was clarified, anchor descriptions
were enhanced with clinical examples at each severity
level, and the most severe response option within
5-option items was rephrased to increase the utility of
the HAMD in an outpatient population (e.g. eliminating
references to inpatient-specific functioning). Symptom
intensity, which generally includes degree of subjective
distress and functional impairment in the GRID-HAMD,
is rated as ‘absent, mild, moderate, severe, and very
severe’ or ‘absent, mild, and marked.’ Symptom frequency
is rated as ‘absent, occasional, much of the time,
and almost all of the time,’ with operational criteria
provided on each page of the GRID-HAMD. For
example, occasional frequency is defined as ‘Infrequent;
less than 3 days; and up to 30% of the week.’ Examples of
degrees of intensity and frequency are provided in the
GRID-HAMD itself.
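The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the instrument itself: the day thresholds follow the operational criteria quoted in the text, but the cell values in `ILLUSTRATIVE_GRID` and the function names are assumptions made for this example, chosen only so that greater frequency never lowers the score at a given intensity. The actual cell values must be taken from the GRID-HAMD form.

```python
from typing import Dict, Tuple

# Frequency columns of the grid, in increasing order.
FREQUENCIES: Tuple[str, ...] = (
    "occasional",
    "much of the time",
    "almost all of the time",
)


def frequency_category(days_in_past_week: int) -> str:
    """Map symptomatic days in the past week to a frequency label.

    Thresholds follow the criteria quoted in the text:
    less than 3 days, 3-5 days, 6-7 days.
    """
    if not 0 <= days_in_past_week <= 7:
        raise ValueError("days_in_past_week must be between 0 and 7")
    if days_in_past_week == 0:
        return "absent"
    if days_in_past_week < 3:
        return "occasional"
    if days_in_past_week <= 5:
        return "much of the time"
    return "almost all of the time"


# ASSUMPTION: illustrative cell values for a 5-option item (scores 0-4).
# They are not copied from the published form.
ILLUSTRATIVE_GRID: Dict[str, Tuple[int, int, int]] = {
    "mild": (1, 1, 2),
    "moderate": (1, 2, 3),
    "severe": (2, 3, 4),
    "very severe": (3, 4, 4),
}


def item_score(intensity: str, frequency: str) -> int:
    """Return the item score at the intersection of intensity and frequency."""
    if intensity == "absent" or frequency == "absent":
        return 0
    return ILLUSTRATIVE_GRID[intensity][FREQUENCIES.index(frequency)]


if __name__ == "__main__":
    freq = frequency_category(4)  # symptomatic on 4 of the past 7 days
    print(freq, item_score("moderate", freq))
```

Keeping the two judgments as separate inputs mirrors the rationale of the grid: the rater decides intensity and frequency independently, and the combination rule is fixed in advance rather than left to each rater.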
Structured interview guide
It is widely held that the item reliability of the HAMD is
compromised by administering the instrument in an
unstructured manner (Moberg et al., 2001). To assess
‘Depressed Mood,’ for example, the rater must ask a
series of questions to determine the degree of depressed
mood. However, the specific questions that must be
asked, and the conventions used to interpret and score
responses, vary from study to study as well as from rater
to rater. In addition, there is marked intersite inconsistency
with respect to rater training and clinical
experience.
A number of studies have found that standardized
instructions for completing and scoring the HAMD
improve interrater reliability (Williams, 1988; Moberg
et al., 2001). Semistructured interview methods such as
the Structured Interview Guide for the Hamilton
Depression Rating Scale (SIGH-D) have been developed
(Williams, 1988), though both structured and unstruc-
tured interview techniques are in common use. A
semistructured interview guide provides a series of basic
questions to the rater, who is directed to add his or her
own questions when necessary to obtain additional
information from the patient or to clarify an ambiguous
response. Moberg et al. (2001) compared the test–retest
reliability of the SIGH-D with the test–retest reliability
of an unstructured HAMD on the same series of patients.
They found that the SIGH-D produced uniformly higher
item and total-scale score reliabilities than the unstructured
HAMD. Such semistructured interview guides have
been shown to facilitate training on the scale, and to
require no more interviewing time than unstructured
administration. No compelling reason to avoid the use of
a semistructured interview guide exists, although it is
unknown whether the use of such a guide translates into
better detection of treatment effects.
The GRID-HAMD includes a semistructured interview
guide, based on the SIGH-D (Williams, 1988). It begins
with a general contextual question, and then presents
questions for specific items, including items 18–21 for
optional use. The rater is instructed to ask the specified
questions exactly as they are written. Additional optional
questions are in parentheses and raters may add their own
questions as well to obtain more information. Questions
that assess frequency are standardized throughout the
interview guide. In addition, for those ratings in which
several symptoms are listed, the GRID-HAMD clarifies
whether one or all of the symptoms are required to merit
122 International Clinical Psychopharmacology 2008, Vol 23 No 3
a particular score. Like many other psychiatric semistructured
interview guides, the GRID-HAMD should be
used ideally by individuals who have received adequate
training in the assessment of mood in a depressed
population and are familiar with the use of the GRID-HAMD
specifically.
Fig. 1 Example of GRID with structured interview guide and conventions for item 1: depressed mood.

1. Depressed mood
This item assesses feelings of sadness, hopelessness, helplessness, and worthlessness.
Note: This is not a global rating of depressive illness.

Interview questions
What's your mood been like this past week (compared to when you feel OK)?
Have you been feeling down or depressed? Sad or hopeless? Helpless? Worthless?
(Can you describe what this feeling has been like for you? How bad is the feeling?)
Does the feeling lift at all if something good happens?
(Does it go away completely, or is it just less intense?)
How long have you been feeling this way?
How are you feeling about the future?
Have you been crying at all? IF YES: How often?

Frequency questions
During the past week, how often did you feel this way?
How much of the time did you feel this way?
How many days in the past week?
(Was it every day? How much of each day?)

Notes:

Symptom intensity (vertical axis of the grid)
  • Absent
  • Mild: Feelings of sadness, discouragement, low self-esteem, pessimism
  • Moderate: Clear nonverbal signs of sadness (such as tearfulness), feelings of hopelessness, helplessness, or worthlessness about some aspects of life
  • Severe: Intense sadness, weeping, hopelessness about most aspects of life, feelings of complete helplessness or worthlessness
  • Very severe: Extreme sadness, intractable hopelessness or helplessness

Frequency (horizontal axis of the grid)
  • Absent: Not occurring or clinically insignificant
  • Occasional: Infrequent; less than 3 days; up to 30% of the week
  • Much of the time: Often; 3–5 days; 31%–75% of the week
  • Almost all of the time: Persistent; 6–7 days; more than 75% of the week

[Grid of item scores (0–4) at each intersection of intensity and frequency; cell layout not reproduced here.]

Conventions
  • This item should NOT be considered a global measure of depressive severity. Item 1 assesses one of several core symptoms of depression.
  • Normal mood fluctuations without clinical significance should be rated "0."
  • Rate depressed mood even if patient attributes mood to real life problems (e.g., depressed due to bad job, marital conflict).
  • Some patients describe feelings of low mood without acknowledging "sadness" or "depression" (e.g., "down," "blah," "numb"). Rate as symptomatic.
  • Nonverbal signs (e.g., slumped posture, infrequent eye contact, frowning, sad facial expression) are also considered in assessing severity.
  • Do not rate angry, irritable, or anxious mood on this item.

GRID-HAMD 1 ITEM SCORE:
Scoring conventions
Some investigators and pharmaceutical industry sponsors
have developed scoring conventions for the HAMD,
whereas others leave this to the rater’s clinical judgment.
As none of these approaches has been universally
accepted by researchers, investigational study staff are
routinely trained to different (and often contradictory)
sets of scoring conventions. This can lead to confusion
and disregard for a particular set of conventions or
methods. To ensure that users of the GRID-HAMD are
consistent, a brief set of rating conventions was developed.
Specific conventions are listed in the instrument
alongside each item, making these easier to follow than if
they were presented in a separate document. General
guidelines for administering the instrument are presented
in an introductory section.
Studies
Two pilot studies and a reliability/validity study have
been conducted using the GRID-HAMD. The first pilot
study focused on ease of use, and the second study
evaluated interrater and 4-week test–retest reliability.
Pilot study 1: usability
Raters at one Canadian and two US sites administered
the GRID-HAMD to a series of psychiatric outpatients in
their clinics. The intent was to assess the ease of use of
the new grid scoring system, the clarity of anchor
descriptions, and the accuracy of ratings compared with
a version of the HAMD that was customarily used at each
site, as well as the quality and ease of use of the new
structured interview guide and conventions. Data were
collected via a paper-and-pencil questionnaire.
Twelve raters were given a very brief introduction to the
GRID-HAMD in a telephone conference, and then
evaluated a total of 29 patients using the GRID-HAMD
with its structured interview guide and conventions.
Raters’ years of experience with the HAMD ranged from
1 to 26 years (mean = 7 years). Ten raters had previously
used the SIGH-D. None of the raters were involved in
the development of the GRID-HAMD.
Table 1 summarizes the site and rater characteristics.
Most (75%) of the raters found the GRID-HAMD ‘very
easy’ or ‘easy’ to use; no one rated it ‘very difficult.’ The
anchor descriptions were found to be ‘much more clear’
or ‘a little more clear’ than other versions of the HAMD
used at the sites (11/12 raters). The GRID-HAMD was
judged to result in ‘much more’ or ‘a little more’ accurate
ratings by 10 of the 12 raters; the other two thought there
was no difference from their usual method of using the
scale. No one rated the GRID-HAMD as less accurate
than their usual way of rating. All (12/12) raters indicated
that they thought the GRID conventions were ‘much
better’ or ‘a little better’ than the other sets of
conventions used at their site. Eight of the 12 raters
thought the structured interview guide was easy to use;
three were neutral on the question, and one rater
disagreed.
Pilot study 2: reliability
Five clinical investigative sites taking part in a large
open-label multicenter study of a novel antidepressant agreed
to assess the reliability of the GRID-HAMD. (Investigators
participating in the GRID-HAMD reliability component of the
main study sponsored by Organon Inc.: Alan Feiger, MD,
Jon Heiser, MD, James Ferguson, MD, Charles Merideth,
MD, and John Carman, MD.) Trial
participants were males and females between 18 and 70
years of age (N = 34) with moderate-to-severe major
depressive disorder, who were deemed appropriate for
long-term antidepressant therapy. Raters were briefly
oriented to the GRID-HAMD by teleconference (60 to
90 min) with members of the DRSST core team, as in the
usability study. Participants were administered the
GRID-HAMD twice at baseline by two different raters
(interrater reliability), and twice at week 4 (sensitivity to
change), by the same two raters whenever possible.
Data from participants assessed at baseline and week 4
are presented in Tables 2–5. Twenty raters across the
sites provided ratings; data were combined across sites
and rater pairs. The baseline HAMD-17 total scores had a
mean of 23.2 and a SD of 5.0. Table 2 presents the
intraclass correlation for the individual items and the
total score for baseline, week 4, and both visits combined.
A random effects intraclass correlation (ICC) model was
used. The ICCs were high for most items, and for the
total sample were improved from the original SIGH-D in
13 of 17 items, including item 1 (depressed mood) and
the total score.
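For readers unfamiliar with the statistic, the following sketch shows one way an ICC can be computed from paired ratings. The source states only that a random effects model was used; the one-way random effects form, often written ICC(1,1), is an assumption made here, as are the function name and the example data.

```python
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random effects ICC for single ratings, ICC(1,1).

    `ratings` holds one row per participant and one column per rater.
    ASSUMPTION: the source does not say which ICC form was used.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 participants and 2 ratings per participant")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical total scores from two raters for five participants.
    example = [(23, 24), (18, 17), (30, 29), (21, 23), (26, 26)]
    print(round(icc_oneway(example), 3))
```

An item on which nearly all participants receive the same score has little between-participant variance, so its ICC can be very low even when raters agree almost perfectly; this is worth bearing in mind when reading the values for items such as insight.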
Interrater reliability of the GRID-HAMD
The structure of the GRID-HAMD, by providing
separate ratings of intensity and frequency for each item,
facilitates detailed analysis of agreement on these two
Table 1 Pilot study no. 1: rater and site characteristics

                                        Site A                 Site B      Site C
Number of raters                        5                      3           4
Number of patients                      15                     12          12
Site type                               Nonacademic research   Academic    Nonacademic research
Experience with the HAMD (years)        2, 3, 4, 13, 26        1, 2, 3     3, 4, 4, 20
Used SIGH-D previously?                 5/5                    1/3         4/4
Average administration time (minutes)   14                     22          21

HAMD, Hamilton Depression Rating Scale; SIGH-D, Structured Interview Guide
for the Hamilton Depression Rating Scale.
elements. Results are summarized here; additional details
are provided elsewhere (Engelhardt et al., 2003).
The percentage of ratings in which raters entered the
same item score was calculated for all pairs of ratings of
participants at baseline (Table 4, Item Score column). For
items with an exact match between raters on item
scores, the percentage of ratings for which the GRID
coordinates matched exactly was calculated (Table 4,
GRID Coordinate column). For example, if two raters
both scored a 3 on item 1, then they were counted as
using the GRID consistently if they had also recorded
exactly the same intensity and frequency designation.
Chance agreement on GRID coordinates varies from 25 to
50%, depending on the number of possible responses for
each item.
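The two agreement figures described above can be computed as follows. This is a sketch under stated assumptions: the data layout (one tuple of item score, intensity, and frequency per rater) and the function name are hypothetical, and the example ratings are invented for illustration.

```python
from typing import List, Optional, Tuple

# (item score, intensity label, frequency label) recorded by one rater.
Rating = Tuple[int, str, str]


def percent_agreement(
    pairs: List[Tuple[Rating, Rating]]
) -> Tuple[float, Optional[float]]:
    """Return (item-score agreement %, GRID-coordinate agreement %).

    Coordinate agreement is computed only among pairs whose item scores
    match exactly, as described in the text. It is None if no pair matches.
    """
    if not pairs:
        raise ValueError("at least one pair of ratings is required")

    score_matches = [(a, b) for a, b in pairs if a[0] == b[0]]
    coord_matches = [(a, b) for a, b in score_matches if a[1:] == b[1:]]

    pct_score = 100.0 * len(score_matches) / len(pairs)
    pct_coord = (
        100.0 * len(coord_matches) / len(score_matches) if score_matches else None
    )
    return pct_score, pct_coord


if __name__ == "__main__":
    # Hypothetical ratings of one item for four participants.
    example = [
        ((3, "severe", "much of the time"), (3, "severe", "much of the time")),
        ((2, "moderate", "much of the time"), (2, "mild", "almost all of the time")),
        ((1, "mild", "occasional"), (1, "mild", "occasional")),
        ((2, "moderate", "much of the time"), (3, "severe", "much of the time")),
    ]
    print(percent_agreement(example))
```

The second pair in the example shows why the conditional figure is informative: both raters arrive at the same item score, but by different combinations of intensity and frequency.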
Sensitivity of the GRID to change
For participants who were rated twice at baseline and
twice at week 4 by the same rater pairs, interrater
reliability (ICC) for change scores was examined
(Table 5). Change scores were calculated as the baseline
score minus the week 4 score. Agreement on change
scores varied across items from 0.97 (sexual interest) to
0.03 (insight) and 0.18 (loss of weight). For all but
these last two items, the interrater correlation of change
scores was 0.56 or above. For the total score, the
correlation was 0.91.
Reliability and validity study
Twenty-nine raters from 10 US investigative sites agreed
to participate in a study of the validity of the GRID-HAMD.
A total of 150 patients (15 per site) with major
depressive disorder were administered a version of the
HAMD twice, by two independent interviewers blind to
each other’s scores. Scales were administered in counterbalanced
order on the same day. Four cells were present:
patients received either a GRID-HAMD and a SIGH-D
(n = 60), two GRID-HAMDs (n = 30), two SIGH-Ds
(n = 30), or two HAMDs using the original (Guy) version
of the scale, without a structured interview guide
(n = 30). In this study, no training was provided for any
Table 3 Pilot study no. 2: mean score differences between raters

              Baseline (N = 34)   Week 4 (N = 31)
Rater 1       23.29               13.10
Rater 2       23.15               11.87
Difference    0.15                1.23
T             0.239               1.779
P             0.813               0.085
Table 4 Pilot study no. 2: percent agreement on individual item
scores and on GRID scoring coordinates (baseline and week 4
combined), N = 65

                        Percent agreement
GRID-HAMD item          Item score [a]   GRID coordinate [b]
Depressed mood          70.8             67.4
Guilt                   67.7             61.4
Suicide                 80.0             75.0
Insomnia early          87.7             70.2
Insomnia middle         73.8             72.9
Insomnia late           75.4             63.3
Work and activities     66.2             62.8
Retardation             63.1             [c]
Agitation               63.1             [c]
Anxiety psychic         61.5             75.0
Anxiety somatic         60.0             78.1
Loss of appetite        80.0             73.17
Somatic symptoms        64.6             66.6
Sexual interest         89.2             [c]
Hypochondriasis         69.2             66.4
Loss of weight          81.5             [c]
Insight                 95.2             [c]

[a] Percent agreement is the percent exact match on the item score.
[b] Percent agreement is the percent exact match on the GRID coordinate for those
who had an exact match on the item score.
[c] Item not presented as a GRID.
HAMD, Hamilton Depression Rating Scale.
Table 2 Pilot study no. 2: reliability of the GRID-HAMD items and total score (intraclass correlations)

                        GRID-HAMD   GRID-HAMD   GRID-HAMD baseline   SIGH-D
                        baseline    week 4      and week 4           1988
                        (N = 34)    (N = 31)    (N = 65)             (N = 23)
Depressed mood          0.78        0.76        0.85                 0.80
Guilt                   0.55        0.66        0.72                 0.63
Suicide                 0.80        0.76        0.81                 0.64
Insomnia early          0.83        0.79        0.82                 0.80
Insomnia middle         0.71        0.68        0.71                 0.62
Insomnia late           0.75        0.66        0.73                 0.30
Work and activities     0.64        0.83        0.89                 0.54
Retardation             0.28        0.22        0.33                 0.32
Agitation               0.49        0.27        0.38                 0.11
Anxiety psychic         0.52        0.61        0.65                 0.78
Anxiety somatic         0.55        0.55        0.60                 0.66
Loss of appetite        0.77        0.69        0.75                 0.59
Somatic symptoms        0.37        0.59        0.60                 0.61
Sexual interest         0.92        0.94        0.94                 0.70
Hypochondriasis         0.68        0.70        0.71                 0.55
Loss of weight          0.63        0.23        0.43                 0.58
Insight                 0.03
Total HAMD-17           0.75        0.81        0.89                 0.81

Highest value in each row is in bold font.
HAMD, Hamilton Depression Rating Scale.
of the scale versions and each rater was assigned to
administer only one of the three scales, although
undoubtedly, most of the raters had used the Guy version
and the SIGH-D in previous studies.
Interrater reliability of the GRID-HAMD was compared
with that of the SIGH-D and of the unstructured Guy
HAMD. In addition, concurrent validity of the GRID-HAMD
was estimated by comparing total and item scores
of the GRID-HAMD to the ‘gold standard’ Structured
Interview Guide for the HAMD (SIGH-D, Williams,
1988). The raters had an average of 20 years’ clinical
experience, and 13 years’ experience in administering a
version of the HAMD. Fifty-two percent were MDs, with
the remainder having BS/BA or RN degrees.
Results of reliability and validity study
The interrater reliability for both the GRID-HAMD and
the SIGH-D was high (ICC = 0.95 and 0.94, respectively),
and not significantly different from each other,
Z = 0.73, P = 0.47. The ICC for the Guy HAMD
(ICC = 0.78) was significantly lower than the ICC for
the GRID-HAMD, Z = 3.4461, P = 0.001, and the
SIGH-D, Z = 4.0889, P < 0.0001. Internal consistency
reliability (coefficient α) for the GRID-HAMD was 0.78;
for the SIGH-D it was 0.71, and for the Guy HAMD
it was 0.64.
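Internal consistency of this kind is conventionally computed as Cronbach's coefficient alpha, which compares the sum of the item variances with the variance of the total score. The sketch below assumes that standard formula; the function name and the example data are hypothetical and are not taken from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a patients-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients and 2 items per patient")

    # Sample variance of each item across patients.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total score across patients.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical scores on three items for four patients.
    example = [(2, 1, 2), (3, 2, 2), (1, 1, 0), (4, 3, 3)]
    print(round(cronbach_alpha(example), 3))
```

Alpha rises when items covary strongly relative to their individual variances, so a higher value for one version of a scale suggests that its items are being rated more coherently with one another.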
The item correlations for the GRID, SIGH-D, and Guy
HAMD are presented in Table 6. Interrater reliability on
the item level is good for both the GRID and SIGH-D,
and greater than the Guy version on all items (except for
item 17, Insight, which was poor for all three versions).
Moderately high correlations between the GRID and
SIGH-D for all items were present (except insight).
GRID-HAMD versus SIGH-D
The mean score on the GRID-HAMD (22.23; SD = 6.82)
was not significantly different from the mean score
obtained on the same patients with the SIGH-D (22.03;
SD = 5.96), t(59) = 1.692, P= 0.693. The two scales
were highly correlated (ICC = 0.81, P< 0.001).
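A comparison of mean scores obtained on the same patients is typically made with a paired t test. The source does not state which test was used; the paired form is assumed here because both scales were administered to the same 60 patients, which is consistent with the 59 degrees of freedom reported. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom.

    ASSUMPTION: a paired design, with x[i] and y[i] from the same patient.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")

    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have no variance; t is undefined")

    t_stat = mean(diffs) / (sd / sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical total scores on two scale versions for four patients.
    version_a = [22, 25, 18, 30]
    version_b = [21, 25, 16, 29]
    print(paired_t(version_a, version_b))
```

The P value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package; that step is omitted to keep the sketch dependency-free.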
GRID frequency versus intensity dimensions
One of the advantages of rating frequency and intensity
separately on the GRID-HAMD is that it enables an
examination of the utility and incremental value of each
individual dimension. Comparisons of ICC by item and
total score using only the frequency dimension, only the
intensity dimension, and both dimensions are presented
in Table 7. The item ICCs were greatest on all but three
items when both frequency and intensity were used to
determine the score; however, the confidence intervals
for frequency and intensity overlapped (frequency
ICC = 0.913, CI: 0.829, 0.957; intensity ICC = 0.896,
CI: 0.788, 0.951), suggesting that these may not be
statistically different. In addition, the mean score
differences between the GRID and SIGH-D were
smallest when the combination of frequency and
intensity was used (Table 8).
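The overlap of the two confidence intervals reported above can be checked directly. This is a minimal sketch using the reported bounds; the helper name is hypothetical, and overlap is only a conservative heuristic rather than a formal test of the difference between two ICCs.

```python
def intervals_overlap(first, second):
    """True when two closed intervals (low, high) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Confidence intervals for the total-score ICC as reported in the text.
frequency_ci = (0.829, 0.957)
intensity_ci = (0.788, 0.951)

print(intervals_overlap(frequency_ci, intensity_ci))
```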
Table 5 Pilot study no. 2: reliability of GRID-HAMD change scores
Reliability of GRID-HAMD change scores: interrater ICC

| GRID-HAMD | Change from baseline, N = 20 |
|---|---|
| Depressed mood | 0.91 |
| Guilt | 0.59 |
| Suicide | 0.69 |
| Insomnia early | 0.77 |
| Insomnia middle | 0.70 |
| Insomnia late | 0.63 |
| Work and activities | 0.76 |
| Retardation | 0.76 |
| Agitation | 0.56 |
| Anxiety psychic | 0.62 |
| Anxiety somatic | 0.68 |
| Loss of appetite | 0.62 |
| Somatic symptoms | 0.82 |
| Sexual interest | 0.97 |
| Hypochondriasis | 0.80 |
| Loss of weight | 0.18 |
| Insight | 0.03 |
| Total score | 0.91 |

HAMD, Hamilton Depression Rating Scale; ICC, intraclass correlation.

Table 6 Pilot study no. 3: item correlations (ICC) for the GRID,
SIGH-D, and Guy versions of the HAMD and ICC between GRID
and SIGH-D

| Item | GRID vs. GRID (n = 31) | SIGH-D vs. SIGH-D (n = 27) | Guy vs. Guy (n = 27) | GRID vs. SIGH-D (n = 60) |
|---|---|---|---|---|
| Depressed mood | 0.92 | 0.87 | 0.43 | 0.69 |
| Guilt | 0.81 | 0.85 | 0.78 | 0.58 |
| Suicide | 0.77 | 0.90 | 0.56 | 0.77 |
| Insomnia early | 0.92 | 0.73 | 0.63 | 0.75 |
| Insomnia middle | 0.84 | 0.87 | 0.69 | 0.55 |
| Insomnia late | 0.79 | 0.82 | 0.67 | 0.67 |
| Work and activities | 0.89 | 0.84 | 0.51 | 0.60 |
| Retardation | 0.77 | 0.69 | 0.21 | 0.53 |
| Agitation | 0.48 | 0.67 | 0.06 | 0.47 |
| Anxiety psychic | 0.73 | 0.79 | 0.13 | 0.51 |
| Anxiety somatic | 0.62 | 0.90 | 0.41 | 0.66 |
| Loss of appetite | 0.95 | 0.63 | 0.80 | 0.68 |
| Somatic symptoms | 0.78 | 0.74 | 0.26 | 0.56 |
| Sexual interest | 0.78 | 0.74 | 0.58 | 0.79 |
| Hypochondriasis | 0.88 | 0.90 | 0.79 | 0.54 |
| Loss of weight | 0.77 | 0.89 | 0.79 | 0.61 |
| Insight | 0.00 | 0.49 | 0.06 | 0.09 |
| Total score | 0.94 | 0.95 | 0.78 | 0.81 |

HAMD, Hamilton Depression Rating Scale; ICC, intraclass correlation; SIGH-D,
Structured Interview Guide for the Hamilton Depression Rating Scale.

Discussion
The GRID-HAMD offers a number of potential advantages
over other versions of the HAMD. First, the scale
itself, the conventions, and the interview guide reflect a
broad consensus of researchers and clinicians in academia
and the pharmaceutical industry. Second, scoring was
facilitated to allow raters to make separate determinations
of intensity and frequency to arrive at an overall
severity score. This scoring system should improve
reliability and allow for detailed analyses of specific areas
of disagreement between raters and provide useful
information regarding the validity of severity scores based
on a particular composite of frequency and intensity.
Third, the increased specificity of the instructions and
item definitions, and the integration of the conventions
into the instrument itself facilitate training on the
instrument.
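The idea of rating intensity and frequency separately and then deriving one severity score can be sketched as a lookup table. Everything in the sketch is hypothetical: the category labels, the cell values, and the function name are invented for illustration and are not the published GRID-HAMD anchors, which should be taken from the instrument itself.

```python
# Hypothetical labels and cell values, for illustration only.
# They are NOT the published GRID-HAMD anchors.
FREQUENCIES = ("occasional", "much of the time", "almost all of the time")

GRID = {
    "mild": (1, 1, 2),
    "moderate": (1, 2, 3),
    "severe": (2, 3, 4),
    "very severe": (3, 4, 4),
}


def grid_score(intensity, frequency):
    """Return a record holding both ratings and the derived item score."""
    if intensity == "absent" or frequency == "absent":
        score = 0
    else:
        score = GRID[intensity][FREQUENCIES.index(frequency)]
    # Keeping both dimensions alongside the score preserves the reason
    # a particular score was assigned.
    return {"intensity": intensity, "frequency": frequency, "score": score}


print(grid_score("moderate", "much of the time"))
```

Storing the two ratings with the derived score is what makes it possible to analyze disagreement between raters by dimension, as the paragraph above suggests.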
The reliability study indicates that, even with a large
number of raters new to the instrument and with minimal
training, overall reliability is excellent, and agreement on
total HAMD score and most of the individual items is
improved. Reliability was established across different
indices for both baseline and assessments made after
4 weeks of treatment. Considering the number of raters
involved in the study, the level of reliability achieved is
impressive, given that larger numbers of raters make
reliability harder to obtain. In addition, no reliability
training was done between raters at the different sites.
Despite this, reliabilities were comparable with those
found by Moberg et al. (2001), whose raters underwent
extensive reliability training. Thus, the GRID-HAMD
should help improve reliability even in a cohort of raters
not calibrated to each other. Raters used the GRID-HAMD
consistently, indicating its clinical utility and
value as an aid to reliability of ratings. That said, it
should be noted that the GRID-HAMD does not obviate
the need for reliability training, as the degree of interrater
reliability is directly related to the quantity and quality of
training provided. On the whole, raters found the GRID-HAMD
easy to use and judged its conventions to be
improved over earlier versions.
The concurrent validity of the GRID was demonstrated
by high item and total score correlations with the SIGH-
D, and equivalent mean scores were obtained when both
scales were administered to the same patients. Both the
GRID-HAMD and the SIGH-D, which are semistruc-
tured interviews, demonstrated excellent interrater
reliability compared with the unstructured Guy version,
which had significantly lower interrater agreement than
either of the other two scales. The unique features of the
GRID-HAMD, that is, a standardized scoring system, and
conventions and an interview guide that are integrated
into the instrument, may provide specific benefits for
raters who have less clinical and assessment experience
than the highly experienced raters in this study.
Several studies have demonstrated a tendency (conscious
or unconscious) on the part of raters to assign a threshold
score to patients to justify their inclusion in a particular
study (DeBrota et al., 1999; Feltner et al., 2001; Kobak
et al., 2005). For example, if a study requires a HAMD score
of 18, raters may be more likely to give that score, even
if it is not quite justified. The increased specificity of
the GRID structure may force raters to be more accurate
in assigning item scores, and the GRID allows a
more specific ‘audit trail’ of why a particular score was
assigned. In addition, as investigational compounds
become more targeted to individual symptoms and
specific symptom complexes, item reliability increases
in importance.
Accumulating evidence exists that interrater reliability is
directly related to the amount and type of training
performed (Kobak et al., 2003). Thus, if one rater
receives 3 weeks of intense training, including observation
of applied clinical skills, and uses the SIGH-D, and
another rater receives almost no training and uses the
GRID-HAMD, any resulting reliability superiority of the
SIGH-D may be because of training rather than the
instrument itself. Moberg et al. (2001) achieved very high
interrater agreement levels using the SIGH-D, but it
should be noted that their raters trained until they
achieved agreement ‘in excess of 0.91’ before they began
the study. The raters in this study attained good
reliabilities with only minimal training. Presumably,
with better rating and therefore better reliability, there
would be better signal detection in a clinical trial. Muller
and Szegedi (2002) have statistically modeled the significant
impact that reliability has on signal detection.
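The link between reliability and signal detection can be illustrated with the standard attenuation relation from classical test theory, in which the observable standardized effect equals the true effect multiplied by the square root of the reliability. The sketch below is not the model of Muller and Szegedi (2002); the true effect size of 0.4, the use of an ICC as the reliability coefficient, and the normal-approximation sample size formula are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def attenuated_effect(true_d, reliability):
    """Observable standardized effect under classical attenuation."""
    return true_d * sqrt(reliability)


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size for a two-group comparison (normal approximation)."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


TRUE_EFFECT = 0.4  # hypothetical drug-placebo effect size

# Reliabilities taken from the total-score ICCs reported in this study.
for label, reliability in (("GRID-HAMD", 0.95), ("Guy HAMD", 0.78)):
    observed = attenuated_effect(TRUE_EFFECT, reliability)
    print(label, round(observed, 3), n_per_group(observed))
```

Under these assumptions, lower reliability shrinks the observable effect and raises the number of patients needed per group to detect the same underlying difference.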
Table 7 Item and total ICC by frequency and intensity dimensions

| Item | GRID frequency and intensity | GRID frequency only | GRID intensity only |
|---|---|---|---|
| Depressed mood | 0.92 | 0.94 | 0.77 |
| Guilt | 0.81 | 0.81 | 0.60 |
| Suicide | 0.77 | 0.62 | 0.73 |
| Insomnia early | 0.92 | 0.90 | 0.82 |
| Insomnia middle | 0.84 | 0.85 | 0.84 |
| Insomnia late | 0.79 | 0.76 | 0.79 |
| Work and activities | 0.89 | 0.75 | 0.76 |
| Anxiety psychic | 0.73 | 0.69 | 0.50 |
| Anxiety somatic | 0.62 | 0.56 | 0.42 |
| Loss of appetite | 0.95 | 0.88 | 0.94 |
| Somatic symptoms | 0.78 | 0.80 | 0.61 |
| Hypochondriasis | 0.88 | 0.87 | 0.77 |
| Insight | 0.00 | 0.00 | 0.00 |
| Total score | 0.93 | 0.91 | 0.89 |

GRID scoring not used for items 8, 9, 14, 16, and 17. Highest ICC bolded.
ICC, intraclass correlation.
Table 8 Mean score differences between the GRID and SIGH-D
using frequency alone, severity alone, and frequency and severity
combined

| | GRID frequency only | GRID severity only | GRID frequency and severity |
|---|---|---|---|
| SIGH-D | 21.94 | 21.95 | 21.95 |
| GRID | 24.55 | 20.74 | 22.03 |
| Difference | 2.60 | 1.25 | 0.20 |
| T | 4.60 | 2.24 | 0.40 |
| P | 0.00 | 0.029 | 0.693 |

SIGH-D, Structured Interview Guide for the Hamilton Depression Rating Scale.
Recently, interview quality, that is, the rater’s interview-
ing skills, has been shown to be related to signal
detection (Kobak et al., 2005). The GRID should help
improve interview quality by providing a structure for the
interview approach, and incorporating many of the
necessary follow-up probes associated with good clinical
technique.
Throughout the field, much energy is being directed
toward addressing these issues, the GRID-HAMD being
one approach. The main value of the new GRID-HAMD
is that it represents a consensus of the stakeholders.
Across a sample of academic experts and industry
representatives, all were able to agree on the structure
of the new instrument, its structured interview guide,
and its conventions. What remains is a test of whether use
of the GRID-HAMD increases the likelihood of finding
significant drug–placebo differences compared with use
of other versions of the HAMD.
Summary
The GRID-HAMD was designed to both clarify and
standardize administration and scoring of the HAMD in
clinical practice and research. Item descriptions have
been modified to enhance the reliability of the scale and
its relevance to depressed outpatients. In preliminary
testing, the GRID-HAMD showed good to very good
interrater item and overall score reliability. Validity of the
GRID-HAMD was also demonstrated by high correlations
and equivalent mean scores to the SIGH-D, the current
gold standard in the field.
Acknowledgements
The authors thank Michael Gibertini, PhD, who facilitated
the incorporation of the GRID-HAMD
in a study being conducted by Organon Inc., and
Margaret Rothman, PhD, who participated in the initial
work of the DRSST. They also thank the hundreds of
clinicians who reviewed drafts of the GRID and sent
them helpful comments. A nonprofit organization, the
International Society for CNS Drug Development
(ISCDD), was formed to bring together representatives
of pharmaceutical companies, as well as academicians
and government scientists. This new group
absorbed the DRSST, providing support and funding
for its continuing work. In an effort to facilitate
widespread use of the instrument, a web site has
been developed (http://www.iscdd.org) to offer the
GRID-HAMD free of charge to be printed or stored
electronically.
The DRSST and the DID Project are funded by the
International Society for CNS Drug Development. The
GRID-HAMD may be downloaded, free of charge, at
www.iscdd.org.
References
Bagby RM, Ryder AG, Schuller DR, Marshall MB (2004). The Hamilton
Depression Rating Scale: has the gold standard become a lead weight?
Am J Psychiatry 161:2163–2177.
Bech P, Kastrup M, Rafaelsen OJ (1986). Mini-compendium of rating scales for
states of anxiety, depression, mania, schizophrenia with corresponding DSM-
II syndromes. Acta Psychiatr Scand 73:5–37.
Chouinard G, Miller R (1999). A rating scale for psychotic symptoms (RSPS) part
I: theoretical principles and subscale 1: perception symptoms (illusions and
hallucinations). Schizophr Res 38:101–122.
DeBrota D, Demitrack M, Landin R, Kobak KA, Greist JH, Potter W (1999). A
Comparison Between Interactive Voice Response System-Administered
HAM-D and Clinician-Administered HAM-D in Patients with Major
Depressive Episode. Paper presented at the National Institute of Mental
Health, New Clinical Drug Evaluation Unit, 39th Annual Meeting, Boca Raton,
Florida.
Engelhardt N, Kalali A, Gibertini M, Kobak K, Williams J, Evans K, et al. (2003).
The GRID-HAMD: a reliability study in patients with major depression. Paper
presented at the National Institute of Mental Health, New Clinical Drug
Evaluation Unit, 43rd Annual Meeting, Boca Raton, Florida.
Evans KR, Sills T, DeBrota DJ, Gelwicks S, Engelhardt N, Santor D (2004). An
Item Response analysis of the Hamilton Depression Rating Scale using
shared data from two pharmaceutical companies. J Psychiatr Res 38:
275–284.
Faries D, Herrera J, Rayamajhi J, DeBrota D, Demitrack M, Potter WZ (2000). The
responsiveness of the Hamilton Depression Rating Scale. J Psychiatr Res
34:3–10.
Feltner DE, Kobak KA, Crockatt J, Haber H, Kavoussi R, Pande A, et al. (2001).
Interactive Voice Response (IVR) for Patient Screening of Anxiety in a
Clinical Drug Trial. Paper presented at the National Institute of Mental Health,
New Clinical Drug Evaluation Unit, 41st Annual Meeting, Phoenix, Arizona.
Gelenberg AJ, Wojcik JD, Falk WE, Baldessarini RJ, Zeisel SH, Schoenfeld D,
et al. (1990). Tyrosine for depression: a double-blind trial. J Affect Disord
19:125–132.
Gibbons RD, Clark DC, Kupfer DJ (1993). Exactly what does the Hamilton
Depression Rating Scale measure? J Psychiatr Res 27:259–273.
Hamilton M (1960). A rating scale for depression. J Neurol Neurosurg Psychiatry
23:56–62.
Hamilton M (1967). Development of a rating scale for primary depressive illness.
Br J Soc Clin Psychiatry 6:278–296.
Kobak KA, Reynolds WM, Rosenfeld R, Greist JH (1990). Development and
validation of a computer-administered version of the Hamilton Depression
Rating Scale. Psychol Assess 2:56–63.
Kobak KA, Mundt JC, Greist JH, Katzelnick DJ, Jefferson JW (2000). Computer
assessment of depression: automating the Hamilton Depression Rating
Scale. Drug Inf J 34:145–156.
Kobak KA, Lipsitz JD, Feiger A (2003). Development of a standardized training
program for the Hamilton Depression Scale using internet-based technolo-
gies: results from a pilot study. J Psychiatr Res 37:509–515.
Kobak KA, Feiger AD, Lipsitz JD (2005). Interview quality and signal detection in
clinical trials. Am J Psychiatry 162:628.
Kobak KA, Taylor LV, Warner G, Futterer R (2005). St. John’s wort vs. placebo
in social phobia: results from a placebo-controlled pilot study. J Clin
Psychopharmacol 25:51–58.
Maier W, Philipp M (1985). Improving the assessment of severity of depressive
states: a reduction of the Hamilton Depression Scale. Pharmacopsychiatry
18:114–115.
Moberg PJ, Lazarus LW, Mesholam RI, Bilker W, Chuy IL, Neyman I, et al. (2001).
Comparison of the standard and structured interview guide for the Hamilton
Depression Rating Scale in depressed geriatric inpatients. Am J Geriatr
Psychiatry 9:35–40.
Muller MJ, Szegedi A (2002). Effects of interrater reliability of psychopathologic
assessment on power and sample size calculations in clinical trials. J Clin
Psychopharmacol 22:318–325.
Potts MK, Daniels M, Burnam MA, Wells KB (1990). A structured interview
version of the Hamilton Depression Rating Scale: evidence of reliability and
versatility of administration. J Psychiatr Res 24:335–350.
Reynolds WM, Kobak KA (1995). Reliability and validity of the Hamilton
Depression Inventory: a paper-and-pencil version of the Hamilton Depression
Rating Scale clinical interview. Psychol Assess 7:472–483.
Santor DA, Coyne JC (2001). Examining symptom expression as a function of
symptom severity: item performance on the Hamilton Rating Scale for
Depression. Psychol Assess 13:127–139.
Thase ME (1984). A Hamilton subscale for endogenomorphic depression.
Hillside J Clin Psychiatry 6:57–68.
Whisman MA, Strosahl K, Fruzzetti AE, Schmaling KB, Jacobson NS, Miller DM
(1989). A structured interview version of the Hamilton Rating Scale for
Depression: reliability and validity. Psychol Assess 1:238–241.
Williams JBW (1988). A structured interview guide for the Hamilton Depression
Rating Scale. Arch Gen Psychiatry 45:742–747.
Williams JBW (2001). Standardizing the Hamilton Depression Rating Scale: past,
present, and future. Eur Arch Psychiatry Clin Neurosci 251:II/6–II/12.
Zimmerman M, Posternak MA, Chelminski I (2005). Is it time to replace
the Hamilton Depression Rating Scale as the primary outcome measure
in treatment studies of depression? J Clin Psychopharmacol 25:
105–110.
Zitman FG, Mennen MF, Griez E, Hooijer C (1990). The different versions
of the Hamilton Depression Rating Scale. Psychopharmacol Ser 9:
28–34.
... It has demonstrated acceptable to excellent reliability [39]. For a more in-depth overview of the severity of depressive symptoms, the German version of the GRID-HAMD [51] will be used. The GRID-HAMD represents a revised version of the HAMD [52] and integrates a guideline for the semi-structured interview. ...
... In the current study, we will use the 21-item version of the GRID-HAMD complemented with the items, helplessness, hopelessness, and worthlessness of the HAMD-24. The GRID-HAMD has shown both validity, and acceptable to excellent reliability (Cronbach's alpha = 0.78, ICC = 0.95; [51]). During the diagnostic session, the first part of the SCID-5-CV assessing current depressiveness (SCID-5-CV chapter A) and the HAMD will be combined to maintain a more pleasant conversation flow during the interview. ...
Article
Full-text available
Background Dysfunctional depressogenic cognitions are considered a key factor in the etiology and maintenance of depression. In cognitive behavioral therapy (CBT), the current gold-standard psychotherapeutic treatment for depression, cognitive restructuring techniques are employed to address dysfunctional cognitions. However, high drop-out and non-response rates suggest a need to boost the efficacy of CBT for depression. This might be achieved by enhancing the role of emotional and kinesthetic (i.e., body movement perception) features of interventions. Therefore, we aim to evaluate the efficacy of a cognitive restructuring task augmented with the performance of anti-depressive facial expressions in individuals with and without depression. Further, we aim to investigate to what extent kinesthetic markers are intrinsically associated with and, hence, allow for the detection of, depression. Methods In a four-arm, parallel, single-blind, randomized controlled trial (RCT), we will randomize 128 individuals with depression and 128 matched controls without depression to one of four study conditions: (1) a cognitive reappraisal training (CR); (2) CR enhanced with instructions to display anti-depressive facial expressions (CR + AFE); (3) facial muscle training focusing on anti-depressive facial expressions (AFE); and (4) a sham control condition. One week after diagnostic assessment, a single intervention of 90–120-minute duration will be administered, with a subsequent follow-up two weeks later. Depressed mood will serve as primary outcome. 
Secondary outcomes will include current positive mood, symptoms of depression, current suicidality, dysfunctional attitudes, automatic thoughts, emotional state, kinesthesia (i.e., facial expression, facial muscle activity, body posture), psychophysiological measures (e.g., heart rate (variability), respiration rate (variability), verbal acoustics), as well as feasibility measures (i.e., treatment integrity, compliance, usability, acceptability). Outcomes will be analyzed with multiple methods, such as hierarchical and conventional linear models and machine learning. Discussion If shown to be feasible and effective, the inclusion of kinesthesia into both psychotherapeutic diagnostics and interventions may be a pivotal step towards the more prompt, efficient, and targeted treatment of individuals with depression. Trial registration The study was preregistered in the Open Science Framework on August 12, 2022 (https://osf.io/mswfg/) and retrospectively registered in the German Clinical Trials Register on November 25, 2024. Clinical Trial Number: DRKS00035577.
... For categorical outcomes, response was defined at ≥50% improvement for each scale and remission was defined at HDRS-17<8, BDI-II<10, PHQ-9<5 and GAD-7<5. The clinician-rated assessments (HDRS-17) in the present series were all performed by the same individual (author BM), who was an experienced psychiatric nurse practitioner trained in the administration of a standardized form of the HDRS-17, the GRiD-HAMD [58]. ...
Preprint
Full-text available
Background: Conventional transcranial magnetic stimulation (TMS) regimens are logistically burdensome, requiring days or weeks of clinic visits. Here we describe a TMS regimen enabling delivery of an entire therapeutic course in a single day. Methods: This retrospective case series reports outcomes for an optimized, neuroplastogen-enhanced depression (ONE-D) treatment regimen delivering 600-pulse iTBS (120% MT) targeting left DLPFC via scalp heuristic, every 30 minutes for 20 sessions in 9.5 hours, enhancing neuroplasticity via single-dose d-cycloserine (125 mg) and lisdexamfetamine (20 mg), off-label, given 1 hour pre-treatment. 32 TMS-eligible adults with medication-resistant unipolar depression underwent the ONE-D regimen, with assessments on day-of-treatment then weekly x 6 weeks (HDRS-17, BDI-II, PHQ-9, and GAD-7). Results: Every patient completed the regimen successfully, with no serious adverse events (mean scalp discomfort, 5.8±2.1/10). Response was not immediate but followed an exponential-decay trajectory over the 6-week followup: mean weekly scores of 22.6±5.3(baseline), 13.5±6.4, 10.6±6.4, 7.9±4.9, 6.6±4.9, 6.3±4.8, 5.5±4.2 (HDRS-17), 37.5±9.0(baseline), 23.8±12.2, 17.1±11.1, 14.1±11.2, 11.0±8.7, 9.5±8.2, 7.6±7.8 (BDI-II,) 18.4±3.5(baseline), 11.4±5.1, 9.4±5.7, 6.9±5.2, 6.0±3.9, 5.3±4.0, 4.6±4.2 (PHQ-9), 14.3±5.2(baseline), 8.7±4.5, 6.4±5.0, 4.3±4.0, 3.8±3.6, 3.3±3.3, 3.1±2.7 (GAD-7). Response / remission rates (cross-sectional, not aggregated) were 90.3% and 74.2% (HDRS-17), 93.5% and 71.0% (BDI-II), 90.3% and 58.1% (PHQ-9), 93.3% and 76.7% (GAD-7) at week 6, and 92.6% and 77.8% (HDRS-17), 92.3% and 73.1% (BDI-II), 86.4% and 65.4% (PHQ-9), 91.7% and 80.0% (GAD-7) at week 12. Conclusion: Delivery of an effective TMS course in one day appears feasible, safe, and well-tolerated. 
With neuroplastogen-enhancement, despite non-personalized, scalp-based targeting, the response and remission rates appeared robust and sustained in representative clinical populations. Follow-up studies may allow further acceleration of the regimen and generalization to other TMS indications.
... 39 A semistructured version of the scale was used. 40 The self-report Beck Depression Inventory-II (BDI-II) 41 was used as a secondary measure of depression, with higher scores reflecting greater symptom severity. The BDI-II is frequently used as a complementary measure to the HAM-D in clinical trials because it assesses subjective experiences and includes items related to cognitive symptoms of depression (eg, self-dislike, self-criticalness). ...
Article
Background Depression is a common nonmotor complication in Parkinson's disease (PD). However, few studies have evaluated the efficacy of first‐line psychological therapies for depression in this patient population. Objectives This randomized controlled trial evaluated the efficacy of interpersonal psychotherapy (IPT), an empirically validated intervention for depression that focuses on the bidirectional relationship between mood disturbance and interpersonal and social stressors. A secondary aim was to assess maintenance of treatment gains at 6‐month follow‐up. Methods Participants with PD stages I to III and a comorbid depressive disorder were randomly assigned to 12 sessions of IPT (n = 32) or supportive therapy (ST) (n = 31), our active control intervention. The primary outcome was the Hamilton Depression Rating Scale (HAM‐D) administered blindly by telephone. Secondary outcomes included self‐report depression and anxiety, quality of life, clinician‐rated motor symptom, interpersonal relationships, and attachment style. Results IPT compared to ST resulted in a greater reduction in posttreatment HAM‐D scores (least square mean difference = −3.77, 95% confidence interval [CI]: −6.19 to −1.34, P = 0.003) and was associated with a greater odds of meeting remission (odds ratio = 3.23, 95% CI: 1.10–9.51, P = 0.034). The advantage of IPT over ST on HAM‐D scores and remission rates was not sustained at the 6‐month follow‐up. Both treatments improved self‐report depression, anxiety, quality of life, and aspects of interpersonal functioning. Conclusions This trial demonstrates the benefits of acute treatment with IPT in reducing depressive symptoms in PD. Clinicians should consider psychotherapy, alone or in combination with medication, as an important treatment option for PD depression. © 2024 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
... The GSH was well-received by participants and therapists; 86% of participants attended the predefined 'dose' of six treatment sessions and 71% attended all nine sessions. We used two self-report (Patient Health Questionnaire-9 (PHQ-9) and Beck Depression Inventory-II (BDI-II)) and one interview measure (Hamilton Rating Scale for Depression 22 of depression in the feasibility study. Inter-rater reliability for the interview measure was less than adequate, the two self-report measures were well-aligned and many participants suggested a preference for the BDI-II as a selfreport measure with item sets of closed statements less subject to misinterpretation. ...
Article
Full-text available
Introduction Depression is three to four times more prevalent in autistic people and is related to reduced quality of life. There is a need for empirically supported psychological interventions for depression specifically adapted to meet the needs of autistic adults. ADEPT-2 aims to establish the clinical and cost-effectiveness of an adapted low-intensity psychological intervention (guided self-help) for depression in autistic adults. Methods and analysis A two parallel-group multicentre pragmatic randomised controlled trial investigating the effectiveness of GSH for depression in autistic adults. Participants (n=248) aged ≥18 years with a clinical diagnosis of autism currently experiencing depression will be randomised to GSH or treatment as usual (TAU). GSH is a low-intensity psychological intervention based on the principles of behavioural activation adapted for autism. GSH comprises informational materials for nine individual sessions facilitated online by a GSH coach who has received training and supervision in delivering the intervention. The primary outcome will be Beck Depression Inventory-II depression scores at 16 weeks post randomisation with follow-up measures at 32 and 52 weeks. Additional measures of anxiety, patient-rated global improvement, quality of life, work and social adjustment, positive and negative affect will be measured 16 and 52 weeks post randomisation. The primary health economic analysis will assess the cost-effectiveness of GSH compared with TAU over 52 weeks, from a societal perspective including the National Health Service, personal social services, personal expenses, voluntary services and productivity. An embedded qualitative study will explore the acceptability, experiences and adherence of participants and therapists to treatment principles. Ethics and dissemination This trial has been approved by the East of England - Essex Research Ethics Committee on 10 June 2022 (REC Reference number: 22/EE/0091). 
The findings of the research will be submitted for publication in peer-reviewed journals and disseminated in an appropriate format to trial participants and the wider public. Trial registration number ISRCTN17547011.
... Eligible participants were aged ≥18 years and diagnosed with current major depressive disorder (DSM-5 criteria- [27]), that was rated as moderate to severe (≥16 on the 17-item GRID Hamilton Depression Rating Scale [28]) and resistant to treatment (≥ 2 Massachusetts General Hospital Staging Score [29] which was adapted for new treatment options [25]). ...
Article
Full-text available
Background The BRIGhTMIND study was a double-blind RCT comparing repetitive transcranial magnetic stimulation at a standard simulation site (the “F3” location given by the International 10–20 system, F3-rTMS) versus connectivity-guided intermittent theta burst stimulation (cgiTBS) for treatment-resistant depression. This present study reports the acceptability, safety, and tolerability of F3-rTMS versus cgiTBS. Methods The present study used quantitative and qualitative methods. Two hundred fifty-four participants were included in the quantitative BRIGhTMIND acceptability and safety analysis (n = 126 F3-rTMS, n = 128 cgiTBS). Qualitative analysis included interviews for 15 participants (n = 7 F3-rTMS, n = 8 cgiTBS) and 582 written comments made by any participant randomised to the BRIGhTMIND trial regarding their experience of TMS and the study. Statistical analyses were used to explore differences between F3-rTMS and cgiTBS, as well as associations between acceptability, impression of change and safety. Qualitative data was analysed using an inductive thematic framework approach. Outcomes Acceptability, TMS benefits/negative effects and impression of improvement ratings did not differ across the two treatment protocols, with ratings maintained long-term (71.4 % rated TMS acceptable, 48.8 % indicated benefits of TMS outweighed negative effects and 52.2 % feeling somewhat or much better at 26 week follow-up n = 203). Impression of improvement was positively associated with acceptability and TMS benefits. Qualitative themes included participants' TMS experience, TMS response variability, and lay theories of effectiveness. Safety profiles were comparable between F3-rTMS and cgiTBS, with 74.5 % of participants (n = 190/254) experiencing at least one adverse event possibly, probably, or definitely related to TMS. 
The majority of adverse events were transient and mild, with a sizeable number requiring simple treatments or small adjustments to TMS intensity and coil positioning. The F3-rTMS group had a significantly greater proportion of participants that required small adjustments to TMS to tolerate treatment compared to the cgiTBS group. Serious adverse events were rare, with one serious event in each treatment arm possibly related to TMS (F3-rTMS- psychotic episode, cgiTBS-manic episode). Conclusion F3-rTMS and cgiTBS are comparably safe, tolerable and highly acceptable interventions for treatment-resistant depression. BRIGhTMIND systematically collected data from a large sample, providing evidence to meet the information needs of patients, clinicians and policy makers.
Article
Full-text available
Introduction Bipolar disorder has a long depressive episode and high risk of suicide. In clinical practice, patients often show no response to pharmacotherapy, which results in prolongation of the depressive episode. Repetitive transcranial magnetic stimulation (rTMS) is a non-invasive technique expected to serve as a treatment option for bipolar depression. For bipolar depression, a meta-analysis suggested that low-frequency stimulation to the right prefrontal cortex was possibly effective. However, a medium or large sample, randomized, double blind, sham controlled study has not yet been performed. Objective To examine the efficacy and safety of 1-Hz rTMS to the right prefrontal cortex in patients with treatment-resistant bipolar depression. rTMS was approved by the Ministry of Health, Labor, and Welfare as a highly advanced medical technology on March 1, 2019. Methods In this multicenter, double-blind, randomized, sham stimulation-controlled trial for bipolar depression, patients will be individually allocated to active or sham stimulation plus usual medication and followed up for 6 months. The conditions of stimulation by the Mag Pro R30 transcranial magnetic stimulation device (Magventure) will be a frequency of 1-Hz, intensity of 120% motor threshold, and duration of 1800 seconds to the right prefrontal cortex 5 days a week for 4 weeks during the acute treatment period. The primary endpoint will be a total change in the Montgomery-Åsberg Depression Rating Scale score during the acute treatment period. Discussion The outcomes of this study will inform clinical practice for the treatment of bipolar depression. Clinical trial registration https://jrct.niph.go.jp/latest-detail/jRCTs032180138, identifier jRCTs032180138.
Article
Introduction Restless legs syndrome (RLS) is a sensorimotor disorder of the nervous system that is mainly characterized by nighttime leg discomfort and can be accompanied by significant anxiety, depression, and other mood disorders. RLS seriously affects the quality of life. Clinical studies have confirmed that acupuncture can alleviate the clinical symptoms of RLS. This randomized controlled trial (RCT) aims to investigate the efficacy of acupuncture in the treatment of RLS and further explore the central response mechanism of acupuncture in the treatment of RLS. Methods and analysis In this RCT, a total of 124 eligible patients in Shanghai will be randomly assigned to one of the following two groups: treatment group (acupuncture) and control group (sham acupuncture). Treatment will be given three times per week for 4 consecutive weeks. The primary outcome is the International Restless Legs severity rating scale (IRLSS). The secondary outcomes are the RLS-Quality of Life (RLSQoL), the Insomnia Severity Index (ISI), Pittsburgh Sleep Quality Index (PSQI), the Hamilton Depression Scale (HAMD), and the Hamilton Anxiety Scale (HAMA). The objective evaluation tools will be polysomnography, positron emission tomography–computed tomography (PET-CT), and functional magnetic resonance imaging (fMRI) of the brain. All adverse effects will be assessed by the Treatment Emergent Symptom Scale. Outcomes will be evaluated at baseline (1 week before the first intervention), during the intervention (the second week of the intervention), after the intervention (at the end of the intervention), at 1-month follow-up, and at 3-month follow-up. Ethics and dissemination The trial has been approved by the Ethics Committee of Yueyang Hospital of Integrated Traditional Chinese and Western Medicine (no. 2022-061). Written informed consent will be obtained from all participants. The results of this study will be published in peer-reviewed journals or presented at academic conferences. 
Clinical trial registration https://www.chictr.org.cn/, identifier ChiCTR2000037287.
Preprint
Background: Randomised Controlled Trials (RCTs) are widely regarded as the most powerful research design for evidence-based practice. However, recruiting to RCTs can be challenging, resulting in heightened costs and delays in research completion and implementation. Enabling successful recruitment is crucial in mental health research. Despite the increase in the use of remote recruitment strategies and digital health interventions, there is limited evidence on methods to improve recruitment to remotely delivered mental health trials. The paper outlines practical examples and recommendations on how to successfully recruit participants to remotely delivered mental health trials. Methods: The Alpha Stim-D Trial was a multi-centre double-blind randomised controlled trial, for people aged 16 years upwards, addressing depressive symptoms in primary care. Despite a six-month delay in beginning recruitment due to the COVID-19 pandemic, the trial met the recruitment target within the timeframe and achieved high retention rates. Several strategies were implemented to improve recruitment, some of which were adapted in response to the COVID-19 pandemic. This included adapting the original in-person recruitment strategies. Subsequently, systematic recruitment using postal invitations from criteria-specific searches of the sites’ electronic health records was added to opportunistic recruitment to increase referrals in response to sub-target recruitment whilst also reducing the burden on referring sites. Throughout the recruitment process, the research team collaborated with key stakeholders such as primary care clinicians and the project’s Patient and Public Involvement and Engagement (PPI/E) representatives, who gave advice on recruitment strategies. Furthermore, the study researchers played a key role in communicating with participants and building rapport from study introduction to data collection.
Conclusions: Our findings suggest that trial processes can influence recruitment; therefore, regular review of the recruitment figures and strategies is important. Recruitment of participants can be maximised by utilising remote approaches, which reduce the burden and amount of time required by referring sites and allow the research team to reach more participants whilst providing participants and researchers with more flexibility. Effectively communicating and working collaboratively with key stakeholders throughout the trial process, as well as building rapport with participants, may also improve recruitment rates.
Article
Current clinical computing systems have evolved over the past three decades and can now be implemented from any touch-tone telephone using interactive voice response (IVR) technology, permitting accessibility 24 hours a day, 7 days a week. Ten years of research and development investigating computer automation of the Hamilton Depression Rating Scale (HDRS), involving 10 separate studies and 1761 subjects, shows strong correspondence with clinician-administered assessments. The psychometric properties concerning assessment reliability and validity of clinician- and computer-administered versions of the HDRS are compared, and points of divergence between methods drawn from a recently completed clinical trial are discussed. Future challenges facing expanded use of such clinical computing systems in clinical trials of investigational drugs are also discussed.
Article
Reliability and validity data are provided for pre- and posttreatment administrations of a structured interview version of the Hamilton Rating Scale for Depression (HRSD) integrated with the National Institute of Mental Health Diagnostic Interview Schedule (DIS). Ss were 70 adult patients requesting therapy for depression. Results indicate excellent agreement between DIS–HRSD ratings made by graduate students and psychiatrist-administered HRSD ratings. The DIS–HRSD exhibited a pattern of correlation with other scales of depression similar to that of the HRSD, thus supporting the validity of the new scale. Intraclass correlations and concurrent validity estimates obtained from analyzing data separately for pre- and posttest administrations were consistently lower than those obtained from the whole sample, suggesting that methodological shortcomings in prior psychometric studies of the HRSD may have spuriously inflated the obtained results.
Article
A computer-administered form of the Hamilton Rating Scale for Depression was designed to provide scores with a high degree of correspondence with the clinician-administered 17-item version of the scale. Both forms of the Hamilton scale were administered in a counterbalanced design to 97 subjects, including 52 outpatients with a Research Diagnostic Criteria diagnosis of Major Depression, 20 outpatients with Minor Depression, and 25 nonpsychiatric control subjects. Both the computer- and clinician-administered interviews demonstrated high internal consistency reliability of .91 and .90, respectively. A correlation of .96 was found between the two versions, and the mean score difference between the two forms was nonsignificant for the total sample. Both forms also demonstrated clinical sensitivity and specificity in differentiating between Major and Minor Depression and Control group subjects. Overall results support the clinical and research use of the computer-administered version of the Hamilton scale.
Article
Developed a computer-administered form of the Hamilton Anxiety Scale (HAS), designed to provide a high degree of correspondence with the clinician interview version of the HAS. Both computer and clinician forms of the HAS were administered to 214 psychiatric outpatients and 78 community-based adults (all Ss aged 18–77 yrs). The computer-administered HAS demonstrated high internal consistency and test–retest reliability. A correlation of r(290) = .92, p ≤ .001, was found between the computer and the clinician versions. The mean score difference between versions was small but significant. In Ss with anxiety disorders the mean score difference between computer and clinician versions was not significant. Results support the reliability and validity of the computer-administered HAS as an alternative to the clinician-administered version of this measure.
Article
A self-report, paper-and-pencil version of the Hamilton Depression Rating Scale (HDRS; M. Hamilton, 1960) was developed. This measure, the Hamilton Depression Inventory (HDI; W. M. Reynolds & K. A. Kobak, 1995), consists of a 23-item full form, a 17-item form, and a 9-item short form. The 17-item HDI form corresponds in content and scoring to the standard 17-item HDRS. With a sample of psychiatric outpatients with major depression (n = 140), anxiety disorders (n = 99), and nonreferred community adults (n = 118), the HDI forms demonstrated high levels of reliability (rα = .91 to .94, rtt = .95 to .96). Extensive validity evidence was presented, including content, criterion related, construct, and clinical efficacy of the HDI cutoff score. Overall, the data support the reliability and validity of the HDI as a self-report measure of severity of depression.
Article
We treated 65 outpatients with RDC major depression in a randomized, prospective, double-blind comparison of oral L-tyrosine, 100 mg/kg/day, imipramine, 2.5 mg/kg/day, or placebo for 4 weeks. Tyrosine increased and imipramine decreased 3-methoxy-4-hydroxyphenylglycol (MHPG) excretion significantly, but there was no evidence that tyrosine had antidepressant activity. The only side effect to achieve statistical significance was greater dry mouth with imipramine. MHPG excretion and plasma amino acid concentrations failed to predict or correlate with clinical improvement.