Meta-Analysis of Psychotherapy Outcome Studies

MARY LEE SMITH, University of Colorado—Boulder
GENE V GLASS, University of Colorado—Boulder
ABSTRACT: Results of nearly 400 controlled evaluations of psychotherapy and counseling were coded and integrated statistically. The findings provide convincing evidence of the efficacy of psychotherapy. On the average, the typical therapy client is better off than 75% of untreated individuals. Few important differences in effectiveness could be established among many quite different types of psychotherapy. More generally, virtually no difference in effectiveness was observed between the class of all behavioral therapies (systematic desensitization, behavior modification) and the nonbehavioral therapies (Rogerian, psychodynamic, rational-emotive, transactional analysis, etc.).
Scholars
and
clinicians have argued bitterly
for
decades
about
the
efficacy
of
psychotherapy
and
counseling. Michael Scriven proposed
to the
Ameri-
can
Psychological Association's Ethics Committee
that
APA-member clinicians
be
required
to
present
a
card
to
prospective clients
on
which
it
would
be
explained
that
the
procedure
they
were about
to
undergo
had
never been proven superior
to a
placebo ("Psychotherapy
Caveat,"
1974). Most
academics have read little more
than
Eysenck's
(1952,
1965)
tendentious diatribes
in
which
he
claimed
to
prove
that
75%
of
neurotics
got
better
regardless
of
whether
or not
they were
in
therapy—
a
conclusion based
on the
interpretation
of six
con-
trolled studies.
The perception that research shows the inefficacy of psychotherapy has become part of conventional wisdom even within the profession. The following testimony was recently presented before the Colorado State Legislature:
    Are they [the legislators] also aware of the relatively primitive state of the art of treatment outcome evaluation which is still, after fifty years, in kind of a virginal state? About all we've been able to prove is that a third of the people get better, a third of the people stay the same, and a third of the people get worse, irregardless of the treatment to which they are subjected. (Quoted by Ellis, 1977, p. 3)
Only close followers of the issue have read Bergin's (1971) astute dismantling of the Eysenck myth in his review of the findings of 23 controlled evaluations of therapy. Bergin found evidence that therapy is effective. Emrick (1975) reviewed 72 studies of the psychological and psychopharmacological treatment of alcoholism and concluded that evidence existed for the efficacy of therapy. Luborsky, Singer, and Luborsky (1975) reviewed about 40 controlled studies and found more evidence. Although these reviews were reassuring, two sources of doubt remained. First, the number of studies in which the effects of counseling and psychotherapy have been tested is closer to 400 than to 40. How representative the 40 are of the 400 is unknown. Second, in these reviews, the "voting method" was used; that is, the number of studies with statistically significant results in favor of one treatment or another was tallied. This method is too weak to answer many important questions and is biased in favor of large-sample studies.
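To make the sample-size bias of the voting method concrete, the following is a minimal illustrative simulation (ours, not part of the original study); the true effect of 0.5, the sample sizes, and the .05 significance criterion are all assumed for illustration. When every study has the same true effect, small studies frequently fail to reach significance, so a tally of significant results undercounts them.

```python
# Illustrative sketch (not from the original article): why vote counting is
# biased in favor of large-sample studies. Every simulated study has the same
# true effect, yet small studies often fail to reach p < .05, so tallying
# "significant vs. not significant" undercounts them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_EFFECT = 0.5  # assumed true effect, in control-group SD units

def fraction_significant(n_per_group, n_studies=1000):
    """Fraction of simulated two-group studies reaching p < .05 (t test)."""
    hits = 0
    for _ in range(n_studies):
        treated = rng.normal(TRUE_EFFECT, 1.0, n_per_group)
        control = rng.normal(0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        hits += p < 0.05
    return hits / n_studies

for n in (10, 25, 100):
    print(f"n = {n:3d} per group: {fraction_significant(n):.2f} of studies 'vote' for therapy")
```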
The purpose of the present research is threefold: (1) to identify and collect all studies that tested the effects of counseling and psychotherapy; (2) to determine the magnitude of effect of the therapy in each study; and (3) to compare the effects of different types of therapy and relate the size of effect to the characteristics of the therapy (e.g., diagnosis of patient, training of therapist) and of the study. Meta-analysis, the integration of research through statistical analysis of the analyses of individual studies (Glass, 1976), was used to investigate the problem.
Procedures
Standard search procedures were used to identify 1,000 documents: Psychological Abstracts, Dissertation Abstracts, and branching off of bibliographies of the documents themselves. Of those documents located, approximately 500 were selected for inclusion in the study, and 375 were fully analyzed. To be selected, a study had to have at least one therapy treatment group compared to an untreated group or to a different therapy group. The rigor of the research design was not a selection criterion but was one of several features of the individual study to be related to the effect of the treatment in that study.
The definition of psychotherapy used to select the studies was presented by Meltzoff and Kornreich (1970):

    Psychotherapy is taken to mean the informed and planful application of techniques derived from established psychological principles, by persons qualified through training and experience to understand these principles and to apply these techniques with the intention of assisting individuals to modify such personal characteristics as feelings, values, attitudes, and behaviors which are judged by the therapist to be maladaptive or maladjustive. (p. 6)
Those studies in which the treatment was labeled "counseling" but whose methods fit the above definition were included. Drug therapies, hypnotherapy, bibliotherapy, occupational therapy, milieu therapy, and peer counseling were excluded. Sensitivity training, marathon encounter groups, consciousness-raising groups, and psychodrama were also excluded.
Those studies that Bergin and Luborsky eliminated because they used "analogue" therapy were retained for the present research. Such studies have been designated analogue studies because therapy lasted only a few hours or the therapists were relatively untrained. Rather than arbitrarily eliminating large numbers of studies and losing potentially valuable information, it was deemed preferable to retain these studies and investigate the relationship between length of therapy, training of therapists, and other characteristics of the study and their measured effects. The arbitrary elimination of such analogue studies was based on an implicit assumption that they differ not only in their methods but also in their effects and how those effects are achieved. Considering methods, analogue studies fade imperceptibly into "real" therapy, since the latter is often short term, or practiced by relative novices, etc. Furthermore, the magnitude of effects and their relationships with other variables are empirical questions, not to be assumed out of existence. Dissertations and fugitive documents were likewise retained, and the measured effects of the studies were compared according to the source of the studies.

Author note: The research reported here was supported by a grant from the Spencer Foundation, Chicago, Illinois. This paper draws in part from the presidential address of the second author to the American Educational Research Association, San Francisco, April 21, 1976. Requests for reprints should be sent to Gene V Glass, Laboratory of Educational Research, University of Colorado, Boulder, Colorado 80302.
The most important feature of an outcome study was the magnitude of the effect of therapy. The definition of the magnitude of effect—or "effect size"—was the mean difference between the treated and control subjects divided by the standard deviation of the control group, that is, ES = (X̄_T − X̄_C)/s_C. Thus, an "effect size" of +1 indicates that a person at the mean of the control group would be expected to rise to the 84th percentile of the control group after treatment.
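As a minimal illustration of this definition (our sketch, with hypothetical means and standard deviation; the percentile reading assumes roughly normal outcome distributions):

```python
# Sketch of the effect-size definition and its percentile interpretation.
# The numeric values are hypothetical; the normal-distribution assumption
# underlies the "84th percentile" reading of ES = +1.
from statistics import NormalDist

def effect_size(mean_treated, mean_control, sd_control):
    """ES = (treated-group mean - control-group mean) / control-group SD."""
    return (mean_treated - mean_control) / sd_control

es = effect_size(mean_treated=60.0, mean_control=50.0, sd_control=10.0)
percentile = NormalDist().cdf(es) * 100  # standing of the average treated client among controls
print(f"ES = {es:.2f}; average treated client at about the {percentile:.0f}th percentile of controls")
```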
The effect size was calculated on any outcome variable the researcher chose to measure. In many cases, one study yielded more than one effect size, since effects might be measured at more than one time after treatment or on more than one different type of outcome variable.
The effect-size measures represent different types of outcomes: self-esteem, anxiety, work/school achievement, physiological stress, etc. Mixing different outcomes together is defensible. First, it is clear that all outcome measures are more or less related to "well-being" and so at a general level are comparable. Second, it is easy to imagine a Senator conducting hearings on the NIMH appropriations or a college president deciding whether to continue funding the counseling center asking, "What kind of effect does therapy produce—on anything?" Third, each primary researcher made value judgments concerning the definition and direction of positive therapeutic effects for the particular clients he or she studied. It is reasonable to adopt these value judgments and aggregate them in the present study. Fourth, since all effect sizes are identified by type of outcome, the magnitude of effect can be compared across type of outcome to determine whether therapy has greater effect on anxiety, for example, than it does on self-esteem.
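The fourth point implies a simple aggregation: group the effect sizes by outcome type and compare their means. A minimal sketch with hypothetical records (not data from the study) follows.

```python
# Hypothetical illustration of comparing mean effect sizes across outcome
# types (e.g., anxiety vs. self-esteem); the records below are invented.
from collections import defaultdict

effect_records = [
    {"study": 1, "outcome": "anxiety", "es": 0.90},
    {"study": 1, "outcome": "self-esteem", "es": 0.40},
    {"study": 2, "outcome": "anxiety", "es": 0.55},
    {"study": 3, "outcome": "self-esteem", "es": 0.70},
]

by_outcome = defaultdict(list)
for record in effect_records:
    by_outcome[record["outcome"]].append(record["es"])

for outcome, sizes in sorted(by_outcome.items()):
    print(f"{outcome}: mean ES = {sum(sizes) / len(sizes):.2f} across {len(sizes)} effects")
```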
Calculating effect sizes was straightforward when means and standard deviations were reported. Although this information is thought to be fundamental in reporting research, it was often overlooked by authors and editors. When means and standard deviations were not reported, effect sizes were obtained by the solution of equations from t and F ratios or other inferential test statistics. Probit transformations were used to convert to effect sizes the percentages of patients who improved (Glass, in press). Original data were requested from several authors when effect sizes could not be derived from any reported information.
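The article does not spell out these reconstructions; the sketch below uses the standard textbook conversions for two-group designs (effect size from a t ratio, from an F with one numerator degree of freedom, and a probit-style difference of normal deviates for improvement percentages). The procedures actually used are those described by Glass (in press), which may differ in detail.

```python
# Sketch of common reconstructions of effect sizes from reported statistics,
# assuming two independent groups; these are standard conversions, not
# necessarily the exact equations used in the original analysis.
import math
from statistics import NormalDist

def es_from_t(t, n_treated, n_control):
    """Effect size from an independent-groups t ratio (pooled-SD basis)."""
    return t * math.sqrt(1.0 / n_treated + 1.0 / n_control)

def es_from_f(f, n_treated, n_control):
    """Effect size from a one-way F with 1 numerator df, since t = sqrt(F)."""
    return es_from_t(math.sqrt(f), n_treated, n_control)

def es_from_improvement_rates(p_treated, p_control):
    """Probit-style conversion: difference of the normal deviates of the
    proportions of patients judged improved in each group."""
    z = NormalDist().inv_cdf
    return z(p_treated) - z(p_control)

# Hypothetical reported results, for illustration only:
print(es_from_t(t=2.5, n_treated=20, n_control=20))
print(es_from_f(f=6.25, n_treated=20, n_control=20))
print(es_from_improvement_rates(p_treated=0.66, p_control=0.50))
```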
In two instances, effect sizes were impossible to reconstruct: (a) nonparametric statistics irretrievably disguise effect sizes, and (b) the reporting of no data except the alpha level at which a mean difference was significant gives no clue other than that the standardized mean difference must exceed some known value.
Eight hundred thirty-three effect sizes were computed from 375 studies, several studies yielding effects on more than one type of outcome or at more than one time after therapy. Including more than one effect size for each study perhaps introduces dependence in the errors and violates some assumptions of inferential statistics. However, the loss of information that would have resulted from averaging effects across types of outcome or at different follow-up points was too great a price to pay for statistical purity.
The effect sizes of the separate studies became the "dependent variable" in the meta-analysis. The "independent variables" were 16 features of the study described or measured in the following ways:
1. The type of therapy employed, for example, psychodynamic, client-centered, rational-emotive, behavior modification, etc. There were 10 types in all; each will be mentioned in the Results section.
2. The duration of therapy in hours.
3. Whether it was group or individual therapy.
4. The number of years' experience of the therapist.
5. Whether clients were neurotics or psychotics.
6. The age of the clients.
7. The IQ of the clients.
8. The source of the