Self-Report Bias and Underreporting of Depression on the BDI–II
Melissa Hunt, Joseph Auriemma, and Ashara C. A. Cashaw
Department of Psychology
University of Pennsylvania
One problem in identifying and treating depression is underreporting of symptoms by individuals. Previous research suggests that there may be systematic sex differences in self-report bias, with men tending to minimize their depressive symptoms more than women. This study used an experimental design with a sample of 238 community members to test whether disguising the purpose of the Beck Depression Inventory–II would significantly reduce self-report bias, especially in men. We found a main effect of condition such that both men and women reported significantly more core depressive symptoms in the covert condition, suggesting that surveys of community samples may underestimate the prevalence of depression.
One of the major problems in the area of assessing and treating depression is that we must rely, to a large extent, on the self-report of individuals. As a result, general prevalence rates of unipolar depression based on community sampling may significantly underestimate true rates of depression in both men and women. Eaton, Neufeld, Chen, and Cai (2000) examined data from the Baltimore Epidemiologic Catchment Area follow-up and found a strong bias toward underreporting depressive symptoms. This bias has profound implications for access to treatment. Druss, Hoff, and Rosenheck (2000), for example, found a serious gap between the established efficacy of antidepressant medications and rates of treatment in the “real world.” They attributed this gap, in large part, to underreporting of depressive symptoms. They also found that telling a primary care provider about depressive symptoms predicted a 10-fold increase in treatment.
Although underreporting is a problem for both sexes, there is reason to believe that it is even more pronounced in men than in women. For example, Allen-Burge, Storandt, Kinscherf, and Rubin (1994) found that the Beck Depression Inventory (BDI) missed one fourth to one half of the index cases of unipolar depression in male geriatric psychiatric inpatients, depending on the cutoff score used. One of the most widely cited epidemiological findings in mental health is the 2:1 sex difference in rates of unipolar depression (e.g., Nolen-Hoeksema, 1987; Weissman & Klerman, 1977). Among the many explanations for this consistent finding is that men may simply underreport depressive symptoms because it is socially undesirable for them to admit to problems with mental health (e.g., Vredenburg, Krames, & Flett, 1986). The central assertion of this argument is that men experience depressive symptoms as frequently as women do but are less likely to admit it.
A number of studies have addressed the problem of self-report bias on depression inventories, particularly with respect to sex differences, but none have produced conclusive results (Nolen-Hoeksema, 1987). For example, Byrne (1981) conducted a large-scale (n = 756) study of a community sample and found sex differences in the point prevalence of depression. Although he discussed the possible impact of social pressures and gender roles, he did not experimentally manipulate those factors.
King and Buchwald (1982) conducted an experimental investigation of the role of self-report bias by comparing public and private disclosure conditions in an undergraduate sample. They hypothesized that men in the public disclosure condition (which consisted of a structured interview) would have significantly lower BDI scores than women in the public disclosure condition or men in the private disclosure condition (which consisted of completing the BDI by themselves). Their results did not support this hypothesis: men were no less willing than women to disclose symptoms, even in the public condition. This result is hardly surprising, however, because college students are one of the few populations that rarely show the sex difference in depression (Hammen & Padesky, 1977; Nolan & Wilson, 1994; Stangler & Printz, 1980).
Also using an undergraduate sample, Page and Bennesch (1993) attempted to manipulate social desirability experimentally, but more subtly. Half of their participants completed the BDI, accurately labeled as a “Depression” questionnaire. The other half completed a BDI that was disguised as a questionnaire addressing the “daily hassles of normal living commonly experienced by all people.” To aid the deception, they padded the BDI with items from the Hopkins Symptom Checklist. They found that both men and women reported more depressive symptoms in the masked condition, but they did not find evidence for a sex difference. Again, this study used a college student sample, so the lack of a sex difference is not unexpected.
JOURNAL OF PERSONALITY ASSESSMENT, 80(1), 26–30
Copyright © 2003, Lawrence Erlbaum Associates, Inc.
Although it is frequently assumed that self-report bias (i.e., the minimizing of actual symptoms) is due in part to differences in the social desirability of such reports for men versus women, very few studies have examined social desirability directly. Clark, Crewsdon, and Purdon (1998) did find that BDI scores were significantly negatively correlated with scores on social desirability scales, but they did not address the question of sex differences. Vredenburg et al. (1986), on the other hand, investigated sex differences in the expression of depressive symptoms in a sample of psychiatric patients. They found that men were more likely to report sex-role-appropriate symptoms such as work-related problems and somatic concerns, but they made no effort to manipulate the social desirability of such reporting experimentally.
Finally, the BDI–II is a relatively new instrument. Although its reliability and validity have been well established (e.g., Beck, Steer, & Brown, 1996; Steer, Ball, Ranieri, & Beck, 1997), relatively few experimental studies with the BDI–II exist. Moreover, although the original BDI was used in numerous studies exploring sex differences in community samples (e.g., Lips & Ng, 1986; Oliver & Simmons, 1985), relatively few studies have examined sex differences in self-report on this newer instrument, and most have used undergraduate samples (e.g., Dozois, Dobson, & Ahnberg, 1998).
This study is an effort to address self-report bias and underreporting in a normative community sample using the BDI–II. We used a method similar to that of Page and Bennesch (1993). That is, using the BDI–II as our core symptom inventory, we devised two new questionnaires of equal length. The first, accurately labeled “Depression Inventory,” consisted of the BDI–II padded with other depression-relevant content items. The second, misleadingly labeled “Life Stress Inventory,” consisted of the BDI–II padded with benign items unlikely to trigger social desirability concerns.
We hypothesized that experimental condition would influence the number of depressive symptoms endorsed by both men and women, but that the effect would be greater for men. Specifically, we expected that both men and women would score higher on the covert depression inventory than on the overt depression inventory. Moreover, we expected that men and women in the overt condition would show the usual sex difference in depression, but that the sex difference would be significantly reduced in the covert condition.
METHOD
Participants
The sample comprised 238 participants (131 women and 107 men). To obtain a broad community sample, participants were recruited from varied venues, including churches, fitness centers, private companies, university personnel, and the jury selection pool for the City of Philadelphia. Ages ranged from 16 to 88 years, with a mean age of 36.5 years. (For participants under the age of 18, parental consent was also obtained.) The mean age was 38 years for women and 34.5 years for men, a nonsignificant difference. Approximately 60% of the sample was White and 37% was Black. All participants were volunteers, and all returned fully completed questionnaires.
Procedure
This study used a 2 × 2 between-subjects design (sex × condition), comparing men and women in the control (overt) and experimental (covert) groups. Participants were given a randomly selected packet containing a consent form, a demographic/contact sheet, and a questionnaire labeled either “Depression Inventory” or “Life Stress Inventory.” For all participants, the consent form indicated that they were being invited to participate in a study of “life stress.” After completing the study materials, participants were debriefed and given a written copy of a debriefing letter. All questions were answered at this point, and participants were thanked for their participation.
Materials
The BDI–II is a 21-item self-report measure created to assess the severity or intensity of depressive symptoms (Beck & Steer, 1993; Beck et al., 1996). The BDI has been found to be a reliable and valid instrument to measure depression in a variety of normal and psychiatric populations (e.g., Beck, Steer, & Garbin, 1988). According to the manual for the BDI–II, scores ranging from 0 to 13 are considered not depressed, scores from 14 to 19 mildly depressed, 20 to 28 moderately depressed, and 29 to 63 severely depressed. Rather than mandating specific cutoff scores for research purposes, Beck et al. recommended choosing cutoff scores carefully, depending on the need for either specificity or sensitivity. They noted that a very conservative cutoff score of 17 yields a true positive rate of 93% and a false positive rate of 18%. They recommended that if the purpose of the study is to identify the maximum number of possible cases of depression, cutoff scores should be set somewhat lower, but still within the range of scores (14 to 19) representing “mild depression.” We chose a cutoff of 15 or greater for identifying probable index cases of clinically significant or diagnosable major depression: it serves as a sensitive indicator while remaining conservatively above the lowest score in the mild range.
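The scoring rules described above can be sketched in Python. This is a minimal illustration, assuming item responses are already coded 0 to 3; the helper names `bdi2_total`, `severity_band`, and `is_probable_case` are hypothetical and are not part of any published BDI–II software. The bands follow the manual ranges quoted above, and the default cutoff is the 15 chosen for this study.

```python
# Hypothetical helpers illustrating the BDI-II scoring rules described above.
# Assumption: each of the 21 items has already been coded as an integer 0-3.
from typing import Sequence

N_ITEMS = 21
STUDY_CUTOFF = 15  # this study's threshold for a probable index case

# Manual severity ranges: (lowest score, highest score, label), inclusive.
SEVERITY_BANDS = [
    (0, 13, "not depressed"),
    (14, 19, "mild"),
    (20, 28, "moderate"),
    (29, 63, "severe"),
]


def bdi2_total(items: Sequence[int]) -> int:
    """Sum the 21 item scores after checking the item count and 0-3 range."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("item scores must be between 0 and 3")
    return sum(items)


def severity_band(total: int) -> str:
    """Map a total score (0-63) to the manual's severity label."""
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total {total} is outside 0-63")


def is_probable_case(total: int, cutoff: int = STUDY_CUTOFF) -> bool:
    """True when the total is at or above the cutoff."""
    return total >= cutoff


if __name__ == "__main__":
    responses = [1] * 15 + [0] * 6  # illustrative answers, not study data
    total = bdi2_total(responses)
    print(total, severity_band(total), is_probable_case(total))  # 15 mild True
```

A total of 14 is labeled mild by the manual but is not counted as a case here; that one-point margin is the conservative choice described in the text.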
In our study, the control group completed a questionnaire clearly labeled “Depression Inventory.” This questionnaire contained the 21 items from the BDI–II as well as 14 filler items developed by the research team from sources such as the Zung Self-Rating Scale for Depression (Zung, 1965). An example of a depression-content filler item is:
0 I find myself as attractive as usual
1 I feel less attractive than I used to be
2 I feel unattractive
3 I have never found myself attractive
The experimental group completed a questionnaire clearly labeled “Life Stress Inventory.” This questionnaire contained the 21-item BDI–II with 14 filler items that were developed by the research team to reflect various examples of mild life stressors. An example of a mild-stressor filler item is:
0 Traffic does not bother me
1 Traffic is no more annoying to me than it is to anyone else
2 Traffic often irritates me
3 Traffic is a major source of stress in my life
Both versions of the questionnaire were scored by summing only the 21 core BDI–II items; the filler items were not counted.
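Because both versions share the same 21 core items, scoring reduces to summing those items and discarding the fillers. The sketch below assumes hypothetical item identifiers (`BDI01` to `BDI21` for core items, `F01` to `F14` for fillers); the article does not report how items were ordered or labeled on the forms.

```python
# Minimal sketch: score a padded 35-item form using only the 21 core BDI-II items.
# Assumption (not from the paper): items carry hypothetical IDs "BDI01".."BDI21"
# for the core items and "F01".."F14" for the 14 filler items.
from typing import Dict

CORE_IDS = [f"BDI{i:02d}" for i in range(1, 22)]
FILLER_IDS = [f"F{i:02d}" for i in range(1, 15)]


def score_core_items(responses: Dict[str, int]) -> int:
    """Return the BDI-II total, ignoring any filler items in the responses."""
    missing = [item for item in CORE_IDS if item not in responses]
    if missing:
        raise KeyError(f"missing core items: {missing}")
    return sum(responses[item] for item in CORE_IDS)


if __name__ == "__main__":
    # Identical core answers under either label; filler answers differ but never count.
    core = {item: 1 for item in CORE_IDS}
    overt_form = {**core, **{f: 3 for f in FILLER_IDS}}   # depression-content fillers
    covert_form = {**core, **{f: 0 for f in FILLER_IDS}}  # mild-stressor fillers
    print(score_core_items(overt_form), score_core_items(covert_form))  # 21 21
```

Keying responses by item identifier rather than by position keeps the scoring independent of how the filler items were interleaved on each form.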
RESULTS
The main finding of this study was that condition significantly affected the number of depressive symptoms endorsed by both men and women, whereas the effects of sex and of the interaction between sex and condition were not statistically significant. A two-way analysis of variance yielded a significant main effect of condition, F(1, 234) = 11.841, p = .001, but showed no effect of either sex or the interaction of sex and condition, both F(1, 234) < 1, both p > .30. Disguising the intent of the depression inventory led both men and women to report more depressive symptoms (see Figure 1).
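The article does not say how the unequal cell sizes were handled. As a consistency check, the sketch below applies an unweighted-means (harmonic-mean n) two-way ANOVA to the cell sizes, means, and standard deviations reported in Table 1; this is an assumed method, not the authors' code. For factors with two levels each, these tests should match a Type III analysis of the raw scores. Under that assumption the condition effect comes out at about F = 11.84 with 234 error degrees of freedom (238 participants minus 4 cells), and the sex and interaction F ratios are both below 1, in line with the values reported above.

```python
# Sketch: unweighted-means two-way ANOVA from cell summary statistics.
# Assumption: the authors' exact method is not reported; this rebuilds the
# analysis from Table 1 (n, mean, SD per cell) using the harmonic mean of n.
from statistics import harmonic_mean

# (n, mean, SD) for each condition x sex cell, taken from Table 1.
CELLS = {
    ("overt", "male"): (52, 7.08, 6.67),
    ("overt", "female"): (63, 8.32, 7.74),
    ("covert", "male"): (55, 11.0, 8.47),
    ("covert", "female"): (68, 11.71, 9.2),
}
CONDITIONS = ("overt", "covert")
SEXES = ("male", "female")


def unweighted_means_anova(cells):
    """Return F ratios (each with 1 numerator df) and the error df."""
    n_total = sum(n for n, _, _ in cells.values())
    df_error = n_total - len(cells)  # 238 - 4 = 234
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in cells.values())
    ms_within = ss_within / df_error

    n_h = harmonic_mean([n for n, _, _ in cells.values()])
    mean = {key: m for key, (_, m, _) in cells.items()}
    grand = sum(mean.values()) / len(mean)
    cond_m = {c: sum(mean[(c, s)] for s in SEXES) / len(SEXES) for c in CONDITIONS}
    sex_m = {s: sum(mean[(c, s)] for c in CONDITIONS) / len(CONDITIONS) for s in SEXES}

    # Main-effect SS: harmonic n x levels of the other factor x squared
    # deviations of the unweighted marginal means from the grand mean.
    ss_cond = n_h * len(SEXES) * sum((cond_m[c] - grand) ** 2 for c in CONDITIONS)
    ss_sex = n_h * len(CONDITIONS) * sum((sex_m[s] - grand) ** 2 for s in SEXES)
    ss_inter = n_h * sum(
        (mean[(c, s)] - cond_m[c] - sex_m[s] + grand) ** 2
        for c in CONDITIONS for s in SEXES
    )
    # Two levels per factor: every effect has 1 df, so MS equals SS.
    return {
        "condition": ss_cond / ms_within,
        "sex": ss_sex / ms_within,
        "interaction": ss_inter / ms_within,
        "df_error": df_error,
    }


if __name__ == "__main__":
    result = unweighted_means_anova(CELLS)
    for effect in ("condition", "sex", "interaction"):
        print(f"{effect}: F(1, {result['df_error']}) = {result[effect]:.2f}")
```

The small gap between this value and the reported 11.841 is consistent with the rounding of the means and SDs in Table 1.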
A second test of the main effect of condition used probable index cases of depression, defined by a cutoff score of 15 or higher on the BDI–II. Overall, 13% of participants in the overt condition scored 15 or greater, whereas 34% of participants in the covert condition scored at or above the cutoff (see Figure 2).
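The percentages above can be rebuilt from the cell counts in Table 1. The sketch also shows how a Pearson chi-square (without continuity correction) could be computed for the resulting 2 × 2 table of condition by probable case status. The article does not report this test statistic, so the sketch is illustrative and is not the authors' analysis; the helper names are hypothetical.

```python
# Sketch: percentages of probable index cases (BDI-II >= 15) by condition and a
# Pearson chi-square (no continuity correction) for the 2 x 2 table.
# Counts come from Table 1; the test itself is an illustration.


def pearson_chi_square_2x2(a, b, c, d):
    """Chi-square for [[a, b], [c, d]] via N(ad - bc)^2 / product of margins."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def percent(part, whole):
    return 100 * part / whole


if __name__ == "__main__":
    overt_cases, overt_n = 5 + 10, 52 + 63     # men + women, overt condition
    covert_cases, covert_n = 18 + 24, 55 + 68  # men + women, covert condition
    print(f"overt: {percent(overt_cases, overt_n):.0f}%, "
          f"covert: {percent(covert_cases, covert_n):.0f}%")  # overt: 13%, covert: 34%
    chi2 = pearson_chi_square_2x2(
        overt_cases, overt_n - overt_cases, covert_cases, covert_n - covert_cases
    )
    # With 1 df, the critical values at .05 and .001 are 3.84 and 10.83.
    print(f"chi-square(1, N = {overt_n + covert_n}) = {chi2:.2f}")
```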
We also examined whether there were any sex differences in the data using the index case method. Epidemiologically, the finding is that twice as many women as men will experience clinically significant depressive episodes in their lifetime, not that women are “twice as depressed” as men. Therefore, we reanalyzed the sex difference data using the cutoff score of 15. Looked at this way, the data offered some minimal suggestion of a sex difference in self-report bias (see Figure 3).
In the overt condition, 16% of women were at or above the cutoff, whereas only 10% of men were. In the covert condition, 33% of men and 35% of women were at or above the cutoff of 15 (see Table 1 for a summary of all these findings). In other words, in the overt condition, we found a sex difference (women to men) in depression of approximately 3:2, whereas in the covert condition, the ratio was approximately 1:1. However, the chi-square statistic for this difference was not significant. Because there was a main effect of condition for both men and women, we deemed it reasonable to compare the effect sizes for each sex. For men, the increase in depression was moderately strong, with an effect size of d = .59. For women, the effect was somewhat smaller, at d = .44. When comparing two effect sizes, it is also possible to subtract one from the other; the resulting difference is itself an effect size. For these data, the difference in effect size for men and women is d = .15, which falls below even Cohen's conventional benchmark for a small effect (d = .20). Although not compelling, these effect sizes suggest that the manipulation had slightly more impact on men than on women, but that this effect was minimal.
FIGURE 1 Mean Beck Depression Inventory score by condition.
FIGURE 2 Percentage depressed by condition.
FIGURE 3 Percentage depressed by condition and sex.
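The ratios and effect sizes above can be reproduced from Table 1. The article does not state which standard deviation was used to standardize d. The sketch assumes it was the overt (control) condition's SD, a variant usually called Glass's delta, because that choice returns the reported .59 and .44 and their .15 difference. A pooled-SD Cohen's d is shown for comparison; it gives smaller values (about .51 for men and .40 for women). Treat this as an inference from the published numbers, not a description of the authors' procedure.

```python
# Sketch: women-to-men ratios of probable cases and standardized mean differences.
# Assumption: the reported d values standardize by the overt-condition SD
# (Glass's delta); a pooled-SD Cohen's d is included for comparison.
from math import sqrt

# (n, mean, SD, n at or above cutoff) per cell, from Table 1.
TABLE1 = {
    ("overt", "male"): (52, 7.08, 6.67, 5),
    ("overt", "female"): (63, 8.32, 7.74, 10),
    ("covert", "male"): (55, 11.0, 8.47, 18),
    ("covert", "female"): (68, 11.71, 9.2, 24),
}


def pct_cases(cell):
    n, _, _, cases = TABLE1[cell]
    return 100 * cases / n


def glass_delta(sex):
    """(covert mean - overt mean) / overt SD."""
    _, m_overt, sd_overt, _ = TABLE1[("overt", sex)]
    _, m_covert, _, _ = TABLE1[("covert", sex)]
    return (m_covert - m_overt) / sd_overt


def pooled_d(sex):
    """Cohen's d using the pooled SD of the two conditions."""
    n1, m1, sd1, _ = TABLE1[("overt", sex)]
    n2, m2, sd2, _ = TABLE1[("covert", sex)]
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled_sd


if __name__ == "__main__":
    for cond in ("overt", "covert"):
        ratio = pct_cases((cond, "female")) / pct_cases((cond, "male"))
        print(f"{cond}: women-to-men ratio of probable cases = {ratio:.2f}")
    d_men, d_women = glass_delta("male"), glass_delta("female")
    print(f"Glass's delta: men {d_men:.2f}, women {d_women:.2f}, "
          f"difference {d_men - d_women:.2f}")
    print(f"pooled-SD d: men {pooled_d('male'):.2f}, women {pooled_d('female'):.2f}")
```

Computed from the unrounded cell proportions, the overt ratio is about 1.65 and the covert ratio about 1.08, which the text rounds to 3:2 and 1:1.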
Finally, we compared White and Black participants; there were no significant differences between these two groups.
DISCUSSION
The main finding of this study was that masking the purpose of the depression inventory led to greater endorsement of symptoms by both men and women in a broad community sample. Indeed, we found relatively high rates of probable depression, ranging from 10% to 35% across cells. Given that 1-month prevalence rates for any affective disorder have been reported to be 1.7% for men and 3.1% for women (Robins & Regier, 1991), our results raise questions about the effect of padding the BDI with depression-congruent or benign items. It seems possible that response set pulled our participants’ scores higher than they “should” actually have been.
Other studies, however, have suggested that prevalence rates of depression are, in fact, much higher. For example, Brantley, Mehan, and Thomas (2000) reported that the prevalence of depressive disorders among the general populace is approximately 17% to 24%. Moreover, approximately 21% of patients being treated by family practice physicians describe clinically significant depressive symptoms (Zung, Broadhead, & Roth, 1993). Oliver and Simmons (1985) compared current depression diagnoses based on either the Diagnostic Interview Schedule (DIS) or the BDI. In a sample of 298 paid volunteers selected by random digit dialing, they found that according to the DIS, 7.7% were diagnosed with a current affective disorder, whereas 19.8% scored in the depressed range according to the BDI. More recently, Eaton et al. (2000) compared the DIS to the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). Agreement between the two on diagnoses of Major Depressive Disorder was only fair. Indeed, the DIS, the instrument used most often in large epidemiologic studies, missed many cases meeting SCAN criteria. Many of these false negatives could be explained by individuals underreporting symptoms because they attributed them to life stress. Interestingly, their results suggested that the DIS was particularly likely to miss clinically significant depression in men. Our results suggest that prevalence rates of Major Depressive Disorder may indeed be much higher than currently accepted prevalence rates. In this respect, our results are consistent with other recent studies that have also found much higher rates of depressive disorders and symptoms in the general population.
There were no significant sex differences in mean BDI score for the entire sample or for either condition examined separately. The lack of a significant sex difference in depressive symptoms is somewhat surprising given that the sample was drawn broadly from the community rather than from undergraduates. There was also no significant interaction between sex and experimental condition on mean BDI score. That is, contrary to our predictions, men’s mean BDI score was not affected by experimental condition significantly more than women’s mean BDI score.
There were hints in these data, however, suggesting that reducing self-report bias might mitigate the sex difference in rates of index cases of depression. Specifically, in the overt condition, probable index cases of depression occurred at a ratio (women to men) of approximately 3:2, whereas in the covert condition, the ratio dropped to approximately 1:1. However, the results as they stand do not allow us to conclude that this difference was meaningful or reliable.
Because depression occurs at a relatively low rate, broad community samples yield only a small percentage of depressed cases. Power analyses showed that quadrupling our total sample size (for a total sample size of approximately 1,000, but only 36 probable index cases of depressed men across experimental conditions) would have provided sufficient power to result in a significant chi-square. Unfortunately, obtaining such a large community sample was beyond the scope of this study.
One weakness of this study is the lack of a significant sex difference in mean BDI score or rates of depression, even in the overt condition. This raises the possibility that our sample was not comparable to the large epidemiological studies that have found such a sex difference. Although we made every effort to make our sample as broad and inclusive as possible, we did not use methods such as random digit dialing. Clearly, African Americans were overrepresented in our sample, whereas Asian Americans and Hispanic Americans were underrepresented. The possibility remains that our sample was unusual, and this obviously limits the generalizability of the findings. However, we believe the sample was more representative of the population than the typical White undergraduate sample of convenience. Although African Americans may be overrepresented in our sample, they are often not represented at all in such research. Therefore, however limited our results may be, they are clearly more generalizable than much of the other research on this topic.
The second main weakness is the relatively small sample size. Although an initial sample of 238 people may seem large, our primary interest was in probable index cases of depression. This is a problem that plagues any research relying on normative samples to yield small percentages of index cases of psychopathology. From that initial sample of 238, we found only 57 individuals across both conditions who were at or over the cutoff of 15 on the BDI–II. By the time the sample was divided by sex and condition, there were as few as 5 participants in some cells. It is our hope that this study will be replicated using a larger sample to establish whether the sex differences our data hint at actually exist.
TABLE 1
Summary of Results by Condition and Sex
                    Overt             Covert
                   Male   Female     Male   Female
N                    52       63       55       68
M (BDI–II)         7.08     8.32       11    11.71
SD                 6.67     7.74     8.47      9.2
SEM               0.925    0.975     1.14     1.12
N depressed           5       10       18       24
% depressed          10       16       33       35
Note. BDI–II = Beck Depression Inventory–II. N depressed = number of participants scoring 15 or higher on the BDI–II.
Despite these drawbacks, this study suggests that our current prevalence rates for unipolar depression (as estimated by large catchment area studies) may be underestimates of true rates of depression. Diagnostic practices must always find a compromise between false negatives and false positives. The covert administration of the BDI–II probably resulted in very high sensitivity, but poor specificity. Perhaps primary care providers would be well advised to couch their diagnostic queries with both men and women in terms that are more socially acceptable than “depression.” Asking an individual if he or she is “under stress” might yield a more honest and fruitful admission of symptoms. Although this might result in some individuals being prescribed antidepressants who do not need them, research suggests that the alternative, in which depressed individuals go untreated, is both much more common and more serious. Despite significant progress in recent years, mental health diagnoses still carry powerful stigma and are likely to arouse strong self-presentational concerns and socially desirable responding. Finding a way to help distressed individuals acknowledge their symptoms should lead to better diagnosis and treatment outcomes overall.
REFERENCES
Allen-Burge, R., Storandt, M., Kinscherf, D. A., & Rubin, E. H. (1994). Sex differences in the sensitivity of two self-report depression scales in older depressed inpatients. Psychology and Aging, 9, 443–445.
Beck, A. T., & Steer, R. A. (1993). Manual for the Beck Depression Inventory. San Antonio, TX: Psychological Corporation, Harcourt, Brace.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory (2nd ed.). San Antonio, TX: Psychological Corporation, Harcourt, Brace.
Beck, A. T., Steer, R. A., & Garbin, M. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77–100.
Brantley, P. J., Mehan, D. J., & Thomas, J. L. (2000). The Beck Depression Inventory and the Center for Epidemiologic Studies Depression Scale. In M. E. Maruish (Ed.), Handbook of psychological assessment in primary care settings (pp. 391–421). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Byrne, D. (1981). Sex differences in reporting symptoms of depression in the general population. British Journal of Clinical Psychology, 20, 83–92.
Clark, D., Crewsdon, N., & Purdon, C. (1998). No worries, no cares: An investigation into self-reported “nondistress” in college students. Cognitive Therapy and Research, 22, 209–224.
Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory–II. Psychological Assessment, 10(2), 83–89.
Druss, B. G., Hoff, R. A., & Rosenheck, R. A. (2000). Underuse of antidepressants in major depression: Prevalence and correlates in a national sample of young adults. Journal of Clinical Psychiatry, 61, 234–237.
Eaton, W. W., Neufeld, K., Chen, L. S., & Cai, G. (2000). A comparison of self-report and clinical diagnostic interviews for depression: Diagnostic Interview Schedule and Schedules for Clinical Assessment in Neuropsychiatry in the Baltimore Epidemiologic Catchment Area follow-up. Archives of General Psychiatry, 57, 217–222.
Hammen, C., & Padesky, C. (1977). Sex differences in expression of depressive responses on the Beck Depression Inventory. Journal of Abnormal Psychology, 86, 609–614.
King, D., & Buchwald, A. (1982). Sex differences in subclinical depression: Administration of the Beck Depression Inventory in public and private disclosure situations. Journal of Personality and Social Psychology, 42, 963–969.
Lips, H. M., & Ng, M. (1986). Use of the Beck Depression Inventory with three nonclinical populations. Canadian Journal of Behavioural Science, 18(1), 62–74.
Nolan, R., & Wilson, V. (1994). Gender and depression in an undergraduate population. Psychological Reports, 75, 1327–1330.
Nolen-Hoeksema, S. (1987). Sex differences in unipolar depression: Evidence and theory. Psychological Bulletin, 101, 259–277.
Oliver, J. M., & Simmons, M. E. (1985). Affective disorders and depression as measured by the Diagnostic Interview Schedule and the Beck Depression Inventory in an unselected adult population. Journal of Clinical Psychology, 41, 469–477.
Page, S., & Bennesch, S. (1993). Gender and reporting differences in measures of depression. Canadian Journal of Behavioural Science, 25, 579–589.
Robins, L. N., & Regier, D. A. (Eds.). (1991). Psychiatric disorders in America: The ECA study. New York: Free Press.
Stangler, R., & Printz, A. (1980). DSM–III: Psychiatric diagnosis in university populations. American Journal of Psychiatry, 137, 937–940.
Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1997). Further evidence for the construct validity of the BDI–II with psychiatric outpatients. Psychological Reports, 80, 443–446.
Vredenburg, K., Krames, L., & Flett, G. (1986). Sex differences in the clinical expression of depression. Sex Roles, 14, 37–49.
Weissman, M., & Klerman, G. (1977). Sex differences and the epidemiology of depression. Archives of General Psychiatry, 34, 98–109.
Zung, W. W. (1965). A self-rating scale for depression. Archives of General Psychiatry, 12(1), 63–70.
Zung, W. W., Broadhead, W. E., & Roth, M. E. (1993). Prevalence of depressive symptoms in primary care. Journal of Family Practice, 37, 337–344.
Melissa Hunt
Department of Psychology
University of Pennsylvania
3815 Walnut Street
Philadelphia, PA 19104–6196
E-mail: mhunt@cattell.psych.upenn.edu
Received September 24, 2001
Revised June 23, 2002