Do structured interviews eliminate bias? A meta-analytic comparison of structured and unstructured interviews



We conducted a meta-analysis of studies investigating the extent to which structured and unstructured interviews are affected by such sources of potential bias as applicant attractiveness, pregnancy, weight, sex, race, and use of non-verbal cues. To be included in the meta-analysis, a study had to use an experimental design and directly compare interview scores of structured and unstructured interviews. On the basis of 24 effect sizes, we found that both unstructured (d = .59) and structured interviews (d = .23) were affected by sources of bias. Though both interviews were affected, unstructured interviews were significantly more susceptible to bias than were structured interviews.
Poster presented at the annual meeting of the Society for Industrial-Organizational Psychology, May 2006, Dallas, Texas
Though the high costs of employee turnover and incompetent employees have resulted in
the development of sophisticated and technologically advanced selection techniques, the
employment interview continues to be used by virtually all organizations in making selection
decisions. The employment interview has been defined as “an interpersonal interaction of limited
duration between one or more interviewers and a job-seeker for the purpose of identifying
interviewee knowledge, skills, abilities and behaviors that may predict success in subsequent
employment. The operational indicators of this success include the criteria of job performance,
training success, promotion, and tenure” (Wiesner & Cronshaw, 1988, p. 276). Although the job
interview remains the preeminent selection tool for most organizations, research has not found a
strong relationship between scores from interviews low in structure and measures of job
performance (Huffcutt & Arthur, 1994). In addition to its lack of predictive validity, meta-
analysis results indicate that interviews low in structure (d = .32) are more prone to adverse
impact than are interviews high in structure (d = .23; Huffcutt & Roth, 1998). Furthermore,
research has indicated that selection decisions based on interviews low in structure yield lower
scores for applicants who are disabled (Wennet, 1994), obese (Kutcher & Bragger, 2004), or
pregnant (Bragger, Kutcher, Morgan, & Firth, 2002). The purpose of our research is to conduct a
meta-analysis of research investigating the susceptibility of unstructured and structured
interviews to biases against applicants on the basis of reactions to non job-related cues such as
pregnancy and obesity.
The Predictive Validity of the Job Interview
To assess the utility of the job interview in predicting job performance, Hunter and
Hunter (1984) conducted a meta-analysis comparing the mean validity coefficients (correlations
with supervisory ratings) from studies that investigated any of 11 different predictors (e.g.,
cognitive ability tests, assessment centers, biodata) used in selection for entry-level jobs. The
mean validity calculated for interviews ranked 6th at 0.14 (Hunter & Hunter, 1984). This finding
suggests that less than two percent of the variance in job performance is explained by
performance on the job interview.
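The step from the validity coefficient to the variance figure is simply the squared correlation. As a worked check, treating the mean validity of .14 as a Pearson correlation:

```latex
\[
r^2 = (0.14)^2 = 0.0196 \approx 2\%
\]
```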
Predictive validity and structure. Subsequent research has established that adding
structure to the interview process can improve the predictive validity, as well as other
psychometric properties, of the selection interview as a predictor. The term “structure”, when
referring to an interview, can be broadly defined as “any enhancement of the interview that is
intended to increase the psychometric properties by increasing standardization or otherwise
assisting the interviewer in determining what questions to ask or how to evaluate responses”
(Campion, Palmer, & Campion, 1997, p. 656).
Campion et al. (1997) presented a review of the many ways that structure can be
integrated into an interview, and how these components of structure impact the validity and
reliability of the interview. Among the 15 components of structure are the following: base
questions on a job analysis; ask all candidates the same questions in the same order; limit
prompting, follow-up questioning, and elaboration; ask questions that are situational, behavior-
based, or focused on job knowledge; ask a greater number of questions; control ancillary
information such as application forms, resumes, test scores, etc.; rate each answer on scales
tailored for each question; use behaviorally anchored rating scales (BARS); take detailed notes
on applicants’ responses; use multiple interviewers; use the same interviewers for all candidates;
do not allow interviewers to discuss the candidates’ answers; provide extensive training to the
interviewers; use statistical procedures to determine the best candidate.
While it would be ideal to incorporate all 15 components of structure, it is more likely to
find a few components used at a time. Accordingly, any interview’s overall degree of structure
falls somewhere on a continuum, where any or all of the suggestions above are applied to some
degree. That is, whereas some organizations implement the highest degree of a given component
of structure (e.g., interviewers ask the exact same questions in the exact same order to all
applicants), other organizations use a milder form (e.g., interviewers are given more flexible
questioning guidelines). Because there are so many methods used to structure an interview, and
such variability in the intensity with which each method is applied, there are innumerable ways
to add more structure to an interview, and each additional structuring element applied is expected
to add incrementally to the interview’s validity (Campion et al., 1997).
The identification of structure as a way to improve the interview has led to numerous
research studies, and some meta-analyses, documenting the structured interview as a more valid
selection tool than originally thought, as well as a vast improvement over the unstructured
interview. Wiesner and Cronshaw (1988) conducted a meta-analysis comparing structured vs.
unstructured interviews. The structured interview corrected validity coefficient (ρ = .62) was
twice that of the unstructured interview (ρ = .31). Wright, Lichtenfels and Pursell (1989) found a
meta-analytic estimated effect size for structured interviews of r = .39, though the analysis did
not compare this to unstructured interviews. A third meta-analysis by McDaniel, Whetzel,
Schmidt, and Maurer (1994) found the mean corrected validity (ρ) for the unstructured interview
to be .33, the mean corrected validity (ρ) for the structured interview to be .44, and the mean
corrected validity for the situational interview (a type of structured interview) to be even higher
at .50. Recognizing that structure can be applied to different degrees, Huffcutt and Arthur (1994)
coded 114 interview validity coefficients into four categories of structure, ranging from (1) no
formal structure to (4) structured questioning and scoring. The resulting meta-analytic validity
coefficients ranged from ρ = 0.20 (no structure) to ρ = 0.57 (greatest structure). Similarly,
Conway, Jako, and Goodman (1995) coded interviews along a structured continuum of high,
moderate and low structure. The mean validities were r = 0.67, 0.56, and 0.34 respectively,
indicating that (a) the predictive validity of the structured interview was almost twice that of the
unstructured interview and (b) that increasing degrees of structure incrementally add to its utility.
All of the above meta-analyses found improved psychometric properties of the structured
interview, identifying it as a more valid tool for predicting job performance than the unstructured
interview. A structured interview incorporates more standardization and decreases or eliminates
subjectivity leading to a greater reliance on job-related criteria (Campion, Pursell, & Brown,
1988). Therefore, besides increasing predictive validity, the structured interview should also
reduce the effect of bias in employment decisions.
Bias in the interview. Many research studies (e.g., Latham & Saari, 1984; Latham, Saari,
Pursell, & Campion, 1980) have explored the relationship between structuring the job interview and
resulting predictive validity. Relatively fewer studies have introduced specific systematic
sources of bias into the job interview, and then systematically investigated the influence of bias
on structured and unstructured interviewer ratings. Pingitore, Dugoni, Tindale, and Spring (1994)
considered the influence of weight and gender bias by having participants watch mock
employment interviews of male and female normal-weight vs. overweight job applicants and rate
whether they would hire the applicant. Through the use of costumes and scripts, the
qualifications of the applicants remained constant across conditions. Ratings showed a
significant effect of both weight and gender, where both overweight and female applicants were
recommended for hire significantly less often than their normal-weight and male counterparts,
respectively. Additional research has also found an obvious selection bias against physically
unattractive and overweight job applicants (Cash, Gillen, & Burns, 1977; Morrow, 1990;
Kutcher & Bragger, 2004). Other biases that have been found to influence interviewer scores
include disability (Bricout & Bentley, 2000; Miceli, Harvey, & Buckley, 2001; Ravaud, Madiot,
& Bille, 1992), attire (Forsythe, Drake, & Cox, 1985), age (Perry, Kulik, & Bourhis, 1997), and
non-verbal expression (Burnett & Motowidlo, 1998; DeGroot & Motowidlo, 1999).
If predictive validity is the relationship between a job applicant’s score on a pre-
employment selection test (i.e., the interview) and an ultimate measurement of job performance,
then bias refers to all sources (systematic and random) that influence the selection test scores
(and decisions) but are not related to how the applicant would perform on the job. Though it is
important to assess the predictive validity of the structured and unstructured interview, a problem
with doing so is that criterion measures of job performance (e.g., sales, goals met, performance
appraisal ratings) are also associated with bias. The sources of bias in job performance
measurement may be the same as or similar to the sources of bias in the job interview
measurement, causing inflated correlations between the two scores. This may be especially true
when the same people are involved in both measurement events. This is particularly problematic
when the sources of bias are the age, race, gender, ethnicity or disability of the job candidate.
Can structure actually eliminate this bias? Several research studies indicate that structure seems
at least to reduce it (e.g., Bragger, Kutcher, Morgan, & Firth, 2002; Brecher, Bragger, Kutcher,
& Miller, 2004; Kutcher & Bragger, 2004).
Experimental studies that research bias in the job interview can control the credentials of
the job candidate as determined by their interview responses; this establishes a candidate’s “true
score”, which can then be compared to an interviewer’s ratings. This “true score” would not be
influenced by measurement bias in job performance. We therefore see value in
determining the influence of specific sources of bias on interviewer ratings and the influence of
structuring the job interview in reducing rating bias. The purpose of our research is to conduct a
meta-analysis of those studies that introduce and investigate bias against candidates in structured
versus unstructured job interviews. Accordingly, we present the following predictions regarding
the literature studying bias in interviews:
Hypothesis 1: Potential sources of bias will significantly affect unstructured interview scores.
Hypothesis 2: Potential sources of bias will not significantly affect structured interview scores.
Hypothesis 3: Potential sources of bias will have a greater effect on unstructured interview scores
than on structured interview scores.
Finding Studies
The first step in the meta-analysis was to locate studies directly comparing the effect of a
source of irrelevant information (e.g., attractiveness, pregnancy, obesity) on structured and
unstructured interview scores. The search for such studies was concentrated on journal articles,
theses, and dissertations published between 1970 and 2005. To find relevant studies, the
following sources were used:
Dissertation Abstracts Online was used to search for relevant dissertations.
WorldCat was used to search for relevant master’s theses, dissertations, and books.
WorldCat is a listing of books contained in many libraries throughout the world and was
the single best source for finding relevant master’s theses. There were a few theses that
could not be obtained because their home library would not loan them and they were not
available for purchase.
PsycInfo, InfoTrac OneFile, ArticleFirst, ERIC, Periodicals Contents Index, Factiva, and
Lexis-Nexis were used to search for relevant journal articles and other periodicals.
Hand searches were made of the Journal of Applied Psychology, Personnel Psychology,
Applied H.R.M. Research, and the International Journal of Selection and Assessment.
Reference lists from journal articles, theses, and dissertations were used to identify other
relevant material.
Keywords used to search electronic databases included combinations of interview terms
(e.g., structured, situational, behavioral, interview, unstructured) with potential sources of bias
(e.g., sex, race, attractiveness, obesity, pregnancy, first impressions, contrast effects).
The search for documents stopped when computer searches failed to yield new sources
and no new sources from reference lists appeared. To be included in our meta-analysis, a study
had to directly compare structured and unstructured interviews and had to include a d
score, another statistic that could be converted to a d score (e.g., r, t, F, χ²), or tabular data or raw
data that could be analyzed to yield a d score. Studies that investigated a source of bias on only
one type of interview were not included. The literature search yielded nine relevant studies: 5
journal articles, 2 master’s theses, and 2 conference presentations. From these nine studies, 24
independent effect sizes (12 structured, 12 unstructured) were used in the meta-analysis.
Converting Research Findings to d Scores
Once the studies were located, statistical results that were not already reported as d scores
were converted using the formulas provided in Arthur, Bennett, and Huffcutt (2001). In some
cases, raw data or frequency data listed in tables were entered into an Excel program to directly
compute a d score. If a study provided more than two levels of structure, we categorized the
highest level as being structured, the lowest level as being unstructured, and ignored the levels in
between.
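The kind of conversion described above can be sketched in a few lines of Python. These are common textbook conversions for two-group designs. They are offered as an illustration only and are not confirmed to be identical to the formulas in Arthur, Bennett, and Huffcutt (2001). The function names are hypothetical, and the conversions from r and χ² assume roughly equal group sizes.

```python
import math

def d_from_means(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def d_from_t(t, n1, n2):
    """Independent-samples t statistic to d."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_f(f, n1, n2):
    """Two-group F (1 numerator df) to d; the sign must be taken from the group means."""
    return math.sqrt(f) * math.sqrt(1 / n1 + 1 / n2)

def d_from_r(r):
    """Point-biserial correlation to d (assumes roughly equal group sizes)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_from_chi2(chi2, n):
    """2 x 2 chi-square (1 df) to d by way of the phi coefficient."""
    phi = math.sqrt(chi2 / n)
    return d_from_r(phi)

# Hypothetical inputs, not taken from any study in the meta-analysis
print(round(d_from_t(2.0, 50, 50), 3))
print(round(d_from_r(0.5), 3))
```

Because F and χ² are always non-negative, the direction of the effect has to be assigned by inspecting the group means or cell frequencies.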
Cumulating d Scores
After the individual d scores were computed, the effect size for each study was weighted
by the size of the sample and the coefficients combined using the method suggested by Hunter
and Schmidt (1990) and Arthur, Bennett, and Huffcutt (2001). In addition to the mean effect
size, the observed variance, amount of variance expected due to sampling error, and 95%
confidence interval were calculated. All meta-analysis calculations were performed using Meta-
Analyzer 5.2, an Excel-based meta-analysis program.
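The cumulation step can be sketched as a bare-bones, sample-size-weighted meta-analysis in the style of Hunter and Schmidt (1990). This is an illustrative sketch, not the Meta-Analyzer program itself. The function name and the example data are hypothetical, and the interval is built from the residual standard deviation, an assumption that is at least consistent with the degenerate bounds reported in Table 1 when sampling error accounts for all of the observed variance.

```python
import math

def bare_bones_meta(ds, ns, z=1.96):
    """Sample-size-weighted cumulation of d values (bare-bones sketch)."""
    k = len(ds)
    total_n = sum(ns)
    # Sample-size-weighted mean effect size
    mean_d = sum(n * d for d, n in zip(ds, ns)) / total_n
    # Sample-size-weighted observed variance of the effect sizes
    var_obs = sum(n * (d - mean_d) ** 2 for d, n in zip(ds, ns)) / total_n
    # Variance expected from sampling error alone, using the average sample size
    n_bar = total_n / k
    var_err = ((n_bar - 1) / (n_bar - 3)) * (4 / n_bar) * (1 + mean_d ** 2 / 8)
    # Share of observed variance attributable to sampling error, capped at 100%
    pct_err = 1.0 if var_obs == 0 else min(var_err / var_obs, 1.0)
    # Residual variance after removing sampling error (never negative)
    var_res = max(var_obs - var_err, 0.0)
    half_width = z * math.sqrt(var_res)
    return {
        "k": k,
        "n": total_n,
        "mean_d": mean_d,
        "var_obs": var_obs,
        "var_err": var_err,
        "pct_err": pct_err,
        "lower": mean_d - half_width,
        "upper": mean_d + half_width,
    }

# Hypothetical effect sizes and sample sizes, for illustration only
homogeneous = bare_bones_meta([0.20, 0.30, 0.25], [50, 60, 40])
heterogeneous = bare_bones_meta([0.0, 1.0], [100, 100])
print(homogeneous["pct_err"], heterogeneous["pct_err"])
```

In the first hypothetical set, the effect sizes vary less than sampling error alone would predict, so the interval collapses to the mean. In the second set, sampling error explains well under 75% of the variance and the interval spans zero.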
Searching for Moderators and Generalizing Results
Being able to generalize meta-analysis findings across all similar organizations and
settings (validity generalization) is an important goal of any meta-analysis. In this meta-analysis,
when variance due to sampling error accounted for less than 75% of observed variance, the next
step was to remove outliers. Outliers were defined as effect sizes that were at least three
standard deviations from the mean. Outliers are removed from meta-analyses because the
assumption is that a study obtaining results that are very different from those found in other
studies did so due to such factors as calculation errors, coding errors, or the use of a unique
sample. After removing outliers, if the variance accounted for was still less than 75%, a search
for potential moderators was conducted. Potential moderators explored in this
meta-analysis were the type of potential bias (priming, nonverbal cues, disability, pregnancy,
race, weight, sex), interview medium (face-to-face, video), provision of other
information about the applicant (no, yes), interview scoring method (sum of question ratings,
overall rating), and question type (situational only, situational and behavioral, general
Meta-analyses were conducted separately for the structured and unstructured interview
effect sizes. We hypothesized that unstructured interviews would be significantly affected by
potential sources of bias. As shown in Table 1, the mean effect size for unstructured interviews was
not significant, as the confidence interval included zero. Because less than 75% of the observed
variance in unstructured interviews could be attributed to sampling error, we looked for outliers.
The d of 2.24 from Study 2 of Kutcher and Bragger (2004) was removed as it was more than
three standard deviations from the mean effect size. As shown in Table 1, after removing this
study, the mean effect size for unstructured interview was significantly different from zero and
100% of the observed variability in effect sizes would have been expected by sampling error.
Thus, Hypothesis 1 was supported and these results can be generalized as there is no need to
search for moderators.
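The shift in the unstructured mean after removing the outlier can be checked with simple arithmetic on the figures in Table 1. The sketch below assumes that the removed study's sample size equals the difference between the two reported N values (696 − 648 = 48), which the source does not state directly. The function name is hypothetical.

```python
def weighted_mean_without(mean_all, n_all, d_removed, n_removed):
    """Back out a sample-size-weighted mean after dropping one study."""
    return (mean_all * n_all - d_removed * n_removed) / (n_all - n_removed)

# Values from Table 1 and the text; n_removed is inferred, not reported
n_removed = 696 - 648
mean_after = weighted_mean_without(0.70, 696, 2.24, n_removed)
print(round(mean_after, 2))  # 0.59
```

Under that assumption, the recomputed mean agrees with the reported value of .59 after rounding.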
Our second hypothesis was that structured interviews would not be significantly affected
by potential sources of bias. As shown in Table 1, this hypothesis was not supported as the mean
effect size for structured interviews was significantly different from zero.
Our third hypothesis that structured interviews would be less susceptible to sources of
bias than unstructured interviews was supported. We tested this hypothesis by comparing the
effect size for structured interviews (d = .23) with the effect size for unstructured interviews (d =
.59). Because the 95% confidence intervals surrounding the two effect sizes do not overlap, we
can conclude that they are significantly different from one another.
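The non-overlap criterion used here can be expressed as a one-line check. The sketch below uses the interval bounds from Table 1. The function name is hypothetical.

```python
def intervals_overlap(a, b):
    """Return True if two closed intervals (lower, upper) share any point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi

structured = (0.23, 0.23)
unstructured_all = (-0.08, 1.50)       # before removing the outlier
unstructured_trimmed = (0.59, 0.59)    # after removing the outlier

print(intervals_overlap(structured, unstructured_trimmed))  # False
print(intervals_overlap(structured, unstructured_all))      # True
```

The conclusion therefore rests on the trimmed unstructured estimate: with the outlier included, the unstructured interval is wide enough to contain the structured estimate.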
Table 1: Meta-analysis results

                                         95% Confidence Interval
Interview type         K       N      d       Lower       Upper      SE%      Qw
Overall               24   1,359    .47        -.19        1.13      40%    60.4*
Structured            12     663    .23         .23         .23     100%     3.2
Unstructured          12     696    .70        -.08        1.50      32%    37.6*
  Outlier removed     11     648    .59         .59         .59     100%    10.7

K = number of studies, N = sample size, d = mean effect size, SE% = percentage of variance explained by sampling error
* Effect sizes are not homogeneous
Since the introduction of structured interviewing in the personnel selection literature,
several studies have attempted to show its superiority over traditional interviews. Meta-analytic
reviews have demonstrated how more structure in the collection and evaluation of interview
information yields greater validity coefficients. One of the primary mechanisms through which
this greater validity operates is the reduction of contamination by irrelevant biases. The current
meta-analysis contributes to the literature by specifically comparing the effect sizes of
interviewer biases during structured and unstructured interviews.
The evidence from the current investigation clearly shows that bias affects interviews. In
both structured and unstructured interviews, the estimated effect size of biases is considerable (d
= .23 and .59, respectively). While the support of Hypothesis 1 (that bias is indeed significantly
associated with unstructured interviews) was expected, the lack of support of Hypothesis 2 (that
bias would not be associated with structured interview scores) was not. This indicates that biases
also affect the decision making when the collection and evaluation of information is guided by
stricter standardization and guidance. Hypothesis 3 was also supported; while bias does appear to
have a meaningful effect on structured interviews, it is even stronger in unstructured interviews.
Although counter to our hypotheses, it is understandable that biases may impact a highly
structured situation. One of the more common structuring elements in an interview setting is the
nature of the decision making. Whereas in an unstructured interview, a single holistic hiring
decision is formed, in a structured interview, several question-level or dimension-level decisions
or evaluations are made. Although other structuring elements would ideally be encouraging more
thoughtful and careful processing of relevant information only, it is possible that interviewer
biases are simply affecting more decisions. The finding of a small but significant bias/structured
interview effect size should be considered along with the support for the final hypothesis – that
there is a larger association between biases and unstructured interviews. New research may seek
to investigate which of the many structuring elements are most efficacious at reducing the impact
of interviewer bias.
Some limitations should be noted with the overall conclusions. In our method, we
discarded any data representing intermediate or partially structured interviews. Many studies
present the interview structure variable as dichotomous, where interviews are either completely
unstructured or highly structured. The reality is that most interviews, when conducted in practice,
are likely mildly structured. Furthermore, there has been a call for research in structured
interviewing to represent more than two levels of structure (Lievens & DePaepe, 2004).
Therefore, the studies that attempt to represent more than two levels of structure are probably the
most informative in terms of generalizability to practice. In our study, we looked solely at
unstructured and structured interviews to establish the main effect finding that bias has a greater
impact on unstructured interviews. Other primary studies, and ultimately other meta-analyses,
would benefit from the incorporation of intermediately structured interviews.
Although the findings did not point toward a need for tests of potential moderating
variables, the different biases examined in the collection of studies were diverse. Biases were
related to demographic features, behaviors, and appearance factors in the applicant (e.g., Bragger
et al., 2002; Martin & Stockner, 2000), orientations of the interviewer (e.g., Gousie, 1993), and
properties of the interview or format (e.g., Beech, 1996). One might expect that a significant
moderation effect would have appeared, but no such heterogeneity in effect size distribution was
evident. This lends both more confidence to the effect size estimates found, and motivation to
test additional biases in structured interview settings. For example, although some studies have
linked structured interviews to more legally defensible hiring practices with respect to racial
discrimination (Williamson, Campion, Malos, Roehling & Campion, 1997), no laboratory
studies have manipulated interview structure to examine racial biases across interviews. Other
biases that have yet to be examined alongside interview structure include religious affiliation,
sexual orientation, and deeper psychological biases such as similarity (between the interviewer
and interviewee) and order effects.
Furthermore, there is the potential for the file-drawer problem or secondary sampling
error. The researchers undertook the steps necessary to identify and locate presentations, theses,
and dissertations. In fact, four of the nine studies included in the meta-analysis were not
published in peer-reviewed journals. The fact that these source studies were not subjected to the
strict peer-review process should not detract from the results. Rather, it should serve as evidence
that common criticisms of meta-analyses were addressed and, with no evidence of heterogeneity
in effect size distributions, that the effects of bias in these studies were not materially different
from those in published studies.
A main purpose of meta-analysis is to collect relevant studies on a common topic, and
accumulate data to represent the nature and strength of important relationships. From here,
additional qualification and generalization can be suggested and pursued. In the current study,
we have recognized that biases have a small but significant effect on structured interviews, and a
larger effect on unstructured interviews. Perhaps the clearest next steps are to test for the
influence of bias in intermediately structured interview contexts, to determine the specific
structuring elements that may allow for these biases to emerge, to examine any additional biases
that are generally recognized to affect interview situations, and, most importantly, to identify any
interventions or behaviors that may inhibit the impact of interviewer biases.
