Are highly structured interviews resistant to demographic similarity effects?



This study examines the extent to which highly structured job interviews are resistant to demographic similarity effects. The sample comprised nearly 20,000 applicants for a managerial-level position in a large organization. Findings were unequivocal: Main effects of applicant gender and race were not associated with interviewers’ ratings of applicant performance nor was applicant–interviewer similarity with regard to gender and race. These findings address past inconsistencies in research on demographic similarity effects in employment interviews and demonstrate the value of using highly structured interviews to minimize the potential influence of applicant demographic characteristics on selection decisions.
2010, 63, 325–359
The University of Toronto
The Florida State University
Purdue University
Employment interviews are one of the most common selection de-
vices used by organizations. When structured techniques are employed,
interviews are able to obtain impressive levels of predictive validity (e.g.,
Huffcutt & Arthur, 1994; McDaniel, Whetzel, Schmidt, & Maurer, 1994;
Wiesner & Cronshaw, 1988). Nevertheless, there exists a seemingly per-
sistent belief among academics, practitioners, and the general public that
group-level characteristics, such as race and gender, can have an undue
influence on selection decisions, such as job interview scores (Landy,
2008). Indeed, episodes of racism, sexism, and other forms of workplace
discrimination are a common topic in the popular press (e.g., Cardona,
2009; Miley & Wheaton, 2009; Pear, 2009). For example, selection pro-
cess discrimination claims with respect to race and gender have reached
an all-time high (Equal Employment Opportunity Commission [EEOC],
The potential influence of demographic characteristics is particularly
relevant for selection/promotion systems that incorporate employment
interviews given the interpersonal nature of the interview situation. For
instance, interviewees’ performance—and interviewers’ evaluations of
that performance—could be influenced not only by the interviewee’s de-
mographics (i.e., a main effect) but also by the match (or mismatch)
between the interviewee’s demographics and the interviewer(s)’s demographics
(i.e., an interaction effect). This latter situation is referred to as
demographic similarity, and relevant theories predict that people will eval-
uate others who have similar group-level characteristics (e.g., gender) to
themselves more favorably than those who are less similar (Tsui, Egan, &
O’Reilly, 1992). The potential for demographic similarity effects to occur
is a serious concern, as they may result in unfavorable selection decisions
for dissimilar applicants and act to increase the potential for litigation
(Offerman & Gowing, 1993; Williamson, Campion, Malos, Roehling, &
Campion, 1997). Demographic similarity effects may also cause inter-
viewers from dissimilar groups to treat applicants differently, resulting in
negative applicant reactions. These negative reactions, in turn, can have
a variety of deleterious effects, including reduced test-taking motivation
and lower job acceptance rates (Ryan, 2001; Saks & McCarthy, 2006).
Finally, demographic similarity effects may reduce the predictive validity
of the interview process by unduly influencing interview scores and sub-
sequently reducing the impact of candidate knowledge, skills, abilities,
and other characteristics (KSAOs; McFarland, Ryan, Sacco, & Kriska, 2004).
A considerable amount of research has examined the main effects of
applicant demographic characteristics on the ratings they receive in
interviews. Meta-analyses have revealed relatively small main effects with
respect to applicant race (Huffcutt & Roth, 1998) and gender (Olian,
Schwab, & Haberfeld, 1988), particularly when structured interview
formats are employed (Huffcutt & Roth, 1998). Although these studies are
informative, they fail to consider the
fact that interviews involve interactions between an applicant and one or
more interviewers. Thus, it is possible that the demographic similarity be-
tween the applicant and the interviewer could impact subsequent interview
scores. For this reason, recent interest in demographics and interviews has
shifted away from simple main effects toward more sophisticated de-
mographic similarity models (e.g., Buckley, Jackson, Bolino, Veres, &
Feild, 2007; Goldberg, 2005; Sacco, Scheu, Ryan, & Schmitt, 2003).
Corresponding findings from this relatively new body of research have
yielded mixed results.
We suggest that this inconsistent pattern of findings is due, in part,
to the fact that prior research has examined a wide range of interview
procedures, which vary considerably in their degree of standardization in
terms of interview development, administration, and/or scoring. Much of
the past research has also studied small samples of participants completing
simulated interviews. Our study was designed to address these critical
gaps in past research. We draw from theories of individuating information
(i.e., Fiske & Neuberg, 1990; Kunda & Spencer, 2003; Kunda & Thagard,
1996) to propose that properly conducted interviews, which follow the key
components of interview structure (Campion, Palmer, & Campion, 1997),
will be resistant to the influence of applicant gender and race. We examine
this issue using data from nearly 20,000 job applicants who underwent
highly structured interviews.
Demographic Similarity Theory
Demographic similarity theory is concerned with the extent to which
people use demographic variables, such as gender and race, to determine
how similar they are to others (Tsui et al., 1992; Tsui & O’Reilly, 1989).
Two interrelated theoretical perspectives form the basis of demographic
similarity theory: the similarity-attraction paradigm (Byrne, 1961) and
social identity theory (Ashforth & Mael, 1989; Tajfel & Turner, 1986).
The similarity-attraction paradigm suggests that individuals regard oth-
ers more positively when they are viewed as more similar to themselves
because it is assumed that individuals with similar demographics will
also have similar underlying attributes (Milliken & Martins, 1996). The
social identity paradigm (Ashforth & Mael, 1989) suggests that our self-
concepts originate from the groups, or social categories, to which we
belong (e.g., demographic groups, occupational groups, sports groups).
We determine our social identities by classifying ourselves into various
groups, and we tend to identify with the groups that enable us to main-
tain positive self-identities. Inclusion of oneself in a particular category
leads to more positive evaluations of in-group than of out-group mem-
bers. These theories are based on the idea that “birds of a feather flock
together” and predict that people will evaluate group members with simi-
lar demographic backgrounds (i.e., gender, race) more favorably. Applied
to an interview context, these theories predict that demographic similarity
between applicants and interviewers will lead to higher levels of
interpersonal attraction and, in turn, more favorable outcomes for “similar”
applicants.
There is considerable evidence that demographic similarity can influ-
ence work outcomes (Riordan, 2000). For example, demographic similar-
ity has been found to lead to more positive employee relations and com-
munication patterns and higher job satisfaction (Ensher & Murphy, 1997;
Green, Anderson, & Shivers, 1996; Tsui & O’Reilly, 1989; Wesolowski &
Mossholder, 1997). Findings are somewhat less consistent in evaluation
contexts. For example, demographic similarity has been found to have
no effect on performance ratings (Rotundo & Sackett, 1999; Waldman
& Avolio, 1991), small effects on performance ratings (Pulakos, White,
Oppler, & Borman, 1989), and moderate effects on performance ratings
(McKay & McDaniel, 2006; Roth, Huffcutt, & Bobko, 2003).
Prior Research on Demographic Similarity Effects in Interviews
A number of researchers have examined the extent to which demo-
graphic similarity influences interview scores. We summarize this research
in Table 1. For each study, we indicate the type of similarity examined, the
study context and sample, the type of interview(s), and the key findings.
The magnitude of observed effects is interpreted in a manner consistent
with Cohen (1988). As shown, the findings of these studies are varied,
with some reporting no effects (e.g., Graves & Powell, 1995; Sacco et al.,
2003; Simas & McCarrey, 1979) and others reporting small to moderate
effects (e.g., Buckley et al., 2007; Lin, Dobbins, & Farh, 1992; McFarland
et al., 2004).
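The verbal labels used in Table 1 can be made explicit. The short Python sketch below encodes the cutoffs given in the table note (d: .20, .50, .80; η2: .01, .06, .14). The function names, the "negligible" label for values below the smallest cutoff, and the treatment of the cutoffs as continuous boundaries are illustrative assumptions; they are not part of Cohen (1988) or of the analyses reported in any study reviewed here.

```python
def classify_d(d):
    """Label a standardized mean difference using the Table 1 note cutoffs.

    The sign is ignored because the label describes magnitude only.
    "negligible" is an assumed label for values below the smallest cutoff.
    """
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


def classify_eta_squared(eta2):
    """Label an eta-squared value using the Table 1 note cutoffs."""
    if eta2 < 0:
        raise ValueError("eta-squared cannot be negative")
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"
```

For instance, under these assumptions a d of -0.65 would be labeled "medium" and an η2 of .01 would be labeled "small".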
This variability may be due, in part, to the fact that a majority of studies
have focused on simulated interviews (e.g., Buckley et al., 2007; Gallois,
Callan, & Palmer, 1992; Simas & McCarrey, 1979) and assessment center
ratings that have not teased apart the effects of interviews from other exer-
cises (e.g., Fiedler, 2001; Walsh, Weinberg, & Fairfield, 1987). Although
lab-based research possesses several advantages (e.g., increased control;
Mook, 1983), simulated interviews may not capture the motivations and
consequences that affect the conduct and evaluation of real interviews.
Moreover, the findings of assessment center studies do not speak directly
to demographic similarity effects in interviews. For example, although
interviews and assessment center exercises share common elements, they
also vary in the extent to which applicants must interact with interviewers/
assessors and in what applicants are required to do (e.g., respond to
interview questions vs. interact with other applicants in a leaderless group
discussion).
TABLE 1
Prior Research on Demographic Similarity Effects in Job Interviews

Buckley, Jackson, Bolino, Veres, and Feild (2007)
  Context and sample: 20 assessors viewed videotapes of 73 officers applying for a real
  Key findings: significant effect, small in magnitude; significant effect, small in magnitude

[Study name missing]
  Context and sample: 400 applicants for selection at Irish; 38 interviewers
  Key findings: no significant effect

Fiedler (2001)
  Type of similarity: RACE
  Context and sample: 341 applicants for a sales position; total number of assessors not reported
  Interview type: assessment center ratings included interviews; interview data not reported
  Key findings: no significant effect

Gallois, Callan, and Palmer (1992)
  Context and sample: 56 personnel officers viewed 6 videotapes of simulated interviews
  Key findings: no significant effect

Goldberg (2005)
  Type of similarity: GENDER
  Context and sample: 273 students applying for various jobs & companies through career; 45 interviewers
  Interview type: interviews varied by different recruiters; type of interview varied and was not controlled
  Key findings: no significant effect; significant effect, small in magnitude

Graves and Powell (1995)
  Type of similarity: GENDER
  Context and sample: COLLEGE RECRUITING; 476 students applying for various jobs & companies through career; 483 interviewers
  Interview type: interviews varied by different recruiters; type of interview varied and was not controlled
  Key findings: no significant effect

Graves and Powell (1996)
  Type of similarity: GENDER
  Context and sample: COLLEGE RECRUITING; 680 students applying for various jobs & companies through career; 237 interviewers
  Interview type: interviews varied by different recruiters; type of interview varied and was not controlled
  Key findings: no significant effect

Lin, Dobbins, and Farh (1992)
  Context and sample: 2,805 applicants for a custodial job; total number of interviewers not reported
  Key findings: significant effect, small in magnitude

McFarland, Ryan, Sacco, and Kriska (2004)
  Context and sample: 1,334 police officer applicants; 21 interviewers
  Key findings: no significant effect; significant effect, small in magnitude

Prewett-Livingston, Feild, Veres, and Lewis (1996)
  Context and sample: 153 police officers applying for promotion to rank of sergeant; 24 interviewers
  Key findings: significant effect, medium in magnitude; significant effect, small in magnitude

Rand and Wexley (1975)
  Type of similarity: RACE
  Context and sample: 160 undergraduate students viewed 2 simulated videotaped
  Key findings: significant effect, medium in magnitude

Reid, Kleiman, and Travis
  Context and sample: 180 undergraduate students read 6 simulated paper interview
  Key findings: no significant effects

Sacco, Scheu, Ryan, and Schmitt (2003)
  Context and sample: 12,203 students applying for various jobs with a large manufacturing firm; 708 interviewers
  Key findings: no significant effect; no significant effect

Simas and McCarrey (1979)
  Type of similarity: GENDER
  Context and sample: LAB STUDY; 28 individuals viewed 4 simulated videotaped interviews
  Key findings: no significant effect

Walsh, Weinberg, and Fairfield (1987)
  Context and sample: 1,035 candidates for a professional sales position in a large financial services organization; 133 assessors
  Interview type: assessment center included an interview but interview data not reported separately
  Key findings: significant effect, small in magnitude

Wiley and Eskilson (1985)
  Type of similarity: GENDER
  Context and sample: LAB STUDY; 109 undergraduate students read 2 simulated paper interview
  Key findings: no significant effect

Note. We used Cohen’s (1988) criteria to describe the effect sizes found in each study as reflecting small, medium, and large effects (d: small = .20–.49, medium = .50–.79, large ≥ .80; η2: small = .01–.05, medium = .06–.13, large ≥ .14).

The between-study differences in past interview similarity research
may also be due to sampling error. Indeed, many studies have examined
demographic similarity using modest samples of interviewees and/or
interviewers (see Table 1). Small samples can give rise to sampling
error, which can produce observed effects that are substantially different
from actual population effects (Schmidt & Hunter, 1996). In support
of this possibility, studies that have examined relatively small numbers of
interviewees and/or interviewers have tended to yield inconsistent results
with respect to similarity effects. In contrast, the few large-sample studies
that have been conducted (e.g., Lin et al., 1992; McFarland et al., 2004;
Sacco et al., 2003) have found consistently small similarity effects.
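The role of sampling error can be illustrated with a small simulation. The Python sketch below is not drawn from any of the studies reviewed: the population effect (d = 0.10), the group sizes (20 vs. 1,000 per group), the number of replications, and all function names are arbitrary assumptions, chosen only to show how widely observed effects can scatter around the same population value when samples are small.

```python
import random
import statistics


def observed_d(n_per_group, true_d, rng):
    """Draw two normal samples and return the observed standardized difference."""
    group_a = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # Pooled SD for equal group sizes: square root of the mean of the two variances.
    pooled_sd = ((statistics.variance(group_a) + statistics.variance(group_b)) / 2) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def spread_of_observed_d(n_per_group, true_d=0.10, replications=200, seed=1):
    """Standard deviation of observed d across repeated hypothetical studies."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [observed_d(n_per_group, true_d, rng) for _ in range(replications)]
    return statistics.stdev(estimates)


small_study_spread = spread_of_observed_d(20)
large_study_spread = spread_of_observed_d(1000)
print(f"SD of observed d, n = 20 per group:   {small_study_spread:.3f}")
print(f"SD of observed d, n = 1000 per group: {large_study_spread:.3f}")
```

Because the population effect is identical in both conditions, any difference in the spread of the observed effects reflects sample size alone, which is the mechanism the paragraph above describes.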
Finally, the variation in past research may be a result of differences
in interview structure, which we believe is an important factor with re-
spect to demographic similarity effects. Some studies have not reported
the amount of interview structure, and others have been unable to control
the amount of structure due to the fact that the interviews were conducted
by recruiters from different units or organizations (e.g., Fiedler, 2001;
Gallois et al., 1992; Goldberg, 2005). In cases where structured inter-
views have been used, many of the key components were not followed.
In their review paper, Campion et al. (1997) identified 15 key elements
of interview structure. They maintain that properly designed structured
interviews should contain the following components: (1) job analysis, (2)
same questions, (3) limited prompting, (4) better questions, (5) longer in-
terviews, (6) control of ancillary information, (7) limited questions from
candidates, (8) multiple rating scales, (9) anchored rating scales, (10)
detailed notes, (11) multiple interviewers, (12) consistent interviewers,
(13) no discussion between interviews, (14) training, and (15) statistical
prediction.
We found five demographic similarity studies that examined interviews
that appeared to incorporate the majority of these elements. Lin et al.
(1992) examined demographic effects among 2,805 Black, White, and
Hispanic individuals applying for a custodial job. Each applicant was
rated by a racially mixed panel of two interviewers (the total number of
interviewers was not reported). Although demographic similarity effects
were statistically significant across both the situational and past-behavioral
interviews, the magnitude of the observed effects was small. This study
provided a valuable starting point for research examining demographic
similarity effects in highly structured interviews. However, not all of the
key components of interview structure were followed. For example, the
past-behavioral interview format did not use anchored rating scales, and
the length of the situational interview was not reported. Moreover, as
noted by the authors, the strong underrepresentation of White applicants
(approximately 5%) resulted in low statistical power.
Prewett-Livingston, Feild, Veres, and Lewis (1996) examined 153 po-
lice officer candidates, each of whom was rated by a racially mixed panel
of four interviewers (24 total interviewers). Findings indicated a signif-
icant similarity effect, which was medium in magnitude. Although the
researchers indicated use of a highly structured interview, they did not
report factors such as the length of the interview and whether they con-
trolled ancillary information, interviewer prompting, or questions from
the candidate. Moreover, the study was based on a relatively small sample
of primarily male police officers who were White or Black, which pre-
cluded an assessment of similarity effects with respect to gender and other
racial minorities (e.g., Hispanics).
Sacco et al. (2003) examined 12,203 undergraduates who participated
in real recruiting interviews. Students applied to a variety of jobs in a
large manufacturing firm, and each student was given a one-on-one past-
behavioral interview administered by one of 708 college recruiters. This
study represented a particularly significant extension of past work, as it
was based on a very large sample and hierarchical linear modeling (HLM)
was used to analyze the data. This study also considered similarity effects
with respect to both gender and each of the four primary racial groups
in the United States (i.e., Asian, Black, Hispanic, and White). Interviews
were highly structured but were conducted by a single interviewer rather
than an interview panel. Findings revealed no evidence of racial or gender
similarity effects. Given that the focus of this study was on students
undergoing initial recruitment interviews for entry-level jobs, examination
of whether these findings generalize to managerial-level employees and
to other interview types would be valuable.
The fourth study was conducted by McFarland et al. (2004) and was
based on 1,334 police officer candidates. Candidates underwent a situ-
ational interview administered by a racially mixed panel of three inter-
viewers (21 total interviewers). A unique feature of this study was that it
was longitudinal, and thus considered changes in interview ratings over
time. In terms of results, interactions among applicant race, rater race,
and the composition of the interview panel were statistically significant
but small in magnitude. However, the data were analyzed using analysis
of variance, which required computing an average overall rating for each
applicant across the individual interviewers in each panel. Thus, the re-
searchers were unable to examine demographic similarity between each
applicant and each individual interviewer.
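The consequence of averaging across a panel can be shown with a toy example. In the Python sketch below, the four records, the field names, and the helper functions are all hypothetical. It contrasts dyad-level coding, in which each applicant-interviewer pair carries its own similarity indicator, with panel averaging, in which each applicant is reduced to a single score and the link to any one interviewer is lost.

```python
from statistics import mean

# Hypothetical data: one record per applicant-interviewer pair (dyad).
ratings = [
    {"applicant": "A1", "applicant_race": "Black", "interviewer": "I1",
     "interviewer_race": "Black", "rating": 4.0},
    {"applicant": "A1", "applicant_race": "Black", "interviewer": "I2",
     "interviewer_race": "White", "rating": 3.0},
    {"applicant": "A2", "applicant_race": "White", "interviewer": "I1",
     "interviewer_race": "Black", "rating": 3.0},
    {"applicant": "A2", "applicant_race": "White", "interviewer": "I2",
     "interviewer_race": "White", "rating": 5.0},
]


def add_similarity(rows):
    """Dyad-level coding: flag each pair as same-race (1) or different-race (0)."""
    return [
        dict(row, same_race=int(row["applicant_race"] == row["interviewer_race"]))
        for row in rows
    ]


def panel_average(rows):
    """Panel-level coding: one mean rating per applicant, interviewers collapsed."""
    by_applicant = {}
    for row in rows:
        by_applicant.setdefault(row["applicant"], []).append(row["rating"])
    return {applicant: mean(values) for applicant, values in by_applicant.items()}


dyads = add_similarity(ratings)    # four rows, each with its own same_race flag
averaged = panel_average(ratings)  # two scores, no interviewer-level information
```

With a racially mixed panel, every applicant in this toy data set has one similar and one dissimilar interviewer, so the averaged score cannot be attributed to either pairing.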
The most recent study in the area was conducted by Buckley et al. in
2007. In this study, 20 assessors evaluated videotapes showing 73 police
officers responding to a single situational interview question. By employ-
ing a lab simulation, the racial composition of the interview panels was
manipulated, such that all possible Black/White racial combinations were
represented on the different interview panels. Racial similarity effects
with respect to panel composition were found to be statistically signifi-
cant, albeit small in magnitude. However, the extent to which this small
sample of simulated interviews generalizes to face-to-face interviews in
organizational contexts is uncertain. Moreover, the data were analyzed
using analysis of variance, which prevented the researchers from account-
ing for the nested structure of the data.
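One generic way to represent such nesting is a two-level model in which ratings are nested within interviewers. The specification below is only an illustrative sketch with assumed notation; it is not the model estimated in any of the studies reviewed above.

```latex
\begin{align*}
\text{Level 1 (rating):}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} A_{ij} + r_{ij} \\
\text{Level 2 (interviewer):}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} R_{j} + u_{0j} \\
 & \beta_{1j} = \gamma_{10} + \gamma_{11} R_{j} + u_{1j}
\end{align*}
```

Here Y_{ij} is the rating that interviewer j gives applicant i, A_{ij} is an applicant demographic indicator (e.g., gender), and R_{j} is the corresponding interviewer indicator. Under these assumptions, \gamma_{10} and \gamma_{01} capture the applicant and interviewer main effects, and the cross-level interaction \gamma_{11} is the term that would reflect a similarity effect. The residuals r_{ij}, u_{0j}, and u_{1j} allow ratings from the same interviewer to be correlated, which a single-level analysis of variance does not.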
The Resistance of Highly Structured Interviews to Demographic
Similarity Effects
As indicated, the evidence on the nature and magnitude of demographic
similarity effects on interview scores is not conclusive. Our goal was to conduct a
robust test of this phenomenon by developing a set of highly structured
interviews and assessing racial and gender similarity effects in a very large
sample of real applicants. We draw on theories of individuating informa-
tion to propose that properly designed, highly structured interviews will
be resistant to demographic similarity effects. Specifically, we contend
that demographic similarity effects are less likely to play a role because
structured interviews increase the amount of individuating information
available to, and used by, interviewers.
Three theories of individuating information have been advanced (Fiske
& Neuberg, 1990; Kunda & Spencer, 2003; Kunda & Thagard, 1996).
These theories share several key assumptions. First, they hold that when an
individual meets a new person, cognitive processing begins, and the initial
categorization of the individual is often based on group-level characteris-
tics, such as race or gender (see review by Fiske, 1998). This categorization
can cause perceivers to think, feel, and behave in a specific way toward
the target (Fiske, Lin, & Neuberg, 1999). In particular, demographic sim-
ilarity theory suggests that when the demographic characteristics of the
individual are categorized as similar to oneself, more positive perceptions
and evaluations are likely to ensue (Byrne, 1961; Tajfel & Turner, 1986;
Tsui & O’Reilly, 1989).
Also common among these theories is the belief that impressions are
based on more than just demographic information and can be influenced
by individuating information. Applied to the workplace, individuating
information is conceptualized as our knowledge about the job-related be-
haviors and attributes of a specific individual (Copus, 2005). It includes,
but is not limited to, knowledge, skills, abilities, personality traits, and be-
haviors. Thus, when forming impressions of others, individuals integrate
the full range of information known to characterize the individual, in-
cluding demographic characteristics and individuating information (Fiske
& Neuberg, 1990). Further, to the extent that individuating information
about the person becomes available, is processed, and is used, this in-
formation can override initial perceptions when final judgments of the
person are made. Research supports this proposition, as the more individ-
uating information that becomes available, the less influence demographic
characteristics tend to have (Kunda & Thagard, 1996).
We suggest that highly structured interviews facilitate the acquisi-
tion and use of individuating information, which, in turn, overrides initial
perceptions and provides resistance against demographic similarity ef-
fects. This may be accomplished in at least three ways. First, to elicit
individuating processes, the perceiver must be motivated to form an accu-
rate impression of the target (Fiske & Neuberg, 1990; Kunda & Spencer,
2003). This motivation determines whether the perceiver stays with an ini-
tial category-based impression or whether he or she moves beyond group
identity to focus on individuating information (Devine, Plant, Amodio,
Harmon-Jones, & Vance, 2002). Empirical findings suggest that indi-
viduals do not tend to seek out individuating information on their own
(Cameron & Trope, 2004) but rather try to conserve their energy and form
an impression of a target as soon as they feel they have enough information
to form a plausible evaluation (Epley & Gilovich, 2006). In other words,
people tend to stop adjusting their initial impression too soon.
Several features of highly structured interviews are likely to motivate
interviewers to form an accurate impression of candidates and to make it
difficult for interviewers to stop adjusting too soon. Fiske and Neuberg
(1990) noted that when perceivers expect that their judgments will be
made known or compared to others’ judgments, they are more motivated
to present an accurate impression. In structured interviews, the use of pan-
els increases interviewer motivation to attend to individuating information
because interviewers must explain their ratings to others (Arvey & Cam-
pion, 1982; Tetlock & Boettger, 1989). In other words, the anticipation of
discussion among raters should lead to greater attention to individuating
information. Further, all types of highly structured interviews make it
difficult for interviewers to stop adjusting too soon because the interview
is not complete until all relevant KSAOs have been assessed. In particular,
interviewers ask a series of predetermined, job-relevant questions, rate
interviewees’ responses using behaviorally anchored rating scales aligned
with a particular question or dimension, and derive a final evaluation that
reflects a statistical combination of ratings across questions/dimensions
(Campion et al., 1997).
Second, theories of individuation assert that the more attention the
perceiver pays to the target, the more likely it is that they will notice,
remember, and use the information that is inconsistent with initial percep-
tions (Fiske & Neuberg, 1990; Kunda & Spencer, 2003). In fact, attention
is conceptualized as a central mediator, such that motivation to obtain an
accurate impression leads to increased attention to the target, which, in
turn, facilitates the acquisition and use of individuating information (Fiske
& Neuberg, 1990). This idea has been supported by research showing that
when attentional resources are scarce, raters organize their impressions
based on group-level stereotypes (Biesanz, Neuberg, Smith, Asher, &
Judice, 2001; Gilbert & Hixon, 1991; Harris-Kern & Perkins, 1995). In
contrast, when evaluators are forced to attend to individuating information
about the targets (e.g., retrieving and recording specific characteristics of
the target), memory for the targets’ stereotyped traits is inhibited (Dunn
& Spellman, 2003).
Highly structured interviews are designed to focus interviewers’ at-
tention on the job-relevant content of interviewees’ responses. Structured
interviews also tend to take longer than less structured interviews and thus
allow ample opportunity for interviewers to obtain the requisite individuat-
ing information (Campion et al., 1997). Interviewer note taking also helps
to ensure that interviewers focus their attention on the target. Indeed, note
taking has been found to reduce the impact of preexisting expectations
(which, for example, may be influenced by demographic stereotypes) on
interviewers’ final evaluations of applicants (Biesanz, Neuberg, Judice, &
Smith, 1999).
A third way in which individuating information can help to override
potential demographic similarity effects is by ensuring that interviewers
focus on information that is predictive of job performance. Kunda and
Thagard (1996) highlighted the importance of the relevance of the in-
dividuating information to the judgment in question. Specifically, if the
individuating information is relevant to the evaluation task, then it is
more likely to be incorporated into evaluations of the target individual,
thereby overriding the influence of group-level stereotypes. In support
of this proposition, a number of studies have found that providing be-
haviorally relevant information about targets reduces the use of race and
gender-based stereotypes in evaluations of current and future performance
(Bodenhausen, Macrae, & Sherman, 1999; Fiske, 1998; Kunda & Thagard,
1996).
Within an interview context, it is important to ensure that raters fo-
cus on individuating information relevant to the job (Tetlock, Mitchell, &
Murray, 2008). The questions in high-structure interviews are designed
to measure KSAOs and behaviors identified from a job analysis. In addition,
interviewers evaluate interviewees’ responses against rating scales
that provide descriptions or examples of low, moderate, and high levels of
each KSAO/behavior. These features, coupled with the fact that highly
structured interviews attempt to minimize the extent to which applicants can
express irrelevant information (e.g., limit the opportunity to ask ques-
tions during the interview), help interviewers obtain and evaluate relevant
individuating information.
In sum, structured interviews possess several characteristics that would
seem to enable interviewers to acquire and use individuating infor-
mation, thereby forming impressions of applicants that are minimally
affected by demographic similarity effects. It is important to note, how-
ever, that impressions of others are formed by simultaneously integrat-
ing initial category-based information (e.g., demographic similarity) and
individuating information (Kunda & Thagard, 1996). As such, demo-
graphic characteristics have the potential to influence interviewers’ judg-
ments at any stage of the interview process, regardless of how much
individuating information has already been obtained (Kunda & Spencer,
2003; Kunda & Thagard, 1996; Wessel & Ryan, 2008). This can occur,
for example, when the individuating information is ambiguous (Darley
& Gross, 1983; Kunda & Sherman-Williams, 1993). This underscores
the importance of ensuring that the entire interview process remains
highly structured. The influence of demographic-based judgments can
also change when different judgment tasks are used (Berndt & Heller,
1986; Jackson, Sullivan, & Hodge, 1993), such as the use of different
interview formats (e.g., past-behavioral vs. situational). Thus, examining
interviewer ratings with respect to different types of interviews is also
Current Study
Our goal was to conduct a robust test of the extent to which three
widely used types of structured interviews (past-behavioral, situational,
and experience-based) are resistant to demographic similarity effects.
Based on the above discussion of theory and research on demographic
similarity, individuation, and structured interviewing, we do not expect
high-structure interviews that conform to the key components of struc-
ture to be subject to racial or gender similarity effects. In examining this
general expectation, we address several critical gaps in past research to
provide a more definitive test of demographic similarity effects.
First, we provide a theoretical foundation for the proposed lack of
demographic similarity effects in job interviews. This is an important
contribution because there has been an absence of strong theory in past
work on similarity effects in structured interviews. Second, our data set
includes a large and diverse sample of both applicants (N=19,931) and
interviewers (N=207). This enabled us to fully explore the range of
demographic similarity effects that may be present in real-world selec-
tion situations. Third, several researchers have highlighted the need for
research on demographic similarity that focuses on managerial jobs (Lin
et al., 1992; McFarland et al., 2004; Prewett-Livingston et al., 1996). To
our knowledge, this is one of the first investigations to do so. Fourth,
we examine three types of highly structured interviews, which may be of
considerable value to organizations that may need to choose one or two
of these interview formats to use in the selection process (Simola, Taggar,
& Smith, 2007). Fifth, unlike past research (see Sacco et al., 2003 for an
exception), we consider demographic similarity with respect to the four
primary U.S. racial groups, as well as with respect to gender. Finally, our
large sample of interviewees and interviewers allows us to use advanced
HLM techniques to assess main and interactive effects of gender and race.
This approach allows for more accurate estimates of demographic similar-
ity than what more traditional approaches provide (e.g., ANOVA, linear
Participants included 19,931 entry-level persons applying for professional
positions with an agency of the U.S. government. These positions
entail working with the public, government officials, and the business
community. Selected employees would work in one of several different
career tracks, including general management and specialty areas. Thirty-
four percent of the sample was female, and 59% was male. The remaining
7% did not identify their gender. In terms of racial composition, 15,709
participants were White (79%), 1,437 were Asian (7%), 1,026 were His-
panic (5%), and 700 were Black (4%). The remaining 5% did not report
their ethnicity. Participants were retained if they reported their gender,
their race, or both.
A total of 207 interviewers participated in this research. All interviews
were conducted by a panel of two interviewers, who were randomly
assigned to applicants. In terms of gender, 74 interviewers were women
(37%), 115 were men (58%), and 5% did not identify their gender. In terms
of ethnicity, 131 interviewers were White (63%), 34 were Black (19%),
8 were Asian (4%), and 6 (3%) were Hispanic. The remaining 11% of
interviewers did not identify their ethnicity. As with the interviewees,
interviewers were retained if they reported their gender, their race, or
both.
Structured Interviews
We took great care to ensure the interviews incorporated the 15 key
components of interview structure (Campion et al., 1997). The first seven
components of structure focus on the content of the interviews. We en-
sured that (1) the interviews were based on a comprehensive job analysis;
(2) within the situational and past-behavioral interviews, the same ques-
tions were asked of each candidate, and within the experience-based in-
terview, similar questions were asked of each candidate; (3) the use of
prompts and follow-up questions was limited; (4) three different questioning
techniques were employed (i.e., experience-based, situational,
past-behavioral); (5) each interview allowed sufficient time for interview-
ers to ask several questions; (6) ancillary information was controlled; and
(7) candidates were encouraged to ask questions after the structured phase
of the interview process was complete.
The remaining components of structure focus on the evaluation of in-
terviewees’ responses. We ensured that: (8) interviewers evaluated each
dimension using behaviorally anchored rating scales; (9) descriptive scale
anchors were derived from KSAO definitions, previously developed in-
terviews and responses from previous candidates; (10) interviewers were
trained on the importance of note taking during the interview process;
(11) a panel of two interviewers evaluated each candidate; (12) the same
set of interviewers conducted the interviews for each applicant; (13) the
interviewers did not discuss candidates between interviews; (14) all in-
terviewers were extensively trained to ensure proficiency in conducting
and scoring the interview; and (15) statistical procedures (unit weighting)
were used to combine ratings within each interview.
Experience-based interview. The experience-based interview required
applicants to answer questions about their qualifications, such
as work experience and education (cf., Roth & Campion, 1992). The
one difference between the experience-based interview and the other two
interviews was that interviewers could choose questions from a prede-
termined set of questions rather than asking every candidate the exact
same questions. Interviewers rated candidates on three questions that cor-
responded to the following three KSAOs: education and work experience,
motivation to join the organization, and other relevant experience.
Situational interview. The situational interview required applicants to
respond to hypothetical dilemmas that may be experienced on the job (cf.,
Latham, Saari, Pursell, & Campion, 1980). It contained nine questions
that corresponded to the following nine KSAOs: planning and organiz-
ing, teamwork, adaptability, leadership, judgment, integrity, analytical
skills, resourcefulness, and composure. Each question consisted of a base
question as well as follow-up questions that challenged the candidate by
eliminating obvious answers and/or by changing the situation. One in-
terviewer asked the questions, but both interviewers took notes and then
rated the candidate’s answers at the end of the interview.
Past-behavioral interview. The past-behavioral interview required
applicants to describe their behavior in past situations relevant to the
job (cf., Janz, 1982; Pulakos & Schmitt, 1995). This interview contained
eight questions that corresponded to the following eight KSAOs: planning
and organizing, teamwork, adaptability, leadership, judgment, integrity,
composure, and oral communication skills.
We used a within-subjects design, whereby each candidate completed
all three interviews. Due to legal and practical considerations associated
with interviewing 19,931 individuals, all applicants were administered
the interviews in the same order: experience-based interview, situational
interview, and past-behavioral interview. Each interview lasted approxi-
mately 20 minutes, for a total of about 60 minutes. Each applicant was
interviewed in person by a panel of two experienced human resource
specialists. These raters received 2 full days of training led by a group
of consultants who held doctoral degrees in I-O psychology or human
resources management. Training consisted of lecture, practice evaluations
of videotaped candidates, and feedback. Each rater was also given a man-
ual that included the assessment, training notes, and other work aids. In
addition to frame-of-reference training, a portion of the lecture material
covered rater errors, including leniency, severity, and central tendency.
Interviewers rated each dimension assessed within each interview on a
unique 7-point scale. The dimension ratings from each interview were then
averaged to create a total score. If interviewers disagreed on the total score
by more than two points, they would discuss the candidate. In situations
where interviewers discussed their ratings, they had the choice to retain
or change their original ratings. The data suggest that many retained
their original ratings. Thus, even in cases where raters discussed their
ratings, there was still opportunity for between-rater variance in their final
ratings. After the interviews were complete, demographic information for
both job candidates and interviewers was obtained from organizational
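
The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the organization's actual scoring procedure: the function names, the example ratings, and the reading of the two-point rule as a strict inequality are all assumptions made for the example.

```python
from statistics import mean

# Assumption: "more than two points" is read as a strict inequality.
DISCUSSION_THRESHOLD = 2.0


def total_score(dimension_ratings):
    """Unit-weighted total: the plain average of the 7-point dimension ratings."""
    if not all(1 <= r <= 7 for r in dimension_ratings):
        raise ValueError("ratings must fall on the 7-point scale")
    return mean(dimension_ratings)


def needs_discussion(ratings_a, ratings_b, threshold=DISCUSSION_THRESHOLD):
    """True when the two interviewers' total scores differ by more than the threshold."""
    return abs(total_score(ratings_a) - total_score(ratings_b)) > threshold


# Made-up ratings on the three experience-based dimensions.
interviewer_1 = [6, 5, 6]
interviewer_2 = [3, 3, 2]
print(total_score(interviewer_1))
print(needs_discussion(interviewer_1, interviewer_2))
```

Because the panel members kept the option of retaining their original ratings after discussion, a flag of this kind would only trigger a conversation; it would not force the two scores to converge.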
Analytic Strategy
We used HLM with restricted maximum likelihood (RML) estimation
to analyze the data. The use of HLM enabled us to control the nonindepen-
dence of the interview scores resulting from the fact that two interviewers
rated each applicant and that each interviewer evaluated multiple appli-
cants. Additional benefits of using HLM to examine nested (i.e., multi-
level) data structures have been highlighted by several researchers (e.g.,
Bliese, 2002; Raudenbush, Bryk, Cheong, Congdon, & du Toit, 2004), and
the specific benefits of using HLM to analyze interview data have been
outlined by Sacco et al. (2003). The dependent variable for the Level 1
unit of analysis was the mean ratings of each of the two interviewers who
conducted each interview. There were 119,586 scores at this level (i.e.,
the 19,931 applicants received two scores, one from each interviewer, for
each of the three interviews). These scores were cross-classified by two
higher-order Level 2 units: applicant demographic characteristics (gender,
race) and interviewer demographic characteristics (gender, race). Thus, a
cross-classified random effects model was estimated, with applicant and
interviewer demographics nested within interview scores. The specific
models to be estimated exceeded the capabilities of the Windows version of
HLM 6.0 available at the time. Therefore, all analyses were run using
advanced DOS programming in HLM 6.0 (Raudenbush et al., 2004).
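
To make the data layout concrete, the following sketch shows one way the Level 1 records could be organized, with each score cross-classified by applicant and interviewer. The class, field, and function names are hypothetical and the ratings are made up; the helper simply reproduces the count of Level 1 scores reported above (two interviewers by three interviews per applicant).

```python
from dataclasses import dataclass
from itertools import product

INTERVIEW_TYPES = ("experience-based", "situational", "past-behavioral")
RATERS_PER_PANEL = 2


@dataclass(frozen=True)
class Level1Score:
    applicant_id: int    # Level 2 "row" classification
    interviewer_id: int  # Level 2 "column" classification
    interview_type: str
    rating: float        # one interviewer's mean rating for one interview


def expected_level1_count(n_applicants):
    """Number of Level 1 scores: applicants x interviewers x interviews."""
    return n_applicants * RATERS_PER_PANEL * len(INTERVIEW_TYPES)


def panel_records(applicant_id, panel, ratings):
    """Build the six Level 1 records for one applicant.

    `panel` is a pair of interviewer ids and `ratings` maps
    (interviewer_id, interview_type) to that interviewer's mean rating.
    """
    return [
        Level1Score(applicant_id, interviewer, kind, ratings[(interviewer, kind)])
        for interviewer, kind in product(panel, INTERVIEW_TYPES)
    ]


# Made-up example: one applicant rated by interviewers 10 and 11.
example_ratings = {(i, t): 5.0 for i in (10, 11) for t in INTERVIEW_TYPES}
print(len(panel_records(1, (10, 11), example_ratings)))
print(expected_level1_count(19_931))
```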
HLM allowed us to assess whether our Level 2 variables (i.e., appli-
cant and interviewer demographics) impacted outcomes at Level 1 (i.e.,
interview scores). This is analogous to testing for main effects of gender
and race on interview scores. Given our interest in demographic similarity
effects, we also examined the interactions between applicant and inter-
viewer demographic characteristics. These interactions are meaningful
when the data are not centered, because a dichotomous coding strategy
was used for both gender and racial effects (Sacco et al., 2003). Thus, we
assessed uncentered, dichotomous variables for all analyses.
To facilitate interpretation of the findings, we compared seven gender
and race subgroups. For the analyses involving gender, each applicant and
interviewer was coded as 1 (male) or 0 (female). This enabled an assessment
of (a) the main effect of applicant gender on interview scores, (b) the
main effect of interviewer gender on interview scores, and (c) the interac-
tion between applicant and interviewer gender on interview scores, which
tested for demographic similarity effects. The remaining six subgroups
reflected all possible racial combinations: White/Black, White/Asian,
White/Hispanic, Black/Asian, Black/Hispanic, and Asian/Hispanic. Us-
ing gender as an example, the following equations were estimated:
Level 1: $\text{Rating} = \beta_{0jk} + r_{ijk}$

Level 2: $\beta_{0jk} = \gamma_{00} + \gamma_{01}(\text{sex}_{app}) + \gamma_{10} + \gamma_{11}(\text{sex}_{int}) + \gamma_{01}(\text{sex}_{app})(\text{sex}_{int}) + u_{0j} + u_{0k}$

The L1 equation predicts applicants’ interview scores based on the
mean interview score within each of the j applicants ($\beta_{0jk}$) and the
within-cell random effects for interview scores ($r_{ijk}$). The L2 equation
models the main effect of applicant sex [$\gamma_{01}(\text{sex}_{app})$], the
main effect of interviewer sex [$\gamma_{11}(\text{sex}_{int})$], and
interaction effect of applicant and interviewer sex
[$\gamma_{01}(\text{sex}_{app})(\text{sex}_{int})$] on interview scores, and
includes the intercept ($\gamma_{00}$) and the residual random effects of
applicants’ and interviewers’ demographic characteristics ($u_{0j}$ and
$u_{0k}$, respectively).
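
A small sketch may help show why the interaction term carries the similarity test when both predictors are uncentered 0/1 codes. It first enumerates the seven subgroup comparisons (gender plus the six race pairs) and then computes the fixed part of a Level 2 equation for each applicant-interviewer cell. The function names and coefficient values are invented for illustration and are not estimates from this study.

```python
from itertools import combinations

RACES = ("White", "Black", "Asian", "Hispanic")

# Six pairwise race comparisons plus one gender comparison = seven subgroups.
RACE_PAIRS = list(combinations(RACES, 2))
SUBGROUPS = [("Male", "Female")] + RACE_PAIRS


def predicted_score(intercept, b_app, b_int, b_inter, app, interviewer):
    """Fixed part of a Level 2 equation with uncentered 0/1 predictors."""
    return intercept + b_app * app + b_int * interviewer + b_inter * app * interviewer


def similarity_contrast(intercept, b_app, b_int, b_inter):
    """Matched cells (0,0) and (1,1) minus mismatched cells (0,1) and (1,0)."""
    def cell(app, interviewer):
        return predicted_score(intercept, b_app, b_int, b_inter, app, interviewer)

    return (cell(0, 0) + cell(1, 1)) - (cell(0, 1) + cell(1, 0))


# Hypothetical coefficients, chosen only to make the arithmetic visible.
print(len(SUBGROUPS))
print(similarity_contrast(5.0, 0.5, 0.25, 0.125))
```

Under this coding, the matched-minus-mismatched contrast equals the interaction coefficient whatever the main effects are, which is why a near-zero interaction indicates the absence of a similarity effect.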
Descriptive statistics for interview scores for each combination of
applicant and interviewer race are shown in Table 2, and the statistics
for each combination of applicant and interviewer gender are shown in
Table 3. The internal consistency reliability for scores on the
experience-based interview was .79, for the situational interview was .90,
and for the past-behavioral interview was .86. The intraclass correlations
(C,2) for the mean interview ratings of the two raters in each interview
were also high for the experience-based (.79), situational (.80), and
past-behavioral (.82) interviews.

TABLE 2
Descriptive Statistics of Interview Ratings for Each Applicant–Interviewer
Race Combination

                                     Interviewer race
Interview type           White     Black     Asian  Hispanic   Overall

Experience-based interview
  White applicant
    M                     5.14      5.14      5.17      5.10      5.14
    SD                     .73       .71       .63       .69       .71
    N                   21,485     5,632     2,027       885    31,371
  Black applicant
    M                     5.31      5.57      5.37      5.35      5.39
    SD                     .75       .71       .63       .66       .73
    N                      713       273        67        19     1,119
  Asian applicant
    M                     5.23      5.33      5.26      5.17      5.25
    SD                     .68       .74       .62       .70       .68
    N                    1,588       388       140        42     2,244
  Hispanic applicant
    M                     5.19      5.33      5.32      5.40      5.24
    SD                     .69       .67       .52       .69       .67
    N                    1,045       297       101        35     1,544
  Overall
    M                     5.15      5.18      5.19      5.12      5.16
    SD                     .72       .71       .63       .69       .72
    N                   24,929     6,619     2,347       984    34,879

Situational interview
  White applicant
    M                     4.93      4.93      4.88      4.98      4.93
    SD                     .72       .69       .63       .69       .71
    N                   21,485     5,632     2,027       885    30,029
  Black applicant
    M                     4.97      5.21      4.99      5.23      5.05
    SD                     .75       .78       .66       .76       .76
    N                      713       273        67        19     1,072
  Asian applicant
    M                     4.85      4.95      4.81      5.03      4.87
    SD                     .73       .73       .76       .65       .73
    N                    1,588       388       140        42     2,158
  Hispanic applicant
    M                     4.87      4.94      4.94      5.22      4.90
    SD                     .73       .70       .55       .67       .72
    N                    1,045       297       101        35     1,478
  Overall
    M                     4.92      4.94      4.88      4.99      4.93
    SD                     .72       .70       .64       .69       .71
    N                   24,929     6,619     2,437       984    34,879

Past-behavioral interview
  White applicant
    M                     5.15      5.16      5.13      5.16      5.15
    SD                     .66       .63       .58       .62       .65
    N                   21,485     5,632     2,027       885    30,029
  Black applicant
    M                     5.24      5.49      5.25      5.41      5.31
    SD                     .71       .64       .62       .58       .69
    N                      713       273        67        19     1,072
  Asian applicant
    M                     5.12      5.24      5.09      5.21      5.14
    SD                     .65       .67       .74       .59       .66
    N                    1,588       388       140        42     2,158
  Hispanic applicant
    M                     5.16      5.19      5.25      5.35      5.18
    SD                     .61       .64       .57       .65       .62
    N                    1,045       297       101        35     1,478
  Overall
    M                     5.15      5.18      5.13      5.18      5.15
    SD                     .66       .64       .59       .62       .65
    N                   24,929     6,619     2,437       984    34,879
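
For readers who want to see how an internal consistency estimate of this kind is typically obtained, the sketch below computes coefficient alpha from a candidates-by-dimensions matrix of ratings. It assumes that the reported internal consistency values are coefficient alpha, which the text does not state explicitly, and the ratings are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of per-candidate rating lists.

    Each inner list holds one candidate's ratings, one per dimension.
    """
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two dimensions and equal-length rows")
    item_variance_sum = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up ratings for five candidates on three dimensions.
ratings = [
    [5, 5, 6],
    [4, 4, 4],
    [6, 7, 6],
    [3, 4, 3],
    [5, 6, 5],
]
print(round(cronbach_alpha(ratings), 3))
```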
Confirmatory Factor Analyses
Prior to examining the existence of demographic similarity effects
across the three interview formats, it was important to determine whether
the formats were indeed distinguishable empirically. To do so, we
conducted confirmatory factor analyses using AMOS 16.0 (Arbuckle, 2005) to
examine the underlying structure of the interview ratings. Maximum
likelihood estimation procedures were used and three indices were employed
to assess the fit of the models: the chi-square index, the standardized
root mean square residual (SRMR; Hu & Bentler, 1999), and the comparative
fit index (CFI; Bentler, 1990). This combination of fit indices ensured
the inclusion of an index that considers how much variance is explained in
light of how many degrees of freedom are used (i.e., SRMR) as well as an
index that is a direct function of how much variance is explained by the
model (i.e., CFI). In the case of the SRMR, values approaching .00 indicate
a good fit. For the CFI, values approaching 1.0 indicate a good fit. The
mean dimension ratings (averaged across the interviewers) within each
interview format served as the input for these analyses.

TABLE 3
Descriptive Statistics of Interview Ratings for Each Applicant–Interviewer
Gender Combination

                           Interviewer gender
Interview type          Female      Male   Overall

Experience-based interview
  Female applicant
    M                     5.26      5.27      5.27
    SD                     .66       .68       .67
    N                    7,964     5,315    13,446
  Male applicant
    M                     5.13      5.08      5.10
    SD                     .74       .73       .73
    N                    8,996    14,297    23,293
  Overall
    M                     5.18      5.15      5.16
    SD                     .72       .71       .71
    N                   14,311    22,261    36,572

Situational interview
  Female applicant
    M                     5.00      5.01      5.00
    SD                     .69       .68       .69
    N                    5,315     7,964    13,279
  Male applicant
    M                     4.89      4.88      4.89
    SD                     .72       .72       .72
    N                    8,996    14,297    23,293
  Overall
    M                     4.93      4.93      4.93
    SD                     .71       .71       .71
    N                   14,311    22,261    36,572

Past-behavioral interview
  Female applicant
    M                     5.27      5.25      5.26
    SD                     .61       .61       .61
    N                    5,315     7,964    13,279
  Male applicant
    M                     5.13      5.07      5.10
    SD                     .67       .65       .66
    N                    8,996    14,297    23,293
  Overall
    M                     5.19      5.14      5.16
    SD                     .65       .64       .65
    N                   14,311    22,261    36,572
We first tested a model where interviewers’ ratings were specified to
load on one of three factors that corresponded to the three interview for-
mats (i.e., experience-based, situational, and past-behavioral). Findings
indicated that this model achieved an acceptable fit to the data
(χ²(167) = 23,171.6, p < .01; SRMR = .05, CFI = .90). We compared this to a model
in which the questions from the three interviews were specified to load
on a single factor. We created this factor by fixing the covariances among
the three interview factors to 1.0, and thus this model is nested within the
three-factor model. This unidimensional model tested the possibility that
applicants performed similarly (and/or interviewers evaluated that perfor-
mance similarly) across all three interview formats rather than exhibiting
distinct performance on the three interviews. Findings indicated that the
three-factor model provided a significantly better fit to the data
(χ²(3) = 8,289.96, p < .001) than did the single-factor model
(χ²(170) = 46,011.1, p < .01; SRMR = .25, CFI = .80). This provides
support for examining
the three interview formats separately.
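
The logic of the nested-model comparison can be illustrated with a short sketch. It takes the degrees of freedom and the chi-square difference statistic as reported in the text and evaluates the upper-tail probability using the closed-form survival function of a chi-square variable with 3 degrees of freedom. The function and variable names are ours, and the closed form applies only to the 3-df case.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability for a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Values as reported in the text.
df_three_factor = 167
df_single_factor = 170
reported_chi2_difference = 8289.96

# Fixing the three factor covariances removes three free parameters.
df_difference = df_single_factor - df_three_factor
p_value = chi2_sf_df3(reported_chi2_difference)

print(df_difference)
print(p_value < 0.001)
```

With a statistic this large the tail probability underflows to zero in floating point, which is consistent with the reported p < .001.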
Hierarchical Linear Modeling Results
Results from HLM analyses are presented in Tables 4–7. The first section
of each table presents findings for the experience-based interviews, the
second section for the situational interviews, and the third section for
the past-behavioral interviews. In each case, main effects of applicant
demographics on interview scores, interviewer demographics on interview
scores, and the interaction between applicant and interviewer demographics
on interview scores are reported.¹ We also computed effect size estimates,
in the form of Cohen’s d and pseudo R² values, following the multilevel
modeling effect size computation provided by McNulty, O’Mara, and Karney
(2008).² It is noteworthy that each of these analyses was conducted on a
sample large enough to find even the minutest effects. Indeed, the
statistical power to detect an effect size of less than .05 (or an
r² < .01) was .995, thus rendering the probability of a Type II error
extremely low (Cohen, Cohen, West, & Aiken, 2003).

¹ Analyses examining the effects of interview panel composition on
interview scores were also conducted. In doing so, situations in which (a)
one interviewer was the same race as the applicant, (b) both interviewers
were the same race as the applicant, and (c) neither interviewer was the
same race as the applicant were considered. These analyses were also
conducted with respect to interviewer gender. Findings were consistent with
our previous results: panel composition did not have a meaningful impact on
interview scores.

² The size of the effects (r) for each analysis was estimated using the
formula provided by McNulty et al. (2008). These rs were then converted
into estimates of Cohen’s d.

TABLE 4
HLM Analyses of Race Similarity Effects (White vs. Black) on Interview Scores

                                     b     SE         t        d

Experience-based interview
  Level 1: Intercept              5.19    .06     89.97***
  Level 2: Main effects
    Applicant race (AR)            .03    .04       .74      .01
    Interviewer race (IR)          .03    .04       .65      .01
  Level 2: Interaction AR × IR     .04    .03      1.17      .01
  Variance estimates
    Interview score (Level 1)      .10    .31
    Applicant (Level 2 row)        .38    .62
    Interviewer (Level 2 column)   .02    .13
  Pseudo R²                        .00

Situational interview
  Level 1: Intercept              4.92    .06     88.11***
  Level 2: Main effects
    Applicant race (AR)            .01    .04       .17      .00
    Interviewer race (IR)          .03    .04       .65      .01
  Level 2: Interaction AR × IR     .00    .03       .00      .00
  Variance estimates
    Interview score (Level 1)      .09    .30
    Applicant (Level 2 row)        .38    .62
    Interviewer (Level 2 column)   .01    .12
  Pseudo R²                        .00

Past-behavioral interview
  Level 1: Intercept              5.19    .05    103.81***
  Level 2: Main effects
    Applicant race (AR)            .05    .04      1.27      .01
    Interviewer race (IR)          .00    .04       .04      .00
  Level 2: Interaction AR × IR     .02    .03       .93      .01
  Variance estimates
    Interview score (Level 1)      .07    .26
    Applicant (Level 2 row)        .33    .57
    Interviewer (Level 2 column)   .01    .11
  Pseudo R²                        .00

Note. N (Level 1) = 28,774; N (Level 2 applicants) = 16,304; N (Level 2
interviewers) = 164. Race was coded as 1 = White, 2 = Black.
b = unstandardized beta coefficients; SE = standard error; t = t-ratio for
each effect; d = effect size; Pseudo R² = the amount of variance in
interview scores accounted for by applicant and interviewer race main
effects and interactions.
***p < .001.

TABLE 5
HLM Analyses of the Effects of Applicant and Interviewer Race (White vs.
Asian) on Interview Scores

                                     b     SE         t        d

Experience-based interview
  Level 1: Intercept              5.14    .07     72.13***
  Level 2: Main effects
    Applicant race (AR)            .00    .05       .10      .00
    Interviewer race (IR)          .05    .06       .87      .01
  Level 2: Interaction AR × IR     .04    .04       .96      .01
  Variance estimates
    Interview score (Level 1)      .09    .30
    Applicant (Level 2 row)        .37    .61
    Interviewer (Level 2 column)   .02    .13
  Pseudo R²                        .00

Situational interview
  Level 1: Intercept              4.97    .07     70.53***
  Level 2: Main effects
    Applicant race (AR)            .03    .05       .60      .01
    Interviewer race (IR)          .04    .06       .60      .01
  Level 2: Interaction AR × IR     .04    .04      1.29      .01
  Variance estimates
    Interview score (Level 1)      .09    .30
    Applicant (Level 2 row)        .37    .61
    Interviewer (Level 2 column)   .02    .13
  Pseudo R²                        .00

Past-behavioral interview
  Level 1: Intercept              5.24    .06     84.33***
  Level 2: Main effects
    Applicant race (AR)            .06    .04      1.43      .02
    Interviewer race (IR)          .04    .05       .81      .01
  Level 2: Interaction AR × IR     .02    .03       .50      .01
  Variance estimates
    Interview score (Level 1)      .07    .26
    Applicant (Level 2 row)        .31    .56
    Interviewer (Level 2 column)   .01    .11
  Pseudo R²                        .00

Note. N (Level 1) = 27,142; N (Level 2 applicants) = 16,840; N (Level 2
interviewers) = 139. Race was coded as 1 = White, 2 = Asian.
b = unstandardized beta coefficients; SE = standard error; t = t-ratio for
each effect; d = effect size; Pseudo R² = the amount of variance in
interview scores accounted for by applicant and interviewer race main
effects and interaction effects.
***p < .001.

TABLE 6
HLM Analyses of the Effects of Applicant and Interviewer Race (White vs.
Hispanic) on Interview Scores

                                     b     SE         t        d

Experience-based interview
  Level 1: Intercept              5.13    .10     49.81***
  Level 2: Main effects
    Applicant race (AR)            .02    .07       .31      .00
    Interviewer race (IR)          .02    .09       .20      .00
  Level 2: Interaction AR × IR     .05    .07       .70      .01
  Variance estimates
    Interview score (Level 1)      .09    .31
    Applicant (Level 2 row)        .37    .61
    Interviewer (Level 2 column)   .02    .15
  Pseudo R²                        .00

Situational interview
  Level 1: Intercept              4.97    .10     51.99***
  Level 2: Main effects
    Applicant race (AR)            .09    .07      1.20      .01
    Interviewer race (IR)          .04    .09       .51      .01
  Level 2: Interaction AR × IR     .08    .06      1.26      .02
  Variance estimates
    Interview score (Level 1)      .09    .31
    Applicant (Level 2 row)        .38    .62
    Interviewer (Level 2 column)   .02    .13
  Pseudo R²                        .00

Past-behavioral interview
  Level 1: Intercept              5.08    .09     59.06***
  Level 2: Main effects
    Applicant race (AR)            .01    .06       .21      .00
    Interviewer race (IR)          .07    .08       .88      .01
  Level 2: Interaction AR × IR     .02    .06       .40      .00
  Variance estimates
    Interview score (Level 1)      .06    .25
    Applicant (Level 2 row)        .32    .56
    Interviewer (Level 2 column)   .01    .12
  Pseudo R²                        .00

Note. N (Level 1) = 23,019; N (Level 2 applicants) = 16,463; N (Level 2
interviewers) = 137. Race was coded as 1 = White, 2 = Hispanic.
b = unstandardized beta coefficients; SE = standard error; t = t-ratio for
each effect; d = effect size; Pseudo R² = the amount of variance in
interview scores accounted for by applicant and interviewer race main
effects and interaction effects.
***p < .001.

TABLE 7
HLM Analyses of the Effects of Applicant and Interviewer Gender on Interview
Scores

                                     b     SE         t        d

Experience-based interview
  Level 1: Intercept              5.08    .04    138.24***
  Level 2: Main effects
    Applicant gender (AG)          .05    .02      3.03**    .03
    Interviewer gender (IG)        .03    .02      1.24      .01
  Level 2: Interaction AG × IG     .01    .01       .57      .01
  Variance estimates
    Interview score (Level 1)      .10    .31
    Applicant (Level 2 row)        .38    .62
    Interviewer (Level 2 column)   .02    .12
  Pseudo R²                        .00

Situational interview
  Level 1: Intercept              4.87    .04    135.35***
  Level 2: Main effects
    Applicant gender (AG)          .05    .02      3.37**    .03
    Interviewer gender (IG)        .01    .02       .31      .00
  Level 2: Interaction AG × IG     .01    .01       .79      .01
  Variance estimates
    Interview score (Level 1)      .09    .30
    Applicant (Level 2 row)        .39    .62
    Interviewer (Level 2 column)   .01    .12
  Pseudo R²                        .00

Past-behavioral interview
  Level 1: Intercept              5.08    .03    160.16***
  Level 2: Main effects
    Applicant gender (AG)          .05    .01      3.66***   .04
    Interviewer gender (IG)        .02    .20      1.15      .01
  Level 2: Interaction AG × IG     .01    .01       .88      .01
  Variance estimates
    Interview score (Level 1)      .07    .26
    Applicant (Level 2 row)        .32    .57
    Interviewer (Level 2 column)   .01    .11
  Pseudo R²                        .00

Note. N (Level 1) = 36,597; N (Level 2 applicants) = 18,541; N (Level 2
interviewers) = 192. Gender was coded as 1 = male, 2 = female.
b = unstandardized beta coefficients; SE = standard error; t = t-ratio for
each effect; d = effect size; Pseudo R² = the amount of variance in
interview scores accounted for by applicant and interviewer gender main
effects and interaction effects.
**p < .01. ***p < .001.
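
The two-step effect size computation (a t-ratio converted to r, then r converted to Cohen's d) can be sketched as follows. The conversions used here are the widely cited ones, r = sqrt(t² / (t² + df)) and d = 2r / sqrt(1 − r²); treating them as the exact formulas applied in this study is an assumption, and the degrees of freedom in the example are a hypothetical round number rather than a value taken from the analyses.

```python
import math


def r_from_t(t, df):
    """Effect size r from a t-ratio and its degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))


def d_from_r(r):
    """Cohen's d from r, assuming two groups of roughly equal size."""
    return 2 * r / math.sqrt(1 - r * r)


# Hypothetical degrees of freedom; the t-ratio is of the size reported
# for the applicant gender main effects.
example_t = 3.03
example_df = 36_000
example_d = d_from_r(r_from_t(example_t, example_df))
print(round(example_d, 2))
```

The sketch illustrates why a statistically significant t-ratio can coexist with a very small d when the number of observations is in the tens of thousands.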
Tables 4–6 present the results for the racial analyses with the
White/Black, White/Asian, and White/Hispanic subgroups. Findings were
consistent across all groups—neither the main effects of applicant or in-
terviewer demographics, nor the interaction between applicant and inter-
viewer demographics, significantly influenced interview ratings. Not only
were the effects nonsignificant, but the effect sizes were consistently be-
low .10, rendering them extremely small (Cohen et al., 2003). Moreover,
the pseudo R² values were zero, suggesting that none of the variance in
interview scores could be attributed to the demographic effects. Analy-
ses for the Black/Asian, Black/Hispanic, and Asian/Hispanic subgroups
yielded an identical pattern of nonsignificant findings and are available
from the first author upon request.
Table 7 presents the results for gender. The main effect of interviewer
gender on applicant scores was nonsignificant across all three interviews,
as was the interaction between applicant and interviewer gender. A signifi-
cant main effect for applicant gender was found across all three interviews,
such that females scored slightly higher than males. However, the
magnitude of these effects was extremely small (d = .03 to .04). Moreover,
R² values were zero across all interview types, indicating that demographic
effects did not contribute to the variance in interview scores. The overall
findings were unequivocal: demographic similarity effects did not have a
meaningful impact on any of the interview scores.
The extent to which demographic variables influence personnel deci-
sions can have important consequences with respect to fairness, diversity,
and legal defensibility (and perhaps even construct and/or criterion-related
validity). We used theories that focus on individuating information as a
basis to propose that when the key components of structure are adhered
to, demographic similarity effects are unlikely to occur in employment
interviews. We tested this proposition in a large sample of applicants for
managerial positions. Demographic similarity was considered with re-
spect to gender and all four primary racial groups in the U.S. Findings
were robust and suggest that demographic similarity effects in highly
structured interviews were trivial. This is an important finding because it
suggests that, in addition to obtaining impressive levels of predictive valid-
ity (Huffcutt & Arthur, 1994; McDaniel et al., 1994; Wiesner & Cronshaw,
1988), structured interviews can minimize or eliminate potential bias with
respect to demographic similarity between applicants and interviewers.
Implications for Theory and Practice
We drew from theories of individuating information to posit that highly
structured interviews facilitate the acquisition and use of individuating
information and are thereby resistant to the effects of demographic sim-
ilarity. Results were unequivocal and provided strong support for this
proposition. This provides a solid theoretical basis for the small similar-
ity effects that have been found in past studies of structured interviews.
Theories of individuating information may also extend to other personnel
selection and human resource practices, such as letters of recommendation
and performance appraisals. In particular, theories of individuating infor-
mation assert that for individuating information to be obtained and used,
raters should be motivated to form accurate impressions of the target,
and raters should focus their attention on job-relevant behaviors (Fiske
& Neuberg, 1990; Kunda & Spencer, 2003). In a performance appraisal
context, rater motivation could be facilitated by increasing accountability
through the use of multiple raters (i.e., 360-degree feedback). Rater attention to
job-relevant behaviors could be facilitated by basing ratings on a set of
predetermined job-relevant dimensions of behavior and by using behav-
iorally anchored scoring techniques. Similar techniques could be used to
develop structured and standardized letters of recommendation that are
based on multiple raters and evaluate candidates on job-relevant attributes.
Our study also highlights possible factors for why inconsistent results
have been reported in past studies of demographic similarity in interviews.
First, past studies have varied considerably on the amount of interview
structure. Even in cases where highly structured interviews have been
examined, there is no evidence of a study that has followed all of the key
components of interview structure. It is therefore possible that structured
interviews are only resistant to demographic similarity effects when all,
or most, of the components of structure are followed. Second, past stud-
ies have varied widely on important factors such as study design (e.g.,
lab simulations vs. field studies). These methodological differences may
explain, in part, the inconsistent findings of past work. A third possibility
is that sampling error may have contributed to between-study differences because
a majority of past studies were based on samples considerably smaller
than that obtained in this study. Finally, the statistical techniques used
to analyze past work may have contributed to between-study differences
because prior studies have tended to rely on ANOVA-based techniques,
which do not account for the nested nature of the datasets that are common
in this area and may thus overestimate similarity effects (see Sacco et al.,
2003). Combined, these factors highlight the importance of using large
samples of job applicants and HLM analyses to derive the most accurate
estimates of similarity effects.
From a practical perspective, our findings challenge the frequent as-
sumption made by academics, practitioners, and the general public that
demographic characteristics have a substantial impact on interview scores.
Our results suggest that organizations that adopt carefully administered
interviews that conform to the key components of structure can minimize
concerns of applicant discrimination on the basis of gender and race. The
use of highly structured interviews will also help to facilitate the selection
of a diverse workforce, as well as act to reduce litigation concerns. Fur-
ther, the racial composition of the panel did not affect interview scores.
Thus, although the use of a diverse panel of raters may facilitate the at-
traction of diverse candidates (Avery & McKay, 2006), panel diversity (or
lack thereof) is not associated with subsequent scores. Our findings also
indicate that experience-based, situational, and past-behavioral interview
formats are able to provide unique assessment information and yet are
equally resistant to demographic similarity effects. Thus, the use of these
highly structured interview formats, independently or in combination, can
minimize the potential for demographic similarity effects to occur.
Although this study was characterized by several notable strengths, it
also contains certain limitations. Our goal was to examine whether highly
structured interviews are resistant to demographic similarity effects, and
this was the first study we know of to examine similarity effects across
three commonly used structured interview formats. Due to practical con-
siderations associated with interviewing 20,000 job candidates, the three
interviews were administered to each applicant in the exact same order.
This ordering was carefully planned to facilitate the logical flow of the
interviews and also helped to ensure that every candidate was treated in
exactly the same manner, which increases the legal defensibility of the
interview process.
At the same time, the consistent ordering of the interviews does not
preclude the possibility that the (lack of) demographic similarity effects
in the experience-based interview influenced the effects observed for
the situational and past-behavioral interviews. However, as previously
described, demographic information has the potential to influence inter-
viewers’ judgments at any stage of the interview process, regardless of
how much individuating information has already been obtained (Kunda
& Spencer, 2003; Kunda & Thagard, 1996; Wessel & Ryan, 2008). Thus,
even if individuating information was obtained and used for the first inter-
view, demographic characteristics still could have influenced subsequent
interviews. Moreover, the CFA results indicate that the three interviews
were empirically distinct, which suggests that interviewers considered
each interview separately rather than being influenced by some general
impression (e.g., caused by a similarity effect) across all three interviews.
Nevertheless, researchers interested in testing whether different interview
formats are associated with different outcomes may wish to consider the
use of counterbalanced designs.
Another potential limitation of the current work was that it was not
possible to examine less structured interview formats. Unstructured inter-
views are less valid than their structured counterparts, and both ethical and
legal concerns surround their use. Hence, many organizations, including
the one that supported this study, do not use unstructured interviews for
selection. Nevertheless, future research that directly compares similarity
effects across interviews of varying structure may be advantageous, if it
is possible to conduct such research in an actual selection context. For
example, there may be certain boundary conditions with respect to in-
terview structure that are necessary to avoid similarity effects. Length of
the interview may be one such condition. Buckley et al. (2007) found
some evidence of demographic similarity effects for simulated interviews
that were based on a single item and were therefore relatively short in
length. Had the interview been longer, more individuating information
would have been available and the effects may have diminished (Kunda &
Spencer, 2003). Further, because interviews represent a selection method
rather than a construct (Arthur & Villado, 2008), it is possible that
similarity effects may vary across interviews designed to measure different
Directions for Future Research
Given the rigor of interviews that conform to the essential components
of structure, we anticipate that similar findings would be obtained if the
influence of other potential group-level stereotypes (i.e., age, education,
religion) were examined in highly structured interviews. Consistent with
this proposition, there is growing evidence that age does not affect how ap-
plicants perform in structured interviews (e.g., Lin et al., 1992; Morgeson,
Reider, Campion, & Bull, 2008). However, future research should exam-
ine the extent to which these findings generalize to broader attitudinal
similarity variables, such as personality and values.
We also encourage future researchers to conduct more direct tests of
the role of individuating information. In particular, it would be valuable to
assess the relative impact of individuating information at different times
in the interview process. Qualitative field studies that assess the under-
lying processes that operate with respect to individuating information in
job interview contexts may also provide valuable insight. Moreover, re-
search that examines whether the acquisition and use of individuating
information renders broader selection and human resource practices im-
mune to demographic similarity effects would be valuable.
