Moving Forward Indirectly: Reanalyzing the
validity of employment interviews with
indirect range restriction methodology
Allen I. Huffcutt*, Satoris S. Culbertson** and
William S. Weyhrauch***
*Department of Psychology, Bradley University, Peoria, IL 61625, USA. huffcutt@fsmail.bradley.edu
**Management Department, Kansas State University, Manhattan, KS 66506, USA
***Department of Psychology, Kansas State University, Manhattan, KS 66506, USA
This study provides updated estimates of the criterion-related validity of employment inter-
views, incorporating indirect range restriction methodology. Using a final dataset of 92 co-
efficients (N = 7,389), we found corrected estimates by structural level of .20 (Level 1), .46
(Level 2), .71 (Level 3), and .70 (Level 4). The latter values are noticeably higher than in
previous interview meta-analyses where the assumption was made that all restriction was
direct. These results highlight the importance of considering indirect range restriction in
selection. However, we found a number of studies involving both indirect and direct restric-
tion, which calls into question the viability of assuming all restriction is now indirect. We
found preliminary empirical support for correction of one of these multiple restriction pat-
terns, indirect then direct.
1. Introduction
Recent advancements in methodology have made
correction for indirect range restriction in organiza-
tional meta-analyses viable and widely applicable (Le &
Schmidt, 2006). Prior to this point, corrections were
typically made for direct range restriction only (e.g.,
Arthur, Day, McNelly, & Edens, 2003; McDaniel,
Whetzel, Schmidt, & Maurer, 1994). What makes this
issue particularly important is that most of the range re-
striction in organizational contexts like selection tends
to be indirect rather than direct (Hunter & Schmidt,
2004).
It appears that correction for indirect range restric-
tion can make a considerable difference in meta-
analytic outcomes. To illustrate, in a reanalysis of the
United States Employment Services dataset, Hunter,
Schmidt, and Le (2006) found a mean-corrected
criterion-related validity of .66 for general mental abil-
ity in medium complexity jobs (see their p. 606), a 29%
increase over the original estimate of .51 from Hunter
and Hunter (1984; see also Schmidt & Hunter, 1998).1
A number of potentially important implications arise
from such an increase, including incremental validity
with other predictors.
However, despite its potential to change estimates,
this methodology has only begun to be utilized to reas-
sess the criterion-related validity of common selection
techniques. The two realms of constructs for which in-
direct corrections have been made are cognitive ability
and personality (Hunter et al., 2006; Schmidt, Shaffer, &
Oh, 2008). Indirect corrections have not been made yet
for other techniques such as work samples, assessment
centers, and situational judgment tests. As for employ-
ment interviews, Le and Schmidt (2006) conducted a re-
analysis of the McDaniel et al.’s (1994) meta-analytic
findings using indirect methodology. However, McDaniel
et al. is limited by their binary classification of structure
(structured vs. unstructured), which resulted in a cor-
rected validity difference of .44 versus .33. Other works
that used a more expanded classification of structure
(e.g., Huffcutt & Arthur, 1994) have found more extens-
ive differences. Given that, along with the availability of
additional studies since that time frame, we do not be-
lieve that the reanalysis of McDaniel et al. is sufficient
to address the issue of the criterion-related validity of
International Journal of Selection and Assessment Volume 22 Number 3 September 2014
©2014 John Wiley & Sons Ltd,
9600 Garsington Road, Oxford, OX4 2DQ, UK and 350 Main St., Malden, MA, 02148, USA
employment interviews with indirect range restriction
taken into account. Further, Le and Schmidt made the
assumption that all range restriction in the McDaniel et
al. primary studies was indirect, which does not appear
to be viable given our present finding that a number of
studies appear to be in other categories (e.g., none, di-
rect, multiple).
The current investigation has three purposes. The
primary purpose is to reassess the criterion-related va-
lidity of employment interviews incorporating the new
methodology for indirect correction. A second purpose
is to help flesh out the nomological framework for cat-
egorization of range restriction mechanisms in criterion-
related validity meta-analyses, which is necessary
because the only categorization scheme available cur-
rently was designed for predictor-to-predictor analyses.
Finally, we explore the viability of correcting for studies
with a multiple range restriction mechanism, namely, in-
direct then direct, a category for which we found more
studies than expected.
To put our reanalysis of criterion-related validity in
proper context, we note the mean-corrected validity es-
timates of .20, .35, .56, and .57, respectively, for struc-
ture Level 1 (no constraints) through Level 4 (complete
standardization) from Huffcutt and Arthur (1994). These
estimates continue to be cited the most frequently, and
represent a benchmark against which our results can be
compared. Hunter and Schmidt (2004) have noted that
correction for direct restriction only, as was done in
Huffcutt and Arthur, generally results in underestima-
tion of parameters. As such, we expected these estim-
ates to increase when indirect correction is made.
1.1. Categories of restriction
In a reanalysis of the relationship between employment
interviews and cognitive ability, Berry, Sackett, and
Landers (2007) separated the primary studies in their
dataset by the type of range restriction mechanism in-
volved. Specifically, they coded for five categories, in-
cluding no range restriction, direct on one variable
(either interview or cognitive ability), direct on both v-
ariables (including both composite and multiple hurdle
procedures), indirect via incumbent samples, and insuffi-
cient information to make a coding.
Accordingly, we built on the classification scheme of
Berry et al. (2007) for predictor-to-criterion analyses
(rather than predictor to predictor), the result of which
was the seven restriction categories presented in
Table 1. Similar to Berry et al., we included a category
for no restriction (Category 1) because it is theoretic-
ally possible, but did not expect to find many studies
(and perhaps none).2 Given the selection context, we
also included a direct restriction category (Category 2)
because it would not be improbable for re-
searchers to actually use the interview under investiga-
tion to make selection decisions. The next three
categories involve indirect restriction by itself, spanning
both concurrent (Category 3) and predictive (Categor-
ies 4 and 5) designs.
The final two categories (Categories 6 and 7) involve
multiple restriction mechanisms, specifically some com-
bination of direct and indirect. More information on the
development of these categories can be obtained from
the authors. We acknowledge that other restriction cat-
egories may exist (and encourage exploration of them in
future research), but believe that these seven adequately
describe the main restriction mechanisms found in
criterion-related validity studies.
1.2. Level of structure
There are a variety of moderator variables that have
been analyzed in relation to the criterion-related validity
of employment interviews, including job complexity, in-
terviewer training, and developing questions based on a
formal job analysis (cf. McDaniel et al., 1994; Wiesner &
Cronshaw, 1988). However, a widely held viewpoint is
that, overall, structure is the single most important
moderator of interview validity. Structure has been ana-
lyzed as a central moderator in all major interview
meta-analyses, and also in both major meta-analyses of
the interrater reliability of interviews (Conway, Jako, &
Goodman, 1995; Huffcutt, Culbertson, & Weyhrauch,
2013).
Table 1. Summary of the seven range restriction mechanisms for criterion-related validity

Category  Description                                                          RR mechanism          No. studies   %
1         All candidates who were interviewed were hired                       None                   3            2.3
2         Candidates were hired based on the interview                         Direct                 3            2.3
3         Selection data were collected from incumbents                        Indirect              68           51.1
4         Interview given, but hiring based on another predictor               Indirect               5            3.8
5         Hiring based on a composite of the interview and another predictor   Indirect              33           24.8
6         Hiring based on another predictor first, and then on the interview   Indirect then direct  19           14.3
7         Hiring based on the interview first, and then on another predictor   Direct then indirect   2            1.5
Total                                                                                               133

Note: RR = range restriction.
Theoretically, it is helpful to view the effects of struc-
ture in terms of ‘procedural variability’ (Huffcutt &
Arthur, 1994). As the level of question structure in-
creases, interviewers have progressively less choice in
the questions asked and in their content, both of which
become progressively more standardized across candid-
ates. Similarly, as the level of response evaluation in-
creases, interviewers have progressively less discretion
in how candidate responses are rated, and the consist-
ency in evaluation across candidates should progres-
sively increase. Additionally, there is potential for the
job relatedness of both the content and its evaluation to
improve with increased structure. For instance, devel-
opers who carefully standardize the questions and de-
velop individualized, benchmarked rating scales for them
are likely to do so based on some type of job analysis
(Huffcutt et al., 2013).
Because of the combined benefits of increased stand-
ardization and job relatedness, the obvious hypothesis is
that mean levels of criterion-related validity should in-
crease steadily with increased structure. The only caveat
comes from the Huffcutt and Arthur (1994) original va-
lidity findings. They found essentially the same mean-
corrected validity for Level 3 structure as for Level 4
structure (.56 vs. .57), suggesting that allowing interview-
ers some limited procedural variability does not tend to
impact the accuracy of their ratings. We expected to
find a similar pattern. (For reasons outlined later, we
chose not to analyze other moderator variables in addi-
tion to structure.)
2. Method
2.1. Identification of studies
We conducted a comprehensive literature review to
identify published and unpublished studies that provided
validity coefficients for employment interviews. First, we
conducted a computerized search of the PsycINFO,
ABI/INFORM Global, LexisNexis Academic, JSTOR, ERIC,
CINAHL, ProQuest, and Dissertations Abstracts databases
using keywords such as employment interview, situational
interview, behavior description interview, interview validity,
and similar variants. Second, we did a manual search of
conference programs for the Society for Industrial and
Organizational Psychology and the Academy of Manage-
ment from 2003 through the present. This search was
limited to 2003 and beyond based on the assumption
that most interview studies appearing prior to this point
would likely be represented in the search for published
studies. Third, we reviewed the reference section of
key interview works (e.g., Huffcutt & Arthur, 1994;
McDaniel et al., 1994; Posthuma, Morgeson, & Campion,
2002). Finally, we contacted prominent interview re-
searchers to solicit unpublished manuscripts, pre-prints,
and additional correlational data. No restrictions were
made based on the time period in which studies needed
to be conducted. As a result of our efforts, we believe
the studies in the current meta-analysis are compre-
hensive, current, and representative.
2.2. Inclusion criteria
We developed several decision rules to assist us in de-
ciding which studies were appropriate to include in the
final dataset. First, because of the focus on criterion-
related validity in organizations, we excluded studies
that were based on student samples within a laboratory
setting because they are not true employment inter-
views and the motivation of the students to perform
well is questionable. Exceptions were made for educa-
tional samples in which students were interviewing for
real internships, apprenticeships, professional training
programs, or similar activities in which they would be
performing regular duties of the profession. Such studies
included selection of medical students (Burgess, Calkins,
& Richards, 1972), nurses in training (Asher, 1970), and
dental students (Poole, Catano, & Cunningham, 2007).
Similarly, performance had to be a demonstration of
skills relevant to a job in an actual employment setting.
For example, the criteria of interest in the Barrick, Swider,
and Stewart (2010) studies were the number of intern-
ship offers received and whether interviewees were in-
vited to participate in second interviews. As such, their
criteria were more reflective of interview performance
than of job performance. Finally, clinical assessments
were excluded (e.g., Finesinger, Cobb, Chapple, &
Brazier, 1948, inquired about early childhood issues), as
were studies in which interviewees were coached (e.g.,
Maurer, Solamon, & Lippstreu, 2008) in order to avoid a
potential confound.
Given the nature of our inclusion criteria, some
studies were able to be eliminated on the basis of in-
formation provided in the title or the abstract. In
other cases, they were eliminated after a careful in-
spection of their reported methodology. The result of
our efforts was a dataset of 133 criterion-related inter-
view validity coefficients.
2.3. Coding of validity coefficients
All three authors independently coded each validity co-
efficient and participated in regular meetings to review
and discuss any discrepancies, all of which were re-
solved by consensus. In addition to the validity value and
sample size, we coded for the range restriction mechan-
ism involved and for the level of structure.
Range restriction was coded using the seven catego-
ries presented in Table 1, which also shows the cat-
egorical breakdown of the 133 coefficients. Somewhat
surprisingly, we found three coefficients that appeared
to have no restriction (Category 1). Latham, Saari,
Pursell, and Campion (1980) noted that all of their
applicants were subsequently hired. Benz (1974) noted
that their sample of applicants did ‘not constitute a re-
stricted sample since virtually all applicants who possess
the required qualifications pass this test and are subse-
quently hired’ (p. 4). Finally, McMurry (1947) clearly
stated that ‘all applicants were employed . . . In this
manner, a completely representative cross-section of
applicants was included’ (p. 267).
We also found three validity coefficients with direct
restriction only (Category 2). Two of these were de-
scribed by Arvey, Miller, Gould, and Burch (1987), who
noted that ‘the interview was the sole basis for hiring
decisions’ (p. 3). The other was Robertson, Gratton,
and Rout (1990), where candidates were placed ‘as a di-
rect consequence of their performance in the situational
interview’ (p. 72). The low number in this category con-
firms Hunter and Schmidt’s (2004) position that the
focus of correction should be on indirect rather than di-
rect restriction.
The three categories with indirect restriction alone
clearly represented the majority of the dataset, with a
total of 106 coefficients (79.7%). As examples, we refer
to Campion, Campion, and Hudson (1994) and
DeGroot and Gooty (2009). Campion et al. noted that
their design ‘allowed a concurrent validation in that the
interview and tests could be validated against job per-
formance’ (p. 999). DeGroot and Gooty noted in their
abstract that they used ‘a concurrent design to validate
the interview’ (p. 179). For reasons explained later, we
were not able to analyze all categories with indirect re-
striction. A critical implication of the latter point is that
not all indirect restriction is the same. Finally, and per-
haps most interesting psychometrically, there were 21
coefficients (15.8%) in the two categories with multiple
restriction mechanisms (Categories 6 and 7). These co-
efficients do not appear to fit under either the direct-
only premise or the indirect-only premise.
In regard to interview structure, we carefully analyzed
standardization gradients of both questions and re-
sponse evaluation, and determined that five levels of
both were maximally descriptive. Even though we col-
lapsed the various combinations into four overall levels
(as described below) to maintain compatibility with
Huffcutt and Arthur (1994), our benchmark, we still felt
it would be helpful to outline the full taxonomy for use
in future research. We note that this expanded tax-
onomy is not without precedent, as Conway et al.
(1995) used five levels of question standardization in
their reliability meta-analysis.
Specifically, we coded question standardization as (1)
no structure, (2) topical areas with sample questions, (3)
majority of questions fixed with extensive choice and
probing, (4) all questions fixed with limited probing or
some choice in pre-established questions without prob-
ing, and (5) completely fixed questions with no probing,
and response evaluation as (1) global, (2) simple dimen-
sional (e.g., Likert-type scaling), (3) complex dimensional
(e.g., behaviorally anchored rating scales), (4) individual
rating by question with simple scales, and (5) individual
rating with complex scales.
The four overall levels of structure we derived from
the above structural taxonomy are diagramed in Fig-
ure 1. Like Huffcutt and Arthur (1994), our overall
structure Level 1 reflects the lowest level of structure
and consists of interviews with no question structure
and only a global rating scale. Similarly, our Level 2 in-
cludes interviews with low-to-moderate question struc-
ture (e.g., topical areas or a question bank) combined
with a dimensional rating scale. These two levels would
be roughly equivalent to the ‘unstructured’ category in
the McDaniel et al. (1994) interview meta-analysis.
Continuing, overall Level 3 includes interviews with
moderate-to-high question structure (e.g., a question
bank or fixed questions with or without probing) and a
single global rating as well as interviews with moderate
question structure (e.g., a question bank) with question-
based response scoring. Finally, overall Level 4 includes
interviews with high question structure (fixed questions)
and question-based response scoring. These two cat-
egories should be roughly representative of the ‘struc-
tured’ category in McDaniel et al. (1994).
We note a minor deviation between our overall Level
4 and that of Huffcutt and Arthur (1994), which was the
result of our more refined coding scheme. Specifically,
our overall Level 4 allows for minor procedural variabil-
ity in question standardization, typically allowing inter-
viewers to ask limited probes to the completely fixed
questions such as asking for a little more detail (the
fourth level in our question standardization gradients).
Huffcutt and Arthur combined these interviews into the
previous question gradient, which typically involves wide
choice of questions along with extensive probing. We
[Figure 1. Classification scheme for the four overall levels of interview structure, based on coded levels of question and response structure. The figure is a 5 × 5 grid (question structure by response structure, each coded 1–5) partitioned into regions labeled Level 1 through Level 4, with one region marked as theoretically improbable.]
believe that the fourth question gradient is much closer
both conceptually and operationally to the fifth gradient
than it is to the third gradient. Thus, we left the third
question gradient in overall Level 3 structure and in-
cluded the fourth and fifth question gradients in overall
Level 4.
2.4. Analyzability/combination of restriction
categories
Not surprisingly, the largest number of validity coeffi-
cients (k = 68) were from incumbent samples (Category
3). In these studies, the interview under investigation
was not used in the original selection, but would be ex-
pected to correlate to some degree with the original
measures used (which most likely included an inter-
view). Thus, the restriction mechanism is indirect alone.
There were far fewer coefficients
(k = 5) where the interview under investigation was
given to job applicants (rather than incumbents) but not
used to make selection decisions (Category 4). Similarly,
the restriction mechanism here is indirect alone. Given
that both groups involved only indirect restriction, we
felt justified in combining them into one indirect restric-
tion group. That group is designated as indirect only in
the analyses.
There were 33 studies in Category 5, where the in-
terview under investigation was given to job applicants
and used to make selection decisions. What makes this
category tricky is that interview ratings were not used
alone, but rather combined with another predictor (e.g.,
mental ability, personality, work sample) to form a com-
posite score. Normally, using those interview ratings
to select would constitute direct restriction. However,
because they were combined with something else,
the mechanism becomes indirect. To correct for this
mechanism would require estimates of the average
standard deviation in composite scores across the
samples and the standard deviation of composite scores
in the population, as well as reliability estimates of the
composite scores. We believe it may be possible to
make such a correction, but doing so will require addi-
tional consideration and is beyond the scope of the
present investigation. Thus, we eliminated these coeffi-
cients and leave them as a matter for future research.
There were 19 validity coefficients (14.3%) in Cat-
egory 6, where selection was based on another predic-
tor first and then on the interview under investigation
(i.e., multiple hurdle). Because the direct restriction oc-
curred last, it may be possible to correct for it first, and
then for the indirect restriction. Although such a proce-
dure is not mentioned in the Hunter and Schmidt (2004)
book on meta-analysis, it seemed worth exploring.
However, the correction for direct restriction would
have to be done at the group (rather than individual) co-
efficient level using values from an artifact distribution
since information on direct range restriction was not re-
ported for every coefficient. The only way to combine
these coefficients with the indirect only ones would be
to correct each one individually for direct restriction
using a mean artifact distribution value. Once again, it
may be possible to do so, but would require further
consideration. Thus, we retained this group for analysis,
but kept it separate as a second dataset (at least
upfront; as explained later, we combined the two
datasets at a summary level).
Lastly, because of their small number, we chose not
to retain the three coefficients with no restriction (Cat-
egory 1) or the three coefficients with direct restriction
only (Category 2). It may be possible to combine these
at the summary level with the indirect only and indirect
then direct studies. However, the ultimate goal of this
investigation was to derive estimates of criterion-related
validity for the four levels of structure. Given potential
combination issues and the small number, this approach
did not appear viable.
In summary, our search and inclusion process re-
sulted in two datasets with which we felt a reasonable
correction for range restriction could be made. There
were 73 indirect only coefficients, with a total sample
size of 5,713. By structure, 3 were Level 1, 11 were
Level 2, 13 were Level 3, and 46 were Level 4. And 19
were indirect then direct, with a total sample size of
1,676. By structure, 9 were Level 2, 2 were Level 3, and
8 were Level 4. References for these data sources are
marked by an asterisk (*) in the reference section. Ref-
erences are further marked with a caret (^) for those
that include a coefficient from the indirect then direct
range restriction group. Among the 92 total coefficients,
63 were drawn from published sources, whereas 29
were drawn from unpublished sources.
2.5. Overlap with previous meta-analyses
To help assess the overlap between our dataset and
previous works, we examined the references for previ-
ous meta-analyses on employment interviews and com-
pared them with the references for our 92 coefficients.
The coefficients included in our analysis came from 64
published and unpublished studies (some of which pro-
vided multiple coefficients). We found that 10 of these
sources overlapped with McDaniel et al. (1994), but
could not assess the overlap with Huffcutt and Arthur
(1994) because they did not provide a reference list of
sources. However, we expected the Huffcutt and Ar-
thur data sources to be very similar to that of McDaniel
et al. given that they were published the same year. Fur-
ther, our meta-analysis includes 38 studies not previ-
ously referenced in these prior meta-analyses. This low
overlap reflects a combination of newer studies, our
stringent inclusion criteria, and a focus on specific forms
of range restriction (e.g., we were not able to analyze
41 of the 133 studies shown in Table 1 because of the
form of range restriction present, leaving 92 analyzable
ones).
2.6. Nonindependence
As noted above, some studies reported multiple validity
coefficients based on the same sample of participants.
For example, several interviews contained multiple
question formats, such as situational versus behavior de-
scription (e.g., Campion et al., 1994; Huffcutt, Weekley,
Wiesner, Degroot, & Jones, 2001). Whereas a common
method of addressing such nonindependence is to aver-
age validity coefficients across the multiple components,
we do not believe that this practice is advisable when
dealing with interviews given that different components
may exhibit different properties. For example, several
studies have found a relatively low correlation between
parallel situational and behavior description questions
(e.g., Huffcutt et al., 2001; Latham & Skarlicki, 1995). In
addition, job complexity may differentially influence the
validity of situational interviews relative to behavior de-
scription interviews (Huffcutt, Conway, Roth, & Klehe,
2004). As such, and in line with other recent interview
meta-analytic work (e.g., Huffcutt et al., 2013), we opted
to adjust the sample sizes so that the total across the
nonindependent components came to the total for the
study. For instance, Kluemper (2006) reported separate
validity estimates for situational and behavior descrip-
tion questions based on the same sample of 66 youth
treatment workers, so we adjusted the sample size to
33 for each estimate.
2.7. Study weighting
As commonly happens in organizational meta-analyses,
our dataset included some coefficients that were based
on a notably larger sample. To illustrate, Pulakos and
Schmitt (1995, Study 3), Githens and Rimland (1964),
and Dipboye, Gaugler, Hayes, and Parker (2001, Study
1) collectively represented close to one fifth of the total
combined dataset. Given the basic goal of meta-analysis
to integrate results across a variety of sources and con-
texts, allowing a few large-sample values to overly influ-
ence results is inappropriate. Indeed, the possibility that
additional moderator variables (other than structure)
exerted influence is too great to allow these coefficients
to be given substantially more weight. As such, we
switched from pure sample weighting to using the
square root of the sample size as a weight. Doing so re-
tained a stronger influence from coefficients based on a
larger sample size, while preventing their domination on
the results. We note several precedents for alternate
weighting schemes within the interview literature (e.g.,
Huffcutt et al., 2013; Huffcutt, Roth, & McDaniel, 1996).3
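The difference between the two weighting schemes can be sketched as follows. The coefficients and sample sizes below are hypothetical (not values from the present dataset); the sketch simply contrasts the mean validity under pure sample-size weighting with that under square-root weighting.

```python
import math

# Hypothetical validity coefficients and sample sizes (illustrative only,
# not values from the present dataset): one large-sample coefficient
# alongside several smaller ones.
coefficients = [(0.25, 1400), (0.45, 120), (0.50, 90), (0.40, 150)]

def weighted_mean(pairs, weight_fn):
    """Mean validity with each coefficient weighted by weight_fn(n)."""
    total = sum(weight_fn(n) for _, n in pairs)
    return sum(r * weight_fn(n) for r, n in pairs) / total

r_n = weighted_mean(coefficients, lambda n: n)    # pure sample-size weighting
r_sqrt = weighted_mean(coefficients, math.sqrt)   # square-root weighting
```

With these illustrative inputs, the single large-sample coefficient (.25) pulls the n-weighted mean toward itself far more strongly than it pulls the square-root-weighted mean.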
2.8. Outliers
Finally, we computed the sample-adjusted meta-analytic
deviancy (SAMD) statistic to identify potential outliers in
our dataset (Arthur, Bennett, & Huffcutt, 2001; Huffcutt
& Arthur, 1995). Following Cortina’s (2003) recommend-
ation to discard outliers only if there is overwhelming
empirical or methodological justification, we reviewed
the flagged studies on a case-by-case basis. Upon close
inspection, we identified two outliers for those involving
indirect then direct range restriction (Kennedy, 1986;
McKinney & Melter, 1979). Given their substantial
SAMD values (3.16 and 4.22, respectively), we opted
to discard them. We report all analyses with these out-
liers removed.
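For readers unfamiliar with the statistic, the core logic of SAMD can be sketched in simplified form: each coefficient is compared against the sample-size-weighted mean of the remaining coefficients and scaled by its expected sampling error. This is a simplification of the full statistic (see Huffcutt & Arthur, 1995), offered only to convey the idea.

```python
import math

def samd(rs, ns):
    """Simplified sketch of sample-adjusted meta-analytic deviancy:
    SAMD_i = (r_i - rbar_-i) / sqrt((1 - rbar_-i**2)**2 / (n_i - 1)),
    where rbar_-i is the sample-size-weighted mean of all other
    coefficients. See Huffcutt & Arthur (1995) for the full statistic."""
    out = []
    for i, (r_i, n_i) in enumerate(zip(rs, ns)):
        others = [(r, n) for j, (r, n) in enumerate(zip(rs, ns)) if j != i]
        rbar = sum(r * n for r, n in others) / sum(n for _, n in others)
        se = math.sqrt((1 - rbar ** 2) ** 2 / (n_i - 1))
        out.append((r_i - rbar) / se)
    return out
```

A coefficient far from the mean of the others, backed by a large sample, yields a large SAMD value and is flagged for case-by-case review.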
2.9. Range restriction corrections
In order to conduct the corrections for the indirect
only studies, we followed the five-step approach out-
lined by Hunter and Schmidt (2004; see their pages
166–167). First, we estimated the reliability of the pre-
dictor in the population. In a recent meta-analysis of the
interrater reliability of employment interviews, Huffcutt
et al. (2013) found an overall mean of .44 across separ-
ate (as opposed to panel) interviews, as well as values of
.40, .48, and .61 for low, medium, and high levels of
structure, respectively. These estimates captured all
three sources of measurement error (random response,
transient, and conspect), and as such are more accurate
estimates of interrater reliability. (Panel interviews, for
example, only account for conspect error.) Accordingly,
we used these values for the observed values.
To transform these observed reliability values into
population estimates required a mean range restriction
ratio (the restricted standard deviation of the interviews
in the samples divided by the unrestricted standard de-
viation). The only range restriction values available in
the literature (i.e., Huffcutt & Arthur, 1994; Salgado &
Moscoso, 2002) most likely represented direct restric-
tion. Accordingly, we conducted a simulation. Based on
a reasonable scenario where there was an initial 50% cut
based on a cognitively related predictor (e.g., education
or ability test scores) followed by hiring the top 10% of
remaining candidates based on original interview scores,
the indirect range restriction ratio for the target inter-
view being investigated in the primary studies (but not
used in actual selection) was .74. We note that this
value is higher (i.e., less restricted) than the direct range
restriction ratio of .61 we used to undo the direct re-
striction in the indirect then direct studies (described
below), and may be somewhat conservative. A reason-
able implication is that the higher criterion-related valid-
ity estimates found under indirect correction are not
due to use of a more restricted ratio value, but rather
to the fundamentally different steps in the indirect cor-
rection process.
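The two-stage scenario above can be sketched as a short Monte Carlo simulation. The 50% cognitive cut and top-10% interview hire come from the text, but the population intercorrelations among the cognitive predictor, the interview used in selection, and the target interview are our own illustrative assumptions (the article does not report them), so the resulting ratio will only approximate the .74 value.

```python
import numpy as np

# Sketch of the two-stage selection scenario; the intercorrelations
# below are hypothetical assumptions, not values from the study.
rng = np.random.default_rng(42)
n = 400_000

# Assumed population correlations: g = cognitive predictor,
# s = interview used in selection, t = target (research) interview.
corr = np.array([[1.0, 0.5, 0.4],
                 [0.5, 1.0, 0.7],
                 [0.4, 0.7, 1.0]])
g, s, t = rng.multivariate_normal(np.zeros(3), corr, size=n).T

sd_unrestricted = t.std()

# Stage 1: initial 50% cut on the cognitive predictor.
survivors = g >= np.median(g)
s, t = s[survivors], t[survivors]

# Stage 2: hire the top 10% of survivors on the selection interview.
hired = s >= np.quantile(s, 0.90)

# Indirect range restriction ratio for the target interview (not
# itself used in selection): restricted SD over unrestricted SD.
u_indirect = t[hired].std() / sd_unrestricted
print(round(u_indirect, 2))
```

With these assumed correlations the ratio lands in the general neighborhood of the .74 reported; different assumed intercorrelations would shift it.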
302 Allen I. Huffcutt, Satoris S. Culbertson and William S. Weyhrauch
International Journal of Selection and Assessment
Volume 22 Number 3 September 2014
©2014 John Wiley & Sons Ltd
Second, we transformed the indirect range restriction
ratio of .74 into its population counterpart. Third,
we corrected the mean observed correlation for
unreliability in the predictor and the criterion. The reli-
ability values at this step are observed. We used .52 for
the criterion, and the appropriate reliability value for the
interview as outlined earlier (overall low, medium, or
high structure). Fourth, we further corrected the mean
from the previous step for range restriction using the
population version of the range restriction ratio. The
resulting value represents the fully corrected population
(construct-level) correlation between interview and per-
formance ratings. Finally, we adjusted the latter correla-
tion back down to an operational level by multiplying it
by the square root of the appropriate population reli-
ability value.
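The five steps can be summarized in a short sketch. The formulas are the standard Hunter and Schmidt (2004) indirect (Case IV) corrections as we read them; the function and variable names are ours, and the inputs are the values reported in the text.

```python
from math import sqrt

def indirect_correction(r_obs, rxx_restricted, ryy_restricted, u_x):
    """Five-step indirect range restriction correction (after Hunter
    & Schmidt, 2004, pp. 166-167), as described in the text."""
    # Step 1: estimate the predictor reliability in the population.
    rxx_pop = 1 - u_x ** 2 * (1 - rxx_restricted)
    # Step 2: transform the observed-score restriction ratio into its
    # population (true-score) counterpart.
    u_t = sqrt((u_x ** 2 - (1 - rxx_pop)) / rxx_pop)
    # Step 3: correct the mean observed r for unreliability in the
    # predictor and the criterion (observed reliabilities).
    r3 = r_obs / sqrt(rxx_restricted * ryy_restricted)
    # Step 4: correct for range restriction with the true-score ratio.
    rho = r3 / sqrt(u_t ** 2 + r3 ** 2 * (1 - u_t ** 2))
    # Step 5: adjust back down to the operational level.
    return rho * sqrt(rxx_pop)

# Level 4 structure: r = .36, interview reliability .61, criterion
# reliability .52, indirect ratio .74 (values from the text).
print(round(indirect_correction(0.36, 0.61, 0.52, 0.74), 2))  # ≈ .70
```

This yields roughly .70 for Level 4, close to the .69 in Table 2; small discrepancies presumably reflect rounding and analysis details not reported here.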
To correct for indirect then direct range restriction,
we first needed to undo the direct range restriction to
reduce it to indirect only. We used the mean range re-
striction value (u, the ratio of restricted to unrestricted
standard deviation) of .61 from Salgado and Moscoso
(2002) because it was more recent and was based on
over twice as many studies (38 vs. 15) as the previous
mean value from Huffcutt and Arthur (1994). Further,
the Salgado and Moscoso estimate was used by Berry et
al. (2007) in their reanalysis of the interview – mental
ability relationship.
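Undoing the direct stage amounts to applying the standard correction for direct range restriction (Thorndike Case II) with u = .61. A minimal sketch, with a function name of our own choosing:

```python
from math import sqrt

def undo_direct(r_obs, u):
    # Standard correction for direct range restriction: divides out
    # the restriction ratio u (restricted SD / unrestricted SD).
    return (r_obs / u) / sqrt(1 + r_obs ** 2 * (1 / u ** 2 - 1))

# Level 4 'indirect then direct' studies: mean observed r = .28,
# u = .61 from Salgado and Moscoso (2002).
print(round(undo_direct(0.28, 0.61), 2))
```

This gives about .43, in the neighborhood of the r(D) value of .44 reported in Table 2 for Level 4; the small gap likely reflects rounding or correction at the study rather than the summary level.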
Finally, to correct for performance criterion
unreliability, we used the value of .52 found by
Rothstein (1990; see p. 326). Although some interview
meta-analyses used the value of .60 from King, Hunter,
and Schmidt (1980), later works (e.g., Hunter & Hirsh,
1987; Schmidt, Hunter, Pearlman, & Hirsh, 1985) identi-
fied this value as being an upper-bound estimate.
3. Results
In each of our analyses, we report the number of valid-
ity coefficients, total sample size, mean observed validity
value, observed standard deviation, 80% confidence in-
terval around the mean, residual standard deviation
(after removal of sampling error), 80% credibility inter-
val, and corrected mean validity value. Confidence inter-
vals reflect the amount of error in mean estimates due
to sampling error and are used to assess the accuracy of
the estimate of the mean effect size (Whitener, 1990).
Using the formula from Schmidt, Hunter, and Raju
(1988; p. 668, right-hand column), we computed the
standard error of each mean estimate and then multi-
plied it by 1.28 to form the plus/minus component.
Credibility intervals are based on the residual standard
deviation (Hunter & Schmidt, 2004; p. 205), and are
used as indicators for the presence of moderator vari-
ables, or that a given effect is dependent on the situation
(i.e., there are subpopulations present; Whitener,
1990).
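As a concrete check, both intervals can be computed from the summary statistics. Here SDr divided by the square root of k is used as the standard error of the mean (our reading of the Schmidt, Hunter, and Raju, 1988, formula), with values taken from the overall indirect-only row of Table 2.

```python
from math import sqrt

def intervals(r_mean, sd_obs, sd_res, k):
    # 80% confidence interval: mean +/- 1.28 standard errors, with
    # the SE approximated as the observed SD over sqrt(k).
    se = sd_obs / sqrt(k)
    ci = (r_mean - 1.28 * se, r_mean + 1.28 * se)
    # 80% credibility interval: based on the residual SD.
    cr = (r_mean - 1.28 * sd_res, r_mean + 1.28 * sd_res)
    return ci, cr

# Overall indirect-only results: r = .31, SDr = .161, SDres = .123, k = 73.
ci, cr = intervals(0.31, 0.161, 0.123, 73)
print([round(x, 2) for x in ci])
print([round(x, 2) for x in cr])
```

This reproduces the credibility interval of .15–.47 and comes within rounding of the .28–.33 confidence interval reported in Table 2.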
3.1. Indirect restriction only
Results for indirect only are shown in the upper half of
Table 2. Given the imbalance in number of coefficients
across structure levels, specifically more at the highest
level, the mean overall validity should be interpreted
with caution. As expected, the mean-corrected validity
is the lowest for Level 1 structure (.20), which is identi-
cal to the value found by Huffcutt and Arthur (1994). All
of the variability could be accounted for by sampling
error. Although tentative because of the small number
of studies, these results suggest that failure to provide
interviewers with at least some guidance results in es-
sentially a disordered conversation rather than a precise
psychometric assessment. Further, because of the weak
starting point, it does not appear to matter much what
range restriction correction is made.
The mean validity for Level 2 structure (.48) is notice-
ably higher than the Huffcutt and Arthur (1994) estim-
ate of .35. Once again, virtually all (over 90%) of the
variability could be accounted for by sampling error.
Table 2. Criterion-related validity estimates overall and by structure within range restriction category

Range restriction category/level of structure    k     N      r    SDr    PVA    CI80%    SDres   CR80%   r(D)    ρ
Indirect only                                   73  5,713   .31  .161   41.0  .28–.33   .123   .15–.47    –    .68
  Level 1                                        3    592   .07  .035  100.0  .01–.12   .000   .07–.07    –    .20
  Level 2                                       11    817   .18  .118   91.5  .13–.22   .034   .13–.22    –    .48
  Level 3                                       13  1,344   .35  .121   51.3  .31–.39   .085   .24–.46    –    .72
  Level 4                                       46  2,960   .36  .148   55.0  .07–.64   .099   .23–.48    –    .69
Indirect then direct                            19  1,676   .17  .172  100.0  .13–.20   .000   .17–.17   .27   .62
  Level 2                                        9  1,185   .10  .179  100.0  .06–.14   .000   .10–.10   .16   .45
  Level 3                                        2     69   .16  .037  100.0  .00–.31   .000   .16–.16   .25   .58
  Level 4                                        8    422   .28  .098  100.0  .23–.34   .000   .28–.28   .44   .78

Note: In the above table, k is the number of studies, N is the total sample size, r is the mean observed validity, SDr is the observed standard deviation of the validity values, PVA is the percent of variance accounted for by sampling error, CI80% is the 80% confidence interval, SDres is the residual standard deviation after removal of sampling error, CR80% is the 80% credibility interval, r(D) is the mean validity after correction for direct range restriction, and ρ is the fully corrected estimate of the criterion-related validity of employment interviews. There were no Level 1 studies in the indirect then direct category.
The main implication here is that providing interviewers
with a basic organizational strategy and then trusting
them to carry out the interviews effectively tends to go
a long way. The benefits of doing so were previously
masked somewhat by the direct only restriction correc-
tions made in the major works of the 1990s, and now
come to full light when properly corrected using indir-
ect methodology.
The mean validities for Level 3 (.72) and Level 4 (.69)
are also noticeably higher than their counterpart values
of .56 and .57 in Huffcutt and Arthur (1994), again most
likely due to the proper correction for indirect restric-
tion. Several points of interest emerge from these re-
sults. First, roughly half of the variability can be
accounted for by sampling error, suggesting that there
may be another moderating influence besides structure.
Identification of this other moderator is an important
avenue for future research. Job complexity could be ex-
plored, especially so given that it appears to moderate
the validity of general mental ability tests (Hunter &
Hunter, 1984). Type of question could also be explored,
such as situational versus behavior description, which
has been found to exert influence on interview out-
comes (e.g., Huffcutt et al., 2001).
Second, the criterion-related validity of structured
employment interviews appears to approach the .7 level.
These values (.72 and .69) are surprisingly similar to the
values of .73 and .74 found for the two highest levels of job
complexity when they were reanalyzed using indirect
methodology (Hunter et al., 2006). Under direct correc-
tion, the benchmark for being considered a ‘top’ selec-
tion predictor appeared to be around the .5 level (see
Schmidt & Hunter, 1998; Table 1). In short, the bar ap-
pears to have been raised, although not by improvement
of these selection predictors themselves, but rather by
proper correction for range restriction with them.
Finally, the mean validity for Level 3 structure (.71)
was actually higher than that for Level 4 structure (.70)
in the combined results (.72 vs. .69 for the indirect only
results), although not by much. Whether this pattern is
truly real or just a sampling artifact should be pursued in
future research. What makes this issue particularly im-
portant is not just the implication that it is okay to allow
interviewers some minor variability in their approach
(e.g., limited probing), but rather that doing so could ac-
tually improve the accuracy of their ratings. As Barrick
et al. (2012) noted, ‘there is a limit to the amount of
structure that can be imposed upon the interview be-
fore it becomes an impersonal, mechanical process’
(p. 331).
3.2. Indirect then direct restriction
There were no Level 1 structure coefficients found for
this restriction pattern, and as such the mean overall
corrected validity should be interpreted with caution.
The first important thing to note about these results is
the mean observed (uncorrected) validities. They are
clearly lower than their counterparts in the indirect only
analysis, suggesting (perhaps confirming) that multiple
restriction mechanisms (indirect and direct) were in fact
present and did operate to drive validity lower than one
restriction mechanism alone (indirect).
The second important thing to note is that, after cor-
rection for direct restriction and then indirect restric-
tion, the mean validities do appear to be very
comparable with their counterparts in the indirect only
analysis. Some are higher while some are lower, which is
expected from sampling error given the modest number
of studies. Specifically, indirect only is higher for Level 2
structure (.48 vs. .45) and for Level 3 structure (.72 vs.
.58), while indirect then direct is higher for Level 4
(.78 vs. .69). Although extremely tentative for several
reasons, we nonetheless believe these results offer em-
pirical support for the viability of correcting for this spe-
cific restriction pattern (indirect then direct).
3.3. Overall (combined) results
Given that the indirect then direct correction appeared
viable, we combined the two datasets at the summary
level. Specifically, we weighted each fully corrected
mean validity coefficient by its total sample size, both
overall and by structure level. The results are presented
in Table 3. The overall estimate should again be viewed
with caution, for the reasons noted earlier. By structure
level, the mean-corrected criterion-related validity
estimates were .20 (Level 1), .46 (Level 2), .71 (Level 3),
and .70 (Level 4), respectively. We believe these to be
the best available estimates of the criterion-related va-
lidity of employment interviews.
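The combination step above is straightforward sample-size weighting of the two categories' fully corrected estimates. Using the k, N, and ρ values from Table 2:

```python
# Sample-size-weighted combination of the fully corrected validities
# from the two restriction categories (N and rho values from Table 2).
indirect_only     = {2: (817, 0.48), 3: (1344, 0.72), 4: (2960, 0.69)}
indirect_then_dir = {2: (1185, 0.45), 3: (69, 0.58), 4: (422, 0.78)}

combined = {}
for level in (2, 3, 4):
    (n1, rho1), (n2, rho2) = indirect_only[level], indirect_then_dir[level]
    combined[level] = (n1 * rho1 + n2 * rho2) / (n1 + n2)
    print(level, round(combined[level], 2))
```

This reproduces the Table 3 estimates of .46, .71, and .70 for Levels 2 through 4; Level 1 appears only in the indirect-only category and carries over unchanged.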
4. Discussion
One very prominent implication to emerge from this in-
vestigation is that the assumption that all restriction is
now indirect is unwarranted, at least for employment in-
terviews. Berry et al. (2007) found that less than half of
the studies in their interviews versus cognitive ability
Table 3. Combined results for the criterion-related validity of
employment interviews

Category    k     N     ρ
Overall    92  7,389   .67
Level 1     3    592   .20
Level 2    20  2,002   .46
Level 3    15  1,413   .71
Level 4    54  3,382   .70

Note: In the above table, k is the number of studies, N is the total
sample size, and ρ is the fully corrected mean validity.
dataset had indirect restriction (see their Table 3). Al-
though we found a noticeably higher percentage, a
significant percentage of studies still fell into other cat-
egories. In fact, had we assumed all restriction was
indirect, we would have missed the important finding
that a double correction for studies with the multiple
restriction pattern indirect then direct appears to be
viable.
That said, we do not believe that this implication is
limited to employment interviews. The cognitive ability
dataset reanalyzed by Hunter et al. (2006) may be
somewhat unique in the selection landscape because all
of the studies came from a single source, the U.S. De-
partment of Labor (see Schmidt & Hunter, 1998, p.
264). There is a very good chance that studies for the
selection predictors remaining to be reanalyzed come
from a variety of sources (e.g., organizations, research-
ers, publication outlets). Thus, we advise strongly
against an initial assumption of all-indirect restriction
when predictors such as assessment centers, work
samples, situational judgment tests, and even personal-
ity tests (which have only been partially reanalyzed) are
redone.
However, precision often comes with a price, and
that is clearly the case in the present context. Throw-
ing all studies together (as was done in the direct era)
resulted in a relatively large dataset, one with which it
was easy to do subdivision-type moderator analyses.
Because of differing restriction mechanisms, it is now
more difficult to combine studies at the individual
study level. For instance, our main indirect only ana-
lysis was composed of 73 studies, far short of the 114
studies analyzed by Huffcutt and Arthur (1994) almost
two decades ago. To enhance the stability of the find-
ings, we felt limited to analyzing only one moderator
variable (structure).
One of the most critical directions for future re-
search is thus to explore the viability of applying a
group-level artifact correction (e.g., direct) to the stud-
ies individually so that they can be combined with stud-
ies in other restriction categories. Through additional
research, including simulations, it may turn out that
doing so is reasonably accurate. If so, then studies could
once again be lumped together, thereby making it easier
to assess additional moderator variables.
On a related note, another important direction for
future research is to focus on developing the methodo-
logy for correcting for restriction mechanisms other
than direct or indirect by themselves. For instance, we
found empirical support for correcting the indirect then
direct restriction pattern (by first undoing the direct
restriction). However, it would be helpful to provide
confirmation in other ways as well (e.g., through
simulation). We did not attempt to correct for the re-
verse pattern, namely, direct then indirect, but believe
that may be possible too. Similarly, it may be possible to
correct for restriction when the interview and another
predictor are combined into a composite.
Clearly, a new chapter in organizational meta-analysis
has begun, and much more work is yet needed. There
are a number of selection predictors for which reanaly-
sis has yet to be performed (e.g., work samples, assess-
ment centers, situational judgment tests). The selection
landscape will not be complete until all predictors are
reanalyzed and synthesized. Hunter and Hunter (1984)
did this the first time around under the direct correc-
tion premise. Selection researchers should give top pri-
ority to reaching a new synthesis point in the new
indirect restriction landscape.
To aid in the reanalysis of selection predictors, we
urge researchers in the strongest possible terms to pro-
vide all possible information regarding restriction when
reporting their findings. There are already too many
studies in the literature where needed information (e.g.,
on moderators) is either not provided or not provided
with a high level of clarity. Under the new umbrella of
indirect correction and restriction categorization, the
reporting bar has been raised even higher. For instance,
clear reporting of the standard deviation of the study
sample versus the population value would be extremely
helpful, as would the intercorrelation among predictors
at all possible points in the process.
In terms of the practical implications of our empir-
ical results, it is clear that employment interviews re-
tain their place among the handful of elite selection
predictors. Given that both structured interviews and
mental ability tests appear to have exceptional validity
and that they do not appear to correlate highly with
each other (see Berry et al., 2007), it would seem hard to
go wrong with an organizational strategy of combining a
structured interview and an ability test into a test bat-
tery for selection. Whether it would make sense to in-
clude additional predictors into this battery remains to
be seen, contingent upon the results of their indirect
reanalysis.
This study is not without limitations. First, because of
the lower number of studies that resulted from breaking
studies into range restriction categories (and not being
able to analyze some of those categories), we were limited
to analyzing structure as the sole moderator of inter-
view validity. Again, we call for researchers to provide
sufficient information in their interview studies to allow
for correction under the indirect premise. Second, be-
cause of vague or incomplete reporting in our primary
studies, it is possible that a few studies were inaccur-
ately coded. For instance, failure to mention an early
step in a selection procedure (prior to the interview)
could result in misclassification into the direct only re-
striction category. Finally, we had to make assumptions
in order to complete the indirect correction process,
and it should be possible to refine those assumptions
through future research.
Notwithstanding these limitations, we believe this
study makes a valuable contribution to the employment
interview literature. In addition to the empirical results,
we make a number of contributions to the methodology
for reanalyzing the criterion-related validity of selection
predictors. To illustrate, we fleshed out the categories
of restriction from Berry et al. (2007) to make them rel-
evant to a predictor-to-criterion analysis (rather than a
predictor-to-predictor analysis), including the creation
of several new ones. Further, we carried out a correc-
tion for a multiple restriction pattern, the results of
which seem viable. In summary, it is our hope that this
study provides a springboard and a stimulus for con-
tinued research on this unique approach to selecting
new employees, and for selection predictors in general.
Notes
1. The increase is 67% when viewed in terms of percentage
of variance.
2. It is important to note that ‘no restriction’ does not imply
that anyone could apply for any job regardless of their
background. Many jobs have prerequisite characteristics
that are needed before someone should apply (e.g., back-
ground, training, experience). No restriction implies that
all who could reasonably apply for a given job were hired.
We thank a reviewer for pointing this out.
3. Huffcutt et al. (1996) used a three-point weighting scheme
based on various ranges of sample size (see their p. 465),
while Huffcutt et al. (2013) used the cubic root. Given (as
noted above) that the three largest coefficients in our
study comprised roughly one fifth of the total sample size,
we felt that use of an alternate scheme was justified, but
not one as extensive as either of these two. Square root
weighting seemed like a logical step down (particularly
from cubic weighting since a 3-point scheme is more ex-
treme than cubic).
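To illustrate the difference between the weighting schemes in this note, here is a small sketch with made-up r and n values (not drawn from any study in our dataset):

```python
from math import sqrt

# Hypothetical coefficients: (observed r, sample size). Square-root
# weighting damps the pull of the single very large study relative
# to ordinary sample-size weighting.
studies = [(0.30, 100), (0.25, 400), (0.40, 2500)]

w_n    = sum(n * r for r, n in studies) / sum(n for _, n in studies)
w_sqrt = sum(sqrt(n) * r for r, n in studies) / sum(sqrt(n) for _, n in studies)
print(round(w_n, 3), round(w_sqrt, 3))  # the large study dominates w_n less under w_sqrt
```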
References
References marked with an asterisk (*) were included in ana-
lyses. References marked with a caret (^) include a study
from the indirect then direct range restriction group.
*Anderson, R. C. (1954). The guided interview as an evaluative
instrument. Journal of Educational Research, 48, 203–209.
Arthur, W., Jr., Bennett, W., Jr., & Huffcutt, A. I. (2001). Con-
ducting meta-analysis using SAS. Mahwah, NJ: Erlbaum.
Arthur, W., Jr., Day, E. A., McNelly, T. L., & Edens, P. S.
(2003). A meta-analysis of the criterion-related validity
of assessment center dimensions. Personnel Psychology, 56,
125–154.
*Arvey, R. R., Miller, H. E., Gould, R., & Burch, R. (1987). In-
terview validity for selecting sales clerks. Personnel Psycho-
logy, 40, 1–12. doi:10.1111/j.1744-6570.1987.tb02373.x
Asher, J. J. (1970). Reliability of a novel format for the selec-
tion interview. Psychological Reports, 26, 451–456.
*Banki, S., & Latham, G. P. (2010). The criterion-related valid-
ities and perceived fairness of the situational interview and
the situational judgment test in an Iranian organization. Ap-
plied Psychology: An International Review, 59, 124–142.
Banta, G. W. (1967). A comparison of the leaderless group
discussion and the individual interview techniques for the
selection of student orientation assistants. Dissertation Ab-
stracts International, 28, 937.
*Barrett, G. V., Svetlik, B., & Prien, E. P. (1966). Validity of the
job-concept interview in an industrial setting. Journal of Ap-
plied Psychology, 51, 233–235.
Barrick, M. R., Dustin, S. L., Giluk, T. L., Stewart, G. L., Shaffer,
J. A., & Swider, B. W. (2012). Candidate characteristics driv-
ing initial impressions during rapport building: Implications
for employment interview validity. Journal of Occupational
and Organizational Psychology, 85, 330–352.
Barrick, M. R., Swider, B. W., & Stewart, G. L. (2010). Initial
evaluations in the interview: Relationships with subsequent
interviewer evaluations and employment offers. Journal of
Applied Psychology, 95, 1037–1046.
*Benz, M. P. (1974). Validation of the examination for Staff Nurse
II. Urbana, IL: University Civil Service Testing Program of Il-
linois, Testing Research Program.
Berry, C. M., Sackett, P. R., & Landers, R. N. (2007). Revisiting
interview–cognitive ability relationships: Attending to spe-
cific range restriction mechanisms in meta-analysis. Personnel
Psychology, 60, 837–874.
*Beutel, J. L. (2006). The development and field test of an employ-
ment interview system designed to identify the highly qualified
special education teacher. Doctoral dissertation. Retrieved
from ProQuest Information & Learning. (3234969).
*Bonneau, L. R. (1957). An interview for selecting teachers
(Doctoral dissertation, University of Nebraska, 1956). Dis-
sertation Abstracts International, 17, 537–538.
*Bosshardt, M. J. (1994). Situational interviews versus behavior de-
scription interviews: A comparative validity study. Unpublished
doctoral dissertation, University of Minnesota.
Burgess, M. M., Calkins, V., & Richards, J. M. (1972). The struc-
tured interview: A selection device. Psychological Reports, 31,
867–877.
*Burnett, J. R. & Motowidlo, S. J. (1998). Relations between
different sources of information in the structured selection
interview. Personnel Psychology,51, 963–983.
*Burroughs, W. A., & White, L. L. (1996). Predicting sales per-
formance. Journal of Business and Psychology,11, 73–84.
*Campbell, J. T., Prien, E. P., & Brailey, L. G. (1960). Predicting
performance evaluations. Personnel Psychology,13, 435–440.
*Campion, M. A., Campion, J. E., & Hudson, J. P., Jr. (1994).
Structured interviewing: A note on incremental validity and
alternate question types. Journal of Applied Psychology, 79,
998–1002.
*Cassidy, M. W. (2011). Concurrent validity of the ‘Working with
Others Scale’ of the ICIS employment interview system. Doc-
toral dissertation. Retrieved from ProQuest Information &
Learning. (AAI3487815).
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995). A meta-
analysis of interrater and internal consistency reliability of
selection interviews. Journal of Applied Psychology, 80, 565–
579.
Cortina, J. M. (2003). Apples and oranges (and pears, oh my!):
The search for moderators in meta-analysis. Organizational
Research Methods, 6, 415–439.
Davis, R. (1986). Unpublished raw data.
306 Allen I. Huffcutt, Satoris S. Culbertson and William S. Weyhrauch
International Journal of Selection and Assessment
Volume 22 Number 3 September 2014
©2014 John Wiley & Sons Ltd
Dawson, C. R. (2005). The influence of impression manage-
ment on structured interview ratings and job performance.
(Doctoral dissertation, Clemson University, 2005). Disserta-
tion Abstracts International, 66(9-B), 5130.
*DeGroot, T., & Gooty, J. (2009). Can nonverbal cues be used
to make meaningful personality attributions in employment
interviews? Journal of Business and Psychology, 24, 179–192.
*DeGroot, T., & Motowidlo, S. J. (1999). Why visual and vocal
interview cues can affect interviewers’ judgments and pre-
dict job performance. Journal of Applied Psychology, 84, 986–
993.
*Delery, J. E. Wright, P. M., McArthur, K., & Anderson, C. D.
(1994). Cognitive ability tests and the situational interview.
International Journal of Selection and Assessment, 2, 53–
58.
*Dillon, A., & Ebmeier, H. (2009). The development and field
test of an employment interview instrument for school pro-
fessionals. Journal of Special Education Leadership, 22, 93–
104.
Dipboye, R. L., Gaugler, B. B., Hayes, T. L., & Parker, D.
(2001). The validity of unstructured panel interviews: More
than meets the eye? Journal of Business and Psychology, 16,
35–49.
DuBois, P. H., & Watson, R. K. (1950). The selection of pa-
trolman. Journal of Applied Psychology, 34, 90–95.
*Ebmeier, H., & Ng, J. (2005). Development and field test of an
employment selection instrument for teachers in urban
school districts. Journal of Personnel Evaluation in Education,
18, 201–218.
Finesinger, J. E., Cobb, S., Chapple, E. D., & Brazier, M. A. B.
(1948). Investigation of prediction of success in naval flight train-
ing. Part I: The squantum study (Report No. 81). Washington,
DC: Civil Aeronautics Administration.
*Flint, D. (1985). Effects of content oriented development process
on criterion related validity. Unpublished manuscript.
*Friedland, D. (1976). Junior administrative assistant validation
study. City of Los Angeles Personnel Department. Unpub-
lished manuscript.
*Gibb, J. L., & Taylor, P. J. (2003). Past experience versus situ-
ational employment: Interview questions in a New Zealand
social service agency. Asia Pacific Journal of Human Resources,
41, 371–383.
*Githens, W. H., & Rimland, B. (1964). The validity of NROTC
selection interviews against career decisions and officer fitness re-
ports: An eight year followup (PRASD Rep. No. 234). San
Diego, CA: U.S. Naval Personnel Research Activity.
*Green, P. C., Alter, P., & Carr, A. (1993, April). Reliability and
validity of a behavior-based interview. In D. L. Denning
(Ed.), Psychometric analysis of the structured interview (pp. 203–
212). San Francisco, CA: Symposium conducted at the
Eighth Annual Meeting of the Society for Industrial and Or-
ganizational Psychology.
*Hale, T. M. (2006). The development and field test of an inter-
view system to identify quality school district office personnel.
Doctoral dissertation. Retrieved from ProQuest Informa-
tion & Learning. (3234995).
*Harel, G., Arditi-Vogel, A., & Janz, T. (2003). Comparing the
validity and utility of interview versus assessment center ratings.
Unpublished manuscript.
Hilliard, P. A. (2000). Comparison of the predictive validity of a
written test, an integrity test, a conscientiousness questionnaire, a
structured behavioral interview and a personality inventory in the
assessment of job applicants’ background investigations, and sub-
sequent task and contextual job performance. Unpublished
doctoral dissertation, University of Southern California, Los
Angeles, CA.
Hobfoll, S. E., & Benor, D. E. (1981). Prediction of student
clinical performance. Medical Education, 15, 231–236.
Huffcutt, A. I., & Arthur, W., Jr. (1994). Hunter & Hunter
(1984) revisited: Interview validity for entry-level jobs. Jour-
nal of Applied Psychology, 79, 184–190.
Huffcutt, A. I., & Arthur, W., Jr. (1995). Development of a new
outlier statistic for meta-analytic data. Journal of Applied Psy-
chology, 80, 327–334. doi:10.1037/0021-9010.80.2.327
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Klehe, U.-C.
(2004). The impact of job complexity and study design on
situational and behavior description interview validity. Inter-
national Journal of Selection and Assessment, 12, 262–273.
Huffcutt, A. I., Culbertson, S. S., & Weyhrauch, W. S. (2013).
Employment interview reliability: New meta-analytic estim-
ates by structure and format. International Journal of Selection
and Assessment, 21, 264–276.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-
analytic investigation of cognitive ability in employment
interview evaluations: Moderating characteristics and
implications for incremental validity. Journal of Applied Psy-
chology, 81, 459–473.
*Huffcutt, A. I., Weekley, J. A., Wiesner, W. H., Degroot,
T. G., & Jones, C. (2001). Comparison of situational and be-
havior description interview questions for higher-level posi-
tions. Personnel Psychology, 54, 619–644.
Hunter, J. E., & Hirsh, H. R. (1987). Application of meta-
analysis. In C. L. Cooper & I. T. Robertson (Eds.), Interna-
tional review of industrial and organizational psychology, 1987
(pp. 321–357). London: Wiley.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of al-
ternative predictors of job performance. Psychological Bulle-
tin, 96, 72–98.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis:
Correcting error and bias in research findings. Thousand Oaks,
CA: Sage.
Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of di-
rect and indirect range restriction for meta-analysis meth-
ods and findings. Journal of Applied Psychology, 91, 594–
612.
*Huse, E. F. (1962). Assessments of higher-level personnel: IV.
The validity of assessment techniques based on systematic-
ally varied information. Personnel Psychology, 15, 195–205.
*Janz, T. (1982). Initial comparison of patterned behavior de-
scription interviews versus unstructured interviews. Journal
of Applied Psychology, 67, 577–580.
*Johnson, E. K. (1991). The interview in a medical school setting:
Manipulating structuring criteria. Proceedings from the Sixth
Annual Conference of the Society for Industrial and Organ-
izational Psychology.
*Kennedy, R. (1986). An investigation of criterion-related validity
for the structured interview. Unpublished master’s thesis, East
Carolina University.
King, L. M., Hunter, J. E., & Schmidt, F. L. (1980). Halo in a
multidimensional, forced-choice performance evaluation
scale. Journal of Applied Psychology, 65, 507–516.
*Kluemper, D. H. (2006). An examination of ability-based emo-
tional intelligence in the structured employment interview. Un-
published master’s thesis, Oklahoma State University.
*Kobler, A. L. (2010). The development and field test of an em-
ployment interview system to identify successful school principals.
Doctoral dissertation. Retrieved from ProQuest Informa-
tion & Learning. (AAI3398950).
Landy, F. J. (1976). The validity of the interview in police of-
ficer selection. Journal of Applied Psychology, 61, 193–198.
*Latham, G. P., & Saari, L. M. (1984). Do people do what they
say? Further studies on the situational interview. Journal of
Applied Psychology, 69, 569–573.
*Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A.
(1980). The situational interview. Journal of Applied Psycho-
logy, 65, 422–427.
*Latham, G. P., & Skarlicki, D. P. (1995). Criterion-related va-
lidity of the situational and patterned behavior description
interviews with organizational citizenship behavior. Human
Performance, 8, 67–80.
Le, H., & Schmidt, F. L. (2006). Correcting for indirect range
restriction in meta-analysis: Testing a new meta-analytic
procedure. Psychological Methods, 11, 416–438.
*Lin, T.-R., & Adrian, N. (1993, May). Multi-method multi-
dimension structure interviewing: A field study. Paper presented
at the annual conference of the Society for Industrial and
Organizational Psychology, San Francisco, CA.
*Little, J. P., Schoenfelt, E. L., & Brown, R. D. (2000, April). The
situational versus patterned behavior description interview for
predicting customer service performance. Paper presented at
the annual conference of the Society for Industrial and Or-
ganizational Psychology, New Orleans.
*Martin, J. E. (1993). The effect of providing choices on the validity of a situational interview for resident advisors. Applied H.R.M. Research, 4, 69–78.
Maurer, T. J., Solamon, J. M., & Lippstreu, M. (2008). How
does coaching interviewees affect the validity of a structured
interview? Journal of Organizational Behavior, 29, 355–371.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S.
(1994). The validity of employment interviews: A compre-
hensive review and meta-analysis. Journal of Applied Psycho-
logy, 79, 599–617.
McKinney, T. S., & Melter, D. M. (1979). The validity and utility
of entry level police recruit selection procedures: The
Phoenix experience. Unpublished report prepared by Per-
sonnel Department Employee Selection Research, Phoenix,
AZ.
McMurry, R. N. (1947). Validating the patterned interview. Per-
sonnel, 23, 263–272.
*Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Se-
lecting individuals in team settings: The importance of social
skills, personality characteristics, and teamwork knowledge.
Personnel Psychology, 58, 583–611.
*Motowidlo, S. J., Carter, G. W., Dunnette, M. D., Tippins, N.,
Werner, S., Burnett, J. R., & Vaughan, M. J. (1992). Studies of
the structured behavioral interview. Journal of Applied Psy-
chology, 77, 571–587.
*Motowidlo, S. J., & Schmit, M. J. (1997). Performance assess-
ment and interview procedures for store manager and store asso-
ciate positions. Unpublished manuscript.
*Muller, G., & Goodwin, M. (1974). Project EMPATHY: Devel-
opment of an interview procedure to predict student and ad-
ministrator ratings of prospective applicants. Unpublished
report prepared by Selection Research, Inc., Lincoln,
Nebraska.
*Orpen, C. (1985). Patterned behavior description interviews
versus unstructured interviews: A comparative validity
study. Journal of Applied Psychology, 70, 774–776.
Poole, A., Catano, V. M., & Cunningham, D. P. (2007). Predict-
ing performance in Canadian dental schools: The new CDA
structured interview, a new personality assessment, and the
DAT. Journal of Dental Education, 71, 664–676.
Posthuma, R. A., Morgeson, F. P., & Campion, M. A. (2002).
Beyond employment interview validity: A comprehensive
narrative review of recent research and trends over time.
Personnel Psychology, 55, 1–81.
*Prien, E. P. (1962). Assessments of higher-level personnel: V.
An analysis of interviewers’ predictions of job performance.
Personnel Psychology, 15, 319–334.
*Pulakos, E. D., & Schmitt, N. (1995). Experience-based and
situational interview questions: Studies of validity. Personnel
Psychology, 48, 289–308.
Robertson, I. T., Gratton, L., & Rout, U. (1990). The validity
of situational interviews for administrative jobs. Journal of
Organizational Behavior, 11, 69–76.
*Ross, A. L., & Hoeltke, G. (1987). An interviewing tool for se-
lection of residential child care workers: A follow-up report.
Child Welfare, 66, 175–183.
Roth, P. L., & Campion, J. E. (1992). An analysis of the
predictive power of the panel interview and pre-
employment tests. Journal of Occupational and Organizational
Psychology, 65, 51–60.
*Roth, P. L., Van Iddekinge, C. H., Huffcutt, A. I., Eidson, C. E., Jr., & Schmit, M. J. (2005). Personality saturation in structured interviews. International Journal of Selection and Assessment, 13, 261–273.
Rothstein, H. R. (1990). Interrater reliability of job perform-
ance ratings: Growth to asymptote level with increasing
opportunity to observe. Journal of Applied Psychology, 75,
322–327.
*Rundquist, E. A. (1947). Development of an interview for selection purposes. In G. A. Kelly (Ed.), New methods in applied
psychology (pp. 85–95). College Park: University of Maryland.
Salgado, J. F., & Moscoso, S. (2002). Comprehensive meta-
analysis of the construct validity of the employment inter-
view. European Journal of Work and Organizational Psychology,
11, 299–324.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of
selection methods in personnel psychology: Practical and
theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Hirsh, H. R.
(1985). Forty questions about validity generalization and
meta-analysis. Personnel Psychology, 38, 697–798.
Schmidt, F. L., Hunter, J. E., & Raju, N. S. (1988). Validity gen-
eralization and situational specificity: A second look at the
75% rule and Fisher's z transformation. Journal of Applied Psychology, 73, 665–672.
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased
accuracy for range restriction corrections: Implications
for the role of personality and general mental ability in
job and training performance. Personnel Psychology, 61,
827–868.
*Schuler, H., Funke, U., Moser, K., & Donat, M. (1995). Per-
sonnel selection in research and development. Aptitudes
and performance of scientists and engineers. Goettingen:
Hogrefe.
Sparks, C. P. (1973). Validity of a selection program for pro-
cess trainees (Personnel Research No. 73-1). Unpublished
manuscript.
Spychalski, A. (1994). A test of a model of employment interview
information gathering. Unpublished master’s thesis, Rice
University.
*Tarico, V. S., Altmaier, E. M., Smith, W. L., Franken, E. A., Jr.,
& Berbaum, K. S. (1986). Development and validation of an
accomplishment interview for radiology residents. Journal of
Medical Education, 61, 845–847.
*Walsh, U. R. (1975). A test of the construct and predictive validity of a structured interview. Doctoral dissertation. Re-
trieved from ProQuest Dissertations and Theses.
(288010492).
*Weekley, J. A., & Gier, J. A. (1987). Reliability and validity of
the situational interview for a sales position. Journal of Ap-
plied Psychology, 72, 484–487.
Whitener, E. M. (1990). Confusion of confidence intervals and
credibility intervals in meta-analysis. Journal of Applied Psy-
chology, 75, 315–321.
Wiesner, W. H., & Cronshaw, S. F. (1988). A meta-analytic
investigation of the impact of interview format and degree
of structure on the validity of the employment interview.
Journal of Occupational Psychology, 61, 275–290.
Wright, P. M., Lichtenfels, P. A., & Pursell, E. D. (1989). The
structured interview: Additional studies and a meta-analysis.
Journal of Occupational Psychology, 62, 191–199.
*Young, I. P., & Delli, D. A. (2002). The validity of the teacher
perceiver interview for predicting performance of class-
room teachers. Educational Administration Quarterly, 38, 586–
612.
*Zaccaria, M. A., Dailey, J. T., Tupes, E. C., Stafford, A. R.,
Lawrence, H. G., & Ailsworth, K. A. (1956). Development of
an interview procedure for USAF officer applicants (AFPTRC-
TN- 56-43, Project No. 7701). Lackland Air Force Base, TX:
Personnel Research Laboratory, Air Force Personnel and
Training Research Center, Air Research and Development
Command.
Copyright of International Journal of Selection & Assessment is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.