Running Head: Individual Assessment Meta-Analysis
A Meta-Analysis of the Relationship Between Individual
Assessments and Job Performance
Scott B. Morris, Rebecca L. Daisley, Megan Wheeler, & Peggy Boyer
Illinois Institute of Technology
April 2014
Journal of Applied Psychology, doi: 10.1037/a0036938.
© 2014 American Psychological Association, all rights reserved
This article may not exactly replicate the final version published in the APA journal. It is not the
copy of record. The final published version can be obtained at
http://psycnet.apa.org/doi/10.1037/a0036938
Authors Note
The authors wish to thank Paige Olson for her assistance with the coding of studies. We also
thank the many people who helped us obtain information on unpublished studies, including Joy
Hazucha, Maynard Goff, Bob Barnett, and Dave Sowinski.
ABSTRACT
Though individual assessments are widely used in selection settings, very little research exists to
support their criterion-related validity. A random-effects meta-analysis was conducted of 39
individual assessment validation studies. For the current research, individual assessments were
defined as any employee selection procedure that involved (a) multiple assessment methods, (b) administration to an individual examinee, and (c) reliance on assessor judgment to integrate the information into an overall evaluation of the candidate's suitability for a job. Assessor
recommendations were found to be useful predictors of job performance, although the level of
validity varied considerably across studies. Validity tended to be higher for managerial than non-
managerial occupations and for assessments that included a cognitive ability test. Validity was
not moderated by the degree of standardization of the assessment content, or by use of multiple
assessors for each candidate. However, higher validities were found when the same assessor was
used across all candidates than when different assessors evaluated different candidates. These
results should be interpreted with caution given a small number of studies for many of the
moderator subgroups as well as considerable evidence of publication bias. These limitations of
the available research base highlight the need for additional empirical work to inform individual
assessment practices.
Key words: executive selection, meta-analysis, validation study, personnel selection, individual
psychological assessment.
Much of the research on personnel selection has focused on standardized assessment
methods, such as cognitive ability tests, self-report personality inventories and structured
interviews. These assessment methods are designed to be administered and scored efficiently for
large groups of job applicants. Another tradition in assessment takes a very different approach,
emphasizing assessments that are tailored to the individual and the organization. This latter
approach, called individual psychological assessment, emphasizes the measurement of the person
as a whole (Highhouse, 2002) and the use of expert judgment to interpret the pattern of responses
across a variety of assessment methods (Silzer & Jeanneret, 2011). This job- and person-specific
approach is particularly attractive in settings such as executive selection, where there are few
candidates for a single job opening and where job requirements are highly individualized
(Hollenbeck, 2009).
Individual psychological assessment is defined as a process of gathering information
regarding a person’s knowledge, skills, aptitude and temperament (Jeanneret & Silzer, 1998)
through the use of individually administered selection tools, including tests and interviews, and
integrating this information to make an inference regarding the individual’s appropriateness for a
particular position (Prien, Schippman & Prien, 2003). Prien et al. (2003) state that the
distinguishing feature of individual assessment is that it combines integrative and interpretive
treatments of assessment data at the individual case level. The uniqueness of individual
assessment rests on the idea that through the use of various tools, professionals use the individual
psychological assessment to evaluate the applicant as a whole and make a prediction about the
applicants’ appropriateness for a position within an organization holistically. In addition,
individual assessment differs from other forms of testing in that the assessor must draw a
conclusion about a candidate’s fit by utilizing his/her own subjective interpretation of the
candidate’s performance on the tests administered, the interview, and any other data collected
(Jeanneret & Silzer, 1998).
Individual assessment is widely used in employee selection, particularly for higher level
positions (Thornton, Hollenbeck & Johnson, 2010). Despite this, there has been relatively little
research published on the validity of individual assessments. The most extensive review of the
validity evidence was provided by Prien et al. (2003), who identified approximately 20 validation
studies. For the most part, the evidence supported the validity of individual assessments across a
variety of occupations, including managers (Albrecht, Glaser & Marks, 1964), first-line
supervisors (Dicken & Black, 1965; Dunnette & Kirchner, 1958; Handyside & Duncan, 1954),
and management consultants (Miner, 1970), among several others. However, there was
considerable variability in the results, with validity coefficients ranging from -.14 (Miner, 1970)
to as high as .70 (Phelan, 1962).
Hunter and Schmidt (2004) have noted that observed differences across studies can often
be attributed to statistical artifacts, rather than true differences in validity. Meta-analysis provides
a means to disentangle true and artificial sources of variability, and therefore provides a stronger
basis for estimating the true validity and identifying moderators. This study will address the
inconsistencies that have been shown in previous reviews and investigate whether individual
assessments are valid across situations. In addition, surveys of the field have reported
considerable variability in assessment practices (Ryan & Sackett, 1987, 1992). By modeling
differences in validity across studies, this paper will seek to identify best practices in individual
assessment.
Reliability and Validity of Assessor Judgments
Most individual assessments include tests (e.g., cognitive ability, personality, biodata)
that have been shown to predict work performance. Still, it is an open question whether this
relationship remains when holistic integration is used to make predictions concerning
performance. A central feature of the individual assessment is the reliance on the assessors’
expert judgment to draw inferences from patterns of behavior (Silzer & Jeanneret, 2011) and fit
the multiple pieces of information together into a coherent picture of the candidate as a whole
(Highhouse, 2002). Therefore, it is important to assess not only the validity of the component test
scores, but also the validity of assessor judgments. To the extent that assessors can go beyond
individual test scores to provide a holistic evaluation of candidates, assessor judgments would be
expected to show greater validity than individual assessment tools. On the other hand, if
assessors are biased or inconsistent in their use of test information, assessor judgments may show
lower validity than obtained from individual test scores.
Ryan and Sackett (1998) point out that lack of reliability of ratings and recommendations
is a major issue with individual assessments. Several studies have examined the interrater
agreement of assessors who made ratings based on assessment reports (DeNelsky & McKee,
1969; Dicken & Black, 1965), those who had access to test scores and interview notes (Hilton,
Bolin, Parker, Taylor & Walker, 1955), as well as individuals who made predictions based on the
entire assessment which they conducted themselves (Ryan & Sackett, 1989). These studies
suggest that there can be considerable disagreement among assessors in ratings of candidate
attributes and overall person-job fit, as well as inconsistencies in the assessment process itself
(Ryan & Sackett, 1989).
Impact of Assessment Practices on Validity
The individual assessment process generally consists of three stages: information input,
information evaluation, and information output (Weiner, 2003). During the information input
stage, the individual assessor uses a variety of tools to collect information about the job
candidate. These tools may include any combination of an interview, personality tests, cognitive
ability tests, and other instruments. In the second stage, the collected data is interpreted and
integrated. In the final stage, the information output stage, the assessor provides a report and
recommendation regarding the suitability of the individual job candidate for a particular position.
There is significant variability in the methods of conducting individual assessments
(Jeanneret & Silzer, 1998; Ryan & Sackett, 1987, 1992; Silzer & Jeanneret, 2011), and the
choice of strategies for information input and integration are likely to influence the validity of
assessor recommendations. Research on employment interviews, which are similar to individual
assessments in their reliance on the interpretative skills of the interviewer, suggests that validity
can differ as a function of both the content and structure of the interview. In particular, most
reviews show higher reliability and validity for more structured interviews (Campion, Palmer, &
Campion, 1997; Arvey & Campion, 1982; Harris, 1989; Huffcutt & Arthur, 1994; Schmitt, 1976;
Ulrich & Trumbo, 1965). As such, the literature on structure provides a useful framework for
examining differences in individual assessment practices. The following sections discuss
components of information input and information integration that may affect the validity of
individual assessments.
Information Input
In the information input stage, the collection of data is the primary interest. One of the
defining characteristics of the individual assessment is the use of multiple assessment tools.
Although there may be some overlap in the constructs that are assessed by each of the
assessment tools, research has shown that combinations of different assessment tools can add
incrementally to the validity of the selection battery (Schmidt & Hunter, 1998). According to a
survey of individual assessors, most reported using a combination of personal history forms (bio-
data), ability tests, personality inventories, and interviews (Ryan & Sackett, 1987). A substantial
body of research supports the use of each of these selection methods. Personal history forms
show validities of .30 - .37 (Mumford & Stokes, 1990; Reilly & Chao, 1982). The validity of
general cognitive ability tests ranges from .23 for the least complex jobs to .58 for professional
jobs (Schmidt & Hunter, 1998). The personality trait of conscientiousness has been found to be a
consistent predictor of job performance, with validities ranging from .20 to .23 across a variety
of occupational groups (Barrick & Mount, 1991).
Most individual assessments also include an interview (Ryan & Sackett, 1987). Personal
contact between the assessor and the candidate is a central feature that distinguishes individual
assessments from group administration of test batteries. Silzer and Jeanneret (2011) maintain that
this personal contact allows the assessor to observe patterns of candidate behavior, yielding
unique insights about factors such as interpersonal and communication skills that are not well
assessed using standardized measures. Further, the interview allows the assessor to probe for
additional information in order to test hypotheses about candidate characteristics. Research on
employment interviews yields validity coefficients ranging from .20 to .62 (Conway, Jako, &
Goodman, 1995; McDaniel, Whetzel, Schmidt & Maurer, 1994), with higher validities for more
structured interviews.
Although all of these assessment tools are useful predictors of job performance, it is
important to note differences between the research literature and how these tools are used in the
context of individual assessment. For example, although 80% of individual assessors report using
personal history data (Ryan & Sackett, 1987), there is little information on exactly how personal
history data is collected and scored. Research on biodata has emphasized systematic methods for
developing, scoring and weighting personal history items. When biodata items are not developed
based on empirical evidence or are not weighted properly, it is questionable how valid and
reliable they may be (Stokes & Cooper, 2004).
Similarly, research on the validity of personality assessment has focused on personality
trait measures, particularly those based on the Big 5 theory (Barrick & Mount, 1991). Although
this approach is used in many individual assessments, others use personality inventories designed
to identify abnormal personality patterns, such as the Minnesota Multiphasic Personality
Inventory (Butcher, 1994) or projective techniques such as the Thematic Apperception Test
(Reilly & Chao, 1982). Although there is some support for the validity of the MMPI (Aamodt,
2004) and projective tests (Reilly & Chao, 1982) as predictors of job performance, research on
the use of these techniques for personnel selection is limited, and has been criticized for
methodological shortcomings (Kinslinger, 1966).
Research on the employment interview has consistently shown greater reliability and
validity for interviews that have greater structure, that is, interviews that are based on a job
analysis, that employ well trained interviewers, and that use behaviorally anchored rating scales
(Campion, Pursell, & Brown, 1988; Huffcutt & Arthur, 1994; Wiesner & Cronshaw, 1988).
Individual assessments, due to their individualized nature, are likely to impose less structure on
the interview.
In sum, individual assessments tend to administer a variety of selection tools that are
known to predict job performance, and the use of multiple assessments should further enhance
the validity of the process. At the same time, the use of less structured approaches may limit
validity to some extent. Nevertheless, the research on the components of individual assessment
suggests that each should provide some incremental validity to the assessor’s recommendation.
Hypothesis 1a. Assessor recommendations from assessments that include a test of general
cognitive ability will be better predictors of performance than those from assessments that do not
include general cognitive ability.
Hypothesis 1b. Assessor recommendations from assessments that include a personality
questionnaire will be better predictors of performance than those from assessments that do not
include a personality questionnaire.
Hypothesis 1c. Assessor recommendations from assessments that include a biodata
measure will be better predictors of performance than those from assessments that do not
include a biodata measure.
Hypothesis 1d. Assessor recommendations from assessments that include an interview
will be better predictors of performance than those from assessments that do not include an
interview.
Information input can also be characterized by the degree of structure. At the highest
level of structure, the questions that are asked of the candidates are identical in every way
(Campion et al., 1997). As the interview for each candidate becomes more unique, it becomes
increasingly difficult to assess candidates on the same criteria, which can affect the validity of
the recommendations. Given the individualized nature of individual assessments, candidates are
often asked to complete slightly different assessment components. A survey by Ryan and Sackett
(1987) found that only 15% of individual assessment practitioners reported using a highly
standardized format, 73% reported using a loose structure, and 12% reported an unstructured
format. Many assessors utilize an adaptive interviewing approach in which questions are
designed to test hypotheses or seek clarification on information from the assessment battery
(McPhail & Jeanneret, 2011; Ryan & Sackett, 1987). The standardization of the assessment
content is important because it ensures that the collected information is comparable across
candidates. As such, the standardization of the content allows the assessors to make meaningful
comparisons across applicants, which should lead to higher validities.
Hypothesis 2: Assessor recommendations from assessment batteries that are identical for
each candidate will be more predictive of performance than those from batteries that
differ across participants.
Information Integration
After the assessment tools have been administered, the information must be integrated by
the individual assessor, who combines all of the collected information into a final
recommendation for the hiring organization. Prien et al. (2003) note that the information
integration stage is really the heart of the individual assessment process. It is through this process
that the assessor uses both quantitative and qualitative information to make a holistic assessment
of the job candidate. Because this is a largely subjective process, the individual assessors
themselves are central to the information integration stage.
The integration of information from an assessment battery can take place either
statistically or through expert judgment. Statistical integration of information involves
techniques that combine the scores from each assessment tool mechanically into a composite
score. Judgmental integration relies on the assessor to combine information in a non-statistical,
clinical manner. Holt (1958) describes clinical judgment in terms of making decisions and
reaching conclusions by thinking over the facts and theories that are already available to the
decision-maker. Individual assessments tend to use this judgmental integration approach, and
this will be the focus of the current study.
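To make this distinction concrete, the following minimal sketch (in Python, with hypothetical tool names, scores, and weights not drawn from any of the studies reviewed here) shows what purely mechanical integration looks like: a fixed composite applied identically to every candidate. Judgmental integration, by contrast, leaves the combination to the assessor's interpretation of the case as a whole.

# Mechanical (statistical) integration: an illustrative sketch only.
# Hypothetical standardized scores for one candidate on three assessment tools.
scores = {"cognitive_ability": 1.2, "conscientiousness": 0.4, "interview": 0.8}

# Fixed, predetermined weights (e.g., unit or regression weights); illustrative values.
weights = {"cognitive_ability": 0.5, "conscientiousness": 0.2, "interview": 0.3}

# The same weighted composite is computed for every candidate, with no
# case-by-case adjustment by an assessor.
composite = sum(weights[tool] * scores[tool] for tool in scores)
print(f"Mechanical composite score: {composite:.2f}")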
In a definitive piece, Meehl (1954) argued that there is no theoretical or empirical basis
supporting the idea that people can combine information in their heads as efficiently as they can
by using statistical techniques. Several subsequent articles have supported this argument, finding
that statistical decision-making techniques consistently outperform, or at least perform as well as, clinical decision making (e.g., Ægisdóttir et al., 2006; Dawes, Faust, & Meehl, 1989;
Grove, Zald, Lebow, Snitz, & Nelson, 2000; Grove & Meehl, 1996; Kuncel, Klieger, Connelly &
Ones, 2013). So despite the popularity of individual assessments as a selection tool, the utility of
holistic assessment has been controversial for at least half a century, and the question remains
whether holistic assessment is actually better than traditional statistical techniques.
However, the belief that statistical prediction is superior to clinical predictions is not
universal. In response to Meehl’s argument, Holt (1958) differentiates between two types of
clinical predictions, “naïve clinical” and “sophisticated clinical” (p. 4). He posits that naïve
clinical prediction is characterized by the use of data that are primarily qualitative with no
attempt to use objective criteria in the decision-making process. Predictions are made in an
entirely intuitive manner without emphasis on objectivity. Sophisticated clinical prediction on
the other hand, includes the use of qualitative data from sources such as interviews, bio-data, and
projective techniques that are used in addition to objective facts and scores. It differs from the
naïve clinical prediction techniques because it includes objectivity, organization, and scientific
methodology in the planning of the assessment, the gathering of data, and in the analysis. At the
same time, though objectivity is employed in the process, the clinician is retained as one of the
primary instruments of prediction, with an effort to be as reliable and valid a data-processor as
possible. According to Holt (1958), sophisticated clinical methods, or those that involve a
scientific approach to holistic assessment, might yield even better predictions than statistical
techniques.
Building on Holt’s contentions, one way to improve on the scientific approach and the
structure of the clinical assessment may be to involve more than one assessor to collectively
make predictions about each candidate. In the interview literature, it has been suggested that the
use of multiple interviewers can contribute to the structure of the assessment by reducing the
impact of idiosyncratic biases among interviewers (Campion, et al., 1988; Hakel, 1982), limiting
the number of irrelevant inferences that are not job related (Arvey & Campion, 1982), and
increasing accuracy as a result of the range of information and judgment that is obtained from
different perspectives (Dipboye, 1992). Research confirms that averaging ratings across multiple
assessors improves the reliability and validity of unstructured interviews (Schmidt &
Zimmerman, 2004). Given this, individual assessments are expected to be most valid when
several assessors are used.
Hypothesis 3: Recommendations from individual assessments that use multiple assessors
to assess each candidate will be better predictors of performance than those that utilize a
single assessor.
Another aspect of structure concerns whether or not the same assessor, or same panel of
assessors, is used across all candidates (Campion et al., 1997; Huffcutt & Woehr, 1999). In
individual assessment situations, the integration of information probably differs across assessors,
which may lead to predictions that are not necessarily comparable across the job candidates.
Ryan and Sackett (1998) noted that disagreement is common among individual assessors. They
suggested that this disagreement may be attributed to differences in assessor competence,
different ideologies, or differences in how assessors organize and use the same information. If
different candidates are evaluated by different assessors, then these assessor idiosyncrasies may
add variability to ratings that is unrelated to the competencies being assessed, thereby lowering
validity (O’Brien & Rothstein, 2011). The use of a single assessor across candidates should
reduce error due to assessor differences, and therefore may result in higher validity.
Hypothesis 4: Recommendations from individual assessments that employ the same
assessor across all candidates will show higher validity than those that use a variety of
assessors across candidates.
Occupation
Individual assessments are most commonly used to fill high-level management positions
(Ryan & Sackett, 1987). Given the cost, individual assessments are most likely to be used in
hiring situations where the stakes are high, such as upper-level management. The assessment and
hiring of senior-level executives has been identified as distinct from other types of positions, due
to the constantly changing performance expectations and the latitude each incumbent has to
shape the nature of the work (Silzer & Jeanneret, 2011; Thornton, Hollenbeck, & Johnson,
2010). The complexities of assessing managerial potential call for a highly flexible approach,
such as that provided by individual assessment.
Given the popularity of individual assessment for management positions, it is particularly
important to examine whether assessment practices are effective in this context. Previous meta-
analytic research found that the validity of many predictors depends on the type of job. Cognitive
ability tests have been found to be more predictive of performance for jobs with greater
complexity (Hunter & Hunter, 1984). In contrast, the validity of situational interviews has been
found to be lower for high complexity jobs (Huffcutt, Conway, Roth, & Klehe, 2004). The
personality trait of extroversion has been found to better predict performance for managerial jobs
(Barrick & Mount, 1991). These findings suggest that validity might differ for managerial and
non-managerial positions, but the direction of this difference is uncertain.
Research Question 1: Does the validity of assessor recommendations differ for
managerial and non-managerial jobs?
Methodological Factors
All of the considerations that have been discussed thus far deal with the individual
assessment procedure and how different aspects of the procedure itself may contribute to the
validity of the individual assessment. Several research study design issues can also influence the
results of individual assessment validation studies, which will be discussed below.
Source of Recommendation
When designing a study, researchers make numerous choices that may impact the results.
In the case of individual assessments, one important issue is the fidelity of the research design,
that is, the extent to which the study design reflects individual assessments as they occur in
applied settings. An important component of fidelity in the individual assessment literature is the
source of the predictor score used to compute the validity coefficient. Given the goal of
evaluating the validity of assessor recommendations, the optimal research design would obtain
numerical ratings directly from the individuals who conducted the assessment. This is consistent
with the standard practice of individual assessment, where the assessor who has access to all of
the assessment data as well as personal interaction with the examinee is the same person who
writes the assessment report.
A difficulty arises in conducting validation research because some individual assessments
do not use numeric ratings, providing instead a qualitative report of the candidate’s strengths,
weaknesses, and fit with the hiring organization. Validating these assessments requires an
additional step whereby the assessment report is translated into a numerical rating for the
purpose of the validation research. In several studies, one person conducted the assessment and
wrote the qualitative assessment report, then a second assessor later reviewed this report and
provided the numeric ratings needed for the validation study. In such cases, the second individual
assessor, whose prediction was used to validate the individual assessment, did not have direct
access to the job candidate.
The source of the predictor data is likely to impact the results of the validation study. The
individual who conducted the original assessment will have the greatest opportunity to develop a
deep understand of the job candidate through interaction. In contrast, when ratings made by
secondary sources from narrative assessment reports, some information will inevitably be lost
and the validity of ratings may be lower.
Hypothesis 5: Studies in which recommendations were obtained directly from the
individual who conducted the assessment will have higher validity than studies where
recommendations were made by secondary sources after reading assessment reports.
Another important element of study design is the type of variable used to measure the
performance criterion. Employee success can be operationalized in a variety of ways. In order to
aggregate results across studies, it is extremely important that the criteria included in the meta-
analysis are comparable to one another. Many studies used supervisor ratings of performance as
the criterion variable. In other situations, administrative data, such as bonuses or organizational
advancement were the only criterion available. Administrative decisions might be influenced by
factors other than individual work productivity, and therefore reflect a more distal measure of
performance. Because these outcomes reflect substantially different types of criterion measures
(Bommer, Johnson, Rich, Podsakoff & MacKenzie, 1995), we conducted separate analyses for
the validity of individual assessments predicting subjective performance ratings and
administrative decisions.
Research Question 2: Are recommendations from individual assessments more predictive
of subjective performance ratings or administrative decisions?
METHOD
Selection of Studies
A literature search was conducted to identify published and unpublished criterion-related
validity studies of individual assessments used for selection purposes. First, a computer search
was done of PsycINFO and Dissertation Abstracts through 2011 in order to find all references to
individual assessment in employment selection using the following search terms: individual
assessment, individualized assessment, individual psychological assessment, clinical assessment,
clinical judgment, holistic assessment. Second, a manual search was conducted that consisted of
checking the sources cited in the reference section of literature reviews, articles, and books on
the topic of individual assessment (e.g. Highhouse, 2002; Prien et al., 2003; Ryan & Sackett,
1998). Additionally, manual searches of the reference sections of included studies were
conducted to identify additional individual assessment studies. We also sought unpublished
validation studies through a variety of methods, including contacting consulting firms that
conduct assessments, contacting researchers who have published on the topic, and posting a
request for studies on a listserv for human resource professionals (HRNET). Only English-
language research reports were sought.
For the current research, individual assessments were defined as any employee selection
procedure that involves (a) multiple assessment methods, (b) administration to an individual examinee, and (c) reliance on assessor judgment to integrate the information into an overall
evaluation of the candidate’s suitability for a job. Therefore, studies that employed a statistical or
mechanical integration of collected data to arrive at a prediction were not included in the current
study. In addition, the validity coefficient must have reflected an assessment of an individual,
using individual assessment techniques as opposed to group assessments. Studies that included
activities such as role plays or simulations were included only if they were conducted on an
individual basis. For example, Silzer’s (1984) findings were excluded because the holistic
assessment included group exercises and no recommendation was made that excluded the results
of the group exercise. In contrast, a study conducted by Handyside and Duncan (1954) was
retained for the current study because it reported separate validity coefficients based on holistic
evaluations of the candidates with and without a group exercise.
It is also important to point out that Ryan and Sackett’s definition of individual
assessment reflects the concept of “one psychologist making an assessment decision for a
personnel-related purpose about one individual” (1987, p. 456). However, for this study, the
definition of individual assessment was not restricted to include only one psychologist assessor.
Several of the studies included panel decisions about individuals as well as multiple assessors in
the study design. In addition, the current study included assessments that were made by both
psychologists and non-psychologists. Specifically, for 16 of the samples, the assessors were
psychologists, in one sample a non-psychologist was used as an assessor, and 10 samples utilized
both human resources representatives and psychologists as assessors (the assessor background
was unknown for the remaining samples).
With these criteria in mind, the search yielded 42 research reports, of which 24 were
acceptable for inclusion in this analysis. The studies included in the analysis are summarized in
Appendix A. The remaining 18 studies were excluded for the following reasons: two studies did
not involve multiple assessment methods, three studies did not use assessor judgment to integrate
information, five studies involved group exercises, and eight studies did not provide a validity
coefficient for predicting job performance or enough information to derive a validity coefficient
from the data provided. The excluded studies are listed in Appendix B.
A total of 39 samples were obtained from the 24 research reports as several papers
reported results for more than one sample. Four of the published papers reported results based on
two samples, and one study reported results for four separate samples. Of the unpublished
reports, two provided two samples each, and one reported on seven samples. Nine samples were
reported in the 1950s, eight in the 1960s, four in the 1970s, two in the 1900s and 16 after 2000.
Twenty-one samples were from journal articles, two were from book chapters, three were from
unpublished doctoral dissertations, and 13 were from conference papers or unpublished technical
reports.
Effect Size
The validity of each study was computed as the correlation between an overall assessor
rating and a measure of job performance. The predictor consisted of an overall rating of job
suitability, fit, or potential, or a composite of ratings on multiple dimensions. Two types of
outcome measures were included. Subjective ratings consisted of ratings or rankings made by
either a supervisor or an administrator at a higher level in the organization than the participant.
Administrative decisions included organizational decisions such as promotions, salary changes,
and receipt of bonuses. We only included measures of change in job level or compensation. We
did not use current job level or salary, because both variables can be influenced by the initial job
offer and therefore could be contaminated by the assessment rating that was used to make the
employment decision.
Separate analyses were conducted for supervisor ratings and administrative data. Seven
studies included both supervisor ratings and administrative outcomes and were included in both
analyses, whereas 30 included supervisor ratings only and 2 included administrative data only.
Coding of Variables
Coding of each study was conducted independently by two trained coders using a
structured coding sheet. Whenever there was a discrepancy in coding, the discrepancy was
discussed until consensus was reached.
Content of Assessments
The assessment batteries were coded for inclusion of several commonly used assessment
practices, including cognitive ability tests, personality measures, biodata and an interview. It
should be noted that some research reports did not include a complete description of the
assessment battery, choosing instead to report only examples of the assessment methods used.
An assessment method was treated as absent if it was not mentioned in the research report.
Source of Recommendation
The studies were categorized into two types of research design, based on whether the
individual who conducted the assessment was the same person who provided the ratings used in
the validation study. The study was classified as an assessor recommendation when there was
direct interaction between the participant and the assessor who made the prediction that was used
in the validation study. These assessors had primary access to all standardized test data and
interview information upon which a judgment was made for each candidate. There were some
studies in which a single assessor had primary access to all information, having been the only
interviewer, but a panel decision was made by several psychologists who did not have direct
contact with the participant. A decision was made in these cases to code such studies as assessor
recommendations, as long as at least one person on the assessment panel had engaged in direct
contact with the individual in question.
Secondary source recommendations included studies in which numerical ratings were
made based on a written report provided to the hiring organization about each individual
candidate. In these studies, the assessment data was collected and an assessment report was
written by the original assessor; however, because the report did not include a numerical rating
of the candidate, another person read the report and assigned the predictor score that was used
for the validation. As such, the validation study assessor had only secondary access to the
assessment report and did not have any direct contact with study participants. Five samples were
identified as secondary studies based on the information in the original research report. Four
additional samples reported in Miner (1970) were classified as secondary based on the
description of these studies in a review by Prien et al. (2003).
It should be noted that the distinction between primary and secondary designs was only
relevant for assessment practices that involved some form of interaction between the assessor
and the assessee that went beyond the test battery. Studies of assessment practices that did not
include an interview were excluded from the analysis of source of recommendation.
The initial coding of this variable resulted in low inter-rater agreement (52% agreement,
kappa = .27) due to discrepancies in the coding of assessor panels and assessments with no
interaction between the assessor and the candidate. The adoption of the decision rules for these
situations, as noted above, resolved the discrepancies.
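As a minimal illustration of how the agreement statistics reported throughout this section can be computed (the coder labels below are hypothetical, not the study's actual coding data), Cohen's kappa and percent agreement for two coders might be obtained as follows in Python:

from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical codes assigned by two independent coders.
coder_1 = ["assessor", "secondary", "assessor", "assessor", "secondary", "assessor"]
coder_2 = ["assessor", "assessor", "assessor", "secondary", "secondary", "assessor"]

kappa = cohen_kappa_score(coder_1, coder_2)
pct_agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
print(f"kappa = {kappa:.2f}, percent agreement = {pct_agreement:.0%}")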
Standardization of Assessment Battery
This variable indicated whether the same set of assessment methods was administered to
all candidates (kappa = .80; 65% agreement). Differences in procedure included the use of
different tests for the same construct or assessing different constructs across candidates. For
example, in Hilton et al. (1955), slightly over half of the candidates were given a test of practical
judgment. In some instances, the author mentioned that the procedure varied without providing
specific information about the nature of the differences (e.g., DeNelsky & McKee, 1969).
Single vs. Multiple Assessors
This variable reflected whether a single assessor was given access to the candidates’
assessment battery or multiple assessors were used to assess each of the candidates. In cases
where multiple assessors were used, more than one assessor was given access to the results of the
applicant’s inventories and interview performance and a prediction was made for each candidate
using all of the assessors’ input. In secondary research designs, the number of assessors was
determined based on the original assessors who had direct contact with candidates.
Agreement on the coding of this variable was fairly low (kappa = .61; 72% agreement).
The discrepancies were primarily a side-effect of studies using a secondary source for
recommendations. Differences in coding studies as primary versus secondary designs led to
differences in identifying the primary assessors, which in turn affected judgments about the
number of primary assessors.
Same vs. Different Assessors Across Candidates
We also coded whether all of the candidates within a study were assessed by the same assessor or whether different assessors were used across candidates. In
secondary research designs, this was determined based on the original assessors who had direct
contact with candidates. Inter-rater agreement on this variable was fairly low (kappa = .47; 68%
agreement). Many of the disagreements were a side-effect of problems with classifying studies
into primary versus secondary research designs, and were resolved once consensus was reached
on who was the primary assessor.
Occupation
The final variable of interest was the occupation of the position to be filled (kappa = .89;
percent agreement = 92%). Occupations were classified into managerial and non-managerial
work. Most of the managerial samples were first-line supervisors in manufacturing firms, and
other managerial positions included department managers and general division managers. Non-
managerial occupations included psychiatrists, engineers, consultants, police, fire fighters, sales,
students in graduate programs, pilots, and CIA agents. Two samples were comprised of multiple
occupations and were classified as non-managerial.
Criterion Type
The performance criterion was coded as either a subjective rating or an administrative
decision. Because some studies included both types of outcomes, we coded the presence of each
type of criterion measure separately. Subjective ratings consisted of ratings or rankings made by
either a supervisor or an administrator at a higher level in the organization than the participant
(kappa = .57; percent agreement = 84%). Administrative decisions included organizational
decisions such as promotions, salary changes, and receipt of bonuses (kappa = .92; percent
agreement = 96%).
Purpose of Performance Ratings
Subjective performance ratings were classified into those that were developed for the
validation study (research-based), and those that were conducted for administrative purposes.
Meta-Analytic Procedure
A random-effects meta-analysis was conducted using the procedures described in Hedges
and Olkin (1985). This approach weights each effect size by the reciprocal of its variance, which
is the sum of the sampling variance and the between study variance. For the estimation of the
sampling variance, the population correlation was set at a constant equal to the sample-size
weighted average correlation (Hunter & Schmidt, 2004). A restricted maximum likelihood
estimator was used for the between-study variance (Viechtbauer, 2005). The meta-analysis was
conducted in R using the metafor package (Viechtbauer, 2010). Heterogeneity of validity, that is,
the extent to which results varied across studies more than would be expected due to sampling
error, was evaluated for statistical significance using the Q-within test and for practical
significance using the I² index (Higgins & Thompson, 2002). Moderator tests were conducted by
comparing the average validity for subgroups, using the Q-between test (Hedges & Olkin, 1985).
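To make the weighting scheme concrete, the sketch below (in Python, with hypothetical validities and sample sizes rather than data from the studies analyzed here) illustrates a random-effects analysis of correlations. The analyses in this paper used a REML estimator of the between-study variance via the R metafor package; for brevity, the sketch substitutes the closed-form DerSimonian-Laird estimator.

import numpy as np

# Hypothetical observed validities and sample sizes (illustrative only).
r = np.array([0.10, 0.25, 0.35, 0.40, 0.20])
n = np.array([80, 120, 60, 150, 100])

# Sampling variance of each correlation, with the population correlation set
# to the sample-size-weighted mean correlation (as described in the text).
r_bar = np.sum(n * r) / np.sum(n)
v = (1 - r_bar**2) ** 2 / (n - 1)

# Fixed-effect weights, Q-within statistic, and DerSimonian-Laird tau^2
# (the paper uses REML; DL is shown here because it has a closed form).
w = 1.0 / v
mean_fixed = np.sum(w * r) / np.sum(w)
Q = np.sum(w * (r - mean_fixed) ** 2)
df = len(r) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights: reciprocal of (sampling variance + between-study variance).
w_re = 1.0 / (v + tau2)
mean_re = np.sum(w_re * r) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
i2 = max(0.0, (Q - df) / Q) * 100  # share of variability beyond sampling error

print(f"mean r = {mean_re:.3f} (SE = {se_re:.3f}), tau^2 = {tau2:.4f}, "
      f"Q({df}) = {Q:.2f}, I^2 = {i2:.0f}%")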
Validity was estimated using uncorrected correlation coefficients, and then the average
correlation was corrected for attenuation due to criterion reliability. Because reliability
information was available for only eight of the samples (total N = 331), the corrections were
conducted at the aggregate level, using the average inter-rater reliability of the criterion measures
(M=.82, SD = .06). Adjustments to the variance of validities due to criterion unreliability were
found to be trivial, and are not reported here. Few studies provided enough information to
investigate the effect of range restriction, and therefore correction for range restriction was not
applied.
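As a worked check of the attenuation correction, dividing the uncorrected mean validity reported in the Results (.27) by the square root of the average criterion reliability (.82) reproduces the corrected estimate of approximately .30:

\rho = \bar{r} / \sqrt{\bar{r}_{yy}} = .27 / \sqrt{.82} \approx .30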
Supplemental analyses were conducted to assess the sensitivity of our results to
publication bias. Publication bias occurs when the sample of studies available for the analysis
differs systematically from the full body of research on a topic. For example, validation studies
that produce non-significant findings may be less likely to be published, and organizations that
conduct validation studies on their own assessments may be less likely to share unfavorable
technical reports. We conducted two types of publication bias analyses. First, we used a visual
inspection of funnel plots to identify patterns of asymmetry consistent with publication bias
(Sterne, Becker & Egger, 2005). Second, we used a trim-and-fill analysis to estimate what the
average validity would be if the hypothetical missing studies were included in the meta-analysis
(Duval & Tweedie, 2000).
The number of studies included in many of the subgroup analyses was fairly small,
increasing the chance that a single extreme study could substantially influence the results.
Therefore, outlier analyses were conducted on both the overall analysis and each of the
moderator tests. A study was considered an influential data point if the absolute value of the
studentized residual was greater than 2.5 and either the Cook’s D or the standardized dfbeta
statistic was greater than 1.0 (Viechtbauer & Cheung, 2010).
Results
Overall Meta-Analysis
Table 1 shows the results of the meta-analyses conducted across all available studies.
Separate analyses were conducted for subjective and administrative performance outcomes, with
several samples appearing in both analyses because they reported both types of criteria.
---------------------------------
Insert Table 1 about here
---------------------------------
The first meta-analysis estimated the validity of all individual assessments for predicting
ratings of job performance. The estimate of the mean corrected validity was .30 (SDρ = .12)
across 37 studies and a total sample size of 3922. A 95% confidence interval of .24 - .35
indicates that we can be quite confident the true average validity of individual assessments
exceeds zero. The Q-within test was significant, and the I² statistic of 62% indicated that most of the observed variability in validity was due to true differences rather than sampling error. The 95%
credibility interval of .07 - .50 suggests that although individual assessments were predictive of
subjective performance ratings in most studies, the strength of the relationship varied
considerably. Thus, moderator analyses are in order.
The second meta-analysis was conducted to assess the mean validity of individual
assessments predicting administrative decisions, such as change in job level and salary. This
analysis included 9 correlations with a total sample size of 600. Because reliability information
was not available for the administrative decisions, no correction for measurement error was
applied. The mean validity for individual assessments was estimated at .17. The 95% confidence
interval of .06 - .28 indicates that we can be confident the true average validity of individual
assessments predicting administrative decisions is positive. The Q-within test indicated no
significant variance between studies. Outlier analyses identified a single influential study
(Gaudet, 1957), and when this study was removed, the validity dropped to .13. Due to the lack of
significant variance across studies, as well as the limited number of studies, moderator analyses
were not conducted for administrative criteria.
Moderator Analyses for Subjective Criteria
As can be seen in Table 2, the moderator variables were somewhat inter-correlated.
Individual assessments that reported using cognitive ability tests were also likely to report use of
a personality scale. Assessment practices that had a standardized battery were also more likely
to use multiple assessors for each candidate. Assessments that used the same assessor across all
candidates were less likely to report using an interview. Assessments used for managerial
candidates were more likely to use the same procedure across candidates, and were less likely to
use a secondary source for the recommendation. The negative correlations between publication
type and assessment tools reflect the fact that unpublished studies were fairly consistent in using
all three assessment tools (cognitive ability, personality and interview), whereas published
studies were more variable in type of tools used. Unpublished studies were also more likely to
use the same battery across candidates and multiple assessors, and were less likely to use the
same assessor for all candidates.
The results of the moderator analyses are reported in Table 3. The moderator variables can
be organized into four categories: the content of the assessment, the degree of structure,
occupation and methodological factors.
---------------------------------
Insert Table 2 about here
---------------------------------
---------------------------------
Insert Table 3 about here
---------------------------------
Assessment Content
An initial descriptive analysis was conducted to characterize the types of assessment
methods used in the individual assessments. For this analysis, an assessment tool was coded as
being present if it was mentioned in the research report and absent if it was not mentioned.
However, because many of the reports provided only a partial description of the assessment
battery, the results reported here may underestimate the use of these methods. Most studies
reported the use of a measure of cognitive ability (84%) and to a lesser extent a personality scale
(68%). Use of personal history or biodata was reported less often (22%). Most assessment
practices included an interview (78%).
The first four moderator analyses examined whether the validity differed depending on
whether a specific assessment tool was used in the assessment process. Consistent with
Hypothesis 1a, individual assessments that included a cognitive ability test had higher corrected
validity (.32) than those without a cognitive ability test (.14), Q(1, K=37) = 4.3, p<.05. It should
be noted, however, that the sample of studies with no cognitive ability test was quite limited (6
studies, N = 385). Further, an outlier analysis identified Russell (2001) as an influential data
point. This study had a substantially higher validity (.48) than other studies without a cognitive
ability test. Removing this study produced a lower validity for assessments without cognitive
ability (.05), and did not change the significance of the moderator test.
Hypotheses 1b-1d were not supported. No significant differences were found for
individual assessments that included personality or biodata. There was an interesting trend such
that individual assessments that were based on the assessors’ interpretation of a test battery with
no interview showed higher validity (corrected r = .42, k=8) than those that included an
interview (corrected r = .27, k=29); however, the difference was not statistically significant, Q(1,
K=37) =2.0, ns. An outlier analysis identified two influential studies (Miner, 1970, study 1 and
study 4). These two samples showed substantially lower validity than other studies with no
interview (-.04 and -.14 for study 1 and study 4, respectively). Removing these two studies
increased the mean validity for non-interview studies to .64, and yielded a significant moderator
test. However, it should be noted that the number of studies without an interview was quite small
(8 studies, N = 304), and therefore this trend should be interpreted with caution.
Degree of Structure
Three moderator analyses examined whether more structured individual assessment
practices were associated with higher validity. Hypothesis 2 predicted that assessment protocols
that administered the same assessment battery to all candidates would produce higher validity
than those for which the assessment content differed across candidates. The non-significant
moderator test indicated that the validities for these moderator groups were not different from
one another, Q(1, K=32) = 0.0, ns. As such, there was no support for Hypothesis 2.
Hypothesis 3 predicted that practices with multiple assessors of each candidate would
show higher validity than those with a single assessor per candidate. The difference was not
significant, Q-between (1, K=31) = 0.9, ns. Thus, Hypothesis 3 was not supported.
Hypothesis 4 predicted that use of the same assessor across all candidates would lead to
higher validity than use of different assessors for different candidates. A significant moderator
test indicated support for Hypothesis 4, Q(1, K=32) = 4.4, p<.05. The mean corrected
validity for studies using the same assessor/s across all candidates was .44, which was
substantially higher than the validity of .27 for the studies that used different assessors across
candidates.
Occupation
Another possible moderator of individual assessment validity involves the occupation of
the study participants, which we classified into managerial and non-managerial occupations. The
significant moderator test supports the presence of differences across occupational groups, Q(1,
K=37) = 5.6, p<.05. For managers, the mean corrected validity was .35, whereas for non-
managers, the mean corrected validity was .21. Considerable variability remained within each of
the occupational subgroups (SDρ = .15).
These findings indicate that assessor recommendations are more predictive of performance for managerial than for non-managerial jobs. The results also show a positive relationship with performance ratings for non-managerial jobs, although to a lesser degree.
Methodological Factors
We investigated two methodological features of the validation studies as moderators of
validity. Hypothesis 5 suggested that validity would be higher in designs in which the
recommendation came directly from the assessor, rather than from a secondary source who
provided a rating after reading a narrative assessment report. The moderator test was not
significant, Q(1, K=31) = 1.3, ns, and validities were quite similar for the two types of analysis.
Hypothesis 5 was not supported.
An additional analysis was conducted to compare validities for performance ratings
collected for research purposes to those based on administrative performance ratings. We did not
find a significant difference between administrative and research-based performance ratings,
Q(1, K= 32) = 0.3, ns.
Meta-regression Analysis
The interpretation of the moderator analyses is potentially complicated by the
correlations among the moderator variables. However, none of the significant moderators were
found to be significantly correlated with one another. Nevertheless, a meta-regression analysis
was conducted including two of the moderators simultaneously (occupation and use of the same
assessor across candidates). The third significant moderator, whether the battery included a
cognitive ability assessment, was excluded from this analysis due to the small number of studies
with no cognitive ability test. In the meta-regression analysis, both moderators showed an effect
similar to that found in the separate moderator analyses. Validities were .12 higher for
managerial jobs than for non-managerial jobs, p <.05. Similarly, individual assessments that used
the same assessor across candidates had validities .14 higher than those with different assessors,
p <.05.
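As a sketch of how such a meta-regression with two moderators entered simultaneously could be specified, the code below again uses metafor with purely hypothetical data and variable names; it is not the actual database and the coefficient values it produces have no substantive meaning.

library(metafor)

# Hypothetical study-level data with two binary moderators
dat <- data.frame(
  ri            = c(.35, .22, .41, .18, .30, .27, .45, .12, .25, .33),
  ni            = c(60, 120, 45, 200, 80, 150, 55, 90, 110, 70),
  managerial    = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0),   # 1 = managerial occupation
  same_assessor = c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)    # 1 = same assessor across candidates
)

dat <- escalc(measure = "COR", ri = ri, ni = ni, data = dat)

# Both moderators in one mixed-effects meta-regression; each coefficient is the
# validity difference associated with that moderator, holding the other constant
res_mr <- rma(yi, vi, mods = ~ managerial + same_assessor, data = dat, method = "REML")
summary(res_mr)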
Publication Bias Analysis
Publication bias occurs when the sample of studies available for a meta-analysis is
unrepresentative of the full body of research on the topic. The editorial practices of journals can
make it less likely for studies with non-significant findings to be published. Similarly,
unpublished research with non-significant findings may be less accessible because researchers
may be less motivated to write up or share unfavorable results. Both processes tend to
produce a distribution of effect sizes where studies that have lower (closer to zero) validity and
smaller sample size are less likely to be included, leading to a potential upward bias in the mean
effect size.
Several methods can be used to examine the presence of publication bias. First, we
conducted a moderator analysis comparing the results from journal articles to those from other
sources (see Table 3). The results do not suggest a difference between published and unpublished studies, Q(1, K = 37) = 1.1, ns. Validities from journal articles (corrected r = .33)
were similar to those from unpublished sources (corrected r = .26). However, the comparison of
published to unpublished findings is a limited approach to detect publication bias, because it fails
to account for the degree of publication bias within each subgroup (Kepes, Banks, McDaniel &
Whetzel, 2012).
Another method to detect publication bias is through visual inspection of funnel plot
asymmetry (Sterne et al., 2005). A funnel plot graphs the uncorrected validity against precision
(the inverse of the standard error). If there is no publication bias, one would expect the funnel
plot to be symmetrical, with a wide range of validities at the bottom and narrowing toward the
top. To the extent that publication bias causes exclusion of non-significant results, the funnel plot
will be asymmetric, with missing validities in the lower left part of the graph. The funnel plot for
the subjective criteria analysis is consistent with this pattern (see Figure 1), suggesting that publication bias is a potential concern with these data.
---------------------------------
Insert Figure 1 about here
---------------------------------
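For readers who wish to reproduce this kind of display, a funnel plot can be generated with metafor as in the minimal sketch below. The data are hypothetical, and plotting precision (the inverse of the standard error) on the vertical axis follows the description above.

library(metafor)

# Hypothetical observed validities and sample sizes
dat <- data.frame(
  ri = c(.35, .22, .41, .18, .30, .27, .45, .12, .25, .33),
  ni = c(60, 120, 45, 200, 80, 150, 55, 90, 110, 70)
)
dat <- escalc(measure = "COR", ri = ri, ni = ni, data = dat)

res <- rma(yi, vi, data = dat, method = "REML")  # random-effects model, no moderators

# Funnel plot with precision (1/SE) on the y-axis; under publication bias the
# lower-left region (small studies with near-zero validities) tends to be sparse
funnel(res, yaxis = "seinv")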
A third approach to examining publication bias, the trim and fill method (Duval &
Tweedie, 2000), imputes additional validities in order to make the funnel plot symmetric, and
then examines the impact of these studies on the mean validity. The difference between the
original validity and the trim and fill estimate indicates the sensitivity of the results to
publication bias.
Consistent with the funnel plot, the trim and fill analysis imputed nine studies on the left
side of the distribution of effect sizes for subjective performance ratings. The results indicated
that the original uncorrected validity estimate of .27 may be inflated by as much as 29% in
comparison to the trim and fill adjusted validity estimate of .21. Still, even after correction for
publication bias, a positive correlation was found between assessor recommendations and
subjective performance ratings, albeit a weaker relationship than indicated in the initial analysis.
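A minimal sketch of the trim and fill procedure is shown below, using the same kind of hypothetical setup as in the earlier sketches; the number of imputed studies and the adjusted estimate it produces would of course differ from those reported for the actual data.

library(metafor)

dat <- data.frame(
  ri = c(.35, .22, .41, .18, .30, .27, .45, .12, .25, .33),
  ni = c(60, 120, 45, 200, 80, 150, 55, 90, 110, 70)
)
dat <- escalc(measure = "COR", ri = ri, ni = ni, data = dat)

res <- rma(yi, vi, data = dat, method = "REML")

# Trim and fill imputes studies on one side of the funnel to restore symmetry,
# then re-estimates the mean effect with the imputed studies included
res_tf <- trimfill(res)
res_tf                  # number of imputed studies and bias-adjusted mean validity
funnel(res_tf)          # imputed studies are shown as open points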
Sterne et al. (2005) noted that trim and fill analysis may be inaccurate when there is substantial between-study variability. When moderators are present, it is
better to conduct the publication bias analysis on moderator subgroups. Following the
recommendation of Kepes, Banks, McDaniel and Whetzel (2012), we report trim and fill
analyses only for moderator subgroups with at least 10 studies.
Many of the subgroup analyses showed asymmetry consistent with publication bias. As
shown in Table 3, the trim and fill analysis for most of the moderator subgroups imputed
validities on the left, and in several cases the number of imputed studies was substantial relative
to the number of actual studies.
In many subgroups, the trim and fill imputed validity was over 20% smaller than the
original estimate. For example, the trim and fill estimate for managerial occupations was .24,
which is substantially lower than the original uncorrected validity of .32. None of these
reductions was large enough to change the conclusion about the usefulness of the individual
assessments. The trim and fill estimates still represented useful levels of validity, and the pattern
of differences across moderator subgroups was similar. Nevertheless, the results suggest that
many of the estimates from this analysis may be slightly inflated.
Discussion
Despite the popularity of individual assessment for employee selection, it has received
much less scrutiny in the research literature than other selection approaches. Few published
reports of criterion-related validity are available, and much of this research was conducted about
half a century ago. The current meta-analysis compiled the limited evidence from the published
literature, along with a number of more recent unpublished validation reports. While necessarily
limited by the sparse research base, the current study represents the most comprehensive
summary to date of the research on individual assessment validity.
Overall, individual assessments were found to have moderate levels of validity for
predicting job performance. The average corrected validity for subjective performance ratings
was .30, and validity was higher (.35) for managerial jobs. These values represent useful levels
of predictive validity and are comparable to many other widely used employee selection methods.
Validity was lower for administrative criteria (e.g., pay raise and promotion rates), with an
uncorrected validity of .17.
Because individual assessments typically include a battery of tests and an interview, it is
useful to compare the resulting validity to what would be expected for the components of an
assessment battery if they were interpreted without the aid of assessor judgment. The data
available for the meta-analysis did not permit an assessment of the component test validity in
these particular studies; however, the existing literature provides considerable information on
typical validities for these methods.
Most of the individual assessment batteries we examined included a cognitive ability test
and an interview, and many also included a personality scale and a biodata measure. Based on a
summary of the available meta-analytic evidence, Bobko, Roth and Potosky (1999) report
average validities of .51 for cognitive ability tests, .48 for structured interviews, .22 for
conscientiousness, and .32 for biodata. An optimally-weighted composite of these four predictors
would be expected to have a validity of .63 (De Corte, Lievens & Sackett, 2008).
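To make the logic of this comparison concrete, the sketch below computes the validity of an optimally weighted composite as the multiple correlation implied by a vector of predictor validities and a matrix of predictor intercorrelations. The four validities are those cited above from Bobko et al. (1999); the intercorrelations are illustrative placeholders rather than the actual Bobko et al. matrix, so the result will not exactly reproduce the .63 reported by De Corte et al. (2008).

# Validity of an optimally weighted predictor composite (illustrative values only)
v <- c(cognitive = .51, interview = .48, conscientiousness = .22, biodata = .32)

# Placeholder predictor intercorrelations (NOT the Bobko et al., 1999 matrix)
R <- matrix(c(1.00, 0.30, 0.20, 0.25,
              0.30, 1.00, 0.15, 0.20,
              0.20, 0.15, 1.00, 0.10,
              0.25, 0.20, 0.10, 1.00),
            nrow = 4, dimnames = list(names(v), names(v)))

w <- solve(R, v)                               # optimal (least-squares) composite weights
R_comp <- sqrt(drop(t(v) %*% solve(R) %*% v))  # multiple correlation of the composite
round(R_comp, 2)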
When placed in the context of the validity of the components, the support for individual
assessment was only modest. That is, although there is evidence for the predictive validity of
individual assessments, these validities do not exceed what could be obtained from use of a
cognitive ability test or structured interview alone, and the overall estimate of .30 is similar to
what could be obtained from a self-report biodata measure.
A defining feature of the individual assessment is the expert assessor who administers,
interprets and integrates the results of the battery. The literature on individual assessment has
pointed to several potential benefits of the assessment process that go beyond the content of the
assessment battery. First, through personal interaction with the candidate, an individual assessor
may be able to obtain information not available through standardized tests. The assessor is able
to form and test hypotheses, and probe for additional information to clarify discrepancies (Silzer
& Jeanneret, 2011). Further, individual assessments often include lengthy and intensive
interviews, which may reduce the impact of impression management strategies (Tsai, Chen &
Chiu, 2005).
Our results suggest that the benefits derived from the assessor’s interaction with the
candidate may be limited. Individual assessments that included an interview, providing an
opportunity for the assessor to interact personally with the candidate, did not show higher
validity than individual assessments based only on a battery of tests. In fact, individual
assessments conducted without an interview showed slightly higher validity than those with an
interview, although this difference was not statistically significant. Although intriguing, these
results should be interpreted with caution, given the limited size of the research base for
assessments where there was no interview (i.e., there were only 8 studies with a total sample size
of 304). Nevertheless, these results do not support the idea that assessors add unique insights to
the evaluation of candidates by observing subtle behavioral cues or through the application of
hypothesis testing during the interview (Silzer & Jeanneret, 2011).
The ability of expert assessors to interpret complex patterns of behavior and apply
configural decision rules is at the same time a hallmark of individual assessment (Prien et al.,
2003) and a source of considerable debate (Highhouse, 2008). According to Silzer and Jeanneret
(2011), effective assessors are able to interpret individual data points in the context of broader
patterns of behavior or psychological constructs. For example, a deeper understanding of the
competencies developed through work experience can be obtained by looking at how work
experiences build on each other, rather than in isolation (Dragoni, Oh, Vankatwyk & Tesluk, 2011). Further, the assessor can take situational factors and context into account and consider so-
called ‘broken leg’ cues when interpreting specific pieces of information (McPhail & Jeanneret,
2011).
Unfortunately, there is a lack of evidence that experts are able to reliably and validly
apply configural rules and form accurate judgments based on complex patterns of behavior.
Contrary to the idea that assessors provide more sophisticated interpretation of the data, research
suggests that expert judgments tend to focus on only a few salient cues (Hastie & Dawes, 2001).
Decades of research have consistently shown that trained experts cannot outperform simple linear
models in prediction tasks, and may actually be less accurate than a mechanical combination of
the available information (Ægisdóttir et al., 2006; Grove & Meehl, 1996). Although the
incremental validity of assessor judgments was not tested in the current study, our results are
consistent with the literature on statistical versus mechanical prediction, in that the validity of
assessor recommendations was not greater than would be expected from the component tests.
Even if assessors do not add to predictive validity, individual assessments may still be
useful in situations where it would not be practical to develop a mechanical decision process.
Developing an optimal linear model requires conducting a large-sample criterion-related validity
study to obtain optimal predictor weights. This approach will often be infeasible for upper-level
management positions, where there may only be one individual in a particular job and only a
handful of applicants (Guion, 1998; Hollenbeck, 2009; Ryan & Sackett, 1998). An advantage of
individual assessment is the ability of the assessor to provide recommendations tailored to the
organization’s needs and culture, even in situations where criterion-related approaches are not
feasible.
Surveys have shown that practitioners of individual assessment vary considerably in their
approaches (Ryan & Sackett, 1987, 1992). Our results suggest that these differences may have
important consequences. The validity of assessor ratings was found to vary considerably across
studies. A 95% credibility interval suggests that the validity for predicting job performance
ratings ranged from essentially zero to as high as .50. Similarly, moderator analyses yielded mean corrected validities for subgroups ranging from low (ρ = .14) to moderate (ρ = .44). Thus,
choices in how assessments are conducted may have led to substantially higher or lower validity.
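For reference, a 95% credibility interval of this kind is typically formed as the mean validity plus or minus 1.96 times SDρ. The short calculation below illustrates that convention using the estimates reported in Table 1; it is a sketch of the standard formula, not necessarily the authors' exact computation.

# 95% credibility interval as mean +/- 1.96 * SDrho, using the Table 1 estimates
mean_rho <- 0.27   # random-effects mean validity for subjective criteria
sd_rho   <- 0.12   # SD of validities after removing sampling error
round(mean_rho + c(-1, 1) * qnorm(0.975) * sd_rho, 2)  # roughly .03 to .51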
The selection literature suggests several features of individual assessment practice that may enhance validity,
including the content of the assessments, the standardization of data collection, and the process
used to integrate information into the final assessment report. In terms of content, a substantial
body of research exists on the predictor constructs that are likely to be useful components of the
assessment battery. This research consistently shows strong validity for tests of general cognitive
ability (Schmidt & Hunter, 1998). Other predictor constructs such as conscientiousness,
integrity, and prior experience have more modest validity, but nevertheless add incremental
validity beyond cognitive ability (Schmidt & Hunter, 1998). Our results partly support these
recommendations, in that higher validity was found for assessments that included a cognitive ability test compared to those that did not measure cognitive ability. The reader should note, however, that
data on assessments without a cognitive ability test was quite limited (i.e., 6 studies with a total
sample size of 385), and therefore the comparison should be interpreted with caution.
In contrast, the inclusion of personality or biodata measures was not associated with
higher validity. It may be that the multi-method nature of individual assessments lessened the
importance of including specific measures for these constructs. Given the widespread use of
personality and background information in evaluating candidates, it is unlikely that these
constructs were completely absent from any of the assessments. Notably, a survey of individual
assessment practice found that most practitioners collect personal history information (Ryan & Sackett, 1987), and yet only 22% of the studies in the current meta-analysis reported using a personal history form. For situations where a standardized measure was not administered, the assessor may have
gathered information about prior experience through the interview. Thus, the assessors in most
situations would have access to information about prior training, education and work experience,
regardless of whether a separate personal history form was administered. Similarly, most
assessors likely drew some inferences about personality constructs as part of the interview, even
when no standardized personality questionnaire was administered. In both cases, the ability to
obtain relevant information from multiple methods may have compensated for the absence of
specific measures.
Another feature that differentiates among assessment practices is the degree of
standardization of the assessment process. The literature on employment interviews has
consistently shown greater validity for more structured interviews (Campion et al., 1997; Arvey
& Campion, 1982; Harris, 1989; Huffcutt & Arthur, 1994). However, our analysis found mixed
results regarding the impact on structure of individual assessment validity.
The only aspect of structure found to impact validity was the use of the same assessor
across candidates. Predictions were most valid when the same assessor or assessment panel was
used to make predictions about the entire pool of candidates. Although the comparison was
significant, it is notable that limited data was available on studies using the same assessor across
candidates (i.e., only 9 studies with a total sample size of 421).
The use of a common assessor has been identified in the interview literature as a way to
enhance reliability and validity (Campion et al., 1997; Huffcutt & Woehr, 1999). Having a single
assessor across candidates provides a consistent frame of reference and removes differences due
to idiosyncratic judgments by different assessors. Given evidence for low agreement among
practitioners of individual assessment (Ryan & Sackett, 1989), the use of different assessors for
different candidates is likely to add construct-irrelevant variance to suitability judgments
(O’Brien & Rothstein, 2011), thereby reducing validity.
At the same time, the use of a single assessor for all candidates will be useful only if the
individual selected to conduct the assessments is known to provide accurate judgments. Research
on employment interviews suggests that some individuals may make more valid judgments than
others (Dreher, Ash & Hancock, 1988), although research on this topic is mixed (Pulakos,
Schmitt, Whitney & Smith, 1996; Van Iddekinge, Sager, Burnfield & Heffner, 2006). Therefore,
the selection of effective assessors is critical for ensuring validity (Silzer & Jeanneret, 2011).
Further, when different assessors are to be used for different candidates, training to provide a
common frame of reference may enhance validity (Lievens, 2001).
Another strategy to reduce the impact of idiosyncratic rater effects would be to have each
candidate evaluated by multiple assessors. This practice has been recommended in both the
interview (Campion et al., 1997) and assessment center literatures (International Task Force on
Assessment Center Guidelines, 2009). When data are combined across multiple assessors, the
judgment errors and biases associated with any one assessor will be minimized, resulting in more
reliable recommendations and reducing construct-irrelevant variance.
The benefits of using multiple assessors, however, were not supported by the current
meta-analysis. Practices with multiple assessors were not found to be more valid than those
based on a single assessor. Despite the apparent advantages of using multiple assessors, prior
research in the context of employment interviews has been mixed. In fact, a meta-analysis by
Huffcutt and Woehr (1999) found lower validity for panel interviews than for those with a single
interviewer. It may be that use of multiple assessors has negative consequences that counteract
the benefits of multiple raters. For example, interacting with multiple assessors may place
increased demands on the candidate and increase the stressfulness of the assessment process
(Campion et al., 1997).
The usefulness of having multiple assessors may depend on how information from the multiple sources is combined to obtain the final recommendation. The benefits of multiple
perspectives may be limited if a single assessor is responsible for integrating data from the
multiple sources. Research on assessment centers has found that both mechanical approaches
(e.g., average ratings across assessors) and decisions based on consensus discussion can produce
valid recommendations (Pynes & Bernardin, 1992). Ideally, a consensus-based decision will give
equal weight to all assessors; however, in practice the assessors may differ in their influence
(Sackett & Wilson, 1982), and this may undermine the benefits of a multiple assessor approach.
Contrary to expectations, standardization of the assessment protocol did not moderate the
validity of individual assessments. It was expected that assessment protocols with a standardized
battery would be more predictive of performance than those that differed somewhat in the
process, but the results showed that the validities of these approaches were quite similar.
Two aspects of the individual assessments may allow reliable and accurate assessments
with a less structured process. Most individual assessors form judgments by integrating
information from multiple tests, simulations and an interview. The redundancy inherent in this
approach may make the consistent application of any one component less critical. Further, the
training and experience of assessors may allow them to compensate for the lack of perfectly consistent sources of information.
As such, the consistency of the assessment battery may be less important than the
structure of the judgment and information integration process. If assessors do not use the information in a consistent fashion, it matters little whether the inputs are the same. McPhail and
Jeanneret (2011) discuss three important ways in which structure could be built into the
judgment process. First, the assessment dimensions should be clearly linked to a systematic
analysis of the job. Second, there should be a clear mapping of assessment tools to dimension
ratings. Third, assessors should use well-defined behavioral rating scales for recording
observations from simulation exercises. The use of rating forms could also be beneficial for
recording impressions during interviews (Campion et al., 1997).
In addition to these assessment design features, occupation was also found to moderate
validity. Individual assessments were more valid for predicting supervisory and managerial
performance than they were for other occupations. This finding is important given the
widespread use of individual assessments for managerial and executive selection (Thornton,
Hollenbeck, & Johnson, 2010). Selection for high-level managerial positions creates unique
challenges due to the small number of candidates considered for a position, and the highly
individualized nature of job requirements (Hollenbeck, 2009). The flexibility of individual
assessments makes them well-suited for this context. Although the strong validity results for
managerial occupations are supportive of this practice, it is important to note that the managerial
jobs included in our meta-analysis were mostly lower- and mid-level. Additional research is
needed to determine whether these findings will generalize to executive-level positions.
To some extent, the higher validity for managerial occupations may be a result of
occupational differences in the validity of the assessment tools used in the assessment. Cognitive
ability tests, which were included in most of the individual assessments we studied, tend to show
higher validity for high complexity jobs, such as management. Similarly, the personality trait of
extraversion tends to show higher validity for managerial jobs (Barrick & Mount, 1991).
Consequently, an assessment battery with a cognitive ability test and a Big 5 personality scale
would be expected to perform well for managerial jobs.
Limitations and Future Research
As with all research, this study is not without limitations. The most obvious limitation of
this study is the small number of samples available for inclusion. The current meta-analysis
included only 39 studies, which made interpretation of results somewhat difficult. This,
combined with the heterogeneity of validity across studies, means that the power of moderator
tests was low. It is also important to note that over half of the included studies were published
before 1980, so the results may reflect outdated individual assessment practices.
A related issue is the considerable evidence of publication bias. It is possible that the
results of this study may reflect a biased subset of individual assessment validation studies.
While publication bias typically refers to a tendency for published articles to over-represent
significant findings, our data revealed a different pattern. Evidence of publication bias was more
pronounced among unpublished rather than published research reports. Although efforts were
made to identify both published and unpublished validation reports, it is possible that additional unpublished studies were conducted but never written up, or that vendors of individual assessments chose not to share research reports with unfavorable findings (McDaniel, Rothstein & Whetzel, 2006).
The trim and fill analysis (Duval & Tweedie 2000) indicated that publication bias may
have inflated the average validity in several of the moderator subgroups. Consequently, the true
validity may be slightly lower than indicated in our results. However, even after accounting for
this effect, individual assessments still demonstrated useful levels of validity.
Another obvious limitation of this study concerns the coding of the studies. Though the
authors attempted to code the studies for all possible moderators, many of the studies did not
provide sufficient information to properly categorize the samples. As a result, many of the
studies were placed into “unknown” categories, which made the interpretation of results
somewhat ambiguous.
Along the same lines, limited information was available concerning range restriction in the included studies. It has been suggested that candidates in individual
assessment situations are likely to have been through several screening hurdles prior to the
assessment, so psychologists rarely see the entire applicant pool (Ryan & Sackett, 1998).
However, because the primary researchers rarely reported range restriction information, we were
unable to develop an artifact distribution for range restriction to correct for this artifact. As a
result, it is possible that the corrected mean validities derived in the current study are
underestimates of the true validity of individual assessments.
Above all else, the small number of studies available for inclusion in the current
study underscores the need for additional validation research in the area of individual
assessment. Though individual assessments have become a major source of income for many
practitioners, very few validation studies have been conducted on the subject. Although
admittedly it is difficult to conduct validation studies for assessments that are typically done with
small sample sizes, the literature provides suggestions concerning how practitioners can
conceptualize validation studies to address this limitation (Prien et al., 2003; Ryan & Sackett,
1998).
Several questions remain that can be addressed in future studies. While recent literature
has given increased attention to individual assessment (McPhail & Jeanneret, 2011; Silzer &
Jeanneret, 2011), additional research is needed to identify the practices that are most effective.
Although our findings did not strongly support benefits of increased structure, additional
research on this topic may be informative. Practices that have been found to enhance assessor
judgments in assessment centers (e.g., Lievens, 2001) may prove useful if applied to the design
of individual assessments as well. Similarly, future research should examine whether training of
assessors can improve the consistency of information interpretation and integration.
Like interviews or assessment centers, individual assessment represents a data collection
method that can be used to assess a variety of constructs (Arthur & Villado, 2008). Although
assessors commonly form judgments about multiple dimensions, the current study only
examined overall recommendations. Future research should explore the specific constructs
assessed in individual assessment and examine both the construct and predictive validity of more
specific assessor judgments.
Conclusion
Overall, the results indicate that individual assessments are useful predictors of job
performance, especially for managerial positions. However, the wide range of validities suggests
that not all assessment practices are equally valid. Additional research is needed to identify the
features of individual assessments that are most effective. We hope that this work will prompt
more researchers and practitioners to conduct validation studies on individual assessments and to
publish their results. Building the empirical literature on individual assessment validity will
provide a foundation for improving the effectiveness of this important area of practice.
References
References marked with an asterisk indicate studies included in the meta-analysis.
Aamodt, M. G. (2004). Special issue on using MMPI-2 scale configurations in law enforcement
selection: Introduction and meta-analysis. Applied H.R.M. Research, 9, 41-52.
Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S.,
Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, G., & Rush, J. D. (2006).
The Meta-Analysis of Clinical Judgment Project: Fifty-Six Years of Accumulated
Research on Clinical Versus Statistical Prediction. The Counseling Psychologist, 34, 341-
382.
*Albrecht, P. A., Glaser, E. M., & Marks, J. (1964). Validation of a multiple-assessment
procedure for managerial personnel. Journal of Applied Psychology, 48, 351-360.
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of recent research. Personnel Psychology, 35, 281-322.
Arthur, W., & Villado, A. J. (2008). The importance of distinguishing between constructs and
methods when comparing predictors in personnel selection research and practice. Journal
of Applied Psychology, 93, 435-442.
*Barnett, R., & Beatty, A. (2013, April). The validity of assessor judgment in individual
psychological assessment. Paper presented at the 28th annual conference of the Society
for Industrial and Organizational Psychology, Houston, TX.
Barrick, M. R. & Mount, M. K. (1991). The big five personality dimension and job performance:
A meta-analysis. Personnel Psychology, 44, 1-26.
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic
matrix incorporating cognitive ability, alternative predictors, and job performance.
Personnel Psychology, 52, 561-589.
Bommer, W. H., Johnson, J.L., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On
the interchangeability of objective and subjective measures of employee performance: A
meta-analysis. Personnel Psychology, 48, 587-605.
Butcher, J. N. (1994). Psychological Assessment of Airline Pilot Applicants With the MMPI-2.
Journal of Personality Assessment, 62, 31-44.
*Campbell, J. T., Otis, J. L., Liske, R. E., & Prien, E. P. (1962). Assessments of higher-level
personnel: II. Validity of the over-all assessment process. Personnel Psychology, 15, 63-
74.
Campion, M. A., Pursell, E. D., & Brown, B. K. (1988). Structured interviewing: Raising the psychometric properties of the employment interview. Personnel Psychology, 41, 25-42.
Campion, M. A., Palmer, D. K., & Campion, J. E. (1997). A review of structure in the selection
interview. Personnel Psychology, 50, 655-702.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995). A meta-analysis of interrater and internal
consistency reliability of selection interviews. Journal of Applied Psychology, 80, 565-
579.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science,
243, 1668-1674.
De Corte, W., Lievens, F., & Sackett, P. R. (2008). Validity and adverse impact potential of
predictor composite formation. International Journal of Selection and Assessment, 16,
183-194.
*DeNelsky, G. Y., & McKee, M. G. (1969). Prediction of job performance from assessment
reports: Use of a modified Q-sort technique to expand predictor and criterion variance.
Journal of Applied Psychology, 53, 439-445.
*Dicken, C., & Black, J. (1965). Predictive validity of psychometric evaluation of supervisors.
Journal of Applied Psychology, 49, 34-47.
Dipboye, R. L. (1992). Selection interviews: Process perspectives. Cincinnati, OH:
Southwestern.
Dragoni, L., Oh, I-S., Vankatwyk, P., & Tesluk, P. E. (2011). Developing executive leaders: The
relative contribution of cognitive ability, personality, and the accumulation of work
experience in predicting strategic thinking competency. Personnel Psychology, 64, 829-
864.
Dreher, G. F., Ash, R.A., & Hancock, P. (1988). The role of the traditional research design in
underestimating the validity of the employment interview. Personnel Psychology, 41,
315-325.
*Dunnette, M. D., & Kirchner, W. K. (1958). Validation of psychological tests in industry.
Personnel Administration, 21, 20-27.
Duval, S. J., & Tweedie, R. L. (2000). A nonparametric "Trim and Fill" method of accounting
for publication bias in meta-analysis. Journal of the American Statistical Association, 95,
89-98.
*Francoeur, K. A., Schnur, A. C., Bell, D. & Kinney, T. B. (2010, April). Using Individual
Assessment to Predict Executive “Promotability." Paper presented at the 25th annual
conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
*Gaudet, F. J. (1957). A study of psychological tests as instruments for management evaluation.
(pp. 17-27). In American Management Association, Inc., Executive selection,
development and inventory (Personnel Series No. 171). NY: American Management
Association.
*Goudy, K., & Sowinski, D. (2010, April). Individual Assessment Validation: Using Assessment
to Identify High Performance and Potential. Paper presented at the 25th annual
conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective,
impressionistic) and formal (mechanical, algorithmic) prediction procedures: The
clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus
mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19-30.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions.
Mahwah, NJ: Lawrence Erlbaum Associates.
Hakel, M. D. (1982). Employment interviewing. In K. Rowland & G. Ferris (Eds.), Personnel
Manager (pp. 129-155). Boston: Allyn and Bacon.
*Handyside, J. D., & Duncan, D. C. (1954). Four years later: A follow-up of an experiment in
selecting supervisors. Occupational Psychology, 28, 9-23.
Harris, M. M. (1989). Reconsidering the employment interview: A review of recent literature
and suggestions for future research. Personnel Psychology, 42, 691-726.
Hastie, R., & Dawes, R. M. (2001). Rational choice in an uncertain world: The psychology of
judgment and decision making. Thousand Oaks, CA: Sage.
*Hausknecht, J. P., Langevin, A. M., & Schruhl, J. (2011). Working paper: Predictors of
executive success (Unpublished manuscript). Cornell University, Ithaca, NY.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic
Press.
Higgins, J., & Thompson, S. G. (2002). Quantifying heterogeneity in meta-analysis. Statistics in
Medicine, 21, 1539-1558.
Highhouse, S. (2002). Assessing the candidate as a whole: A historical and critical analysis of
individual assessment for personnel decision making. Personnel Psychology, 55, 363-
396.
Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection.
Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 333-
342.
*Hilton, A. C., Bolin, S. F., Parker, J. W., Taylor, E. K., & Walker, W. B. (1955). The validity of
personnel assessments by professional psychologists. Journal of Applied Psychology, 39,
287-293.
Hollenbeck, G. P. (2009). Executive selection: What's right … and what's wrong. Industrial
and Organizational Psychology: Perspectives on Science and Practice, 2, 130-143.
*Holt, R. R. (1958). Clinical and statistical prediction: A reformulation and some new data.
Journal of Abnormal and Social Psychology, 56, 1-12.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for
entry-level jobs. Journal of Applied Psychology, 79, 184-190.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Klehe, U. (2004). The impact of job complexity and study design on situational and behavior description interview validity. International Journal of Selection and Assessment, 12, 262-273.
Huffcutt, A. I., & Woehr, D. J. (1999). Further analysis of employment interview validity: A
quantitative evaluation of interviewer-related structuring methods. Journal of
Organizational Behavior, 20, 549-560.
Hunter, J. E. & Hunter, R. F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72-98.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Newbury Park, CA: Sage Publications.
*Huse, E. F. (1962). Assessments of higher-level personnel: IV. The validity of assessment
techniques based on systematically varied information. Personnel Psychology, 15, 195-
205.
International Task Force on Assessment Center Guidelines (2009). Guidelines and ethical considerations for assessment center operations. International Journal of Selection and Assessment, 17, 243-253.
Jeanneret, R. & Silzer, R. (1998). An overview of individual psychological assessment.In R.
Jeanneret & R. Silzer (Eds.), Individual Psychological Assessment: Predicting behavior
in organizational settings (pp. 3-26). San Francisco: Jossey-Bass.
Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the
organizational sciences. Organizational Research Methods, 15, 624-662.
Kinslinger, H. J. (1966). Application of projective techniques in personnel psychology since
1940. Psychological Bulletin, 66, 134-149.
*Korn Ferry (2011). Unpublished technical report. Minneapolis, MN: Author.
Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical
data combination in selection and admissions decisions: A meta-analysis. Journal of
Applied Psychology, 98, 1060-1072
*Kwaske, I. H. (2006). An exploratory, multi-level study to validate individual psychological
assessments for entry-level police officer and fire fighter positions (Unpublished doctoral
dissertation). Illinois Institute of Technology, Chicago, IL.
*LaGanke, J. S. (2008). Validation of an individual assessment process (Unpublished doctoral dissertation). Wayne State University, Detroit, MI.
*Lees-Hotton, C. Kinney, T. B., & Kung, M. (2010, April). Using executive assessment in
predicting success on the job. Paper presented at the 25th annual conference of the
Society for Industrial and Organizational Psychology, Atlanta, GA.
Lievens, F. (2001). Assessor training strategies and their effects on accuracy, interrater reliability
and discriminant validity. Journal of Applied Psychology, 86, 255-264.
McDaniel, M. A., Rothstein, H. R., & Whetzel, D. L. (2006). Publication bias: A case study of
four test vendors. Personnel Psychology, 59, 927-953.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L. & Maurer, S. D. (1994). The validity of
employment interviews: A comprehensive review and meta-analysis. Journal of Applied
Psychology, 79, 599-616.
McPhail, S. M., & Jeanneret, P. R. (2011). Individual psychological assessment (pp. 411-442). In
N. Schmitt (Ed.), Oxford Handbook of Personnel Assessment and Selection. NY: Oxford
University Press.
Meehl, P. E. (1954). Clinical versus statistical prediction: a theoretical analysis and a review of
the evidence. Minneapolis, MN: University of Minnesota Press.
*Meyer, H. H. (1956). An evaluation of a supervisory selection program. Personnel Psychology,
9, 499-513.
*Miner, J. B. (1970). Psychological evaluations as predictors of consulting success. Personnel
Psychology, 23, 393-405.
Mumford, M. D. & Stokes, G. S. (1992). Developmental determinants of individual action:
Theory and practice in applying background measures (pp. 61-138). In M. Dunnette and
L. Hough (Eds.) Handbook of Industrial and Organizational Psychology, Palo Alto, CA:
Consulting Psychologists Press.
O’Brien, J., & Rothstein, M. G. (2011). Leniency: Hidden threat to large-scale, interview-based selection systems. Military Psychology, 23, 601-615.
*Phelan, J. G. (1962). Projective techniques in the selection of management personnel. Journal
of Projective Techniques, 26, 102-104.
Prien, E. P., Schippmann, J. S., & Prien, K. O. (2003). Individual assessment as practiced in industry and consulting. Mahwah, NJ: Lawrence Erlbaum Associates.
Pulakos, E. D., Schmitt, N., Whitney, D., & Smith, M. (1996). Individual differences in
interviewer ratings: The impact of standardization, consensus discussion, and sampling
error on the validity of a structured interview. Personnel Psychology, 49, 85-102.
Pynes, J., & Bernardin, H. J. (1992). Mechanical vs consensus-derived assessment center ratings:
A comparison of job performance validities. Public Personnel Management, 21, 17-28.
Reilly, R. R. and Chao, G. T. (1982). Validity and fairness of some alternative employee
selection procedures, Personnel Psychology, 35, 1-62.
*Russell, C. J. (1990). Selecting top corporate leaders: An example of biographical information.
Journal of Management, 16, 73-86.
*Russell, C. J. (2001). A longitudinal study of top-level executive performance. Journal of
Applied Psychology, 86, 560-573.
Ryan, A. M. & Sackett, P. R. (1987). A survey of individual assessment practices by I/O
psychologists. Personnel Psychology, 40, 455-488.
Ryan, A. M. & Sackett, P. R. (1989). Exploratory study of individual assessment practices:
Interrater reliability and judgments of assessor effectiveness. Journal of Applied
Psychology, 74, 568-579.
Ryan, A. M. & Sackett, P. R. (1992). Relationships between graduate training, professional
affiliation, and individual psychological assessments for personnel decisions. Personnel
Psychology, 45, 363-387.
Ryan, A. M. & Sackett, P. R. (1998). Individual Assessment: The research base. In R. Jeanneret
& R. Silzer (Eds.), Individual Psychological Assessment: Predicting behavior in
organizational settings (pp. 3-26). San Francisco: Jossey-Bass.
Sackett, P. R., & Wilson, M. A. (1982). Factors affecting the consensus judgment process in
managerial assessment centers. Journal of Applied Psychology, 67(1), 10-17.
Schmidt, F. & Hunter, J. E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124, 262-274.
Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553-561.
Schmitt, N. (1976). Social and situational determinants of interview decisions: Implications for
the employment interview. Personnel Psychology, 29, 79-101.
Silzer, R.F. (1984). Clinical and statistical prediction in a management assessment center.
(Unpublished doctoral dissertation). University of Minnesota, Minneapolis, MN.
Silzer, R. F., & Jeanneret, R. (2011). Individual psychological assessment: A practice and science
in search of common ground. Industrial and Organizational Psychology: Perspectives on
Science and Practice, 4, 270-296.
*Stern, G. G., Stein, M. I., & Bloom, B. S. (1956). Methods in personality assessment. Glencoe, IL: Free Press.
Sterne, J. A., Becker, B. J., & Egger, M. (2005). The funnel plot. In H. R. Rothstein, A. J. Sutton
& M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment, and
adjustments (pp. 75-98). West Sussex, UK: Wiley.
Stokes, G. S., & Cooper, L. A. (2004). Biodata. In J. C. Thomas & M. Hersen (Eds.),
Comprehensive Handbook of Psychological Assessment, Vol. 4, Industrial and
Organizational Assessment (pp. 243-268). Hoboken, NJ: John Wiley & Sons.
Thornton, G. C., Hollenbeck, G. P., & Johnson, S. K. (2010). Selecting leaders: Executives and
high potentials. In Farr, J. L. & Tippins, N. T. (Eds.), Handbook of Employee Selection
(pp. 823-840). NY: Routledge.
Trankell, A. (1959). The psychologist as an instrument of prediction. Journal of Applied
Psychology, 43, 170-175.
Tsai, W-C., Chen, C-C., & Chiu, S-F. (2005). Exploring boundaries of the effects of applicant
impression management tactics in job interviews. Journal of Management, 31, 108-125.
Ulrich, L. & Trumbo, D. (1965). The selection interview since 1949. Psychological Bulletin, 63,
100-116.
Van Iddekinge, C. H., Sager, C. E., Burnfield, J. L., & Heffner, T. S. (2006). The variability of
criterion-related validity estimates among interviewers and interview panels.
International Journal of Selection and Assessment, 14, 193-205.
Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-
effects model. Journal of Educational and Behavioral Statistics, 30, 261-293.
Viechtbauer, W. (2010). Conducting meta-analysis in R with the metafor package. Journal of
Statistical Software, 36 (3), 1-48.
Viechtbauer, W., & Cheung, M. W-L. (2010). Outlier and influence diagnostics for meta-
analysis. Research Synthesis Methods, 1, 112-125.
Weiner, I. B. (2003). The assessment process. In J. R. Graham, & J. A. Naglieri (Eds.) Handbook
of psychology, Vol. 10: Assessment Psychology (pp. 3-25). Hoboken, NJ: John Wiley &
Sons Inc.
Wiesner, W. H. & Cronshaw, S. F. (1988). A meta-analytic investigation of the impact of
interview format and degree of structure on the validity of the employment interview.
Journal of Occupational Psychology, 61, 275-290.
Table 1
Meta-analysis results for individual assessments predicting subjective ratings and administrative decisions as criteria

Criterion | N | K | N-weighted | Random Effect (SE) | Corrected | SDr | SDρ | I² | Q w/in | T&F ΔK | T&F r
Subjective Criteria | 3922 | 37 | .24 | .27 (.03) | .30 | .16 | .12 | 62% | 91.3** | 9 a | .21
Administrative Decisions | 600 | 9 | .19 | .17 (.06) | -- | .17 | .11 | 45% | 14.5 | -- b | --

Notes. N = combined sample size; K = number of samples; N-weighted = average sample-size weighted validity; Random Effect = average validity from the random-effects meta-analysis; Corrected = average validity corrected for criterion unreliability; SDr = standard deviation of the observed validities; SDρ = standard deviation of the validities corrected for sampling error; I² = percent of variance beyond sampling error; Q w/in = χ² test for homogeneity of the observed validities across studies; T&F ΔK = number of effect sizes imputed by the trim and fill analysis; T&F r = trim and fill estimate of the average correlation. a Imputed on the left side of the distribution. b Trim and fill analysis was not conducted for administrative decisions due to the small number of studies. * p < .05. ** p < .01.
Table 2
Correlations among moderator variables

Moderator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
1. Battery Included Cognitive Ability | 1.00 | | | | | | | | | |
2. Battery Included Personality | .64** | 1.00 | | | | | | | | |
3. Battery Included Biodata | -.13 | -.20 | 1.00 | | | | | | | |
4. Assessment Included Interview | .13 | .20 | .28 | 1.00 | | | | | | |
5. Standardized Assessment Battery | -.12 | -.09 | .25 | -.23 | 1.00 | | | | | |
6. Multiple Assessors Per Candidate | -.10 | -.21 | -.02 | .28 | .37* | 1.00 | | | | |
7. Same Assessor Across Candidates | -.11 | -.18 | .17 | -.44* | .06 | -.05 | 1.00 | | | |
8. Managerial Occupation | .23 | .13 | .17 | .10 | .50** | .29 | .09 | 1.00 | | |
9. Assessor was Source of Recommendation | .46** | .18 | .32 | .31 | .28 | .19 | .27 | .35* | 1.00 | |
10. Performance Rating Collected for Research | .38* | .08 | -.06 | -.10 | -.26 | -.44* | .10 | -.20 | .00 | 1.00 |
11. Journal Publication | -.41* | -.52** | -.04 | -.35* | -.48** | -.35* | .45** | -.32 | -.31 | .13 | 1.00

Notes. K = 26 to 37. * p < .05. ** p < .01.
Table 3
Moderators of validity for subjective performance ratings

Moderator (Q mod) / Subgroup | N | K | N-weighted r | Random Effect (SE) | Corrected | SDr | SDρ | I² | Q w/in | T&F ΔK | T&F r

Battery Included Cognitive Ability (Q mod = 4.3*)
  Yes | 3537 | 31 | .35 | .29 (.03) | .32 | .15 | .10 | 57% | 69.4** | 13 | .21
  No | 385 | 6 | .16 | .13 (.09) | .14 | .23 | .19 | 71% | 18.8** | -- | --

Battery Included Personality (Q mod = 0.1)
  Yes | 2981 | 25 | .23 | .25 (.02) | .28 | .12 | .07 | 39% | 38.3* | 10 | .20
  No | 941 | 12 | .29 | .28 (.08) | .31 | .26 | .23 | 81% | 50.3** | 0 | --

Battery Included Biodata (Q mod = 0.3)
  Yes | 1122 | 8 | .19 | .26 (.07) | .28 | .19 | .16 | 79% | 29.2** | -- | --
  No | 2800 | 29 | .26 | .28 (.03) | .30 | .16 | .10 | 52% | 57.2** | 7 | .23

Assessment Included Interview (Q mod = 2.0)
  Yes | 3618 | 29 | .23 | .25 (.02) | .27 | .12 | .07 | 42% | 48.9** | 9 | .20
  No | 304 | 8 | .37 | .38 (.12) | .42 | .34 | .29 | 76% | 36.6** | -- | --

Standardization of Assessment Battery (Q mod = 0.0)
  Same Procedure | 3134 | 26 | .25 | .30 (.03) | .33 | .17 | .13 | 69% | 74.7 | 10 | .21
  Diff. Procedure | 495 | 6 | .28 | .28 (.04) | .31 | .10 | .00 | 0% | 2.3 | -- | --

Single vs. Multiple Assessors (Q mod = 0.9)
  Single Assessor | 1570 | 15 | .20 | .24 (.04) | .27 | .14 | .09 | 46% | 26.1* | 6 | .18
  Multiple Assessors | 1998 | 16 | .26 | .28 (.03) | .31 | .11 | .06 | 35% | 24.5 | 5 | .23

Same vs. Different Assessors Across Candidates (Q mod = 4.4*)
  Same Assessor | 421 | 9 | .38 | .40 (.07) | .44 | .21 | .14 | 47% | 14.4 | -- | --
  Different Assessors | 2948 | 23 | .24 | .24 (.03) | .27 | .15 | .12 | 67% | 61.7** | 0 | --

Occupation (Q mod = 5.6*)
  Managers | 2346 | 22 | .28 | .32 (.03) | .35 | .15 | .11 | 58% | 48.1** | 8 | .24
  Non-Managers | 1576 | 15 | .18 | .19 (.04) | .21 | .15 | .10 | 53% | 30.8** | 1 | .19

Source of Recommendation (Q mod = 1.3)
  Assessor | 2910 | 23 | .23 | .25 (.03) | .28 | .12 | .08 | 47% | 43.6** | 7 | .20
  Secondary Source | 750 | 8 | .19 | .18 (.06) | .20 | .17 | .13 | 59% | 15.7* | -- | --

Purpose of Performance Rating (Q mod = 0.3)
  Research | 2317 | 18 | .22 | .27 (.04) | .29 | .16 | .12 | 68% | 50.3** | 1 | .26
  Administrative | 1436 | 14 | .24 | .23 (.04) | .26 | .15 | .11 | 56% | 28.7** | 0 | --

Publication Type (Q mod = 1.1)
  Journal Articles | 1383 | 20 | .29 | .30 (.05) | .33 | .21 | .17 | 68% | 55.7** | 0 | --
  Other Sources | 2539 | 17 | .21 | .23 (.03) | .26 | .11 | .07 | 44% | 28.7* | 6 | .19

Notes. N = combined sample size; K = number of samples; N-weighted r = average sample-size weighted validity; Random Effect = average validity from the random-effects meta-analysis; Corrected = average validity corrected for criterion unreliability; SDr = standard deviation of the observed validities; SDρ = standard deviation of the validities corrected for sampling error; I² = percent of variance beyond sampling error; Q mod = χ² moderator test (shown next to each moderator); Q w/in = χ² test for homogeneity of the observed validities; T&F ΔK = number of effect sizes imputed by the trim and fill analysis; T&F r = trim and fill estimate of the average correlation. All trim and fill estimates were imputed on the left side of the distribution. Trim and fill analyses were not conducted for subgroups with fewer than 10 studies. * p < .05. ** p < .01.
Figure 1
Funnel Plot for Validities Predicting Subjective Criteria
[Funnel plot of the observed validities (x-axis: observed outcome, approximately -0.50 to 1.00) against the standard error (y-axis, approximately 0.00 to 0.31); figure not reproduced here.]
... We also employ a new meta-analytic estimator, namely, the Morris estimator, to deal with the effects of large between-study variance on the meta-analytic mean and variance estimates. The Morris estimator was introduced by Morris et al. (2015) and carefully evaluated relative to other estimators by Brannick et al. (2019). ...
... If, say, there is a moderator with a large effect, such as a .20 correlation point difference between moderator categories, the fact that a single very large study happens to be drawn from one moderator category rather than the other would result in a biased estimate of the overall mean. Therefore, Morris et al. (2015) introduced the Morris estimator, which blends the S-H and Hedges approaches and uses both the sample-weighted mean and the random-effect variance for effect size estimation. Brannick et al.'s (2019) simulation study showed that the Morris estimator displayed smaller differences between parameter and estimate and higher coverage of the 95% confidence interval for the mean than the S-H estimator when the between-study variance was large. ...
Article
Full-text available
Given the centrality of the job performance construct to organizational researchers, it is critical to understand the reliability of the most common way it is operationalized in the literature. To this end, we conducted an updated meta-analysis on the interrater reliability of supervisory ratings of job performance (k = 132 independent samples) using a new meta-analytic procedure (i.e., the Morris estimator), which includes both within- and between-study variance in the calculation of study weights. An important benefit of this approach is that it prevents large-sample studies from dominating the results. In this investigation, we also examined different factors that may affect interrater reliability, including job complexity, managerial level, rating purpose, performance measure, and rater perspective. We found a higher interrater reliability estimate (r = .65) compared to previous meta-analyses on the topic, and our results converged with an important, but often neglected, finding from a previous meta-analysis by Conway and Huffcutt (1997), such that interrater reliability varies meaningfully by job type (r = .57 for managerial positions vs. r = .68 for nonmanagerial positions). Given this finding, we advise against the use of an overall grand mean of interrater reliability. Instead, we recommend using job-specific or local reliabilities for making corrections for attenuation.
... However, it is also essential to consider the psychological impact of perceived inadequacy, including lowered self-esteem and decreased motivation (Boduszek & Debowska, 2020), which can further hinder performance. Individual inadequacy can thus be understood as a lack of competence -that is, that the individual's competence is inadequate in the face of a certain situation or task (Morris et al., 2015). Previous studies of extreme contexts, such as US navy seal training, have also resorted to individual's dealing with breakdowns as learning to "embrace the suck" (Fraher et al., 2017). ...
Article
Full-text available
This article investigates how participants in simulations of extreme events handle inadequacy, contributing to the discussion on workplace learning in high-pressure and unpredictable scenarios. The study is based on ethnographic fieldwork conducted across five simulations in three organizations (military, police, and county administrative board), involving 288 h of observations, ethnographic interviews, and 18 semi-structured interviews. The analysis focused on identifying episodes where participants encountered inadequacy, exploring how they recognized, attributed, and addressed it. Our findings reveal that inadequacy disrupts routine practices but also fosters opportunities for learning and innovation. Key conditions for effectively handling inadequacy include the voicing of inadequacy, which requires psychologically safe environments, and proactive responses such as improvisation or acceptance under urgency. Additionally, simulations, while controlled and artificial, effectively expose inadequacies, revealing gaps in preparedness that can inform future crisis responses. This article contributes to professional learning by highlighting inadequacy as a critical factor in both individual and collective learning, offering insights into how simulations can be designed to enhance preparedness for unpredictable, high-stakes events.
... This would be odd behavior because advice would typically be given conversationally and some explanation for the reasoning behind the decision would be shared. As another example, it is common practice for consultants who conduct individual assessments to explain their human advice in an assessment report (Morris et al., 2015;Ryan & Sackett, 1987). Also, clinical and neuropsychologists communicate assessment results using narrative reports (Butcher et al., 2000). ...
Article
Full-text available
Human advisors typically explain their reasoning, which is absent when advice is given by an algorithm in the form of a mere number. We hypothesized that decision maker perceptions (e.g., trust), use of algorithmic advice, and hence judgment consistency and accuracy would improve if an algorithm ‘explains itself’. We recruited 1,202 English-speaking adults via Prolific who predicted the performance of a draw of 40 job candidates based on their assessment information and algorithmic advice. We used a 2 (narrative advice: yes/no) × 2 (narrative algorithm information: yes/no) × 2 (algorithmic advice as default: yes/no) between-subjects design. The first factor varied whether participants received mere numeric algorithmic advice or numeric advice plus a short case-by-case narrative explanation based on the specific candidate information. The second factor varied whether, before the task, the algorithm’s design and predictor weight choice were introduced in a narrative manner by a human character, using first-person language or in a descriptive manner. The third factor varied whether participants’ predictions defaulted to the algorithmic advice or an irrelevant value. Most effects were detectable but small in magnitude. The results showed that participants used narrative advice somewhat more than mere numeric advice, but only when their prediction did not default to the advice. Furthermore, participants had more trust, stronger feelings of human interaction, higher judgment consistency, and higher intentions to use the algorithm for future decisions when they received case-by-case narrative advice. People seem to feel more comfortable with algorithmic advice when receiving an explanation for each decision.
... Regarding the methods used to combine assessment and selection data, Kuncel et al. (2013) observed that the mechanical combination of assessment data, such as the use of algorithms and formulae, showed stronger validity for predicting various work- and academic-related criteria than the holistic or clinical method of combining data, such as relying on expert judgment and intuition. Morris et al. (2015) conducted a meta-analysis of studies that used multiple assessment methods and found that assessors' recommendations usefully predicted job performance, although effect sizes were higher for managerial than for non-managerial jobs and for assessments that included a cognitive ability test. ...
... While the predictive validity of emotional intelligence for professional performance is low (e.g., O'Connor and Little 2003), the predictive power of general cognitive performance has proven to be good in many studies (e.g., Kotsou et al. 2019; Morris et al. 2015; Salgado et al. 2003). Nonetheless, HR managers' interest in the construct of emotional intelligence remains high (e.g., Devonish 2016). ...
Article
Full-text available
The call for evidence-based decisions in HR has become a heated debate in recent years. An alleged research-practice gap has been identified by a number of HRM scholars, leading to recommendations for practice. To what extent the assumption of this gap is justified, theoretically or empirically, remains vague, however. Thus, building on a systematic literature search and the formulation of eligibility criteria for articles, we conducted a scoping review of the current research landscape. Our aim was to explore the constituent components, causes and consequences of the gap. Overall, it was found that research activity has so far been heterogeneous, a significant number of articles were conceptually driven, and a large proportion related to knowledge deficits of HR practitioners. A subset of consistent survey-based studies indicated little awareness of empirically supported practices in personnel selection. The qualitative, mixed-method, and content-analysis studies revealed other influences, such as research with limited practical relevance or divergent interests between scholars and practitioners (e.g., employee motivation). Based on the conceptual contributions, three thematic clusters were identified as causes for the gap: (1) communication barriers (e.g., insufficient interfaces), (2) methodological issues (e.g., rigor-relevance tensions), (3) accessibility, visibility, and dissemination of HR research (e.g., oversimplification of practical implications). There was a strong emphasis on presumed causes and their resolution, with less consideration given to the expected consequences of the gap (e.g., poorer organizational outcomes). Despite preliminary empirical indications for the existence of a research-practice gap in particular areas of HRM, many articles tend to focus on overarching recommendations for practice. We conclude that the HRM research-practice gap in itself has not yet been sufficiently empirically investigated. In view of this, we discuss implications and develop an agenda for future research.
... Conversely, mechanical judgment (statistical prediction) utilizes standardized formulas, algorithms, or scoring systems, often based on empirical data, ensuring consistency and reducing potential bias (Meehl, 1954). Clinical judgment has shown lower validity (Grove et al., 2000), including in personnel selection (Morris et al., 2015), than mechanical judgment (Kuncel et al., 2013). Highhouse and Brooks (2023) review how the prevailing dichotomy between mechanical and clinical data combination methods continues to shape decision-making in employee selection. ...
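To make the mechanical-versus-clinical distinction concrete, the minimal Python sketch below is not drawn from any of the cited studies; the predictor names and weights are purely illustrative assumptions. A mechanical combination applies the same explicit formula to every candidate, whereas a clinical judgment lets the effective weighting shift from case to case.

```python
def mechanical_composite(scores, weights):
    """Combine standardized predictor scores using fixed, explicit weights."""
    return sum(weights[k] * scores[k] for k in weights)

# Hypothetical candidate profile (z-scores) and weights, e.g., from a prior
# regression or a rational weighting scheme -- not values reported in any cited study.
candidate = {"cognitive_ability": 0.8, "structured_interview": 0.3, "conscientiousness": -0.2}
weights = {"cognitive_ability": 0.5, "structured_interview": 0.3, "conscientiousness": 0.2}

print(round(mechanical_composite(candidate, weights), 2))  # -> 0.45

# A holistic (clinical) judgment, by contrast, weighs the same information
# "in the head", so the effective weights can drift from candidate to candidate.
```

The point of the sketch is consistency: because the weights are fixed, two candidates with identical scores always receive identical composites, which is the property the comparative literature credits for the validity advantage of mechanical combination.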
... A meta-analysis of 24 studies [28] that targeted these workers and included measures as simple as offering fruit at reduced prices in vending machines, providing access to professional nutritional advice, or encouraging physical exercise by giving away sports passes or organizing sports competitions among company members achieved an average weight reduction of 1.2 kg and a decrease in BMI of 0.47 kg/m² after one year of implementation. At first glance, this decrease may not seem large, but given that most of the population aged 18-49 years gains, on average, between 0.5 and 1 kg per year, it offsets the average weight gain that would otherwise accompany the aging of the workforce. ...
Article
Full-text available
The work environment is a factor that can significantly influence the composition and functionality of the gut microbiota of workers, in many cases leading to gut dysbiosis that will result in serious health problems. The aim of this paper was to provide a compilation of the different studies that have examined the influence of jobs with unconventional work schedules and environments on the gut microbiota of workers performing such work. As a possible solution, probiotic supplements, via modulation of the gut microbiota, can moderate the effects of sleep disturbance on the immune system, as well as restore the dysbiosis produced. Rotating shift work has been found to be associated with an increase in the risk of various metabolic diseases, such as obesity, metabolic syndrome, and type 2 diabetes. Sleep disturbance or lack of sleep due to night work is also associated with metabolic diseases. In addition, sleep disturbance induces a stress response, both physiologically and psychologically, and disrupts the healthy functioning of the gut microbiota, thus triggering an inflammatory state. Other workers, including military, healthcare, or metallurgy workers, as well as livestock farmers or long-travel seamen, work in environments and schedules that can significantly affect their gut microbiota.
... As mentioned above, many empirical studies and meta-analyses have convincingly shown that following structured decision rules results in better prediction than combining information "in the head" (Meehl, 1954; Kuncel et al., 2013; Grove et al., 2000; Karelaia & Hogarth, 2008; Ægisdóttir et al., 2006; Morris et al., 2015). More specifically, Dawes et al. (1989) cited almost 100 comparative studies and found that the statistical method performed better than the holistic method. ...
Chapter
Full-text available
When it comes to decision-making based on psychological and educational assessments, there is compelling evidence that statistical judgment is superior to holistic judgment. Yet, implementing this finding in practice has proven difficult for both academic and professional psychologists. Knowledge transfer from research findings to practitioners and other stakeholders in psychological assessment is a necessary condition to close this gap. To obtain insight into how academic specialists in psychological testing disseminate knowledge about research findings in this area, we first investigated how textbooks on testing and guidelines on test use report on, or fail to report on, decision-making in psychological and educational assessment. Second, we discuss some commonly encountered misunderstandings, and, third, we argue for a broader and more in-depth dissemination of research findings on this topic in textbooks and test standards; to this end, we provide some suggestions.
Article
Full-text available
As the COVID‐19 pandemic wanes, many organizations are asking employees to return to the office concerned that more extensive remote work could hurt employee morale and productivity. Employees, however, prefer to work remotely because of the flexibility it provides. In light of such competing perspectives, we conducted a meta‐analysis examining remote work intensity's (RWI) effects on employee outcomes. RWI refers to the extensiveness of remote work ranging from one or two days a week to full‐time remote work. We propose a dual pathway model linking RWI to employee outcomes arguing that it has indirect but opposing effects on the same outcomes via two mediators—perceived autonomy and isolation. Findings from a meta‐analysis of RWI's effects based on 108 studies (k = 110, N = 45,288) support the dual pathway model. Allaying organizational concerns about remote work, RWI had overall small but beneficial effects on multiple consequential employee outcomes including job satisfaction, organizational commitment, perceived organizational support, supervisor‐rated performance, and turnover intentions. We also conducted a meta‐analysis of the effects of remote work use (RWU), a binary construct taking on two values—remote workers (users) versus office‐based workers (non‐users of remote work). Findings from the RWU meta‐analysis based on 62 studies (k = 63, N = 41,904) suggest that remote workers generally have better outcomes than their office‐based colleagues. Altogether, findings suggest that remote work offers modest upsides with limited downsides—even for those who spend more time working away from the office.
Article
Full-text available
Unit human capital resources (HCR) are vital to performance across organizational levels. Crucially, the benefits of unit HCR often hinge on resource access and effective resource management. Yet, how units manage HCR remains unclear. We first review findings from human resource management (HRM) and unit leadership literatures relating to unit HCR, which have evolved separately despite their shared goals. Using our review as a foundation, we offer an integrative model highlighting the ways unit leaders can leverage HRM practices and their leadership behaviors for the greatest impact on unit HCR. In so doing, we identify a potentially potent nexus for scholars of both disciplines to focus their integrative efforts on—unit leaders—given their responsibility for HRM practice delivery (e.g., implementing a job rotation program) and their own leadership behaviors (e.g., composing teams). We conclude by highlighting future research questions, opportunities for theoretical integration, and expanding empirical examination.
Article
Full-text available
This study compares the effects of data-driven assessor training with schema-driven assessor training and a control training. The sample consisted of 229 industrial and organizational psychology students and 161 managers who were randomly assigned to 1 of these training strategies. Participants observed and rated candidates in an assessment center exercise. The data-driven and schema-driven assessor training approaches outperformed the control training on all 3 dependent variables. The schema-driven assessor training resulted in the largest values of interrater reliability, dimension differentiation, and accuracy. Managers provided significantly more accurate ratings than students but distinguished less between the dimensions. Practical implications regarding the design of assessor trainings and the composition of assessor teams are proposed.
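For readers unfamiliar with the dependent variables in such designs, the following minimal sketch, with entirely made-up ratings, shows one simple way interrater consistency can be indexed: the mean pairwise correlation between assessors' ratings across candidates (other indices, such as intraclass correlations, are also common).

```python
import numpy as np

# Made-up ratings for illustration only: rows are candidates, columns are assessors.
ratings = np.array([
    [3.0, 3.5, 3.0],
    [4.0, 4.5, 4.0],
    [2.0, 2.5, 3.0],
    [5.0, 4.5, 5.0],
    [3.5, 3.0, 3.5],
])

# Correlate each assessor's ratings with every other assessor's across candidates,
# then average the unique pairwise correlations as a simple consistency index.
corr = np.corrcoef(ratings.T)
pairwise = corr[np.triu_indices_from(corr, k=1)]
print(round(pairwise.mean(), 2))
```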
Article
Full-text available
This article summarizes the practical and theoretical implications of 85 years of research in personnel selection. On the basis of meta-analytic findings, this article presents the validity of 19 selection procedures for predicting job performance and training performance and the validity of paired combinations of general mental ability (GMA) and the 18 other selection procedures. Overall, the 3 combinations with the highest multivariate validity and utility for job performance were GMA plus a work sample test (mean validity of .63), GMA plus an integrity test (mean validity of .65), and GMA plus a structured interview (mean validity of .63). A further advantage of the latter 2 combinations is that they can be used for both entry level selection and selection of experienced employees. The practical utility implications of these summary findings are substantial. The implications of these research findings for the development of theories of job performance are discussed.
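As a rough illustration of how the validity of a "paired combination" of predictors can be derived, the sketch below applies the standard two-predictor multiple-correlation formula. The input correlations are placeholder values chosen for illustration only; they are not the exact figures reported in the article's tables.

```python
from math import sqrt

def multiple_r(r1, r2, r12):
    """Multiple correlation of the criterion with an optimally weighted
    composite of two predictors, given each predictor's validity (r1, r2)
    and the correlation between the predictors (r12)."""
    return sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

# Placeholder (assumed) inputs: one predictor with validity .51, a second
# predictor with validity .41, and a predictor intercorrelation of .25.
print(round(multiple_r(0.51, 0.41, 0.25), 2))  # -> 0.59
```

The composite validity exceeds either predictor alone whenever the second predictor adds non-redundant information, which is the logic behind pairing general mental ability with a second, weakly correlated procedure.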
Book
Full-text available
The intent of this book is to review the research on selection interviews from an integrative perspective. The book is organized around a conception of the interview as a multistage process. The process begins as the interviewer forms initial impressions of the applicant from previewing paper credentials and from initial encounters with the applicant. The actual face-to-face interview follows, consisting of verbal, nonverbal, and paralinguistic exchanges between interviewer and applicant. The process concludes with the interviewer forming final impressions and judgments of the applicant's qualifications and rendering a decision (e.g., hire, reject, gather more information). The book follows from this general sequence of events, with each chapter focusing on a stage of the interview. In exploring the phases of the interview, the text draws freely from basic research on social cognition, decision making, information processing, and social interaction.
Chapter 1: An overview of selection interview research and practice
Chapter 2: Cognitive processes of the interviewer
Chapter 3: First encounters: Impression formation in the preinterview phase
Chapter 4: Social interaction in the interview
Chapter 5: Final impressions: Judgments and decisions in the post-interview phase
Chapter 6: Alternative models of the interview process
Chapter 7: Evaluating the selection interview
Chapter 8: Legal issues in selection interviews
Chapter 9: Strategies for improving selection interviews
Chapter 10: Other functions of the interview
Chapter 11: Concluding comments
References, Author Index, Index
Article
A comparison of the validities of mechanically-derived and consensus-derived assessment center ratings found no significant differences between the two approaches for the prediction of on-the-job performance. No significant differences were found in the predictive validities on any of the dimensions. Adverse impact percentages were almost identical between the two approaches.
Article
Individual assessment is a professional practice important to human resource managers, executives, and anyone making decisions about employees. Finally, we now have a clear, practical guide with methodologically grounded descriptions of how to conduct it successfully. The authors have put together a unique new book with the following key features: case studies and applied examples showing "how to" conduct individual assessment; a conceptual structure together with the research and literature supporting the process; and suitability for use as a text or supplemental text in courses on personnel selection, assessment, human resources, and testing. This book will take individual assessment to an entirely new level of understanding and practice, and into a new era of professional research and activity. © 2003 by Lawrence Erlbaum Associates, Inc. All rights reserved.