RESEARCH REPORT
The Validity of Interpersonal Skills Assessment Via Situational Judgment
Tests for Predicting Academic Success and Job Performance
Filip Lievens
Ghent University
Paul R. Sackett
University of Minnesota, Twin Cities Campus
This study provides conceptual and empirical arguments for why an assessment of applicants’ procedural
knowledge about interpersonal behavior via a video-based situational judgment test might be valid for
academic and postacademic success criteria. Four cohorts of medical students (N = 723) were followed
from admission to employment. Procedural knowledge about interpersonal behavior at the time of
admission was valid for both internship performance (7 years later) and job performance (9 years later)
and showed incremental validity over cognitive factors. Mediation analyses supported the conceptual link
between procedural knowledge about interpersonal behavior, translating that knowledge into actual
interpersonal behavior in internships, and showing that behavior on the job. Implications for theory and
practice are discussed.
Keywords: interpersonal skills, situational judgment test, high-stakes testing, student selection, medical
selection
Terms such as “personal characteristics,” “soft skills,” “noncog-
nitive skills,” and “21st-century skills” are often used to refer to a
wide array of attributes (e.g., resilience, honesty, teamwork skills)
viewed as valuable in many settings, including work and higher
education. There is long-standing interest in measuring these noncognitive predictors for use in selection or admission because they go beyond cognitive ability, predict various success criteria, and reduce adverse impact (Kuncel & Hezlett, 2007; Schmitt et al.,
2009; Sedlacek, 2004). We focus in this article on one character-
istic of broad interest, namely, interpersonal skills. This refers to
skills related to social sensitivity, relationship building, working
with others, listening, and communication (Huffcutt, Conway,
Roth, & Stone, 2001; Klein, DeRouin, & Salas, 2006; Roth,
Bobko, McFarland, & Buster, 2008). Over the years, a variety of
approaches for measuring interpersonal skills have been proposed
and examined. In workplace settings, methods that involve direct
observation of interpersonal behavior are widely used (e.g., inter-
views, work samples, and assessment center exercises; Arthur,
Day, McNelly, & Edens, 2003; Ferris et al., 2007; Huffcutt et al.,
2001; Roth et al., 2008). In higher education admissions settings,
such direct approaches are uncommon, as it is difficult to apply them reliably and formally in the large-scale, high-stakes context of student admissions (Schmitt et al., 2009).
Situational judgment tests (SJTs) are a step removed from direct
observation, and are better viewed as measures of procedural
knowledge in a specific domain (e.g., interpersonal skills). SJTs
confront applicants with written or video-based scenarios and ask
them to indicate how they would react by choosing an alternative
from a list of responses (Christian, Edwards, & Bradley, 2010;
McDaniel, Hartman, Whetzel, & Grubb, 2007).
SJTs are widely used in employment contexts, and there is also growing interest in their potential use in student admission/selection
settings (Lievens, Buyse, & Sackett, 2005a; Oswald, Schmitt,
Kim, Ramsay, & Gillespie, 2004; Schmitt et al., 2009). The initial
evidence for the use of SJTs in such high-stakes settings is prom-
ising. In particular, Lievens et al. (2005a) found that a video-based
SJT predicted grade point average (GPA) in interpersonal skills
courses. Although the results of these studies are promising, they
do not address the key issue as to whether students’ interpersonal
skills scores on a video-based SJT at the time of admission really
achieve long-term prediction of performance in actual interper-
sonal situations. Such interpersonal interactions can be observed in
some academic settings (e.g., in internships) and subsequently on
the job after completion of a program of study.
Therefore, we use a predictive validation design to examine
whether an assessment of interpersonal skills via the use of SJTs at
the time of admission predicts academic (internship performance)
and postacademic (job performance) criteria. Our study is situated
in the context of selection to medical school in Belgium. Medical
admissions are a relevant setting for examining the value of
interpersonal skills assessments for predicting academic and posta-
cademic success because in many countries, calls have been made
to include a wider range of skills in medical admission (Barr, 2010;
Bore, Munro, & Powis, 2009; Powis, 2010).
This article was published Online First October 3, 2011.
Filip Lievens, Department of Personnel Management and Work and Organizational Psychology, Ghent University, Belgium; Paul R. Sackett, Department of Psychology, University of Minnesota, Twin Cities Campus.
We thank Tine Buyse for her help in collecting the data.
Correspondence concerning this article should be addressed to Filip Lievens, Department of Personnel Management and Work and Organizational Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium. E-mail: filip.lievens@ugent.be
Journal of Applied Psychology © 2011 American Psychological Association
2012, Vol. 97, No. 2, 460–468 0021-9010/12/$12.00 DOI: 10.1037/a0025741
Our interest in predicting outcomes other than academic performance, particularly outcomes after students have completed their studies and entered their chosen profession, fits in a growing trend
in graduate and professional education (Kuncel & Hezlett, 2007;
Kuncel, Hezlett, & Ones, 2004; Schultz & Zedeck, 2008). This
trend has also generated some discussion. Specifically, two differ-
ing perspectives on the appropriate role of tests used in the admis-
sion process can be distinguished. One perspective posits that
predicting academic outcomes (e.g., GPA) is an insufficient basis
for the justification of the use of admission tests, as academic
outcomes are not of interest in and of themselves. From this
perspective, links to postacademic (employment) performance are
essential for justifying test use. A competing perspective is that the
goal of admission testing is to identify students who will develop
the highest level of knowledge and skill in the curriculum of
interest, as such knowledge and skill is the foundation for effective
practice in a profession. From this perspective, research linking
admissions tests to postacademic outcomes is not critical to the
justification of test use. This debate on the appropriate role of
admissions tests reflects value judgments, and thus cannot be
resolved by research. Nonetheless, regardless of whether links to
postacademic outcomes are or are not viewed as critical to justi-
fying test use, we believe that an investigation of these linkages
helps to better understand what tests can and cannot accomplish.
By linking a test given as part of an admission process with job
performance subsequent to degree completion, this study also
spans the education and employment worlds.
Interpersonal Skills and SJTs
Recent taxonomies of interpersonal skills (Carpenter & Wise-
carver, 2004; Klein et al., 2006) make a distinction between two
metadimensions, namely, building and maintaining relationships
(e.g., helping and supporting others) and communication/
exchanging information (e.g., informing and gathering informa-
tion). In the medical field, “building and maintaining relationships” and “communication/exchanging information” have also been proposed as key dimensions for effective physician–patient
interactions (Accreditation Council for Graduate Medical Education,
2010; Makoul, 2001). For example, the clinical skills examination
checklist used by the National Board of Medical Examiners includes
items such as listening carefully to patients or encouraging patients to
express thoughts and concerns (indicators of building and maintaining
relationships) and using words that patients can understand or using
clear and precise speech (indicators of communication/exchanging
information).
In this study, students’ interpersonal skills in the physician–
patient interaction were assessed via a video-based SJT. A higher
fidelity video-based format is particularly relevant in this context
as meta-analytic evidence has revealed a large validity difference
between video-based and written SJTs (.47 vs. .27) for measuring
interpersonal skills (Christian et al., 2010). The SJT contained
video-based situations related to the dimensions of “building and
maintaining relationships” and “communication/exchanging infor-
mation,” which is consistent with the emphasis placed on those
dimensions by interpersonal skills taxonomies and medical exam-
ination boards. For example, the SJT included situations dealing
with showing consideration and interest, listening to patients,
conveying bad news, reacting to patients’ refusal to take the
prescribed medicine, or using appropriate language for explaining
technical terms. No medical knowledge was necessary to complete
the SJT.
The assumption underlying the use of an SJT in a medical
admission context was that students’ scores on such interperson-
ally oriented SJT situations at the time of admission will predict
their performance in future actual interactions with patients, as
observed/rated during internships and many years later on the job.
So, even though students at the time of admission might not have
any experience as physicians with patient situations, we expect
their answers on the video-based SJT situations to be predictive of
their future internship and job performance. Conceptually, this
expectation is based on the theory of knowledge determinants
underlying SJT performance (Motowidlo & Beier, 2010; Motow-
idlo, Hooper, & Jackson, 2006). According to Motowidlo et al., an
SJT is a measure of procedural knowledge, which can be broken down into job-specific procedural knowledge and general (non-job-specific) procedural knowledge. Whereas the former is based on job-specific experience, the latter accrues from
experience in general situations. As SJTs used in admission exams
do not rely on job-specific knowledge, only general procedural
knowledge is relevant. Motowidlo defined this general procedural
knowledge as the knowledge somebody has acquired about effec-
tive and ineffective courses of trait-related behavior in situations
like those described in the SJT. Applied to this study’s interper-
sonally oriented SJT, this general procedural knowledge relates to
students’ procedural knowledge about (in)effective behavior in
interpersonal situations (with patients) as depicted in the SJT
items. This procedural knowledge about costs and benefits of
engaging in interpersonally oriented behavior is considered to be a
precursor of actual behavior in future interpersonal situations
(Lievens & Patterson, 2011; Motowidlo & Beier, 2010; Motowidlo
et al., 2006). So, the assumption is that students with substantial
procedural knowledge about effective behavior in interpersonal
situations as assessed by an SJT will also show superior interper-
sonal performance when they later interact with actual patients as
compared with students with inadequate procedural knowledge
about effective interpersonal behavior.
Stability of Interpersonal Skills
When considering the use of measures of procedural knowledge
about interpersonal behavior to predict both academic and posta-
cademic employment criteria that span multiple years, it is impor-
tant to consider the stability of the interpersonal skills construct.
Hypothesizing such relationships requires the assumption of some
degree of stability in students’ interpersonal knowledge and skills
as they progress through the curriculum and onto the job. In the
vast literature on dynamic criteria (Alvares & Hulin, 1972; Barrett,
Phillips, & Alexander, 1981; Campbell & Knapp, 2001; Deadrick
& Madigan, 1990; Ghiselli, 1956; Schmidt, Hunter, Outerbridge,
& Goff, 1988), this issue as to whether individuals change over
time is captured by the “changing person” model. In the ability
domain, the changing person explanation has now been largely
rejected as postdictive validities appear to follow the same patterns
of changes as predictive validities (Humphreys & Taber, 1973;
Lunneborg & Lunneborg, 1970). Similar arguments of stability
have been made for personality traits, as meta-analytic evidence
(Fraley & Roberts, 2005; Roberts & DelVecchio, 2000) suggests
that rank-order stability is high (Caspi, Roberts, & Shiner, 2005).
At first glance, one might expect a different pattern of findings
in the interpersonal skills domain, as training programs and
courses aim to change these skills, and are successful at doing so.
Arthur, Bennett, Edens, and Bell’s (2003) meta-analysis of training
program effectiveness reports mean ds for interpersonal skills of
0.68 for learning criteria and 0.54 for behavioral criteria. However,
in considering the implications of this for the changing person
model, it is important to consider the implications of different
types of change. Here we consider four possible ways for inter-
personal skills to be changed by intervention. First, an intervention
might improve the skills of all individuals by a comparable
amount, in which case the validity of a predictor of interpersonal
skills would be unaffected. Second, an intervention might improve
the skills of those with severe deficits, but have little impact on
those with good skills. In this case, it is possible that rank order is
unchanged; all that is seen is a tightening of the distribution, and
the validity of an interpersonal skills predictor is also affected only
to a limited degree. Third, the intervention might train all individuals to a common level of interpersonal skill, in which case variance would be reduced to zero, and therefore the validity of a predictor
would also go to zero. Fourth, the intervention might be differen-
tially effective, resulting in substantial change in the rank ordering
of individuals in terms of their interpersonal skills, and thus in
substantial reduction in validity. For example, Ceci and Papierno
(2005) report that it is not uncommon for those with higher
preintervention scores to benefit more from the intervention than
those with lower scores.
Thus, the first two possible forms of “changing abilities” pose
no threat to validity, whereas the last two forms do pose a threat.
However, we note that if either of these latter two forms were the
true state of affairs, one would observe very low pretest–posttest
correlations between measures of interpersonal skills. In contrast,
a high pretest–posttest correlation would be strong evidence
against these latter two forms. We find such evidence in a meta-
analysis by Taylor, Russ-Eft, and Chan (2005) of behavioral
modeling training programs aimed at interpersonal skills. They
reported a mean pretest–posttest correlation of .84 across 21 stud-
ies for the effects of training on job behaviors, which is inconsis-
tent with either the “training eliminates variance” or the “training
radically alters rank order” perspectives. Thus, we expect that the
forms of a “changing persons” argument that would lead to re-
duced validity can also be rejected in the interpersonal skills
domain.
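The four intervention scenarios above can be illustrated with a small simulation; the effect sizes below are arbitrary placeholders, not the meta-analytic values cited. A uniform gain leaves the predictor's validity exactly unchanged, remedial gains for low scorers affect it only slightly, and differential gains that reshuffle the rank order reduce it sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent interpersonal skill and an admission predictor (e.g., an SJT
# score) correlating about .6 with it; all effect sizes are arbitrary.
skill = rng.normal(0, 1, n)
sjt = 0.6 * skill + 0.8 * rng.normal(0, 1, n)

def validity(post_skill):
    """Correlation between the admission predictor and post-training skill."""
    return np.corrcoef(sjt, post_skill)[0, 1]

baseline = validity(skill)

# 1. Uniform improvement: everyone gains the same amount, so the rank
#    order, and hence the validity, is unchanged.
uniform = validity(skill + 0.68)

# 2. Remedial gains: only those with deficits improve; the distribution
#    tightens but validity is affected only to a limited degree.
remedial = validity(np.where(skill < -1, skill + 0.8, skill))

# 3. Training everyone to a common level would remove all variance, so
#    validity would go to zero (not simulated: the correlation is undefined).
# 4. Differential effectiveness: gains unrelated to initial standing
#    reshuffle the rank order and sharply reduce validity.
differential = validity(skill + rng.normal(0, 2, n))

print(baseline, uniform, remedial, differential)
```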
Hypotheses
On the basis of the above discussion, we posit the following
hypotheses:
Hypothesis 1a (H1a): Procedural knowledge about interper-
sonal behavior (assessed by an SJT at the time of admission)
will be a valid predictor of internship performance.
Hypothesis 1b (H1b): Procedural knowledge about interper-
sonal behavior (assessed by an SJT at the time of admission)
will be a valid predictor of job performance.
Using the behavioral consistency logic (Schmitt & Ostroff,
1986), these first hypotheses might be tested together in a mediated
model. One might consider performance on an interpersonally
oriented SJT, internship performance with patients, and job per-
formance with actual patients as assessments of interpersonal skills
that differ in their degree of fidelity. Here, fidelity refers to the
extent to which the assessment task and context mirror those
actually present on the job (Callinan & Robertson, 2000; Gold-
stein, Zedeck, & Schneider, 1993). As noted above, an interper-
sonal SJT serves as an interpersonal skills assessment with a low
degree of fidelity because it assesses people’s procedural knowl-
edge about interpersonal behavior instead of their actual behavior.
Next, internship performance can be considered a high-fidelity
assessment (Zhao & Liden, 2011) because during internships,
individuals have the opportunity to interact with real patients under
close supervision. However, they work only with a small number
of patients and have no responsibility for those patients. Finally,
physicians’ demonstration of interpersonal skills with actual pa-
tients on the job is no longer a simulation as it constitutes the
criterion to be predicted. All of this suggests a mediated model
wherein procedural knowledge about interpersonal behavior (low-
fidelity assessment) predicts internship performance (high-fidelity
assessment), which in turn predicts job performance (criterion). So
far, no studies have scrutinized these causal links between low-
fidelity/high-fidelity samples and the criterion domain. We posit
the following:
Hypothesis 2 (H2): The relationship between procedural
knowledge about interpersonal behavior (low-fidelity assess-
ment) and job performance (criterion) will be mediated by
internship performance (high-fidelity assessment).
It should be noted that internship and job performance are
multidimensional because in this study, performance on both involves a combination of interpersonal and technical knowledge/skills. So, internship and job performance are saturated with
both interpersonal and technical/medical skills. Saturation refers to
how a given construct (e.g., interpersonal skills) influences a
complex multidimensional measure (e.g., ratings of job perfor-
mance; Lubinski & Dawis, 1992; Roth et al., 2008). The multidi-
mensional nature of internship and job performance makes it
useful to examine the incremental contribution of procedural
knowledge about interpersonal behavior over and above other
factors (i.e., cognitive factors) influencing performance. In partic-
ular, we expect the SJT to offer incremental validity over cognitive
factors in predicting internship and job performance. Thus,
Hypothesis 3a (H3a): Procedural knowledge about interper-
sonal behavior (assessed by an SJT) will have incremental
validity over cognitive factors for predicting internship per-
formance.
Hypothesis 3b (H3b): Procedural knowledge about interper-
sonal behavior (assessed by an SJT) will have incremental
validity over cognitive factors for predicting job performance.
Method
Sample and Procedure
This study’s sample consisted of 723 students (39% men and 61% women; average age = 18 years and 4 months; 99.5%
Caucasian) who completed the Medical Studies Admission Exam
in Belgium, passed the exam, and undertook medical studies. This
sample includes four entering cohorts of students who had com-
pleted the exam between 1999 and 2002. We focused on these four
cohorts because criterion data for the full curriculum (7 years)
were available from them. On average, the passing rate of the
admission exam was about 30%. Candidates who passed the exam received a certificate that granted entry to any medical university. Thus, there was no further selection on the part of the
universities.
This sample came from a population of 5,444 students (63%
women, 37% men; average age = 18 years and 10 months; 99.5%
Caucasian) who completed the examination between 1999 and
2002. Data on this total applicant pool will be used for range
restriction corrections to estimate validity in the applicant pool
(see below).
Predictor Measures
Students’ scores on the predictor measures were gathered during
the actual admission exam. Each year, the exam lasted for a whole
day and was centrally administered in a large hall. To preserve the
integrity of the tests, alternate forms per test were developed each
year. Therefore, candidates’ test scores were standardized within
each exam.
Cognitive composite. In their meta-analysis, Kuncel, Hezlett,
and Ones (2001) showed that a composite of general measures
(e.g., Graduate Record Exam [GRE] verbal and numerical) com-
bined with specific GRE subject-matter tests provided the highest
validity in predicting academic performance. To provide the stron-
gest test of the incremental validity of interpersonal skills, a
cognitive composite was used that consisted of four science
knowledge test scores (biology, chemistry, mathematics, and physics) and a general mental ability test score. The science-related
subjects consisted of 40 questions (10 questions per subject) with
four alternatives. The general mental ability test consisted of 50
items with five possible answers. The items were formulated in
verbal, numeric, or figural terms. For reasons of test security, we cannot disclose the source of this test. Prior research demonstrated
the satisfactory reliability and validity of this cognitive composite
for a medical student population (Lievens et al., 2005a).
Medical text. In this test, candidate medical students were
asked to read and understand a text on a medical subject.
Hence, this test can be considered a miniaturized sample of tasks
that students will encounter in their education. The texts developed
typically drew on texts from popular medical journals. Students
had 50 min to read the text and answer 30 multiple-choice ques-
tions (each with four possible answers). Across the exams, the
average internal consistency coefficient of this test equaled .71.
Video-based SJT. The general aim of the video-based SJT
was to measure interpersonal skills. As noted above, the video-
based SJT focused on the “building and maintaining relationships”
and “communication/exchanging information” components of in-
terpersonal skills. First, realistic critical incidents regarding these
domains were collected from experienced physicians and profes-
sors in general medicine. Second, vignettes that embedded these
incidents were written. Two professors teaching physicians’ con-
sulting practices tested these vignettes for realism. Using a similar
approach, questions and response options were derived. Third,
semiprofessional actors were hired and videotaped. Finally, a
panel of experienced physicians and professors in general medi-
cine developed a scoring key. Agreement among the experts was
satisfactory, and discrepancies were resolved upon discussion. The
scoring key indicated what response alternative was correct for
each item (1 point). In its final version, the SJT consisted of
videotaped vignettes of interpersonal situations that physicians are
likely to encounter with patients. After each critical incident, the
scene froze, and candidates received 25 s to answer the question
(“What is the most effective response?”). In total, the SJT con-
sisted of 30 multiple-choice questions with four possible answers.
Prior research attests to the construct-related validity of the SJTs
developed as they consistently correlated with scores on interper-
sonally oriented courses in the curriculum (Lievens et al., 2005a).
Prior studies also revealed that alternate form reliability of the
SJTs was .66 (Lievens, Buyse, & Sackett, 2005b), which was
consistent with values obtained in studies on alternate form SJTs
(Clause, Mullins, Nee, Pulakos, & Schmitt, 1998).
Operational composite. To make actual admission deci-
sions, a weighted sum of the aforementioned predictors (cogni-
tively oriented tests, medical text, and SJT) was computed. The
weights and cutoff scores were determined by law, with the most
weight given to the cognitive composite. A minimal cutoff was
determined on this operational composite.
Criterion Measures
Internship performance rating. Two of the authors in-
spected the descriptions of the medical curricula of the universi-
ties. To qualify as an “internship,” students had to work in a
temporary position with an emphasis on on-the-job training and
contact with patients (instead of with simulated patients). Interrater
reliability (ICC[2,1]) among the authors was .90. In four of the
seven academic years (first, fourth, sixth, and seventh year), in-
ternships were identified. Whereas in the first year the internship
had a focus on observation, the other internships were clinical
clerkships lasting 2–4 months in different units. In their intern-
ships, students were evaluated on their technical (e.g., examination
skills) and interpersonal skills (e.g., contact with patients) using
detailed score sheets. Internship ratings of 606 students were
obtained from archival records of the universities. Only the global
internship ratings were available. Those ratings ranged from 0 to
20, with higher scores indicating better ratings. Given differences
across universities, students’ internship ratings were standardized
within university and academic year. As the four internship ratings
were significantly correlated, a composite internship performance
rating was computed.
Job performance rating. Some of the medical students in this study (about 20%, n = 103) who completed their 7 years of education chose a career in general medicine and entered a General
Practitioner training program of up to 2 years duration. During that
program, they worked under the supervision of a registered Gen-
eral Practitioner in a number of general practice placements. At the
end of the training program, all trainees were rated on a scale from
0 to 20 by their supervisor. None of the supervisors had access to
the trainees’ admissions scores. Evaluation sheets similar to those used for internship performance ratings (with a focus on technical and interpersonal skills) were used. Only the global job performance ratings
were available from the archives. As expected, internship and job
performance were moderately correlated (corrected r = .40).
As the above description refers to participants as “trainees,” a
question arises as to whether this should be viewed as a measure
of “training performance” rather than of “job performance.” We
view this as “job performance” in that these graduates are engaged
in full-time practice of medicine. They are responsible for patients
while they work under supervision of a General Practitioner
charged with monitoring and evaluating their work.
GPA. Although no hypotheses were developed for students’
GPA, this criterion (average GPA across 7 years) was included by
way of a comparison with prior studies. Given differences across
universities, students’ GPA was standardized within university and
academic year prior to making a composite.
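The within-university, within-year standardization applied to the internship ratings and GPA can be sketched as a group-wise z-score; the data frame and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical records: internship ratings (0-20) from two universities.
df = pd.DataFrame({
    "university": ["A", "A", "A", "B", "B", "B"],
    "year":       [1, 1, 1, 1, 1, 1],
    "rating":     [14.0, 16.0, 18.0, 10.0, 12.0, 14.0],
})

# z-standardize within each university x academic-year cell so that
# ratings are comparable across institutions with different norms.
df["rating_z"] = (
    df.groupby(["university", "year"])["rating"]
      .transform(lambda x: (x - x.mean()) / x.std(ddof=0))
)

print(df)
```

After the transform, each university-year cell has mean 0 and unit variance, so a student's standing is expressed relative to his or her own institution and cohort.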
Range Restriction
In this study, students were selected on an operational composite
of the various predictors, leading to indirect range restriction on
each individual predictor. Given that indirect range restriction is a
special case of multivariate range restriction, the multivariate
range restriction formulas of Ree, Carretta, Earles, and Albert
(1994) were applied to the uncorrected correlation matrix. As
recommended by Sackett and Yang (2000), statistical significance
was determined prior to correcting the correlations.
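A sketch of this correction under the study's setup: selection is explicit on the operational composite, so the remaining predictors and criteria are incidentally restricted. The function below implements the standard Lawley-type multivariate formulas (of the kind applied by Ree et al., 1994); it is an illustration with simulated data, not the authors' procedure or values.

```python
import numpy as np

def lawley_correct(S, Sigma_pp, p_idx, q_idx):
    """Lawley-type multivariate range restriction correction.

    S        -- covariance matrix in the restricted (selected) group
    Sigma_pp -- covariance of the explicit selection variable(s) in the
                unrestricted applicant pool
    p_idx    -- indices of the explicit selection variable(s)
    q_idx    -- indices of the incidentally restricted variables
    Returns the estimated unrestricted covariance matrix.
    """
    Spp = S[np.ix_(p_idx, p_idx)]
    Spq = S[np.ix_(p_idx, q_idx)]
    Sqq = S[np.ix_(q_idx, q_idx)]

    B = np.linalg.solve(Spp, Spq)                 # regression weights of q on p
    Sigma_pq = Sigma_pp @ B                       # corrected p-q covariances
    Sigma_qq = Sqq + B.T @ (Sigma_pp - Spp) @ B   # corrected q-q covariances

    k = len(p_idx) + len(q_idx)
    Sigma = np.empty((k, k))
    Sigma[np.ix_(p_idx, p_idx)] = Sigma_pp
    Sigma[np.ix_(p_idx, q_idx)] = Sigma_pq
    Sigma[np.ix_(q_idx, p_idx)] = Sigma_pq.T
    Sigma[np.ix_(q_idx, q_idx)] = Sigma_qq
    return Sigma

# Illustration: a composite (variable 0) correlates .50 with a criterion
# (variable 1) in the applicant pool; admitting only the top ~30% on the
# composite attenuates the observed correlation, and the correction
# approximately recovers the pool value.
rng = np.random.default_rng(1)
pool = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], 200_000)
selected = pool[pool[:, 0] > np.quantile(pool[:, 0], 0.70)]

S = np.cov(selected, rowvar=False)
Sigma_pp = np.array([[np.var(pool[:, 0], ddof=1)]])
Sigma = lawley_correct(S, Sigma_pp, [0], [1])

r_restricted = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
r_corrected = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(r_restricted, r_corrected)
```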
Results
Table 1 presents the means, standard deviations, and correla-
tions among the predictors. In Table 2, the correlations between
the predictors and the criteria are shown. The values below the
diagonal of Table 2 represent the corrected correlations between
the predictors and performance. The values above the diagonal are
the uncorrected correlations. Consistent with prior research, the
corrected correlation between the cognitive composite and overall
academic performance (GPA after 7 years) equaled .36. This was
significantly higher than the corrected correlation (.15) between
the SJT and GPA, t(720) = 4.31, p < .001.
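One common procedure for comparing two dependent correlations that share a variable (here, GPA) is the Hotelling-Williams t test. The sketch below plugs in the rounded corrected correlations from the text plus a cognitive-SJT intercorrelation of .20 borrowed from Table 1's applicant-pool values, so it approximates rather than reproduces the reported t(720) = 4.31, which presumably used unrounded values.

```python
import math

def williams_t(r12, r13, r23, n):
    """Hotelling-Williams t test for the difference between two dependent
    correlations r12 and r13 that share variable 1 (here, GPA).
    Returns (t, df), with df = n - 3."""
    det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    t = (r12 - r13) * math.sqrt(
        ((n - 1) * (1 + r23))
        / (2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r23) ** 3)
    )
    return t, n - 3

# cognitive-GPA = .36 and SJT-GPA = .15 (from the text); r23 = .20 is an
# illustrative value taken from Table 1.
t, df = williams_t(0.36, 0.15, 0.20, 723)
print(round(t, 2), df)
```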
Our first hypotheses dealt with the validity of procedural knowl-
edge about interpersonal behavior for predicting internship (H1a)
and job performance (H1b). Table 2 shows that the corrected
validities of the interpersonal SJT for predicting overall internship
performance and supervisory-rated job performance were .22 and
.21. These results support H1a and H1b. Although the corrected
correlations of the SJT with both criteria were higher than the
corrected validities of the cognitive composite (.13 and .10, re-
spectively), it should be noted that these differences were not
statistically significant.
H2 posited that a high-fidelity assessment of interpersonal be-
havior would mediate the relationship between a low-fidelity as-
sessment of that behavior and job performance. All requirements
for mediation according to Baron and Kenny (1986) were met (see
Table 3). The effect of the interpersonal SJT on job performance
dropped from .26 to .10 when the mediator (internship perfor-
mance) was controlled for. To statistically test the mediating role
of internship performance, we used the bootstrapping method
(Preacher & Hayes, 2004; Shrout & Bolger, 2002). This was done
on the uncorrected correlation matrix because it was not possible
to run a bootstrapping procedure on a corrected matrix. Bootstrap-
ping procedures showed a point estimate of .16 for the indirect
effect of the interpersonal SJT on job performance through intern-
ship performance (95% CI [.04, .29]). Thus, H2 was supported.
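The percentile-bootstrap test of the indirect effect (in the spirit of Preacher & Hayes, 2004) can be sketched as follows. Since the raw data are not available, the sketch runs on simulated data with an indirect effect built in; the path values and sample size are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 103  # size of the general-practitioner subsample in this study

# Simulated standardized scores with a built-in indirect effect:
# SJT -> internship performance -> job performance (paths are illustrative).
sjt = rng.normal(0, 1, n)
internship = 0.4 * sjt + rng.normal(0, 1, n)
job = 0.5 * internship + rng.normal(0, 1, n)

def indirect_effect(x, m, y):
    """a*b: the x -> m slope times the m -> y slope controlling for x."""
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]
    return a * b

# Percentile bootstrap: resample cases with replacement, recompute a*b,
# and take the 2.5th and 97.5th percentiles as the 95% CI.
boot = np.array([
    indirect_effect(sjt[idx], internship[idx], job[idx])
    for idx in (rng.integers(0, n, n) for _ in range(5000))
])
lo, hi = np.percentile(boot, [2.5, 97.5])
est = indirect_effect(sjt, internship, job)
print(f"indirect effect = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

If the 95% CI excludes zero, the indirect path through the mediator is taken to be statistically significant, which is the decision rule applied to the reported CI of [.04, .29].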
The third hypotheses related to the incremental validity of
procedural knowledge about interpersonal behavior (as measured
via an SJT) over cognitive factors for predicting internship and job
performance. We conducted hierarchical regression analyses, with
the matrices corrected for multivariate range restriction serving as
input. The cognitive composite was entered as a first block because
such tests have been traditionally used in admissions. As a second
block, the medical text was entered. Finally, we entered the SJT.
Table 4 shows the SJT explained incremental variance in intern-
ship (5%) and job performance (5%), supporting H3a and H3b. It
should be noted, though, that these results were obtained with the
other two predictors not being significant predictors of the criteria.
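Hierarchical regression on a correlation matrix can be sketched as below. The predictor intercorrelations come from Table 1 and the internship validities (.13 for the cognitive composite, .22 for the SJT) from the text; the medical-text validity (.10) is an assumed placeholder not given in this excerpt, so the resulting ΔR² is illustrative rather than a reproduction of the reported 5%.

```python
import numpy as np

def r_squared(Rxx, rxy):
    """Model R^2 from a predictor intercorrelation matrix and
    predictor-criterion correlations (standardized variables)."""
    return rxy @ np.linalg.solve(Rxx, rxy)

# Predictor intercorrelations from Table 1 (applicant pool):
# 0 = cognitive composite, 1 = medical text, 2 = SJT.
Rxx = np.array([
    [1.00, 0.36, 0.20],
    [0.36, 1.00, 0.24],
    [0.20, 0.24, 1.00],
])
# Validities for internship performance; the medical-text value (.10)
# is an assumed placeholder.
rxy = np.array([0.13, 0.10, 0.22])

r2_block2 = r_squared(Rxx[:2, :2], rxy[:2])   # cognitive + medical text
r2_block3 = r_squared(Rxx, rxy)               # ... + SJT entered last
print(f"delta R^2 for the SJT = {r2_block3 - r2_block2:.3f}")
```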
Discussion
This study focused on the assessment of interpersonal skills via
SJTs. Key results were that procedural knowledge about interper-
sonal behavior as measured with an SJT at the time of admission
was valid for both internship (7 years later) and job performance (9
years later). Moreover, students’ procedural knowledge about in-
terpersonal behavior showed incremental validity over cognitive
factors for predicting these academic and postacademic success
criteria, underscoring the role of SJTs as “alternative” predictors in
Table 1
Means, Standard Deviations, and Correlations Among Predictors in the Sample

                            Applicant population (N = 5,439)     Selectees (n = 723)   General practitioners (n = 103)
Variable                      M       SD      1     2     3        M       SD             M       SD
1. Cognitive composite      11.68    2.65                        14.08    1.67          13.60    1.47
2. Written text             15.17    4.74    .36                 16.81    4.47          16.30    4.16
3. SJT                      18.35    3.08    .20   .24           19.30    2.84          20.15    2.66
4. Operational composite    20.66    5.29    .91   .45   .28     24.90    3.89          24.63    4.04

Note. Although all analyses were conducted on standardized scores, this table presents the raw scores across exams. The maximum score on each test was 30, with the exception of the operational composite (maximum score = 40). Both the selectees (i.e., medical students) and the general practitioners are subsamples of the applicant sample. Correlations between the predictors in the applicant group are presented. All correlations are significant at p < .01. SJT = situational judgment test.
464 LIEVENS AND SACKETT
selection. These findings speak not only to the potential role of
interpersonal skills assessment via SJTs in higher education ad-
missions but also to its relevance in the employment world.
A contribution of this study was that we tested conceptual
arguments why the assessment of interpersonal skills via SJTs is
predictive of future academic and postacademic performance. On
the basis of the theory of knowledge determinants underlying SJT
performance (Motowidlo & Beier, 2010; Motowidlo et al., 2006),
we hypothesized that students’ procedural knowledge about costs
and benefits of engaging in specific interpersonally oriented be-
havior that is assessed at the time of admission via an SJT will be
a precursor of future actual behavior in interpersonal situations as
encountered during internships and on the job. Our mediation
analyses confirmed that the link between a low-fidelity assessment
of criterion behavior (i.e., procedural knowledge about interper-
sonal behavior via an SJT) and criterion ratings (job performance)
was mediated by a high-fidelity (internship) assessment of criterion
behavior. So, as a key theoretical contribution, we found support for
a conceptual link between possessing procedural knowledge about
interpersonal behavior, translating that knowledge into actual inter-
personal behavior in constrained settings such as internships, and
showing that interpersonal behavior later on the job.
This study is also the first to establish evidence of the long-term
predictive power of an interpersonal skills assessment via SJTs.
That is, an operational SJT administered at the time of application
for admission retained its validity many years later as a predictor
Table 2
Correlations Among Predictors and Criteria in Selected Sample

Variable                                1     2     3     4     5     6     7
Predictors (N = 723)
1. Cognitive composite                  –    .01   .06   .75   .27   .09   .13
2. Written text                        .13    –    .13   .19   .05   .01   .12
3. SJT                                 .03   .15    –    .11   .10   .21   .21
4. Operational composite               .86   .28   .16    –    .23   .09   .00
Criteria
5. GPA (N = 713)                       .36   .10   .13   .34    –    .60   .38
6. Internship performance (N = 606)    .13   .03   .22   .14   .61    –    .40
7. Job performance (N = 103)           .10   .12   .21   .01   .37   .40    –

Note. Uncorrected correlations are above the diagonal; correlations corrected for multivariate range restriction are below the diagonal. Apart from the last row, correlations higher than .09 are significant at the .05 level and correlations higher than .12 are significant at the .01 level. For the last row, correlations higher than .20 are significant at the .05 level and correlations higher than .26 are significant at the .01 level. SJT = situational judgment test; GPA = grade point average.
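The study corrected for multivariate range restriction because selection occurred on the operational composite, which restricts variance on all predictors. The simpler univariate case conveys the idea: Thorndike's Case II formula disattenuates a correlation observed in the restricted sample using the ratio of unrestricted to restricted predictor standard deviations. A minimal sketch, using the SJT standard deviations from Table 1 purely as an illustration (the multivariate procedure actually used would yield somewhat different values):

```python
import math

def correct_case2(r, sd_restricted, sd_unrestricted):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted   # ratio of applicant SD to selectee SD
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))

# SJT: applicant SD = 3.08, selectee SD = 2.84 (Table 1);
# uncorrected SJT-internship correlation = .21 (Table 2)
r_corrected = correct_case2(0.21, 2.84, 3.08)
```

Because selection on the SJT was incidental (applicants were selected on the composite), this univariate correction is only approximate; the multivariate approach (Sackett & Yang, 2000) models selection on the composite directly.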
Table 3
Regression Results for Mediation

Direct and total effects                             b       SE      t       p
Equation 1
  Dependent variable: Internship performance
  Independent variable: Interpersonal SJT          .32***   .08    3.85    .001
Equation 2
  Dependent variable: Job performance
  Independent variable: Interpersonal SJT          .26*     .12    2.19    .03
Equation 3
  Dependent variable: Job performance
  Independent variable: Internship performance     .52***   .14    3.80    .001
  Independent variable: Interpersonal SJT          .10      .12    0.81    .42

Indirect effect and significance using normal distribution
           Value     SE    LL 95% CI   UL 95% CI     z      p
Sobel       0.17    0.06      .04         .29       2.66   .01

Bootstrap results for indirect effect
             M       SE    LL 95% CI   UL 95% CI
Effect      0.16    0.06      .07         .28

Note. N = 103. Results are based on the uncorrected correlation matrix. Values were rounded. Unstandardized regression coefficients are reported. Bootstrap sample size = 1,000. SJT = situational judgment test; LL = lower limit; UL = upper limit; CI = confidence interval. Listwise deletion of missing data.
* p < .05. *** p < .001.
of academic and postacademic criteria. This finding has theoretical
implications as it informs the discussion of whether it is useful to
select people on skills that will be taught to them later on. As noted
in the introduction, there are a number of possible patterns of
change in interpersonal skills over the course of one’s training.
Attempts to select on the basis of interpersonal skills would be
inappropriate if training eliminated variance (e.g., all were trained
to a common level of skill) or resulted in a change in the rank
ordering of individuals that was unrelated to initial skill level. Both
of these are ruled out by the present findings, namely that predictive
relationships still applied for interpersonal criteria gathered many
years later (i.e., up to 9 years after SJT administration). This is an
important finding, as most prior SJT studies were concurrent in nature
or relied on criteria gathered after only the first 6 months. Moreover,
our results demonstrate that selecting on interpersonal skills early on
is worthwhile; training on them later does not negate the value of selection.
The challenge of broadening both criteria and predictors with
noncognitive factors runs as a common thread through scientific
and societal discussions about selection and admission processes.
At a practical level, our results demonstrate that SJTs might be up
to this challenge as they provide practitioners with an efficient
formal method for measuring procedural knowledge of interper-
sonal skills early on. Accordingly, this study lends evidence for the
inclusion of SJTs in formal school testing systems when decision
makers have made a strategic choice of emphasizing an interper-
sonal skills orientation in their programs. Presently, one commonly
attempts to assess (inter)personal attributes through mechanisms
such as interviews, letters of recommendation, or personal statements,
whereas the formal system focuses only on academic achievement in
science domains and specific cognitive abilities (Barr, 2010). It
should be clear, however, that measures such as SJTs are not designed
to replace traditional cognitive predictors. Instead, they are meant
to increase the coverage of skills not measured by traditional
predictors. In particular, we see two possibilities for implementing
SJTs as measures of procedural interpersonal knowledge in selection
practice alongside cognitive
skills assessment. One is to use the SJT at the general level of
broad admissions exams to college. Another option is to imple-
ment the SJT at the level of school-specific additional screening of
students who have met prior hurdles in the admission process.
However, a caveat that qualifies these practical implications is
in order. A contextual feature worthy of note of this study is that
to the best of our knowledge, there was no flourishing commercial
test coaching industry in Belgium focusing on the SJT at the time
of these cohorts (1999–2002). At that time, coaching mostly
focused on the cognitive part of the exam. In more recent years,
commercial coaching programs related to the interpersonal com-
ponent have also arisen, and it will be useful to examine SJT
validity under this changed field context. So far, only laboratory
studies on SJT coaching (Cullen, Sackett, & Lievens, 2006; Ram-
say et al., 2003) have been reported (with results indicative of
moderate effect sizes).
Several limitations of this study should be mentioned. Like
virtually all studies in the selection literature, this study reflects an
examination of a single testing program in a single setting. We
make no grand claims of generalizability; rather, we believe that it
is useful to illustrate that an assessment of interpersonal skills via
SJTs can predict performance during internships and in a subsequent
work setting. Another limitation is the small sample size (N = 103)
for the analysis of validity against job performance criteria. We
also wish N were larger, but note that we are studying
the entire population of these medical school graduates moving
into general practice. The rarity of studies following individuals
from school entry to subsequent job performance 9 years after
administration of the predictor measure makes this a useful study
to report, in our opinion, despite this limitation. Additional studies
using this strategy are certainly needed before strong conclusions
can be drawn.
In terms of future research, we need more studies that inte-
grate both education and work criteria as they provide a more
comprehensive and robust view of the validity of admission/
selection procedures. Such research might provide important
evidence to relevant stakeholders (e.g., students, admission
systems, schools, organizations, general public) that the selec-
tion procedures used are valid for predicting both academic and
job performance. In the future, the adverse impact of SJTs in
student admissions should also be scrutinized. Along these
lines, Schmitt et al. (2009) provided evidence that the demo-
graphic composition of students was more diverse when SJTs
and biodata measures were used. However, that study was
conducted in a research context. Therefore, studies in which the
potential adverse impact reduction is examined via the use of
noncognitive measures in actual admission and selection con-
texts are needed.
Table 4
Summary of Hierarchical Regression Analyses of Predictors on Internship and Job Performance

                               Internship performance (N = 606)    Job performance (N = 103)
Model   Predictors                β       R²      ΔR²                β       R²      ΔR²
1       Cognitive composite      .13     .02     .02                .12     .01     .01
2       Reading text             .02     .02     .00                .17     .03     .02
3       SJT                      .22**   .06     .05**              .23*    .08     .05*

Note. The corrected correlation matrix served as input for the regression analyses. Parameter estimates are for the last step, not at entry. Due to rounding, ΔR² may differ by .01 from the cumulative R². SJT = situational judgment test.
* p < .05. ** p < .01.
References
Accreditation Council for Graduate Medical Education. (2010). ACGME
general competencies and outcomes assessment for designated institu-
tional officials. Retrieved from http://www.acgme.org/acWebsite/irc/
irc_competencies.asp
Alvares, K. M., & Hulin, C. L. (1972). Two explanations of temporal
changes in ability-skill relationships: A literature review and theoretical
analysis. Human Factors, 14, 295–308.
Arthur, W., Jr., Bennett, W., Jr., Edens, P. S., & Bell, S. T. (2003).
Effectiveness of training in organizations: A meta-analysis of design and
evaluation features. Journal of Applied Psychology, 88, 234 –245. doi:
10.1037/0021-9010.88.2.234
Arthur, W., Jr., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A
meta-analysis of the criterion-related validity of assessment center di-
mensions. Personnel Psychology, 56, 125–153. doi:10.1111/j.1744-
6570.2003.tb00146.x
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable
distinction in social psychological research: Conceptual, strategic, and
statistical considerations. Journal of Personality and Social Psychology,
51, 1173–1182. doi:10.1037/0022-3514.51.6.1173
Barr, D. A. (2010). Science as superstition: Selecting medical students. The
Lancet, 376, 678–679. doi:10.1016/S0140-6736(10)61325-6
Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and
predictive validity designs: A critical reanalysis. Journal of Applied
Psychology, 66, 1– 6. doi:10.1037/0021-9010.66.1.1
Bore, M., Munro, D., & Powis, D. A. (2009). A comprehensive model for
the selection of medical students. Medical Teacher, 31, 1066 –1072.
doi:10.3109/01421590903095510
Callinan, M., & Robertson, I. T. (2000). Work sample testing. Interna-
tional Journal of Selection and Assessment, 8, 248 –260. doi:10.1111/
1468-2389.00154
Campbell, J. P., & Knapp, D. J. (Eds.). (2001). Exploring the limits in
personnel selection and classification. Mahwah, NJ: Lawrence Erlbaum
Associates.
Carpenter, T. D., & Wisecarver, M. M. (2004). Identifying and validating
a model of interpersonal performance dimensions (ARI Technical Re-
port No. 1144). Alexandria, VA: United States Army Research Institute
for the Behavioral and Social Sciences.
Caspi, A., Roberts, B. W., & Shiner, R. L. (2005). Personality develop-
ment: Stability and change. Annual Review of Psychology, 56, 453– 484.
doi:10.1146/annurev.psych.55.090902.141913
Ceci, S. J., & Papierno, P. B. (2005). The rhetoric and reality of gap
closing: When the “have-nots” gain but the “haves” gain even more.
American Psychologist, 60, 149 –160. doi:10.1037/0003-066X.60.2.149
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational
judgment tests: Construct assessed and a meta-analysis of their criterion-
related validities. Personnel Psychology, 63, 83–117. doi:10.1111/
j.1744-6570.2009.01163.x
Clause, C. S., Mullins, M. E., Nee, M. T., Pulakos, E., & Schmitt, N.
(1998). Parallel test form development: A procedure for alternative
predictors and an example. Personnel Psychology, 51, 193–208. doi:
10.1111/j.1744-6570.1998.tb00722.x
Cullen, M. J., Sackett, P. R., & Lievens, F. (2006). Threats to the opera-
tional use of situational judgment tests in the college admission process.
International Journal of Selection and Assessment, 14, 142–155. doi:
10.1111/j.1468-2389.2006.00340.x
Deadrick, D. L., & Madigan, R. M. (1990). Dynamic criteria revisited: A
longitudinal study of performance stability and predictive validity. Per-
sonnel Psychology, 43, 717–744. doi:10.1111/j.1744-6570.1990
.tb00680.x
Ferris, G. R., Treadway, D. C., Perrewé, P. L., Brouer, R. L., Douglas, C.,
& Lux, S. (2007). Political skill in organizations. Journal of Manage-
ment, 33, 290 –320. doi:10.1177/0149206307300813
Fraley, R., & Roberts, B. W. (2005). Patterns of continuity: A dynamic
model for conceptualizing the stability of individual differences in
psychological constructs across the life course. Psychological Review,
112, 60 –74. doi:10.1037/0033-295X.112.1.60
Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied
Psychology, 40, 1– 4. doi:10.1037/h0040429
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the
job analysis-content validity process. In N. Schmitt & W. C. Borman
(Eds.), Personnel selection in organizations (pp. 2–34). San Francisco,
CA: Jossey Bass.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001).
Identification and meta-analytic assessment of psychological constructs
measured in employment interviews. Journal of Applied Psychology, 86,
897–913. doi:10.1037/0021-9010.86.5.897
Humphreys, L. G., & Taber, T. (1973). Postdiction study of the Graduate
Record Examination and eight semesters of college grades. Journal of
Educational Measurement, 10, 179 –184. doi:10.1111/j.1745-
3984.1973.tb00795.x
Klein, C., DeRouin, R. E., & Salas, E. (2006). Uncovering workplace
interpersonal skills: A review, framework, and research agenda. In G. P.
Hodgkinson & J. K. Ford (Eds.), International review of industrial and
organizational psychology (Vol. 21, pp. 79 –126). New York, NY:
Wiley & Sons. doi:10.1002/9780470696378.ch3
Kuncel, N. R., & Hezlett, S. A. (2007, February 23). Standardized tests
predict graduate students’ success. Science, 315, 1080 –1081. doi:
10.1126/science.1136618
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive
meta-analysis of the predictive validity of the graduate record examina-
tions: Implications for graduate student selection and performance. Psy-
chological Bulletin, 127, 162–181. doi:10.1037/0033-2909.127.1.162
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004). Academic perfor-
mance, career potential, creativity, and job performance: Can one con-
struct predict them all? Journal of Personality and Social Psychology,
86, 148 –161. doi:10.1037/0022-3514.86.1.148
Lievens, F., Buyse, T., & Sackett, P. R. (2005a). The operational validity
of a video-based situational judgment test for medical college admis-
sions: Illustrating the importance of matching predictor and criterion
construct domains. Journal of Applied Psychology, 90, 442– 452. doi:
10.1037/0021-9010.90.3.442
Lievens, F., Buyse, T., & Sackett, P. R. (2005b). Retest effects in opera-
tional selection settings: Development and test of a framework. Person-
nel Psychology, 58, 981–1007. doi:10.1111/j.1744-6570.2005.00713.x
Lievens, F., & Patterson, F. (2011). The validity and incremental validity
of knowledge tests, low-fidelity simulations, and high-fidelity simula-
tions for predicting job performance in advanced-level high-stakes se-
lection. Journal of Applied Psychology, 96, 927–940.
Lubinski, D., & Dawis, R. V. (1992). Aptitudes, skills, and proficiencies.
In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and
organizational psychology (Vol. 3, 2nd ed., pp. 1–59). Palo Alto, CA:
Consulting Psychologists Press.
Lunneborg, C. E., & Lunneborg, P. W. (1970). Relations between aptitude
changes and academic success during college. Journal of Educational
Psychology, 61, 169 –173. doi:10.1037/h0029253
Makoul, G. (2001). Essential elements of communication in medicine
encounters: The Kalamazoo Consensus Statement. Academic Medicine,
76, 390 –393. doi:10.1097/00001888-200104000-00021
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007).
Situational judgment tests, response instructions, and validity: A meta-
analysis. Personnel Psychology, 60, 63–91. doi:10.1111/j.1744-
6570.2007.00065.x
Motowidlo, S. J., & Beier, M. E. (2010). Differentiating specific job
knowledge from implicit trait policies in procedural knowledge mea-
sured by a situational judgment test. Journal of Applied Psychology, 95,
321–333. doi:10.1037/a0017975
Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). Implicit policies
about relations between personality traits and behavioral effectiveness in
situational judgment items. Journal of Applied Psychology, 91, 749–761.
doi:10.1037/0021-9010.91.4.749
Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A.
(2004). Developing a biodata measure and situational judgment inven-
tory as predictors of college student performance. Journal of Applied
Psychology, 89, 187–207. doi:10.1037/0021-9010.89.2.187
Powis, D. (2010). Improving the selection of medical students. British
Medical Journal, 340, 708. doi:10.1136/bmj.c708
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for
estimating indirect effects in simple mediation models. Behavior Re-
search Methods, Instruments, & Computers, 36, 717–731. doi:10.3758/
BF03206553
Ramsay, L. J., Gillespie, M. A., Kim, B. H., Schmitt, N., Oswald, F. L.,
Drzakowski, S. M., & Friede, A. J. (2003, November). Identifying and
preventing score inflation on biodata and situational judgment inventory
items. Invited presentation to the College Board, New York, NY.
Ree, M. J., Carretta, T. R., Earles, J. A., & Albert, W. (1994). Sign changes
when correcting for restriction of range: A note on Pearson’s and
Lawley’s selection formulas. Journal of Applied Psychology, 79, 298–301.
doi:10.1037/0021-9010.79.2.298
Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency
of personality traits from childhood to old age: A quantitative review of
longitudinal studies. Psychological Bulletin, 126, 3–25. doi:10.1037/
0033-2909.126.1.3
Roth, P., Bobko, P., McFarland, L., & Buster, M. (2008). Work sample
tests in personnel selection: A meta-analysis of Black–White differences
in overall and exercise scores. Personnel Psychology, 61, 637– 661.
doi:10.1111/j.1744-6570.2008.00125.x
Sackett, P. R., & Yang, H. (2000). Correction for range restriction: An
expanded typology. Journal of Applied Psychology, 85, 112–118. doi:
10.1037/0021-9010.85.1.112
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). Joint
relation of experience and ability with job performance: Test of three
hypotheses. Journal of Applied Psychology, 73, 46 –57. doi:10.1037/
0021-9010.73.1.46
Schmitt, N., Keeney, J., Oswald, F. L., Pleskac, T., Quinn, A., Sinha, R.,
& Zorzie, M. (2009). Prediction of 4-year college student performance
using cognitive and noncognitive predictors and the impact of demo-
graphic status on admitted students. Journal of Applied Psychology, 94,
1479 –1497. doi:10.1037/a0016810
Schmitt, N., & Ostroff, C. (1986). Operationalizing the “behavioral con-
sistency” approach: Selection test development based on a content-
oriented strategy. Personnel Psychology, 39, 91–108. doi:10.1111/
j.1744-6570.1986.tb00576.x
Schultz, M. M., & Zedeck, S. (2008). Identification, development, and
validation of predictors of successful lawyering. Retrieved from http://
www.law.berkeley.edu/files/LSACREPORTfinal-12.pdf
Sedlacek, W. E. (2004). Beyond the big test: Noncognitive assessment in
higher education. San Francisco, CA: Jossey Bass.
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and non-
experimental studies: New procedures and recommendations. Psycho-
logical Methods, 7, 422– 445. doi:10.1037/1082-989X.7.4.422
Taylor, P. J., Russ-Eft, D. F., & Chan, D. W. L. (2005). A meta-analytic
review of behavior modeling training. Journal of Applied Psychology,
90, 692–709. doi:10.1037/0021-9010.90.4.692
Zhao, H., & Liden, R. C. (2011). Internship: A recruitment and selection
perspective. Journal of Applied Psychology, 96, 221–229. doi:10.1037/
a0021295
Received April 8, 2011
Revision received July 18, 2011
Accepted August 25, 2011
... The evolution of modern workplaces introduces additional measurement considerations. Lievens and Sackett (2012) note that assessment tools must account for how soft skills manifest in both virtual and traditional work environments. Furthermore, measurement approaches must consider the dynamic nature of contemporary work environments where job roles and required competencies continuously evolve. ...
Article
Full-text available
Recent workplace transformations have heightened the importance of soft skills, yet validated instruments for measuring these competencies remain limited. This study validates a comprehensive instrument measuring contemporary business soft skills using data from 294 participants representing 38 nationalities. Factor analysis revealed a robust 10-factor structure explaining 62.4% of the variance, with reliability coefficients ranging from .775 to .877. Results indicate the integration of traditionally distinct competencies and the emergence of new factorial combinations. The validated instrument provides a reliable tool for assessing soft skills in modern workplace contexts, particularly valuable for virtual and cross-cultural environments. The findings support more precise soft skills communication between employees, employers, educators, and students.
... Employees' interpersonal skills are closely linked to their attitudes and performance at both the individual and team levels, and ultimately impact the productivity of the organization as well as long-term job performance in healthcare settings (40). Given that teamwork and interpersonal communication are integral to organizational success, managers must prioritize the development of these skills to implement evidence-based practices within the organization. ...
Article
Full-text available
Background Identifying the essential skills and qualifications for evidence-based managers in healthcare is crucial for decision-makers who aim to select competent managers and design effective training programs. This study reviews the requirements, capabilities, and skills necessary for evidence-based managers in the healthcare sector to accurately utilize and implement evidence. Method This scoping review was conducted following the PRISMA-SCR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. A thorough literature search was carried out using relevant keywords across the Web of Science, PubMed, and Scopus databases, with no time restrictions. The selection of articles adhered to predefined inclusion criteria. After eliminating duplicates and reviewing titles, abstracts, and full texts, six articles were included in the final analysis. Results The study identified several competencies required for effective evidence-based management in healthcare, including professional and technical knowledge, leadership, personal traits and attitudes, communication skills, information management, self-management, critical thinking, research skills, and the ability to apply evidence effectively. Discussion Given the ongoing transformations in healthcare, including the emergence of new technologies and the generation of extensive data, evidence-based managers must continually enhance their skills to access and evaluate evidence. They should also work on improving both their interpersonal and intrapersonal skills. The authors advocate for further applied research to deepen our understanding of the competencies required for evidence-based managers within the healthcare and treatment contexts.
... This assessment paradigm is based on the idea that the decisions and actions chosen in the situations presented are indicative of the individual's real performance and skills in everyday situations in their work. Therefore, it is considered a valid and reliable way to assess and predict performance in specific contexts, in this case, in education (Lievens and Sackett 2012;McDaniel et al. 2007). In recent years, situational judgment tests have made significant advances in assessment practices for the following reasons: (a) they predict job performance because they measure procedural knowledge about how to behave effectively in various work situations, and (b) this knowledge is not acquired from specific work experience, but rather reflects effects of fundamental socialization processes and personal dispositions and can predict performance in work situations. ...
Article
Full-text available
Emotional Intelligence (EI) in teaching is associated with various educational outcomes and processes. However, it has typically been measured through self-reports and general EI assessments, lacking a specific performance test with greater ecological validity in relation to the demands of the professional educational context. This study describes the development and validation results of the Video-Test of Emotional Intelligence for Teachers (ViTIED), a new performance-based measure to assess the EI of secondary education teachers based on ability EI model and the situational judgment test paradigm. The test comprises 12 video scenes designed to elicit intra- and interpersonal processes, as well as both positive and negative emotions. A total of 163 Spanish teachers (36% male, 64% female; mean age = 40.32 years) completed the ViTIED, along with personality, perceived EI, and burnout assessments. Test scores provide initial evidence of adequate reliability, as well as content, convergent, and divergent validity. Continued validation of this measure will benefit evaluation and intervention processes with teachers, as well as research on the impact of teachers’ EI on the teaching–learning processes and the well-being of the educational community.
... In other words, high scores in biology, chemistry and physics do not necessarily indicate the ability to communicate effectively with patients, demonstrate empathy, or work well in a team. Book-smart students might be cognitively or technically sound but may struggle with the human aspects of patient care (Lievens & Sackett, 2012). For instance, patients often report dissatisfaction not because their doctors lack medical knowledge but because of poor communication, lack of empathy, or inadequate professionalism (Cleland et al., 2023). ...
... In the latter half of the 2000s, video-based SJTs were being examined for their usefulness for admissions decisions in medical schools (e.g., Lievens, Buyse, & Sackett, 2005;Lievens & Sackett, 2006), systematic examinations of methodological issues as well as mean group differences were published (e.g., McDaniel et al., 2007;Whetzel et al., 2008), andRoberts's (2008) influential study on an SJT-based measure for emotional intelligence (EI), as well as other non-cognitive content, appeared. Since then, more studies examining methodological issues have been published (e.g., McDaniel, Psotka, Legree, Yost, & Week, 2011;Motowidlo & Beier, 2010) and the use of SJTs in medical programs, particularly for admissions decisions, began to emerge (e.g., Lievens & Patterson, 2011;Lievens & Sackett, 2012). Most recently, the medical sciences have systematically reviewed the use of SJTs for their educational admission (e.g., Patterson, Knight, Dowell, Nicholson, Cousans, & Cleland, 2016;Patterson, Zibarras, & Ashworth, 2016;Webster et al., 2020) and specialty programs (e.g., surgery residency programs; Gardner, Cavanaugh, Willis, & Dunkin, 2020;Gardner & Dunkin, 2018). ...
Article
Situational judgment tests (SJTs) are popular assessment approaches that present scenarios describing situations that one may experience in a job. Due to its long history and cross-disciplinary nature, today's SJT literature is quite fragmented. In this integrative review, we start by systematically taking stock and synthesizing the SJT literature from the different scientific disciplines via bibliometric techniques on 524 unique documents. We identify six literature clusters (i.e., SJTs in the medical sciences, SJTs in personnel selection, methodological issues and SJTs for specific constructs , SJTs to assess emotional intelligence and related constructs, technological advances in SJTs, SJTs for teacher assessment and development) that correspond to academic disciplines and research streams within them. We also identify current trends in SJT research by examining the clusters formed by a recent subset of the SJT literature. We then build on the bibliometric analysis by categorizing the identified themes in an organizing framework with two fundamental dimensions: the main purpose of a study (i.e., conceptual understanding, prediction, other [e.g., understanding mean group differences, applicant reactions]) and its research focus (i.e., SJTs holistically, content, and design and methods). Finally, on the basis of this framework, we provide recommendations to encourage greater knowledge sharing between scientific disciplines. In addition, we outline an agenda for future research in terms of four broad directions: SJT theory, SJT constructs, SJT design and methods, and SJT application domains.
... Further, Patterson and colleagues [14] reported an SJT demonstrated substantial incremental validity over application form questionnaire responses (ΔR 2 = 0.17) in predicting shortlisted candidates for postgraduate medical training selection. Lievens and Sackett [19] followed medical students for seven to nine years from admission to employment and found that scores on an interpersonal SJT at admission demonstrated incremental validity over academic measures in predicting fieldwork (ΔR 2 = 0.05) and job performance ratings (ΔR 2 = 0.05) (r = 0.22 and 0.21, respectively). Since academic and non-academic skills are not mutually exclusive, researchers have identified an increasing need to investigate the incremental validity of non-academic assessments over conventional academic achievement metrics in medical education [11,20,21]. ...
Article
Full-text available
Background Casper, an online open-response situational judgement test that assesses social intelligence and professionalism [1], is used in admissions to health professions programs. Method This study (1) explored the incremental validity of Casper over grade point average (GPA) for predicting student performance on objective structured clinical examinations (OSCEs) and fieldwork placements within an occupational therapy program, (2) examined optimal weighting of Casper in GPA in admissions decisions using non-linear optimization and regression tree analysis to find the weights associated with the highest average competency scores, and (3) investigated whether Casper could be used to impact the diversity of a cohort selected for admission to the program. Results Multiple regression analysis results indicate that Casper improves the prediction of OSCE and fieldwork performance over and above GPA (change in Adj. R² = 3.2%). Non-linear optimization and regression tree analysis indicate the optimal weights of GPA and Casper for predicting performance across fieldwork placements are 0.16 and 0.84, respectively. Furthermore, the findings suggest that students with a slightly lower GPA (e.g., 3.5–3.6) could be successful in the program as assessed by fieldwork, which is considered to be the strongest indicator of success as an entry-level clinician. In terms of diversity, no statistically significant differences were found between those actually admitted and those who would have been admitted using Casper. Conclusion These results constitute preliminary validity evidence supporting the integration of Casper into applicant selection in an occupational therapy graduate program.
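The weighting analysis summarized above can be made concrete with a simple grid search: standardize both predictors, form weighted composites, and select the weight whose composite correlates most strongly with the outcome. The sketch below uses synthetic, seeded data whose data-generating weights echo the reported 0.16/0.84 split; all variable names and the simulation itself are illustrative assumptions, not the study's actual optimization procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic applicant data (assumed, for illustration only).
gpa = rng.normal(3.5, 0.3, n)
casper = rng.normal(0.0, 1.0, n)

# Fieldwork performance driven mostly by the Casper-like score,
# echoing the 0.16 / 0.84 weighting reported in the abstract.
performance = 0.16 * (gpa - 3.5) / 0.3 + 0.84 * casper + rng.normal(0, 0.5, n)

def composite_validity(w_casper):
    """Correlation of the weighted composite with performance."""
    z_gpa = (gpa - gpa.mean()) / gpa.std()
    z_casper = (casper - casper.mean()) / casper.std()
    composite = (1 - w_casper) * z_gpa + w_casper * z_casper
    return np.corrcoef(composite, performance)[0, 1]

# Grid search over the Casper weight; 1 - best_w is the GPA weight.
grid = np.linspace(0, 1, 101)
best_w = grid[np.argmax([composite_validity(w) for w in grid])]
print(f"best Casper weight: {best_w:.2f}")
```

With enough applicants, the recovered weight tracks the ratio of the true standardized effects, which is why the study's regression-tree and non-linear-optimization approaches converge on a Casper-heavy composite.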
... The identification, assessment, and development of competencies are inherently intertwined with evaluating and cultivating key behaviors. Key behaviors are the empirical indicators and behavioral anchors that signify the presence and proficiency level of relevant competencies required to execute a task or responsibility effectively [33], [34]. For the millennial generation, a critical competency is the ability to innovate, adapt to change, and embrace a growth mindset [1,35]. Exemplar key behaviors associated with this competency may encompass proactively seeking novel approaches and continuous improvement opportunities, rapidly acquiring and applying emerging technologies or processes, displaying resilience and maintaining productivity amidst ambiguity or disruption, and generating creative, out-of-the-box solutions to complex problems [36,37]. ...
Article
Full-text available
This study embarks on a pioneering examination of the key leadership behaviors required by Millennial Leaders in Indonesia's State-Owned Enterprises (SOEs) amidst the ongoing transition towards sustainable business practices. Utilizing a meticulously executed two-round Delphi Technique, which involved interviews with 16 distinguished leaders from Indonesia's largest national companies, the research thoroughly identifies 20 essential leadership behaviors. These findings present a vital guide for millennial leaders to proficiently navigate the challenges present in today's volatile, uncertain, complex, and ambiguous (VUCA) environment, exacerbated by the disruptions stemming from the pandemic. Additionally, this research signifies the dawn of a new era in strategic business expansion, stressing the necessity of adopting sustainable and renewable energy sources. Its empirical validation through rigorous methodologies like the Delphi method provides a nuanced comprehension of the leadership qualities crucial for success within Indonesia's evolving SOE landscape, making significant contributions to dialogues concerning millennial leadership advancement and promoting sustainable innovation.
... However, the SJT is more of a measurement methodology than a single style of test and has wide variability in construction and application [7]. Notably, in addition to its ability to predict performance with respect to domain-based knowledge, SJT also reliably measures nonacademic constructs and interpersonal skills [8]. While the SJT has been used primarily for personnel selection in the last half century, we propose using this measurement methodology to assess clinical judgement among medical students [9]. ...
Article
Full-text available
Introduction Assessing clinical judgement objectively and economically presents a challenge in academic medicine. The authors developed a situational judgement test (SJT) to measure fourth-year medical students’ clinical judgement. Methods A knowledge-based, single-best-answer SJT was developed by a panel of subject matter experts (SMEs). The SJT included 30 scenarios, each with five response options ranked ordinally from most to least appropriate. A computer-based format was used, and the SJT was piloted by two cohorts of fourth-year medical students at California University of Science and Medicine in 2022 and 2023 upon completion of an internship preparation course. Subsequently, students completed an optional survey. Evaluated scoring methods included original ordinal ranking, dichotomous, dichotomous with negative correction, distance from SME best answer, and distance from SME best answer squared. Results The SJT was completed by 142 fourth-year medical students. Cronbach’s alpha ranged from 0.39 to 0.85, depending on the scoring method used. The distance-from-SME-best-answer-squared method yielded the highest internal consistency, which was considered acceptable. Using this scoring method, the mean score was 72.89 (SD = 48.32, range = 26-417), and the standard error of measurement was 18.41. Item analysis found that seven (23%) scenarios were of average difficulty, 13 (43%) had a good or satisfactory discrimination index, and nine (30%) had a distractor efficiency of at least 66%. Most students preferred the SJT to a traditional multiple-choice exam (16; 62%) and felt it was an appropriate tool to assess clinical judgement (15; 58%). Conclusions The authors developed and piloted an SJT to assess clinical judgement among medical students. 
Although full validation has not yet been achieved, subsequent development of the SJT will focus on expanding the SME concordance panel, improving difficulty and discrimination indices, and conducting parallel-forms reliability and adverse-impact analyses.
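The distance-from-SME-best-answer-squared scoring and internal-consistency check described above are straightforward to implement. In the sketch below the response matrix is simulated; the 142 x 30 shape and the five-point option ranking follow the abstract, while everything else is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_students, n_scenarios = 142, 30

# Each scenario has five options ranked 1 (most appropriate) to 5
# (least appropriate). Simulated chosen ranks stand in for real data.
chosen_rank = rng.integers(1, 6, size=(n_students, n_scenarios))

# Distance-from-SME-best-answer-squared: the SME best answer has
# rank 1, so a response of rank r earns a penalty of (r - 1) ** 2.
item_scores = (chosen_rank - 1) ** 2
total_scores = item_scores.sum(axis=1)  # lower = better judgement

def cronbach_alpha(items):
    """Cronbach's alpha from an (examinees x items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(item_scores)
```

Because squaring magnifies large departures from the best answer, this scoring spreads examinees out more than dichotomous scoring does, which is consistent with it yielding the highest internal consistency among the methods the authors compared.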
Article
Full-text available
Work design scholarship has demonstrated that work characteristics are important determinants of a wide range of individual outcomes including well-being, motivation, satisfaction, and performance. Yet this scholarship has also revealed substantial and unaccounted for variance in these effects, prompting calls for theory and research that applies multilevel and contextual perspectives to expand our understanding of work designs. We develop theory that spans occupation, job, and individual levels to connect the influences of both context and personal attributes (e.g., skills) on work design consequences. Central to our multilevel theory is the concept of attribute relevance, which reflects the extent to which different attributes are prioritized within occupational and job contexts in which individuals enact their roles. Results across three studies spanning 3,838 incumbents and 339 unique occupations reveal that attribute relevance systematically moderates the relationships between work designs and individual outcomes and thus demarcates factors that account for variability in the main effects observed in previous work design research. We bring much-needed theory and evidence to open questions about how worker requirements and individual differences are connected to work designs.
Article
Full-text available
Social skills (e.g., assertiveness, empathy, ability to accept criticism) are essential for the medical profession and therefore also for the selection and development of medical students. However, the term “social skills” is understood differently in different contexts. There is no agreed upon taxonomy for classifying physicians’ social skills, and skills with the same meaning often have different names. This conceptual ambiguity presents a hurdle to cross-context communication and to the development of methods to assess social skills. Drawing from behavioral psychology, we aim to contribute to a better understanding of social skills in the medical context. To this end, we introduce a theoretically and empirically informed taxonomy that can be used to integrate the large number of different social skills. We consider how skills manifest at the behavioral level to ensure that we focus only on skills that are actually observable, distinguishable, and measurable. Here, behavioral research has shown that three overarching skill dimensions can be seen in interpersonal situations and are clearly distinguishable from each other: agency skill (i.e., getting ahead in social situations), communion skill (i.e., getting along in social situations), and interpersonal resilience (i.e., staying calm in social situations). We show that almost all social skills relevant for physicians fit into this structure. The approach presented allows redundant descriptions to be combined under three clearly distinguishable and behavior-based dimensions of social skills. This approach has implications for the assessment of social skills in both the selection and development of students.
Article
Full-text available
A common research problem is the estimation of the population correlation between x and y from an observed correlation r_xy obtained from a sample that has been restricted because of some sample selection process. Methods of correcting sample correlations for range restriction in a limited set of conditions are well-known. An expanded classification scheme for range-restriction scenarios is developed that conceptualizes range-restriction scenarios from various combinations of the following facets: (a) the variable(s) on which selection occurs (x, y, and/or a 3rd variable z), (b) whether unrestricted variances for the relevant variables are known, and (c) whether a 3rd variable, if involved, is measured or unmeasured. On the basis of these facets, the authors describe potential solutions for 11 different range-restriction scenarios and summarize research to date on these techniques.
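The most familiar of these corrections, Thorndike's Case 2 (direct selection on x, with the unrestricted standard deviation of x known), estimates the population correlation as r*u / sqrt(1 + r^2*(u^2 - 1)), where u is the ratio of unrestricted to restricted standard deviations. A minimal sketch of that single scenario (the example values are assumptions):

```python
import math

def case2_correction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2: correct r for direct range restriction on x.

    u is the ratio of the unrestricted to the restricted standard
    deviation of the selection variable x.
    """
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 + r**2 * (u**2 - 1))

# Example: an observed validity of .25 in a sample whose predictor SD
# was halved by selection implies a larger population validity.
r_corrected = case2_correction(0.25, sd_unrestricted=1.0, sd_restricted=0.5)
```

This is the kind of correction applied in validation studies like the one above, where only admitted (i.e., range-restricted) students contribute criterion data.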
Article
Full-text available
There has been a growing interest in understanding what constructs are assessed in the employment interview and the properties of those assessments. To address these issues, the authors developed a comprehensive taxonomy of 7 types of constructs that the interview could assess. Analysis of 338 ratings from 47 actual interview studies indicated that basic personality and applied social skills were the most frequently rated constructs in this taxonomy, followed by mental capability and job knowledge and skills. Further analysis suggested that high- and low-structure interviews tend to focus on different constructs. Taking both frequency and validity results into consideration, the findings suggest that at least part of the reason why structured interviews tend to have higher validity is because they focus more on constructs that have a stronger relationship with job performance. Limitations and directions for future research are discussed.
Article
Full-text available
Data from four different jobs (N = 1,474) were used to evaluate three hypotheses of the joint relation of job experience and general mental ability to job performance as measured by (a) work sample measures, (b) job knowledge measures, and (c) supervisory ratings of job performance. The divergence hypothesis predicts an increasing difference and the convergence hypothesis predicts a decreasing difference in the job performance of high- and low-mental-ability employees as employees gain increasing experience on the job. The noninteractive hypothesis, by contrast, predicts that the performance difference will be constant over time. For all three measures of job performance, results supported the noninteractive hypothesis. Also, consistent with the noninteractive hypothesis, correlational analyses showed essentially constant validities for general mental ability (measured earlier) out to 5 years of experience on the job. In addition to their theoretical implications, these findings have an important practical implication: They indicate that the concerns that employment test validities may decrease over time, complicating estimates of selection utility, are probably unwarranted.
Article
Full-text available
In May 1999, 21 leaders and representatives from major medical education and professional organizations attended an invitational conference jointly sponsored by the Bayer Institute for Health Care Communication and the Fetzer Institute. The participants focused on delineating a coherent set of essential elements in physician-patient communication to: (1) facilitate the development, implementation, and evaluation of communication-oriented curricula in medical education and (2) inform the development of specific standards in this domain. Since the group included architects and representatives of five currently used models of doctor-patient communication, participants agreed that the goals might best be achieved through review and synthesis of the models. Presentations about the five models encompassed their research base, overarching views of the medical encounter, and current applications. All attendees participated in discussion of the models and common elements. Written proceedings generated during the conference were posted on an electronic listserv for review and comment by the entire group. A three-person writing committee synthesized suggestions, resolved questions, and posted a succession of drafts on a listserv. The current document was circulated to the entire group for final approval before it was submitted for publication. The group identified seven essential sets of communication tasks: (1) build the doctor-patient relationship; (2) open the discussion; (3) gather information; (4) understand the patient's perspective; (5) share information; (6) reach agreement on problems and plans; and (7) provide closure. These broadly supported elements provide a useful framework for communication-oriented curricula and standards.
Article
Full-text available
This chapter deals with the role of human abilities, or response capabilities, as determinants of work behavior. Definitions are offered for the labels used to classify these attributes (e.g., abilities, achievements, aptitudes, and skills) and indices thereof. Particular attention is devoted to cognitive/intellectual abilities and to the optimal utilization of the contrasting dimensions generated by factor-analytic research. The importance of setting expectations on how much predictive power to expect solely from ability attributes is addressed, and it is suggested how new assessment procedures may be compared and evaluated against existing techniques in terms of an empirically based form of competitive support; in this vein, the use of multiple criteria for assessing performance (including the aggregation of distinct criteria) is recommended. The chapter further suggests that personal qualities not typically considered in conventional treatments of human abilities (e.g., personality dimensions) may be construed as instrumental response capabilities; the causal status of these entities as determinants of proficient work behavior is developed, and suggestions are offered on how these attributes might be incorporated into future research. Meta-analytic studies of validity generalization are reviewed, and two topics concerning group differences in abilities are discussed in detail: first, a new methodology is offered for predicting group differences in performance; second, the importance of assessing group differences in variability (ability dispersion) is examined. The chapter closes by explicating the importance of achievement-based, as opposed to topographical, accounts of behavior for measuring human attributes and for future developments in both applied and theoretical psychology.
Article
Full-text available
K. Pearson (1903) recognized that the correlation coefficient was subject to distortion when a sample was censored or preselected in some way. He proposed 3 univariate correction formulas for better estimates in these circumstances. These have become well known from the work of R. L. Thorndike (1949). D. N. Lawley (1943) proposed a general solution usually called the multivariate correction for range restriction. Both Pearson's and Lawley's corrections are discussed and examples are presented. Of particular interest are the opportunities for the corrected correlations to change sign as a result of the correction. Numerical examples are presented that show that correlations can change signs in the Pearson-Thorndike Case 3 and in Lawley's general solution.
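The sign-change phenomenon noted above can be demonstrated with the univariate Case 3 formula (selection on a measured third variable z): the corrected correlation is [r_xy + r_xz*r_yz*(u^2 - 1)] / sqrt((1 + r_xz^2*(u^2 - 1)) * (1 + r_yz^2*(u^2 - 1))), where u is the ratio of the unrestricted to the restricted standard deviation of z. A small numerical sketch, with values chosen purely for illustration:

```python
import math

def case3_correction(r_xy, r_xz, r_yz, u):
    """Thorndike Case 3: correct r_xy when selection occurred on a
    third variable z; u = (unrestricted SD of z) / (restricted SD of z)."""
    k = u**2 - 1
    num = r_xy + r_xz * r_yz * k
    den = math.sqrt((1 + r_xz**2 * k) * (1 + r_yz**2 * k))
    return num / den

# Illustration of the sign change discussed above: a slightly negative
# restricted-sample correlation flips positive after correction when
# both x and y relate substantially to the selection variable z.
r_restricted = -0.05
r_corrected = case3_correction(r_restricted, r_xz=0.5, r_yz=0.5, u=2.0)
```

The flip occurs because selection on z suppresses the shared variance that x and y owe to z; restoring z's full variance restores that shared component, which can overwhelm a small negative restricted-sample correlation.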
Article
Two models which explain both temporal changes in behavior during training and temporal decreases in correlations between ability measures and performance measures are presented. It is argued that both phenomena are dependent on the same process and that each of the models presented adequately accounts for the experimental data. The changing task model, originally proposed by Woodrow and later elaborated by Fleishman, assumes that the abilities which contribute to task performance change systematically over time. A second model, the changing subject model, assumes that practice on the criterion task systematically and significantly affects the ability levels of the subjects. A discussion of the changed conception of the ability-skill distinction necessitated by the second model is presented. The psychological and organizational implications of the two models are discussed, and the well-nigh impossibility of an empirical evaluation is pointed out.
Article
Current models of job performance recognize its multidimensional nature but do not provide a comprehensive picture of the interpersonal requirements of jobs. As a first step toward developing a more cogent and comprehensive understanding of interpersonal performance, a taxonomy of the interpersonal requirements of jobs was developed and validated. An extensive literature review of interpersonal performance behaviors was conducted to develop a proposed taxonomy of interpersonal performance. Two studies were then completed to validate the proposed taxonomy. In the first study empirical evidence for the taxonomy was gathered using a content analysis of critical incidents taken from a job analysis. In the second study, confirmatory factor analysis was used to recreate the model based on ratings of the importance of and time spent on each interpersonal performance behavior identified in the model. Raters represented a variety of Army jobs and ranks. Confirmatory factor analyses supported the proposed taxonomy. Results also indicated that the criticality of several dimensions of interpersonal performance increased with increasing enlisted ranks. The importance of the results toward the identification of predictors of interpersonal performance is discussed.
Article
"The evaluation of selective devices merely by simple correlations with single criterion variables is insufficient." Three dimensions of criteria that pose "embarrassing and confusing questions" are discussed: static dimensionality, or the problem of multiple performance measures that are not determined by the same general factors; dynamic dimensionality, or the problem of changes in criterion performance over time; and the criterion dimensionality of the individual: two individuals may be equally good, but for different reasons and in different ways.