Article

Ethnic and Gender Subgroup Differences in Assessment Center Ratings: A Meta-Analysis

Authors: Dean, Roth, and Bobko

Abstract

Assessment centers are widely believed to have relatively small standardized subgroup differences (d). However, no meta-analytic review to date has examined ds for assessment centers. The authors conducted a meta-analysis of available data and found an overall Black-White d of 0.52, an overall Hispanic-White d of 0.28, and an overall male-female d of -0.19. Consistent with our expectations, results suggest that Black-White ds in assessment center data may be larger than was previously thought. Hispanic-White comparisons were smaller than were Black-White comparisons. Females, on average, scored higher than did males in assessment centers. As such, assessment centers may be associated with more adverse impact against Blacks than is portrayed in the literature, but the predictor may have less adverse impact and be more "diversity friendly" for Hispanics and females.
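The standardized subgroup difference d reported throughout is Cohen's d: the difference between two group means divided by a pooled standard deviation, so d = 0.52 means the reference group scores about half a standard deviation higher on average. The short Python sketch below illustrates the computation on hypothetical rating data (the numbers are invented for illustration and are not from the meta-analysis):

    import numpy as np

    def cohens_d(reference, focal):
        """Standardized mean difference: (M_reference - M_focal) / pooled SD."""
        ref, foc = np.asarray(reference, float), np.asarray(focal, float)
        n1, n2 = len(ref), len(foc)
        pooled_var = ((n1 - 1) * ref.var(ddof=1) + (n2 - 1) * foc.var(ddof=1)) / (n1 + n2 - 2)
        return (ref.mean() - foc.mean()) / np.sqrt(pooled_var)

    # Hypothetical overall assessment ratings for two subgroups
    rng = np.random.default_rng(0)
    white_ratings = rng.normal(3.6, 0.8, 200)
    black_ratings = rng.normal(3.2, 0.8, 150)
    print(round(cohens_d(white_ratings, black_ratings), 2))  # positive d favors the first (reference) group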


... Although numerous researchers place simulations on a pedestal claiming that they produce only minimal subgroup differences (e.g., Cascio & Phillips, 1979), others have put this statement to the test by gathering empirical evidence via meta-analyses. An example is the meta-analysis on subgroup differences in AC performance of Dean, Bobko, and Roth (2008). Their meta-analysis consisted of 27 studies performed in applicant as well as incumbent samples that yielded 17, 9, and 18 effect sizes for black-white, Hispanic-white, and male-female comparisons, respectively. ...
... Their meta-analysis consisted of 27 studies performed in applicant as well as incumbent samples that yielded 17, 9, and 18 effect sizes for black-white, Hispanic-white, and male-female comparisons, respectively. In contrast to the traditionally positive image ascribed to simulations when it comes to adverse impact, Dean et al. (2008) found somewhat larger subgroup differences than often assumed. The largest mean difference was observed for black-white comparisons in favor of white test-takers (d = 0.52). ...
... For minority members other than blacks, the adverse impact potential of ACs was less. Dean et al. (2008) observed a rather small effect size (d = 0.28) for Hispanic-white comparisons in favor of whites. For female-male comparisons, a minor gender difference favoring women was observed (d = −0.19). ...
Article
Full-text available
Simulations represent more or less exact replicas of tasks, knowledge, skills, and abilities required in actual work behavior. This chapter reviews research on the more traditional high-fidelity simulations (i.e., assessment centers and work samples) and contrasts it with the growing body of research on low-fidelity simulations (i.e., situational judgment tests). Both types of simulations are compared in terms of the following five statements: "The use of simulations enables organizations to make predictions about a broader array of KSAOs," "We don't know what simulations exactly measure," "When organizations use simulations, the adverse impact of their selection system will be reduced," "Simulations are less fakable than personality inventories," and "Applicants like simulations." Generally, research results show that these statements apply to both high-fidelity and low-fidelity simulations. Future research should focus on comparative evaluations of simulations, the effects of structuring simulations, and the cross-cultural transportability of simulations.
... Multiple recent reviews, however, have drawn attention to the variability in subgroup difference estimates observed across AC studies (d = 0.03-0.60; Bobko & Roth, 2013; Ployhart & Holtz, 2008), and a recent meta-analysis found that at least the Black-White assessee subgroup difference appears to be larger than previously believed (d = 0.52; Dean, Roth, & Bobko, 2008). Given these findings, we cannot simply assume the absence of subgroup differences or adverse impact in AC ratings, particularly for employment decision-making purposes. ...
... More recent reviews, however, have cited Black-White assessee subgroup differences in primary studies to be as large as d = 0.60 among exercises and dimensions that are strongly cognitive in nature (Bobko & Roth, 2013; Ployhart & Holtz, 2008). Further, a meta-analysis by Dean et al. (2008) shows a substantial Black-White difference among assessee ratings (d = 0.52), suggesting that White assessees are rated, on average, approximately one-half of a standard deviation (SD) higher than Black assessees. Findings also show a Hispanic-White difference (d = 0.28) favoring White assessees and a gender difference (d = 0.19) favoring females. ...
... Findings also show a Hispanic-White difference (d = 0.28) favoring White assessees and a gender difference (d = 0.19) favoring females. Evidence that subgroup difference estimates can vary considerably across studies (Bobko & Roth, 2013; Ployhart & Holtz, 2008), as well as larger Black-White mean differences than had previously been assumed (Dean et al., 2008), raises considerable concern over the fairness and legal defensibility of ACs, suggesting the need to identify explanatory mechanisms driving these differences. ...
Article
This study investigated leniency and similar-to-me bias as mechanisms underlying demographic subgroup differences among assessees in assessors’ initial dimension ratings from three assessment center (AC) simulation exercises used as part of high-stakes promotional testing. It examined whether even small individual-level effects can accumulate (i.e., “trickle-up”) to produce larger subgroup-level differences. Individual-level analyses were conducted using cross-classified multilevel modeling, separately for each exercise. Results demonstrated weak evidence of leniency towards White assessees and similar-to-me bias among non-White assessee-assessor pairs. Similar leniency was found towards female assessees, but no statistically significant effects were found for assessee or assessor gender or assessee-assessor gender similarity. Using traditional d effect size estimates, weak individual-level assessee effects translated into small but consistent subgroup differences favoring White and female assessees. Generally small but less consistent subgroup differences indicated that non-White and male assessors gave higher ratings. Moreover, analyses of overall promotion decisions indicated the absence of adverse impact. Findings from this AC provide some support for the “trickle-up” effect, but the effect on subgroup differences is trivial. The results counter recent reviews of AC studies suggesting larger than previously assumed subgroup differences. Consequently, the findings demonstrate the importance of following established best practices when developing and implementing the AC method for selection purposes to minimize subgroup differences.
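As a rough illustration of the "trickle-up" reasoning in the study summarized above, the sketch below simulates a very small per-rater leniency effect toward one subgroup and shows the size of the subgroup-level d it produces after ratings are averaged across assessors. All values are invented for illustration; the sketch does not reproduce the study's cross-classified multilevel models.

    import numpy as np

    rng = np.random.default_rng(1)
    n_assessees, n_raters = 2000, 3
    group = rng.integers(0, 2, n_assessees)        # 1 = subgroup receiving slight leniency
    true_perf = rng.normal(0, 1, n_assessees)      # identical true-score distributions by design

    leniency = 0.05                                # tiny individual-level rater bias
    ratings = np.array([
        true_perf + leniency * group + rng.normal(0, 0.7, n_assessees)
        for _ in range(n_raters)
    ]).mean(axis=0)                                # consensus rating = mean across raters

    ref, foc = ratings[group == 1], ratings[group == 0]
    pooled_sd = np.sqrt(((len(ref) - 1) * ref.var(ddof=1) + (len(foc) - 1) * foc.var(ddof=1))
                        / (len(ref) + len(foc) - 2))
    print(round((ref.mean() - foc.mean()) / pooled_sd, 3))  # small but systematic subgroup d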
... Hispanics. Expectations for Hispanic-White ds for ACs are in the range of .28 to .40 (Dean, Roth, & Bobko, 2008). We updated these estimates by retrieving the studies from Dean et al. and searching the literature for additional studies. ...
... The results in Table 10 reveal a d of .32 (k = 12, N = 42,284) and a d of .38 without a large study (k = 11, N = 5,671). Thus, Whites tend to receive somewhat to moderately higher AC scores than Hispanics (we refer readers interested in Black-White differences to Dean et al., 2008). ...
... Hispanics H-W ds in the range of .28 to .40 based on a small set of primary studies (Dean, Roth, & Bobko, 2008). ...
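The overall ds quoted in these excerpts (.32 across all studies, .38 once a single very large study is set aside) are sample-size-weighted averages of primary-study effect sizes. A bare-bones sketch of that weighting, using made-up study values chosen only to show how one large study can dominate the mean:

    import numpy as np

    def weighted_mean_d(ds, ns):
        """Sample-size-weighted mean effect size (bare-bones meta-analytic average)."""
        ds, ns = np.asarray(ds, float), np.asarray(ns, float)
        return float(np.sum(ns * ds) / np.sum(ns))

    # Hypothetical primary-study ds and Ns; the last entry plays the role of one very large study
    ds = [0.45, 0.30, 0.40, 0.35, 0.31]
    ns = [400, 650, 500, 900, 36000]

    print(round(weighted_mean_d(ds, ns), 2))            # pulled toward the large study's d
    print(round(weighted_mean_d(ds[:-1], ns[:-1]), 2))  # noticeably different once it is removed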
Article
Hispanics are both the largest and fastest growing minority group in the U.S. workforce. Asians also make up a substantial and increasing portion of the workforce. Unfortunately, empirical research on how these groups perform on selection procedures appears to be lacking. To address this critical gap, we identified and reviewed research from a variety of literatures relevant to Hispanic and/or Asian performance on 12 commonly used staffing procedures. We also contacted authors of studies that included members of these subgroups and requested the relevant data. On the basis of our review, we provide updated estimates of Hispanic-White and Asian-White differences for the predictors, which often differ from existing estimates of these differences. Further, we provide the first known meta-analytic estimates for Hispanics and Asians on many predictors, such as vocational interests and physical ability. We discuss the implications of the findings for staffing research and practice. We also identify critical next steps for future research regarding these 2 important, yet largely neglected, groups.
... A number of researchers have considered these theories in concert to understand patterns of evaluative extremity (e.g., Bettencourt et al., 1997; Jackson, Sullivan, & Hodge, 1993; Jussim et al., 1987; Marques, Robalo, & Rocha, 1992), but these were single-lab investigations. Other meta-analysts have examined contextual moderators of the effects of social categories on judgment, but these have typically focused on only one social category (e.g., only gender), only one evaluative dimension, and have not been guided by theories of evaluative extremity (e.g., Dean, Roth, & Bobko, 2008; Eagly, Makhijani, & Klonsky, 1992; Finkelstein, Burke, & Raju, 1995; J. K. Ford, Kraiger, & Schechtman, 1986; Kite, Stockdale, Whitley, & Johnson, 2005; Roth, Huffcutt, & Bobko, 2003; Sackett & DuBois, 1991; Swim, Borgida, Maruyama, & Myers, 1989). In our work, we consider these three theories in the context of the broader extant literature on impression formation, including studies of many social groups and a variety of evaluative dimensions. ...
... We do not compare the effects of group-based and person-based information on judgment (cf. Kunda & Thagard, 1996) or assess the overall direction of group bias (see Dean et al., 2008; Eagly et al., 1992; Finkelstein et al., 1995; J. K. Ford et al., 1986; Swim et al., 1989 ). Instead, we ask when identical person-based information is evaluated differently depending on the target's group membership. Understanding evaluative extremity is an important goal, because it moves the field beyond either/or thinking about sources of impression formation. ...
Article
Full-text available
A meta-analysis that included more than 1,100 effect sizes tested the predictions of three theoretical perspectives that explain evaluative extremity in social judgment: complexity-extremity theory, subjective group dynamics model, and expectancy-violation theory. The work seeks to understand the ways in which group-based information interacts with person-based information to influence extremity in evaluations. Together, these three theories point to the valence of person-based information, group membership of the evaluated targets relative to the evaluator, status of the evaluators' ingroup, norm consistency of the person-based information, and incongruency of person-based information with stereotype-based expectations as moderators. Considerable support, along with some limiting conditions, was found for each theoretical perspective. Implications of the results are discussed.
... Early in the literature it was thought that Black-White group mean score differences for assessment centres were small (Dean, Roth & Bobko, 2008), though the range reported by Bobko and Roth (2013) was quite wide (d = 0.03 to 0.60, favouring Whites). In the only meta-analysis on this topic to date, Dean and colleagues (2008) estimated an overall d of 0.52 favouring Whites (k = 17, N = 8,210). ...
... These three groups accounted for approximately three-quarters of the total US Latino/Hispanic population. Despite the size of Latino/Hispanic groups in the US and the global workforce, as well as their rate of growth, research on score differences has focused primarily on African-Americans to the seeming exclusion of Latinos/Hispanics (Dean et al., 2008; Dovidio, Gluszek, John, Ditlmann & Lagunes, 2010; Reynolds, Willson & Ramsey, 1999; Verney, Granholm, Marshall, Malcarne & Saccuzzo, 2005). Our review of score differences between Whites and US Latinos/Hispanics focuses on current meta-analyses and narrative reviews, which are limited. ...
Chapter
This chapter summarizes current research on differences between racial or ethnic groups and national cultural groups on predictors that are frequently used in employee selection. It reviews the research on score differences for African-Americans, US Hispanics/Latinos, and Whites, as well as national culture groups, examines the various explanations for those differences, and proposes directions for future research aimed at further understanding score differences between groups. Before investigating the research on observed score differences, it is important to highlight the scope of the covered predictors and the difference between constructs and methods in the predictors commonly used in personnel selection. Current research finds that minority cultural groups tend to score lower on cognitive tests than the majority cultural group. As was true for the research on score differences between race and ethnic groups, aspects of the measurement can play a role in differences observed.
... Cognitive loading refers to the extent that SJT performance correlates with performance on a cognitive ability test. Similar to assessment centers and work samples (Bobko, Roth, & Buster, 2005; Dean, Bobko, & Roth, 2008; Roth, Bobko, McFarland, & Buster, 2008), SJTs with a higher cognitive loading display substantially larger ethnic subgroup differences (Roth, Bobko, & Buster, 2013; Whetzel et al., 2008). Second, the personality loading, i.e., the correlation between the SJT and each of the Big Five personality factors, has been identified as a smaller driver of ethnic subgroup differences in SJTs, so that Black-White and Asian-White differences in SJT performance are smaller when the SJT displays a higher correlation with emotional stability, and Hispanic-White differences are smaller when the SJT displays a higher correlation with conscientiousness and agreeableness (Whetzel et al., 2008). ...
Chapter
Full-text available
In this article, we give an overview of situational judgment tests (SJTs) as selection instruments. Their history, basic characteristics, and development are presented. The available research evidence regarding their reliability, construct-related validity, criterion-related validity, incremental validity, subgroup differences, and test-taker perceptions is also reviewed. As a general conclusion, the increasing popularity of SJTs in personnel selection seems to be attributable to their potential to capture a variety of constructs for different purposes. Additionally, SJTs are able to predict several job-related and/or academic criteria while at the same time offering the prospect of selecting for diversity.
... With all selection methods, the scores of ethnic-minority candidates are lower than those of majority candidates. The largest differences occur on intelligence tests and other tests with a large cognitive component, the smallest on personality tests (De Meijer, Born, Terlouw & Van der Molen, 2006; Dean, Bobko & Roth, 2008; Ployhart & Holtz, 2008; Roth, Bobko, McFarland & Buster, 2008). When the selection interview is conducted in a more structured manner, the score differences become smaller ...
Article
Full-text available
The present study was designed to map the causal relationships between nonstandard working hours and work-home interference (WHI) and home-work interference (HWI). To this purpose, a longitudinal full-panel design was employed. Using such a design, we examined both the causal effects of non-standard working hours on WHI/HWI and the causal effects of WHI/HWI on non-standard working hours. We also investigated the moderating effect of gender in these relationships. Data were collected in two waves (2002 and 2004) among 337 Dutch employees and self-employed persons who lived together with a partner and had at least one child living in the household. We included evening work and weekend work as types of non-standard working hours. Data were analyzed by means of structural equation modeling. Results showed that, among women with children, evening work was related to elevated levels of WHI and HWI two years later. A comparable relationship for men with children was not found. A possible explanation for this finding is that for women working at non-standard hours appears to cause WHI and HWI, as working at non-standard hours interferes with their responsibilities at home, which they are still more often accountable for than men. Furthermore, WHI turned out to be related to an increase in evening work and weekend work two years later for both men and women. A possible explanation for this finding is that workers try to reduce WHI by means of working at nonstandard hours.
... In terms of ethnic subgroup differences in test performance, meta-analytic research has shown that simulations display lower subgroup differences than cognitive ability tests. For assessment centers, Dean, Bobko, and Roth (2008) demonstrated a standardized Black-White subgroup difference of 0.52, with White test-takers systematically obtaining higher scores than Blacks. Roth, Huffcutt, and Bobko (2003) found similar effect sizes for work samples (d = 0.52), ...
Article
Full-text available
As globalization increases and labor markets become substantially more diverse, increasing diversity during personnel selection has become a dominant theme in human resource management. However, while trying to pursue this goal, researchers and practitioners find themselves confronted with the diversity-validity dilemma, as some of the most valid selection instruments display considerable ethnic subgroup differences in test performance. The goal of the current paper is twofold. First, we update and review the literature on the diversity-validity dilemma and discuss several strategies that aim to increase diversity without jeopardizing criterion-related validity. Second, we provide researchers and practitioners with evidence-based guidelines for dealing with the dilemma. Additionally, we identify several new avenues for future research.
... With all selection methods, the scores of ethnic-minority candidates are lower than those of majority candidates. The largest differences occur on intelligence tests and other tests with a large cognitive component, the smallest on personality tests (De Meijer, Born, Terlouw & Van der Molen, 2006; Dean, Bobko & Roth, 2008; Ployhart & Holtz, 2008; Roth, Bobko, McFarland & Buster, 2008). When the selection interview is conducted in a more structured manner, the score differences become smaller (Huffcutt & Roth, 1998). ...
Article
This paper deals with the suitability of the assessment process for an ethnoculturally heterogeneous pool of candidates and investigates whether this process helps organizations in their wish to create a diverse workforce. To what extent do integrity and moral values play a role in this issue? Moral disengagement offers a potential explanation for the fact that blatant rejection of ethnic-minority candidates is a major taboo yet nevertheless still happens. To what degree do regular methods of selection contribute to accurate procedures for a heterogeneous candidate pool and to diversity at work? In comparison to the sole use of cognitive tests, measuring the full arsenal of relevant capabilities, knowledge, and skills reduces score differences between ethnic groups, while retaining selection utility. Finally, self-reported personality does not always provide a good representation of the personality characteristics of ethnic-minority candidates. Transparent rating and decision making facilitate correct recruitment and selection towards a diverse workforce.
... Fourth, the majority of AC studies on adverse impact attest to the widely held view that ACs are reasonably unbiased regarding race and gender. According to the meta-analysis of Dean, Roth and Bobko (2008), positive results were reported, in particular for females and Hispanics. For Blacks, however, their results suggested that ACs may be associated with more adverse impact than was previously thought in the literature, but still have less adverse impact than the typical cognitive ability test. ...
Article
Full-text available
Assessment centers have always had a strong link with practice. This link is so strong that the theoretical basis of the workings of an assessment center is sometimes questioned. This article posits that trait activation theory might be fruitfully used to explain how job-relevant candidate behavior is elicited and rated in assessment centers. Trait activation theory is a recent theory that focuses on the person-situation interaction to explain behavior based on responses to trait-relevant cues found in situations. These observable responses serve as the basis for behavioral ratings on dimensions used in a variety of assessments such as performance appraisal and interviews, but also in assessment centers. The article starts by explaining the basic tenets behind the assessment center method and trait activation theory. It shows how trait activation theory might have key implications for current and future assessment center research. The article also provides various directions for future assessment center studies.
... confirmed. Studies from the USA and from countries such as the Netherlands (De Meijer et al., 2006) show that the minority groups examined generally perform worse than majority groups on cognitive ability tests (Roth et al., 2001), tests of verbal skills (Hough et al., 2001), selection interviews (Roth et al., 2002), and assessment centers (Dean et al., 2008). This pattern is also found in the selection process of the German public agency examined here. ...
Article
Full-text available
While English-language personnel research has produced increasingly differentiated findings on adverse impact in selection procedures in recent years, comparable studies of subgroup differences in the personnel selection of German organizations are lacking. Using a large German public agency as an example, a multi-stage personnel selection process is analyzed with respect to the performance of applicants with versus without a migration background. The results show that the tests of cognitive ability and spelling used in an early selection phase exhibit substantial subgroup differences. Differences between applicants with versus without a migration background are still detectable in the assessment center conducted in a later selection phase, despite the preselection by these tests. Success in all phases of the selection procedure is influenced by the type of migration background: whereas foreign nationals show the largest discrepancy relative to applicants without a migration background, the results of applicants who are German by birth but have a foreign-born father and/or mother are nearly at the level of applicants without a migration background. These findings are analyzed against the background of the international research on adverse impact. Finally, measures to reduce subgroup differences in organizational personnel selection are discussed, using the agency under study as an example.
... Outtz (2002) showed that ability tests tend to produce differences in scores between ethnic groups that are on average 3 to 5 times larger than for personality tests or structured interviews. Similarly, Dean, Roth and Bobko (2008) found a tendency for people of one ethnicity to perform better on these tests than those of another. ...
Thesis
Full-text available
The present research represents a coherent approach to understanding the root causes of ethnic group differences in ability test performance. Two studies were conducted, each of which was designed to address a key knowledge gap in the ethnic bias literature. In Study 1, both the LR Method of Differential Item Functioning (DIF) detection and Mixture Latent Variable Modelling were used to investigate the degree to which Differential Test Functioning (DTF) could explain ethnic group test performance differences in a large, previously unpublished dataset. Though mean test score differences were observed between a number of ethnic groups, neither technique was able to identify ethnic DTF. This calls into question the practical application of DTF to understanding these group differences. Study 2 investigated whether a number of non-cognitive factors might explain ethnic group test performance differences on a variety of ability tests. Two factors – test familiarity and trait optimism – were able to explain a large proportion of ethnic group test score differences. Furthermore, test familiarity was found to mediate the relationship between socio-economic factors – particularly participant educational level and familial social status – and test performance, suggesting that test familiarity develops over time through the mechanism of exposure to ability testing in other contexts. These findings represent a substantial contribution to the field’s understanding of two key issues surrounding ethnic test performance differences. The author calls for a new line of research into these performance facilitating and debilitating factors, before recommendations are offered for practitioners to ensure fairer deployment of ability testing in high-stakes selection processes.
... Men vs. women (Dean, Roth & Bobko, 2008): d = .19. Note: a positive d indicates that the second group has the higher mean. ...
... Research on ethnic SJT score differences in Europe revealed comparable findings, with ethnic minorities obtaining systematically somewhat lower scores than majority test takers (d = 0.38; De Meijer, 2008). Research on ethnic score differences on selection tools has repeatedly shown that the instrument's cognitive loading constitutes one of the most important drivers of ethnic score differences (e.g., Bobko, Roth, & Buster, 2005; Dean, Bobko, & Roth, 2008). In this context, SJTs with a higher cognitive loading have been found to display larger ethnic score differences than SJTs with a lower cognitive loading (Roth, Bobko, & Buster, 2013; Whetzel et al., 2008). ...
... Similarly, McKay and McDaniel (2006) conducted a meta-analysis and found that the overall mean Black-White difference was approximately one-quarter of a standard deviation for performance, though this effect has been less pronounced in more recently conducted research. Dean, Roth, and Bobko (2008) found a similar pattern when examining racial differences in assessment center ratings. However, criterion type and cognitive loading of criteria moderated this relationship, in that subgroup differences and cognitive loading were positively related. ...
... For many years, research has examined the extent to which applicants are perceived in stereotyped ways and discriminated against because of their membership in particular groups. Meta-analyses suggest that, compared with other ethnic groups in the USA, African Americans and Latin Americans in particular are affected by discrimination (Dean, Roth & Bobko, 2008; Foldes, Duehr & Ones, 2008; Huffcutt & Roth, 1998). From the perspective of Social Identity Theory (Tajfel, 1978; Tajfel & Turner, 1986), stereotyped beliefs are based on ... (Fiske & Taylor, 1991; King, Madera, Hebl, Knight & Mendoza, 2006; Tajfel, 1978; Tajfel & Turner, 1986). ...
... Second, a closer look at assessors' initial impressions contributes to research on potential biases in dimension ratings. Although AC ratings are less prone to subgroup differences than other selection procedures (e.g., cognitive ability tests), ethnic and sex differences are not negligible (Bobko & Roth, 2013; Dean, Roth, & Bobko, 2008). Based on dual process theories, an unexplored hypothesis is that initial impressions, quickly made on the basis of limited and salient information, carry biases that affect subsequent dimension ratings. ...
Article
Full-text available
Insight into assessors’ initial impressions has the potential to advance knowledge on how assessors form dimension-based judgments and on possible biases in these ratings. Therefore, this study draws on dual process theory to build and test a model that integrates assessors’ dimension ratings (i.e., systematic, slow, deliberate processing mode) with their initial impressions (i.e., intuitive, fast, automatic processing mode). Data collection started with an AC in which assessors provided ratings of assessees, and an online survey of assessees’ supervisors who rated their job performance. In addition, two other rater pools provided initial impressions of these assessees by evaluating extracted 2-min video clips of their AC performance. Initial impressions from both of these samples were positively related to assessors’ dimension ratings, which supports assumptions from dual process theory and might explain why assessors’ dimensional ratings are often undifferentiated. Initial impressions did not appear to open the door to biases and stereotypes based upon appearance and perceptions of liking. Instead, assessors picked up information that assessees transmitted about their personality (i.e., Conscientiousness and Emotional Stability). Implications for further research on initial impressions and AC dimension ratings are discussed.
... Health-related stigmas are associated with various conditions, such as mental illnesses (Chang, Wu, Chen, & Lin, 2016; Chang, Yen, Jang, Su, & Lin, 2017; Corrigan, 2000), infectious diseases (Mak et al., 2006; Zhang, Liu, Bromley, & Tang, 2007), sexual orientations (Herek, 2007), race, and obesity (Dean, Roth, & Bobko, 2008; Lin & Lee, 2017; Roehling, Roehling, & Pichler, 2007). Manifestations of stigma vary in the context of diverse health conditions and cultures (Parker & Aggleton, 2003). ...
... Although there have been several previous meta-analyses that have focused on work discrimination of other groups (e.g., sex, Davison & Burke, 2000; Olian, Schwab, & Haberfeld, 1988; bodyweight, Rudolph, Wells, Weller, & Baltes, 2009; and age, Finkelstein, Burke, & Raju, 1995), there has not yet been a quantitative summary of the research on discrimination against Arab or Muslim individuals at work. Moreover, these groups are often overlooked in meta-analyses that have examined the adverse impact associated with specific selection constructs or methods (e.g., Berry, Clark, & McClure, 2011; Dean, Roth, & Bobko, 2008). A meta-analysis of the current body of work will bring much needed attention to these two traditionally understudied yet important groups (e.g., Ghumman et al., 2013; Ruggs et al., 2013). ...
... This perspective also complements existing individual-level research on the minority experience at work, which has primarily emphasized the challenges and liabilities associated with being a minority, such as unfavorable stereotypes, prejudice, and discrimination (Davidson et al. 2016) in areas ranging from colleagues' support and cooperation to selection, evaluation, and promotion (Maume 1999, James 2000, Milton and Westphal 2005, Stauffer and Buckley 2005, Carli and Eagly 2007, Dean et al. 2008, Heilman and Eagly 2008, Rosette et al. 2008, Yzerbyt and Demoulin 2010, Koenig et al. 2011, Heilman and Caleo 2015). Scholars have recently begun to examine how minority individuals cope with such challenges. ...
... This process often involves using different sources of information, including SAT scores. However, standardized testing generally disadvantages marginalized applicants (Roth et al., 2001; Dean et al., 2008; Fagioli, 2013) for several reasons, including economic and socioeconomic factors, psychological factors, societal factors, cultural factors, and test construction and validation factors (Ployhart et al., 2003; Berry et al., 2011). As a consequence, organizational psychologists encourage decision makers in the workplace to broaden perspectives on selection in general. ...
... We included participants' gender and age as control variables because in past studies these variables were related to AC and SJT performance (e.g., Clapham & Fulford, 1997; Dean, Roth, & Bobko, 2008; Herde, Lievens, Jackson, Shalfrooshan, & Roth, 2020; Whetzel, McDaniel, & Nguyen, 2008). ...
... Below, in our review of these alternative selection methods (e.g., personality measures, integrity tests, employment interviews, and situational judgment tests), we focus attention on a number of widely used selection methods that display useful validities and markedly lower AI against minorities than CATs do. Other alternative selection methods, such as job knowledge tests, biodata, work samples, and assessment centers, are not reviewed because although they are valid selection techniques, their usage results in moderate to high levels of AI against minorities (Dean, Roth, & Bobko, 2008;McKay & McDaniel, 2006;Potosky, Bobko, & Roth, 2005;Roth, Bobko, McFarland, & Buster, 2008). Furthermore, these selection methods are most applicable for selecting experienced job applicants, (p. ...
Article
Full-text available
The article presents guidelines for professionals and ethical considerations concerning the assessment center method. Topics of the guidelines will be beneficial to human resource management specialists, industrial and organizational consultants. The social responsibility of business, their legal compliance and ethics are also explored.
Chapter
Full-text available
The small sample studies typical of psychological research produce seemingly contradictory results, and reliance on statistical significance tests causes study results to appear even more conflicting. Meta-analysis integrates the findings across such studies to reveal the simpler patterns of relations that underlie research literatures, thus providing a basis for theory development. Meta-analysis can correct for the distorting effects of sampling error, measurement error, and other artifacts that produce the illusion of conflicting findings. This chapter discusses these artifacts and the procedures used to correct for them. Different approaches to meta-analysis are discussed. Applications of meta-analysis in I/O psychology and other areas are discussed and evidence is presented that meta-analysis is transforming research in psychology. Meta-analysis has become almost ubiquitous. One indication of this is that, as of October 12, 2011, Google listed more than 9 million entries for meta-analysis. Keywords: meta-analysis; research synthesis; data analysis; cumulative knowledge; psychological theory
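To make the artifact corrections described in this chapter concrete, the sketch below runs a bare-bones meta-analysis on a handful of invented validity coefficients: it computes the sample-size-weighted mean correlation, subtracts the variance expected from sampling error alone, and applies a correction for criterion unreliability. The studies, Ns, and reliability value are assumptions for illustration only.

    import numpy as np

    # Hypothetical observed validities and sample sizes from k = 5 primary studies
    r_obs = np.array([0.18, 0.25, 0.31, 0.22, 0.28])
    n = np.array([120, 250, 90, 400, 180])

    r_bar = np.sum(n * r_obs) / np.sum(n)                      # weighted mean observed r
    var_obs = np.sum(n * (r_obs - r_bar) ** 2) / np.sum(n)     # observed variance across studies
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)           # variance expected from sampling error
    var_rho = max(var_obs - var_err, 0.0)                      # residual ("true") variance estimate

    r_yy = 0.52                                                # assumed criterion reliability
    rho_hat = r_bar / np.sqrt(r_yy)                            # mean r corrected for criterion unreliability

    print(round(float(r_bar), 3), round(float(var_rho), 4), round(float(rho_hat), 3))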
Article
Full-text available
The State of Personnel Selection Research; An Expanded Problem Space: Selection to Differentiate the Firm; Directions for Future Research; Conclusion; Acknowledgments; References
Article
Full-text available
Despite the common lay assumption that males and females are profoundly different, Hyde (2005) used data from 46 meta-analyses to demonstrate that males and females are highly similar. Nonetheless, the gender similarities hypothesis has remained controversial. Since Hyde's provocative report, there has been an explosion of meta-analytic interest in psychological gender differences. We utilized this enormous collection of 106 meta-analyses and 386 individual meta-analytic effects to reevaluate the gender similarities hypothesis. Furthermore, we employed a novel data-analytic approach called metasynthesis (Zell & Krizan, 2014) to estimate the average difference between males and females and to explore moderators of gender differences. The average, absolute difference between males and females across domains was relatively small (d = 0.21, SD = 0.14), with the majority of effects being either small (46%) or very small (39%). Magnitude of differences fluctuated somewhat as a function of the psychological domain (e.g., cognitive variables, social and personality variables, well-being), but remained largely constant across age, culture, and generations. These findings provide compelling support for the gender similarities hypothesis, but also underscore conditions under which gender differences are most pronounced.
Article
This document is an update of several prior editions of guidelines and ethical considerations for assessment center operations dating back to 1975. Each set of guidelines was developed and endorsed by specialists in the research, development, and implementation of assessment centers. The guidelines are a statement of the considerations believed to be most important for all users of the assessment center method. For instance, the use of job-related simulations is a core concept when using the method. Job simulation exercises allow individuals to demonstrate their abilities in situations that are important on the job. As stressed in these guidelines, a procedure should not be represented as an assessment center unless it includes at least one, and usually several, job-related simulations that require the assessee to demonstrate a constructed behavioral response. Other important areas include assessor selection and training, using ‘competencies’ as dimensions to be assessed, validation, participants' rights, and the incorporation of technology into assessment center programs. The current guidelines discuss a number of considerations in developing and using assessment centers in diverse cultural settings.
Chapter
The research base for selection into teacher education programs and teaching practice is only recently emerging (Klassen & Kim, 2019; Klassen et al., 2017). In this light, reviewing selection practices and methods used in other fields—especially those where the methods are well-developed and well-researched—provides a lens through which to view and consider teacher selection. Various selection methods have been used to select individuals into educational (training) programs and into employment. Though the methods used in other fields have some degree of overlap with each other, each area also has its own distinct methods and research base that characterize the field. As such, in this chapter, we will review the practices and the evidence base for the methods that are used to select individuals into medical schools, law schools, and into large organizations.
Article
Full-text available
OBJECTIVE Neurosurgery is among the most competitive residencies, as evidenced by the high number of applicants for relatively few positions. Although it is important to recruit candidates who have the intellectual capacity and drive to succeed, traditional objective selection criteria, such as US Medical Licensing Examination (USMLE) (also known as Step 1) score, number of publications, and class ranking, have not been shown to consistently predict clinical and academic success. Furthermore, these traditional objective parameters have not been associated with specific personality traits. METHODS The authors sought to determine the efficacy of a personality assessment in the selection of neurosurgery residents. Specifically, the aim was to determine the correlation between traditional measures used to evaluate an applicant (e.g., USMLE score, number of publications, MD/PhD status) and corresponding validated personality traits. RESULTS Fifty-four neurosurgery residency applicants were interviewed at the Cleveland Clinic during the 2014–2015 application cycle. No differences in validated personality scores were identified between the 46 MD applicants and 8 MD/PhD applicants. The mean USMLE score (± SD) was 252.3 ± 11.9, and those in the high-USMLE-score category (USMLE score ≥ 260) had a significantly lower “imaginative” score (a stress measure of eccentric thinking and impatience with those who think more slowly). The average number of publications per applicant was 8.6 ± 7.9, and there was a significant positive correlation (r = 0.339, p = 0.016) between greater number of publications and a higher “adjustment” score (a measure of being even-tempered, having composure under pressure). Significant negative correlations existed between the total number of publications and the “excitable” score (a measure of being emotionally volatile) (r = −0.299, p = 0.035) as well as the “skeptical” score (measure of being sensitive to criticism) (r = −0.325, p = 0.021). The average medical school rank was 25.8, and medical school rankings were positively correlated with the “imaginative” score (r = 0.287, p = 0.044). CONCLUSIONS This is the first study to investigate the use of personality scores in the selection of neurosurgical residents. The use of personality assessments has the potential to provide insight into an applicant's future behavior as a resident and beyond. This information may be useful in the selection of neurosurgical residents and can be further used to customize the teaching of residents and for enabling them to recognize their own strengths and weaknesses for self-improvement.
Article
The authors quantify the conventional wisdom that predictors' correlations with cognitive ability are positively related to subgroup mean differences. Using meta-analytic and large-N data from diverse predictors, they found that cognitive saturation correlates .84 with predictors' artifact-corrected Black-White d values and .95 with predictors' artifact-corrected Hispanic-White d values. The authors also investigate the extent to which d values are associated with the use of assessor-based scoring and with predictor domains in which differential investment is likely to occur. As a practical application of these findings, they present a procedure to forecast mean differences on a new predictor based on its cognitive saturation and other attributes. They also present a Bayesian framework that allows one to integrate regression-based forecasts with observed d values to achieve more precise estimates of mean differences. The proposed forecasting techniques based on the relationship between mean differences and cognitive saturation can help to mitigate the difficulties inherent in computing precise local estimates of mean differences.
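The Bayesian step described in that abstract can be thought of as precision-weighting: a regression-based forecast of d (treated as a prior) is combined with a locally observed d, each weighted by the inverse of its variance. The sketch below shows that normal-normal combination with assumed values; it is a simplified illustration, not the authors' procedure or estimates.

    import numpy as np

    def combine_d_estimates(d_prior, var_prior, d_obs, n1, n2):
        """Precision-weighted combination of a forecasted d (prior) and an observed d."""
        # Large-sample approximation to the sampling variance of an observed d
        var_obs = (n1 + n2) / (n1 * n2) + d_obs ** 2 / (2 * (n1 + n2))
        w_prior, w_obs = 1 / var_prior, 1 / var_obs
        d_post = (w_prior * d_prior + w_obs * d_obs) / (w_prior + w_obs)
        return d_post, 1 / (w_prior + w_obs)

    # Forecast from cognitive saturation (hypothetical) combined with a small local sample
    d_post, var_post = combine_d_estimates(d_prior=0.45, var_prior=0.02, d_obs=0.20, n1=40, n2=35)
    print(round(d_post, 2), round(float(np.sqrt(var_post)), 2))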
Article
To promote diversity in organizations it is important to have accurate knowledge about subgroup differences associated with selection procedures. However, current estimates of subgroup differences in situational judgment tests (SJTs) are overwhelmingly based on range‐restricted incumbent samples that are downwardly biased. This study provides much‐needed applicant level estimates of SJT subgroup differences (N = 37,530). As a key finding, Black‐White differences (d = 0.66) were higher than in incumbent samples (d = 0.38). Overall, sex differences were small. Females scored higher for management jobs (d = −0.13) and males scored higher for administrative jobs (d = 0.15). By analyzing applicant samples that do not suffer from range restriction, this study adds knowledge about subgroup differences in SJTs.
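One reason incumbent samples yield smaller ds than applicant samples is range restriction: incumbents were already selected on, or on variables correlated with, the test. A common approximate fix converts the restricted d to a point-biserial correlation, applies the standard direct range restriction correction, and converts back. The sketch below uses an assumed restriction ratio, not the study's data:

    import numpy as np

    def correct_d_for_range_restriction(d_restricted, u):
        """
        Approximate correction of an incumbent-sample d for direct range restriction.
        u = SD(restricted) / SD(unrestricted) on the test score (u < 1 means restriction).
        Route: d -> point-biserial r (equal-n approximation) -> corrected r -> d.
        """
        r = d_restricted / np.sqrt(d_restricted ** 2 + 4)
        U = 1.0 / u
        r_c = U * r / np.sqrt(1 + (U ** 2 - 1) * r ** 2)   # direct range restriction correction
        return 2 * r_c / np.sqrt(1 - r_c ** 2)

    # e.g., an incumbent d of 0.38 with an assumed restriction ratio of 0.7
    print(round(correct_d_for_range_restriction(0.38, u=0.7), 2))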
Article
Full-text available
A longstanding concern about admissions to higher education is the underprediction of female academic performance by admission test scores. One explanation for these findings is selection system bias, that is, not all relevant KSAOs that are related to academic performance and gender are included in the prediction model. One solution to this problem is to include these omitted KSAOs in the prediction model; however, many of these KSAOs are 'noncognitive' and "hard-to-measure" skills in a high-stakes context. An alternative approach to capture relevant KSAOs is using representative performance samples. We examined differential prediction of first-year and third-year academic performance by gender based on a curriculum-sampling test that was designed as a small-scale simulation of later college performance. In addition, we examined differential prediction using both frequentist and Bayesian analyses. Our results showed no differential prediction or small female underprediction when using the curriculum-sampling tests to predict first-year GPA, and no differential prediction for predicting third-year GPA. In addition, our results suggest that more comprehensive curriculum samples may show less differential prediction. We conclude that curriculum sampling may offer a practically feasible method that yields minimal differential prediction by gender in high-stakes operational selection settings.
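Differential prediction of the kind examined above is usually tested with moderated regression: regress the criterion on the test score, a group indicator, and their interaction; the group coefficient captures intercept differences and the interaction captures slope differences. A minimal frequentist sketch with simulated data (assuming pandas and statsmodels are available; the simulated effect sizes are arbitrary):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 500
    test = rng.normal(0, 1, n)
    female = rng.integers(0, 2, n)
    # Simulated GPA with a small intercept advantage for women (illustrative only)
    gpa = 0.5 * test + 0.15 * female + rng.normal(0, 1, n)

    df = pd.DataFrame({"gpa": gpa, "test": test, "female": female})
    model = smf.ols("gpa ~ test + female + test:female", data=df).fit()
    # 'female' term = intercept difference; 'test:female' term = slope difference
    print(model.summary().tables[1])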
Article
Purpose The purpose of this paper is to establish the extent of general performance factors (GPF) in assessment center (AC) exercises and dimensions. The study further aims to determine if larger GPF contributes to larger ethnic group differences across exercises and dimensions that are more cognitively loaded in an emerging market context. Design/methodology/approach The authors analyzed data across three independent AC samples (Sample 1: N=172; Sample 2: N=281; Sample 3: N=428). The Schmid-Leiman solution was used to determine the extent of GPF in AC exercises and dimensions. An independent samples t-test and Cohen’s d were used to determine the size of ethnic group differences across exercises and dimensions. Findings The results indicate that GPF is consistently large for the in-basket exercise. Furthermore, dimensions that are more cognitively loaded, such as problem solving, strategic thinking, and business acumen, seem to produce the largest ethnic group differences. Overall, the research indicates that larger GPF is associated with larger ethnic group differences in relation to specific AC dimensions and exercises. Originality/value The authors add to the literature by investigating the prevalence of a GPF in AC ratings across AC exercises and dimensions. A novel contribution of the research is its attempt to link the prevalence of a GPF in AC ratings to group membership in South Africa. The study offers an alternative statistical analysis procedure to examine GPF in AC ratings.
Article
Separate meta-analyses of the cognitive ability and assessment center (AC) literatures report higher criterion-related validity for cognitive ability tests in predicting job performance. We instead focus on 17 samples in which both AC and ability scores are obtained for the same examinees and used to predict the same criterion. Thus, we control for differences in job type and in criteria that may have affected prior conclusions. In contrast to Schmidt and Hunter's (1998) meta-analysis, reporting mean validity of .51 for ability and .37 for ACs, we found, using random-effects models and comparable corrections for range restriction and measurement error in the criterion, mean validity of .22 for ability and .44 for ACs. We posit that 2 factors contribute to the differences in findings: (a) ACs being used on populations already restricted on cognitive ability and (b) the use of less cognitively loaded criteria in AC validation research.
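The "comparable corrections" referred to above are the standard psychometric adjustments for direct range restriction and criterion unreliability. The sketch below applies both to a single observed validity coefficient using assumed artifact values (the order and the values are illustrative; the appropriate sequence depends on where the reliability estimate comes from):

    import numpy as np

    def corrected_validity(r_obs, u_x, r_yy):
        """
        r_obs : observed predictor-criterion correlation in the restricted sample
        u_x   : SD(restricted) / SD(unrestricted) for the predictor
        r_yy  : criterion reliability
        """
        U = 1.0 / u_x
        r_rr = U * r_obs / np.sqrt(1 + (U ** 2 - 1) * r_obs ** 2)  # range restriction correction
        return r_rr / np.sqrt(r_yy)                                 # criterion unreliability correction

    print(round(corrected_validity(r_obs=0.15, u_x=0.6, r_yy=0.52), 2))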
Chapter
This chapter aims to provide a summary overview of the assessment center (AC). The AC is defined, and its purposes (personnel selection, personnel development) as well as its general process phases (analysis, design, execution, evaluation) are described. Furthermore, frequently used requirement dimensions are characterized, and the problems associated with the type and number of dimensions used in the AC are outlined. Another part of the chapter is devoted to the exercises and simulations used in the AC (e.g., presentations, in-basket, case study, leaderless group discussion) and to the question of the extent to which combining the AC with further diagnostic methods (e.g., psychometric tests) can be advantageous. Finally, the chapter explains how the assessor pool in the AC is constituted, and describes observation methods and the contents of assessor training. The concluding part of the chapter addresses selected psychometric qualities of the AC, namely meta-analytic findings on predictive validity and measures to improve it, as well as empirical results on the incremental validity of the AC.
Chapter
Shifts in the global economy have placed more pressure on the decisions of employees, managers, and CEOs than ever before. At the same time, the proliferation of technology in the workplace has become ubiquitous. Fortunately, the assessment center method is evolving alongside these other trends. With just a few modifications to existing assessment center simulations, we can use newer technology to capture previously difficult to observe behaviors that tap directly into candidate’s decision making processes. By combining passive data logging of key strokes and mouse clicks, eye tracking, and physiological responses such as skin conductivity, we are able to capture behaviors in real time that can be used to supplement traditional assessor ratings. Doing so allows assessors to compile a more accurate and holistic summary of a candidate’s performance. This wealth of new behavioral information has implications for high stakes hiring decisions as well as targeted training and development.
Chapter
The fairness of personnel selection procedures can be compromised in many ways: applicants are selected "by gut feeling" without the decision makers noticing that they are systematically discriminating against certain groups of people. Selection methods that are informative in themselves are not implemented professionally, which lowers the validity of the decisions made. Methods that are strongly language-bound put immigrants at a disadvantage compared with native speakers. In addition, some population groups may show lower scores on selection-relevant characteristics, giving them a lower probability of passing certain selection procedures. The acceptance of selection procedures depends only to a limited extent on their diagnostic quality. In general, applicants prefer selection methods that have an obvious relation to the reality of the job. The discussion of various solution strategies shows that, in particular, an orientation toward the scientific principles of sound personnel selection can help to reduce such problems.
Article
Gamified and game-based assessments (GBAs) are increasingly used for personnel selection but there are concerns that males and younger applicants have an advantage in these assessments. However, hardly any research has addressed whether sex and age are related to GBA performance. Similarly, the criterion-related validity of GBAs is also not sufficiently confirmed. Therefore, we analyzed archival data from a high-stakes setting in which applicants completed a computer-based simulation game targeting complex problem solving. The analyses confirmed expectations for the present simulation game of better performance for males than for females and for younger than for older applicants. However, the effect sizes were small. Furthermore, performance in the current simulation game correlated with job-related performance as measured in an assessment center.
Chapter
Personnel selection serves first and foremost to critically examine applicants' suitability for the advertised position. Beyond that, the selection procedure also acts indirectly as a personnel marketing measure, since applicants regard the selection procedure as a kind of business card of the company. Even though diagnostic quality has the highest priority, there are many aspects an employer should consider when developing and conducting specific selection methods. The chapter covers prompt and professional communication in the application process (e.g., wording of acknowledgements of receipt and rejection letters), the screening of application documents and the use of online e-assessment, the use of tests and questionnaires, the administration of work samples, the structured employment interview, assessment centers, and the follow-up processes after a selection decision has been made.
Article
Full-text available
Fairness toward job applicants differing in gender and ethnicity in a video-based assessment interview was explored. For this purpose, 103 female and 105 male participants, including 38 who reported having a migration background of their own, completed a behaviorally anchored rating scale after having watched the videotaped answers of a potential applicant. The domains assessed were communication skills and the capacity to work in a team. The videos of the applicants were generated with the help of standardized scripts and semi-professional actors. Eight videos were made, operationalizing a two (Turkish migration background-native German) by two (male-female) by two (more positive applicant answers-moderately good applicant answers) experimental design. A multivariate analysis of variance (MANOVA) revealed a small to moderate main effect only for migration background of the applicants. Subsequent ANOVAs found that in three of the four dependent variables this effect reached significance at p < .05. The effects were robust against consideration of the raters' agreeableness and the raters' own migration background as covariates. Applicants with a Turkish background scored higher in the evaluation of their videotaped answers than native German applicants did. Social Identity Theory (Tajfel & Turner, 1986) provides an approach to integrate these findings.
Chapter
This chapter addresses test development and validation strategies for public safety jobs. Authoritative guidance provided by the Uniform Guidelines, Standards, and Principles is discussed along with best practices in job analysis, test administration, and validation. Case studies illustrating the inherent complexities in public safety assessment are presented.
Article
The other articles in this special issue of Human Resource Management Review present meta-analyses of specific topic areas, or articles on methodological issues associated with meta-analyses, within the human resources management field. Ours is a bit different in that we do not present actual meta-analytic results, but instead conduct a thorough review of the field in order to identify areas where meta-analyses have not been conducted. Then, we discuss why such analyses have not been provided, suggestions for how we might like to see research proceed in such areas, and also implications for theory development in these areas of the field. We conclude our paper with some additional thoughts on issues to keep in mind as we seek to utilize meta-analysis to its fullest potential, and thus yield the best results possible.
Article
Full-text available
The purpose of this investigation was to assess the effect of race on employment interview evaluations. A meta-analysis of 31 studies found that both Black and Hispanic applicants received interview ratings that on average were only about one quarter of a standard deviation lower than those for White applicants. Thus, interviews as a whole do not appear to affect minorities nearly as much as mental ability tests. Results also suggested that (a) high-structure interviews have lower group differences on average than low-structure interviews, (b) group differences tend to decrease as the complexity of the job increases, and (c) group differences tend to be higher when there is a greater proportion of a minority in the applicant pool. Implications and directions for future research are discussed.
Book
Full-text available
Meta-analysis is arguably the most important methodological innovation in the social and behavioral sciences in the last 25 years. Developed to offer researchers an informative account of which methods are most useful in integrating research findings across studies, this book will enable the reader to apply, as well as understand, meta-analytic methods. Rather than taking an encyclopedic approach, the authors have focused on carefully developing those techniques that are most applicable to social science research, and have given a general conceptual description of more complex and rarely used techniques. Fully revised and updated, Methods of Meta-Analysis, Second Edition is the most comprehensive text on meta-analysis available today. New to the Second Edition:
* An evaluation of fixed versus random effects models for meta-analysis
* New methods for correcting for indirect range restriction in meta-analysis
* New developments in corrections for measurement error
* A discussion of a new Windows-based program package for applying the meta-analysis methods presented in the book
* A presentation of the theories of data underlying different approaches to meta-analysis
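For readers less familiar with the psychometric approach the book develops, the following is a minimal, hedged sketch of a "bare-bones" meta-analysis step: a sample-size-weighted mean correlation and an estimate of how much of the observed variance across studies is attributable to sampling error. The study correlations and sample sizes are invented for illustration; this is not the authors' software package.

```python
# Bare-bones psychometric meta-analysis sketch (illustrative values only):
# sample-size-weighted mean r, observed variance, expected sampling-error
# variance, and the residual variance that remains after the correction.

rs = [0.30, 0.42, 0.25, 0.38, 0.33]   # hypothetical study validities
ns = [120, 85, 200, 60, 150]          # hypothetical study sample sizes

total_n = sum(ns)
mean_r = sum(n * r for n, r in zip(ns, rs)) / total_n

# Sample-size-weighted observed variance of the correlations
var_obs = sum(n * (r - mean_r) ** 2 for n, r in zip(ns, rs)) / total_n

# Expected sampling-error variance, based on the average sample size
avg_n = total_n / len(ns)
var_error = (1 - mean_r ** 2) ** 2 / (avg_n - 1)

var_residual = max(var_obs - var_error, 0.0)
share_error = min(var_error / var_obs, 1.0) if var_obs > 0 else 1.0

print(f"weighted mean r = {mean_r:.3f}")
print(f"observed var = {var_obs:.5f}, sampling-error var = {var_error:.5f}")
print(f"residual var = {var_residual:.5f} ({share_error:.0%} of observed variance is sampling error)")
```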
Article
Full-text available
This article summarizes the practical and theoretical implications of 85 years of research in personnel selection. On the basis of meta-analytic findings, this article presents the validity of 19 selection procedures for predicting job performance and training performance and the validity of paired combinations of general mental ability (GMA) and the 18 other selection procedures. Overall, the 3 combinations with the highest multivariate validity and utility for job performance were GMA plus a work sample test (mean validity of .63), GMA plus an integrity test (mean validity of .65), and GMA plus a structured interview (mean validity of .63). A further advantage of the latter 2 combinations is that they can be used for both entry level selection and selection of experienced employees. The practical utility implications of these summary findings are substantial. The implications of these research findings for the development of theories of job performance are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
The purpose of this investigation was to assess the effect of race on employment interview evaluations. A meta-analysis of 31 studies found that both Black and Hispanic applicants received interview ratings that on average were only about one quarter of a standard deviation lower than those for White applicants. Thus, interviews as a whole do not appear to affect minorities nearly as much as mental ability tests. Results also suggested that (a) high-structure interviews have lower group differences on average than low-structure interviews, (b) group differences tend to decrease as the complexity of the job increases, and (c) group differences tend to be higher when there is a greater proportion of a minority in the applicant pool. Implications and directions for future research are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
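Since the standardized difference d is the statistic these interview and assessment center studies report, a short worked sketch of how d is computed from two subgroups may be useful. The group means, standard deviations, and sample sizes below are invented for illustration and are not taken from the meta-analysis.

```python
import math

# Hypothetical interview ratings for two subgroups (illustrative values only)
mean_white, sd_white, n_white = 3.60, 0.80, 400
mean_black, sd_black, n_black = 3.40, 0.85, 120

# Pooled standard deviation and standardized mean difference (Cohen's d);
# by convention here, a positive d indicates higher ratings for the White group
sd_pooled = math.sqrt(((n_white - 1) * sd_white ** 2 + (n_black - 1) * sd_black ** 2)
                      / (n_white + n_black - 2))
d = (mean_white - mean_black) / sd_pooled
print(f"d = {d:.2f}")   # roughly 0.25, i.e., about a quarter of a standard deviation
```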
Article
Full-text available
Mean subgroup (gender, ethnic/cultural, and age) differences are summarized across studies for several predictor domains – cognitive ability, personality and physical ability – at both broadly and more narrowly defined construct levels, with some surprising results. Research clearly indicates that the setting, the sample, the construct and the level of construct specificity can all, either individually or in combination, moderate the magnitude of differences between groups. Employers using tests in employment settings need to assess accurately the requirements of work. When the exact nature of the work is specified, the appropriate predictors may or may not have adverse impact against some groups. The possible causes and remedies for adverse impact (measurement method, culture, test coaching, test-taker perceptions, stereotype threat and criterion conceptualization) are also summarized. Each of these factors can contribute to subgroup differences, and some appear to contribute significantly to subgroup differences on cognitive ability tests, where Black–White mean differences are most pronounced. Statistical methods for detecting differential prediction, test fairness and construct equivalence are described and evaluated, as are statistical/mathematical strategies for reducing adverse impact (test-score banding and predictor/criterion weighting strategies).
Article
Full-text available
This chapter reviews personnel selection research from 1995 through 1999. Areas covered are job analysis; performance criteria; cognitive ability and personality predictors; interview, assessment center, and biodata assessment methods; measurement issues; meta-analysis and validity generalization; evaluation of selection systems in terms of differential prediction, adverse impact, utility, and applicant reactions; emerging topics on team selection and cross-cultural issues; and finally professional, legal, and ethical standards. Three major themes are revealed: (a) Better taxonomies produce better selection decisions; (b) The nature and analyses of work behavior are changing, influencing personnel selection practices; (c) The field of personality research is healthy, as new measurement methods, personality constructs, and compound constructs of well-known traits are being researched and applied to personnel selection.
Article
Full-text available
The authors conducted a new meta-analysis of ethnic group differences in job performance. Given a substantially increased set of data as compared with earlier analyses, the authors were able to conduct analyses of Black-White differences within more homogeneous categories of job performance and to reexamine findings on objective versus subjective measurement. Contrary to one perspective sometimes adopted in the field, objective measures are associated with very similar, if not somewhat larger, standardized ethnic group differences (ds) than subjective measures across a variety of indicators. This trend was consistent across quality, quantity, and absenteeism measures. Further, work samples and job knowledge tests are associated with larger ds than performance ratings or measures of absenteeism. Analysis of Hispanic-White standardized differences shows that they are generally lower than Black-White differences in several categories.
Article
Full-text available
This study examined gender differences in a large-scale assessment center for officer entry in the British Army. Subgroup differences were investigated for a sample of 1,857 candidates: 1,594 men and 263 women. A construct-driven approach was chosen (a) by examining gender differences at the construct level, (b) by formulating a priori hypotheses about which constructs would be susceptible to gender effects, and (c) by using both effect size statistics and latent mean analyses to investigate gender differences in assessment center ratings. Results showed that female candidates were rated notably higher on constructs reflecting an interpersonally oriented leadership style (i.e., oral communication and interaction) and on drive and determination. These results are discussed in light of role congruity theory and of the advantages of using latent mean analyses.
Article
Full-text available
This chapter reviews literature from approximately mid-1993 through early 1996 in the areas of performance and criteria, validity, statistical and equal opportunity issues, selection for work groups, person-organization fit, applicant reactions to selection procedures, and research on predictors, including ability, personality, assessment centers, interviews, and biodata. The review revolves around three themes: (a) attention toward criteria and models of performance, (b) interest in personality measures as predictors of job performance, and (c) work on the person-organization fit selection model. In our judgment, these themes merge when it is recognized that the development of performance models that differentiate criterion constructs reveals highly interpretable relationships between the predictor domain (i.e. ability, personality, and job knowledge) and the criterion domain (i.e. technical proficiency, extra-technical proficiency constructs such as prosocial organizational behavior, and overall job performance). These and related developments are advancing the science of personnel selection and should enhance selection practices in the future.
Article
Various forms of score adjustment have been suggested and used when mean differences by gender, race, or ethnicity are found using preemployment tests. This article examines the rationales for score adjustment and describes and compares different forms of score adjustment, including within-group norming, bonus points, separate cutoffs, and banding. It reviews the legal environment for personnel selection and the circumstances leading to the passage of the Civil Rights Act of 1991. It examines score adjustment in the use of cognitive ability tests, personality inventories, interest inventories, scored biographical data, and physical ability tests and outlines the implications for testing practice of various interpretations of the Civil Rights Act of 1991.
Article
Several studies have evaluated the effects of sex and race on performance ratings in general, and ratings in assessment centers in particular. However, the relationship between Hispanic origin and assessment center ratings has not been studied. To investigate the relationship between Hispanic origin and assessment, data were collected from an assessment center that had both Hispanic and non-Hispanic candidates. This investigation has revealed a main effect for assessee origin (Hispanic vs. non-Hispanic) and a nonsignificant interaction between the ethnic composition of the assessor team and the ethnicity of the assessees.
Article
Presents a review of assessment-centre practice and evaluates its overall effectiveness as a method of selection. There is a specific evaluation of an assessment centre in the banking industry for the selection of graduates. The results do not, in general, support the validity findings reported in the wider literature. There was a lack of construct validity, specifically in respect of an exercise effect. Criterion validity was similarly poor.
Article
Meta-analysis (Hunter, Schmidt, & Jackson, 1982) of 50 assessment center studies containing 107 validity coefficients revealed a corrected mean and variance of .37 and .017, respectively. Validities were sorted into five categories of criteria and four categories of assessment purpose. Higher validities were found in studies in which potential ratings were the criterion, and lower validities were found in promotion studies. Sufficient variance remained after correcting for artifacts to justify searching for moderators. Validities were higher when the percentage of female assessees was high, when several evaluation devices were used, when assessors were psychologists rather than managers, when peer evaluation was used, and when the study was methodologically sound. Age of assessees, whether feedback was given, days of assessor training, days of observation, percentages of minority assessees, and criterion contamination did not moderate assessment center validities. The findings suggest that assessment centers show both validity generalization and situational specificity. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Meta-analysis of the cumulative research on various predictors of job performance showed that for entry-level jobs there was no predictor with validity equal to that of ability, which had a mean validity of .53. For selection on the basis of current job performance, the work sample test, with a mean validity of .54, was slightly better. For federal entry-level jobs, substitution of an alternative predictor would cost from $3.12 billion/year (job tryout) to $15.89 billion/year (age). Hiring on ability had a utility of $15.61 billion/year but affected minority groups adversely. Hiring on ability by quotas would decrease utility by 5%. A third strategy, using a low cutoff score, would decrease utility by 83%. Using other predictors in conjunction with ability tests might improve validity and reduce adverse impact, but there is as yet no database for studying this possibility. (89 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
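The dollar figures in this abstract rest on standard selection-utility arithmetic. The sketch below illustrates the Brogden-Cronbach-Gleser logic that such estimates typically follow; all inputs (applicant pool, selection ratio, SDy, costs) are invented and are not the values from the study.

```python
from statistics import NormalDist

# Brogden-Cronbach-Gleser utility sketch (illustrative inputs only):
# gain per year ~= n_selected * validity * SDy * mean z of those hired - testing costs

n_applicants = 1000
selection_ratio = 0.10
n_selected = int(n_applicants * selection_ratio)
validity = 0.53            # e.g., the mean validity reported for ability in the abstract
sd_y = 20_000.0            # assumed dollar value of one SD of job performance
cost_per_applicant = 25.0  # assumed cost of testing one applicant

# Mean standardized predictor score of those selected under top-down selection:
# the normal ordinate at the cutoff divided by the selection ratio
cutoff = NormalDist().inv_cdf(1 - selection_ratio)
mean_z_selected = NormalDist().pdf(cutoff) / selection_ratio

utility = n_selected * validity * sd_y * mean_z_selected - n_applicants * cost_per_applicant
print(f"estimated annual utility gain: ${utility:,.0f}")
```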
Article
Discusses the use of simulation exercises as opposed to paper and pencil tests to assess supervisory skills. Specific problems encountered in developing a simulation approach for the disadvantaged are outlined. Relevance of the simulation was found to be a crucial variable. 53 white and 54 black disadvantaged Ss in a manpower training program were studied. Ss participated in 2 leaderless group discussions and 2 individual exercises. Test scores from a standard reading test were obtained. No significant differences were found between the white and black groups on demographic variables, reading level, and on the exercises. In the group discussions the only significant (p < .001) finding was that whites made more negative statements than blacks. None of the correlations between reading level and exercise scores were significant at the .05 level. Evaluator ranking of supervisory potential of the disadvantaged as compared to supervisors and executives indicated that approximately 50% of the disadvantaged scored satisfactorily or better. Data indicate that many supervisory skills are independent of overall reading ability. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
We used meta-analytic procedures to investigate the criterion-related validity of assessment center dimension ratings. By focusing on dimension-level information, we were able to assess the extent to which specific constructs account for the criterion-related validity of assessment centers. From a total of 34 articles that reported dimension-level validities, we collapsed 168 assessment center dimension labels into an overriding set of 6 dimensions: (a) consideration/awareness of others, (b) communication, (c) drive, (d) influencing others, (e) organizing and planning, and (f) problem solving. Based on this set of 6 dimensions, we extracted 258 independent data points. Results showed a range of estimated true criterion-related validities from .25 to .39. A regression-based composite consisting of 4 out of the 6 dimensions accounted for the criterion-related validity of assessment center ratings and explained more variance in performance (20%) than Gaugler, Rosenthal, Thornton, and Bentson (1987) were able to explain using the overall assessment center rating (14%).
Article
A common assumption exists which asserts that the formation of composites of predictors represents a method for dealing with adverse impact. It is often expected that including predictors that demonstrate smaller group differences with others that demonstrate larger group differences will help to alleviate the amount of adverse impact observed at the composite level. The purpose of this paper is to answer the question “If two or more predictors are combined to form a composite, what will be the magnitude of group differences and, consequently, of adverse impact, of using that composite for selection?” In answering this question, a set of tables, figures, and formulas are presented that highlight variables influential in affecting how composites of predictors influence observed group differences. A number of conclusions are drawn that clarify the extent to which forming composites decreases group differences and subsequently adverse impact.
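The question the paper poses has a standard closed-form answer under simplifying assumptions (unit weights, equal predictor standard deviations and intercorrelations in both groups): the composite difference equals the sum of the predictor ds divided by the square root of k + k(k-1)r̄, where k is the number of predictors and r̄ their average intercorrelation. The sketch below uses invented inputs to show why the reduction in group differences is often smaller than intuition suggests.

```python
import math

# Standardized subgroup difference for a unit-weighted composite of predictors,
# assuming equal predictor SDs and intercorrelations across groups (illustrative values).

ds = [1.00, 0.30]   # e.g., a cognitive test paired with a lower-d alternative predictor
r_bar = 0.30        # assumed average intercorrelation among the predictors
k = len(ds)

d_composite = sum(ds) / math.sqrt(k + k * (k - 1) * r_bar)
print(f"composite d = {d_composite:.2f}")
# About 0.81 here: larger than the simple average of the two ds (0.65), which is
# why adding a low-d predictor reduces group differences less than often assumed.
```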
Article
A variety of recent articles in the personnel selection literature have used analyses of meta-analytically derived matrices to draw general conclusions for the field. The purpose of this article is to construct a matrix that incorporates information that is as complete as possible on the relationships among cognitive ability measures, three sets of alternative predictors, and job performance. We build upon a starting matrix used by Schmitt, Rodgers, Chan, Sheppard, and Jennings (1997). Mean differences, by race, for each of the measures and the potential for adverse impact of predictor composites are also considered. We demonstrate that the use of alternative predictors alone to predict job performance (in the absence of cognitive ability) lowers the potential for adverse impact. However, in contrast to recent claims, adverse impact continues to occur at many commonly used selection ratios. Future researchers are encouraged to use our matrix and to expand upon it as new primary research becomes available. We also report and reaffirm many methodological lessons along the way, including the many judgment calls that appear in an effort of this magnitude and a reminder that the field could benefit from even greater conceptual care regarding what is labeled an “alternative predictor.” Directions for future meta-analyses and for future primary research activities are also derived.
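To make the claim about selection ratios concrete, the following sketch translates a composite-level subgroup difference into an adverse-impact ratio under top-down selection, assuming normally distributed scores in both groups and using the conventional four-fifths rule as the screen. The d value and selection ratio are invented for illustration.

```python
from statistics import NormalDist

# Adverse-impact ratio implied by a composite subgroup difference d at a given
# selection ratio, assuming normal score distributions (illustrative values only).

d = 0.50                 # assumed composite-level subgroup difference
selection_ratio = 0.20   # assumed proportion selected from the majority group

norm = NormalDist()
cutoff = norm.inv_cdf(1 - selection_ratio)      # cutoff in majority-group z units
minority_pass_rate = 1 - norm.cdf(cutoff + d)   # minority distribution shifted down by d
majority_pass_rate = selection_ratio

ai_ratio = minority_pass_rate / majority_pass_rate
print(f"minority pass rate = {minority_pass_rate:.3f}, AI ratio = {ai_ratio:.2f}")
print("below the four-fifths threshold" if ai_ratio < 0.80 else "meets the four-fifths threshold")
```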
Article
The cognitive ability levels of different ethnic groups have interested psychologists for over a century. Many narrative reviews of the empirical literature in the area focus on the Black-White differences, and the reviews conclude that the mean difference in cognitive ability (g) is approximately 1 standard deviation; that is, the generally accepted effect size is about 1.0. We conduct a meta-analytic review that suggests that the one standard deviation effect size accurately summarizes Black-White differences for college application tests (e.g., SAT) and overall analyses of tests of g for job applicants in corporate settings. However, the 1 standard deviation summary of group differences fails to capture many of the complexities in estimating ethnic group differences in employment settings. For example, our results indicate that job complexity, the use of within job versus across job study design, focus on applicant versus incumbent samples, and the exact construct of interest are important moderators of standardized group differences. In many instances, standardized group differences are less than 1 standard deviation. We conduct similar analyses for Hispanics, when possible, and note that Hispanic-White differences are somewhat less than Black-White differences.
Article
This study investigates whether different job-relevant competencies vary in terms of Black-White subgroup differences exhibited. There were 633 participants (545 Whites, 88 Blacks) who completed a managerial assessment center that evaluated 13 competency dimensions across 8 assessment exercises. Participants also completed a cognitive ability test. The results suggest that subgroup differences vary by the content domain of the competency. As predicted, significant subgroup differences emerged for a majority of the more cognitively loaded competencies (e.g., judgment) while nonsignificant differences were associated with a majority of the less cognitively loaded competencies (e.g., human relations). Furthermore, when cognitive ability was controlled, 12 of 13 competency scores demonstrated incremental validity in predicting supervisory job performance ratings. In addition, competencies with greater cognitive load tended to more strongly predict cognitive aspects of job performance as compared to noncognitive aspects. However, competencies with less cognitive load did not differentially predict cognitive and noncognitive aspects of job performance.
Article
Given a data set about an individual or a group (e.g., interviewer ratings, life history or demographic facts, test results, self-descriptions), there are two modes of data combination for a predictive or diagnostic purpose. The clinical method relies on human judgment that is based on informal contemplation and, sometimes, discussion with others (e.g., case conferences). The mechanical method involves a formal, algorithmic, objective procedure (e.g., equation) to reach the decision. Empirical comparisons of the accuracy of the two methods (136 studies over a wide range of predictands) show that the mechanical method is almost invariably equal to or superior to the clinical method. Common antiactuarial arguments are rebutted, possible causes of widespread resistance to the comparative research are offered, and policy implications of the statistical method's superiority are discussed.
Article
This study investigates the degree to which subgroup (Black-White) mean differences on various assessment center exercises (e.g., in-basket, role play) may be a function of the type of exercise employed; and furthermore, begins to explore why these different types of exercises result in subgroup differences. The sample consisted of 633 participants who completed a managerial assessment center that evaluated them on 14 ability dimensions across 7 different types of assessment exercises. In addition, each participant completed a cognitive ability measure. The results suggest that subgroup differences varied by type of assessment exercise; and furthermore that the subgroup difference appeared to be a function of the cognitive component of the exercise. Lastly, preliminary support is found that the validity of some of the assessment center exercises in predicting supervisor ratings of job performance is based, in part, on their cognitive component; however, evidence of incremental validity does exist.
Article
Although much research has focused on assessment centres, rather less has been done on development centres (DCs), and in particular on the effects they have on participants. A study is reported of the impact of DC attendance on self-assessments made by 111 customer service staff. Participants rated themselves on the DC dimensions before and after attending, and these ratings were correlated with observers' assessments made at the DC. Results indicated congruence on only two out of 10 dimensions between observer- and self-assessments pre-DC, rising to six out of the 10 dimensions post-DC. Self-esteem also increased following attendance at the DC. Females showed more self-assessment accuracy than did males. Classifying the participants (pre-DC) into underraters, accurate raters and overraters showed that underraters became more accurate in their self-assessments post-DC whereas overraters did not; the latter group continued to have self-ratings significantly higher than observer ratings, and were unchanged in level of self-esteem. It is concluded that the DC studied has demonstrated its value as a process for increasing self-awareness for some but not all participants. The findings are discussed in terms of their implications for both research and the application of DCs in practice.
Article
The main elements in the design and validation of personnel selection procedures have been in place for many years. The role of job analysis, contemporary models of work performance and criteria are reviewed critically. After identifying some important issues and reviewing research work on attracting applicants, including applicant perceptions of personnel selection processes, the research on major personnel selection methods is reviewed. Recent work on cognitive ability has confirmed the good criterion-related validity, but problems of adverse impact remain. Work on personality is progressing beyond studies designed simply to explore the criterion-related validity of personality. Interview and assessment centre research is reviewed, and recent studies indicating the key constructs measured by both are discussed. In both cases, one of the key constructs measured seems to be general cognitive ability. Biodata validity and the processes used to develop biodata instruments are also critically reviewed. The article concludes with a critical evaluation of the processes for obtaining validity evidence (primarily from meta-analyses) and the limitations of the current state of the art. Speculative future prospects are briefly reviewed.
Article
The effects of gender on evaluations of managerial potential within a corporate assessment center program were investigated. The sample consisted of 375 men and 61 women (94% White, 3% Black, 2.3% Asian, and .7% Hispanic) assessed between 1980 and 1985. Candidates were assessed on their intellectual ability, performance and interpersonal skills, and overall management potential. Women were rated higher than men on the performance-style skills; however, there were no differences in overall management potential ratings or in actual long-term job advancement. The results suggest that subtle gender bias affects evaluations of managerial potential and subsequent promotion decisions.
Article
Cognitively loaded tests of knowledge, skill, and ability often contribute to decisions regarding education, jobs, licensure, or certification. Users of such tests often face difficult choices when trying to optimize both the performance and ethnic diversity of chosen individuals. The authors describe the nature of this quandary, review research on different strategies to address it, and recommend using selection materials that assess the full range of relevant attributes using a format that minimizes verbal content as much as is consistent with the outcome one is trying to achieve. They also recommend the use of test preparation, face-valid assessments, and the consideration of relevant job or life experiences. Regardless of the strategy adopted, it is unreasonable to expect that one can maximize both the performance and ethnic diversity of selected individuals.
Article
Previous studies of standardized ethnic group differences in the employment interview have shown differences to be relatively small. Unfortunately, many researchers conducting interview studies have not considered the issue of range restriction in research design. This omission is likely to lead to underestimates of standardized ethnic group differences (d) when the interview is considered as an initial screening device or used in combination with other initial screening devices. The authors found that 2 forms of a behavioral interview were associated with standardized ethnic group differences of .36 and .56 when corrected for range restriction. These differences are substantially larger than previously thought and demonstrate the importance of considering a variety of study design characteristics in obtaining the appropriate parameter estimates.
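One common way to carry out the kind of correction described here is to convert the observed d to a point-biserial correlation using the subgroup proportions, correct that correlation for direct range restriction with Thorndike's Case II formula, and convert back to d. The sketch below follows that logic with invented inputs; it is not the authors' exact procedure, which involves additional design considerations.

```python
import math

# Correcting an observed standardized subgroup difference for direct range restriction
# (illustrative values only; the observed d, subgroup proportion, and SD ratio are assumed).

d_restricted = 0.25   # observed d in the range-restricted (interviewed) sample
p_minority = 0.20     # assumed proportion of minority group members
u = 0.70              # assumed ratio of restricted to unrestricted predictor SDs

p, q = p_minority, 1 - p_minority

# Convert d to a point-biserial correlation
r = d_restricted / math.sqrt(d_restricted ** 2 + 1 / (p * q))

# Thorndike Case II correction for direct range restriction (U = 1/u)
big_u = 1 / u
r_corrected = (r * big_u) / math.sqrt(1 + r ** 2 * (big_u ** 2 - 1))

# Convert the corrected correlation back to d
d_corrected = r_corrected / math.sqrt(p * q * (1 - r_corrected ** 2))
print(f"observed d = {d_restricted:.2f}, corrected d = {d_corrected:.2f}")
```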