Article

Do Assessors Have Too Much on Their Plates? The Effects of Simultaneously Rating Multiple Assessment Center Candidates on Rating Quality

Authors: Melchers, Kleinmann, and Prinz

Abstract

It has been suggested that the large cognitive demands during the observation of assessment center (AC) participants can impair the quality of the assessors' ratings. An aspect that is especially relevant in this regard is the number of candidates that assessors have to observe simultaneously during group discussions, which are among the most commonly used AC exercises. The present research evaluated potential impairments of the quality of the assessors' ratings (construct- and criterion-related validity and rating accuracy) related to the number of to-be-observed candidates. Study 1 (N = 1046) was a quasi-experimental field study and Study 2 (N = 71) was an experimental laboratory study. Both studies found significant impairments of assessors' rating quality when a larger rather than a smaller number of candidates had to be observed simultaneously. These results suggest that assessors should not have to observe too many candidates at the same time during AC group discussions.
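A hedged illustration of the kind of comparison reported here: criterion-related validity coefficients from two independent assessor-load conditions can be contrasted with Fisher's r-to-z test. The sketch below uses invented coefficients and group sizes, not the values from Study 1 or Study 2.

```python
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for independent samples (Fisher r-to-z)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transformation of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of the difference
    z = (z1 - z2) / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

# Hypothetical example: validity when observing few vs. many candidates
z, p = compare_correlations(r1=0.35, n1=500, r2=0.22, n2=500)
print(f"z = {z:.2f}, p = {p:.3f}")
```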


... When assessors have to observe multiple candidates simultaneously as, for example, in a group discussion, cognitive demands increase and thus inaccuracies in ratings are likely to increase, too (cf. Melchers, Kleinmann, & Prinz, 2010). ...
... In line with the unitarian framework of validity, the few studies that are available to date revealed that improvements in AC construct-related validity may indeed lead to improvements in criterion-related validity (Melchers, Kleinmann, & Prinz, 2010; Schleicher, Day, Mayes, & Riggio, 2002). Other findings, however, suggest that some factors might have opposite effects on construct-related and criterion-related validity of ACs. ...
... Based on these findings, this study provided important practical guidance on how to weigh assessor expertise against the size of the assessor team so that rating accuracy can be ensured while keeping AC costs under control. As rating accuracy is connected to AC construct-related validity (Gaugler & Thornton, 1989;Lievens, 2001a;Melchers, Kleinmann, & Prinz, 2010;Schleicher et al., 2002), the results from this study are also relevant for AC construct-related validity. ...
Article
Assessment Centers (ACs) are a diagnostic tool that serves as a basis for decisions in the context of personnel selection and employee development. In view of the far-reaching consequences that AC ratings can have, it is important that these ratings are accurate. Therefore, we need to understand what AC ratings measure and how the measurement of dimensions, that is, construct-related validity, can be improved. The aims of this thesis are to contribute to the understanding of the construct-related validity of ACs and to provide practical guidance in this regard. Three studies that offer different perspectives on rating accuracy and AC construct-related validity, respectively, were conducted. The first study investigated whether increasing assessor team size can compensate for missing assessor expertise (i.e., assessor training and assessor background) and vice versa to improve rating accuracy. On the basis of dimension ratings from a laboratory setting (N = 383), we simulated assessor teams of different sizes. Of the factors considered, assessor training was most effective in improving rating accuracy and it could only partly be compensated for by increasing assessor team size. In contrast, increasing the size of the assessor team could compensate for missing expertise related to assessor background. In the second study, the effects of exercise similarity on AC construct-related and criterion-related validity were examined simultaneously. Data from a simulated graduate AC (N = 92) revealed that exercise similarity was beneficial for construct-related validity, but that it did not affect criterion-related validity. These results indicate that improvements in one aspect of validity are not always paralleled by improvements in the other aspect of validity. The third study examined whether relating AC overall dimension ratings to external evaluations of the same dimensions can provide evidence for construct-related validity of ACs. Confirmatory factor analyses of data from three independent samples (Ns = 428, 121, and 92) yielded source factors but no dimension factors in the latent factor structure of AC overall dimension ratings and external dimension ratings. This means that different sources provide different perspectives on candidates’ performance, and that AC overall dimension ratings and external dimension ratings cannot be attributed to the purported dimensions. Taken as a whole, this thesis looked at AC construct-related validity from different angles. The reported findings contribute to the understanding of rating accuracy and construct-related validity of ACs.
... It follows that mental workload (i.e., processing demands) would be higher when multiple elements of performance for multiple dimensions must be processed simultaneously and that, as a result, ratings may ultimately be affected (Kogan et al. 2009). When Melchers et al. (2010) asked raters to rate either 4 or 5 candidates for managerial positions using the same 4 dimensions in leaderless group tasks, they found significant impairments of assessors' rating quality when higher demands (rating 5 candidates) were compared to lower demands (rating 4 candidates) (Melchers et al. 2010). This is a dramatic finding given that the manipulation was so minimal. ...
... Such strategies have been found effective in the context of improving assessment of autobiographical submissions in an admissions context (Dore et al. 2006). While necessarily speculative at this point as direct evidence has not been collected in the broader world of assessment relevant to medical education, researchers in other fields investigating similar concerns have demonstrated that reducing load may create strategies for improved rater performance (Gaugler and Thornton 1989;Melchers et al. 2010). ...
Article
When appraising the performance of others, assessors must acquire relevant information and process it in a meaningful way in order to translate it effectively into ratings, comments, or judgments about how well the performance meets appropriate standards. Rater-based assessment strategies in health professional education, including scale and faculty development strategies aimed at improving them have generally been implemented with limited consideration of human cognitive and perceptual limitations. However, the extent to which the task assigned to raters aligns with their cognitive and perceptual capacities will determine the extent to which reliance on human judgment threatens assessment quality. It is well recognized in medical decision making that, as the amount of information to be processed increases, judges may engage mental shortcuts through the application of schemas, heuristics, or the adoption of solutions that satisfy rather than optimize the judge's needs. Further, these shortcuts may fundamentally limit/bias the information perceived or processed. Thinking of the challenges inherent in rater-based assessments in an analogous way may yield novel insights regarding the limits of rater-based assessment and may point to greater understanding of ways in which raters can be supported to facilitate sound judgment. This paper presents an initial exploration of various cognitive and perceptual limitations associated with rater-based assessment tasks. We hope to highlight how the inherent cognitive architecture of raters might beneficially be taken into account when designing rater-based assessment protocols.
... are assessed by a single assessor, it is advisable to keep this number lower in a remote AC than in an on-site AC. Group exercises are cognitively taxing for assessors in any case (Melchers et al., 2010), so the additional load in the remote format should be reduced. Moreover, it can be helpful for assessors to switch off their cameras during exercises in which they do not necessarily have to be visible. ...
Chapter
Full-text available
This chapter provides an overview of key differences and challenges of remotely or digitally conducted assessment centers (remote ACs) compared to ACs conducted on site. It presents relevant theoretical concepts, general challenges, exercise-specific aspects, and the limited body of scientific evidence on these ACs. From this, we derive concrete recommendations on which aspects should be considered when using remote ACs and how potential adverse effects can be prevented.
... The significant cognitive load involved in keeping track of multiple observable behaviors, especially in multiple domains, has been documented.7,8 Instructors with fewer cognitive demands would have more cognitive capacity to devote to formative assessment. ...
Article
Full-text available
Background: Facilitating simulation is a complex task with high cognitive load. Often simulation technologists are recruited to help run scenarios and lower some of the extraneous load. We used cognitive load theory to explore the impact of technologists on instructors, identifying sources of instructor cognitive load with and without technologists present. Methods: Data were collected from 56 simulation sessions for postgraduate emergency medicine residents. Instructors delivered 14 of the sessions without a technologist. After each session, the instructor and simulation technologist (if present) provided quantitative and qualitative data on the cognitive load of the simulation. Results: Instructors rated their cognitive load similarly, regardless of whether simulation technologists were present. However, the composition of their cognitive load differed. Instructors experienced reduced cognitive load related to the simulator and technical resources when technologists were present. Qualitative feedback from instructors suggested real consequences to these differences in cognitive load in (1) perceived complexities in running the scenario, and (2) observations of learners. Conclusion: We provide evidence that simulation technologists can remove some of the extraneous load related to the simulator and technical resources for the instructor, allowing the instructor to focus more on observing the learner(s) and tailoring the scenario to their actions.
... Third, the designers of the present ACs also followed recommendations concerning design features that should make it more likely to support dimension measurements (International Taskforce on Assessment Center Guidelines, 2015). For example, only a limited number of different dimensions were used (Gaugler & Thornton, 1989), assessors received adequate rater training (Woehr & Arthur, 2003), and assessors did not have to evaluate too many participants simultaneously in group exercises (Melchers, Kleinmann, & Prinz, 2010). In line with this, the ACs showed expected levels of criterion-related validity (cf. ...
Article
Full-text available
There have been repeated calls for an external construct validation approach to advance our understanding of the construct-related validity of assessment center dimension ratings beyond existing internal construct-related validity findings. Following an external construct validation approach, we examined whether linking assessment center overall dimension ratings to ratings of the same dimensions that stem from sources external to the assessment center provides evidence for construct-related validity of assessment center ratings. We used data from one laboratory assessment center sample and two field samples. External ratings of the same dimensions stemmed from assessees, assessees’ supervisors, and customers. Results converged across all three samples and showed that different dimension-same source correlations within the assessment centers were larger than same dimension-different source correlations. Moreover, confirmatory factor analyses revealed source factors but no dimension factors in the latent factor structure of overall dimension ratings from the assessment center and from external sources. Hence, consistent results across the three samples provide no support that assessment center overall dimension ratings and ratings of the same dimensions from other sources can be attributed to dimension factors. This questions arguments that assessment center overall dimension ratings should have construct-related validity.
... These demands are typically associated with the assessor-participant ratio as well as the number of competencies to be assessed.48 Although there is no consensus in the literature regarding the specific number of competencies to be measured, as a guideline it is recommended that between four and six competencies be measured for each behavioural simulation exercise.49 The literature postulates that lowering the assessor-participant ratio as well as minimising the number of dimensions to be assessed could lower the cognitive demands placed on the assessor. ...
Article
Full-text available
The South African Special Forces is a grouping of highly trained, motivated and dedicated soldiers who execute specialised tasks that ordinary infantry soldiers are not trained or required to conduct. The milieu in which Special Forces operators function is notoriously challenging as these forces could deploy for a few days or several months or longer in any type of environment. It is therefore essential that the correct candidates be selected to function in these environments. The aim of the officer’s potential assessment (OPA) is thus to select candidates with the physical, cognitive, emotional and psychological fitness to be trained as South African Special Forces operators and officers. The study on which this article is based explored the development of the behavioural assessments during the South African Special Forces officers’ selection process as a method and model for the review and design of assessment centres from a holistic, detailed perspective. Keywords: Behavioural observation, Special Forces, assessment centres, rating scales, officers
... As such, we expect that intelligence would be a stronger predictor of accuracy in low-structure interviews (or other assessment contexts) as opposed to high-structure interviews. Likewise, in assessment centre judgments, information processing loads are higher than in interviews because multiple candidates are judged simultaneously, often on multiple dimensions, and also in varying situations (Melchers, Kleinmann, & Prinz, 2010;Melchers, Meyer, & Kleinmann, 2008). More complex judgment tasks may increase difficulty of detection and use of multiple cues. ...
Article
Full-text available
In light of the pivotal importance of judgments and ratings in human resource management (HRM) settings, a better understanding of the individual differences associated with being a good judge is sorely needed. This review provides an overview of individual difference characteristics that have been associated with accurate judges in HRM. We review empirical findings over more than 80 years to identify what we know and do not know about the individual difference correlates of being an accurate judge. Overall, findings suggest that judges’ cognitive factors show stronger and more consistent relationships with rating accuracy than personality-related factors. Specific intelligences in the social cognition domain, such as dispositional reasoning (complex understanding of traits, behaviors and a situation’s potential to manifest traits into behaviors), show particular promise for understanding what makes an accurate judge. Importantly, our review also highlights the scarcity of research on HRM context (selection vs. performance appraisal settings) and judges’ motivation to distort ratings. To guide future research, we present a model that links assessor constructs to key processes required for accurate judgment and ratings in HRM contexts. The discussion suggests twenty questions for future work in this field.
... As the majority of these active strategies appear to be both idiosyncratically applied and designed to reduce effort rather than optimize performance, the results of this study support earlier research suggesting that rating demands may contribute to rater error.13,15,27,28 In particular, raters reported engaging in a process of prioritization/selection and/or simplification. They actively reduced intrinsic load (i.e., rating demands) by focusing on the dimensions that they felt were most important when all could not be considered simultaneously. ...
Article
Theory: Assessment of clinical competence is a complex cognitive task with many mental demands often imposed on raters unintentionally. We were interested in whether this burden might contribute to well-described limitations in assessment judgments. In this study we examine the effect on indicators of rating quality of asking raters to (a) consider multiple competencies and (b) attend to multiple issues. In addition, we explored the cognitive strategies raters engage when asked to consider multiple competencies simultaneously. Hypotheses: We hypothesized that indications of rating quality (e.g., interrater reliability) would decline as the number of dimensions raters are expected to consider increases. Method: Experienced faculty examiners rated prerecorded clinical performances within a 2 (number of dimensions) × 2 (presence of distracting task) × 3 (number of videos) factorial design. Half of the participants were asked to rate 7 dimensions of performance (7D), and half were asked to rate only 2 (2D). The second factor involved the requirement (or lack thereof) to rate the performance of actors participating in the simulation. We calculated the interrater reliability of the scores assigned and counted the number of relevant behaviors participants identified as informing their ratings. Second, we analyzed data from semistructured posttask interviews to explore the rater strategies associated with rating under conditions designed to broaden raters' focus. Results: Generalizability analyses revealed that the 2D group achieved higher interrater reliability relative to the 7D group (G = .56 and .42, respectively, when the average of 10 raters is calculated). The requirement to complete an additional rating task did not have an effect. Using the 2 dimensions common to both groups, an analysis of variance revealed that participants who were asked to rate only 2 dimensions identified more behaviors of relevance to the focal dimensions than those asked to rate 7 dimensions: procedural skill = 36.2%, 95% confidence interval (CI) [32.5, 40.0] versus 23.5%, 95% CI [20.8, 26.3], respectively; history gathering = 38.6%, 95% CI [33.5, 42.9] versus 24.0%, 95% CI [21.1, 26.9], respectively; ps < .05. During posttask interviews, raters identified many sources of cognitive load and idiosyncratic cognitive strategies used to reduce cognitive load during the rating task. Conclusions: As intrinsic rating demands increase, indicators of rating quality decline. The strategies that raters engage when asked to rate many dimensions simultaneously are varied and appear to yield idiosyncratic efforts to reduce cognitive effort, which may affect the degree to which raters make judgments based on comparable information.
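The G coefficients above refer to the average of 10 raters. Projecting a single-rater decomposition to a k-rater mean follows the standard generalizability (Spearman-Brown-type) formula; the variance components in the sketch below are invented for illustration and are not taken from the study.

```python
def g_coefficient(var_person: float, var_error: float, k: int) -> float:
    """Generalizability (reliability) of the mean of k raters:
    G = var_person / (var_person + var_error / k)."""
    return var_person / (var_person + var_error / k)

# Hypothetical variance components (person = true performance differences,
# error = rater-related noise); not values from the cited study.
for k in (1, 10):
    print(k, round(g_coefficient(var_person=0.10, var_error=1.0, k=k), 2))
```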
... Findings from previous studies imply that increased cognitive demands are associated with lower accuracy of the ratings obtained (cf. Melchers, Kleinmann, & Prinz, 2010). Therefore, in exercises with higher cognitive demands, assessor expertise as well as increasing the number of assessors and aggregating their ratings might be even more important than in a presentation exercise. ...
Article
Full-text available
We investigated the effects of assessor team size on the accuracy of ratings in a presentation exercise as it is commonly used in assessment centers and compared it to the effects of two factors related to assessor expertise (assessor training and assessor background). On the basis of actual ratings from a simulated selection setting (N = 383), we sampled assessor teams of different sizes and with different expertise and determined the accuracy of their ratings in the presentation exercise. Of the three factors, assessor training had the strongest effect on rating accuracy. Furthermore, in most conditions, using larger assessor teams also led to more accurate ratings. In addition, the use of larger assessor teams compensated for having not attended an assessor training only when the assessors had a psychological background. Concerning assessor background, we did not find a significant main effect. Practical implications and directions for future research are discussed.
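A rough sketch of the resampling logic described here, under simplified assumptions: draw assessor teams of a given size from a pool of simulated individual ratings, average within team, and correlate the team mean with a known true-score profile. The pool size, error model, and accuracy index below are placeholders, not the study's actual data or scoring procedure.

```python
import numpy as np

rng = np.random.default_rng(42)

n_candidates, n_assessors = 30, 40
true_scores = rng.normal(size=n_candidates)
# Each assessor's rating = true score + idiosyncratic error (simulated).
ratings = true_scores[:, None] + rng.normal(scale=1.0, size=(n_candidates, n_assessors))

def accuracy_of_team(size: int, n_draws: int = 2000) -> float:
    """Mean correlation between averaged team ratings and true scores."""
    rs = []
    for _ in range(n_draws):
        team = rng.choice(n_assessors, size=size, replace=False)
        team_mean = ratings[:, team].mean(axis=1)
        rs.append(np.corrcoef(team_mean, true_scores)[0, 1])
    return float(np.mean(rs))

for k in (1, 2, 3, 5):
    print(f"team size {k}: mean accuracy r = {accuracy_of_team(k):.2f}")
```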
... For example, Schollaert and Lievens (2011) noted that role player prompts appear to facilitate better dimension measurement. Similarly, Melchers, Kleinmann, and Prinz (2010) noted that having to simultaneously rate multiple individuals has a detrimental impact on ratings. Likewise, current work on task-based assessment centers has also shown some potential (e.g., Jackson, Stillman, & Englert, 2010), as has recent research into parallel forms of assessment center exercises (Brummel, Rupp, & Spain, 2009). ...
Article
Full-text available
Recent Monte Carlo research (Lance, Woehr, & Meade, 2007) has questioned the primary analytical tool used to assess the construct-related validity of assessment center post-exercise dimension ratings (PEDRs) – a confirmatory factor analysis of a multitrait-multimethod (MTMM) matrix. By utilizing a hybrid of Monte Carlo data generation and univariate generalizability theory, we examined three primary sources of variance (i.e., persons, dimensions, and exercises) and their interactions in 23 previously published assessment center MTMM matrices. Overall, the person, dimension, and person by dimension effects accounted for a combined 34.06% of variance in assessment center PEDRs (16.83%, 4.02%, and 13.21%, respectively). However, the largest single effect came from the person by exercise interaction (21.83%). Implications and suggestions for future assessment center research and design are discussed.
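The univariate generalizability decomposition referred to above can be illustrated for a balanced person x dimension x exercise design. The sketch below estimates variance components from expected mean squares (with the three-way interaction confounded with error, as is usual with one observation per cell) and expresses each as a percentage of total variance; the data are randomly generated, so the percentages will not reproduce the published estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced design: y[person, dimension, exercise].
# Randomly generated stand-in for post-exercise dimension ratings (PEDRs).
P, D, E = 60, 4, 5
y = rng.normal(size=(P, D, E))

grand = y.mean()
m_p, m_d, m_e = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=(0, 1))
m_pd, m_pe, m_de = y.mean(axis=2), y.mean(axis=1), y.mean(axis=0)

# Sums of squares for the fully crossed random-effects design
ss_p = D * E * np.sum((m_p - grand) ** 2)
ss_d = P * E * np.sum((m_d - grand) ** 2)
ss_e = P * D * np.sum((m_e - grand) ** 2)
ss_pd = E * np.sum((m_pd - m_p[:, None] - m_d[None, :] + grand) ** 2)
ss_pe = D * np.sum((m_pe - m_p[:, None] - m_e[None, :] + grand) ** 2)
ss_de = P * np.sum((m_de - m_d[:, None] - m_e[None, :] + grand) ** 2)
resid = (y - m_pd[:, :, None] - m_pe[:, None, :] - m_de[None, :, :]
         + m_p[:, None, None] + m_d[None, :, None] + m_e[None, None, :] - grand)
ss_pde = np.sum(resid ** 2)

# Mean squares
ms_p, ms_d, ms_e = ss_p / (P - 1), ss_d / (D - 1), ss_e / (E - 1)
ms_pd = ss_pd / ((P - 1) * (D - 1))
ms_pe = ss_pe / ((P - 1) * (E - 1))
ms_de = ss_de / ((D - 1) * (E - 1))
ms_pde = ss_pde / ((P - 1) * (D - 1) * (E - 1))

# Variance components from expected mean squares (person x dimension x exercise
# interaction confounded with error because there is one observation per cell)
var = {
    "person":               max((ms_p - ms_pd - ms_pe + ms_pde) / (D * E), 0),
    "dimension":            max((ms_d - ms_pd - ms_de + ms_pde) / (P * E), 0),
    "exercise":             max((ms_e - ms_pe - ms_de + ms_pde) / (P * D), 0),
    "person x dimension":   max((ms_pd - ms_pde) / E, 0),
    "person x exercise":    max((ms_pe - ms_pde) / D, 0),
    "dimension x exercise": max((ms_de - ms_pde) / P, 0),
    "residual":             ms_pde,
}
total = sum(var.values())
for effect, v in var.items():
    print(f"{effect:>22s}: {100 * v / total:5.1f}% of variance")
```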
... are predictor methods that present the applicant with a variety of activities or exercises (e.g., inbox, role play, leaderless group projects) designed to assess multiple KSAOs. Despite several decades of activity, research on assessment center construct validity continues to sort through questions of what assessment centers actually measure and how to enhance their value (Dilchert & Ones 2009, Gibbons & Rupp 2009, Hoffman & Meade 2012, Hoffman et al. 2011, Jones & Born 2008, Meriac et al. 2009, Putka & Hoffman 2013, Schollaert & Lievens 2012), reduce demands on assessors (Melchers et al. 2010), construct parallel forms of exercises (Brummel et al. 2009), and use technology for assessment exercise delivery (Lievens et al. 2010). We conclude that assessment center research remains vibrant and that many of the products of this effort have applicability to other selection methods (e.g., one can have parallel versions of interviews or SJTs, the demands on evaluators in interviews have similar influences as those on assessors, and construct validity questions similarly occur for SJTs and interviews). ...
Article
Full-text available
Over 100 years of psychological research on employee selection has yielded many advances, but the field continues to tackle controversies and challenging problems, revisit once-settled topics, and expand its borders. This review discusses recent advances in designing, implementing, and evaluating selection systems. Key trends such as expanding the criterion space, improving situational judgment tests, and tackling socially desirable responding are discussed. Particular attention is paid to the ways in which technology has substantially altered the selection research and practice landscape. Other areas where practice lacks a research base are noted, and directions for future research are discussed. Expected final online publication date for the Annual Review of Psychology Volume 65 is January 03, 2014. Please see http://www.annualreviews.org/catalog/pubdates.aspx for revised estimates.
... (Chen, Wu, & Leung, 2011; Melchers, Kleinmann, & Prinz, 2010; Murphy & Cleveland, 1995). Traditionally, the term performance appraisal referred to the process of a manager completing an annual report on a subordinate's performance and usually discussing it with the subordinate in an appraisal interview. ...
Article
Full-text available
Much of the prior research investigating the influence of cultural values on performance ratings has focused either on conducting cross-national comparisons among raters or using cultural level individualism/collectivism scales to measure the effects of cultural values on performance ratings. Recent research has shown that there is considerable within country variation in cultural values, i.e. people in one country can be more individualistic or collectivistic in nature. Taking the latter perspective, the present study used Markus and Kitayama's (1991) conceptualization of independent and interdependent self-construals as measures of individual variations in cultural values to investigate within culture variations in performance ratings. Results suggest that rater self-construal has a significant influence on overall performance evaluations; specifically, raters with a highly interdependent self-construal tend to show a preference for interdependent ratees, whereas raters high on independent self-construal do not show a preference for specific type of ratees when making overall performance evaluations. Although rater self-construal significantly influenced overall performance evaluations, no such effects were observed for specific dimension ratings. Implications of these results for performance appraisal research and practice are discussed.
... The results of the study hold implications for improving construct validity of the LADC through improved AC design. Construct validity of ACs depends primarily on the accuracy of ratings and thus assessors' ability to process complex cognitive information inherent to the AC process (Kolk et al., 2004; Melchers et al., 2010; Moses, 2008; Thornton and Krause, 2009). In the LADC the assessors' cognitive load was affected by the large number of competency dimensions. ...
Article
The issue of construct validity has become a contentious issue in the study of assessment centres around the world. The purpose of the study was to investigate the construct validity of assessment centre dimension ratings through correlation and factor analysis. The sample consisted of 138 individuals who participated in a two-day assessment centre for selection as partners/directors in an auditing firm. Twenty-one dimensions were measured using six different exercises. Both correlation and factor analysis results showed no evidence of discriminant validity amongst dimensions measured in the simulation and interview exercises. Convergent validity among some dimensions was found only in one of the simulation exercises. Implications for assessment centre design and research are discussed.
... Therefore, it would be advantageous if more South African organizations would consider this important moderator variable of an AC's construct validity. Additionally, a recent study showed that assessors' judgments in group discussions are more accurate if assessors have to observe only a few candidates (instead of a large number of candidates) per exercise (Melchers, Kleinmann, & Prinz, 2010). The observation of fewer candidates leads to higher construct- and criterion-related validity. ...
Article
Despite the popularity of assessment centers (AC) in South Africa, no recent study exists that describes AC practices in that region. Given this research gap, we conducted a survey study that analyzes the development, execution, and evaluation of ACs in N=43 South African organizations. We report findings regarding AC design, job analysis and job requirements assessed, target groups and positions of the participants after the AC, number and kind of exercises used, additional diagnostic methods used, assessors and characteristics considered in constitution of the assessor pool, observational systems and rotation plan, characteristics, contents, and methods of assessor training, types of information provided to participants, data integration process, use of self‐ and peer‐rating, characteristics of the feedback process, and features after the AC. Finally, we compare the results with professional suggestions to identify pros and cons in current South African AC practices and offer suggestions for improvement.
Book
Full-text available
Managerial Competencies (original title: شایستگی مدیران)
Article
Full-text available
This meta‐analysis tested a series of moderators of sex‐ and race‐based subgroup differences using assessment center (AC) field data. We found that sex‐based subgroup differences favoring female assessees were smaller among studies that reported: combining AC scores with other tests to compute overall assessment ratings, lower mean correlations between rating dimensions, using more than one assessor to rate assessees in exercises, and providing assessor training. In contrast, we found larger sex‐based subgroup differences favoring female assessees among studies that reported: lower proportions of females in assessee pools, conducting a job analysis to design the AC, and using multiple observations of AC dimensions across exercises. We also observed a polynomial effect showing that subgroup differences most strongly favored female assessees in jobs with the highest and lowest rates of female incumbents. We found race‐based subgroup differences favoring White assessees were smaller on less cognitively loaded rating dimensions and for jobs with lower rates of Black incumbents. Studies reporting greater overall methodological rigor also showed smaller subgroup differences favoring White assessees. Regarding specific rigor features, studies reporting use of highly qualified assessors and integrating dimension ratings from separate exercises into overall dimension scores showed significantly lower differences favoring White assessees.
Article
Full-text available
Previous studies have found that factors that improved assessment center (AC) construct-related validity also had beneficial effects on criterion-related validity. However, some factors might have diverging effects on construct- and criterion-related validity. Accordingly, we followed recent calls to evaluate construct- and criterion-related validity of ACs simultaneously by examining the effects of exercise similarity on both aspects of validity within a single study. Data were collected in an AC (N = 92) that consisted of two different types of exercises. Convergent validity was better for similar exercises than it was for dissimilar exercises. However, regarding criterion-related validity, we did not find differences between similar and dissimilar exercises. Hence, this study revealed that improvements in AC construct-related validity are not necessarily paralleled by improvements in criterion-related validity.
Article
Full-text available
This study sought to add to the literature on the validity of assessment centers (ACs) by first examining the factorial structure emerging from observers' dimension ratings and then examining their predictive validity using a performance criterion often unavailable to researchers—performance-based bonus payment. A series of ACs specially designed for the selection of candidates for entry- to mid-tier management positions in a large financial corporation (n = 180) was used as the sampling frame. For candidates who were promoted to a managerial position we gathered bonus information within 6-12 months of their promotion (n = 75). The dimension ratings and factorial structure of the AC were examined to reveal a 2-factor structure pertaining to cognitive and interpersonal aspects of performance. Both the original dimensions and the two factorial grades showed moderate predictive validity using performance-based bonus as the criterion: the 'organizational commitment' dimension best predicted bonus payment (r = .38; p < .01) and the interpersonal factorial grade best predicted bonus (standardized b = .22, p < .01), followed by the cognitive factor, after controlling for gender and tenure. The theoretical and practical implications of the findings are briefly discussed.
Article
The present study updates Woehr and Huffcutt's (1994) rater training meta-analysis and demonstrates that frame-of-reference (FOR) training is an effective method of improving rating accuracy. The current meta-analysis includes over four times as many studies as included in the Woehr and Huffcutt meta-analysis and also provides a snapshot of current rater training studies. The present meta-analysis also extends the previous meta-analysis by showing that not all operationalizations of accuracy are equally improved by FOR training; Borman's differential accuracy appears to be the most improved by FOR training, along with behavioural accuracy, which provides a snapshot into the cognitive processes of the raters. We also investigate the extent to which FOR training protocols differ, the implications of protocol differences, and if the criteria of interest to FOR researchers have changed over time.
Article
Full-text available
Orientation: The use of assessment centres (ACs) has drastically increased over the past decade. However, ACs are constantly confronted with the lack of construct validity. One aspect of ACs that could improve the construct validity significantly is that of assessor training. Unfortunately untrained or poorly trained assessors are often used in AC processes. Research purpose: The purpose of this research was to evaluate a frame-of-reference (FOR) programme to train intern psychometrists as assessors at an assessment centre. Motivation of study: The role of an assessor is important in an AC; therefore it is vital for an assessor to be able to evaluate and observe candidates’ behaviour adequately. Commencing with this training in a graduate psychometrist programme gives the added benefit of sending skilled psychometrists to the workplace. Research design, approach and method: A quantitative research approach was implemented, utilising a randomised pre-test-post-test comparison group design. Industrial Psychology postgraduate students (N = 22) at a South African university were used and divided into an experimental group (n = 11) and control group (n = 11). Three typical AC simulations were utilised as pre- and post-tests, and the ratings obtained from both groups were statistically analysed to determine the effect of the FOR training programme. Main findings: The data indicated that there was a significant increase in the familiarity of the participants with the one-on-one simulation and the group discussion simulation. Practical/managerial implications: Training intern psychometrists in a FOR programme could assist organisations in the appointment of more competent assessors. Contribution/value-add: To design an assessor training programme using FOR training for intern psychometrists in the South African context, specifically by incorporating this programme into the training programme for Honours students at universities.
Chapter
Full-text available
Learn about the individual phases of personnel selection. Understand what makes recruitment marketing so important. Learn that a good fit between the applicant and the position to be filled increases later career success and job satisfaction. Recognize that the administration of complex selection procedures by diagnostic laypersons can be problematic under certain circumstances, and how this can be avoided. Understand why high-quality personnel selection procedures are important.
Article
Full-text available
Attentional limits on perception and memory were measured by the decline in performance with increasing numbers of objects in a display. Multiple objects were presented to Ss who discriminated visual attributes. In a representative condition, 4 lines were briefly presented followed by a single line in 1 of the same locations. Ss were required to judge if the single line in the 2nd display was longer or shorter than the line in the corresponding location of the 1st display. The length difference threshold was calculated as a function of the number of objects. The difference thresholds doubled when the number of objects was increased from 1 to 4. This effect was generalized in several ways, and nonattentional explanations were ruled out. Further analyses showed that the attentional processes must share information from at least 4 objects and can be described by a simple model.
Article
Full-text available
Investigated the spatial extent of attention to visually presented letters and words using a probe technique in 2 studies with 135 undergraduates. The primary task required Ss to categorize (a) 5-letter words, or to categorize the middle letter of (b) 5-letter words or (c) 5-letter nonwords. The probe task required Ss to respond when the "7" appeared in 1 of the 5 letter positions. Probe trials were inserted at the onset of letter and word processing in Exp I and 500 msec after letter and word processing in Exp II. In both experiments, probe trials produced a V-shaped function of RTs across probe positions for the letter-categorization task for word and nonword stimulus conditions. In contrast, a relatively flat RT function was found for the word-categorization tasks. Data suggest that the spotlight width in the letter tasks is 1 letter space and that the spotlight width in the word task is typically 5 spaces. (16 ref)
Article
Full-text available
This study compares the effects of data-driven assessor training with schema-driven assessor training and control training. The sample consisted of 229 industrial and organizational psychology students and 161 managers who were randomly assigned to 1 of these training strategies. Participants observed and rated candidates in an assessment center exercise. The data-driven and schema-driven assessor training approaches outperformed the control training on all 3 dependent variables. The schema-driven assessor training resulted in the largest values of interrater reliability, dimension differentiation, and accuracy. Managers provided significantly more accurate ratings than students but distinguished less between the dimensions. Practical implications regarding the design of assessor trainings and the composition of assessor teams are proposed.
Article
Full-text available
We contrasted 2 competing interpretations of assessment center (AC) exercise effects on postexercise dimension ratings in 3 independent samples using a quasi-multitrait-multimethod framework. The (traditional) method bias interpretation is that exercise effects represent sources of systematic but invalid variance that compromise the construct validity of AC ratings. The situational specificity interpretation is that exercise effects reflect true cross-situational specificity in AC performance and thus, sources of valid variance in AC performance. Significant correlations between latent exercise factors and external correlates of AC performance supported the situational specificity interpretation. Findings are discussed as they help reconcile the apparently contradictory findings that ACs have demonstrated criterion-related but not construct validity.
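For reference, the two interpretations are usually contrasted within a confirmatory factor model of the MTMM structure of post-exercise dimension ratings. A generic form of that measurement model, as commonly specified in this literature rather than quoted from the article, is:

```latex
% Rating X_{de} of dimension d observed in exercise e, decomposed into a
% dimension (trait) factor, an exercise (method/situation) factor, and a unique term:
X_{de} = \lambda^{D}_{de}\,\eta_{d} + \lambda^{E}_{de}\,\xi_{e} + \varepsilon_{de},
\qquad \operatorname{Cov}(\eta_{d}, \xi_{e}) = 0 .
```

Under the method-bias reading, the exercise factors carry nuisance variance; under the situational-specificity reading, they carry valid performance variance, which is why their correlations with external correlates of AC performance are informative.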
Article
Full-text available
This study reviews prior construct-related validity research in assessment centres. Special focus is placed on disentangling possible explanations for the construct-related validity findings. The conclusion is that we now have a much better picture of the reasons behind the construct-related validity findings. Careful assessment centre design and high interrater reliability among assessors seem necessary albeit insufficient conditions to establish assessment centre construct-related validity. The nature of candidate performances is another key factor. This study next discusses how these empirical findings have changed how assessment centres are conceptualized (theoretical advancements framed in the application of trait activation theory), analysed (methodological advancements), and designed (practical advancements).
Article
Full-text available
Meta-analysis (Hunter, Schmidt, & Jackson, 1982) of 50 assessment center studies containing 107 validity coefficients revealed a corrected mean and variance of .37 and .017, respectively. Validities were sorted into five categories of criteria and four categories of assessment purpose. Higher validities were found in studies in which potential ratings were the criterion, and lower validities were found in promotion studies. Sufficient variance remained after correcting for artifacts to justify searching for moderators. Validities were higher when the percentage of female assessees was high, when several evaluation devices were used, when assessors were psychologists rather than managers, when peer evaluation was used, and when the study was methodologically sound. Age of assessees, whether feedback was given, days of assessor training, days of observation, percentages of minority assessees, and criterion contamination did not moderate assessment center validities. The findings suggest that assessment centers show both validity generalization and situational specificity. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
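The corrected mean and residual variance reported above come from Hunter-Schmidt-style psychometric meta-analysis. A bare-bones version of that computation (sample-size-weighted mean validity, observed variance, and expected sampling-error variance) is sketched below with invented study inputs; corrections for artifacts such as criterion unreliability, which the full analysis applies, are omitted.

```python
import numpy as np

# Hypothetical study inputs: observed validities and sample sizes
# (not the 107 coefficients analysed in the study above).
r = np.array([0.30, 0.45, 0.25, 0.40, 0.35])
n = np.array([120, 80, 200, 150, 95])

r_bar = np.sum(n * r) / np.sum(n)                      # weighted mean validity
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)     # observed variance of validities
var_sampling = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)  # expected sampling-error variance
var_residual = max(var_obs - var_sampling, 0.0)        # variance left after sampling error

print(f"mean r = {r_bar:.3f}, residual SD = {var_residual ** 0.5:.3f}")
```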
Article
Full-text available
Issues common to both the process of building psychological theories and validating personnel decisions are examined. Inferences linking psychological constructs and operational measures of constructs are organized into a conceptual framework, and validation is characterized as the process of accumulating various forms of judgmental and empirical evidence to support these inferences. The traditional concepts of construct-, content-, and criterion-related validity are unified within this framework. This unified view of validity is then contrasted with more conventional views (e.g., Uniform Guidelines, 1978), and misconceptions about the validation of employment tests are examined. Next, the process of validating predictor constructs is extended to delineate the critical inferences unique to validating performance criteria. Finally, an agenda for programmatic personnel selection research is described, emphasizing a shift in the behavioral scientist's role in the personnel selection process. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Videotaping of assessment center exercises has become an increasingly common practice, yet little is known about the impact of video technology on rating accuracy. This study compared ratings of a group discussion made after live observation (direct), after viewing a video (indirect), or after viewing a video with opportunities to pause and rewind (controlled). Results indicated some differences in observational accuracy but not in rating accuracy. Implications for the use of video technology in assessment centers are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Assessment center ratings of eight abilities from each of five situational exercises were examined for their cross-situational consistency and discriminant validity. A series of confirmatory factor analyses revealed that the ratings were largely (if not totally) situation specific, and that assessors failed to distinguish among the eight target abilities. These results combined with previous research suggest that the assessment center method measures mainly situation-specific performance, not cross-situational managerial abilities. We suggest that the intended constructs might be better measured if more ability-related behaviors were elicited within each exercise and if the cognitive demands placed on assessors were reduced. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Construes performance appraisal as the outcome of a dual-process system of evaluation and decision making whereby attention, categorization, recall, and information integration are carried out through either an automatic or a controlled process. In the automatic process, an employee's behavior is categorized without conscious monitoring unless the decisions involved are problematic; a consciously monitored categorization process would then occur. Subsequent recall of the employee is viewed to be biased by the attributes of prototypes (abstract images) representing categories to which the employee has been assigned. Dispositional and contextual factors influence the availability of categories during both assignment and recall. Although automatic and controlled processes can create accurate employee evaluations, categorization interacting with task type tends to affect subsequent employee information with halo, lenient/stringent, racial, sexual, ethnic, and personality biases. Behavior taxonomies, individual differences in cognitive structure, validation of behavior-sampling techniques, and laboratory studies of appraisal processes are presented as potential topics for research. (93 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
We examined methodological and theoretical issues related to accuracy measures used as criteria in performance-rating research. First, we argued that existing operational definitions of accuracy are not all based on a common accuracy definition; we report data that show generally weak relations among different accuracy operational definitions. Second, different methods of true score development are also examined, and both methodological and theoretical limitations are explored. Given the difficulty of obtaining true scores, criteria are discussed for examining the suitability of expert ratings as surrogate true score measures. Last, the usefulness of using accuracy measures in performance-rating research is examined to highlight situations in which accuracy measures might be desirable criterion measures in rating research. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
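One family of accuracy operationalizations discussed in this literature is Cronbach's (1955) decomposition into elevation, differential elevation, stereotype accuracy, and differential accuracy. The sketch below computes these squared-difference components from a ratee-by-dimension matrix of ratings and target ("true") scores; the data are invented, and this is only one of several possible operational definitions.

```python
import numpy as np

def cronbach_components(x: np.ndarray, t: np.ndarray) -> dict:
    """Cronbach (1955) accuracy components for ratee-by-dimension matrices of
    ratings (x) and target scores (t). Lower values indicate more accurate ratings."""
    xg, tg = x.mean(), t.mean()
    xr, tr = x.mean(axis=1), t.mean(axis=1)   # ratee means
    xd, td = x.mean(axis=0), t.mean(axis=0)   # dimension means
    elevation = (xg - tg) ** 2
    diff_elev = np.mean(((xr - xg) - (tr - tg)) ** 2)
    stereotype = np.mean(((xd - xg) - (td - tg)) ** 2)
    x_res = x - xr[:, None] - xd[None, :] + xg
    t_res = t - tr[:, None] - td[None, :] + tg
    diff_acc = np.mean((x_res - t_res) ** 2)
    return {"elevation": elevation, "differential elevation": diff_elev,
            "stereotype accuracy": stereotype, "differential accuracy": diff_acc}

# Invented example: 5 ratees rated on 3 dimensions, plus expert target scores.
rng = np.random.default_rng(1)
targets = rng.normal(size=(5, 3))
ratings = targets + rng.normal(scale=0.5, size=(5, 3))
for name, value in cronbach_components(ratings, targets).items():
    print(f"{name:>24s}: {value:.3f}")
```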
Article
Full-text available
Provides an integration and a quantitative review of the rater training literature. A general framework for the evaluation of rater training is presented in terms of 4 rating training strategies (rater error training, performance dimension training, frame-of-reference training, and behavioral observation training) and 4 dependent measures (halo, leniency, rating accuracy and observational accuracy). A meta-analytic review is presented to assess the effectiveness of the rater training strategies across the 4 dependent measures. Each of the 4 rater training strategies appeared to be at least moderately effective in addressing the aspect of performance ratings that it was designed to address. In most cases, each of the 4 training strategies resulted in positive effects on all of the 4 dependent measures. The effects of rater training on rating accuracy appeared to be moderated by the nature of the specific error training approach. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Extends research on the cognitive mechanisms underlying frame-of-reference (FOR) rater training by examining the impact of FOR training on the recall of performance information. It was hypothesized that the shared performance schema fostered by FOR training would serve as the basis for information processing, resulting in better recall for behavioral performance information as well as more accurate ratings of individual ratees. 174 FOR-trained Ss produced more accurate performance ratings, as measured by L. Cronbach's (1955) differential accuracy and differential elevation components, than did 142 control-trained Ss. FOR-trained Ss also recalled more behaviors, representing more performance dimensions, and exhibited less evaluative clustering and a larger relationship between memory and judgment. No differences were found between control and FOR Ss on measures of recognition accuracy. Implications for the evaluative judgment process are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Undergraduates ( N = 131) were trained as assessors, who evaluated the performance of confederates in an assessment center simulation on 3, 6, or 9 dimensions. Number of dimensions significantly affected some assessment center judgments but not others. Ss who rated a small number of dimensions classified behaviors more accurately and made more accurate ratings than did Ss who rated a large number of dimensions. Number of dimensions did not affect the accuracy of assessors' observations nor the discriminant validity of their dimension ratings. Given these results and the findings of others (e.g., J. R. Hinrichs and S. Haanpera; see record 1978-20114-001), developers of assessment centers should limit the cognitive demands placed on assessors by, for example, minimizing the number of dimensions assessors are required to process. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect (NLME) models as a framework. The model gave a good account of the data from a numerical and a spatial task version. The performance pattern in a combination of numerical and spatial updating could be explained by variations in the interference parameter: assuming less feature overlap between contents from different domains than between contents from the same domain, the model can account for double dissociations of content domains in dual-task experiments. Experiment 3 extended this idea to similarity within the verbal domain. The decline of memory accuracy with increasing memory load was steeper with phonologically similar than with dissimilar material, although processing speed was faster for the similar material. The model captured the similarity effects with a higher estimated interference parameter for the similar than for the dissimilar condition. The results are difficult to explain with alternative models, in particular models incorporating time-based decay and models assuming limited resource pools.
Article
Full-text available
This study provides an investigation of the nomological net for the seven primary assessment center (AC) dimensions identified by Arthur, Day, McNelly, and Eden (Personnel Psychology, 56, 125–154, 2003). In doing so, the authors provide the first robust estimates of the relationships between all primary AC dimensions with cognitive ability and the Big 5 factors of personality. Additionally, intercorrelations between AC dimensions based on sample sizes much larger than those previously available in the meta-analytic literature are presented. Data were obtained from two large managerial samples (total N=4985). Primary data on AC dimensions, personality, and cognitive ability interrelationships were subsequently integrated with meta-analytic data to estimate incremental validity for optimally and unit-weighted AC dimension composites as well as overall AC ratings over psychometric tests of personality and cognitive ability. Results show that unit- and optimally weighted composites of construct-based AC dimensions add incremental validity over tests of personality and cognitive ability, while overall AC ratings (including those obtained using subjective methods of data combination) do not.
Article
Full-text available
This study aims to shed light on possible problems of assessment center users and designers when developing and implementing assessment centers. Semi-structured interviews with a representative sample of assessment center users in Flanders revealed that, besides a large variability in assessment center practice, practitioners experience problems with dimension selection and definition, exercise design, line/staff managers as assessors, distinguishing between observation and evaluation, and with the content of assessor training programs. Solutions for these problems are suggested.
Article
Full-text available
Both tests of cognitive ability and assessment center (AC) ratings of various performance attributes have proven useful in personnel selection and promotion contexts. To be of theoretical or practical value, however, the AC method must show incremental predictive accuracy over cognitive ability tests given the cost disparities between the two predictors. In the present study, we investigated this issue in the context of promotion of managers in German police departments into a training academy for high-level executive positions. Candidates completed a set of cognitive ability tests and a 2-day AC. The criterion measure was the final grade at the police academy. Results indicated that AC ratings of managerial abilities were important predictors of training success, even after accounting for cognitive ability test scores. These results confirm that AC ratings provide unique contribution to the understanding and prediction of training performance of high-level executive positions beyond cognitive ability tests.
Article
Full-text available
Research indicates that assessment center (AC) ratings typically demonstrate poor construct validity; that is, they do not measure the intended dimensions of managerial performance (e.g., Sackett & Harris, 1988). The purpose of this study was to investigate the construct validity of dimension ratings from a developmental assessment center (N=102), using multitrait-multimethod analysis and factor analysis. The relationships between AC ratings, job performance ratings, and personality measures also were investigated. Results indicate that the AC ratings failed to demonstrate construct validity. The ratings did not show the expected relationships with the job performance and personality measures. Additionally, the factors underlying these ratings were found to be the AC exercises, rather than the managerial dimensions as expected. Potentially, this lack of construct validity of the dimension ratings is a serious problem for a developmental assessment center. There is little evidence that the managerial weaknesses identified by the AC are the dimensions that actually need to be improved on the job. Methods are discussed for improving the construct validity of AC ratings, for example, by decreasing the cognitive demands on the assessors.
Article
Full-text available
This paper presents a model of performance appraisal which focuses on the cognitive processes employed by a rater attempting to form an evaluation. The model describes the method by which a rater collects, encodes, stores, and later retrieves information from memory, and the method by which he or she weights and combines this information to form an evaluation which is converted to a rating on a scale. The model is based on diverse bodies of literature which share a social-cognitive orientation, and it forms the foundation for a number of testable research propositions.
Article
Full-text available
One possible reason for the continued neglect of statistical power analysis in research in the behavioral sciences is the inaccessibility of or difficulty with the standard material. A convenient, although not comprehensive, presentation of required sample sizes is provided. Effect-size indexes and conventional values for these are given for operationally defined small, medium, and large effects. The sample sizes necessary for .80 power to detect effects at these levels are tabled for 8 standard statistical tests: (1) the difference between independent means, (2) the significance of a product-moment correlation, (3) the difference between independent rs, (4) the sign test, (5) the difference between independent proportions, (6) chi-square tests for goodness of fit and contingency tables, (7) 1-way analysis of variance (ANOVA), and (8) the significance of a multiple or multiple partial correlation.
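For the first of the listed tests (difference between independent means), the tabled sample sizes can be approximated with the usual normal-approximation formula n per group ≈ 2((z_{1-α/2} + z_{1-β})/d)². The sketch below reproduces that arithmetic for the conventional effect sizes named in the abstract; it approximates rather than reproduces the exact tabled values.

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means
    via the normal approximation: n = 2 * ((z_{1-alpha/2} + z_power) / d) ** 2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Conventional small, medium, and large standardized mean differences
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} (d = {d}): about {n_per_group(d)} per group")
```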
Article
Full-text available
Eight experiments with the complex span paradigm are presented to investigate why concurrent processing disrupts short-term retention. Increasing the pace of the processing task led to worse recall, supporting the hypothesis that the processing task distracts attention from maintenance operations. Neither phonological nor semantic similarity between memory items and processing-task material impaired memory. In contrast, the degree of phonological overlap between memory items and processing-task material affected recall negatively, supporting feature overwriting as one source of interference in the complex span paradigm. When compared directly, phonological overlap impaired memory, but similarity had a beneficial effect. These findings rule out response competition or confusion as a mechanism of interference between storage and processing.
Article
Full-text available
This study investigates the incremental variance in job performance explained by assessment center (AC) dimensions over and above personality and cognitive ability. The authors extend previous research by using meta-analysis to examine the relationships between AC dimensions, personality, cognitive ability, and job performance. The results indicate that the 7 summary AC dimensions postulated by W. Arthur, Jr., E. A. Day, T. L. McNelly, & P. S. Edens (2003) are distinguishable from popular individual difference constructs and explain a sizeable proportion of variance in job performance beyond cognitive ability and personality.
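Because the inputs to such an analysis are meta-analytic correlations rather than raw data, incremental variance is computed directly from a correlation matrix: standardized weights are obtained by solving R_xx * beta = r_xy, and R-squared equals r_xy' * beta. The sketch below illustrates that algebra with made-up correlations; neither the predictor set nor the numbers are taken from the meta-analysis.

```python
# Incremental R2 from a (made-up) correlation matrix, as used when the
# input is meta-analytic rather than raw data.
import numpy as np

def r_squared(R_xx, r_xy):
    """R2 of a criterion regressed on predictors, computed from correlations only."""
    beta = np.linalg.solve(R_xx, r_xy)   # standardized regression weights
    return float(r_xy @ beta)

# Predictor order: cognitive ability (g), conscientiousness (C), AC composite (AC)
R = np.array([
    [1.00, 0.00, 0.30],   # g
    [0.00, 1.00, 0.10],   # C
    [0.30, 0.10, 1.00],   # AC
])
r_crit = np.array([0.50, 0.20, 0.40])   # correlations with job performance (placeholders)

r2_without_ac = r_squared(R[:2, :2], r_crit[:2])
r2_with_ac = r_squared(R, r_crit)

print(f"R2 without AC dimensions: {r2_without_ac:.3f}")
print(f"R2 with AC dimensions:    {r2_with_ac:.3f}")
print(f"Incremental R2:           {r2_with_ac - r2_without_ac:.3f}")
```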
Article
This survey of public sector police and fire chiefs and human resources professionals disclosed increasing use of the assessment center method. It also disclosed several serious flaws in the assessment centers used in the public sector: job analyses were not always required, validation was reported to be lacking or inappropriate, assessors were not always properly trained, and feedback to and from participants was not always provided.
Article
Issues common to both the process of building psychological theories and validating personnel decisions are examined. Inferences linking psychological constructs and operational measures of constructs are organized into a conceptual framework, and validation is characterized as the process of accumulating various forms of judgmental and empirical evidence to support these inferences. The traditional concepts of construct-, content-, and criterion-related validity are unified within this framework. This unified view of validity is then contrasted with more conventional views (e.g., Uniform Guidelines, 1978), and misconceptions about the validation of employment tests are examined. Next, the process of validating predictor constructs is extended to delineate the critical inferences unique to validating performance criteria. Finally, an agenda for programmatic personnel selection research is described, emphasizing a shift in the behavioral scientist's role in the personnel selection process.
Article
Working memory capacity was differentiated along functional and content-related facets. Twenty-four tasks were constructed to operationalize the cells of the proposed taxonomy. We tested 133 university students with the new tasks, together with six working memory marker tasks. With structural equation models, three working memory functions could be distinguished: Simultaneous storage and processing, supervision, and coordination of elements into structures. Each function was further subdivided into distinct components of variance. On the content dimension, evidence for a dissociation between verbal-numerical working memory and spatial working memory was comparatively weak.
Article
The authors investigated temporal trends in the validity of an assessment center consisting of a group discussion and an analysis-presentation exercise for predicting career advancement as measured by average salary growth over a 7-year period in a sample of 679 academic graduates. The validity of the overall assessment rating for persons with tenure of 7 years, corrected for initial differences in starting salaries and restriction in range, was .39. There was a considerable time variation in the validity of both the overall assessment rating and the assessment center dimensions. In accordance with findings from research in managerial effectiveness and development, the interpersonal effectiveness dimension became valid only after a number of years, whereas the firmness dimension was predictive in the whole period and increased with time. For comparison, validity trends for 2 types of interviews and a mental test were also studied.
Article
This study notes that the lack of convergent and discriminant validity of assessment center ratings in the presence of content-related and criterion-related validity is paradoxical within a unitarian framework of validity. It also empirically demonstrates an application of generalizability theory to examining the convergent and discriminant validity of assessment center dimensional ratings. Generalizability analyses indicated that person, dimension, and person by dimension effects contribute large proportions of variance to the total variance in assessment center ratings. Alternately, exercise, rater, person by exercise, and dimension by exercise effects are shown to contribute little to the total variance. Correlational and confirmatory factor analyses results were consistent with the generalizability results. This provides strong evidence for the convergent and discriminant validity of the assessment center dimension ratings – a finding consistent with the conceptual underpinnings of the unitarian view of validity and inconsistent with previously reported results. Implications for future research and practice are discussed.
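The variance decomposition referred to in this abstract can be illustrated with a much-reduced one-facet design: persons fully crossed with dimensions, one rating per cell. In that case the variance components follow from the expected mean squares, e.g. var(person) = (MS_person - MS_residual) / n_dimensions. The sketch below runs that simplified computation on simulated ratings; it is only an illustration of the logic, not the authors' full person x dimension x exercise x rater model.

```python
# One-facet G-study sketch: persons crossed with dimensions, one rating per cell.
# Variance components are estimated from expected mean squares (simulated data).
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_dims = 50, 4

person_eff = rng.normal(scale=0.8, size=(n_persons, 1))   # true person differences
dim_eff = rng.normal(scale=0.3, size=(1, n_dims))          # dimension "difficulty"
noise = rng.normal(scale=0.5, size=(n_persons, n_dims))    # person x dimension + error
ratings = person_eff + dim_eff + noise

grand = ratings.mean()
ss_person = n_dims * ((ratings.mean(axis=1) - grand) ** 2).sum()
ss_dim = n_persons * ((ratings.mean(axis=0) - grand) ** 2).sum()
ss_resid = ((ratings - grand) ** 2).sum() - ss_person - ss_dim

ms_person = ss_person / (n_persons - 1)
ms_dim = ss_dim / (n_dims - 1)
ms_resid = ss_resid / ((n_persons - 1) * (n_dims - 1))

var_person = max((ms_person - ms_resid) / n_dims, 0.0)
var_dim = max((ms_dim - ms_resid) / n_persons, 0.0)
var_resid = ms_resid
total = var_person + var_dim + var_resid

for name, v in [("person", var_person), ("dimension", var_dim),
                ("person x dimension, error", var_resid)]:
    print(f"{name:<26s} {v:6.3f}  ({v / total:5.1%} of total variance)")
```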
Article
In their meta-analysis on the construct validity of assessment centers (ACs), Woehr and Arthur (2003) found that within-dimension ratings of candidates' performance had better construct validity than within-exercise ratings. However, this finding conflicts with previous experimental studies, in which comparable results were only obtained when the rating approach was confounded with other factors that are also likely to affect construct validity but that were not taken into account by Woehr and Arthur. In the present investigation, we identified factors that are often associated with a within-dimension rating approach. Most of these factors were also confounded with the rating approach in the studies from Woehr and Arthur's meta-analysis. Meta-analytic moderator analyses showed that the time at which assessors rated candidates, the exchange of information between assessors, and assessor rotation moderate AC construct validity. The effects of the rating approach can probably be attributed to these confounding factors.
Article
Leadership is an ill-defined complex construct, difficult if not impossible to measure in assessment centers. Current research is cited to support this view. Using work samples and measuring how well the work is accomplished is suggested as a way to avoid the problems associated with assessment of leadership and other constructs. The proposed process gets directly to the candidate's job qualifications. It avoids the problems of construct validation and classification of behaviors into interrelated and complex constructs such as leadership.
Article
In the present study, we provide a systematic review of the assessment center literature with respect to specific design and methodological characteristics that potentially moderate the construct-related validity of assessment center ratings. We also conducted a meta-analysis of the relationship between these characteristics and construct-related validity outcomes. Results for rating approach, assessor occupation, assessor training, and length of assessor training were in the predicted direction, such that a higher level of convergent validity and a lower level of discriminant validity were obtained for the across-exercise compared to the within-exercise rating method; for psychologists compared to managers/supervisors as assessors; for assessor training compared to no assessor training; and for longer compared to shorter assessor training. Partial support was also obtained for the effects of the number of dimensions and assessment center purpose. Our review also indicated that relatively few studies have examined both construct-related and criterion-related validity simultaneously. Furthermore, these studies provided little, if any, support for the view that assessment center ratings lack construct-related validity while at the same time demonstrating criterion-related validity. The implications of these findings for assessment center construct-related validity are discussed.
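A bare-bones version of the moderator analyses described here compares sample-size-weighted mean correlations between studies grouped by a design characteristic (e.g., across-exercise vs. within-exercise rating method). The sketch below uses invented study-level entries and simple sample-size weighting in the spirit of psychometric meta-analysis; it illustrates the idea of a subgroup moderator check, not the authors' actual procedure.

```python
# Subgroup moderator check: sample-size-weighted mean correlation per level
# of a design characteristic. Study entries are invented for illustration.
import numpy as np

# (observed correlation, sample size, rating method)
studies = [
    (0.45, 120, "across-exercise"),
    (0.38,  80, "across-exercise"),
    (0.52, 200, "across-exercise"),
    (0.22,  95, "within-exercise"),
    (0.30, 150, "within-exercise"),
    (0.18,  60, "within-exercise"),
]

def weighted_summary(entries):
    rs = np.array([r for r, _, _ in entries])
    ns = np.array([n for _, n, _ in entries])
    r_bar = np.average(rs, weights=ns)                          # weighted mean r
    sd_r = np.sqrt(np.average((rs - r_bar) ** 2, weights=ns))   # weighted SD of r
    return r_bar, sd_r

for level in ("across-exercise", "within-exercise"):
    subset = [s for s in studies if s[2] == level]
    r_bar, sd_r = weighted_summary(subset)
    print(f"{level:<16s} mean r = {r_bar:.3f}, SD(r) = {sd_r:.3f}, k = {len(subset)}")
```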
Chapter
Article
Interest in exercise effects commonly observed in assessment centers (ACs) has resurfaced with Lance, Lambert, Gewin, Lievens, and Conway's (2004) Journal of Applied Psychology study. The study presented here addressed the construct validity puzzle associated with ACs by investigating whether traditional trait-based overall assessment ratings (OARs) could be explained by behavioral performance on exercises. In a sample of 208 job applicants from a real-world AC, it was found that the multivariate combination of scores from three behavioral checklists explained around 90% (p < .001) of the variance in supposedly trait-based OARs. This study adds to the AC literature by suggesting that traditional OARs are predictive of work outcomes because they reflect exercise-specific behavioral performance rather than trait-based assessments. If this is the case, validity and efficiency are best served by abandoning redundant trait ratings (dimensions) in favor of more direct behavioral ratings.
Article
This study examined the influence of the independence and observability of assessment center (AC) dimensions on the construct validity of the AC. The sample comprised 115 male and female German university students, who served as AC participants, and 16 male and female German adults (mean age 25 years), who served as observers. Each AC participant completed 1 of 20 ACs, which evaluated 4 independent dimensions (i.e., empathy, perseverance, creativity, and decisiveness) and 4 dependent dimensions (i.e., analytic skills, planning and control, persuasiveness, and assertiveness); the observability of these dimensions varied. Each participant's performance on a subset of these dimensions was rated by 4 observers, and these ratings served as the basis for analyses of the convergent and discriminant validity of the AC exercises.
Article
When leaderless discussion participants were studied in groups of 2, 4, 6, 8, and 12, there was a significant decline in the mean leadership assessment earned by participants as the groups studied became larger in size. Maximum stratification in the absolute sense occurred in discussion groups of six. Relative stratification tended to increase directly with increases in discussion group size. Consistency of leadership behavior was at a minimum in discussion groups of 2. Beyond this point, no systematic trends were clearly discernible for behavioral consistency in relation to group size.
Article
This article discusses the dynamics at work when assessors observe the behavior of managerial candidates in simulated exercises and process information to evaluate the candidates. These dynamics are viewed from 3 perspectives: (1) information processing, (2) categorization and social cognition, and (3) group dynamics. Concepts such as categories and management behavior schema are used to explain how assessors recall information and make predictions and judgments.
Article
Using the data from 4 previous studies of leaderless group discussion (LGD) techniques, the authors computed indices representing measures of the discussions and intercorrelated these variables. The findings indicated ways in which LGDs might be made more valid in assessing leadership potential, especially in terms of the relationship of validity to observers' ratings of different types of discussion groups.
Article
This investigation consisted of a questionnaire survey of the graduate selection methods used by 536 organizations in the U.K. The use of application forms for pre-selection purposes was widespread, although only a minority of organizations appeared to have approached this task in a systematic way. While references were also widely used, they were often taken up very late in the selection process, and only a minority of organizations sought specific information about job related abilities from referees. Interviews were universally used, both on their own, and as a component of an assessment centre. A total of 44 per cent of organizations used assessment centres as part of graduate recruitment. For most of those organizations using assessment centres, the interview was reported as being the most important component of the centre in determining final selection decisions. Most organizations provided some training for their selectors but this was typically general in nature, rather than being specific to graduate recruitment. The results are discussed in terms of their implications, both for practical application, and for future research.
Article
The present study examined the impact of attentional and memory demands on work performance ratings accorded men and women in traditionally male jobs. Of interest was whether sex discrimination would abate in the face of individuating and job-relevant work behavior even when the demands likely to be faced in actual work settings were taken into account. Two hundred and two subjects read a vignette depicting the work behavior of a male or female police officer and then rated the individual's work performance. The attentional demands imposed on subjects while reading the vignette and the amount of time elapsed prior to issuing the performance ratings were systematically varied. As predicted, men were evaluated more favorably than women when raters were faced with an additional task requiring attention and time pressures were made salient. Only when subjects were able to carefully allocate all of their attentional resources did sex bias in work performance ratings abate. Memory demands had no effects on work performance ratings. Gender-related work characterizations paralleled the performance ratings, providing support for the idea that sex stereotypes mediate discrimination in performance appraisal judgments. The theoretical and practical implications of these findings, as well as suggestions for future research, are discussed.
Article
Through a national survey of graduate recruitment coordinators, this study identified current recruitment and selection practices in Australia. Respondents (n=50) were mostly from private industry, with about a third from the government sector; the full range of industry sectors was represented. Respondents were asked about the management of recruitment activities, the methods used to communicate recruiting information, and the perceived accuracy of recruitment information. Information was also sought about the extent to which job analyses were used, the types of selection practices used, how applicant information was verified, the training and selection of interviewers, and the effectiveness of recruitment activities. Descriptive statistics were used to provide a summary of the findings. A regression analysis was used to examine predictors of (a) recruiting effectiveness, (b) acceptance rates, and (c) unfilled vacancies. The results were compared with those of other studies of recruitment and selection. Future research and practical applications were discussed.
Article
Although research has established the criterion-related validity of assessment centers for selection purposes, the construct validity of dimension ratings has not been demonstrated. A quasi-experimental design was used to investigate the influence of retranslated behavior checklists on the construct validity of dimension ratings for two assessment center exercises. Assessor use of behavior checklists increased the average convergent (i.e., same dimension across exercise) validity from .24 to .43 while decreasing the average discriminant (i.e., different dimension within exercise) validity (.47 to .41). Behavior checklist sums were moderately correlated with corresponding dimension ratings and demonstrated a comparable level of construct validity. It is suggested that using behavior checklists may improve dimension construct validity by reducing the cognitive demands placed on raters.
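The convergent and discriminant values quoted in this abstract are averages taken from a multitrait-multimethod layout: correlations between the same dimension across exercises versus correlations between different dimensions within an exercise. The sketch below computes those two averages from a ratings table; the dimension names, exercise names, and random data are hypothetical.

```python
# Average convergent (same dimension, different exercise) and discriminant
# (different dimension, same exercise) correlations from an MTMM layout.
# Dimension/exercise names and the random ratings are hypothetical.
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dimensions = ["communication", "planning", "influence"]
exercises = ["groupdisc", "presentation"]

cols = [f"{ex}_{dim}" for ex in exercises for dim in dimensions]
ratings = pd.DataFrame(rng.normal(size=(200, len(cols))), columns=cols)
corr = ratings.corr()

convergent, discriminant = [], []
for c1, c2 in itertools.combinations(cols, 2):
    ex1, dim1 = c1.split("_")
    ex2, dim2 = c2.split("_")
    if dim1 == dim2 and ex1 != ex2:      # same trait, different method
        convergent.append(corr.loc[c1, c2])
    elif dim1 != dim2 and ex1 == ex2:    # different trait, same method
        discriminant.append(corr.loc[c1, c2])

print(f"Mean convergent correlation (same dimension, across exercises): {np.mean(convergent):.2f}")
print(f"Mean heterotrait-monomethod correlation (within exercise):      {np.mean(discriminant):.2f}")
```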
Article
No recent survey documents assessment center (AC) practices across several countries. Therefore, we analyse AC practices in a sample of 97 organisations from nine countries in Western Europe and North America. We report findings regarding job analysis, dimensions, exercises, additional diagnostic methods, use of technology, assessor characteristics, contents and methods of assessor training, observational systems, information provided to participants, evaluation of participants' reactions, data integration, characteristics of feedback, and features after the AC. Finally, we compare our results with prior findings to identify trends over time and point out features of ACs that could be improved. Faced with the challenges that assessment centers raise in international organisations, we propose a model that accounts for cross-cultural variation in these practices, variation stemming from individual factors (the motivation and qualifications of human resources experts), cultural conditions (uncertainty avoidance and power distance), and institutional realities (differences in the official level of collectivism and divergences in the legal norms and laws governing employment). This model is used to explain differences in the planning, execution, and evaluation of ACs in organisations located in nine Western European and North American countries. We also highlight long-term trends in AC practices and discuss how these practices can be improved and where future research in this area should be directed.
Article
The operation of attention in the visual field has often been compared to a spotlight. We propose that a more apt analogy is that of a zoom or variable-power lens. Two experiments focused upon the following questions: (1) Can the spatial extent of the attentional focus be made to vary in response to precues? (2) As the area of the attentional focus increases, is there a decrease in processing efficiency for stimuli within the focus? (3) Is the boundary of the focus sharply demarked from the residual field, or does it show a gradual dropoff in processing resources? Subjects were required to search eight-letter circular displays for one of two target letters and reaction times were recorded. One to four adjacent display positions were precued by underlines at various stimulus onset asynchronies before display presentation. A response competition paradigm was used, in which the “other target” was used as a noise letter in noncued as well as cued locations. The results were in good agreement with the zoom lens model.