Article

A field study of performance appraisal purpose: Research versus administrative-based ratings

Abstract

Many researchers have discussed the theoretical and practical importance of rating purpose. Nevertheless, the body of empirical studies, the majority of which were conducted in laboratory settings, focuses on leniency, and there has been little research on other effects of rating purpose. The present study examines 223 ratees in a field setting for whom there were both administrative-based performance appraisal ratings (which were actually used for personnel decisions) and research-based performance appraisal ratings (obtained for a validation study). Two of the hypotheses were supported: administrative ratings were more lenient than research-based ratings, and the administrative-based ratings demonstrated a statistically significant relationship with ratee seniority, while the research-based ratings did not. There was mixed support for a third hypothesis: research ratings were significantly correlated with a predictor, while the administrative ratings were not; the difference between the validity coefficients, however, was not significant. Contrary to the hypothesis, the rank-order correlation between administrative-based and research-based ratings was relatively high (r = .58).

... Several studies have lent support to these assertions. Indeed, researchers have found that ratings collected for administrative purposes are significantly higher than ratings collected for feedback or research purposes (Aleamoni & Hexner, 1980; Bernardin & Orban, 1990; Fredholm, 1998; Harris, Smith, & Champagne, 1995; Sharon & Bartlett, 1969; Zedeck & Cascio, 1982). For example, Aleamoni and Hexner (1980) found that student raters who were told that the results of their ratings would be used for salary and promotion consideration rated their instructor more favorably on all aspects of performance than students who were told that their ratings would be used to help the instructor determine the students' attitudes, interests, and opinions. ...
... For example, Aleamoni and Hexner (1980) found that student raters who were told that the results of their ratings would be used for salary and promotion consideration rated their instructor more favorably on all aspects of performance than students who were told that their ratings would be used to help the instructor determine the students' attitudes, interests, and opinions. More recently, Harris et al. (1995) found ratings of 223 production employees obtained for administrative purposes to be significantly more lenient than ratings obtained for research purposes. It is also important to note that raters exhibit greater behavioral accuracy when the perceived purpose of the rating is for developmental purposes as opposed to administrative purposes (Fredholm, 1998). ...
... These distinctions between developmental decisions and administrative decisions suggest that a particular training program may be better suited for developmental purposes, while another type of training program may be better suited for administrative purposes. According to Harris et al. (1995), the lenient ratings associated with administrative decisions reduce distinctions among ratees. These inflated ratings make it very difficult to decide who should get a raise, who should be promoted, who should be terminated, and so forth. ...
... Prior research, however, indicates that the purpose of the evaluation can impact the information acquisition and processing activities of the reviewer (Bobko and Colella 1994; Fedor and Bettenhausen 1989; Greguras et al. 2003; Harris et al. 1995; Zedeck and Cascio 1982). For example, Harris et al. (1995) find that performance evaluations with an evaluative purpose are less likely to identify negative information and more likely to be lenient. Their findings suggest that such leniency negatively impacts the effectiveness of reviews as evaluative reviewers are more tolerant of negative performance information. ...
... Psychology research has found that reviews with a developmental purpose identified more overall performance information and resulted in higher quality evaluations than those made with an evaluative purpose (Greguras et al. 2003; Harris et al. 1995). Additionally, managerial performance evaluations (i.e., evaluations of the work of high-level employees) using a developmental purpose have more valid measurements of managerial technical skills, administrative skills, human skills, and citizenship behaviors (Scullen et al. 2003). ...
Article
The importance of engagement quality review on audit engagements has increased in recent years. The SEC has cited engagement quality reviewers for failing to adequately review audit work, and the Sarbanes-Oxley Act of 2002 requires that auditing standards be issued for engagement quality review. The purpose of this research is to examine factors that can improve engagement quality reviews across varying levels of engagement partner performance. One hundred twenty-three experienced engagement quality reviewers participated in an experiment, which varied the purpose of the review, the introduction of a detailed practice aid, and engagement partner performance. Our results show that: (1) the purpose of the review affects the engagement quality reviewers' recall of the engagement partners' performance and judgments, (2) the purpose of the review also affects the overall engagement partners' performance ratings as well as ratings within individual performance categories, (3) the introduction of a detailed practice aid prior to reviewing the engagement partner's performance improves overall recall of performance, and (4) engagement quality reviewers recall more audit risk related information when the engagement partners' performance is low. These results identify possible process improvements that may produce more effective engagement quality reviews and address questions about review objective and review processes currently being considered by the PCAOB.
... Research investigating the effects of rating purpose on rating quality generally has found that ratings made for administrative purposes are more lenient (Dobbins, Cardy, & Truxillo, 1988; Farh, Cannella, & Bedeian, 1991; Farh & Werbel, 1986; Harris, Smith, & Champagne, 1995; Jawahar & Williams, 1997; Longenecker et al., 1987; McIntyre, Smith, & Hassett, 1984; Williams, DeNisi, Blencoe, & Cafferty, 1985), less variable (e.g., Farh et al., 1991), and less accurate (McIntyre et al., 1984) than ratings made for developmental or research purposes. These rater biases and the restricted range associated with administrative ratings likely serve to decrease interrater reliability. ...
... A few comments on the design of the current study are warranted. Similar to a study by Harris et al. (1995), all ratings for a given purpose were made after ratings for a different purpose. As such, this design cannot disentangle the effects due to time and order (Administration 1 and Administration 2) and those due to rating purpose (administrative or developmental). ...
... As such, this design cannot disentangle the effects due to time and order (Administration 1 and Administration 2) and those due to rating purpose (administrative or developmental). As Harris et al. (1995) note, the first administration of ratings may have a priming effect on the later ratings. However, there are two primary reasons why we believe this potential confound concern is minimized in the current investigation. ...
Article
Using a field sample of peers and subordinates, the current study employed generalizability theory to estimate sources of systematic variability associated with both developmental and administrative ratings (variance due to items, raters, etc.) and then used these values to estimate the dependability (i.e., reliability) of the performance ratings under various conditions. Results indicated that the combined rater and rater-by-ratee interaction effect and the residual effect were substantially larger than the person effect (i.e., object of measurement) for both rater sources across both purpose conditions. For subordinates, the person effect accounted for a significantly greater percentage of total variance in developmental ratings than in administrative ratings; however, no differences were observed for peer ratings as a function of rating purpose. These results suggest that subordinate ratings are of significantly better quality when made for developmental than for administrative purposes, but the same is not true for peer ratings.
... Substantiating the latter theory, three empirical studies (Harris et al. 1995; Tziner et al. 2001, 2002) found significant correlations between administrative and developmental purposes (r = 0.58, 0.72, and 0.16, p < 0.05, respectively). Providing stronger evidence, Youngcourt et al. (2007) reported that correlations among administrative, developmental, and role-definition purposes were r ≥ 0.60, at p < 0.01. ...
... Based on an analysis of two data sets, one for developmental purposes (ratings of 193 raters) and the other for administrative purposes (ratings of 223 ratees), Harris et al. (1995) found that ratings for administrative purposes were more biased (lenient) than those for developmental purposes. Moreover, their results revealed administrative purposes to have a significant relationship with ratee seniority (r = 0.18, p < 0.05), but developmental ratings did not have a significant relationship (r = 0.00). ...
Article
Based on a robust analysis of the existing literature on performance appraisal (PA), this paper makes a case for an integrated framework of effectiveness of performance appraisal (EPA). To achieve this, it draws on the expanded view of measurement criteria of EPA, i.e. purposefulness, fairness and accuracy, and identifies their relationships with ratee reactions. The analysis reveals that the expanded view of purposefulness includes more theoretical anchors for the purposes of PA and relates to various aspects of human resource functions, e.g. feedback and goal orientation. The expansion in the PA fairness criterion suggests certain newly established nomological networks, which were ignored in the past, e.g. the relationship between distributive fairness and organization-referenced outcomes. Further, refinements in PA accuracy reveal a more comprehensive categorization of rating biases. Coherence among measurement criteria has resulted in a ratee reactions-based integrated framework, which should be useful for both researchers and practitioners.
Data
Effectiveness of Performance Appraisal: An Integrated Framework
... Although performance ratings collected for research purposes exhibit a .58 correlation with ratings collected for administrative purposes, and both types of ratings exhibit similar correlations with other variables, ratings obtained for research purposes exhibit less leniency error than those obtained for administrative purposes (Harris, Smith, and Champagne, 1995). Accordingly, we assured supervisors that performance ratings were obtained solely for research purposes. ...
... We are encouraged by evidence that supervisors agree with each other to a greater extent when they provide ratings for research purposes (r = .67 between two raters; Harris, Smith, and Champagne, 1995) than in general (interrater reliability = .52; Viswesvaran, Ones, and Schmidt, 1996). ...
Article
This paper examines how emotional intelligence and cognitive intelligence are associated with job performance. We develop and test a compensatory model that posits that the association between emotional intelligence and job performance becomes more positive as cognitive intelligence decreases. We report the results of a study in which employees completed tests of emotional intelligence and cognitive intelligence, and their task performance and organizational citizenship behavior were assessed by their supervisors. Hypotheses from the model were supported for task performance and organizational citizenship behavior directed at the organization, but not for organizational citizenship behavior directed at individuals. We discuss the theoretical implications and managerial ramifications of our model and findings.
... Ratings were found to be significantly higher in the promotion purpose but only when a graphic rating scale was utilized rather than a mixed standards scale. A subsequent field study by Harris, Smith, and Champagne (1995) also found ratings for administrative purposes to be more lenient than those for research purposes. ...
... Upward accountability, or raters being held accountable to their superior, is not common in organizations, nor prevalent in the literature. When raters believe that a superior will carefully review ratings, their motivation to be accurate should increase because of their desire to appear competent (Harris et al., 1995). The exception is when raters perceive that the person to whom they are accountable expects certain ratings (Tetlock, 1985). ...
Article
Although performance appraisal research has been ongoing for more than 50 years, the focus has largely been on the rater and the rating instruments. This study seeks to answer a more recent call by researchers to focus on contextual variables surrounding the performance appraisal process by analyzing two such variables: appraisal purpose and rater accountability. Results indicate that holding raters accountable for the accuracy of their ratings, especially when ratings are for administrative purposes, may be an effective strategy for reducing leniency error.
... Drawing on the SCENT model, we posit that strategically positive self-evaluation is likely to occur when self-ratings are obtained in the context of organizations for non-research purposes (e.g., administrative and developmental purposes) because East Asian employees are likely to be aware that ratings are tied to incentives in organizations. Prior research has shown that self-ratings are likely to be more lenient in high-stakes settings where favorable self-ratings are associated with important outcomes (e.g., pay increase, promotion, rewards; Harris, Smith, & Champagne, 1995; Jawahar & Williams, 1997). Specifically, when ratings are used for administrative purposes, East Asian employees may self-enhance strategically because they are aware that positive self-evaluations can result in important outcomes (i.e., promotion, and bonus allocation). ...
Article
Much attention has been paid to the question of whether there is a modesty bias in East Asian employees’ self-ratings of job performance (i.e., a tendency to self-rate their performance lower than supervisors rate it). However, empirical results are conflicting, with some studies supporting the modesty bias and others not supporting it. We suggest that moderators representing boundary conditions for the modesty bias effect may shed light on these conflicting results. In essence, the question should not be “whether there is a modesty bias,” but rather “when is there a modesty bias?” We propose three moderators: purpose of the ratings (administrative, developmental, or research), job performance dimension (task performance, organizational citizenship behavior, or leadership), and country-level in-group collectivism. Based on 40 studies (63 independent samples) with samples from East Asia (mainland China, Japan, South Korea, and Taiwan), we found no evidence of a modesty bias. That is, East Asian employees’ self-ratings were, on average, higher than supervisor-ratings of job performance (i.e., a leniency bias). The one exception was when ratings were collected for research purposes; in this case there was, on average, no mean difference between self- and supervisor-ratings. Thus, East Asian employees’ research-purpose self-ratings are more modest, but this does not cross into a “modesty bias.” In all, our results do not support a modesty bias as a widespread cultural norm among East Asian employees.
... 4 These annual performance evaluations were used for administrative purposes in contrast to the quality of hire and hiring manager satisfaction ratings, which were collected for research purposes. Prior research suggests that performance evaluations collected for research purposes are likely of higher quality (less lenient; more reliable and valid) than those used for administrative purposes (Harris et al., 1995; Salgado & Moscoso, 2019). We, therefore, corrected this correlation for unreliability using Viswesvaran et al.'s (1996) estimate of the reliability of job performance ratings of 0.52 to arrive at a corrected correlation r_c = 0.07/√0.52 = 0.10. ...
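The correction described in this excerpt can be checked with a few lines of arithmetic. The sketch below assumes the standard correction for unreliability in the criterion only (observed correlation divided by the square root of the criterion reliability), which is the reading consistent with the reported result of 0.10; the function name is hypothetical and is not taken from the cited paper.

```python
import math


def correct_for_criterion_unreliability(r_observed: float, criterion_reliability: float) -> float:
    """Disattenuate an observed correlation for unreliability in the criterion.

    Assumed formula (criterion-only correction): r_c = r_xy / sqrt(r_yy).
    """
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# Values quoted in the excerpt: observed r = 0.07, rating reliability = 0.52.
corrected = correct_for_criterion_unreliability(0.07, 0.52)
print(f"{corrected:.2f}")
```

Dividing by the reliability itself rather than its square root would give about 0.13, so the square-root form is the one that matches the value reported in the excerpt.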
Article
Recent research has highlighted the fact that narrative letters of recommendation in employment references could contribute to gender bias in personnel selection. Structured, quantitative employment references, however, may limit the opportunity for such biases to emerge. In a sample of nearly one million applicants and ratings by over four million employment reference providers, we found no meaningful effect of gender bias in highly structured, quantitative employment references across job levels and a wide variety of industries. Interestingly, and in contrast to existing theory, the effect of gender bias remained negligible across both stereotypically masculine and feminine jobs. Similarly, in a subsample of 5000 job applicants and 20,000 employment reference providers, coded verbatim comments of reference providers showed little practical gender differences in the frequency with which various comment types are made. These results suggest that highly structured, quantitative and semi‐structured, verbatim employment references are an effective tool in the advancement of fair and equitable personnel selection practices. Theoretical and practical implications are discussed, and future research is proposed.
... Upward accountability, or raters being held accountable for their ratings of their subordinates to a superior, is quite as common in public sector organizations (Randall & Sharples, 2012). If raters realize that their superior will carefully review their ratings of subordinates, they may be highly motivated to rate accurately in order to appear competent (Harris, Smith, & Champagne, 1995). Curtis et al. (2005) stated that raters who are held accountable both to ratees and to their superiors may be torn: if they do not rate ratees on merit, or hesitate to give ratees negative feedback, their desire to appear competent in the eyes of their superiors will be undermined. ...
Article
The core objective of the study is to assess the role of two key appraisal characteristics, raters' accountability and perceived accuracy of rating forms, in the effectiveness of performance appraisal. In addition, the aim is to examine the mediating effect of fairness perceptions between appraisal characteristics and performance appraisal effectiveness. To conduct this dyadic study, data were collected from 265 pairs of employees working in the education department of Punjab using a prepaid postal survey in which head masters/mistresses were considered raters and teachers were considered ratees. The data were analyzed using SPSS software. The results revealed that both raters and ratees perceived raters' accountability and accuracy of rating forms as predictors of effectiveness in performance appraisal. Perceived fairness was found to mediate between the appraisal characteristics (raters' accountability and accuracy of rating forms) and the effectiveness of performance appraisal. There was also a significant difference between the perceptions of raters and ratees. To increase the effectiveness of performance appraisal, raters should be more accountable, and the current performance evaluation report of employees should be more accurate and relevant to their jobs. This study provides a means by which the policy makers of the education department can gain insight to improve the current appraisal system of Punjab, Pakistan.
... PA design. PA has been frequently classified as administrative or developmental, both in classic papers on the matter and in more recent work (see, for example, Cleveland et al., 1989; Harris et al., 1995; Youndt et al., 1996; Jawahar and Williams, 1997; Boswell and Boudreau, 2002; Greguras et al., 2003; Levy and Williams, 2004; Rynes et al., 2005; Youngcourt et al., 2007; Jafari et al., 2009; Brown and Warren, 2011; Krats and Brown, 2013; Bayo-Moriones et al., 2020). This classification is based on Taylor and Wherry's (1951) seminal work on the leniency of ratings. ...
Article
Purpose: The purpose of this study is to analyze how the design of performance appraisal is influenced by the competitive strategy of the firm. Then, this paper examines if the alignment between appraisal and strategy impacts firm performance.
Design/methodology/approach: The study sample includes 258 Spanish firms in the manufacturing and services sectors. This information was gathered through questionnaires addressed to the CEO and the senior human resources manager. Several econometric models are estimated, using robust regression analysis and including a set of relevant control variables.
Findings: A positive relationship is found between an innovation strategy and developmental performance appraisal. A cost strategy has a negative impact on the adoption of developmental performance appraisal. The findings also confirm that firms with a quality strategy and developmental appraisal have higher performance. In addition, firms adopting an innovation strategy and administrative appraisal enjoy higher return on equity.
Research limitations/implications: Future research should analyze the dynamics of the relationships between appraisal, strategy and performance to rule out the flaws of cross-sectional data. Another potential extension is the analysis of the interactions of the design of other human resources management practices with both competitive strategy and firm performance.
Practical implications: Firms can improve performance by aligning performance appraisal design with strategy. Those with an innovation strategy should choose administrative appraisal, and those competing on quality should focus on developmental appraisal.
Originality/value: This paper compares the theoretical recommendations on performance appraisal for different competitive strategies, what firms actually do, and the impact that the alignment between appraisal and strategy has on firm performance.
... Very honestly and practically speaking, poor performers (e.g., rating ≤ 1) are likely to either self-select out of the organization or be terminated. Moreover, the archival data suffer from typical leniency biases that are common among most performance rating data sets (e.g., Harris, Smith, & Champagne, 1995). Therefore, results should be interpreted with caution as they may not generalize to lower performing subordinates. ...
Article
Building off and extending the meta‐theoretical framework of political skill, we examined the cognitive and behavioral mechanisms through which the intrapsychic effects of political skill inform its interpersonal effects, and how these interpersonal effects ultimately are transmitted into desirable outcomes. Specifically, we argue that politically skilled leaders demonstrate better situational appraisals (i.e., understanding), and thus, more appropriate situational responses (e.g., consideration and initiating structure behaviors); the demonstration of appropriate situational responses is argued to positively affect subordinates’ evaluations of their leaders (i.e., instrumentality) and subordinates’ concomitant attitudes (e.g., job satisfaction) and behaviors (e.g., performance). Results provided mixed support for the hypothesized relationships. Specifically, leader understanding mediated the relationship between political skill and consideration, but not the relationship between political skill and structuring behaviors. Moreover, consideration was positively related to subordinates’ group‐level instrumentality perceptions, whereas initiating structure was not. Finally, subordinates’ individual (within‐level) perceptions of leader instrumentality were positively related to job satisfaction and performance. The implications of these findings as they relate to theory and practice are discussed along with this investigation's strengths, limitations, and directions for future research.
... Raters may, for example, tend to weight negative information too heavily (Ganzach 1995) and to want to reaffirm past decisions and performance appraisals in the future (Schoorman 1988). A career step once taken could, for example, following the Matthew principle ("to those who have, more will be given"), lead to further career steps that cannot really be justified by current performance (Merton 1968) [...] (Harris et al. 1995; Jawahar and Williams 1997). Bates (2002) further showed that liking and similarity in attitudes and demographic characteristics are reflected in more positive performance ratings (which is why Ms. Lutz in our case example worries that a colleague might be rated better than she is). ...
Chapter
Individual job performance is an important basis for career decisions. This chapter discusses research findings that show relationships among the various facets of job performance, performance appraisals, and career success. The organizational context and formal performance appraisal systems, with their advantages and disadvantages, are also considered, as are individual career attitudes and performance prerequisites. The chapter shows that good job performance advances a career only under certain organizational conditions.
... performance as valued in this context. On the other hand, performance ratings gathered for nonresearch purposes can be contaminated by a variety of influences and have relatively restricted range, so future research using ratings gathered specifically for research purposes might yield stronger or different effects (cf. Harris, Smith, & Champagne, 1995). Furthermore, to the extent that method effects might be evident, the magnitude of linear relationships could be inflated, but cross-level relationships would be less prone to such influences (Ostroff, Kinicki, & Clark, 2002). ...
Article
Organizations often operate in complex and dynamic environments which place a premium on employees' ongoing learning and acquisition of new competencies. Additionally, the majority of learning in organizations does not take place in formal training settings, but we know relatively little about how informal field-based learning (IFBL) behaviors relate to changes in job performance. In this study, we first clarified the construct of IFBL as a subset of informal learning. Second, on the basis of this clarified construct definition, we developed a measure of IFBL behaviors and demonstrated its psychometric properties using (a) a sample of subject matter experts who made item content validity judgments and (b) both an Amazon Mechanical Turk sample (N = 400) and a sample of 1,707 healthcare employees. Third, we advanced a grounded theory of IFBL in healthcare, and related it to individuals' regulatory foci and contextual moderators of IFBL behaviors-job performance relationships using a cross-level design and lagged nonmethod bound measures. Specifically, using a sample of 407 healthcare workers from 49 hospital units, our results suggested that promotion-focused individuals, especially in well-staffed units, readily engage in IFBL behaviors. Additionally, we found that the IFBL-changes in job performance relationship was strengthened to the extent that individuals worked in units with relatively nonpunitive climates. Interestingly, staffing levels had a weakening moderating effect on the positive IFBL-performance improvements relationship. Detailed follow-up analyses revealed that the peculiar effect was attributable to differential relationships from IFBL subdimensions. Implications for future theory building, research, and practice are discussed.
... When raters are expected to be held accountable by their subordinates, they are motivated to show greater leniency in their appraisal because positive ratings are easier to justify and because they can avoid giving ratees negative feedback (Katz and Kahn 1978; Mero et al. 2007; Murphy and Cleveland 1995). By contrast, when raters expect their supervisors to hold them accountable for ratings, they tend to provide an accurate rating to show their competency in appraisal decision making and prepare to justify their decision (Curtis et al. 2005; Farh and Werbel 1986; Harris et al. 1995; Mero and Motowidlo 1995; Mero et al. 2007; Simonson and Nye 1992). ...
Article
Full-text available
The purpose of this study is to develop a measure of rater accountability, test differences in raters and ratees’ perceived level of rater accountability, and examine the positive relationship between an organizational culture promoting accurate appraisals and rater accountability. A total of 374 surveys were collected from full-time civil servants working in four central government agencies and three local government offices in South Korea. The sample consisted of 254 men and 120 women, and the average organization tenure was 15 years. Rater accountability measures were developed by modifying felt accountability measures and showed internal reliability. The results of multiple regression analysis reveal that an organizational culture promoting accurate appraisals affects rater accountability. However, contrary to expectation, raters and ratees show no meaningful difference in their level of perceived rater accountability.
... Assessments of crew compatibility should be standardized and formally integrated into crew staffing. Informal, unstandardized approaches of assessment are more likely to be influenced by irrelevant information such as issues associated with organizational politics (Harris, Smith, & Champagne, 1995; Longenecker, Sims, & Gioia, 1987) or applicants' personal characteristics such as age, race, or sex (Rudman & Glick, 1999; Shaw, 1972; Ziegert & Hanges, 2005). Informal assessment procedures also may result in data that are statistically unreliable (Viswesvaran, Ones, & Schmidt, 1996). ...
Technical Report
Full-text available
Team composition, or the configuration of team member attributes and their relations, is a key enabling structure of effective teamwork. A large body of research supports the importance of team composition; however, much of it is based on teams that operate in traditional workplaces. Given the unique context within which long-distance space exploration (LDSE) crews will operate (e.g., isolation, confinement), we sought to identify psychological and psychosocial factors, measures, and combinations thereof that can be used to compose highly effective crews. We conducted a focused literature review and operational assessment related to team composition issues for LDSE. Our goals were to: (1) identify critical team composition issues and their effects on team functioning in LDSE-analogous environments with a focus on key composition factors that will most likely have the strongest influence on team performance and well-being, and (2) identify and evaluate methods used to compose teams with a focus on methods used in analogous environments. We summarize results in terms of the two primary paths through which team composition relates to mission success, indirect and direct methods of assessing compatibility, and the themes from our operational assessment. Recommendations for research and practice regarding effective team composition for LDSE are provided.
... For example, it may be possible that some organizations had higher aggregate individual performance ratings not because of actual performance levels, but because of group managers' rating biases (e.g., leniency) or characteristics of rating systems. However, we used a "research-only" individual performance measure that is less subject to such rater biases (Harris, Smith, & Champagne, 1995; Jawahar & Williams, 1997). Further, an adjacent body of cross-level research suggests that aggregated individual performance is associated with performance outcomes at higher levels such as the group-level (e.g., Chen, Kirkman, Kanfer, Allen, & Rosen, 2007), store-level (e.g., Liao & Chuang, 2004), and branch-level (e.g., Aryee, Walumbwa, Seidu, & Otaye, 2012). ...
Article
Full-text available
Drawing upon line-of-sight (Lawler, 1990, 2000; Murphy, 1999) as a unifying concept, we examine the cross-level influence of organizational use of individual pay-for-performance (PFP), theorizing that its impact on individual employees' performance-reward expectancy is boosted by the moderating effects of immediate group managers' contingent reward leadership and organizational use of profit-sharing. Performance-reward expectancy is then expected to mediate the interactive effects of individual PFP with contingent reward leadership and profit-sharing on employee job performance. Analyses of cross-organizational and cross-level data from 912 employees in 194 workgroups from 45 companies reveal that organizations' individual PFP was positively related to employees' performance-reward expectancy, which was strengthened when it was accompanied by higher levels of contingent reward leadership and profit-sharing. Also, performance-reward expectancy significantly transmitted the effects of individual PFP onto job performance under higher levels of contingent reward leadership and profit-sharing, thus delineating cross-level mediating and moderating processes by which organizations' individual PFP is linked to important individual-level employee outcomes. Several theoretical and practical implications are discussed. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
... Moreover, the evaluations were used for a variety of purposes including providing feedback and determining rewards, rendering them susceptible to a variety of potential influences. Naturally, it would be preferable to have multiple ratings available that were not subject to other sources of contamination (Harris, Smith, & Champagne, 1995). That said, our performance criteria represent how a nurse's performance was evaluated within this organization. ...
Article
Full-text available
Employee psychological empowerment is widely accepted as a means for organizations to compete in increasingly dynamic environments. Previous empirical research and meta-analyses have demonstrated that employee psychological empowerment is positively related to several attitudinal and behavioral outcomes including job performance. While this research positions psychological empowerment as an antecedent influencing such outcomes, a close examination of the literature reveals that this relationship is primarily based on cross-sectional research. Notably, evidence supporting the presumed benefits of empowerment has failed to account for potential reciprocal relationships and endogeneity effects. Accordingly, using a multiwave, time-lagged design, we model reciprocal relationships between psychological empowerment and job performance using a sample of 441 nurses from 5 hospitals. Incorporating temporal effects in a staggered research design and using structural equation modeling techniques, our findings provide support for the conventional positive correlation between empowerment and subsequent performance. Moreover, accounting for the temporal stability of variables over time, we found support for empowerment levels as positive influences on subsequent changes in performance. Finally, we also found support for the reciprocal relationship, as performance levels were shown to relate positively to changes in empowerment over time. Theoretical and practical implications of the reciprocal psychological empowerment-performance relationships are discussed. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
... These authors report that executives inflated ratings to maximize ratee merit increases as well as to protect ratees against low ratings that might haunt them for the rest of their careers if made part of their permanent records. Finally, the fact that ratings collected for administrative purposes tend to be much more lenient than ratings collected for research purposes (Harris, Smith, and Champagne, 1995) also suggests that raters are reticent to give ratings that negatively impact the organizational outcomes of ratees. As stated by one subject in the London, Wohlers, and Gallagher (1990) study, "I like my boss and would be less likely to give negative feedback if it would hurt" (p. 29). ...
Article
Full-text available
This exploratory study focuses on factors that influence the purposeful adjustment of performance appraisal ratings given as part of a multi-source feedback (MSFB) process. The relative number of factors that influenced raters and the differential impact of these factors across superior, peer, and subordinate
... The most likely reason is that we and Dorfman et al. used actual performance appraisal data obtained from the organization's personnel files, whereas Nathan et al. measured them perceptually through questionnaires. Recently, Harris et al. (1995) found that administration-based ratings obtained from personnel records were more lenient than research-based ratings obtained through questionnaires. Needless to say, our data as well as Dorfman et al.'s were for administrative purposes (e.g. ...
Article
Do appraisal reviews actually change employees’ subsequent performance? To answer this question, longitudinal analyses are required. Dorfman et al. (1986) and Nathan et al. (1991) performed longitudinal studies but obtained contradictory results, so additional longitudinal studies are needed. We analysed a data set collected from a Korean petrochemical company, and found that, even though each of the three measures of appraisal review content (i.e. the degree to which, during the appraisal review, (1) employees have opportunity to participate in discussion, (2) goals are clearly set and (3) career issues are discussed) was significantly related to the employees’ reactions to the review, none of them had a positive impact on the subsequent job performance. This result is consistent with Dorfman et al.’s finding. Possible reasons for our result being different from Nathan et al.’s finding, limitations of our study and further studies required are discussed.
... For example, in a laboratory simulation, Williams, DeNisi, Blencoe and Cafferty (1985) found more leniency when study participants perceived a more negative consequence for low ratings, such as assignment to remedial training. However, in one of the few field studies to examine the effects of purpose on ratings, Harris, Smith and Champagne (1995) concluded that the inflation due to administrative (versus research-based) ratings was smaller than one would expect. Harris and his colleagues suggested that future research consider other dependent variables. ...
Article
Procedural influences on peer-rater distortion and delay were investigated in a field experiment. Employees (N=123) of a business information firm were randomly assigned to conditions in a 2 (upward accountability versus no accountability) by 2 (administrative purpose versus research purpose) experimental design. Results revealed evidence for an accountability by purpose interaction on rater delay. Specifically, raters delayed rating their peers when the purpose was research-only and they had to explain their ratings to a supervisor. When the rating purpose was administrative, no differences in delay due to accountability were obtained. We found no effects for upward accountability and rating purpose on peer-rater inflation. © 1998 John Wiley & Sons, Ltd.
... Taylor and Wherry (1951) proposed that ratings collected for administrative purposes would be more lenient than ratings collected for research or developmental purposes. Over the last few years, there are no new empirical works which have been conducted (Greguras et al., 2003; Harris et al., 1995) but a useful review of it has been written (Jawahar & Williams, 1997). While the majority of the research on performance appraisal purpose has focused on the rater, some work has also been conducted on rater effects (Boswell & Boudreau, 2000). ...
Article
Full-text available
Performance appraisal is one of the most important processes in human resource management, because it has a great effect on both the financial and program components of any organization. There is a variety of methods for the appraisal of employees' performance. Obviously, no method can claim that it has an integrated approach in performance appraisal. Therefore, human resource managers should select an appraisal method which is most efficient in their organizations. In this paper, we propose a framework for the selection of appraisal methods and compare some performance appraisal methods in order to facilitate the selection process for organizations. The value of this framework is that organisations can use it to evaluate a performance appraisal method against its key features before implementing the method and incurring extra costs. This framework is theoretical in nature, and is built on a review of related literature.
... Research studying the effects of purpose of rating on rater evaluations has focused on three main purposes: administrative (i.e., promotion, discipline, salary increases and selection), feedback (i.e., training and employee development), and research (i.e., scale, selection and predictor validation). This literature suggests that raters are more lenient when they believe that their evaluations will be used for administrative decisions than when they believe that their evaluations will be used for feedback or research purposes (Aleamoni & Hexner, 1980; Driscoll & Goodwin, 1979; Farh & Werbel, 1986; Harris, Smith, & Champagne, 1995; Heron, 1956; Taylor & Wherry, 1951; Waldman & Thornton, 1988; Williams, DeNisi, Blencoe, & Cafferty, 1985; Zedeck & Cascio, 1982). Other studies have not found a difference in leniency errors as a function of purpose of rating (Berkshire & Highland, 1953; Bernardin, 1978; Bernardin & Cooke, 1992; Centra, 1976; Hollander, 1965; McIntyre, Smith, & Hassett, 1984; Meier & Feldhusen, 1979; Murphy et al., 1984; Sharon, 1970; Sharon & Bartlett, 1969). ...
... They designed a 2×2 factorial experiment to assess the effect of the purpose of the appraisal and expectation of validation on the amount of leniency. Harris et al. (1995) examined 223 ratees by administrative-based performance appraisal and research-based performance appraisal ratings and found that administrative ratings were more lenient than research-based ratings. ...
Conference Paper
Full-text available
Fuzzy measures have been widely used to determine the degrees of subjective importance of evaluation items. However, the leniency error may exist when most attributes are assigned unduly high ratings. Because respondents often assign similarly complimentary scores, errors of positive leniency make it difficult to differentiate the importance of decision attributes. To reduce positive leniency in fuzzy measure ratings, we develop a method by comparison of fuzzy number-valued fuzzy measures using a fuzzy distance measure.
... Future research should examine how the purpose of MSF would affect the results reported in this study. For instance, existing studies have found that ratings collected for administrative purposes are more prone to rating bias than those collected for developmental purposes (Farh, Cannella, & Bedeian, 1991; Harris, Smith, & Champagne, 1995; Jawahar & Williams, 1997). This is because, compared with developmental ratings, administrative ratings carry more important stakes for the ratees (e.g., no pay increase) that are likely to accentuate potential repercussions on the raters (e.g., retaliatory behaviors or decreased motivation to perform). ...
Article
Full-text available
This study extends multisource feedback research by assessing the effects of rater source and raters' cultural value orientations on rating bias (leniency and halo). Using a motivational perspective of performance appraisal, the authors posit that subordinate raters followed by peers will exhibit more rating bias than superiors. More important, given that multisource feedback systems were premised on low power distance and individualistic cultural assumptions, the authors expect raters' power distance and individualism-collectivism orientations to moderate the effects of rater source on rating bias. Hierarchical linear modeling on data collected from 1,447 superiors, peers, and subordinates who provided developmental feedback to 172 military officers shows that (a) subordinates exhibit the most rating leniency, followed by peers and superiors; (b) subordinates demonstrate more halo than superiors and peers, whereas superiors and peers do not differ; (c) the effects of power distance on leniency and halo are stronger for subordinates than for peers and superiors; (d) the effects of collectivism on leniency were stronger for subordinates and peers than for superiors; effects on halo were stronger for subordinates than superiors, but these effects did not differ for subordinates and peers. The present findings highlight the role of raters' cultural values in multisource feedback ratings.
... Contrary to widespread confidence in the PAP effect, much of the research investigating the influence of PAP on appraisal leniency and accuracy has been inconsistent. Some studies support the PAP effect (Aleamoni & Hexner, 1980; Beckner, Highhouse, & Hazer, 1995; Bernardin & Orban, 1990; Driscoll & Goodwin, 1979; Farh, Cannella, & Bedeian, 1991; Farh & Werbel, 1986; Gmelch & Glasman, 1977; Harris, Smith, & Champagne, 1995; Jawahar, 1994; Pritchard, Peters, & Harris, 1973; Taylor & Wherry, 1951; Veres, Field, & Boyles, 1983; Waldman & Thornton, 1988; Williams, DeNisi, Blencoe, & Cafferty, 1985; and Zedeck & Cascio, 1982). However, many others do not (e.g., Berkshire & Highland, 1953; Bernardin, 1978; Bernardin & Cooke, 1992; Centra, 1976; Hollander, 1965; McIntyre, Smith, & Hassett, 1984; Meier & Feldhusen, 1979; Murphy, Balzer, Kellam, & Armstrong, 1984; Sharon & Bartlett, 1969; Shore, Adams, & Tashchian, 1995). ...
Article
Full-text available
More than 40 years ago, Taylor and Wherry (1951) hypothesized that performance appraisal ratings obtained for administrative purposes, such as pay raises or promotions, would be more lenient than ratings obtained for research, feedback, or employee development purposes. However, research on appraisal purpose has yielded inconsistent results, with roughly half of such studies supporting this hypothesis and the other half refuting it. To account for those differences, a meta-analysis of performance appraisal purpose research was conducted with 22 studies and a total sample size of 57,775. Our results support Taylor and Wherry's hypothesis as performance evaluations obtained for administrative purposes were, on average, one-third of a standard deviation larger than those obtained for research or employee development purposes. In addition, moderator analysis indicated larger differences between ratings obtained for administrative and research purposes when performance evaluations were made in field settings, by practicing managers, and for real world subordinates. Implications for researchers and practitioners are discussed.
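The "one-third of a standard deviation" result above is a standardized mean difference. As a minimal sketch of how such an effect size is computed for a single study, assuming two independent groups and a pooled standard deviation (the function name `cohens_d` and the ratings below are hypothetical illustrations, not data from any cited study):

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical ratings on a 5-point scale, invented for illustration only.
administrative = [4.0, 4.5, 3.5, 4.0, 5.0, 4.5]
research = [3.5, 4.5, 3.0, 4.0, 4.5, 4.0]

print(f"d = {cohens_d(administrative, research):.2f}")
```

A positive value indicates that the administrative ratings are higher (more lenient) on average. A meta-analysis would additionally weight and aggregate such values across studies, which this sketch does not attempt.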
... Therein, the hypothesis will be tested that administrative ratings show increased levels of leniency because they are tied to valuable outcomes and rewards (e.g. Farh & Werbel, 1986; Harris, Smith & Champagne, 1995; Jawahar & Williams, 1997). In contrast, if appraisals are part of developmental feedback sessions, low self-other agreement can be perceived as an undesirable outcome for several reasons. ...
Article
Full-text available
This meta-analysis explores agreement in self- and supervisory ratings of job performance (k = 128 independent samples). It suggests a 3-stage model of the rating process and reviews the empirical evidence for the relevance of each of these 3 stages to an understanding of agreement in ratings. The proposed 3-stage model serves as the guiding rationale for the examination of an extensive set of variables that moderate rater agreement. Results are reported for 2 indicators of rater agreement (correlational and mean-level agreement). Self-supervisor ratings yielded an overall correlation of .22 (rho = .34; k = 115; n = 37,752). Position characteristics and the use of nonjudgmental performance indicators were the main moderators. Leniency in self-ratings is indicated by higher mean levels of self-ratings compared with supervisory ratings. Within Western samples, performance self-ratings showed leniency (d = 0.32, Delta = .49; k = 89; n = 35,417) dependent on contextual features, scale format, and scale content.
Article
Full-text available
Given the centrality of the job performance construct to organizational researchers, it is critical to understand the reliability of the most common way it is operationalized in the literature. To this end, we conducted an updated meta-analysis on the interrater reliability of supervisory ratings of job performance (k = 132 independent samples) using a new meta-analytic procedure (i.e., the Morris estimator), which includes both within- and between-study variance in the calculation of study weights. An important benefit of this approach is that it prevents large-sample studies from dominating the results. In this investigation, we also examined different factors that may affect interrater reliability, including job complexity, managerial level, rating purpose, performance measure, and rater perspective. We found a higher interrater reliability estimate (r = .65) compared to previous meta-analyses on the topic, and our results converged with an important, but often neglected, finding from a previous meta-analysis by Conway and Huffcutt (1997), such that interrater reliability varies meaningfully by job type (r = .57 for managerial positions vs. r = .68 for nonmanagerial positions). Given this finding, we advise against the use of an overall grand mean of interrater reliability. Instead, we recommend using job-specific or local reliabilities for making corrections for attenuation.
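The recommendation above concerns corrections for attenuation. A minimal sketch of the classical correction for criterion unreliability, which divides an observed correlation by the square root of the criterion reliability, is shown below. The reliabilities .57 and .68 come from the abstract; the observed validity of 0.25 and the function name `correct_for_attenuation` are assumptions made purely for illustration.

```python
import math


def correct_for_attenuation(r_observed, criterion_reliability):
    """Classical correction of a correlation for unreliability in the criterion."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# Hypothetical observed validity coefficient, invented for illustration.
r_observed = 0.25

# Job-specific interrater reliabilities reported in the abstract above.
for job_type, r_yy in [("managerial", 0.57), ("nonmanagerial", 0.68)]:
    corrected = correct_for_attenuation(r_observed, r_yy)
    print(f"{job_type}: corrected r = {corrected:.3f}")
```

Because the lower managerial reliability yields a larger corrected value from the same observed correlation, the choice between a grand-mean and a job-specific reliability changes the corrected estimate.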
Article
Full-text available
Performance appraisal (PA) is used for various organizational purposes and is vital to human resources practices. Despite this, current estimates of PA reliability are low, leading to decades of criticism regarding the use of PA in organizational contexts. In this article, we argue that current meta-analytical interrater reliability (IRR) coefficients are underestimates and do not reflect the reliability of interest to most practitioners and researchers—the reliability of an employee’s direct supervisor. To establish the reliability of direct supervisor ratings, those making PA ratings must directly supervise employee job performance instead of nonparallel rater designs (e.g., direct supervisor ratings correlated with ratings from a more senior leader). The current meta-analysis identified 22 independent samples that met this more restrictive study inclusion criterion, finding an average observed IRR of .65. We also report reliability estimates for several important contextual moderators, including whether ratings were completed in operational settings (.60) or for research purposes (.67). In sum, we argue that this study’s meta-analytical IRR estimates are the best available estimates of direct supervisor reliability and should be used to guide future research and practice.
Article
Full-text available
Objectives: This reliability generalization study aimed to estimate the mean and variance of the interrater reliability coefficients (ryy) of supervisory ratings of overall, task, contextual, and positive job performance. The moderating effect of the appraisal purpose and the scale type was examined. It was hypothesized that the ratings collected for research purposes and multi-item scales have higher ryy. It was also examined whether ryy was similar for the four performance dimensions. Method: A database consisting of 224 independent samples was created and hierarchical sub-grouping meta-analyses were conducted. Results: The appraisal purpose was a moderator of ryy for the four performance dimensions. Scale type was a moderator of ryy for overall and task performance collected for research purposes. The findings also suggest that supervisors seem to have less difficulty evaluating overall job performance than task, contextual, and positive performance. The best estimates of the observed ryy for overall job performance are 0.61 for research-collected ratings and 0.45 for administrative-collected ratings. Conclusions: (1) Appraisal purpose moderates ryy and researchers and practitioners should be aware of its effects before collecting ratings or using empirically-derived interrater reliability distributions, (2) Scale type seems to moderate ryy in the case of the ratings collected for research purposes, only, (3) overall job performance is more reliably rated than task, contextual, and positive performance. Implications for research and practice are discussed.
Chapter
Having characterized the six emerged employment relationships, the goal of this chapter is to derive configurations of HRM practices for each of the six employment modes as far as they relate to performance appraisal. It is argued that performance appraisals are conducted mainly for three purposes; that is, to exercise control over employees (for example, through the administration of performance-related rewards), to provide a basis for staffing (including promotion) decisions, and to provide developmental feedback. A fourth, less frequently occurring purpose may be to evaluate the effectiveness of organizational systems; for example, different sources of contingent workers. The functional requirements in terms of control, staffing, and development are described for each of the six employment modes. Requirements for performance appraisal are derived on that basis. Thus, as an output of this chapter, an HRM configuration of control, staffing and development practices is proposed for each of several performance appraisal purposes, which are associated with the different employment categories in the following. The HRM configurations can be seen as an extension of the employment relations systems proposed in the preceding chapter.
Chapter
This final chapter is concerned with how performance appraisals should be conducted—i.e., with the appraisal process. Appraisal process models by cognitive psychologists typically distinguish three stages of the appraisal process: The collection of information for appraising someone, its organization and storage in memory, and its retrieval and integration into a coherent judgment for the respective appraisal purposes. Understanding the cognitive processes related to appraisals of performance helps design the appraisal system such that the purposes and goals of the appraisal can be achieved. Hence, the findings of cognitive psychologists will be referred to at several points of this chapter.
Chapter
Full-text available
Article
This dissertation presents meta-analyses on the congruence of self- and supervisor-ratings for job performance (k=104 independent samples). It examines two indicators of rater convergence (self-supervisor correlations and mean difference scores) and reports results for an extensive set of moderator variables. Correlation coefficients are interpreted as a measure of self-rating validity whereas the higher mean level of self-ratings compared to supervisory ratings is considered to indicate leniency in self-ratings. Self-supervisor ratings yielded an overall correlation of r=.22 (k=96; n=22287). Job type (i.e. blue collar vs. white collar; managerial responsibility) and the use of non-judgmental performance dimensions were the main moderators. The analysis also confirmed the notion that self-ratings of performance are lenient (d=.33; k=70; n=29386). For Western samples, self-ratings were consistently lenient under all conditions, but the extent to which ratings were lenient depended on situational variables (e.g. purpose for ratings). The two measures of rater congruence varied independently and were moderated by different sets of variables. A process model of performance rating is presented that integrates the study’s findings.
Article
Using social exchange theory, we argue that because supervisors tend to value employee trustworthiness, they will be more likely to adhere to interpersonal and informational justice rules with trustworthy employees. Given social exchange theory’s assumption that benefits are voluntary in nature, we propose that the benevolence and integrity facets of trustworthiness will be more likely to engender social exchange relationships than the ability facet. Specifically, we propose that employees seen as having high benevolence and integrity engender feelings of obligation and trust from their direct supervisors, increasing the likelihood that these supervisors will adhere to interpersonal and informational justice rules, which in turn influences employee perceptions of justice. We find partial support for our mediated model using a field sample.
Article
Despite its acknowledged importance, performance appraisal (PA) continues to be one of the most persistent problems in organizations, especially the appraisal interview (AI) component of PA, for which many techniques have been attempted with only mixed success. The authors conceptualize the AI as a "conversation about performance" and draw on an extensive review of the communication literature to identify the discursive resources available to the organization, the appraiser, and the appraisee for improving the preparation for and conduct of a conversation about performance. The authors' conceptualization extends research on PAs by identifying methodologies and conceptual underpinnings with connections to interpersonal, organizational, and mass communication scholarship.
Article
Although it is recognized that g is important for success in the workplace, it is suggested that further research is necessary to understand the nature of g and to determine how prediction of job performance may be enhanced. Major relevant theories of intelligence are discussed and criticized. Questions about the roles of measures of knowledge, interests, and personality in providing incremental validity to that afforded by g are discussed. Difficulties in criterion definition and measurement are assessed. Additions to utility analysis are recommended. New findings relevant to group differences are discussed. The future of the prediction of workplace performance is discussed, and recommendations regarding the roles of both theoretical concepts and practical innovations are made.
Article
Researchers have suggested that rater motives and the organizational context should be considered as sources of performance appraisal inaccuracies. A review of the performance appraisal literature revealed three primary non-performance factors that managers consider when rating employee performance: (a) Potential negative consequences of ratings, (b) organizational norms, and (c) the opportunity to advance self-interests. Using a policy-capturing methodology, the current study investigated if these three non-performance factors, as well as individual rater differences (e.g., conscientiousness, agreeableness, and performance appraisal experience), influence performance ratings. A sample of 303 experienced managers rated the performance of a fictitious employee, featured in a series of hypothetical scenarios, in which the above information was manipulated. Using hierarchical linear modeling, the results revealed that each of the three non-performance related considerations accounted for variance incremental to objective employee performance. Managers' performance appraisal experience also predicted ratings, such that more experience was associated with lower ratings. These results provide support for the view that non-performance factors can be a substantive component of performance ratings. Copyright © 2009 John Wiley & Sons, Ltd.
Article
Should principals explain and justify their evaluations? In this paper the principal’s evaluation is private information, but she can provide some justifications by sending a costly cheap-talk message. Indeed, it is optimal for the principal to explain her evaluation to the agent if and only if the evaluation turns out to be bad. The justification assures the agent that the principal has not distorted the evaluation downwards. In equilibrium, the wage increases in the performance of the agent, as long as the principal provides a justification. For good performances, however, the principal pays a given high wage without providing justifications. This wage pattern fits empirical observations that subjective evaluations are lenient and discriminate poorly between good performances. I show that this pattern is part of the optimal contract instead of biased behavior. Furthermore, it is possible to implement the optimal contract in an ex-post budget-balanced way if stochastic contracts are feasible.
Article
This study evaluated operative performance rating (OPR) characteristics and measurement conditions necessary for reliable and valid operative performance (OP) assessment. Operative performance is a signature surgical-practice characteristic that is not measured systematically and specifically during residency training. Expert surgeon raters from multiple institutions, blinded to resident characteristics, independently evaluated 8 open and laparoscopic OP recordings immediately after observation. A plurality of raters agreed on operative performance ratings (OPRs) for all performances. Using 10 judges adjusted for rater idiosyncrasies. Interrater agreement was similar for procedure-specific and general items. Higher postgraduate year (PGY) residents received higher OPRs. Supervising-surgeon ratings averaged 0.51 points (1.2 standard deviations) above expert ratings for the same performances. OPRs have measurement properties (reliability, validity) similar to those of other well-developed performance assessments (Mini-CEX [clinical evaluation exercise], standardized patient examinations) when ratings occur immediately after observation. OPRs by blinded expert judges reflect the level of resident training, and the differences are practically significant: the average rating for PGY 4 residents corresponded to a "Good" performance, whereas that for PGY 5 residents corresponded to a "Very Good" performance. Supervising surgeon ratings are higher than expert judge ratings, reflecting the effect of interpersonal factors on supervising surgeon ratings. Use of local and national norms for interpretation of OPRs would adjust for these interpersonal factors. The OPR system provides a practical means for measuring operative performance, which is a signature characteristic of surgical practice.
Article
The present study sought to examine the relationship between managers’ perceptions of employee motivation and performance appraisal by surveying managers and employees in three distinct cultural regions (North America, Asia, and Latin America) within a single global organization. Three distinct cultural patterns emerged in the theories managers held about their subordinates. While North American managers perceived their employees as being more extrinsically than intrinsically motivated, perceptions of intrinsic motivation proved to be a more robust predictor of performance appraisal. Asian managers exhibited a holistic tendency in that they perceived their subordinates as equally motivated by intrinsic and extrinsic factors, and their perceptions of both motivations proved to be comparable predictors of performance appraisal. Latin American managers perceived their employees as being more intrinsically than extrinsically motivated, and accordingly, only their perceptions of intrinsic motivation proved to be significantly correlated with performance appraisal. In contrast to the cultural variations exhibited in manager perceptions, employees consistently reported themselves as being more motivated by intrinsic than extrinsic incentives. Explanations for the distinct cultural patterns that emerged and their implications for the study of culture and organizational behavior are discussed.
Article
The effectiveness of multi-source feedback (MSF) tools, which are increasingly important in medical careers, will be influenced by their users' attitudes. This study compared perceptions of two tools for giving MSF to UK junior doctors, of which one provides mainly textual feedback and one provides mainly numerical feedback. We then compared the perceptions of three groups, including: trainees; raters giving feedback, and supervisors delivering feedback. Postal questionnaires about the usability, usefulness and validity of a feedback system were distributed to trainees, raters and supervisors across the north of England. Questionnaire responses were analysed to compare opinions of the two tools and among the different user groups. Overall there were few differences. Attitudes towards MSF in principle were positive and the tools were felt to be usable, but there was little agreement that they could effectively identify doctors in difficulty or provide developmental feedback. The text-oriented tool was rated as more useful for giving feedback on communication and attitude, and as more useful for identifying a doctor in difficulty. Raters were more positive than other users about the usefulness of numerical feedback, but, overall, text was felt to be more useful. Some trainees expressed concern that feedback was based on insufficient knowledge of their work. This was not supported by raters' responses, although many did use indirect information. Trainees selected raters mainly for the perceived value of their feedback, but also based on personal relationships and the simple pragmatics of getting a tool completed. Despite positive attitudes to MSF, the perceived effectiveness of the tools was low. There are small but significant preferences for textual feedback, although raters may prefer numerical scales. 
Concerns about validity imply that greater awareness of contextual and psychological influences on feedback generation is necessary to allow the formative benefits of MSF to be optimised and to negate the risk of misuse in high-stakes contexts.
Article
The paper opens with a detailed survey of current research on transformational leadership and on the agreement between self and other appraisals of leadership behaviour. In a questionnaire survey carried out in the automobile industry, 210 persons rated leadership behaviour from self, upward, and downward perspectives (270° assessment) as well as several success criteria (job satisfaction, job motivation, performance, well-being). There was hardly any correlation between the self and other ratings, whereas the two other-rating perspectives correlated positively with each other at a moderate level. Furthermore, a clear negative correlation was found between the extent of the inter-perspective discrepancy (self-upward, self-downward, upward-downward) and the outcome criteria. Regression analyses show, however, that the quality of leadership must be taken into account as much as assessment discrepancies, which has important implications given the increasing popularity of the 360° feedback method. In future, besides self-other congruence, more importance should be placed on other-other congruence, as it explains at least as much variance and even shows incremental validity over the self-other discrepancy, suggesting a leadership mechanism that has yet to be explored.
Article
Full-text available
This meta-analytic review presents the findings of a project investigating the validity of the employment interview. Analyses are based on 245 coefficients derived from 86,311 individuals. Results show that interview validity depends on the content of the interview (situational, job related, or psychological), how the interview is conducted (structured vs. unstructured; board vs. individual), and the nature of the criterion (job performance, training performance, and tenure; research or administrative ratings). Situational interviews had higher validity than did job-related interviews, which, in turn, had higher validity than did psychologically based interviews. Structured interviews were found to have higher validity than unstructured interviews. Interviews showed similar validity for job performance and training performance criteria, but validity for the tenure criteria was lower.
Article
Full-text available
On the surface, it is not readily apparent how some performance appraisal research issues inform performance appraisal practice. Because performance appraisal is an applied topic, it is useful to periodically consider the current state of performance research and its relation to performance appraisal practice. This review examines the performance appraisal literature published in both academic and practitioner outlets between 1985 and 1990, briefly discusses the current state of performance appraisal practice, highlights the juxtaposition of research and practice, and suggests directions for further research.
Article
Full-text available
Previous research has suggested that ratings used to make administrative decisions are lenient when compared with ratings obtained for research purposes only. The present study examined the effects of the purpose of rating on multivariate measures of accuracy in observing teacher behavior as well as measures of accuracy in evaluating teaching performance. 45 undergraduates viewed and evaluated videotaped lectures under conditions in which they were informed that their ratings would be used for research only or for making important decisions about those being rated. The purpose of rating did not affect measures of accuracy in rating the frequency with which a number of critical behaviors occurred on each tape. The purpose of rating also did not affect multivariate measures of performance rating accuracy. Purpose did, however, affect the relationship between accuracy in observing teacher behavior and accuracy in evaluating teaching performance. It is suggested that purpose affects the way in which raters process behavioral information without necessarily affecting the general level of rating. (40 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Presents the findings of a project investigating the validity of the employment interview. Analyses are based on 245 coefficients derived from 86,311 individuals. Results show that interview validity depends on the content of the interview (situational, job related, or psychological), how the interview is conducted (structured vs unstructured; board vs individual), and the nature of the criterion (job performance, training performance, and tenure; research or administrative ratings). Situational interviews had higher validity than did job-related interviews, which, in turn, had higher validity than did psychologically based interviews. Structured interviews were found to have higher validity than unstructured interviews. Interviews showed similar validity for job performance and training performance criteria, but validity for the tenure criteria was lower.
Article
Full-text available
The most ubiquitous method of performance appraisal is rating. Ratings, however, have been shown to be prone to various types of systematic and random error. Studies relating to performance rating are reviewed under the following headings: roles, context, vehicle, process, and results. In general, cognitive characteristics of raters seem to hold the most promise for increased understanding of the rating process. A process model of performance rating is derived from the literature. Research in the areas of implicit personality theory and variance partitioning is combined with the process model to suggest a unified approach to understanding performance judgments in applied settings. (6 p ref)
Article
Full-text available
Court cases since the classic Brito v. Zia (1973) decision dealing with terminations based on subjective performance appraisals are reviewed. Professional interpretations of Brito v. Zia are also examined and criticized in light of professional practice and subsequent court decisions. Major themes and issues are distilled from the review of cases, and implications and recommendations for personnel practices are discussed.
Article
Full-text available
The purpose of this study was to investigate conflicting findings in previous research on personality and job performance. Meta-analysis was used to (a) assess the overall validity of personality measures as predictors of job performance, (b) investigate the moderating effects of several study characteristics on personality scale validity, and (c) appraise the predictability of job performance as a function of eight distinct categories of personality content, including the “Big Five” personality factors. Based on review of 494 studies, usable results were identified for 97 independent samples (total N= 13,521). Consistent with predictions, studies using confirmatory research strategies produced a corrected mean personality scale validity (.29) that was more than twice as high as that based on studies adopting exploratory strategies (.12). An even higher mean validity (.38) was obtained based on studies using job analysis explicitly in the selection of personality measures. Validities were also found to be higher in longer tenured samples and in published articles versus dissertations. Corrected mean validities for the “Big Five” factors ranged from .16 for Extroversion to .33 for Agreeableness. Weaknesses in the reporting of validation study characteristics are noted, and recommendations for future research in this area are provided. Contrary to conclusions of certain past reviews, the present findings provide some grounds for optimism concerning the use of personality measures in employee selection.
Article
Full-text available
This paper presents a model of performance appraisal which focuses on the cognitive processes employed by a rater attempting to form an evaluation. The model describes the method by which a rater collects, encodes, stores, and later retrieves information from memory, and the method by which he or she weights and combines this information to form an evaluation which is converted to a rating on a scale. The model is based on diverse bodies of literature which share a social-cognitive orientation, and it forms the foundation for a number of testable research propositions.
Article
Full-text available
This paper reports the follow-up phase of a study of peer nominations begun in 1955 at the Naval OCS in Newport, Rhode Island. Over 700 trainees completed several peer nomination forms at various stages of training, 1 in particular on "success as a future Naval Officer" (FO). Subsequently, 639 trainees were identified who had gone on to duty as officers for about 3 yr. The average grade they secured on a key portion of the fitness report ratings assigned by their direct superior officers was used as a performance criterion; it had a split-half reliability of .90. In the prediction of this criterion, the FO peer nomination score from the 3rd wk. of training gave a validity of .40, which was as high as that for later FO scores and which was only slightly diminished after academic grades and popularity were partialed. The findings support the use of early peer nominations as a valid supplemental measure in predicting performance after training. (17 ref.)
Article
The interrater reliabilities of ratings of 9,975 ratees from 79 organizations were examined as a function of length of exposure to the ratee. It was found that there was a strong, nonlinear relationship between months of exposure and interrater reliability. The correlation between a logarithmic transformation of months of experience and reliability was .73 for one type of ratings and .65 for another type. The relationship was strongest during the first 12 months on the job. Changes in reliability were accounted for mostly by changes in criterion variance. Asymptotic levels of reliability were only about .60, even with 10-20 years of experience. Implications for estimating reliabilities in individual and meta-analytic studies and for performance appraisal were presented, and possible explanations of the reliability-variance relationship were advanced.
Article
The interrater reliabilities of ratings of 9,975 ratees from 79 organizations were examined as a function of length of exposure to the ratee. It was found that there was a strong, nonlinear relationship between months of exposure and interrater reliability. The correlation between a logarithmic transformation of months of experience and reliability was .73 for one type of ratings and .65 for another type. The relationship was strongest during the first 12 months on the job. Changes in reliability were accounted for mostly by changes in criterion variance. Asymptotic levels of reliability were only about .60, even with 10–20 yrs of experience. Implications for estimating reliabilities in individual and meta-analytic studies and for performance appraisal were presented, and possible explanations of the reliability–variance relationship were advanced.
Article
Hulin, Henry, and Noon (1990) reviewed evidence from a number of studies which supported, in their view, the position that predictive validities decreased over time. If correct, their results would have significant implications for personnel selection practice and research. However, further analysis of their evidence suggested that their results may have only limited generalizability. More specifically, few of the studies they used to support their claim of decreasing predictive validities were field studies of prediction-criterion pairs. Furthermore, reported data on lagged intercorrelations were of limited relevance to the question of decreasing validities. Finally, a large body of data relevant to the issue of time-lagged validities in a personnel selection context was omitted because the data did not meet Hulin et al.'s restrictive criteria.
Article
We argue a divergent perspective from that taken by Barrett, Caldwell, and Alexander (1985) in a critical reanalysis of the evidence for dynamic criteria. Those authors distinguished three definitions of the dynamic criterion phenomenon and concluded, on the basis of secondary analyses of several sets of published data, that dynamic criteria do not exist. Moreover, they concluded that most of the temporal changes in criteria reported in those data sets could be explained by methodological artifacts. In several cases these artifacts were listed in summary form, without a complete consideration of the implications of invoking these artifacts as post hoc explanations. The purpose of this comment is to clarify the debate on dynamic criteria by critiquing the Barrett et al. study. We suggest that a fruitful solution to the problem may lie in trying to understand criteria per se rather than searching for artifacts.
Article
The effect of rating format and non-performance variables on rating leniency were studied in two law enforcement organizations. One of these variables, trust in the appraisal process, was defined as the extent to which a rater believes that fair and accurate appraisal will be made in the organization. A measure of trust in appraisal accounted for a significant proportion of variance in performance ratings. The purpose of appraisal (i.e., feedback or promotion) also accounted for rating variance. A mixed-standard rating format showed less susceptibility to the non-performance variables on the extent of leniency. Discussion centers on the usefulness of rater and organizational variables in performance appraisal research.
Article
Past research on the role of appraisal purpose in the appraisal decision-making process has concentrated on the motivational role of purpose. Research has found that raters are less willing to give poor ratings when appraisals are to be used for some purposes rather than others. The present paper describes two experiments which explore how appraisal purpose might affect rater cognitive activities as well. The first experiment investigated how appraisal purpose and outcomes affect how raters differentially utilize information to make appraisal decisions. Few differences were found. The second experiment investigated how raters differentially search for performance information to make appraisal decisions for different purposes and outcomes. Raters were found to search for more comparative information when they had to select one of several ratees for some treatment. The results also indicated a discrepancy between how information is collected and how it is used. Implications for defining the role of purpose in the appraisal process, as well as for recent process approaches to performance appraisal, are discussed.
Article
This experiment investigated the effects of two factors felt to influence the quality of ratings: anticipated feedback sharing and knowledge of subordinates' self-assessment. One hundred and eighty subjects receiving either favorable, unfavorable, or no subordinate self-assessment information were led to anticipate either face-to-face, written, or no feedback sharing with a subordinate. Main effects were found both for type of feedback sharing and level of subordinate self-assessment. Partial support was also found for the interaction of feedback sharing and partial self-assessment effects. Supervisors anticipating sharing face-to-face feedback with a subordinate rated the latter's performance significantly more positively than did supervisors who received no self-assessment data; while supervisors receiving knowledge of an unfavorable self-assessment rated their subordinates significantly more negatively than those receiving no self-assessment information. Ancillary analyses support the contention that the impact of knowledge of self-assessment information is largely motivational, as opposed to informational, in nature. Despite their potential to influence ratings, it is suggested that face-to-face feedback sharing requirements and the use of subordinate self-assessment data are not necessarily detrimental, but rather that care should be taken to minimize their potential to reduce rating quality.
Article
Verbal protocol analysis was used to trace cognitive processes for 36 managers as they evaluated their own performance and the performance of their subordinates. Attribution theory and exploratory analyses were used in order to detect differences in information processing between the self-evaluations and the evaluations of subordinates. These results appear to clarify differences in patterns of rating outcomes usually obtained for the two rating types. Empirical support was found for the application of attribution theory in the models of performance appraisal. However, a number of discoveries were made which emphasize the need for more empirical research in modeling the appraisal process.
Article
A literature-based model of the determinants of the accuracy of performance ratings is presented. The model indicates that the major determinants of accuracy are: (a) rater motivation; (b) rater ability; and (c) availability of appropriate judgmental norms. Several propositions and suggestions for further research are derived from the components of the model.
Article
In the early 1980s, Landy and Farr (1980) and Feldman (1981) redirected performance appraisal research from issues related to the development of psychometrically sound rating scales to those involving the cognitive processes of raters. Since that time, several reviews have attempted to translate principles from social cognition and cognitive psychology to the specific conditions of formal appraisal systems in work-oriented organizations. In addition, a number of empirical studies have been conducted on this topic. This article reviews empirical research during the 1980s that focused on performance appraisal processes, particularly the research that has focused upon rating accuracy. The review is structured around a three-stage process model of gathering, storing, and retrieving information about social stimuli for the purposes of rating performance. Factors affecting this process are clustered into four categories: appraisal settings, ratees, raters, and the nature of the scales used for the appraisal. Once reviewed, the research is evaluated in terms of its contributions to improving the quality of appraisal systems as they are used in organizations.