Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Despite the widespread use of interrater agreement statistics for multilevel modeling and other types of research, the existing guidelines for inferring the statistical significance of interrater agreement are quite limited. They are largely relevant only under conditions that numerous researchers have argued rarely exist. Here we address this problem by generating guidelines for inferring statistical significance under a number of conditions via a computer simulation. As a set, these guidelines cover many of the conditions researchers commonly face. We discuss how researchers can use the guidelines presented to more reasonably infer the statistical significance of interrater agreement relative to using the limited guidelines available in the extant literature. (PsycINFO Database Record (c) 2013 APA, all rights reserved).


... The r WG index ranges from 0 (absent agreement) to 1 (maximum agreement), and heuristic values for unacceptable agreement are 0 to 0.59, for sufficient agreement are 0.60 to 0.70, for moderate agreement are 0.70 to 0.80, and for strong agreement are 0.80 to 1.00 (Brown & Hauenstein, 2005; Lanz et al., 2018; Wagner et al., 2010). While some studies have justified informant agreement by reporting an acceptable average r WG index across targets, r WG values are meant to be applied individually to each target in the study (Smith-Crowe et al., 2014). For this heuristic strategy, the cutoff value of r WG = .60 ...
... A second way to determine targets' informant agreement is to test the statistical significance of r WG (Strategy 2.2) (Lanz et al., 2018), by comparing r WG values to critical values relative to different types of distributions (Smith-Crowe et al., 2014). Because the r WG calculates agreement by comparing the actual observed variability of the multiple informants' ratings to the theoretical variance of ratings that would be expected in the case of no agreement, the r WG is heavily dependent on the choice of the null distribution underlying variation in informant responding. ...
... Because the r WG calculates agreement by comparing the actual observed variability of the multiple informants' ratings to the theoretical variance of ratings that would be expected in the case of no agreement, the r WG is heavily dependent on the choice of the null distribution underlying variation in informant responding. While the most commonly used null distribution for modeling informant responses has been the uniform distribution, which is recommended when researchers are confident the raters do not share a common bias (Wagner et al., 2010), it is best practice to test multiple alternative null distributions when the absence of informant bias cannot be assumed (Lanz et al., 2018; Smith-Crowe et al., 2014). For example, skewed null distributions are useful when informant responses may be biased by social desirability, leniency, or overly positive or negative anchor valence (Smith-Crowe et al., 2014), and triangular/normal null distributions are useful when informant responses exhibit a central tendency bias (LeBreton & Senter, 2008). ...
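The dependence of r WG on the null distribution described in the excerpts above can be illustrated with a minimal sketch. It assumes the standard single-item form, r WG = 1 - (observed variance / null variance), and a discrete uniform null whose variance is (A^2 - 1) / 12 for A response categories. The function names and the ratings are hypothetical and are not taken from any of the cited papers.

```python
from statistics import variance


def uniform_null_variance(n_categories: int) -> float:
    """Variance of a discrete uniform distribution over n_categories scale points."""
    return (n_categories ** 2 - 1) / 12


def rwg(ratings, null_variance: float) -> float:
    """Single-item r_WG: 1 minus the ratio of observed to null variance."""
    observed = variance(ratings)  # sample variance (n - 1 denominator)
    return 1 - observed / null_variance


# Hypothetical ratings from five informants on a 5-point scale.
ratings = [4, 4, 5, 4, 3]
print(rwg(ratings, uniform_null_variance(5)))  # 0.75
```

Passing a smaller null variance (as a skewed or triangular null would imply) lowers the resulting estimate for the same ratings, which is why the choice of null matters.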
Article
According to narrative identity theory, people understand themselves through story. However, evidence suggests people vary dispositionally in how strongly they experience narrative identity. Recently, a self-report scale called the Cinematic Self scale was constructed and validated to measure this trait variation in narrative identity. In that article, evidence across five studies supported the validity of the Cinematic Self scale as a measure of narrative identity. A limitation of that evidence was its exclusive reliance on self-reports; this can be problematic because associations may be inflated by shared method biases. The present research addresses that limitation by using data from multiple methods: self-reports and informant reports. Do self-reported Cinematic Self scores correspond with informant-reported narrative behaviors? University students (N = 127) completed the Cinematic Self scale and informants reported on their narrative behaviors (N = 395). Positive, medium-sized correlations between self-reports and informant reports were found across multiple analytical approaches, controlling for personality variables. The results extend validation evidence of the Cinematic Self scale as a measure of narrative identity.
... Furthermore, we also provide ICC(2), which represents the group mean reliability. For r WG, which measures the extent to which women agree in their ratings of the focal variables (James et al. 1984), we determined cutoffs using the critical values reported by Smith-Crowe and colleagues (Smith-Crowe et al. 2014). These critical values indicate the statistical significance of a particular interrater agreement estimate. ...
... Team identification is a single-item measure with six response categories. Smith-Crowe and colleagues (Smith-Crowe et al. 2014) suggest that for a group size of five (the lowest reported group size), the range of critical values for r WG is between .81 (when using five response categories) and .86 ...
... As was the case with identification, we use r WG and ICC to justify aggregation of individual responses to the team level. For r WG, Smith-Crowe et al. (2014) suggest that the critical value for a five-item measure with five response categories for a group size of five is .86. Thus, aggregation metrics supported aggregation of women's individual ratings of collective efficacy to the team level of analysis (median r WG = .90; ...
Article
The relationships among the percentage of women in a team and women’s sense of team identification and collective efficacy as well as team performance were examined. We explored these relationships in a sample of student teams conducting a semester-long social science research project within the context of a science- and technology-focused university. Findings with 95 U.S. college students (43 women) show that women experience higher team identification and collective efficacy as the percentage of women teammates increases. Additionally, women’s team identification and collective efficacy mediate the relationship between the percentage of women on the team and overall team performance. Interestingly, the number of men on the team did not influence men’s sense of team identification, collective efficacy, or team performance. This research has implications for team composition. Specifically, when navigating diversity in teams, managers and leaders should aim to build teams that are composed of multiple women rather than dividing women up among various teams. In doing so, managers can better secure conditions for the development of positive teamwork experiences and, ultimately, performance.
... Cohen, Doveh, & Nahum-Shani, 2009). To date, in drawing inferences concerning whether agreement is different from "no agreement," researchers have largely focused on single items and scales for a single group (Dunlap, Burke, & Smith-Crowe, 2003; Smith-Crowe, Burke, Cohen, & Doveh, 2014) or compared the homogeneity for two or more independent groups (Cohen, Doveh, & Eick, 2001; Pasisz & Hurtz, 2009). Limited research attention has been devoted to drawing inferences based on the mean or median of a sample of groups. ...
... Applying a rule-of-thumb, researchers consider values of .70 and greater as indicating agreement. Increasingly, guidelines have emerged for assessing the statistical significance of a single r WG value (either for an item or a scale) given a variety of null distributions (Cohen et al., 2009; Smith-Crowe et al., 2014). The average deviation (AD; Burke, Finkelstein, & Dusig, 1999) is an alternative interrater agreement statistic that is increasingly used. ...
... In previous research on agreement indices, data were simulated for single groups (Smith-Crowe et al., 2014), not for individuals nested in groups. We generalized earlier procedures in Smith-Crowe et al. (2014) by developing an underlying model upon which we based our data generation method. ...
Article
In Study 1 of this two-part investigation, we present a “central tendency approach” and procedures for assessing overall interrater agreement across multiple groups. We define parameters for mean group agreement and construct bootstrapped confidence intervals around the mean population parameters for rWG, AD, and ICC(1). In Study 2, we extend assessments of overall interrater agreement by developing a “matched difference approach” and procedures for assessing real versus pseudo agreement in a sample of groups. Here, we use random group resampling and the matched difference between assessments of the respective rWG, AD, and ICC(1) values for actual and pseudo groups, with the establishment of bootstrapped confidence intervals around such differences. In both studies, we employ simulated and real data to demonstrate the accuracy and practical utility of the new procedures for assessing agreement with respect to groups. Notably, to generate simulated data for Studies 1 and 2, we developed a new underlying model for multilevel data and procedure for data generation, and we discuss its potential utility for enhancing research in group-level studies. Moreover, we discuss, relative to current practices, how and why the new inference procedures provide information about mean interrater agreement in the population, which can improve data aggregation decisions and interpretations of findings from group-level studies.
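As a rough illustration of what a bootstrapped confidence interval around mean group agreement can look like, the sketch below applies a plain percentile bootstrap to a list of per-group agreement values. This is a generic resampling scheme chosen for illustration, not the procedure developed in the article summarized above; the function name, the default settings, and the example values are all assumptions.

```python
import random
from statistics import mean


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-group agreement values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample groups with replacement and record the mean of each resample.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical r_WG values, one per group.
group_rwg = [0.62, 0.81, 0.74, 0.90, 0.55, 0.78, 0.69, 0.85]
lower, upper = bootstrap_ci_mean(group_rwg)
print(f"mean = {mean(group_rwg):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The interval describes uncertainty about the average agreement across groups, which is a different question from whether any single group's value is significant.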
... Additionally, criteria may include the use of other cutoffs for agreement or cutoffs moored to statistical significance testing (cf. Bliese & Halverson, 2002; Burke et al., 2017; Cohen et al., 2001; Dunlap et al., 2003; LeBreton & Senter, 2008; Smith-Crowe et al., 2014; Smith-Crowe et al., 2012; Woehr et al., 2015). Most importantly, researchers must clearly articulate the criteria used to guide decisions about data aggregation. ...
... They were statistically significant (p ≤ .05) compared to the corresponding critical r WG values for the 5- and 10-item scales (e.g., Smith-Crowe et al., 2014). Furthermore, the intraclass correlation coefficient-1 (ICC1) was .29 and the ICC2 was .64. ...
Article
The multilevel paradigm is omnipresent in the organizational sciences, with scholars recognizing data are almost always nested – either hierarchically (e.g., individuals within teams) or temporally (e.g., repeated observations within individuals). The multilevel paradigm is moored in the assumption that relationships between constructs often reside across different levels, often requiring data from a lower-level (e.g., employee-level justice perceptions) to be aggregated to a higher-level (e.g., team-level justice climate). Given the increased scrutiny in the social sciences around issues of clarity, transparency, and reproducibility, this paper first introduces a set of data aggregation principles that are then used to guide a brief literature review. We found that reporting practices related to data aggregation are quite variable with little standardization as to what information and statistics are included by authors. We conclude our paper with a Data Aggregation Checklist and a new R package, WGA (Within-Group Agreement & Aggregation), intended to improve the clarity and transparency of future multilevel studies.
... It should be noted that there is considerable debate on the interpretation of the size of r wg scores as well as the appropriateness of the kind of null distribution to use. For example, some scholars have suggested that 0.70 should be taken as a threshold for justifying aggregation (e.g., Lance et al. 2006), although Smith-Crowe et al. (2014) show that as the number of judges increases, the threshold for justifying aggregation can and should decrease considerably. As an alternative, testing the statistical significance of r wg by means of Monte Carlo simulations has been suggested (Cohen et al. 2009). ...
... As an alternative, testing the statistical significance of r wg by means of Monte Carlo simulations has been suggested (Cohen et al. 2009). However, as seen in Smith-Crowe et al. (2014), if the number of judges in the groups exceeds 100 (as is clearly the case in our samples), a test of the significance of r wg becomes increasingly less informative. Finally, a rectangular distribution might not be the most appropriate assumption because, by definition, in the case of a strong social norm, the distribution of answers within a reference group should be skewed. ...
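The idea of a Monte Carlo significance test, and the reason critical values shrink as the number of judges grows, can be sketched as follows. The sketch simulates ratings from a uniform null and takes the 95th percentile of the simulated r wg values as an approximate critical value. It is an assumption-laden illustration (uniform null, single item, hypothetical function name and defaults) and does not reproduce the tabled values or the exact procedures of the cited papers.

```python
import random
from statistics import variance


def rwg_critical_value(n_judges, n_categories, n_sims=2000, alpha=0.05, seed=1):
    """Approximate critical r_wg under a uniform null via simulation."""
    rng = random.Random(seed)
    null_var = (n_categories ** 2 - 1) / 12  # variance of the uniform null
    simulated = []
    for _ in range(n_sims):
        # Each judge answers at random, i.e. there is no true agreement.
        ratings = [rng.randint(1, n_categories) for _ in range(n_judges)]
        simulated.append(1 - variance(ratings) / null_var)
    simulated.sort()
    # Observed values above this point occur in roughly alpha of null samples.
    return simulated[int((1 - alpha) * n_sims)]


small_group = rwg_critical_value(n_judges=5, n_categories=5)
large_group = rwg_critical_value(n_judges=100, n_categories=5)
print(small_group, large_group)
```

With 100 judges the simulated critical value is far lower than with 5, so nearly any non-trivial agreement reaches significance in large groups, which is consistent with the point made in the excerpt above.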
Article
Employment relationships are embedded in a network of social norms that provide an implicit framework for desired behaviour, especially if contractual solutions are weak. The COVID-19 pandemic has brought about major changes that have led to situations, such as the scope of short-time work or home-based work in a firm. Against this backdrop, our study addresses three questions: first, are there social norms dealing with these changes; second, are there differences in attitudes between employees and supervisors (misalignment); and third, are there differences between respondents’ average attitudes and the attitudes expected to exist in the population (pluralistic ignorance). We find that for the assignment of short-time work and of work at home, there are shared normative attitudes with only small differences between supervisors and nonsupervisors. Moreover, there is evidence for pluralistic ignorance; asked for the perceived opinion of others, respondents over- or underestimated the consensus in the (survey) population. Such pluralistic ignorance can contribute to the upholding of a norm even if individuals do not support the norm, with potentially far-reaching consequences for the quality of the employment relationship and the functioning of the organization. Our results show that, especially in times of change, social norms should be considered for the analysis of labour markets.
... AD does not require the specification of a null distribution and estimates inter-rater disagreement in the units of the original scale (Pina, Hren and Marusic 2015). There has been a long discussion about which of the available indices used for similar cases (AD, intraclass correlation-ICC or within group agreement index-r wg) (James, Demaree and Wolf 1993; Mutz, Bornmann and Daniel 2012; Smith-Crowe et al. 2014) performs better and describes inter-rater agreement more accurately. According to simulation research (Smith-Crowe, Burke and Kouchaki 2013; Smith-Crowe et al. 2014), the AD index has been shown to perform better (Kline and Hambley 2007; Roberson, Sturman and Simons 2007). ...
... There has been a long discussion about which of the available indices used for similar cases (AD, intraclass correlation-ICC or within group agreement index-r wg) (James, Demaree and Wolf 1993; Mutz, Bornmann and Daniel 2012; Smith-Crowe et al. 2014) performs better and describes inter-rater agreement more accurately. According to simulation research (Smith-Crowe, Burke and Kouchaki 2013; Smith-Crowe et al. 2014), the AD index has been shown to perform better (Kline and Hambley 2007; Roberson, Sturman and Simons 2007). Additionally, the fact that ICC measures both agreement and reliability simultaneously (LeBreton and Senter 2008) might potentially complicate inferences. ...
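To make concrete why AD needs no null distribution and is expressed in the units of the original scale, here is a minimal sketch of the average deviation computed around the mean and around the median. The function names and the example ratings are hypothetical.

```python
from statistics import mean, median


def ad_about_mean(ratings):
    """Average absolute deviation of ratings from their mean."""
    center = mean(ratings)
    return mean([abs(x - center) for x in ratings])


def ad_about_median(ratings):
    """Average absolute deviation of ratings from their median."""
    center = median(ratings)
    return mean([abs(x - center) for x in ratings])


# Hypothetical scores from five raters on a 5-point scale.
scores = [4, 4, 5, 4, 3]
print(ad_about_mean(scores))    # 0.4 scale points
print(ad_about_median(scores))  # 0.4 scale points
```

Lower values mean closer agreement, and a value of 0 means every rater gave the same score.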
Article
In this study, we analyze the two-phase bottom-up procedure applied by the Future and Emerging Technologies Program (FET-Open) at the Research Executive Agency (REA) of the European Commission (EC) for the evaluation of highly interdisciplinary, multi-beneficiary research proposals which request funding. In the first phase, remote experts assess the proposals and draft comments addressing the evaluation criteria pre-defined by FET-Open. In the second phase, a new set of additional experts (of more general expertise and different from the remote ones), after cross-reading the proposals and their remote evaluation reports, convene in an on-site panel where they discuss the proposals. They complete the evaluation by reinforcing, per proposal and per criterion, one or another assessment as assigned remotely during the first phase. We analyze the level of inter-rater agreement among the remote experts and identify its relative correlation with the proposals funded at the end of the evaluation. Our study also provides comparative figures of the evolution of the proposals' scores during the two phases of the evaluation process. Finally, by carrying out an appropriate quantitative and qualitative analysis of all scores from the seven past cutoffs, we elaborate on the significant contribution of the panel (the second phase of the evaluation) in identifying and promoting the best proposals for funding.
... Following recent recommendations (e.g., Biemann et al., 2012; James et al., 1984; LeBreton & Senter, 2008; Rego et al., 2013; Smith-Crowe et al., 2014), we calculated r wg(j) values using both the uniform distribution and a slightly skewed distribution, which are based on theoretically justifiable null distributions. Indeed, we considered team leaders' ethical leadership, TFL, ethical climate, and team moral efficacy to be liable to a slightly skewed distribution, because responses to these scales may be affected by a positive leniency bias (Biemann et al., 2012; K.-Y. ...
... Indeed, we considered team leaders' ethical leadership, TFL, ethical climate, and team moral efficacy to be liable to a slightly skewed distribution, because responses to these scales may be affected by a positive leniency bias (Biemann et al., 2012; K.-Y. Ng et al., 2011; Smith-Crowe et al., 2014). The estimated mean r wg(j) values indicated strong within-team agreement: .95 ...
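Computing r wg(j) under two null distributions, as the excerpts above describe, can be sketched as follows. The sketch assumes the usual multi-item form, r wg(j) = J(1 - v) / (J(1 - v) + v), where J is the number of items and v is the mean observed item variance divided by the null variance. The slightly skewed null variance of 1.34 for a five-point scale is an illustrative assumption (a value of this size is commonly tabulated, but it should be checked against the source a study actually relies on), and the ratings are hypothetical.

```python
from statistics import mean, variance


def rwg_j(item_ratings, null_variance):
    """Multi-item r_wg(j); item_ratings holds one list of rater scores per item."""
    n_items = len(item_ratings)
    mean_item_var = mean([variance(scores) for scores in item_ratings])
    ratio = mean_item_var / null_variance
    return (n_items * (1 - ratio)) / (n_items * (1 - ratio) + ratio)


# Hypothetical team: 4 raters answering 3 items on a 5-point scale.
team = [[4, 4, 5, 4], [5, 4, 4, 4], [4, 5, 4, 4]]

UNIFORM_VAR = (5 ** 2 - 1) / 12   # 2.0 for five categories
SLIGHT_SKEW_VAR = 1.34            # assumed illustrative value, see lead-in

print(rwg_j(team, UNIFORM_VAR))
print(rwg_j(team, SLIGHT_SKEW_VAR))
```

Because the skewed null has less variance than the uniform null, it yields the lower of the two estimates for the same ratings, so reporting both brackets the plausible range of agreement.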
Article
In recent years, unethical conduct (e.g., Enron, Lehman Brothers, Oxfam, Volkswagen) has become an important issue in management; relatedly, there is growing interest regarding the nature and implications of ethical leadership. Drawing from social learning theory, we posited that ethical leadership would positively relate to team ethical voice and organizational citizenship behavior (OCB) through team moral efficacy. Furthermore, building on social information processing theory and the social intuitionist model, we expected these effects to be accentuated in teams with a strong ethical climate. Using survey data from subordinates and leaders pertaining to 150 teams from the Republic of Korea Army, ethical leadership was found to indirectly relate to increased team ethical voice and OCB directed at individuals and the organization through team moral efficacy. These relationships tended to be amplified among teams with a strong ethical climate. In addition, these findings persisted while controlling for transformational leadership, thereby highlighting the incremental value of ethical leadership for team outcomes. Theoretical and practical implications are discussed.
... It is a good indicator to assess whether an instrument measures one circumscribable concept and thus item aggregation of total scores is valid. Smith-Crowe et al. [31] give significance levels for r WG(j) for questionnaires up to ten items. For instruments with ten items, 100 respondents and seven categories, which is the closest to our present instrument, an r WG(j) of 0.63 is claimed to be sufficient for aggregation of a total score. ...
... Reliability within group measures are well suited to assess and compare patterns of team safety climate for small teams as about 100 respondents are considered to represent large groups [31]. ...
Article
Introduction: Safe practice and safety culture are important issues in outpatient diagnostic imaging services. As questionnaires assessing safety culture through the measurement of safety climate in this setting are not yet available, the present study aimed to develop and validate such an instrument. Materials and methods: After adaptation of an existing questionnaire and qualitative pretesting, the instrument was tested by collaborators from three outpatient imaging services in Switzerland. Results were first assessed using descriptive statistics. Scores of individual services were compared using a Wilcoxon test assessing differences between rank distributions. The final instrument was tested for validity using inter-rater agreement measures, such as reliability within groups (rWG), and an intraclass correlation coefficient measure (ICC(1)). These measures allowed the assessment of validity of aggregation into a total score (rWG(j)) and validated the instrument for its capacity to distinguish various safety climates of different groups by comparing inter-rater agreement in the overall sample to inter-rater agreement of individual services (rWG) and by measuring group effects (ICC(1)). Furthermore, the final instrument was tested for internal consistency and reliability using Cronbach's Alpha. Results: Safety climate scores vary significantly between services. Inter-rater agreement measures show that item aggregation is justified and that the instrument distinguishes various patterns of safety climate. The final instrument proves to be valid, consistent and reliable. Conclusions: The final instrument presents a valid, consistent and reliable option to measure safety climate in outpatient diagnostic imaging services. Results can be used as a basis for quality improvement.
Key points:
• An adapted questionnaire that assesses safety climate in outpatient diagnostic imaging services was developed and tested in Switzerland.
• Psychometric evaluation showed the questionnaire to be a valid, consistent and reliable instrument.
• Results are of interest for imaging services as well as for stakeholders interested more globally in monitoring and quality improvement.
... Report observed effects and those corrected for range restriction and/or measurement error (29,44,58,83,90) 4. Recognize how the method used to identify and correct for measurement error or range restriction (e.g., Pearson correlations corrected using the Spearman-Brown formula; Thorndike's Case 2) may have led to over- or under-estimated effect sizes. Notes: r wg = Within-group inter-rater reliability; ICC = Intraclass Correlation Coefficient. Sources used to derive evidence-based recommendations: 3 Aguinis and Vandenberg (2014), 22 Bono and McNamara (2011), 24 Boyd, Gove, and Hitt (2005), 25 Brannick, Chan, Conway, Lance, and Spector (2010), 28 Brutus, Gill, and Duniewicz (2010), 29 Cafri, Kromrey, and Brannick (2010), 31 Casper, Eby, Bordeaux, Lockwood, and Lambert (2007), 32 Castro (2002), 36 Combs (2010), 38 , 42 Crook, Shook, Morris, and Madden (2010), 44 Edwards (2008), 45 Edwards and Bagozzi (2000), 48 Evert, Martin, McLeod, and Payne (2016), 50 Feng (2015), 58 Geyskens, Krishnan, Steenkamp, and Cunha (2009), 68 Jackson, Gillaspy, and Purc-Stephenson (2009), 70 Klein and Kozlowski (2000), 71 Leavitt, Mitchell, and Peterson (2010), 72 LeBreton and Senter (2008), 73 Mathieu and Taylor (2006), 83 Scandura and Williams (2000), 88 Smith-Crowe, Burke, Cohen, and Doveh (2014), 90 Van Iddekinge and Ployhart (2008), 95 Yammarino, Dionne, Chun, and Dansereau (2005), 96 Zhang and Shaw (2012). ...
... Brooks, Dalal, and Nolan (2014), 27 Brutus, Aguinis, and Wassmer (2013), 28 Brutus, Gill, and Duniewicz (2010), 30 Capraro and Capraro (2002), 31 Casper, Eby, Bordeaux, Lockwood, and Lambert (2007), 32 Castro (2002), 36 Combs (2010), 39 Cools, Armstrong, and Verbrigghe (2014), 40 Cortina (2003), 42 Crook, Shook, Morris, and Madden (2010), 44 Edwards (2008), 52 Finch, Cumming, and Thomason (2001), 53 Firebaugh (2007), 54 Freese (2007a), 55 Freese (2007b), 57 Gerber and Malhotra (2008), 59 Gibbert and Ruigrok (2010), 60 Gibbert, Ruigrok, and Wicki (2008), 61 Goldfarb and King (2016), 64 Hoekstra, Finch, Kiers, and Johnson (2006), 65 Hoetker (2007), 66 Hogan, Benjamin, and Brezinski (2000), 67 Hubbard and Ryan (2000), 68 Jackson, Gillaspy, and Purc-Stephenson (2009), 70 Klein and Kozlowski (2000), 71 Leavitt, Mitchell, and Peterson (2010), 72 LeBreton and Senter (2008), 74 McDonald and Ho (2002), 76 Nickerson (2000), 77 Nuijten, Hartgerink, Assen, Epskamp, and Wicherts (2016), 82 Rogelberg, Adelman, and Askay (2009), 84 Schafer and Graham (2002), 86 Shen, Kiger, Davies, Rasch, Simon, and Ones (2011), 87 Sijtsma (2016), 88 Smith-Crowe, Burke, Cohen, and Doveh (2014), 90 Van Iddekinge and Ployhart (2008), 91 Waldman and Lilienfeld (2016), 92 Werner, Praxedes, and Kim (2007), 93 Wigboldus and Dotsch (2016), 94 Wright (2016), 95 Yammarino, Dionne, Chun, and Dansereau (2005), 96 Zhang and Shaw (2012). 20 January Academy of Management Annals ...
Article
We review the literature on evidence-based best practices on how to enhance methodological transparency, which is the degree of detail and disclosure about the specific steps, decisions, and judgment calls made during a scientific study. We conceptualize lack of transparency as a “research performance problem” because it masks fraudulent acts, serious errors, and questionable research practices, and therefore precludes inferential and results reproducibility. Our recommendations for authors provide guidance on how to increase transparency at each stage of the research process: (1) theory, (2) design, (3) measurement, (4) analysis, and (5) reporting of results. We also offer recommendations for journal editors, reviewers, and publishers on how to motivate authors to be more transparent. We group these recommendations into the following categories: (1) manuscript submission forms requiring authors to certify they have taken actions to enhance transparency, (2) manuscript evaluation forms including additional items to encourage reviewers to assess the degree of transparency, and (3) review process improvements to enhance transparency. Taken together, our recommendations provide a resource for doctoral education and training; researchers conducting empirical studies; journal editors and reviewers evaluating submissions; and journals, publishers, and professional organizations interested in enhancing the credibility and trustworthiness of research.
... Rwg-values between 0.51 and 0.70 indicate moderate interrater-agreement; values above 0.71 indicate strong agreement [24]. Statistical significance criteria were provided by Smith-Crowe et al. [26], and indicate, depending on sample size, number of response scale categories, and chosen null distribution, whether agreement is due to chance or not. For details on the exact calculation of rwg, see James et al. [23]. ...
... In a next step, we analysed subgroups with respect to interrater-agreement. Rwg-values for safety climate in each subgroup are presented in Table 2. According to Smith-Crowe et al. [26], all rwg-values were significant, indicating that interrater-agreement was greater than chance. Also, all rwg-values indicate strong agreement within the units analysed. ...
Article
Safety climate has been acknowledged as an unspecific factor influencing patient safety. However, studies rarely provide in-depth analysis of climate data. As a helpful approach, the concept of “climate strength” has been proposed. In the present study we tested the hypothesis that even if safety climate remains stable at the mean level across time, differences might be evident in strength or shape. The data of two hospitals participating in a large national quality improvement program were analysed for differences in climate profiles at two measurement occasions. We analysed differences at the mean level, differences in percent problematic response, agreement within groups, and frequency histograms in two large hospitals in Switzerland at two measurement occasions (2013 and 2015) applying the Safety Climate Survey. In total, survey responses of 1193 individuals were included in the analyses. Overall, small but significant differences in the mean level of safety climate emerged for some subgroups. Also, although agreement was strong at both time-points within groups, tendencies of divergence or consensus were present in both hospitals. Depending on subgroup and analyses chosen, differences were more or less pronounced. The present study illustrated that taking several measures into account and describing safety climate from different perspectives is necessary in order to fully understand differences and trends within groups and to develop interventions addressing the needs of different groups more precisely. © 2017 Mascherek, Schwappach. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
... In sum, interpretable r WG values range from 0 to 1; agreement is sufficient if r WG ranges between 0.60 and 0.70, moderate when r WG ranges between 0.70 and 0.80, and strong when r WG is higher than 0.80 (Brown & Hauenstein, 2005). To test the statistical significance of r WG , Smith-Crowe, Burke, Cohen, and Doveh (2014) recently published the critical values relative to different types of distributions. ...
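As a minimal sketch, the heuristic bands quoted in this excerpt can be written as a small lookup. The function name `interpret_rwg`, the label for values under 0.60, and the handling of the shared boundaries (0.70 and 0.80 are both assigned to "moderate") are assumptions made for illustration, not part of the cited guidelines.

```python
def interpret_rwg(rwg: float) -> str:
    """Map an r_WG value to the heuristic bands quoted in the excerpt.

    Boundary handling is an assumption: the excerpt lists overlapping
    endpoints, so 0.70 and 0.80 are both treated as 'moderate' here.
    """
    if not 0.0 <= rwg <= 1.0:
        return "outside the interpretable range"
    if rwg < 0.60:
        return "below the sufficient band"  # label chosen for this sketch
    if rwg < 0.70:
        return "sufficient"
    if rwg <= 0.80:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    for value in (0.55, 0.65, 0.77, 0.95):
        print(value, interpret_rwg(value))
```

Such a lookup only labels a value; it says nothing about statistical significance, which is the separate question the critical values address.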
... In particular, they presented upper limit cutoffs for different distributions, some defined by skew and others defined by kurtosis and variance, for both five- and seven-point scales. To determine whether a sample AD value could have been obtained by chance or test the null hypothesis, Burke and Dunlap (2002) presented AD's critical values at the 5% level of statistical significance for single items (see also Smith-Crowe et al., 2014). ...
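For readers unfamiliar with the AD index mentioned here, the sketch below computes the average deviation about the mean for a single item, and its mean across items for a multi-item scale. The function names and the example ratings are hypothetical; the critical values themselves are not reproduced.

```python
from statistics import mean


def average_deviation(ratings):
    """AD about the mean for one item: mean absolute deviation of the ratings."""
    centre = mean(ratings)
    return sum(abs(x - centre) for x in ratings) / len(ratings)


def average_deviation_scale(items):
    """AD for a multi-item scale: the mean of the item-level AD values."""
    return mean(average_deviation(item) for item in items)


if __name__ == "__main__":
    item = [4, 4, 5, 3, 4]  # hypothetical ratings from five raters
    print(average_deviation(item))
```

Lower values indicate more agreement, with 0 meaning every rater gave the same rating.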
Article
Full-text available
This paper addresses the potential applications of multiple informant methodology (MIM) to family research. The MIM is used when data on the same unit of analysis are collected from more than one informant, and when the researchers’ purpose requires a dyadic or group level of analysis. The MIM relies on inter-rater agreement (IRA) indices, which are needed to both estimate agreement among informants and aggregate scores from different informants. This review describes the main IRA indices: reliability within-group index (rWG), agreement within-group index (aWG), average deviation indices (ADM and ADMd), intra-class correlation coefficient (ICC), and random group index (rWG). For each index, we describe the aim, application contexts, formula(s), ranges, cut-offs, requirements to apply the index, and its possible application to family research.
... The values for the slightly skewed distribution represent the lower bound of the true agreement (Smith-Crowe, et al., 2014). The Rwg(j) calculated based on a uniform (i.e. rectangular) distribution represents the upper bound of the likely true agreement (Smith-Crowe, et al., 2014). ...
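The bound described in these excerpts follows from the form of the index, r_WG = 1 − S²/σ²_E: a null distribution with smaller variance yields a smaller r_WG for the same ratings. The sketch below illustrates this for a five-point scale. The skewed response proportions and the ratings are illustrative assumptions, not values taken from the source, and the sample (n − 1) variance is assumed for S².

```python
from statistics import variance


def null_variance(proportions):
    """Variance of a discrete null distribution over scale points 1..A."""
    points = range(1, len(proportions) + 1)
    mu = sum(p * x for p, x in zip(proportions, points))
    return sum(p * (x - mu) ** 2 for p, x in zip(proportions, points))


def rwg(ratings, null_var):
    """Single-item r_WG: 1 minus observed variance over null variance."""
    return 1 - variance(ratings) / null_var


UNIFORM_5 = [0.2] * 5                            # rectangular null
SLIGHT_SKEW_5 = [0.05, 0.15, 0.20, 0.35, 0.25]   # assumed proportions

if __name__ == "__main__":
    ratings = [4, 4, 5, 3, 4]  # hypothetical ratings
    print("uniform null:", rwg(ratings, null_variance(UNIFORM_5)))
    print("skewed null: ", rwg(ratings, null_variance(SLIGHT_SKEW_5)))
```

With these inputs the uniform null gives the larger value, which is consistent with reading the two estimates as an upper and a lower bound.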
... As Table 1 reports, we examined interrater reliabilities (ICC1 and ICC2) and interrater agreement statistics (rWG and ADM(J)) to support the aggregation of firm-level variables (LeBreton & Senter, 2008). To formally test the aggregation, we used an F test from a one-way ANOVA for ICC1 (Bliese et al., 2018) and simulated sample-specific cutoff criteria for rWG and ADM(J) (Smith-Crowe et al., 2014). Firm membership explained 10% of the variance in promotion climate (F = 6.81, df = 76, p < .001), ...
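As a rough illustration of the F test and ICC1 mentioned in this excerpt, the sketch below derives both from one-way ANOVA mean squares. It assumes equal group sizes and uses made-up data; the function name is hypothetical, and the p value is omitted because the standard library has no F distribution.

```python
from statistics import mean


def icc1_and_f(groups):
    """Return (ICC1, F) from a one-way ANOVA on equally sized groups."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    n_groups = len(groups)
    grand = mean(x for g in groups for x in g)
    # Mean square between groups: df = n_groups - 1
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n_groups - 1)
    # Mean square within groups: df = n_groups * (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_groups * (k - 1))
    f_stat = ms_between / ms_within
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc1, f_stat


if __name__ == "__main__":
    firms = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical ratings per firm
    print(icc1_and_f(firms))
```

With unequal group sizes, an adjusted average group size would be needed in place of `k`; that case is deliberately left out here.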
Article
Full-text available
Prior research suggests that the organizational context supports the emergence of employee ambidexterity; however, the interplay between formal and informal context has been largely unexplored. We analyze this interplay with a multilevel, multi-source data set of 2,446 individual employees nested in 77 organizations. We find that a promotion climate – unlike a prevention climate – contributes to employee ambidexterity. In addition, formalization positively moderates the effects of both promotion and prevention climate on employee ambidexterity, while centralization weakens the positive effect of promotion climate. Our results advance a contingency perspective that brings together formal and informal contextual drivers of employee ambidexterity and shows that even though an informal climate signals the preferred manner of goal pursuit, a formal structure affects the impact of such signals by delineating opportunity corridors of admissible behaviors.
... Reliability was considered good if the intraclass correlation coefficient was 0.75 and excellent if it was 0.85. 28 Correlation between parameters was calculated using Pearson correlation. Graft signal between the 5 ROIs was compared using 1-way analysis of variance. ...
Article
Background: An increase has been seen in the number of studies of anterior cruciate ligament reconstruction (ACLR) that use magnetic resonance imaging (MRI) as an outcome measure and proxy for healing and integration of the reconstruction graft. Despite this, the MRI appearance of a steady-state graft and how long it takes to achieve such an appearance have not yet been established. Purpose: To establish whether a hamstring tendon autograft for ACLR changes in appearance on MRI scans between 1 and 2 years and whether this change affects a patient’s ability to return to sports. Study Design: Case series; Level of evidence, 4. Methods: Patients with hamstring tendon autograft ACLR underwent MRI and clinical outcome measures at 1 year and at a final follow-up of at least 2 years. MRI graft signal was measured at multiple regions of interest using oblique reconstructions both parallel and perpendicular to the graft, with lower signal indicative of better healing and expressed as the signal intensity ratio (SIR). Changes in tunnel aperture areas were also measured. Clinical outcomes were side-to-side anterior laxity and patient-reported outcome measures (PROMs). Results: A total of 42 patients were included. At 1 year, the mean SIR for the graft was 2.7 ± 1.2. Graft SIR of the femoral aperture was significantly higher than that of the tibial aperture (3.4 ± 1.3 vs 2.6 ± 1.8, respectively; P = .022). Overall, no significant change was seen on MRI scans after 2 years; a proximal graft SIR of 1.9 provided a sensitivity of 96% to remain unchanged. However, in the 6 patients with the highest proximal graft SIR (>4) at 1 year, a significant reduction in signal was seen at final follow-up (P = .026), alongside an improvement in sporting level.
A significant reduction in aperture area was also seen between 1 and 2 years (tibial, –6.3 mm², P < .001; femoral, –13.3 mm², P < .001), which was more marked in the group with proximal graft SIR >4 at 1 year and correlated with a reduction in graft signal. The patients had a high sporting level; the median Tegner activity score was 6 (range, 5-10), and a third of patients scored either 9 or 10. Overall, PROMs and knee laxity were not associated with MRI appearance. Conclusion: In the majority of patients, graft SIR on MRI did not change significantly after 1 year, and a proximal graft SIR <2 was a sensitive indicator for a stable graft signal, implying healing. Monitoring is proposed for patients who have a high signal at 1 year (proximal graft SIR >4), because a significant reduction in signal was seen in the second year, indicative of ongoing healing, alongside an improvement in sporting level. A reduction in tunnel aperture area correlated with a reduction in graft SIR, suggesting this could also be a useful measure of graft integration.
... In order to investigate whether there is support for aggregation, we performed an analysis of variance (ANOVA) on the differences between teams. We then calculated the r wg , representing the observed variance in ratings compared to the variance of a theoretical distribution representing no agreement (Smith-Crowe et al., 2014). In sample 1, the ANOVA indicated a significant difference between teams (F(148,165) = 1.34; p < .05) and strong intergroup agreement (LeBreton & Senter, 2008) on LAC scores (r wg = .77). ...
Article
Effective communication is a foundational leadership skill. Many leadership theories implicitly assume communication skills, without investigating them behaviorally. To be able to research leader communication as a building block of effective leader behavior, we propose a new concept, i.e., leader attentive communication which refers to "an open-minded, attentive demeanor while in a conversation with an employee". Instead of focusing on the content or form of the communication, we propose to study the communication skills of the leader from the viewpoint of the employee. In this article, we both validate a questionnaire and test LAC's influence on employee wellbeing in four different studies. We use information from 1,320 employees and their leaders, in 422 teams, in 3 different datasets. The result is a 10-item questionnaire with 2 dimensions consisting of general attention (towards the employee) and attention to non-verbal cues. We also find that LAC is associated with work engagement, psychological needs and Kahn's conditions for work engagement. With this questionnaire, we contribute to calls for a more behavioral, detailed view on leader communication behavior.
... In order to investigate whether there is support for aggregation, we performed an ANOVA on the differences between teams. We then calculated the rwG, or the observed variance in ratings compared to the variance of a theoretical distribution representing no agreement (Smith-Crowe, Burke, Cohen, & Doveh, 2014). In sample 1, the ANOVA indicated a significant difference between teams (F(148,165) = 1.34; p = 0.03) and intergroup agreement on LAC scores (rwG = .77). ...
Thesis
Full-text available
In a time characterized by growing uncertainty, e.g. because of the influence of the COVID-19 pandemic, effective leadership is more important than ever. In addition, employee well-being has been named one of the critical drivers of business success. In this dissertation, we therefore answer the following overarching question: Exactly how can leaders contribute to employee well-being? In order to answer this question, we execute several theoretical and empirical studies, and we also develop new ways of investigating leader (communication) behavior itself. In the first part of this dissertation, we look into the main ways in which positive leadership styles influence employee work engagement. In the first theoretical study, we argue why certain leader behaviors are shared across positive leadership styles, and we identify several theory-driven processes and pathways through which leaders can influence employee work engagement. In the second study, a moderated meta-analysis, we investigate the meta-correlation of positive leadership styles and work engagement, as well as provide an empirically-driven overview of categories of mediating and moderating mechanisms, to end up with an overarching research model. In the second part of this dissertation, we look into the role of leaders’ own well-being, for both their own leadership as well as for employee well-being. In the first study, we test a moderated mediation and find that 1) mindfulness is an antecedent of positive leadership (here: transformational leadership), 2) leaders’ psychological need satisfaction mediates the relationship between mindfulness and transformational leadership and 3) neuroticism moderates the relationship between mindfulness and relatedness need satisfaction. In the second study, with multilevel and multisource data, we investigate the trickle-down effect of leaders’ psychological need satisfaction. 
We find that psychological need satisfaction indeed trickles down to employees, mediated by (employee-rated) levels of LMX. We also find a direct positive association between leader competence and employee competence, as well as a negative one between leader autonomy and employee competence. In the last part of this dissertation, we look into how we can improve leader communication to increase employee well-being. In the first study we develop a new construct and validate a new 10-item questionnaire for leader attentive communication (LAC), i.e. an open-minded, attentive demeanor while in a conversation with an employee. We also find that psychological need satisfaction and Kahn’s conditions for engagement mediate the relationship between LAC and work engagement. In the second study, we devise and test a two-day training protocol to improve leader communication. Despite an interference by the pandemic in the data-collection, we find small increases in employee-rated outcomes after the training. We also find that employee-rated LAC is related to employee well-being, and that this is mediated by both psychological need satisfaction and Kahn’s conditions for engagement.
... The ICC was considered good if it was 0.75 and excellent if it was 0.85. 27 Correlation between MRI parameters and patient-related factors (age, sex, and body mass index [BMI]) and clinical outcomes (assessed using IKDC, Lysholm, and Tegner questionnaires) was calculated using Pearson correlation. Binominal logistic regression was performed with graft rupture as the dependent variable, with P values, odds ratios, and 95% CIs for odds ratio reported. ...
Article
Background: There is currently no analysis of 1-year postoperative magnetic resonance imaging (MRI) that reproducibly evaluates the graft of a hamstring autograft anterior cruciate ligament reconstruction (ACLR) and helps to identify who is at a higher risk of graft rupture upon return to pivoting sports. Purpose: To ascertain whether a novel MRI analysis of ACLR at 1 year postoperatively can be used to predict graft rupture, sporting level, and clinical outcome at a 1-year and minimum 2-year follow-up. Study Design: Case-control study; Level of evidence, 3. Methods: Graft healing and integration after hamstring autograft ACLR were evaluated using the MRI signal intensity ratio at multiple areas using oblique reconstructions both parallel and perpendicular to the graft and tunnel apertures. Clinical outcomes were assessment of side-to-side laxity and International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, Lysholm, and Tegner activity level scores at 1 year. Repeat outcome measures and detection of graft rupture were evaluated at a minimum of 2 years. Results: A total of 250 patients (42.4% female) underwent MRI analysis at 1 year, and assessment of 211 patients between 1 year and the final follow-up (range, 24-36 months) detected 9 graft ruptures (4.3%; 5 in female patients). A significant predictor for graft rupture was a high signal parallel to the proximal intra-articular graft and perpendicular to the femoral tunnel aperture (P = .032 and P = .049, respectively), with each proximal graft signal intensity ratio (SIR) increase by 1 corresponding to a 40% increased risk of graft rupture. A cutoff SIR of 4 had a sensitivity and specificity of 66% and 77%, respectively, in the proximal graft and 88% and 60% in the femoral aperture. In all patients, graft signal adjacent to and within the tibial tunnel aperture, and in the mid intra-articular portion, was significantly lower than that for the femoral aperture (P < .001). 
A significant correlation was seen between the appearance of higher graft signal on MRI and those patients achieving top sporting levels by 1 year. Conclusion: ACLR graft rupture after 1 year is associated with MRI appearances of high graft signal adjacent to and within the femoral tunnel aperture. Patients with aspirations of quickly returning to a high sporting level may benefit from MRI analysis of graft signal. Graft signal was highest at the femoral tunnel aperture, adding further radiographic evidence that the rate-limiting step to graft healing occurs proximally.
... Stated differently, when individual-level scores are aggregated to the group-level of analysis, sufficient within-group agreement is required to justify the aggregation of individual-level scores to the group-level of analysis (see Kozlowski & Chao, 2018; Kozlowski & Klein, 2000; Manata, Miller, DeAngelis, & Paik, 2016). In addition to providing standard descriptive and reliability statistics in Table 3, median ( ) values are also provided as a measure of within-group homogeneity (James, Demaree, & Wolf, 1984; Lebreton, James, & Lindell, 2005; Smith-Crowe, Burke, Cohen, & Doveh, 2014). ...
Article
Recently, architecture, engineering, and construction (AEC) project teams have begun to adopt the integrated project delivery (IPD) method with increased frequency. This adoption has been made on the assumption that IPD will lead to beneficial project team dynamics and outcomes. Although novel and potentially useful, we contend that the mere implementation of the IPD method does not in and of itself guarantee successful project team dynamics and outcomes. In specific, in this manuscript we argue that commitment disparities between project team members are problematic for numerous project team dynamics and outcomes: goal alignment, communication behaviors, and decision-quality. Using data from 21 IPD project teams, we show this to be the case. Results from this investigation suggest that when members’ commitment levels vary within project teams, goal misalignment, poor communication behaviors, and reduced decision quality are expected to follow. Moreover, results suggest that this is especially likely to occur when members’ commitment levels are low as opposed to high. This work contributes substantially to our understanding of project team dynamics in the AEC industry, especially as they relate to those delivered under contractually followed IPD. Although the IPD method holds the potential to be beneficial, these results suggest that such benefits accrue only if team members’ commitment levels are both high and uniform. In the absence of such conditions, problematic team-level dynamics and outcomes are expected to ensue.
... To put it another way, better-than-average effects (Larwood & Whittaker, 1977), comparative-optimism effects (Edwards et al., 2006), positive illusions superiority bias, leniency error (Smith-Crowe et al., 2014), and sense of relative superiority may cause people to cling to overly positive beliefs about themselves, illusions of control, and beliefs that could lead to a lower rival estimation. However, some psychologists have observed that uncertain situations of importance are threatening, and if the appropriate response is immediately available, distress or guilt may occur, which is inconsistent with holding a positive belief about the self in competition. ...
Article
Full-text available
We studied the effect of two inconsistent emotions, fear and hope, in strategic decision-making during a competition. We sought to examine which emotion will be more related to whether decision makers accurately and objectively estimate their rival. We developed a nuanced perspective on the effects of trait anxiety on rival estimation by integrating it with the competition shadow. Using a competition simulation and based on data from 221 individuals across two countries, we found support for a predicted effect of trait anxiety on rival estimation. Several theoretical implications are discussed.
... Nevertheless, the country-level stereotypes did provide empirical support for some of our hypotheses thus demonstrating their relevance and power within the workplace. Lastly, the SCM variables from the ESS show significant skewness leading to skewness-adjusted interrater reliability (rWG) estimates (Smith-Crowe et al., 2014) to fall below an acceptable threshold for some country samples, causing potential concerns regarding aggregation of these SCM variables to their country levels. ...
... Providing support for this assumption requires that researchers establish substantial within-group agreement, so that individual-level scores can be aggregated to the group-level of analysis (see Kozlowski & Chao, 2018; Kozlowski & Klein, 2000; Manata et al., 2016). Consequently, in addition to providing standard descriptive statistics, R wg(j) values are also provided as a measure of within-group homogeneity (for aggregation procedures, see James, Demaree, & Wolf, 1984; LeBreton, James, & Lindell, 2005; Smith-Crowe, Burke, Cohen, & Doveh, 2014). In the main, the R wg(j) index was used to establish within-group agreement, given its widespread use in the social sciences (Woehr, Loignon, & Schmidt, 2015; for alternative procedures used to establish within-group agreement, however, see Burke, Cohen, Doveh, & Smith-Crowe, 2018; Kashy & Kenny, 2000; McGraw & Wong, 1996; Schmidt & Hunter, 1989; see LeBreton & Senter, 2008, for an overview). ...
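For reference, the multi-item index named in this excerpt is commonly computed from the mean item variance relative to a null variance. The sketch below assumes a uniform null, (A² − 1)/12 for A response options, and sample (n − 1) variances; the function name and data are hypothetical and are not taken from the cited study.

```python
from statistics import mean, variance


def rwg_j(item_ratings, n_options):
    """Multi-item r_WG(J) under a uniform null distribution.

    item_ratings: one list of rater scores per item (J items).
    n_options: number of response options A on the scale.
    """
    j = len(item_ratings)
    null_var = (n_options ** 2 - 1) / 12           # uniform null variance
    ratio = mean(variance(r) for r in item_ratings) / null_var
    return j * (1 - ratio) / (j * (1 - ratio) + ratio)


if __name__ == "__main__":
    team = [[4, 4, 5, 3, 4], [4, 4, 5, 3, 4]]  # two items, five raters
    print(rwg_j(team, n_options=5))
```

When the mean item variance exceeds the null variance, the formula can return values outside 0 to 1; how to treat those cases is left to the analyst and is not handled here.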
Article
This study investigates the effects of team density, normative performance standards, and team accountability on team member performance. Analyses revealed that the effect of normative standards on member performance was positive, whereas the effect of density was negative. Additionally, accountability had a small, insubstantial effect on member performance. Interaction analyses further suggested that dense teams hold the ability to constrain members to either high or low performance standards. Specifically, members received the best performance evaluations when placed on dense teams with high standards of performance. Team density and team accountability also interacted to impact performance evaluations, but in the opposite direction of what was predicted. Specifically, analyses suggested that holding others accountable to performance standards was only beneficial in low-visibility situations. This research contributes to our understanding of how performance norms are communicated within organizational teams, as well as their nuanced effects on team members’ performance evaluations.
... Under the most commonly used rectangular distribution (Meyer, Mumford, Burrus, Campion, & James, 2014), the median r wg was 0.98 and the mean r wg was 0.95 (SD = 0.10). Additionally, we conducted a simulation-based statistical significance test to ensure that the senior managers exhibited greater-than-chance agreement in terms of HR evaluations (Smith-Crowe, Burke, Cohen, & Doveh, 2014). The mean and median r wg values reached statistical significance, suggesting greater-than-chance within-group agreement for senior managers' evaluations. ...
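The general idea of a simulation-based significance test can be sketched as follows: draw many sets of random ratings from the null distribution, compute r_WG for each, and take an upper percentile as the critical value. This is a generic Monte Carlo illustration under a uniform null and is not the cited authors' procedure; the function name, defaults, and seed are assumptions.

```python
import random
from statistics import variance


def simulated_critical_rwg(n_raters, n_options, alpha=0.05, n_sims=5000, seed=0):
    """Approximate the (1 - alpha) quantile of r_WG under a uniform null."""
    rng = random.Random(seed)                      # seeded for repeatability
    null_var = (n_options ** 2 - 1) / 12
    simulated = []
    for _ in range(n_sims):
        ratings = [rng.randint(1, n_options) for _ in range(n_raters)]
        simulated.append(1 - variance(ratings) / null_var)
    simulated.sort()
    index = min(n_sims - 1, int((1 - alpha) * n_sims))
    return simulated[index]


if __name__ == "__main__":
    critical = simulated_critical_rwg(n_raters=10, n_options=5)
    observed = 0.95  # mean r_wg reported in the excerpt
    print(critical, observed > critical)
```

An observed value above the simulated critical value would suggest agreement greater than expected from random responding under the chosen null; a skewed null would require sampling with unequal response probabilities instead.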
Article
Full-text available
This study adopts an identity perspective to explore the relationship between human resource (HR) practices and turnover intentions among migrant workers. Informed by HR attribution theory, we propose that the effects of HR practices will be more effective in reducing turnover among migrant workers when these workers have stronger post‐migration place identities and when they experience a sense of justice regarding their work and nonwork environments. Using a three‐way interaction model, we tested these ideas on a sample composed of 1,985 migrant workers in 141 firms in China. The results support the theoretical model.
... We estimated the r WG index of inter-rater agreement and intraclass correlation coefficient (ICC). The mean score of r WG across participants was 0.96 which is greater than the suggested cutoff of 0.86 (Smith-Crowe et al., 2014). ICC score was 0.67 which is slightly lower than the acceptable level of 0.70 (Klein et al., 2001). ...
Article
Full-text available
Purpose This study aims to examine empirically the effect of emotional intelligence of the team, as calculated by the average of all team members’ individual emotional intelligence measurements, on the cohesiveness of the team, and the effect of the perception of self-efficacy of the team members on the relationship between emotional intelligence and team cohesion. Finally, certain financial indicators were analyzed to evaluate team performance. Design/methodology/approach This study used quasi-experimental design. A total of 146 students (35 teams), all senior business majors at a mid-sized university in the USA, participated in the experiment. In the experiment, the participants played a business simulation game over an eight-year simulated time frame. After the final round of the simulation game, the variables of emotional intelligence, self-efficacy and team cohesion were measured using the survey questionnaire and team performance and participation data were collected from the business simulation game. In support of the quantitative data analysis, the current study also collected and analyzed qualitative data comments on other group members’ contribution to the group task. Findings Results indicated that team cohesion was highest when team members demonstrated greater emotional intelligence. Self-efficacy also had a positive influence on team cohesion. High self-efficacy was found to be an important mediator of the relationship between emotional intelligence and team cohesion. High emotional intelligence promoted the development of self-efficacy, resulting in increased team cohesion. Increased team cohesion resulted in improved team performance and participation. Research limitations/implications The current study has several limitations. First, the sample is mostly business major students at a mid-sized university in the USA. There is a limitation in generalizing the findings to other populations.
Second, this study accessed information on 35 teams comprising a total of 146 students. While the number of students and teams is sufficient for a study, more data would improve the robustness of the results. Third, this study collected and analyzed cross-sectional data, so there is the possibility for the reversed causal relationship in the findings. Although the authors concluded that team cohesion had a positive impact on team performance and participation, they also found the reverse relationship from the additional analysis. Fourth, the validity of the construct for emotional intelligence has some detractors, mainly because of the subjective nature of the measurement that tends to overlap existing personality measures and the objective measurement which involves a consensual scoring method with poor reliability. Practical implications This paper implies practical strategies to manage teams and team members for enhanced team productivity. Teams are critical resources within companies. This study demonstrates that high team cohesion leads to better team performance. As team cohesion is important for team performance, the authors found that two antecedents for team cohesion are emotional intelligence and self-efficacy within team members. Therefore, it is important for managers to hire and select team members with high levels of emotional intelligence and self-efficacy. Managers can train employees to internalize increased levels of these traits. Originality/value The current study demonstrated that self-efficacy mediated emotional intelligence and team cohesion during a research project lasting one semester. There have been few studies examining the mediating effect of self-efficacy on the relationship between emotional intelligence and team cohesion. In particular, unlike many other studies that use short-term laboratory experiments, the duration of this study could provide enough time to more thoroughly develop cohesion among members. 
The current study collected both quantitative and qualitative data. In addition to the quantitative data analysis, the analysis of qualitative data reinforced the findings of the quantitative data analysis.
... The Rwg (j) was in the range 0.75-0.85, indicating a high level of agreement and reliability (Smith-Crowe et al., 2014). ...
Article
This study utilized social consistency and social exchange theories to propose leadership motivation and self-concept variables as possible antecedents of servant leadership (SL). This is a departure from two past studies which established the leader’s behaviour, experience and personality as antecedents of SL. The study was based on cross-sectional survey methodology, and data acquired through multi-source to minimize common method variance. Data analysis was carried out using statistical package of social sciences, and the analyses of moments of structure software. Participants were managers and their subordinates from six organizations located in Lagos, Nigeria. Key findings of the study are self-efficacy (SE) is a critical variable because of its effect on SL and other antecedents; motivation-to-serve (MTS) is an antecedent and the primary motive for enacting SL behaviour; only one dimension of motivation-to-lead (MTL), non-calculative, is an antecedent of SL; and leader-member exchange, organizational citizenship behaviour and job satisfaction are either direct or indirect outcomes of SL. The tested model explained more variance in the outcomes of SL. Managerial implications include the use of SE, MTS and MTL as selection tools for managers, assigning future leaders as mentees to identified servant leaders through formal mentoring process established by the organization.
... We first calculated r wg(j) , which is the measure of within-group agreement, for each of the nine countries. Using the guidelines of Smith-Crowe, Burke, Cohen, and Doveh (2014), the r wg(j) values for the measures of SNBI scale were significant (P < 0.05) across all nine countries. We then conducted an analysis of variance (ANOVA) test to assess within-group and between-group variances for the remaining nine countries. ...
Article
There is a substantial body of literature on behavior inappropriateness in face-to-face social settings; however, not much is known about what individuals consider inappropriate (or appropriate) on Internet-mediated social networks. Although online social networks enable the exchange of ideas between and among geographically and culturally diverse individuals, cultural differences across countries will likely affect individuals' perceived appropriateness of social network behaviors. To better understand this phenomenon, this study proposes a new construct of social network behavior inappropriateness (SNBI) and tests its relationship with a recently proposed national cultural dimension of personal-sexual attitudes, which captures country-level cultural norms.
... Such a distribution is theoretically plausible in these contexts given the possibility that respondents preferred not to rate their leaders' or coworkers' commitment to safety poorly. Median r wg [j] values across workgroups were statistically significant whether based on critical values (see Smith-Crowe, Burke, Cohen, & Doveh, 2014) for uniform distributions (r wg [j] ≥ .78) or moderately skewed distributions (r wg [j] ≥ .87). ...
Article
Although safety climate research has increased in recent years, persisting conceptual ambiguity not only raises questions about what safety climate really is—as operationalized in the literature— but also inhibits increased scientific understanding of the construct. Consequently, using climate theory and research as a conceptual basis, we inductively articulated safety climate’s general content domain by identifying seven core indicators of safety’s perceived workplace priority: leader safety commitment, safety communication, safety training, coworker safety practices, safety equipment and housekeeping, safety involvement, and safety rewards. These indicators formed the basis for a generalized safety climate measure that we designed for use across organizations, industries, and construct levels. We then conducted a multilevel construct validation of safety climate using the newly created measure in two separate studies. Results from five samples spanning multiple organizations, industries, and cultural settings revealed that the identified safety climate indicators were parsimoniously explained by an overarching safety climate factor at the individual and workgroup levels. In addition, multilevel homology tests indicated that safety climate’s associations with past safety incidents were nearly two times stronger at the workgroup level relative to the individual level, although this difference was not statistically significant. Finally, workgroup-level validity evidence demonstrated expected associations between safety climate and organization-reported pre- and postsurvey safety incidents. On the basis of this supportive evidence, we recommend that this conceptualization and measure of safety climate be adopted in research and practice to facilitate future scientific progress.
... r WG measures how the observed variance in ratings compares to the variance of a theoretical distribution representing no agreement (i.e., the null distribution). When factors such as social desirability or leniency affect the ratings ( James et al., 1984;LeBreton & Senter, 2007;Smith-Crowe et al., 2014), they can lead to a restricted range of responses ( Klein et al., 2001). In these circumstances, Smith-Crowe et al. ...
Article
Prior research has mainly focused on the interpretation of self-observer rating discrepancies while overlooking the adequacy of certain external raters to assess particular Emotional Intelligence (EI) competencies. Indeed, the literature has generally assumed that external raters are equally equipped and accurate sources of feedback information. We challenge this assumption. By analyzing data from 555 business executives enrolled in an international MBA program and a total of 8,309 external raters, we compare the different raters’ perspectives on a 360° assessment of EI competencies. Specifically, we analyze the ability of six types of raters within the professional (supervisors, peers and subordinates) and personal spheres (partners, relatives and friends) to assess an individual’s EI competencies. Thus, the purpose of this article is twofold: (1) it evaluates the quality of a behavioral instrument measuring EI competencies by verifying the constructs’ validity and reliability and testing the overall model fit using structural equation modeling (SEM) techniques, and (2) it performs a comparative analysis of the different perspectives that various types of observers have when appraising an individual’s competencies. The main research question asks whether and why specific types of raters are better equipped than others to assess specific EI competencies. We question whether observers who are more exposed to certain types of behaviors can provide better judgments of certain competencies than others. We also ask about gender differences: are there competencies in which women are perceived as more apt than men, and vice versa? Findings provide some methodological insights into the multisource assessment process by identifying and explaining the differences between different sources’ ratings, and advance our understanding of how certain types of observers assess certain EI competencies. This study draws implications for the use of multisource feedback evaluations for individual development in competency learning programs in educational and organizational environments.
... Extending the results reported by Klein and colleagues (2001), Whitman et al.'s (2012) meta-analysis on work unit justice climate showed that the climate-effectiveness relationship was stronger when the referent of climate items was the work unit rather than the individual. Finally, statistical significance tables for rWG and the average deviation index (AD) were developed (Dunlap, Burke, & Smith-Crowe, 2003; Smith-Crowe, Burke, Cohen, & Doveh, 2014). After this impressive work, climate (and other "higher-levels") researchers could hardly base their decisions about within-unit agreement on popular rules-of-thumb. ...
Article
We review the literature on organizational climate and culture paying specific attention to articles published in the Journal of Applied Psychology (JAP) since its first volume in 1917. The article traces the history of the 2 constructs though JAP has been far more important for climate than culture research. We distinguish 4 main periods: the pre-1971 era, with pioneering work on exploring conceptualization and operationalizations of the climate construct; the 1971–1985 era, with foundational work on aggregation issues, outcome-focused climates (on safety and service) and early writings on culture; the 1986–1999 era, characterized by solidification of a focused climate approach to understanding organizational processes (justice, discrimination) and outcomes (safety, service) and the beginnings of survey approaches to culture; and the 2000–2014 era, characterized by multilevel work on climate, climate strength, demonstrated validity for a climate approach to outcomes and processes, and the relationship between leadership and climate and culture. We summarize and comment on the major theory and research achievements in each period, showing trends observed in the literature and how JAP has contributed greatly to moving research on these constructs, especially climate, forward. We also recommend directions for future research given the current state of knowledge.
... Reflection was measured using the reduced sixteen-item REMINT. Calculated r*wg(j) values (Lindell, Brandt, & Whitney, 1999) were above the required threshold (Smith-Crowe, Burke, Cohen, & Doveh, 2014), and thus responses were aggregated to the team level. ...
Article
A growing number of studies have investigated the role of team reflexivity, the extent to which teams reflect on and adapt their functioning. However, the way team reflexivity has been conceptualized and operationalized reveals several weaknesses, in particular the conception as a unidimensional construct. To provide greater conceptual clarity, we therefore propose a team reflexivity framework that integrates four interacting but distinct reflexive processes. In four studies, we focus on reflection as a fundamental reflexive process, and develop and validate an extended multidimensional reflection measure that captures the relevant dimensions of quality and quantity of reflection and the key transition processes of information seeking and information evaluation. Moreover, in order to delineate two common composition methods, we develop and validate a direct consensus and a referent-shift consensus version of the reflection measure. Data collected from a total of 803 students and employees in four studies revealed excellent construct validity, as well as good nomological validity (Studies 1 and 2). Furthermore, we found evidence of the criterion-related validity at the team level (Study 3) and the individual level (Study 4). Together, the results demonstrate the effectiveness of our measure, revealing consistent relations with outcome measures and diverse behavioural indicators across different contexts.
... The rwg(j) value indicated substantial agreement (mean rwg(j) = 0.72; median = .67) and it is statistically significant based on the critical values reported by Smith-Crowe, Burke, Cohen, and Doveh (2014). Together with the theoretical foundation, this information supports using the mean of all respondents from each group to index PDC. ...
Article
Although authoritarian leadership is viewed pejoratively in the literature, in general it is not strongly related to important follower outcomes. We argue that relationships between authoritarian leadership and individual employee outcomes are mediated by perceived insider status, yet in different ways depending on work unit power distance climate and individual role breadth self-efficacy. Results from technology company employees in China largely supported our hypothesized model. We observed negative indirect effects of authoritarian leadership on job performance, affective organizational commitment, and intention to stay among employees in units with relatively low endorsement of power distance, whereas the indirect relationships were not significant among employees in relatively high power distance units. These conditional indirect effects of authoritarian leadership on performance and intention to stay were significantly stronger among employees with relatively high role breadth self-efficacy. We discuss how the model and findings promote understanding of how, and under what circumstances, authoritarian leadership may influence followers' performance and psychological connections to their organizations.
Article
This scoping review offers a comprehensive synthesis of the literature on team PsyCap. Based on a sample of 31 studies, our review indicates that (1) researchers have been somewhat inconsistent in how they operationalize team PsyCap, (2) gaps still remain in the nomological network of team PsyCap, (3) previous studies have mostly relied on time-insensitive designs, and (4) there is a thin use of theory when it comes to the explanation of the emergence of team PsyCap. In response, we highlight how issues pertaining to composition models contribute to clarify the nature of the team PsyCap construct, we propose interesting avenues for future research, and we introduce a temporal and recurring phase model of the emergence of team PsyCap.
Article
The present research explores the role of lateral exchange relationships among peer leaders (i.e., leader–leader exchange relationships; PLLX) as drivers of ethical leadership. Across two studies involving male subordinates and leaders pertaining to 150 teams (Study 1) and 158 leader–follower dyads (Study 2) in the Republic of Korea Army, PLLX was found to mediate a positive relation between peer leaders' ethical leadership and focal team leaders' ethical leadership, while controlling for upper leaders' ethical behaviours. Moreover, this relation was moderated by focal leaders' organizational tenure but in opposite directions in the two studies: the relation between peer leaders' ethical leadership and focal leaders' ethical leadership through PLLX was stronger in Study 1 (vs. weaker in Study 2) when tenure was high.
Article
In this investigation, we tested hypotheses concerning how external validity, in relation to leadership and teamwork, was affected as participants moved from organizational to academic settings. Participants consisted of working business students (N = 159) from two countries, Peru and the United States, who adopted leader/teammate roles across settings. Results indicated that (a) transactional leadership and teamwork behavior demonstrated in organizational contexts were predictive of similar behavior in academic contexts, (b) the cultural setting of the study moderates the carry over effect of teamwork and leadership behavior from organizations to laboratories, and (c) for several leadership and teamwork behaviors, role identity and self-awareness incrementally added to the prediction of similar behaviors in academic contexts. We discuss the implications of our findings for enhancing the external validity of laboratory studies in applied psychology and for instruction of teamwork and leadership in academe.
Article
Introduction: Learner-centered authentic learning opportunities in health science disciplines can be provided using cases to allow integration of theoretical knowledge across multiple subject areas and development of problem-solving skills. We have previously described the adaptation of the case difficulty cube (CDC), a model from business education, that proposes assignment of case difficulty based on three dimensions (analytical, conceptual, and presentation) in pharmacy education. Methods: The CDC for use in health science disciplines (modCDC) was evaluated using 13 cases from summative undergraduate pharmacy examinations. Inter-rater agreement (IRA) and inter-rater reliability (IRR) for modCDC ratings were first determined, then a post hoc investigation of the relationship between the modCDC score and student marks was undertaken. Results: First, the IRA for each dimension of the modCDC was adequate for aggregating ratings. IRR was excellent for the conceptual axis, good for the presentation axis, and poor for the analytical axis. Second, analysis of the relationship between the modCDC score and student marks indicated that there was a significant difference between student marks awarded at each level of case difficulty, except for the lower levels of difficulty. The results indicate that the modCDC is a relatively robust tool that could be used to determine case difficulty prior to cases being used in assessments. Conclusions: The modCDC is a simple tool that can assist academic staff in providing consistent learning opportunities for, and assessment of, pharmacy students at an appropriate level.
Article
In this study we present a process model of team planning that distinguishes between four specific processes: exploration, strategic planning, detailed planning, and prognosis. From this model, we developed and validated a 16-item multidimensional long-form scale, a 4-item one-dimensional short-form scale, and a single-item scale. Results from three samples (total N = 536) with varying populations and settings provide support for the multidimensionality of the planning construct and the theorized structure of the scales and, also, demonstrate discriminant and convergent validity and predictive validity in terms of team performance.
Article
This paper explores the linguistic cues that distinguish conversations about work topics from conversations about non-work topics and how those differences affect conversation partners. Using an exploratory analysis of a field experiment in a large U.S. technology firm, we generate hypotheses that when the conversation topic is work, people use more words associated with achievement, which makes them seem less supportive and attentive to their conversation partners. Subsequently, conversation partners are less interested in future interactions. We then test and largely confirm our hypotheses by analyzing data from a laboratory experiment. This research illuminates one potential reason why some new connections persist while others do not and suggests how people might have interactions that endure beyond a first encounter.
Article
The purpose of the study was to describe the organizational culture and climate of public health care from the perspective of nursing staff. The participants were nursing staff, comprising registered nurses and practical nurses (n = 289), from 11 hospital districts. Organizational culture and climate were examined using the Organizational Social Context measure. The data were analyzed using T-scores, the rWG index, the intraclass correlation (ICC), eta squared, one-way analysis of variance (ANOVA), Welch's test, and multilevel analysis. Across employee groups, variation appeared in the rigidity and resistance dimensions of organizational culture and in all dimensions of organizational climate. Across organizations, views differed with respect to cultural rigidity and resistance. Examined by service area, the operative and psychiatric groups differed on all culture and climate dimensions except the proficiency dimension of organizational culture. The conservative groups were more uniform in their views. Age, gender, additional training, the number of nursing staff in the work unit, and service area were associated with ratings of organizational culture and climate. In nursing, employee group is a more meaningful factor than organization in explaining perceptions of culture and climate. It is important to recognize variation across groups, organizations, and service areas. Understanding the organizational social context can make it possible to support successful organizational development from the nursing perspective as well.
Article
Gossip is a behavior that has been traditionally viewed as harmful in organizations. However, a more balanced perspective has emerged in recent years that suggests gossip can have important benefits. We propose that one way to uncover potential benefits of gossip in teams is to focus on the valence (positive or negative nature) of the gossip. Drawing on expectancy theory, we propose team gossip indirectly influences team performance through social loafing because it plays a key role in shaping beliefs about effort in team contexts; effects determined by team gossip valence. We hypothesize that positive team gossip decreases social loafing, whereas negative team gossip increases it. In turn, we expect that through social loafing, positive team gossip has a positive indirect effect on team performance, whereas negative team gossip has a negative indirect effect. We test these predictions in a sample of 63 self‐managing teams. We find support for our predictions regarding positive team gossip but not regarding negative team gossip. Our findings point to the potential benefits of gossip and highlight why efforts to abolish gossip in organizations may impair team effort and performance.
Article
In this paper, we investigate the effects of multiple project team membership on individual and team learning. Data from 435 members of 85 project teams shows that, at the individual level, membership variety has a positive impact on individual learning. Moreover, this positive relationship is stronger for individuals with an average need for cognition, in comparison to individuals with a high or a low need for cognition. At the project team level, the simultaneous inter-organizational memberships of a project team have a positive impact on the team’s external learning. However, the simultaneous intra-organizational project team memberships negatively moderate this positive relationship. Furthermore, cross-level analyses show that individual learning has a positive impact on both internal and external team learning. Our findings are relevant for project management practice as they suggest ways in which work design can be configured to increase individual and team learning.
Article
The concept of climate strength – the extent of agreement among group members regarding climate perceptions – has evolved from a statistical criterion for aggregation to a focal management construct. We review 156 empirical team climate studies spanning the last decade, observing a widely held assumption that environmental stimuli influence climate strength. However, closer inspection suggests that this relationship is far more complex and nuanced than previously considered. This is problematic since an oversimplified view of how climate strength develops may lead to erroneous conclusions: for example, that everyone will share similar perceptions if exposed to the same stimuli. Our review: (1) distinguishes experiences from interpretations, explaining how some stimuli are experienced by all (some) yet are interpreted differently (the same); (2) distinguishes stimuli from the contexts in which they occur, explaining how contextual elements – specifically, the structural dimensions of teams – are not stimuli but rather act as lenses through which experiences and interpretations occur; and (3) develops a more complete theory of climate strength reflecting contemporary work practices – including informal structures and teams with more fluid boundaries – by explaining how these lenses simultaneously filter multiple stimuli in either complementary or competing ways. Keywords: Climate (Organization), Teams, Sensemaking, Groups, Social Networks; Climate Strength; Stimuli; Discretionary Stimuli; Ambient Stimuli
Article
In this investigation involving 227 lone professional truck drivers from a national transportation company in a low and moderate-income country (LMIC), Colombia, a multidimensional model of drivers’ safety performance and expectations concerning how safety performance dimensions would predict hard braking were evaluated. The results supported a multidimensional conceptualization of professional truck drivers’ safety performance, with factors aligned with confirmed general safety performance factors and occupation-specific factors. Furthermore, results supported the expectation that the dimensions associated with communicating safety information and complying with laws and regulations would predict hard braking over and above less conceptually relevant safety performance dimensions such as using personal protective equipment and preparing to drive, safety climate, and region of operation. Notably, a dimension of safety performance expected to promote workplace safety, the communication of health and safety information, was associated with increased hard braking. We discuss the implications of a multidimensional conceptualization and measure of safety performance for studying workplace safety for professional truck drivers in Colombian organizations and beyond.
Preprint
We advance research on legitimacy and institutional change by theorizing a multilevel model of how an individual legitimacy belief (propriety) is affected by an exogenous crisis event and how two collective constructs (validity and consensus) influence this effect. Whereas consensus and validity can overlap in that validity may reflect consensual approval, they are not the same, given that validity may hide underlying disagreement. Disentangling validity from consensus allows us to theorize that changes in evaluators' propriety beliefs are strongest in contexts characterized by high validity and low consensus, because, in such contexts, a crisis may reveal that validity is contested. Using a regression discontinuity design based on data from 6,198 evaluators across 16 countries, we examine the effect of the 2008 financial crisis on individuals' propriety beliefs in free markets. Our results provide empirical support for our theory and thereby shed light on the foundations of legitimacy formation as a multilevel process.
Article
Despite a continued interest in team climates (i.e., team members’ shared perceptions regarding policies, practices, and procedures within their organization), scholars lament a lack of theory regarding their formation and maintenance. Multilevel theory serves as the underlying foundation for most team climate constructs: team members are exposed to similar “ambient stimuli” – including team member and supervisor interactions – allowing them to form shared perceptions. Advancing this perspective and integrating the literature on teams, we argue that shared perceptions among team members will be more or less likely depending on the degree to which teams exhibit skill differentiation, temporal stability, and supervisor prominence. We also explain how the increasing prevalence of (a) social networks within organizations, and (b) influences from employees’ lives outside of work can reduce shared perceptions within any type of team. Overall, we add to theory on team climates by highlighting the various ways in which “discretionary stimuli” uniquely experienced by individual team members can reduce the degree to which perceptions are shared. Our work is also of practical importance since positive team climates are associated with a variety of positive outcomes for both teams and individuals.
Conference Paper
Team size, heterogeneity, and an aggregate measure of teamwork quality predicted the effectiveness of organizational problem solving teams in generating ideas and obtaining the acceptance of management for these ideas. The results of regression analyses revealed that large teams generated more total and implemented ideas than smaller teams. In addition to more total and implemented ideas, teams with higher functional heterogeneity and teamwork quality generated more total and implemented ideas per member. Team size also moderated the effects of self-reported teamwork quality such that larger teams showed a stronger positive relation of teamwork quality with total and implemented ideas than smaller teams. Management evaluations of the teams were unrelated to size, functional heterogeneity, and teamwork quality. The findings support the treatment of team size as an important predictor of effectiveness rather than relegating it to the status of a mere control variable. Also, the results support previous observations that subjective judgments of team effectiveness are not equivalent to objective measures and that researchers should use multiple criteria of team success. Finally, rather than relying on concurrent, cross-sectional designs, research is needed that uses predictive models to assess how well team characteristics forecast effectiveness.
Article
Research on psychological safety climate has primarily focused on its salutary effects on group risk-taking behaviors. We developed a group-level dual-pathway model in which psychological safety climate also exerts a simultaneous negative effect on risk-taking behaviors by diminishing group average work motivation. In a field survey, we found that psychological safety climate was positively related to group learning behavior and voice through a reduction in group average fear of failure but negatively related to them through a reduction in group average work motivation. This dual-pathway model and its mechanisms were conceptually replicated in a laboratory experiment with group creativity as a different risk-taking behavior. In this experiment, we examined the moderating effects of group individualism/collectivism and found that psychological safety climate increased the originality and flexibility dimensions of group creativity through a reduction in group average fear of failure only in groups with a collectivistic orientation and reduced the fluency dimension of and time spent on creativity through a reduction in group average work motivation only in individualistic groups.
Article
Researchers assessing interrater agreement for ratings of a single target have increasingly used the rWG(j) index, but have found it can display irregular behavior. Mathematical analyses show this problem arises from the use of random response, operationalized by the variance of a uniform distribution (s²EU), for the baseline of comparison. These analyses suggest that researchers should continue to use rWG(j) as a summary measure of interrater agreement, but should use maximum dissensus as a reference distribution for computing rWG(j). Although values of s²EU can be descriptively misleading, they provide an important inferential baseline. Thus, s²EU should be used in computing χ² tests of the departure of the observed response variance from random responding. Researchers should also examine interrater agreement as a theoretical variable in its own right, investigating the causes and consequences of rater dissensus.
Article
The use of interrater reliability (IRR) and interrater agreement (IRA) indices has increased dramatically during the past 20 years. This popularity is, at least in part, because of the increased role of multilevel modeling techniques (e.g., hierarchical linear modeling and multilevel structural equation modeling) in organizational research. IRR and IRA indices are often used to justify aggregating lower-level data used in composition models. The purpose of the current article is to expose researchers to the various issues surrounding the use of IRR and IRA indices often used in conjunction with multilevel models. To achieve this goal, the authors adopt a question-and-answer format and provide a tutorial in the appendices illustrating how these indices may be computed using the SPSS software.
Article
This study examines the role of customer retention as a mediator in the service climate–firm performance chain. Using a predictive design that involves data collected from 1,500 automotive service stores from 12,518 employees and approximately 30,000 customers, a model linking service climate (a concern for employees and customers), customer satisfaction, customer retention, and firm performance was tested. Notably, the results support the overall model and the hypothesized mediating effect of customer retention regarding the relationship between customer satisfaction and firm performance. © 2011 Wiley Periodicals, Inc.
Article
Despite persistent concerns as to the quality of performance information obtained from multisource performance ratings (MSPRs), little research has sought ways to improve the psychometric properties of MSPRs. Borrowing from past methodologies designed to improve performance ratings, we present a new method of presenting items in MSPRs, frame‐of‐reference scales (FORS), and test the efficacy of this method in a field and lab study. The field study used confirmatory factor analysis to compare the FORS to traditional rating scales and revealed that FORS are associated with increased variance due to dimensions, decreased overlap among dimensions, and decreased error. The laboratory study compared rating accuracy associated with FORS relative to frame‐of‐reference training (FORT) and a control group and demonstrated that FORS are associated with higher levels of accuracy than the control group and similar levels of accuracy as FORT. Implications for the design and implementation of FORS are discussed.
Article
Currently, guidelines do not exist for applying interrater agreement indices to the vast majority of methodological and theoretical problems that organizational and applied psychology researchers encounter. For a variety of methodological problems, we present critical values for interpreting the practical significance of observed average deviation (AD) values relative to either single items or scales. For a variety of theoretical problems, we present null ranges for AD values, relative to either single items or scales, to be used for determining whether an observed distribution of responses within a group is consistent with a theoretically specified distribution of responses. Our discussion focuses on important ways to extend the usage of interrater agreement indices beyond problems relating to the aggregation of individual level data.
Article
Research on organizational diversity, heterogeneity, and related concepts has proliferated in the past decade, but few consistent findings have emerged. We argue that the construct of diversity requires closer examination. We describe three distinctive types of diversity: separation, variety, and disparity. Failure to recognize the meaning, maximum shape, and assumptions underlying each type has held back theory development and yielded ambiguous research conclusions. We present guidelines for conceptualization, measurement, and theory testing, highlighting the special case of demographic diversity.
Article
The most popular index of agreement has been rWG(J); more recently, the ADM(J) index also has been used. This study addresses two problems: first, how to test the statistical significance of rWG(J) and ADM(J) and, second, how to infer from the indices that were evaluated for each group about the agreement of the ensemble of groups. The authors extend the inference based on either rWG(J) or ADM(J) by focusing on multiple-item scales and on the whole ensemble of groups. Their method is based on simulations, as was done by Dunlap, Burke, and Smith-Crowe (2003) and by Cohen, Doveh, and Eick (2001). The tests are illustrated on the data of Bliese, Halverson, and Schriesheim (2002) pertaining to a sample of 2,042 U.S. Army soldiers in 49 U.S. Army companies. Software for the authors' procedures is available both as SAS code and in the Multilevel Modeling in R package (Bliese, 2006).
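The general idea of a simulation-based significance test can be sketched as follows. This is not the authors' SAS or R procedure. It is a simplified single-item illustration that assumes a uniform null distribution (every response option equally likely) and estimates the upper critical value of rWG by Monte Carlo. The function names, the number of simulations, and the seed are all hypothetical choices.

```python
import random
from statistics import variance


def r_wg(ratings, num_options: int) -> float:
    """Single-item r_WG against the uniform null variance (A^2 - 1) / 12."""
    null_var = (num_options ** 2 - 1) / 12.0
    return 1.0 - variance(ratings) / null_var


def critical_r_wg(group_size: int, num_options: int, alpha: float = 0.05,
                  n_sims: int = 20000, seed: int = 0) -> float:
    """Estimate the (1 - alpha) quantile of r_WG when raters respond at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sims = sorted(
        r_wg([rng.randint(1, num_options) for _ in range(group_size)], num_options)
        for _ in range(n_sims)
    )
    index = min(int((1 - alpha) * n_sims), n_sims - 1)
    return sims[index]


# An observed r_WG above this value would be unusual under random responding.
print(critical_r_wg(group_size=10, num_options=5))
```

Smaller groups produce more variable indices under random responding, so the estimated critical value rises as group size falls. This is one reason a single fixed cutoff can be misleading.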
Article
The authors present guidelines for establishing a useful range for interrater agreement and a cutoff for acceptable interrater agreement when using Burke, Finkelstein, and Dusig’s average deviation (AD) index as well as critical values for tests of statistical significance with the AD index. Under the assumption that judges respond randomly to an item or set of items in a measure, the authors show that a criterion for acceptable interrater agreement or practical significance when using the AD index can be approximated as c/6, where c is the number of response options for a Likert-type item. The resulting values of 0.8, 1.2, 1.5, and 1.8 are discussed as standards for acceptable interrater agreement when using the AD index with 5-, 7-, 9-, and 11-point items, respectively. Using similar logic, the AD agreement index and interpretive standard are generalized to the case of a response scale that involves percentages or proportions, rather than discrete categories, or at the other extreme, the assessment of interrater agreement with respect to the rating of a single target on a dichotomous item (e.g., yes-no, agree-disagree, true-false item formats). Finally, the usefulness of these guidelines for judging acceptable levels of interrater agreement with respect to the metric (or units) of the original response scale is discussed.
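The c/6 rule of thumb described above can be checked with a short sketch. It computes the average deviation from the mean for one item and compares it with the approximate cutoff c/6, where c is the number of response options. The function names and example ratings are illustrative assumptions, not taken from the cited article.

```python
from statistics import mean


def average_deviation(ratings) -> float:
    """Average absolute deviation of the ratings from their mean (AD_M)."""
    center = mean(ratings)
    return mean(abs(x - center) for x in ratings)


def ad_cutoff(num_options: int) -> float:
    """Approximate upper limit for acceptable agreement: c / 6."""
    return num_options / 6.0


# Five raters judging one target on a 5-point scale (made-up ratings).
ratings = [4, 4, 5, 4, 3]
ad = average_deviation(ratings)
print(ad, ad <= ad_cutoff(5))

# Cutoffs for 5-, 7-, 9-, and 11-point items, rounded to one decimal place.
print([round(ad_cutoff(c), 1) for c in (5, 7, 9, 11)])
```

Rounded to one decimal place, the cutoffs reproduce the 0.8, 1.2, 1.5, and 1.8 values quoted in the abstract. Unlike rWG, lower AD values indicate stronger agreement, and the index is expressed in the units of the original response scale.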
Article
In this investigation, the authors report the results of two studies designed to investigate the efficacy of two proposed indices of interrater agreement based on average deviations from the mean and from the median (ADM and ADMd, respectively). Using survey response data collected from 6,549 sales employees in 119 stores of a national retail company, Study 1 compared the results of six interrater agreement indices across four types of Likert-type response scales (i.e., 5-, 6-, 7-, and 11-point scales). The results indicated that the AD indices were highly correlated with an index of proportional agreement and with within-group interrater agreement indices. Study 2, based on survey data collected from 4,158 sales employees in 109 other stores of this company, constructively replicated Study 1 and examined the consistency of interrater agreement decisions across six indices with respect to a priori decision rules. Study 2 results also supported the use of AD indices. Practical issues concerning the use of AD indices for estimating interrater agreement and future research directions are discussed.
Article
Using data from 68 organizations embedded within 14 nations, we examined hypotheses concerning the moderating roles of national culture and organizational climate on the transfer of training to the work context. A dimension of national culture, uncertainty avoidance, moderated the transfer of safety training with regard to reducing accidents and injuries; and organizational safety climate moderated the transfer of safety training with respect to both engaging in safe work behaviour and reducing accidents and injuries. Along with discussing the implications of a positive safety climate, we discuss how the tendency within a culture to avoid uncertainty may paradoxically lead to greater uncertainty and negative consequences in relation to the transfer of safety training.
Article
Researchers assessing interrater agreement for ratings of a single target have increasingly used the rWG(j) index, but have found it can display irregular behavior. Mathematical analyses show this problem arises from the use of random response, operationalized by the variance of a uniform distribution (σ²EU), for the baseline of comparison. These analyses suggest that researchers should continue to use rWG(j) as a summary measure of interrater agreement, but should use maximum dissensus as a reference distribution for computing rWG(j). Although values of σ²EU can be descriptively misleading, they provide an important inferential baseline. Thus, σ²EU should be used in computing χ² tests of the departure of the observed response variance from random responding. Researchers should also examine interrater agreement as a theoretical variable in its own right, investigating the causes and consequences of rater dissensus.
Article
This article presents a clarification of Burke and Dunlap’s (2002) instructions for interpreting the statistical significance of observed average deviation (AD) interrater agreement values. An explanation is offered for how to use Burke and Dunlap’s critical AD values to determine statistical significance of observed AD values. In addition, Burke and Dunlap’s cutoffs for practical significance of observed AD values are discussed.
Article
The use of interrater reliability (IRR) and interrater agreement (IRA) indices has increased dramatically during the past 20 years. This popularity is, at least in part, because of the increased role of multilevel modeling techniques (e.g., hierarchical linear modeling and multilevel structural equation modeling) in organizational research. IRR and IRA indices are often used to justify aggregating lower-level data used in composition models. The purpose of the current article is to expose researchers to the various issues surrounding the use of IRR and IRA indices often used in conjunction with multilevel models. To achieve this goal, the authors adopt a question-and-answer format and provide a tutorial in the appendices illustrating how these indices may be computed using the SPSS software.
Article
In recent years, researchers have paid increasing attention to the idea of “climate strength”—the level of agreement about climate within a work group or organization. However, at present the literature is unclear about the extent to which climate strength is a positive attribute, and is concerned predominantly with small teams or organizational units. This article considers three theoretical perspectives of climate strength, and extends these to the organizational level. These three roles of climate strength were tested in 56 hospitals in the United Kingdom. Positive relationships were discovered between two of three climate dimensions (Quality and Integration) and expert ratings of organizational performance, and a curvilinear effect between Integration climate strength and performance was also found. Very high or very low Integration climate strength was less beneficial than a moderate level of climate strength. However, there were no interaction effects discovered between climate and climate strength. Implications for future climate strength research are discussed.
Article
Presents methods for assessing agreement among the judgments made by a single group of judges on a single variable in regard to a single target. For example, the group of judges could be editorial consultants, members of an assessment center, or members of a team. The single target could be a manuscript, a lower level manager, or a team. The variable on which the target is judged could be overall publishability in the case of the manuscript, managerial potential for the lower level manager, or team cooperativeness for the team. The methods presented are based on new procedures for estimating interrater reliability. For such situations, these procedures furnish more accurate and interpretable estimates of agreement than estimates provided by procedures commonly used to estimate agreement, consistency, or interrater reliability. The proposed methods include processes for controlling for the spurious influences of response biases (e.g., positive leniency and social desirability) on estimates of interrater reliability. (49 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
In a field experiment involving 83 computer-processing employees of a large service organization, a leadership intervention based on the leader–member exchange (LMX) model was tested against a control condition. It was hypothesized that Ss having initially low LMX would respond more positively (after adjusting for regression effects) to the leadership intervention than those having higher quality relationships. Dependent measures included scores on the Job Diagnostic Survey and Role Orientation Index and work productivity. Analysis of interaction effects indicated that comparing the leadership intervention condition to the control condition, the initially low-LMX group showed significant gains in productivity, job satisfaction, and supervisor satisfaction compared to the initially high-LMX group. The initially low-LMX group also perceived significantly higher gains in member availability and support from their supervisors than the initially high-LMX group. The initial quality of LMX appears to moderate the leadership intervention effect in the hypothesized direction. (21 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Using training history data and supervisory ratings of 133 hazardous waste workers' safety performance collected within two organizations in the U.S. nuclear waste industry, this study examined organizational climate for the transfer of safety training as a moderator of relationships between safety knowledge and safety performance. The trend in the results was consistent with the hypothesis that these relationships would be stronger in the less restrictive (more supportive) organizational climate. The implications of these findings for promoting safe work behaviors through the creation of a positive and strategically focused organizational climate for the transfer of safety training are discussed. Copyright © 2003 John Wiley & Sons, Ltd.
Article
Moral and financial scandals emerging in recent years around the world have created the momentum for reconsidering the role of virtuousness in organizational settings. This empirical study seeks to contribute toward maintaining this momentum. We respond to researchers’ suggestions that the exploratory study carried out by Cameron et al. (Am Behav Sci 47(6):766–790, 2004), which related organizational virtuousness (OV) and performance, must be pursued employing their measure of OV in other contexts and in relation to other outcomes (Wright and Goodstein, J Manage 33(6):928–958, 2007). Two hundred and sixteen employees reported their perceptions of OV and their affective well-being (AWB) at work (one of the main indicators of employees’ happiness), with their supervisors reporting their organizational citizenship behaviors (OCB). The main finding is that the perceptions of OV predict some OCB both directly and through the mediating role of AWB. The evidence suggests that OV is worthy of a higher status in the business and organizational psychology literatures. Key words: organizational virtuousness, psychological climate, affective well-being, happiness, organizational citizenship behaviors
Article
Recently, F.J. Yammarino and S.E. Markham (1992) summarized the basic within and between analysis (WABA) approach, applied it to data previously collected and reported on by J.M. George (1990), and used this application to critique George's findings. After briefly reviewing the theoretical underpinnings of George, the authors discuss several points of confusion in Yammarino and Markham's article in regard to the determination of the appropriateness of aggregation and the existence of relations between variables at the group level of analysis. The authors then discuss several shortcomings in Yammarino and Markham's description of WABA, including the failure to consider Person × Situation interactions in the basic WABA equation and misinterpretations of comparisons of the within component to the between component in WABA. The need to properly treat levels-of-analysis issues from both a theoretical and a statistical perspective is emphasized.
Article
For continuous constructs, the most frequently used index of interrater agreement (rwg(1)) can be problematic. Typically, rwg(1) is estimated with the assumption that a uniform distribution represents no agreement. The authors review the limitations of this uniform null rwg(1) index and discuss alternative methods for measuring interrater agreement. A new interrater agreement statistic, awg(1), is proposed. The authors derive the awg(1) statistic and demonstrate that awg(1) is an analogue to Cohen’s kappa, an interrater agreement index for nominal data. A comparison is made between agreement estimates based on the uniform rwg(1) and awg(1), and issues such as minimum sample size and practical significance levels are discussed. The authors close with recommendations regarding the use of rwg(1)/rwg(J) when a uniform null is assumed, rwg(1)/rwg(J) indices that do not assume a uniform null, awg(1)/awg(J) indices, and generalizability estimates of interrater agreement.
Article
The hypothesis was examined that the negative skew found in most distributions of performance ratings is a function of the verbal labels used as anchors. When verbal labels quantified on the basis of the range of real-life performance were employed, distributional parameters (means, skewness) were affected. Typically used sets of labels were shown to be more negative than believed, thus tending to force responses toward the high end of the scale and thereby contributing to negative skew.
Article
Formulae and graphs are presented that allow one to compute the variances of three prototypical distributions over a finite number of categories. The distributions are (1) the maximum variance distribution, (2) the uniform distribution, and (3) a unimodal triangular distribution. The use of the variances of these prototypical distributions to make inferences about distribution shapes is illustrated with several examples.
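As a rough illustration of the kind of variance calculation this abstract describes, the sketch below computes the variance of the uniform and maximum-variance distributions over A categories scored 1 through A. The closed forms used here, (A² − 1)/12 for the uniform case and (A − 1)²/4 for an even split between the two extreme categories, are standard results and not taken from the article itself; the function names are hypothetical. The triangular case is omitted because the abstract does not specify its exact form.

```python
def uniform_variance(num_categories):
    """Variance of a discrete uniform distribution over 1..A."""
    return (num_categories ** 2 - 1) / 12

def max_variance(num_categories):
    """Variance when half the responses fall on 1 and half on A."""
    return (num_categories - 1) ** 2 / 4

def discrete_variance(probs):
    """Variance of categories 1..A given their probabilities (numerical check)."""
    cats = range(1, len(probs) + 1)
    mu = sum(c * p for c, p in zip(cats, probs))
    return sum(p * (c - mu) ** 2 for c, p in zip(cats, probs))

# Five-category example: closed forms agree with the direct computation.
uniform_5 = uniform_variance(5)                                 # 2.0
uniform_check = discrete_variance([0.2] * 5)
maximum_5 = max_variance(5)                                     # 4.0
maximum_check = discrete_variance([0.5, 0.0, 0.0, 0.0, 0.5])
```

An observed variance can then be placed between these reference values to get a rough sense of how dispersed a set of responses is.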
Article
Within the context of climate strength, this simulation study examines the validity of various dispersion indexes for detecting meaningful relationships between variability in group member perceptions and outcome variables. We used the simulation to model both individual- and group-level phenomena, vary appropriate population characteristics, and test the proclivity of standard and average deviation, interrater agreement indexes (rwg, r*wg, awg), and coefficient of variation (both normed and unnormed) for Type I and Type II errors. The results show that the coefficient of variation was less likely to detect interaction effects although it outperformed other measures when detecting level effects. Standard deviation was shown to be inferior to other indexes when no level effect is present although it may be an effective measure of dispersion when modeling strength or interaction effects. The implications for future research, in which dispersion is a critical component of the theoretical model, are discussed.
Article
There is no evidence to support the belief that training raters to change rating distributions will increase accuracy or validity. Such training may merely promote a temporary and situation-specific response set. We call for a new emphasis in rater training programs on: (1) diary-keeping procedures to increase observational skills; (2) the establishment of a common rater frame of reference to enhance agreement on what constitutes effective job performance; and (3) mastery-based training to increase rater self-efficacy regarding negative appraisal situations.
Article
L. R. James, R. G. Demaree, and G. Wolf (1984) introduced rwg(J) to estimate interrater agreement for a group. This index is calculated by comparing an observed group variance with an expected random variance. As researchers have gained experience using this index, several questions have arisen. What are the consequences of replacing values beyond the unit interval by 0? What is the dependence of rwg(J) on the group size? The authors' simulations show that a positive bias is caused by the truncation, but for large population values of rwg(J) it is negligible. Also, in this case, the group size has no effect on the expected value of rwg(J). For inference on rwg(J), researchers can exploit the availability of computers to simulate data from the hypothesized distribution and then compare the simulation results for rwg(J) with the actual values. In addition, it is shown how the bootstrap method can be used for comparing the indices of 2 groups.
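The simulation-based inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the commonly cited multi-item formula for rwg(J), a uniform null distribution with variance (A² − 1)/12, truncation of out-of-range values to 0, and independent items. The function names, the number of replications, and the 5% level are all hypothetical choices.

```python
import random
from statistics import mean, variance

def rwg_j(ratings, num_options):
    """rwg(J) for one group; `ratings` is a list of raters, each a list of J item scores."""
    sigma2_eu = (num_options ** 2 - 1) / 12        # assumed uniform-null variance
    items = list(zip(*ratings))                    # regroup scores by item
    j = len(items)
    ratio = mean(float(variance(item)) for item in items) / sigma2_eu
    denom = j * (1 - ratio) + ratio
    if denom == 0:
        return 0.0
    value = j * (1 - ratio) / denom
    # Values beyond the unit interval are replaced by 0.
    return value if 0 <= value <= 1 else 0.0

def simulated_critical_value(n_raters, n_items, num_options,
                             reps=2000, alpha=0.05, seed=0):
    """Approximate upper-alpha critical value of rwg(J) under random (uniform) responding."""
    rng = random.Random(seed)
    sims = []
    for _ in range(reps):
        group = [[rng.randint(1, num_options) for _ in range(n_items)]
                 for _ in range(n_raters)]
        sims.append(rwg_j(group, num_options))
    sims.sort()
    return sims[min(reps - 1, int((1 - alpha) * reps))]

# Hypothetical group: 10 raters, 3 items, 5-point scale.
critical = simulated_critical_value(n_raters=10, n_items=3, num_options=5)
```

An observed rwg(J) above `critical` would then be treated as unlikely under random responding for a group of that size and scale length; a different null distribution would require changing both the expected variance and the sampling step.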
Article
This study was concerned with the effects of item presentation mode on the degree of leniency bias inherent in responses to standard field research questionnaires. Two types of modes were examined: the first with items measuring the same dimensions grouped together and the second with items measuring the same dimensions distributed randomly throughout the questionnaire. Sixty respondents completed questionnaires containing items from the Leader Behavior Description Questionnaire (Stogdill, 1963) and the Michigan Four Factor Leadership Questionnaire (Taylor and Bowers, 1972); there were thirty respondents in each of the modes (grouped and random). The random relative to the grouped mode showed substantially less leniency response bias, as assessed by both correlational and factor-analytic procedures. However, the magnitude of these effects was still considerable for the random mode. There were no notable differences in leniency effects between the two questionnaires, but some differences were obtained for specific leadership dimensions. Based upon these and the total set of findings, implications for questionnaire validity and for future research are discussed.
Article
Self-reports of behaviors and attitudes are strongly influenced by features of the research instrument, including question wording, format, and context. Recent research has addressed the underlying cognitive and communicative processes, which are systematic and increasingly well-understood. The author reviews what has been learned, focusing on issues of question comprehension, behavioral frequency reports, and the emergence of context effects in attitude measurement. The accumulating knowledge about the processes underlying self-reports promises to improve questionnaire design and data quality. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Through a team-level analysis, the study shows how authentic leadership (AL) predicts team potency both directly and through the mediating role of team virtuousness and team affective commitment. Data about AL and team virtuousness were collected two months before data collection on team affective commitment and team potency. Fifty-one teams were selected for testing the hypotheses. The main findings are the following: (a) AL predicts team affective commitment through the mediating role of team virtuousness; (b) team virtuousness predicts team potency through the mediating role of team affective commitment; (c) AL predicts team potency through the mediating role of team virtuousness and team affective commitment. By focusing on two positive constructs (AL and team virtuousness), for which interrelations have rarely been explored, the study contributes to the Positive Organizational Scholarship movement, and suggests that AL and virtuousness are good in themselves and also potential facilitators of team success.
Article
When and why does the experience of helping others at work spill over into positive affect at home? This paper presents a within‐person examination of the association between perceived prosocial impact at work and positive affect at home, as well as the psychological mechanisms that mediate this relationship. Sixty‐eight firefighters and rescue workers completed electronic diaries twice a day over the course of 1 working week. Random‐coefficient modeling showed that perceived prosocial impact predicted positive affect at bedtime. This relationship was mediated by perceived competence at the end of the working day and positive work reflection during after‐work hours but not by positive affect at the end of the working day. The findings demonstrate that the experience of helping others at work has delayed emotional benefits at home that appear to be channeled through the cognitive mechanisms of perceived competence and reflection rather than through an immediate affective boost.
Article
Recently, climate strength, which is an index of the level of agreement within a group around perceptions of climate, has been shown to have a moderating effect on the relationship between climate level and organizational outcomes. This study makes a contribution to the emerging body of climate strength research as it attempted to constructively replicate the findings of Schneider, Salvaggio, and Subirats (2002) with a sample of 756 employees from 129 stores within the automotive services industry, and expanded the organizational outcome variables to include employee turnover and profitability. Support was not found for the hypothesis that climate strength would moderate the relationship between climate levels and organizational outcomes. However, significant main effects were found for some of the climate strength – outcome relationships. Implications for practice and future research are discussed.