Philip L Roth

Clemson University, Clemson, SC, USA

Publications (32) · 78.78 Total impact

  • Article: The critical role of the research question, inclusion criteria, and transparency in meta-analyses of integrity test research: a reply to Harris et al. (2012) and Ones, Viswesvaran, and Schmidt (2012).
    ABSTRACT: We clear up a number of misconceptions from the critiques of our meta-analysis (Van Iddekinge, Roth, Raymark, & Odle-Dusseau, 2012). We reiterate that our research question focused on the criterion-related validity of integrity tests for predicting individual work behavior and that our inclusion criteria flowed from this question. We also reviewed the primary studies we could access from Ones, Viswesvaran, and Schmidt's (1993) meta-analysis of integrity tests and found that only about 30% of the studies met our inclusion criteria. Further, analyses of some of the types of studies we had to exclude revealed potentially inflated validity estimates (e.g., corrected validities as high as .80 for polygraph studies). We also discuss our experience trying to obtain primary studies and other information from authors of Harris et al. (2012) and Ones, Viswesvaran, and Schmidt (2012). In addition, we address concerns raised about certain decisions we made and values we used, and we demonstrate how such concerns would have little or no effect on our results or conclusions. Finally, we discuss some other misconceptions about our meta-analysis, as well as some divergent views about the integrity test literature in general. Overall, we stand by our research question, methods, and results, which suggest that the validity of integrity tests for criteria such as job performance and counterproductive work behavior is weaker than the authors of the critiques appear to believe.
    Journal of Applied Psychology 05/2012; 97(3):543-9; discussion 531-6, 537-42. · 4.31 Impact Factor
  • Dataset: TOWARD BETTER META-ANALYTIC MATRICES: HOW INPUT VALUES CAN AFFECT RESEARCH CONCLUSIONS IN HUMAN RESOURCE MANAGEMENT SIMULATIONS
    ABSTRACT: Simulations and analyses based on meta-analytic matrices are fairly common in human resource management and organizational behavior research, particularly in staffing research. Unfortunately, the meta-analytic values for validity and group differences (i.e., ρ and δ, respectively) used in such matrices often vary in the extent to which they are affected by artifacts and how accurately the values capture the underlying constructs and the appropriate population. We investigate how such concerns might influence conclusions concerning key issues such as prediction of job performance and adverse impact of selection procedures, as well as noting wider applications of these issues. We also start the process of building a better matrix upon which to base many such simulations and analyses in staffing research. Finally, we offer guidelines to help researchers/practitioners better model human resources processes, and we suggest ways that researchers in a variety of areas can better assemble meta-analytic matrices. Some of the most central issues in staffing and human resource management are the validity of selection systems (e.g., Hunter & Hunter, 1984), the adverse impact against protected groups that can result from those systems (e.g., Aguinis & Smith, 2007; McKay, 2010; McKay & McDaniel, 2006; Reilly & Warech, 1993; Schmitt & Quinn, 2010), and …
  • Article: Are you interested? A meta-analysis of relations between vocational interests and employee performance and turnover.
    ABSTRACT: A common belief among researchers is that vocational interests have limited value for personnel selection. However, no comprehensive quantitative summaries of interests validity research have been conducted to substantiate claims for or against the use of interests. To help address this gap, we conducted a meta-analysis of relations between interests and employee performance and turnover using data from 74 studies and 141 independent samples. Overall validity estimates (corrected for measurement error in the criterion but not for range restriction) for single interest scales were .14 for job performance, .26 for training performance, -.19 for turnover intentions, and -.15 for actual turnover. Several factors appeared to moderate interest-criterion relations. For example, validity estimates were larger when interests were theoretically relevant to the work performed in the target job. The type of interest scale also moderated validity, such that corrected validities were larger for scales designed to assess interests relevant to a particular job or vocation (e.g., .23 for job performance) than for scales designed to assess a single, job-relevant realistic, investigative, artistic, social, enterprising, or conventional (i.e., RIASEC) interest (.10) or a basic interest (.11). Finally, validity estimates were largest when studies used multiple interests for prediction, either by using a single job or vocation focused scale (which tend to tap multiple interests) or by using a regression-weighted composite of several RIASEC or basic interest scales. Overall, the results suggest that vocational interests may hold more promise for predicting employee performance and turnover than researchers may have thought.
    Journal of Applied Psychology 07/2011; 96(6):1167-94. · 4.31 Impact Factor
  • Article: The criterion-related validity of integrity tests: an updated meta-analysis.
    ABSTRACT: Integrity tests have become a prominent predictor within the selection literature over the past few decades. However, some researchers have expressed concerns about the criterion-related validity evidence for such tests because of a perceived lack of methodological rigor within this literature, as well as a heavy reliance on unpublished data from test publishers. In response to these concerns, we meta-analyzed 104 studies (representing 134 independent samples), which were authored by a similar proportion of test publishers and non-publishers, whose conduct was consistent with professional standards for test validation, and whose results were relevant to the validity of integrity-specific scales for predicting individual work behavior. Overall mean observed validity estimates and validity estimates corrected for unreliability in the criterion (respectively) were .12 and .15 for job performance, .13 and .16 for training performance, .26 and .32 for counterproductive work behavior, and .07 and .09 for turnover. Although data on restriction of range were sparse, illustrative corrections for indirect range restriction did increase validities slightly (e.g., from .15 to .18 for job performance). Several variables appeared to moderate relations between integrity tests and the criteria. For example, corrected validities for job performance criteria were larger when based on studies authored by integrity test publishers (.27) than when based on studies from non-publishers (.12). In addition, corrected validities for counterproductive work behavior criteria were larger when based on self-reports (.42) than when based on other-reports (.11) or employee records (.15).
    Journal of Applied Psychology 02/2011; 97(3):499-530. · 4.31 Impact Factor
  • Article: Updating the trainability tests literature on Black-White subgroup differences and reconsidering criterion-related validity.
    Philip L Roth, Maury A Buster, Philip Bobko
    ABSTRACT: A number of applied psychologists have suggested that trainability test Black–White ethnic group differences are low or relatively low (e.g., Siegel & Bergman, 1975), though data are scarce. Likewise, there are relatively few estimates of criterion-related validity for trainability tests predicting job performance (cf. Robertson & Downs, 1989). We review and clarify the existing (and limited) literature on Black–White group differences on trainability tests, provide new trainability test data from a recent video-based trainability exam, and present archival data about how trainability test scores relate to cognitive ability, Black–White differences, and job performance. Consistent with hypotheses, our results suggest large correlations of trainability tests with cognitive ability (e.g., .80) and larger standardized ethnic group differences than previously thought (ds of 0.86, 1.10, and 1.21 for 3 samples). Results also suggest that trainability tests have higher validity than previously thought. Overall, our analysis provides a substantial amount of data to update our understanding of the use of trainability tests in personnel selection.
    Journal of Applied Psychology 10/2010; 96(1):34-45. · 4.31 Impact Factor
  • Article: Some Comments on Pareto Thinking, Test Validity, and Adverse Impact: When "And" Is Optimal and "Or" Is a Trade-Off
    Denise Potosky, Philip Bobko, Philip L. Roth
    ABSTRACT: De Corte, Lievens, and Sackett add to the literature on selection test validity and adverse impact (AI). Their Pareto-based weighting scheme essentially asks organizations if they are willing to give up some validity to hopefully achieve some reduction in AI. We considered their approach and conclusions in relation to the regression weighting method we used, and we offer five points that reflect our observations as well as our shared goals. We hope our comments, like their work in this field, will invigorate the pursuit of new ways of examining, and one day resolving, the persistent concern regarding the AI associated with valid selection tests.
    Wiley-Blackwell: International Journal of Selection & Assessment. 08/2008;
  • Article: Ethnic and gender subgroup differences in assessment center ratings: a meta-analysis.
    Michelle A Dean, Philip L Roth, Philip Bobko
    ABSTRACT: Assessment centers are widely believed to have relatively small standardized subgroup differences (d). However, no meta-analytic review to date has examined ds for assessment centers. The authors conducted a meta-analysis of available data and found an overall Black-White d of 0.52, an overall Hispanic-White d of 0.28, and an overall male-female d of -0.19. Consistent with our expectations, results suggest that Black-White ds in assessment center data may be larger than was previously thought. Hispanic-White comparisons were smaller than were Black-White comparisons. Females, on average, scored higher than did males in assessment centers. As such, assessment centers may be associated with more adverse impact against Blacks than is portrayed in the literature, but the predictor may have less adverse impact and be more "diversity friendly" for Hispanics and females.
    Journal of Applied Psychology 06/2008; 93(3):685-91. · 4.31 Impact Factor
  • Chapter: Coping with Missing Data
    Fred S. Switzer, Philip L. Roth
    01/2008: pages 310-323; ISBN: 9780470756669
  • Chapter: Outliers and Influential Cases: Handling those Discordant Contaminated Maverick Rogues
    Philip L. Roth, Fred S. Switzer
    01/2008: pages 296-309; ISBN: 9780470756669
  • Article: A Meta-Analysis of Achievement Motivation Differences between Entrepreneurs and Managers
    Wayne H. Stewart, Philip L. Roth
    ABSTRACT: As a result of conflicting conclusions in primary studies, most narrative reviews have questioned the role of personality in explaining entrepreneurial behavior. We examine one stream of this research by conducting a meta-analysis of studies that contrast the achievement motivation of entrepreneurs and managers. The results indicate that entrepreneurs exhibit higher achievement motivation than managers and that these differences are influenced by the entrepreneur's venture goals, by the use of U.S. or foreign samples, and, to a less clear extent, by projective or objective instrumentation. Moreover, when the analysis is restricted to venture founders, the difference between entrepreneurs and managers on achievement motivation is substantially larger and the credibility intervals do not include zero.
    Kauffman Data: Panel Study on Entrepreneurial Dynamics (Topic). 08/2007;
  • Article: PRIOR SELECTION CAUSES BIASED ESTIMATES OF STANDARDIZED ETHNIC GROUP DIFFERENCES: SIMULATION AND ANALYSIS
    ABSTRACT: Assessment of standardized ethnic group differences (d) on predictors of job performance has become an important issue for applied psychologists. A number of studies have used an experimental design in which the predictor of interest was administered after an initial screening predictor. We examined the influence of prior selection on a first predictor on observed ds for second predictors in multiple-hurdle selection systems. Results of a Monte Carlo simulation indicate observed ds on the second predictor are underestimated in the presence of prior selection on another predictor. More important, “downward bias” in observed standardized ethnic group difference is substantial (30-70%) when selection ratios are low, standardized ethnic group differences on the screening predictor are high, and when the first and second predictors correlate above .30. Researchers should consider the influence of range restriction in designing studies of ethnic group differences and comparing ds across predictors, particularly when data are collected under a multiple-hurdle design.
    Personnel Psychology 12/2006; 54(3):591 - 617. · 2.93 Impact Factor
  • Article: ETHNIC GROUP DIFFERENCES IN COGNITIVE ABILITY IN EMPLOYMENT AND EDUCATIONAL SETTINGS: A META‐ANALYSIS
    ABSTRACT: The cognitive ability levels of different ethnic groups have interested psychologists for over a century. Many narrative reviews of the empirical literature in the area focus on the Black-White differences, and the reviews conclude that the mean difference in cognitive ability (g) is approximately 1 standard deviation; that is, the generally accepted effect size is about 1.0. We conduct a meta-analytic review that suggests that the one standard deviation effect size accurately summarizes Black-White differences for college application tests (e.g., SAT) and overall analyses of tests of g for job applicants in corporate settings. However, the 1 standard deviation summary of group differences fails to capture many of the complexities in estimating ethnic group differences in employment settings. For example, our results indicate that job complexity, the use of within job versus across job study design, focus on applicant versus incumbent samples, and the exact construct of interest are important moderators of standardized group differences. In many instances, standardized group differences are less than 1 standard deviation. We conduct similar analyses for Hispanics, when possible, and note that Hispanic-White differences are somewhat less than Black-White differences.
    Personnel Psychology 12/2006; 54(2):297 - 330. · 2.93 Impact Factor
  • Article: DERIVATION AND IMPLICATIONS OF A META‐ANALYTIC MATRIX INCORPORATING COGNITIVE ABILITY, ALTERNATIVE PREDICTORS, AND JOB PERFORMANCE
    PHILIP BOBKO, PHILIP L. ROTH, DENISE POTOSKY
    ABSTRACT: A variety of recent articles in the personnel selection literature have used analyses of meta-analytically derived matrices to draw general conclusions for the field. The purpose of this article is to construct a matrix that incorporates as complete information as possible on the relationships among cognitive ability measures, three sets of alternative predictors, and job performance. We build upon a starting matrix used by Schmitt, Rodgers, Chan, Sheppard, and Jennings (1997). Mean differences, by race, for each of the measures and the potential for adverse impact of predictor composites are also considered. We demonstrate that the use of alternative predictors alone to predict job performance (in the absence of cognitive ability) lowers the potential for adverse impact. However, in contrast to recent claims, adverse impact continues to occur at many commonly used selection ratios. Future researchers are encouraged to use our matrix and to expand upon it as new primary research becomes available. We also report and reaffirm many methodological lessons along the way, including the many judgment calls that appear in an effort of this magnitude and a reminder that the field could benefit from even greater conceptual care regarding what is labeled an “alternative predictor.” Directions for future meta-analyses and for future primary research activities are also derived.
    Personnel Psychology 12/2006; 52(3):561 - 589. · 2.93 Impact Factor
  • Article: Comparing the Psychometric Characteristics of Ratings of Face‐to‐Face and Videotaped Structured Interviews
    ABSTRACT: Videotaped interviews are used for both research and for making selection decisions in organizations. However, little research has examined the extent to which the psychometric characteristics of ratings of videotaped interviews are comparable with those of ratings made on the basis of face-to-face (FTF) interviews. Within a simulated selection setting, we compared ratings of interviewers who conducted FTF structured interviews to ratings of interviewers who viewed videotapes of those interviews. Results revealed that FTF ratings were significantly higher than video ratings of the same interviewees. We also found that the two sets of interviewers rated the relative performance of interviewees differently. For example, the correlation between FTF and video ratings (r=.31) was significantly smaller than the correlation between ratings of interviewers who conducted FTF panel interviews with the same interviewees (r=.73). Overall results suggest that researchers and practitioners should be cautious about generalizing research findings and selection decisions made on the basis of videotaped interviews to FTF interviews.
    International Journal of Selection and Assessment 11/2006; 14(4):347 - 359. · 1.30 Impact Factor
  • Article: Modeling the behavior of the 4/5ths rule for determining adverse impact: reasons for caution.
    Philip L Roth, Philip Bobko, Fred S Switzer
    ABSTRACT: The Equal Employment Opportunity Commission's 4/5ths rule has been used for over 20 years in applied psychology and employment law. The rule signals that there is adverse impact when the protected group selection ratio is less than 80% of the highest scoring group's selection ratio. We conducted several simulations and found, consistent with some previous management science literature, that the 4/5ths rule often resulted in false-positive readings of adverse impact even when there were no underlying (population) standardized group differences between subgroups. We then incorporated tests of statistical significance and found that adding such tests to the 4/5ths rule eliminated many false-positive indications of adverse impact. We also examined simulated selection systems based on meta-analytic values from the selection literature. The frequency of adverse impact signals from the 4/5ths rule increased markedly relative to simulations with no subgroup population differences. Adding statistical tests mitigated the number of indications of adverse impact to some extent.
    Journal of Applied Psychology 06/2006; 91(3):507-22. · 4.31 Impact Factor
  • Article: Personality Saturation in Structured Interviews
    ABSTRACT: Applied psychologists have long been interested in the relationship between applicant personality and employment interview ratings. Analysis of data from two studies, one using a situational interview and one using a behavioral interview, suggests that the correlations of structured interview ratings with self-report measures of personality factors are generally rather low. Further, a small meta-analysis integrates these two studies and the limited previous literature to arrive at a similar conclusion - there is relatively little relationship between structured interviews and self-reported personality factors.
    Wiley-Blackwell: International Journal of Selection & Assessment. 01/2006;
  • Article: Forming Composites of Cognitive Ability and Alternative Measures to Predict Job Performance and Reduce Adverse Impact: Corrected Estimates and Realistic Expectations
    Denise Potosky, Philip Bobko, Philip L. Roth
    ABSTRACT: Although there has been empirical attention paid to the criterion-related validity of predictor composites, there has been much less attention paid to the standardized ethnic group differences associated with these composites. One important area of inquiry in predictor composite research is the influence of adding predictors to a test of general mental ability. The limited empirical literature on this practice is mixed, but the prevailing expectation is that there is likely to be higher validity and less adverse impact. Unfortunately, much of the previous work is limited by the presence of inaccurate validity and standardized ethnic group difference values. In this analysis we formed meta-analytic matrices to more accurately estimate the validity and standardized ethnic group differences of several composites that combine a measure of cognitive ability with measures of conscientiousness, a structured interview, or biodata. While results were somewhat complex, we found that adding alternative predictors does not result in a situation in which validity automatically goes up and adverse impact potential automatically goes down. In fact, the reductions in adverse impact (if any) from adding “non-cognitive” predictors were more modest than much of the literature suggests.
    International Journal of Selection and Assessment 11/2005; 13(4):304 - 315. · 1.30 Impact Factor
  • Article: A META‐ANALYSIS OF WORK SAMPLE TEST VALIDITY: UPDATING AND INTEGRATING SOME CLASSIC LITERATURE
    PHILIP L. ROTH, PHILIP BOBKO, LYNN A. McFARLAND
    ABSTRACT: Work sample tests have been used in applied psychology for decades as important predictors of job performance, and they have been suggested to be among the most valid predictors of job performance. As we examined classic work sample literature, we found the narrative review by Asher and Sciarrino (1974) to be plagued by many methodological problems. Further, it is possible that data used in this study may have influenced the results (e.g., r= .54) reported by Hunter and Hunter in their seminal work in 1984. After integrating all of the relevant data, we found an observed mean correlation between work sample tests and measures of job performance of .26. This value increased to .33 when measures of job performance (e.g., supervisory ratings) were corrected for attenuation. Our results suggest that the level of the validity for work sample tests may not be as large as previously thought (i.e., approximately one third less than previously thought). Further, our work also summarizes the relationship of work sample exams to measures of general cognitive ability. We found that work sample tests were associated with an observed correlation of .32 with tests of general cognitive ability.
    Personnel Psychology 11/2005; 58(4):1009 - 1037. · 2.93 Impact Factor
  • Article: A PROCESS FOR CONTENT VALIDATION OF EDUCATION AND EXPERIENCED‐BASED MINIMUM QUALIFICATIONS: AN APPROACH RESULTING IN FEDERAL COURT APPROVAL
    MAURY A. BUSTER, PHILIP L. ROTH, PHILIP BOBKO
    ABSTRACT: The use of education and experience minimum qualifications (MQs) is nearly ubiquitous in employment settings, yet it appears to be rare that such MQs are validated by the end user (either via content validity or criterion-related validation approaches). In this article, we present a method of content-validating MQs that is related to a procedure noted by Levine, Maye, Ulm, and Gordon (1997), and we demonstrate the method's application for an upper-level management position in a state agency. Our procedure is based on adherence to the Uniform Guidelines and sound professional practice. In addition, the procedure was heard in a federal court proceeding (deposition and expert testimony in relation to 3 jobs), was approved by that court, and we discuss the court's findings. In particular, the court found, “The MQ development process used by the State Personnel Department (SPD) is consistent with the requirements of the Guidelines and leads to content-valid MQs.” Given data available to us for another job, we also show that the obtained MQs may result in less adverse impact than a previously determined, task-based set of MQs.
    Personnel Psychology 08/2005; 58(3):771 - 799. · 2.93 Impact Factor
  • Article: Assessing personality with a structured employment interview: construct-related validity and susceptibility to response inflation.
    ABSTRACT: The authors evaluated the extent to which a personality-based structured interview was susceptible to response inflation. Interview questions were developed to measure facets of agreeableness, conscientiousness, and emotional stability. Interviewers administered mock interviews to participants instructed to respond honestly or like a job applicant. Interviewees completed scales of the same 3 facets from the NEO Personality Inventory, under the same honest and applicant-like instructions. Interviewers also evaluated interviewee personality with the NEO. Multitrait-multimethod analysis and confirmatory factor analysis provided some evidence for the construct-related validity of the personality interviews. As for response inflation, analyses revealed that the scores from the applicant-like condition were significantly more elevated (relative to honest condition scores) for self-report personality ratings than for interviewer personality ratings. In addition, instructions to respond like an applicant appeared to have a detrimental effect on the structure of the self-report and interview ratings, but not interviewer NEO ratings.
    Journal of Applied Psychology 06/2005; 90(3):536-52. · 4.31 Impact Factor