Article

The Relative Power of Training Evaluation Designs Under Different Cost Configurations

Authors: Richard D. Arvey, Scott E. Maxwell, and Eduardo Salas

Abstract

After noting that the statistical power of training evaluation designs is a complex function of sample size, the reliability of the dependent measure, the correlation between pre- and posttest measures, and whether a randomized pretest–posttest or randomized posttest-only design is used, the authors show that the costs of conducting an evaluation are important considerations that also affect the relative power of the designs. Specifically, subject (S) costs, administrative costs, and item development costs are separate components that absorb resources when training evaluations are conducted. When total resources are fixed, these costs affect the relative power of pretest–posttest and posttest-only designs differently, and the posttest-only design may be preferable under many conditions. In other words, a variety of design and parameter tradeoffs affect power when total costs are fixed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
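To make the cost–power tradeoff concrete, the sketch below compares a randomized posttest-only design against a randomized pretest–posttest design (analyzed here with ANCOVA, one common choice) when the total evaluation budget is fixed. This is a minimal illustration, not the authors' SAS program: the budget, the item development, administrative, and per-subject cost figures, the effect size, and the pre–post correlation are all hypothetical values chosen for demonstration.

```python
# Minimal sketch (not the authors' SAS program): compare the power of a
# randomized posttest-only design with a randomized pretest-posttest design
# (analyzed as ANCOVA) when the total evaluation budget is fixed.
# All cost figures and parameters below are illustrative assumptions.
from scipy import stats

def two_group_power(n_per_group, d, df_loss=0, alpha=0.05):
    """Two-sided power for a two-group comparison with n subjects per group."""
    df = 2 * n_per_group - 2 - df_loss
    nc = d * (n_per_group / 2) ** 0.5                     # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

def affordable_n(budget, item_dev, admin, subject_cost, sessions):
    """Subjects per group affordable after fixed development and admin costs."""
    variable = budget - item_dev - admin * sessions
    return int(variable // (subject_cost * sessions * 2))  # two groups

budget, d, rho = 8_000, 0.5, 0.6   # total funds, true effect size, pre-post correlation

# Posttest-only: one testing session, one instrument to develop.
n_post = affordable_n(budget, item_dev=2_000, admin=500, subject_cost=40, sessions=1)
# Pretest-posttest: two sessions and an extra (pretest) form to develop.
n_pre = affordable_n(budget, item_dev=4_000, admin=500, subject_cost=40, sessions=2)

power_post = two_group_power(n_post, d)
# ANCOVA with the pretest as covariate shrinks error variance by (1 - rho**2)
# at the price of one error degree of freedom.
power_pre = two_group_power(n_pre, d / (1 - rho ** 2) ** 0.5, df_loss=1)

print(f"posttest-only:    n per group = {n_post:3d}, power = {power_post:.2f}")
print(f"pretest-posttest: n per group = {n_pre:3d}, power = {power_pre:.2f}")
```

With these particular numbers the posttest-only design affords enough additional subjects to be the more powerful option; a higher pre–post correlation or cheaper pretest administration and development can reverse the ordering, which is the kind of tradeoff the abstract describes.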

No full-text available


... Because such low values are not uncommon, the ANOVA approach can be more powerful under a range of realistic situations. Arvey, Maxwell, and Salas (1992) applied similar techniques to training evaluation designs. Although framed for industrial-organizational psychologists, these issues are generally applicable to the design of social science research. ...
... Arvey and Cole (1989) discussed how increasing the reliability of the measure (lengthening) influences the various designs. Arvey et al. (1992) added evaluation costs, dividing them into subject, administrative, and item development categories. Furthermore, they wrote a computer program (SAS) to compare relative power under different cost scenarios, each emphasizing a different cost component. (The citing article's Table 1, "How to Increase Power (reducing Type II errors)," begins: 1. Conduct power analyses before research when possible, otherwise afterward.) ...
Article
Statistical conclusion validity is concerned with an integrated evaluation of statistical power, significance testing, and effect size. A lack of attention to the integrated argument occurs because of an emphasis on significance testing, a lack of knowledge, and a lack of motivation. This article has three objectives. First, the central logic of the statistical conclusion validity argument is explained. Following that, issues relating to the three components are reviewed. These issues include computations, multivariate extensions, and recommendations for practice. Increasing use of model-testing procedures in which the goal of the analysis is not to reject the null hypothesis is noted. Finally, conclusions are offered and research needs are discussed. Training in quantitative methods is claimed to be lacking (Aiken, West, Sechrest, & Reno, 1990). One aspect of justifying conclusions from behavioral research is statistical conclusion validity (SCV), an umbrella term for issues that organizational researchers have historically mistreated (Mazen, Graf, Kellogg, & Hemmasi, 1987). As a complement to internal validity, SCV is concerned with drawing conclusions about population covariation from sample data.
... The above example demonstrates the necessity of measuring the impacts of an HR intervention given limited budgets and resources that set the boundaries of testing. In general, three types of testing costs are associated with statistical power in an intervention setting (Arvey, Maxwell, & Salas, 1992). The first type is item development costs, which are the expenses associated with item development and validation. ...
Article
Full-text available
The development of optimal human resource practices is often contingent on the accurate statistical testing of potential interventions. Testing the efficacy of HR interventions can be enhanced by taking additional measures to improve statistical power, but the traditional means of increasing power through sample size are often beyond the cost and ability of HR professionals to pursue. This article, therefore, focuses on measurement procedures as an alternative way to increase statistical power for detecting HR intervention effects. Selection of reliable and appropriate measures and subsequent instrumentation are examined as efficacious and cost-beneficial techniques that can be employed during the planning and designing stage of a study for augmenting statistical power to optimize business decision making. © 2012 Wiley Periodicals, Inc.
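As a rough numerical companion to the point above, the sketch below shows how unreliability in the criterion attenuates the observed standardized effect (approximately d_obs = d_true * sqrt(reliability)) and inflates the sample size needed to keep power at a target level. The effect size, reliability values, and 80% power target are assumptions chosen purely for illustration.

```python
# Illustrative sketch: lower measurement reliability attenuates the observed
# standardized effect and inflates the n required for a given power.
# All numbers are hypothetical.
from scipy import stats

def n_per_group(d_obs, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided two-sample comparison (normal approx.)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / d_obs) ** 2

d_true = 0.50                                  # effect size in true-score units
for reliability in (0.95, 0.80, 0.60):
    d_obs = d_true * reliability ** 0.5        # attenuated observed effect
    print(f"reliability={reliability:.2f}  d_obs={d_obs:.2f}  "
          f"n per group ~ {n_per_group(d_obs):.0f}")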
... The main limitation of this work is that the experimental process used to verify our hypotheses should in theory include a control group to make it possible to test the internal validity of the approach. However, it is often difficult, even impossible, within the context of training evaluation, to actually implement this type of procedure (in particular, to identify a control group which is really "equivalent" to the test group), and therefore empirical studies are often restricted to analyzing the degree of significance of the statistical results observed (Arvey, Maxwell, and Salas 1992; Sackett and Mullen 1993) and taking into account the context (Cook and Campbell 1979). ...
Article
Do entrepreneurship education programs (EEPs) really influence participants' attitudes and intention toward entrepreneurship? How is this influence related to past experience, and how does it persist? Researchers and entrepreneurship education stakeholders alike have been looking into this question for quite a while, with a view to validating the efficacy of such programs. The authors of this paper propose to operationalize the concept of entrepreneurial intention and its antecedents in an attempt to address those issues. In particular, we propose an original research design in which (1) we measure the initial state and persistence of the impact, not only short-term effects; (2) we deal with a compulsory program, which allows us to avoid self-selection biases; and (3) we deal with a homogeneous "compact" program rather than programs combining multiple teaching components whose effects cannot be disentangled. Our main research results show that the positive effects of an EEP are all the more marked when previous entrepreneurial exposure has been weak or inexistent. Conversely, for those students who had previously been significantly exposed to entrepreneurship, the results highlight significant countereffects of the EEP on those participants.
... On the other hand, computing confidence intervals involves greater difficulty, because it is not only necessary to reject the null hypothesis and get the trend (the ordering pattern) of the population means right; it is also necessary to estimate the magnitude of the effect (Cumming & Finch, 2005; Nickerson, 2000; Schmidt, 1996; Steiger & Fouladi, 1997; Wilkinson & the Task Force on Statistical Inference, 1999). This gain in precision also implies the greater cost involved in limiting Type II error (e.g., Arvey, Salas, & Maxwell, 1992). ...
Article
Full-text available
Design and power analysis: n and confidence intervals of means. In this study, we analyzed the validity of the conventional 80% power. The minimal sample size and power needed to guarantee nonoverlapping (1 - α)% confidence intervals for population means were calculated. Several simulations indicate that the minimal power for two means (m= 2) to have non-overlapping CIs is .80, for (1 - α) set to 95%. The minimal power becomes .86 for 99% CIs and .75 for 90% CIs. When multiple means are considered, the required minimal power increases considerably. This increase is even higher when the population means do not increase monotonically. Therefore, the often adopted criterion of a minimal power equal to .80 is not always adequate. Hence, to guarantee that the limits of the CIs do not overlap, most situations require a direct calculation of the minimum number of observations that should enter in a study.
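The headline figures can be checked with a quick normal-approximation calculation, under the assumptions that the test's significance level is matched to the confidence level of the intervals and that the intervals are centered on the population means; the sketch below reproduces values close to the reported .75, .80, and .86.

```python
# Back-of-the-envelope check of the non-overlap result, using a normal
# approximation and matching the test's alpha to the CI level (an assumption).
from scipy import stats

def minimal_power_for_nonoverlap(conf):
    z = stats.norm.ppf(1 - (1 - conf) / 2)   # CI half-width multiplier = critical value
    # Expected CIs for two independent means are disjoint once the true mean
    # difference exceeds 2 * z * SE, i.e. delta / SE_diff = 2 * z / sqrt(2).
    return stats.norm.cdf(2 * z / 2 ** 0.5 - z)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} CIs: minimal power ~ {minimal_power_for_nonoverlap(conf):.2f}")
```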
... Regarding design, evaluators of experiential training are encouraged to: 1) Use other methods in combination with reaction questionnaires (consult sources such as Arvey, Maxwell, & Salas, 1992, and Sackett & Mullen, 1993, for more regarding design approaches); 2) Use items, similar to those used in the present study, that are reflective of theoretical assumptions about experiential learning; 3) Carefully consider item wording (this study has led the ELC to reword several "double-barreled" or otherwise ambiguous items); and 4) Consider steps, including instructions to subjects to carefully consider each item separately and/or alternating positive and negative item stems, to combat response styles. (See Gable, 1986, for more about response styles.) ...
Book
Introduction to Industrial/Organizational Psychology provides a complete overview of the psychological study of the world of work. Written with the student in mind, the book presents classic theory and research in the field alongside examples from real-world work situations to provide deeper insight. This edition has been thoroughly updated to include the latest research on each key topic, and now features: A spotlight on diversity, equity, and inclusion throughout, including coverage of LGBTQIA+ inclusion and racial justice Expanded coverage of ethics in I/O psychology practice Increased emphasis on cross-cultural and international issues Coverage of the changing nature of work, post-pandemic, including remote working, worker stress, and burnout A new focus on technologies related to I/O such as virtual reality and computer adaptive testing New figures, illustrations, and charts to grab the reader's attention and facilitate learning Accompanied by extensive student and instructor resources, it is a must read for all students on I/O psychology courses and courses in work psychology and organizational behavior, and for practicing managers who want a comprehensive overview of the psychology of work. © 2022 Ronald E. Riggio & Stefanie K. Johnson. All rights reserved.
Article
Statistical conclusion validity is concerned with an integrated evaluation of statistical power, significance testing, and effect size. A lack of attention to the integrated argument occurs because of an emphasis on significance testing, a lack of knowledge, and a lack of motivation. This article has three objectives. First, the central logic of the statistical conclusion validity argument is explained. Following that, issues relating to the three components are reviewed. These issues include computations, multivariate extensions, and recommendations for practice. Increasing use of model-testing procedures in which the goal of the analysis is not to reject the null hypothesis is noted. Finally, conclusions are offered and research needs are discussed.
Article
Reports on an effort to implement good practices in learning evaluation. Reviews learning evaluation practices and gathers data using a dedicated software system. Demonstrates learning takes place within complex social systems populated by a multiplicity of factors that influence perceptions of learning and performance outcomes. Argues that technology enables cost-effective evaluations to be implemented that encompass a broad spectrum of influencing variables and acknowledge the empowered status of the learner. Discusses the implications for evaluation methodologies and the role of trainers within organisations.
Article
Full-text available
Adequate statistical power is increasingly demanded in research designs. However, obtaining adequate research funding is increasingly difficult. This places researchers in a difficult position. In response, the authors advocate an approach to designing studies that considers statistical power and financial concerns simultaneously. Their purpose is twofold: (a) to introduce the general paradigm of cost optimization in the context of power analysis and (b) to present techniques for such optimization. Techniques are presented in the context of a randomized clinical trial. The authors consider (a) selecting optimal cutpoints for subject screening tests; (b) optimally allocating subjects to different treatment conditions; (c) choosing between obtaining more subjects or taking more replicate measurements; and (d) using prerandomization covariates. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
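One of the tradeoffs listed above, choosing between recruiting more subjects or taking more replicate measurements per subject, can be sketched as follows. The variance components, per-subject and per-measurement costs, and effect size are hypothetical; the point is only that, with the budget fixed, power can peak at an intermediate number of replicates.

```python
# Sketch of one tradeoff: more subjects vs. more replicate measurements per
# subject, under a fixed budget. Costs and variance components are illustrative.
from scipy import stats

def power(n_per_group, m, delta=2.0, var_true=16.0, var_error=16.0, alpha=0.05):
    """Two-group power when each subject's score is the mean of m replicates."""
    var_score = var_true + var_error / m        # averaging shrinks only error variance
    se_diff = (2 * var_score / n_per_group) ** 0.5
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.cdf(delta / se_diff - z_alpha)

budget, cost_subject, cost_replicate = 6_000, 50.0, 10.0
for m in (1, 2, 4, 8):
    n = int(budget / (2 * (cost_subject + m * cost_replicate)))   # subjects per group
    print(f"m={m} replicates -> n={n:2d} per group, power={power(n, m):.2f}")
```

Here two replicates per subject beat both one and eight, because beyond a point additional replicates mostly re-measure true-score variance that replication cannot reduce.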
Article
This study used a randomized group experimental design combined with qualitative research methods to assess the outcomes of an outdoor management education (OME) program in one organization. Measures of trainee reactions, learning, attitudes and motivation, behavior, and organizational results were collected. The OME program positively influenced participant knowledge, organizational commitment, organization-based self-esteem, and intentions to implement learning. It did not improve trust or self-efficacy levels. Additionally, there was evidence of behavioral change and improvements in several organizational results up to three years after the training, although attribution of cause and effect is difficult at these levels of analysis. © 1997 John Wiley & Sons, Inc.
Article
Two ways to reduce the costs of training evaluation are examined. First, we examine the potential for reducing the costs of training evaluation by assigning different numbers of subjects into training and control groups. Given a total N of subjects, statistical power to detect the effectiveness of a training program can be maximized by assigning the subjects equally to training and control groups. If we take into account the costs of training evaluation, however, an unequal-group-size design with a larger total N may achieve the same level of statistical power at lower cost. We derive formulas for the optimal ratios of the control group size to the training group size for both ANOVA and ANCOVA designs, incorporating the differential costs of training and control group participation. Second, we examine the possibility that using a less expensive proxy criterion measure in place of the target criterion measure of interest when evaluating the training effectiveness can pay off. We show that using a proxy criterion increases the sample size needed to achieve a given level of statistical power, and then we describe procedures for examining the tradeoff between the costs saved by using the less expensive proxy criterion and the costs incurred by the larger sample size.
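The first idea, unequal allocation when trained subjects cost more to run than controls, can be illustrated with the familiar square-root allocation heuristic (control-to-training ratio equal to sqrt(cost_train / cost_control)). The article derives exact formulas for ANOVA and ANCOVA designs, which are not reproduced here; the cost and effect-size figures below are illustrative.

```python
# Rough illustration: when trained subjects cost more than controls, an unequal
# split can buy more power per dollar. Uses the common square-root allocation
# heuristic, not the article's exact formulas; all numbers are illustrative.
from scipy import stats

def power(n_t, n_c, d=0.4, alpha=0.05):
    """Approximate power for a two-group comparison with unequal group sizes."""
    se = (1 / n_t + 1 / n_c) ** 0.5
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.cdf(d / se - z_alpha)

budget, cost_train, cost_control = 30_000, 300.0, 60.0

# Equal group sizes.
n_eq = int(budget / (cost_train + cost_control))
# Square-root rule: put proportionally more subjects in the cheaper control group.
ratio = (cost_train / cost_control) ** 0.5
n_t = int(budget / (cost_train + ratio * cost_control))
n_c = int(ratio * n_t)

print(f"equal:   n_T={n_eq}, n_C={n_eq}, power={power(n_eq, n_eq):.2f}")
print(f"unequal: n_T={n_t}, n_C={n_c}, power={power(n_t, n_c):.2f}")
```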
Article
Full-text available
Purpose: This paper seeks to estimate the importance of various factors affecting the choice of fast food outlets by young Indian consumers. Design/methodology/approach: The study applies multivariate statistical tools to estimate the importance of various factors affecting the choice of fast food outlets by young Indian consumers. In addition, the authors analysed consumption patterns, the impact of hygiene and nutritional values, and the rating of various attributes of McDonald's and Nirula's. Findings: Results indicate that young Indian consumers have a passion for visiting fast food outlets for fun and change, but home food is their first choice. They feel homemade food is much better than food served at fast food outlets. They place the highest value on taste and quality (nutritional values), followed by ambience and hygiene. Three dimensions (service and delivery, product, and quality) of fast food outlets' attributes are identified based on the factor analysis results. The two fast food outlets' ratings differ significantly on the seven attributes. McDonald's scores are higher on all attributes except "variety". Further, consumers feel that fast food outlets must provide additional information on nutritional values and hygiene conditions inside the kitchen. Practical implications: Fast food providers need to focus on quality and variety of food besides other service parameters. There is a need to communicate information about the hygiene and nutritional value of fast food, which will help in building trust in the food provided by fast food players. Originality/value: The paper estimates the importance of various factors affecting the choice of fast food outlets by young Indian consumers.
Article
Full-text available
Do entrepreneurship education programmes influence participants’ attitudes and perceptions towards entrepreneurial intentions and therefore the entrepreneurial behaviour itself? Researchers and entrepreneurship education stakeholders alike (public institutions, academic authorities, teachers, etc.) have been looking into this question for quite a while, with a view to validating the efficacy of such programmes. The authors of this paper propose to operationalize the concept of entrepreneurial intention and its antecedents in an attempt to answer this question. A study to measure the effects of an entrepreneurship education programme (EEP) and the factors that may influence the participants in this type of programme is presented. Our main research results show that the positive effects of an EEP are all the more marked where previous entrepreneurial exposure has been weak or inexistent. Conversely, for those students who had previously been exposed to entrepreneurship, the results highlight significant counter-effects.
Article
Full-text available
Unexpected events, particularly those creating surprise, interrupt ongoing mental and behavioral processes, creating an increased potential for unwanted outcomes to the situation. Human reactions to unexpected events vary. One can hypothesize a number of reasons for this variation, including level of domain expertise, previous experience with similar events, emotional connotation, and the contextual surround of the event. Whereas interrupting ongoing activities and focusing attention temporarily on a surprising event may be a useful evolutionary response to a threatening situation, the same process may be maladaptive in today's highly dynamic world. The purpose of this study was to investigate how different aspects of expertise affected one's ability to detect and react to an unexpected event. It was hypothesized that there were two general types of expertise, domain expertise and judgment (Hammond, 2000), which influenced one's performance on dealing with an unexpected event. The goal of the research was to parse out the relative contribution of domain expertise, so the role of judgment could be revealed. The research questions for this study were: (a) Can we identify specific knowledges and skills which enhance one's ability to deal with unexpected events? (b) Are these skills "automatically" included in domain expertise? (c) How does domain expertise improve or deter one's reaction and response to unexpected events? (d) What role does judgment play in responding to surprise? The general hypothesis was that good judgment would influence the process of surprise at different stages and in different ways than would domain expertise. The conclusions from this research indicated that good judgment had a significant positive effect in helping pilots deal with unexpected events. This was most pronounced when domain expertise was low.
Article
Full-text available
This study analyzes the validity of the criterion of 80% statistical power for the confidence intervals (CIs) of means not to overlap. Several simulations indicate that the minimal power for the limits of two means not to overlap is .80 when the CI is at 95%; when the CI is at 99%, it is .86; and when the CI is at 90%, it is .75. If there are more than two means, the minimal power increases considerably, and the increase is even greater when the population means do not increase monotonically. Therefore, to guarantee that the limits do not overlap, in most of the situations analyzed it is necessary to compute the minimum number of observations directly, the conventional criterion of a minimal power of .80 being of little use.
Article
Full-text available
Meta-analysis procedures were applied to the results of 70 managerial training (MT) studies. The meta-analysis results for 34 distributions of MT effects representing 6 training-content areas, 7 training methods, and 4 types of criteria (subjective learning, objective learning, subjective behavior, and objective results) indicated that MT was moderately effective. For 12 of the 17 MT method distributions, the 90% lower-bound credibility values were positive, and thus the effectiveness of these training methods, at least minimally, can be generalized to new situations. A list of the 70 MT studies is included. (97 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Power in a randomized between-Ss design can be enhanced by increasing sample size or increasing alpha level. This article compares 2 other methods of increasing power by reducing within-group error: (1) by adding a pretest to the design and using analysis of covariance (ANCOVA) and (2) by increasing the length of the posttest and using analysis of variance (ANOVA). Results showed that the relative power of these approaches depends on the degree to which the posttest is lengthened, on the reliability of the posttest, and on the pretest–posttest correlation. When reliability or the pretest–posttest correlation is low, doubling the length of the posttest makes ANOVA more powerful than ANCOVA conducted on the original measures. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
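The comparison can be sketched numerically: Spearman–Brown projects the reliability of a lengthened posttest, while ANCOVA on the original measure reduces error variance by a factor of (1 - rho^2). With the modest reliability and pretest–posttest correlation assumed below (all values illustrative, and a normal approximation to power), doubling the posttest makes ANOVA slightly more powerful than ANCOVA, in line with the abstract.

```python
# Sketch: ANCOVA with a pretest covariate vs. ANOVA on a posttest lengthened by
# a factor k (Spearman-Brown). Parameter values are illustrative assumptions.
from scipy import stats

def power_from_d(d_obs, n_per_group, alpha=0.05):
    """Approximate two-group power (normal approximation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.cdf(d_obs * (n_per_group / 2) ** 0.5 - z_alpha)

n, d_true, rel, rho_xy, k = 40, 0.5, 0.60, 0.40, 2   # rho_xy: observed pre-post r

# ANOVA on the original posttest vs. on a posttest of k times the length.
d_orig = d_true * rel ** 0.5
rel_k = k * rel / (1 + (k - 1) * rel)                # Spearman-Brown projection
d_long = d_true * rel_k ** 0.5

# ANCOVA on the original posttest: error variance shrinks by (1 - rho_xy**2).
d_ancova = d_orig / (1 - rho_xy ** 2) ** 0.5

for label, d in [("ANOVA, original length", d_orig),
                 ("ANOVA, doubled length", d_long),
                 ("ANCOVA, original length", d_ancova)]:
    print(f"{label:24s} power = {power_from_d(d, n):.2f}")
```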
Article
Full-text available
The relationship of the power of (Fisher) F tests to the expected mean squares, E(MS), in the analysis of variance is discussed. While the components of variance in the E(MS) are largely a function of nature, the coefficients associated with them are matters of experimental design. Frequently a different cost is associated with each type of experimental unit represented by the different coefficients. It is possible to maximize power relative to cost by optimal allocation of available resources among the various types of experimental units––for example, numbers of Ss, duplicate measures, replicates, etc. A simple index of relative power, the ratio of the estimated F ratio to F alpha, is proposed as useful in choosing the allocation of resources most likely to yield significant results.
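A small sketch of the proposed index, taking the ratio of expected mean squares as a stand-in for the estimated F and dividing it by the critical value F_alpha; values above 1 suggest an allocation likely to yield a significant result. The variance components, group count, and sample sizes below are illustrative assumptions.

```python
# Sketch of the relative-power index described above: expected F (approximated
# by the ratio of expected mean squares) divided by the critical value F_alpha.
# Variance components and design sizes are illustrative.
from scipy import stats

def relative_power_index(var_between, var_within, n_per_group, k_groups, alpha=0.05):
    expected_f = 1 + n_per_group * var_between / var_within      # E(MS_b) / E(MS_w)
    f_crit = stats.f.ppf(1 - alpha, k_groups - 1, k_groups * (n_per_group - 1))
    return expected_f / f_crit

for n in (5, 10, 20):
    print(f"n={n:2d} per group: index = {relative_power_index(0.5, 4.0, n, 3):.2f}")
```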
Article
It is written for undergraduate and graduate students, and for practitioners who are concerned with needs assessment, systematic development, and thoughtful evaluation of training programs in a variety of organizational settings. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
It [the book] brings together research findings from experts in industrial and organizational psychology, organizational behavior, management, and related disciplines to identify effective new approaches to the development, application, and evaluation of training in the workplace. In fourteen original chapters, the contributors present a wide range of strategies for improving training—offering ways to better evaluate training needs, design more effective training methods, and more accurately measure results. They discuss methods of conducting job analysis surveys—and show how to structure questionnaires to get the most useful information about the organization, the individual, and the task to be performed. "Training and Development in Organizations" provides models for measuring the benefits of training in terms of increased output, payroll savings, and more. The contributors offer a variety of instructional techniques based on cognitive and behavioral theory, including training in self-management and training through reinforcement. They examine how retraining midcareer and older workers can enhance job performance. And they explain how effective training goes beyond formal instruction in the job to include such diverse factors as work-group settings, informal training by peers, and the socialization process of the newcomer. (PsycINFO Database Record (c) 2012 APA, all rights reserved)