Article

# Generalized Eta and Omega Squared Statistics: Measures of Effect Size for Some Common Research Designs


## Abstract

The editorial policies of several prominent educational and psychological journals require that researchers report some measure of effect size along with tests for statistical significance. In analysis of variance contexts, this requirement might be met by using eta squared or omega squared statistics. Current procedures for computing these measures of effect often do not consider the effect that design features of the study have on the size of these statistics. Because research-design features can have a large effect on the estimated proportion of explained variance, the use of partial eta or omega squared can be misleading. The present article provides formulas for computing generalized eta and omega squared statistics, which provide estimates of effect size that are comparable across a variety of research designs.
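As a rough sketch of the distinction the abstract draws, the snippet below contrasts partial and generalized eta squared for a simple one-way repeated-measures design, where the only non-manipulated source of variance is subjects. The sums of squares are hypothetical, and the formulas follow the article as summarized by Bakeman (2005); other designs require additional variance components in the denominator of η²G.

```python
# Sketch: partial vs. generalized eta squared in a one-way
# repeated-measures design. Sums of squares are made up for
# illustration; formulas per Olejnik & Algina (2003) / Bakeman (2005).

def eta_squared_partial(ss_effect, ss_error):
    """eta^2_p = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def eta_squared_generalized(ss_effect, ss_subjects, ss_error):
    """When subjects are the only measured (non-manipulated) factor:
    eta^2_G = SS_effect / (SS_effect + SS_subjects + SS_error)."""
    return ss_effect / (ss_effect + ss_subjects + ss_error)

ss_effect, ss_subjects, ss_error = 40.0, 120.0, 40.0
print(round(eta_squared_partial(ss_effect, ss_error), 3))                   # 0.5
print(round(eta_squared_generalized(ss_effect, ss_subjects, ss_error), 3))  # 0.2
```

The same effect looks much larger under the partial statistic because subject variance is excluded from its denominator, which is exactly the comparability problem the article addresses.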


... η̂²p and ω̂²p are not to be confused with the (generalized) η̂²G and ω̂²G. The latter two measures have been proposed to make effect sizes comparable across studies with different designs (e.g., Fleiss, 1969; Olejnik & Algina, 2003). These examples are typical experiments from cognitive psychology. ...
... In the previous section, we have described a strategy that requires specifying the exact mean and covariance structure or knowing the correct d at a population level. In the present section, we will consider an "effect size approach": A researcher might have an idea about the effect size of an interaction or a main effect (for an overview and a review of common effect size measures, see e.g., Bakeman, 2005; Carroll & Nordholm, 1975; Cohen, 1973; Keselman et al., 1998; Lakens, 2013; Levine & Hullett, 2002; Olejnik & Algina, 2000, 2003; Richardson, 2011; Steiger, 2004), and is now confronted with transforming these values to d. Two cases can be distinguished. ...
... The F-statistic is the ratio of mean sums of squares (Nesselroade & Cattell, 1988), where MS_effect is the mean sum of squares from the main or interaction effect of interest (e.g., an interaction between factor A and factor B), and MS_error is the mean error sum of squares. Both are defined as the respective sums of squares divided by the corresponding degrees of freedom df1 (effect) and df2 (error). The effect sizes η²p and ω²p (e.g., Bakeman, 2005; Carroll & Nordholm, 1975; Cohen, 1973; Keselman et al., 1998; Lakens, 2013; Levine & Hullett, 2002; Olejnik & Algina, 2000, 2003; Richardson, 2011; Steiger, 2004) are estimated as: ...
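The estimation formulas elided from the excerpt above can be sketched from the F-ratio alone. The snippet below uses the standard identity η²p = df1·F / (df1·F + df2) and one common between-subjects estimator of ω²p; the F value and sample size are hypothetical, not taken from any cited study.

```python
def partial_eta_squared_from_F(F, df1, df2):
    """eta^2_p = (df1 * F) / (df1 * F + df2), algebraically equal to
    SS_effect / (SS_effect + SS_error)."""
    return (df1 * F) / (df1 * F + df2)

def partial_omega_squared_from_F(F, df1, N):
    """One common between-subjects estimator:
    omega^2 = df1 * (F - 1) / (df1 * (F - 1) + N),
    where N is the total number of observations."""
    return (df1 * (F - 1)) / (df1 * (F - 1) + N)

# Hypothetical result: F(2, 57) = 4.5 from N = 60 observations.
print(round(partial_eta_squared_from_F(4.5, 2, 57), 3))    # 0.136
print(round(partial_omega_squared_from_F(4.5, 2, 60), 3))  # 0.104
```

The omega statistic is smaller because it corrects for the positive bias of eta squared in finite samples.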
Article
Full-text available
The a priori calculation of statistical power has become common practice in behavioral and social sciences to calculate the necessary sample size for detecting an expected effect size with a certain probability (i.e., power). In multi-factorial repeated measures ANOVA, these calculations can sometimes be cumbersome, especially for higher-order interactions. For designs that only involve factors with two levels each, the paired t test can be used for power calculations, but some pitfalls need to be avoided. In this tutorial, we provide practical advice on how to express main and interaction effects in repeated measures ANOVA as single difference variables. In particular, we demonstrate how to calculate the effect size Cohen’s d of this difference variable either based on means, variances, and covariances of conditions or by transforming η²p or ω²p from the ANOVA framework into d. With the effect size correctly specified, we then show how to use the t test for sample size considerations by means of an empirical example. The relevant R code is provided in an online repository for all example calculations covered in this article.
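As a minimal illustration of the transformation this abstract describes, the sketch below converts a partial eta squared for a one-df within-subjects effect into Cohen's dz. The η²p value and pair count are hypothetical, and the algebra assumes the paired t-test identity η²p = t² / (t² + df).

```python
import math

def dz_from_partial_eta_squared(eta2_p, n):
    """For a one-df within-subjects effect tested on n pairs
    (df = n - 1): eta^2_p = t^2 / (t^2 + df), so
    t = sqrt(eta2_p * df / (1 - eta2_p)) and d_z = t / sqrt(n)."""
    df = n - 1
    t = math.sqrt(eta2_p * df / (1 - eta2_p))
    return t / math.sqrt(n)

# Hypothetical effect: eta^2_p = 0.20 observed with n = 26 pairs.
print(round(dz_from_partial_eta_squared(0.2, 26), 3))  # 0.49
```

This is the direction (η²p to d) needed when an ANOVA effect size from the literature must feed a t-test-based power calculation.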
... 33,34 Partial omega squared (ω²p) with 95% confidence intervals was calculated to report effect sizes. 35 All data are shown as mean ± SEM, and the alpha level for all comparisons was set at p < 0.05. Analyses were conducted in R v. 4.1.1 and GraphPad Prism v. 8.3. ...
... Sry− mice consumed more EtOH than Sry+ mice and XX mice consumed more than XY mice. A three-way ANOVA identified main effects of the Sry gene (F(1,35) ...). Additional analyses on body weights across EtOH concentrations revealed that Sry+ mice weighed more than Sry− mice at 10%, 15%, and 20% concentrations (Table 1). There were no significant interactions between Sry and chromosomes (p > 0.20 for all; ω² < 0.01 for all). ...
Article
Alcohol use and high-risk alcohol drinking behaviours among women are rapidly rising. In rodent models, females typically consume more ethanol (EtOH) than males. Here, we used the four core genotypes (FCG) mouse model to investigate the influence of gonadal hormones and sex chromosome complement on EtOH drinking behaviours. FCG mice were given access to escalating concentrations of EtOH in a two-bottle, 24-h continuous access drinking paradigm to assess consumption and preference. Relapse-like behaviour was measured by assessing escalated intake following repeated cycles of deprivation and re-exposure. Twenty-four-hour EtOH consumption was greater in mice with ovaries (Sry-), relative to those with testes, and in mice with the XX chromosome complement, relative to those with XY sex chromosomes. EtOH preference was higher in XX versus XY mice. For both consumption and preference, the influences of the Sry gene and sex chromosomes were concentration dependent. Escalated intake following repeated cycles of deprivation and re-exposure emerged only in XX mice (vs. XY). Mice with ovaries (Sry- FCG mice and C57BL/6J females) were also found to consume more water than mice with testes. These results demonstrate that aspects of EtOH drinking behaviour may be independently regulated by sex hormones and chromosomes and inform our understanding of the neurobiological mechanisms which contribute to EtOH dependence in male and female mice. Future investigation of the contribution of sex chromosomes to EtOH drinking behaviours is warranted. We used the FCG mouse model to investigate the influence of gonadal hormones and sex chromosome complement on EtOH drinking behaviours, including the alcohol deprivation effect. Escalated intake following repeated cycles of deprivation and re-exposure emerged only in XX mice (vs. XY). These results demonstrate that aspects of EtOH drinking behaviour may be independently regulated by sex hormones and chromosomes.
... Alpha level was set to 0.05 for all analyses. Generalized eta squared (η²G) was calculated for repeated measures ANOVA as a measure of effect size (Olejnik and Algina, 2003; Bakeman, 2005). For all non-repeated measures ANOVA, eta squared (η²) was reported. Standard effect size guidelines were applied to interpretations of η²G and η², with 0.02 as a small effect, 0.13 as a medium effect, and 0.26 as a large effect (Olejnik and Algina, 2003; Bakeman, 2005). ...
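The benchmarks quoted above can be wrapped in a small helper. The thresholds (0.02 small, 0.13 medium, 0.26 large) come directly from the excerpt; the function name and the "negligible" label for values below 0.02 are my own additions.

```python
def describe_eta_squared(es):
    """Qualitative label for an eta squared value, using the
    0.02 / 0.13 / 0.26 guidelines (Bakeman, 2005)."""
    if es >= 0.26:
        return "large"
    if es >= 0.13:
        return "medium"
    if es >= 0.02:
        return "small"
    return "negligible"

print(describe_eta_squared(0.15))  # medium
```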
Article
Full-text available
Oral contraceptives and hormone therapies require a progestogen component to prevent ovulation, curtail uterine hyperplasia, and reduce gynecological cancer risk. Diverse classes of synthetic progestogens, called progestins, are used as natural progesterone alternatives due to progesterone’s low oral bioavailability. Progesterone and several synthetic analogs can negatively impact cognition and reverse some neuroprotective estrogen effects. Here, we investigate drospirenone, a spironolactone-derived progestin, which has unique pharmacological properties compared to other clinically-available progestins and natural progesterone, for its impact on spatial memory, anxiety-like behavior, and brain regions crucial to these cognitive tasks. Experiment 1 assessed three drospirenone doses in young adult, ovariectomized rats, and found that a moderate drospirenone dose benefited spatial memory. Experiment 2 investigated this moderate drospirenone dose with and without concomitant ethinyl estradiol (EE) treatment, the most common synthetic estrogen in oral contraceptives. Results demonstrate that the addition of EE to drospirenone administration reversed the beneficial working memory effects of drospirenone. The hippocampus, entorhinal cortex, and perirhinal cortex were then probed for proteins known to elicit estrogen- and progestin- mediated effects on learning and memory, including glutamate decarboxylase (GAD)65, GAD67, and insulin-like growth factor receptor protein expression, using western blot. EE increased GAD expression in the perirhinal cortex. Taken together, results underscore the necessity to consider the distinct cognitive and neural impacts of clinically-available synthetic estrogen and progesterone analogs, and why they produce unique cognitive profiles when administered together compared to those observed when each hormone is administered separately.
... Where sphericity was violated, the Greenhouse-Geisser correction was applied. The generalised effect size (Olejnik and Algina, 2003), which estimates the proportion of variability explained by the within-subjects factor, is reported with each F test. Qualitative descriptors of effect size are consistent with Cohen's benchmarks, with small, medium and large effects ascribed to effect sizes of 0.2, 0.5 and 0.8, respectively (Olejnik and Algina, 2003). Pairwise t-tests were used to interrogate main effects and interactions, and reported p-values are adjusted for multiple comparisons using the Bonferroni correction. ...
Article
Full-text available
Cortical processing of binocular disparity is believed to begin in V1 where cells are sensitive to absolute disparity, followed by the extraction of relative disparity in higher visual areas. While much is known about the cortical distribution and spatial tuning of disparity-selective neurons, the relationship between their spatial and temporal properties is less well understood. Here, we use steady-state Visual Evoked Potentials and dynamic random dot stereograms to characterize the temporal dynamics of spatial mechanisms in human visual cortex that are primarily sensitive to either absolute or relative disparity. Stereograms alternated between disparate and non-disparate states at 2 Hz. By varying the disparity-defined spatial frequency content of the stereograms from a planar surface to corrugated ones, we biased responses towards absolute vs. relative disparities. Reliable Components Analysis was used to derive two dominant sources from the 128 channel EEG records. The first component (RC1) was maximal over the occipital pole. In RC1, first harmonic responses were sustained, tuned for corrugation frequency, and sensitive to the presence of disparity references, consistent with prior psychophysical sensitivity measurements. By contrast, the second harmonic, associated with transient processing, was not spatially tuned and was indifferent to references, consistent with it being generated by an absolute disparity mechanism. Thus, our results reveal a duplex coding strategy in the disparity domain, where relative disparities are computed via sustained mechanisms and absolute disparities are computed via transient mechanisms.
... were necessary. For one-way ANOVA with repeated measures, three different types of effect sizes have been suggested, namely partial eta squared (η²p), generalized eta squared (η²G) (Lakens, 2013; Olejnik & Algina, 2003), and omega squared (Field, 2017). Partial eta squared can be calculated in SPSS and is therefore widely used; however, it has been found to be misleading and imprecise (Field, 2017; Lakens, 2013; Olejnik & Algina, 2003). Thus, Field (2017) ... (Field, 2017). ...
Thesis
Full-text available
... The dependent variable of the ANOVAs was mean RT (see Appendix for ANOVAs on arcsine transformed error rates). In all ANOVAs, we report both η²p and η²G in order to provide useful measures of effect size for both power analysis and meta-analysis, respectively (Bakeman, 2005; Lakens, 2013; Olejnik & Algina, 2003). Cohen's dz is reported for paired t-tests, computed by dividing mean difference scores by their standard deviation (Brysbaert, 2019; Lakens, 2013). ...
Article
Full-text available
... Table 8 depicts that the significance value (p-value) of the calculated F-test was 0.000, which shows that all three explanatory variables are highly significant. The application of multiple regression to the respondents and its constituent variables can be displayed with the help of the equation below (Olejnik and Algina, 2003). Responses of Wellness Tourists (Y) = -.002 ...
Article
On a global scale, people are increasingly turning to travel in order to invigorate themselves, relieve stress and lead a healthy life. There is therefore a great desire to add a wellness component to travel itineraries post COVID-19. The study aims to find the preferences and changing needs of wellness tourists post COVID-19. The study was conducted on 400 foreign tourists visiting India to find their preferences and changing needs pertaining to various variables of wellness tourism post COVID-19. Factor analysis was applied to reduce the 12 variables identified through scholarly literature into 3 factors: Core Wellness Services, Allied Wellness Services and Ancillary Wellness Services. Multiple regression was used to determine the factors impacting the preferences and needs of wellness tourists. The study indicates that core wellness services, i.e., yoga, Ayurveda, spirituality and meditation, have a stronger impact contributing to the satisfaction level of wellness tourists.
... It is also important to report some effect size measures that indicate whether the observed statistical differences among groups are of practical significance. For a two-way ANOVA and small sample size, the effect size measure omega squared (ω²) is recommended [77–82]. ω² also determines the percentage of the variation in the dependent variable attributable to the individual independent factors (i.e., species and measurement dates) [78]. ...
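The ω² statistic paraphrased above corresponds to the usual sums-of-squares estimator. The sketch below implements that standard formula with hypothetical values; it is not a claim about the cited study's exact computation.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error).

    Estimates the proportion of total variance in the dependent
    variable attributable to one factor, correcting for the upward
    bias of eta squared in small samples."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# Hypothetical two-way ANOVA: SS_species = 30 with df = 2,
# MS_error = 2, SS_total = 100.
print(round(omega_squared(30.0, 2, 2.0, 100.0), 3))  # 0.255
```

Note how the numerator subtracts the error variance expected under the null, which is why ω² can be negative for very small effects.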
Article
Full-text available
Photosynthetic light response curve parameters help us understand the interspecific variation in photosynthetic traits, leaf acclimation status, carbon uptake, and plant productivity in specific environments. These parameters are also influenced by leaf traits which rely on species and growth environment. In accessions of four amaranth species (Amaranthus hybridus, A. dubius, A. hypochondriacus, and A. cruentus), we determined variations in the net photosynthetic light response curves and leaf traits, and analysed the relationships between maximum gross photosynthetic rate, leaf traits, and whole-plant productivity. Non-rectangular hyperbolae were used for the net photosynthesis light response curves. Maximum gross photosynthetic rate (Pgmax) was the only variant parameter among the species, ranging from 22.29 to 34.21 μmol m⁻² s⁻¹. Interspecific variation existed for all the leaf traits except leaf mass per area and leaf inclination angle. Stomatal conductance, nitrogen, chlorophyll, and carotenoid contents, as well as leaf area, correlated with Pgmax. Stomatal conductance and leaf nitrogen explained much of the variation in Pgmax at the leaf level. At the plant level, the slope between absolute growth rate and leaf area showed a strong linear relationship with Pgmax. Overall, A. hybridus and A. cruentus exhibited higher Pgmax at the leaf level and light use efficiency at the whole-plant level than A. dubius and A. hypochondriacus. Thus, A. hybridus and A. cruentus tended to be more efficient with respect to carbon assimilation. These findings highlight the correlation between leaf photosynthetic characteristics, other leaf traits, and whole plant productivity in amaranths. Future studies may explore more species and accessions of Amaranthus at different locations or light environments.
... A similar problem arises when including additional fixed-effects covariates: Their inclusion will reduce the residual variance and therefore lead to inflated standardized effect sizes that are difficult to compare across studies that differ in terms of the covariates considered. This problem is well documented in the technical literature on effect sizes, in which we find methods that try to remedy it (Olejnik & Algina, 2003). Unfortunately, current methods for computing default Bayes Factors do not address this issue, which means that one risks obtaining inflated effects that are not easily generalizable. ...
Article
Full-text available
Statistical modeling is generally meant to describe patterns in data in service of the broader scientific goal of developing theories to explain those patterns. Statistical models support meaningful inferences when models are built so as to align parameters of the model with potential causal mechanisms and how they manifest in data. When statistical models are instead based on assumptions chosen by default, attempts to draw inferences can be uninformative or even paradoxical—in essence, the tail is trying to wag the dog. These issues are illustrated by van Doorn et al. (this issue) in the context of using Bayes Factors to identify effects and interactions in linear mixed models. We show that the problems identified in their applications (along with other problems identified here) can be circumvented by using priors over inherently meaningful units instead of default priors on standardized scales. This case study illustrates how researchers must directly engage with a number of substantive issues in order to support meaningful inferences, of which we highlight two: The first is the problem of coordination, which requires a researcher to specify how the theoretical constructs postulated by a model are functionally related to observable variables. The second is the problem of generalization, which requires a researcher to consider how a model may represent theoretical constructs shared across similar but non-identical situations, along with the fact that model comparison metrics like Bayes Factors do not directly address this form of generalization. For statistical modeling to serve the goals of science, models cannot be based on default assumptions, but should instead be based on an understanding of their coordination function and on how they represent causal mechanisms that may be expected to generalize to other related scenarios.
... For the adaptive assessment applications, the selection rule was outlined as a within-group factor in mixed-effects ANOVAs. Partial η² and generalized η² (Olejnik & Algina, 2003) effect sizes were used to quantify the relevance of these effects in fixed-effects and mixed-effects ANOVAs, respectively. All analyses were conducted using R software (R Core Team, 2020) and ...
Article
Full-text available
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in non-cognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, IRT models enable the estimation of non-ipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, a simulation study addressed the effect of 1) different bank assembly (with a randomly assembled bank, an optimally assembled bank, and blocks assembled on-the-fly considering every possible pair of items), and 2) block selection rules (i.e., T, and Bayesian D and A-rules) over the estimate accuracy and ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a non-adaptive questionnaire was included as baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on-the-fly, the T-rule under this method led to the worst results. This points out to the importance of considering both aspects when designing FC CAT.
... It shows that only one variable was found to have a statistically significant association with receptivity to research: cohort. None of the other interactions between the independent variables exhibited meaningful differences (Olejnik and Algina, 2003). This cohort effect suggests that different cohorts included in our sample scored differently on our measure of receptivity to research. ...
Article
Full-text available
Purpose This paper aims to investigate how evidence-based policing (EBP) is understood by police officers and citizens in Taiwan and the influence of police education on police recruit's receptivity to research evidence in policing. Design/methodology/approach The study uses a cross-sectional design that includes Taiwanese police officers ( n = 671) and a control group of Taiwanese criminology undergraduate students ( n = 85). A research instrument covering five themes is developed, and after a pilot test the final scale remains 14 items. Findings The analysis suggests that police officers in Taiwan generally hold a positive view towards the role of research and researchers in policing, more so than is often observed in similar studies conducted in Western countries. Receptivity to research was found to be significantly higher among the non-police sample compared to the police sample. Moreover, time spent in police education was significantly associated with lower levels of receptivity to research. Originality/value The paper makes two original contributions to the literature on police officer receptivity to research. It is the first paper to (1) empirically examine police officers' openness to, and use of research in an Asian setting and (2) to compare police officers' receptivity to research with those of a relevant non-police group.
... To test the statistical significance of the decision time differences between the patterns, we performed mixed ANOVA on the log-transformed decision times, with pattern and explanations as within-participant variables, and the content and the order of the explanations as between-participant variables. ANOVA, or analysis of variance, tests the differences between means in different experimental conditions (Howell, 1997; Olejnik and Algina, 2003). We used a log transform to normalize the skewed decision time distributions. ...
Article
Full-text available
Imaging science has approached subjective image quality (IQ) as a perceptual phenomenon, with an emphasis on thresholds of defects. The paradigmatic design of subjective IQ estimation, the two-alternative forced-choice (2AFC) method, however, requires viewers to make decisions. We investigated decision strategies in three experiments both by asking the research participants to give reasons for their decisions and by examining the decision times. We found that typical for larger quality differences is a smaller set of subjective attributes, resulting from convergent attention toward the most salient attribute, leading to faster decisions and better accuracy. Smaller differences are characterized by divergent attention toward different attributes and an emphasis on preferential attributes instead of defects. In larger differences, attributes have sigmoidal relationships between their visibility and their occurrence in explanations. For other attributes, this relationship is more random. We also examined decision times in different attribute configurations to clarify the heuristics of IQ estimation, and we distinguished a top-down-oriented Take-the-Best heuristic and a bottom-up visual salience-based heuristic. In all experiments, heuristic one-reason decision-making endured as a prevailing strategy independent of quality difference or task.
... All p-values reported below were adjusted with the Greenhouse-Geisser correction when the degrees of freedom in the numerator were larger than 1. Eta squared (η²) was reported to measure the effect size for ANOVAs (Olejnik and Algina, 2003). ...
Article
Full-text available
The two event-related potentials (ERP) studies investigated how verbs and nouns were processed in different music priming conditions in order to reveal whether the motion concept via embodiment can be stimulated and evoked across categories. Study 1 (Tasks 1 and 2) tested the processing of verbs (action verbs vs. state verbs) primed by two music types, with tempo changes (accelerating music vs. decelerating music) and without tempo changes (fast music vs. slow music) while Study 2 (Tasks 3 and 4) tested the processing of nouns (animate nouns vs. inanimate nouns) in the same priming condition as adopted in Study 1. During the experiments, participants were required to hear a piece of music prior to judging whether an ensuing word (verb or noun) is semantically congruent with the motion concept conveyed by the music. The results show that in the priming condition of music with tempo changes, state verbs and inanimate nouns elicited larger N400 amplitudes than action verbs and animate nouns, respectively in the anterior regions and anterior to central regions, whereas in the priming condition of music without tempo changes, action verbs elicited larger N400 amplitudes than state verbs and the two categories of nouns revealed no N400 difference, unexpectedly. The interactions between music and words were significant only in Tasks 1, 2, and 3. Taken together, the results demonstrate that firstly, music with tempo changes and music without tempo prime verbs and nouns in different fashions; secondly, action verbs and animate nouns are easier to process than state verbs and inanimate nouns when primed by music with tempo changes due to the shared motion concept across categories; thirdly, bodily experience differentiates between music and words in coding (encoding and decoding) fashion but the motion concept conveyed by the two categories can be subtly extracted on the metaphorical basis, as indicated in the N400 component. 
Our studies reveal that music tempos can prime different word classes, favoring the notion that embodied motion concept exists across domains and adding evidence to the hypothesis that music and language share the neural mechanism of meaning processing.
... Mixed-model analyses of variance (ANOVAs) were used to determine the main and interaction effects of layout (between-subject variable) and depth (within-subject variable) on performance variables, eye gaze features, and self-reported evaluation of the learning experience. The effect size was measured with generalized η² (Olejnik & Algina, 2003). Regarding post hoc analyses, we conducted pairwise t-tests with false discovery rate (FDR) adjustment and simple effect analyses for significant interaction effects. ...
Article
Full-text available
Many video sites or learning platforms allow real-time chatting or asynchronous commenting on specific time points during video lectures. Comments, as user-generated knowledge, facilitate social interaction but also affect cognitive learning. The visual layout of these comments can affect learners' attention and learning, but the effect has rarely been studied. This study compares two common layouts (embedded vs. separated) and considers the content depth of comments through a laboratory eye-tracking experiment involving 40 participants. The results suggest that, with both layouts, learners switched attention to the comments every 10 seconds and stayed focused for 1.3 seconds on average before returning attention to the video. With an embedded layout, learners switched attention more frequently to the comments and remembered more surface-level comments. With a separate layout presenting deep-level comments, learners searched for information faster and performed better on open-book quizzes. We outline the design implications of using timeline-anchored comments to promote online learning.
... For analyses on VO₂, we log-scaled body mass to meet the assumptions of homogeneity of variance. For each analysis, we also reported effect sizes (ω²) for variables using the sjstats package (Olejnik and Algina 2003). For the habitat suitability models, we created a suite of climatic suitability models for green salamanders under current and future climatic conditions. ...
Article
Rapid global change has increased interest in developing ways to identify suitable refugia for species of conservation concern. Correlative and mechanistic species distribution models (SDMs) represent two approaches to generate spatially‐explicit estimates of climate vulnerability. Correlative SDMs generate distributions using statistical associations between environmental variables and species presence data. In contrast, mechanistic SDMs use physiological traits and tolerances to identify areas that meet the conditions required for growth, survival and reproduction. Correlative approaches assume modeled environmental variables influence species distributions directly or indirectly; however, the mechanisms underlying these associations are rarely verified empirically. We compared habitat suitability predictions between a correlative‐only SDM, a mechanistic SDM and a correlative framework that incorporated mechanistic layers (‘hybrid models'). Our comparison focused on green salamanders Aneides aeneus, a priority amphibian threatened by climate change throughout their disjunct range. We developed mechanistic SDMs using experiments to measure the thermal sensitivity of resistance to water loss (ri) and metabolism. Under current climate conditions, correlative‐only, hybrid and mechanistic SDMs predicted similar overlap in habitat suitability; however, mechanistic SDMs predicted habitat suitability to extend into regions without green salamanders but known to harbor many lungless salamanders. Under future warming scenarios, habitat suitability depended on climate scenario and SDM type. Correlative and hybrid models predicted a 42% reduction or 260% increase in area considered to be suitable depending on the climate scenario. In mechanistic SDMs, energetically suitable habitat declined with both climate scenarios and was driven by the thermal sensitivity of ri. 
Our study indicates that correlative-only and hybrid approaches produce similar predictions of habitat suitability; however, discrepancies can arise for species that do not occupy their entire fundamental niche, which may hold consequences for the conservation planning of threatened species.
... With regard to the overall effect of a group, task, or interaction between two factors, pairwise t-test comparisons with Bonferroni corrections were applied. As a means of measuring effect size, generalized eta squared statistics were used [32]. Effect sizes were evaluated as small (0.01), medium (0.06), and large (above 0.14) based on the guidelines described by Cohen [33]. ...
Article
Full-text available
Balance can be a main factor contributing to success in many disciplines, and biathlon is a representative example. A more stable posture may be a key factor for shooting scores. The center of foot pressure (COP) is commonly recorded when evaluating postural control. As COP measurements are highly irregular and non-stationary, non-linear deterministic methods, such as entropy, are more appropriate for the analysis of COP displacement. The aim of our study was to investigate whether the longitudinal effects of biathlon training can elicit specific changes in postural control. Eight national-level biathletes, 15 non-athletes who had completed 3 months of shooting training prior to the experiment, and 15 non-athletes with no prior rifle-shooting experience took part in our study. The data was collected with the use of a force plate. Participants performed three balance tasks in quiet standing, the shooting position (internal focus–participants concentrated on maintaining the correct body position and rifle), and aiming at the target (external focus–participants concentrated on keeping the laser beam centered on the targets). Biathletes obtained significantly lower values of sample entropy compared to the other groups during the shooting and aiming at the target trials (p
... The t-test and ANOVA were used to compare dependent variables according to sociodemographic factors [63]. Effect sizes were also calculated: Cohen's d for the t-test (small = 0.2, medium = 0.5, and large = 0.8, based on benchmarks suggested by Cohen [64]) and eta squared for ANOVA (small = 0.01, medium = 0.06, large ≥ 0.14, as suggested by Olejnik and Algina [65]). The internal consistency of the instruments and subscales was assessed through Cronbach's alpha, whose values, according to DeVellis [66], are above the acceptable internal consistency threshold (α ≥ 0.60). ...
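The excerpt above pairs Cohen's d benchmarks with the eta squared benchmarks. For reference, a minimal pooled-SD implementation of Cohen's d for two independent samples (the sample data below are invented):

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    # Unbiased sample variances.
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd

print(round(cohens_d([1, 2, 3, 4], [2, 3, 4, 5]), 2))  # prints -0.77
```

By the benchmarks quoted above, |d| = 0.77 would count as a large effect.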
Article
Full-text available
This study extends the theory of planned behavior model and examines the humane factors (altruism, environmental knowledge, personal appearance concerns, attitude, perceived behavioral control, and subjective norms) that shape attitudes and buyer behavior toward cruelty-free cosmetics and the consumer characteristics that reflect their behavior toward such products. Recent global occurrences have affected human behavioral patterns, namely the COVID-19 pandemic, which we aim to study. Has behavior changed to become more ethical? A survey was carried out involving a sample of 425 Portuguese participants (a feminine culture), following a convenience- and snowball-sampling procedure. Significant correlations were found between environmental knowledge, subjective norms, and buyer behavior toward cruelty-free cosmetics with attitude, and between environmental knowledge and buyer behavior. Through structural equation modeling to evaluate the conceptual model, a good model fit was found, with the standardized values in the model being significant except for the regressions from perceived behavioral control and personal appearance concerns to buyer behavior toward cruelty-free cosmetics. Women present higher values than men on attitude, altruism, environmental knowledge, and buyer behavior, in line with what is expected in a traditional and conservative feminine culture such as that to be found in Portugal. Such a result points to the need to promote increased gender equality, for example, in senior leadership roles, as women are seen to have the desirable qualities required for a more sustainable, cruelty-free, and humane society. This is an alert for human-resource managers in the region.
... To reliably measure a twofold change in the levels of microRNAs relative to NGTs (the equivalent of one Ct-value difference by qPCR) and assuming a standard deviation (SD) of 30% of the mean (effect size = 1.667) 14 with α = 0.05, power = 87%, we would need eight individuals per group. With samples from 8 pre-diabetes and 10 NGT participants at all three time points, the observed SDs were much smaller (<13% of mean; all time points) than the expected 30%, thereby offering the desired statistical power for these analyses. ...
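The power calculation in the excerpt above (effect size 1.667, α = 0.05, power 87%) can be approximated with the standard normal-approximation formula for a two-sample t-test; this is a sketch, not the authors' exact procedure, and the exact t-based calculation gives a slightly larger n, consistent with the eight individuals per group they report.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.87):
    """Approximate per-group n for a two-sample, two-tailed t-test,
    using the normal approximation n = 2 * (z_a + z_b)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# Normal approximation gives 7; the exact t-based answer rounds up to 8.
print(n_per_group(1.667))
```

The approximation understates n by roughly one subject per group at small samples, because the t critical value exceeds the normal one.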
Article
Full-text available
With type 2 diabetes presenting at younger ages, there is a growing need to identify biomarkers of future glucose intolerance. A high (20%) prevalence of glucose intolerance at 18 years was seen in women from the Pune Maternal Nutrition Study (PMNS) birth cohort. We investigated the potential of circulating microRNAs in risk stratification for future pre-diabetes in these women. Here, we provide preliminary longitudinal analyses of circulating microRNAs in normal glucose tolerant (NGT@18y, N = 10) and glucose intolerant ( N = 8) women (ADA criteria) at 6, 12 and 17 years of their age using discovery analysis (OpenArray™ platform). Machine-learning workflows involving Lasso with bootstrapping/leave-one-out cross-validation identified microRNAs associated with glucose intolerance at 18 years of age. Several microRNAs, including miR-212-3p, miR-30e-3p and miR-638, stratified glucose-intolerant women from NGT at childhood. Our results suggest that circulating microRNAs, longitudinally assessed over 17 years of life, are dynamic biomarkers associated with and predictive of pre-diabetes at 18 years of age. Validation of these findings in males and remaining participants from the PMNS birth cohort will provide a unique opportunity to study novel epigenetic mechanisms in the life-course progression of glucose intolerance and enhance current clinical risk prediction of pre-diabetes and progression to type 2 diabetes.
... The Matlab toolbox "Measures of Effect Size" version 1.6.1 (Hentschke and Stüttgen, 2011) was used to calculate the effect sizes: Cohen's U3 for Mann-Whitney U-tests, η² for analyses of variance, Hedges' g1 for one-sample t-tests, and Glass' Δ for two-sample t-tests. Partial eta squared (η²p) and generalized eta squared (η²G) were calculated by us for repeated-measures analyses of variance (rmANOVA) (Olejnik and Algina, 2003; Bakeman, 2005). In all tests, an alpha level of 0.05 was used for significance. ...
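The distinction the excerpt above draws between η²p and η²G comes down to what sits in the denominator. For a one-way repeated-measures design the contrast can be sketched as follows (the sums of squares below are invented):

```python
def eta_squared_partial(ss_effect, ss_error):
    """Partial eta squared: only the effect's error term in the denominator."""
    return ss_effect / (ss_effect + ss_error)

def eta_squared_generalized(ss_effect, ss_subjects, ss_error):
    """Generalized eta squared for a one-way repeated-measures design:
    subject variance stays in the denominator (Olejnik & Algina, 2003)."""
    return ss_effect / (ss_effect + ss_subjects + ss_error)

# Same data: eta_p^2 >= eta_G^2, and the gap grows with subject variance.
print(eta_squared_partial(20.0, 30.0))            # prints 0.4
print(eta_squared_generalized(20.0, 50.0, 30.0))  # prints 0.2
```

This is why papers that report η²p for within-subject effects tend to show larger values than between-subject studies of the same phenomenon, and why η²G is preferred for cross-design comparison.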
Article
Full-text available
... Corrected degrees of freedom using the Greenhouse-Geisser technique are always reported for repeated measures with more than two levels. Generalized eta squared (η²G) is provided as an effect size statistic (Bakeman, 2005; Lakens, 2013; Olejnik & Algina, 2003) for all omnibus tests. Post hoc comparisons were Holm-Bonferroni corrected to control for multiple comparisons. ...
Preprint
Full-text available
The idea that there is a self-controlled learning advantage, where individuals demonstrate improved motor learning after exercising choice over an aspect of practice compared to no-choice groups, has different causal explanations according to the OPTIMAL theory or an information-processing perspective. Within OPTIMAL theory, giving learners choice is considered an autonomy-supportive manipulation that enhances expectations for success and intrinsic motivation. In the information-processing view, choice allows learners to engage in performance-dependent strategies that reduce uncertainty about task outcomes. To disentangle these potential explanations, we provided participants in choice and yoked groups with error or graded feedback (Experiment 1) and binary feedback (Experiment 2) while learning a rapid reaching task with spatial and timing goals. Across both experiments (N = 228 participants), we did not find evidence to support a self-controlled learning advantage. Exercising choice during practice did not increase perceptions of autonomy, competence, or intrinsic motivation, nor did it lead to more accurate error estimation skills. Both error and graded feedback facilitated skill acquisition and learning, whereas no improvements from pre-test performance were found with binary feedback. Finally, the impact of graded and binary feedback on perceived competence highlights a potential dissociation of motivational and informational roles of feedback. Although our results regarding self-controlled practice conditions are difficult to reconcile with either the OPTIMAL theory or the information-processing perspective, they are consistent with a growing body of evidence that strongly suggests self-controlled conditions are not an effective approach to enhance motor performance and learning.
... For these tests, the clusters we are using are the true UFD remnant groups because we take the labels directly from the simulations. To quantify the level of the effect, we also calculate the ω² value of each test (e.g., Olejnik & Algina 2003). This metric is similar to R² in the context of regression analysis while also accounting for the degrees of freedom in the model. ...
Article
Full-text available
The Milky Way has accreted many ultra-faint dwarf galaxies (UFDs), and stars from these galaxies can be found throughout our Galaxy today. Studying these stars provides insight into galaxy formation and early chemical enrichment, but identifying them is difficult. Clustering stellar dynamics in 4D phase space ( E , L z , J r , J z ) is one method of identifying accreted structure that is currently being utilized in the search for accreted UFDs. We produce 32 simulated stellar halos using particle tagging with the Caterpillar simulation suite and thoroughly test the abilities of different clustering algorithms to recover tidally disrupted UFD remnants. We perform over 10,000 clustering runs, testing seven clustering algorithms, roughly twenty hyperparameter choices per algorithm, and six different types of data sets each with up to 32 simulated samples. Of the seven algorithms, HDBSCAN most consistently balances UFD recovery rates and cluster realness rates. We find that, even in highly idealized cases, the vast majority of clusters found by clustering algorithms do not correspond to real accreted UFD remnants and we can generally only recover 6% of UFDs remnants at best. These results focus exclusively on groups of stars from UFDs, which have weak dynamic signatures compared to the background of other stars. The recoverable UFD remnants are those that accreted recently, z accretion ≲ 0.5. Based on these results, we make recommendations to help guide the search for dynamically linked clusters of UFD stars in observational data. We find that real clusters generally have higher median energy and J r , providing a way to help identify real versus fake clusters. We also recommend incorporating chemical tagging as a way to improve clustering results.
... The posterior odds and 95% credible intervals (95% CI) were reported. We also calculated omega squared (ω²) for ANOVA to estimate the effect size (ES) for the differences between our groups, to ensure less biased estimations of variance across aspects of the design (Olejnik and Algina, 2003; Lakens, 2013). Effect sizes were set at: ω² > 0.01 = small; ω² > 0.06 = moderate; ω² > 0.14 = large (Field, 2013). ...
Article
Full-text available
CITATION Alhamdan AA, Murphy MJ and Crewther SG (2022) Age-related decrease in motor contribution to multisensory reaction times in primary school children. Front. Hum. Neurosci. 16:967081. Traditional measurement of multisensory facilitation in tasks such as speeded motor reaction tasks (MRT) consistently shows age-related improvement during early childhood. However, the extent to which motor function increases with age and hence contributes to multisensory motor reaction times in young children has seldom been examined. Thus, we aimed to investigate the contribution of motor development to measures of multisensory (auditory, visual, and audiovisual) and visuomotor processing tasks in three young school-age groups of children (n = 69) aged 5−6 (n = 21), 7−8 (n = 25), and 9−10 (n = 18) years. We also aimed to determine whether age-related sensory threshold times for purely visual inspection time (IT) tasks improved significantly with age. Bayesian results showed decisive evidence for age-group differences in multisensory MRT and visuo-motor processing tasks, though the evidence showed that threshold time for visual identification IT performance was only slower in the youngest age group of children (5−6) compared to older groups. Bayesian correlations between performance on the multisensory MRT and visuo-motor processing tasks indicated moderate to decisive evidence in favor of the alternative hypothesis (BF10 = 4.71 to 91.346), though not with the threshold IT (BF10 < 1.35). This suggests that visual sensory system development in children older than 6 years makes a less significant contribution to the measure of multisensory facilitation, compared to motor development. In addition to this main finding, multisensory facilitation of MRT within race-model predictions was only found in the oldest group of children (9−10), supporting previous suggestions that multisensory integration is likely to continue into late childhood/early adolescence at least.
... To calculate the gain or effect size, we preferred omega squared (ω²), which is an estimate of how much variance in the dependent variables (grammar and listening) is accounted for by the independent variable (CLIL approach with language tuition). ω² is recommended for small samples (Olejnik & Algina, 2003) and categorized as small effect = 0.01, medium effect = 0.06, ...
... For the ANOVA analyses, we report generalized eta squared as the measure of effect size (η²G; Olejnik & Algina, 2003). We chose this measure of effect size to facilitate comparability of effect sizes across different research designs, as the mixed design in our study is not a standard design. ...
Article
Full-text available
This study reports a field experiment investigating how instructional videos with and without background music contribute to the learning of examination techniques within a formal curriculum of medical teaching. Following a classroom teaching unit on the techniques for examining the knee and the shoulder joint, our participants (N = 175) rehearsed the studied techniques for either the knee or the shoulder joint with an instructional video with or without background music. As dependent measures, we collected a general questionnaire, a prediction of test performance, as well as performance on an exam-like knowledge test covering both joints. For both videos, the participants who had watched the particular video during rehearsal were more accurate in answering the corresponding questions than the participants who had seen the other video, signaling that instructional videos provide a useful tool for rehearsal (i.e., both groups reciprocally served as control groups). For the knee video (less difficult), we observed a detrimental effect of the background music, whereas we observed no such effect for the shoulder video (more difficult). Further explorations revealed that background music might be detrimental for learning, as it reduces the perceived demand characteristics. Because the impact of the demand characteristics might be more pronounced in less difficult instructional videos, we discuss video difficulty as a potential moderating factor. Overall, our study provides evidence that instructional videos could be usefully implemented in formal teaching curricula and that such instructional videos probably should be designed without background music.
... In addition, gender and age groups were used as covariates to adjust for their confounding effect. Given the well-known limitations of p-values [25], omega-squared statistics were calculated to indicate the magnitude of these associations [26]. According to Cohen, statistics close to 0.01, 0.06, and 0.14 should be interpreted as small, medium, and large effects, respectively [27]. ...
Preprint
Background: This study describes the attitudes and practices of Brazilian adults regarding mandatory vaccination for COVID-19 and hesitancy toward children's vaccination. Methods: The participants answered an online questionnaire disseminated on social networks. An adaptation of the SAGE-WG questionnaire was used to measure children's vaccination hesitancy. Results: Among 1,007 participants, 677 (67.4%) believed that vaccination for COVID-19 among adults should be mandatory. Just over half of the participants (51.5%) believed that parents and guardians should be free to decide whether their children should be vaccinated against COVID-19, and 9.1% were unsure about this. Younger, non-religious people, who have higher self-perceptions of risk for COVID-19 and who evaluate the federal government's performance in combating the disease as bad or very bad, show higher agreement with mandatory vaccination, lower agreement that parents and guardians should be free to vaccinate their children, and lower child vaccination hesitancy scores. Conclusion: In Brazil, mandatory COVID-19 vaccination for adults is far from a consensus, and a significant part of the population believes that parents and guardians should be free to choose whether or not to vaccinate their children. These perceptions and vaccine hesitancy regarding children are associated with religious and political inclinations.
... The ANCOVA and post hoc analysis results were accompanied by F-statistics, t-statistics, p-values, and effect sizes. Effect size was evaluated with generalized eta squared (η²G) (Olejnik & Algina, 2003) and Cohen's d values, and characterized as small (< 0.06), medium (0.06–0.14), or large (> 0.14), according to Cohen (2013). Additionally, we reported the mean (M) and standard deviation (SD) of the measures of interest. ...
Article
Full-text available
Identification of informative signatures from electrophysiological signals is important for understanding brain developmental patterns, where techniques such as magnetoencephalography (MEG) are particularly useful. However, less attention has been given to fully utilizing the multidimensional nature of MEG data for extracting components that describe these patterns. Tensor factorizations of MEG yield components that encapsulate the data’s multidimensional nature, providing parsimonious models identifying latent brain patterns for meaningful summarization of neural processes. To address the need for meaningful MEG signatures for studies of pediatric cohorts, we propose a tensor-based approach for extracting developmental signatures of multi-subject MEG data. We employ the canonical polyadic (CP) decomposition for estimating latent spatiotemporal components of the data, and use these components for group level statistical inference. Using CP decomposition along with hierarchical clustering, we were able to extract typical early and late latency event-related field (ERF) components that were discriminative of high and low performance groups (p < 0.05) and significantly correlated with major cognitive domains such as attention, episodic memory, executive function, and language comprehension. We demonstrate that tensor-based group level statistical inference of MEG can produce signatures descriptive of the multidimensional MEG data. Furthermore, these features can be used to study group differences in brain patterns and cognitive function of healthy children. We provide an effective tool that may be useful for assessing child developmental status and brain function directly from electrophysiological measurements and facilitate the prospective assessment of cognitive processes.
... Significant interactions were followed by Bonferroni post hoc tests. Since there is no consensus regarding the calculation of effect sizes in HLM models, effect sizes of the interactions were calculated by partial eta squared, computed from general linear models (GLM), as recommended by Olejnik and Algina (2003). Partial eta squared values of 0.01 are considered small, 0.09 medium, and 0.25 large (Cohen et al., 2003). ...
Article
Full-text available
Objectives During the last decade, mindfulness-based interventions have been implemented in the educational system. Such programs can follow several approaches, including an indirect approach, in which interventions are delivered only to teachers, and a combined approach, in which interventions are delivered to both teachers and students. Because of the importance of teachers' involvement in programs designed to help children, we compared the impact on students of the indirect, combined, and control groups over time. The indirect program delivered was the “Call to Care – Israel for Teachers,” and the direct program was the “Call to Care Israel” for students. Both programs employ mindfulness, compassion, and training of social-emotional skills, with a unique emphasis on care. Methods Two hundred 4th and 5th grade students were divided into indirect (2 classrooms), combined (3 classrooms), or control groups (3 classrooms). Each condition was implemented in a different school; schools were randomly divided into groups. The interventions were delivered by trained facilitators and included 20 weekly meetings. Outcomes for students were measured before the intervention, after it ended, and 6 months later. Results Hierarchical linear models revealed that both the indirect and the combined approaches were effective in improving well-being, anxiety, attention, and teacher's availability and acceptance, while only the combined approach was effective in improving mindfulness, somatization, classroom atmosphere, and pro-social behavior. Conclusions Our results suggest that the combined approach is more beneficial than the indirect approach. However, given the scalability and cost of the indirect approach, it should also be considered an effective alternative.
Article
Online and face-to-face coactions are widely used work organization modes. This study aims to investigate the effect of social comparison direction on task performance when people coact online. A total of 40 individuals were recruited to participate in a 2 (coaction type: online and face-to-face) × 3 (social comparison direction: upward, downward, and no comparison) × 2 (phase: pre-comparison and post-comparison) within-subject experiment. The participants performed visual search tasks while their response time and search accuracy rates were measured. Results showed that participants performed faster when they coacted online than face-to-face. The upward comparison led to a stronger social facilitation effect than the downward and no-comparison directions, in both online and face-to-face coaction. These findings provide practical implications for the design of coaction modes for groups and teams working remotely.
Article
Full-text available
Bimanual coordination is an essential component of human movement. Cooperative bimanual reaching tasks are widely used to assess the optimal control of goal-directed reaching. However, little is known about the neuromuscular mechanisms governing these tasks. Twelve healthy, right-handed participants performed a bimanual reaching task in a 3-dimensional virtual reality environment. They controlled a shared cursor, located at the midpoint between the hands, and reached to targets located at 80% of full arm extension. Following a baseline of normal reaches, we placed a wrist weight on one arm and measured the change in coordination. Relative contribution (RC) was computed as the displacement of the right hand divided by the sum of displacements of both hands. We used surface electromyography placed over the anterior deltoid and biceps brachii to compute muscle contribution (MC) from root mean squared muscle activity data. We found RC was no different than 50% during baseline, indicating participants reached with equal displacements when no weights were applied. Participants systematically altered limb coordination in response to altered limb dynamics. RC increased by 0.91% and MC decreased by 5.3% relative to baseline when the weight was applied to the left arm; RC decreased by 0.94% and MC increased by 6.3% when the weight was applied to the right arm. Participants adopted an optimal control strategy that attempted to minimize both kinematic and muscular asymmetries between limbs. What emerged was a tradeoff between these two parameters, and we propose this tradeoff as a potential neuromuscular mechanism of cooperative bimanual reaching.
Article
This work contributes a research protocol for evaluating human-AI interaction in the context of specific AI products. The research protocol enables UX and HCI researchers to assess different human-AI interaction solutions and validate design decisions before investing in engineering. We present a detailed account of the research protocol and demonstrate its use by employing it to study an existing set of human-AI interaction guidelines. We used factorial surveys with a 2x2 mixed design to compare user perceptions when a guideline is applied versus violated, under conditions of optimal versus sub-optimal AI performance. The results provided both qualitative and quantitative insights into the UX impact of each guideline. These insights can support creators of user-facing AI systems in their nuanced prioritization and application of the guidelines.
Article
To successfully interact with objects in complex and crowded environments, we often perform visual search to detect or identify a relevant target (or targets) among distractors. Previous studies have reported a redundancy gain when two targets instead of one are presented in a simple target detection task. However, research is scant about the role of multiple targets in target discrimination tasks, especially in the context of visual search. Here, we address this question and investigate its underlying mechanisms in a pop-out search paradigm. In Experiment 1, we directly compared visual search performance for one or two targets for detection or discrimination tasks. We found that two targets led to a redundancy gain for detection, whereas it led to a redundancy cost for discrimination. To understand the basis for the redundancy cost observed in discrimination tasks for multiple targets, we further investigated the role of perceptual grouping (Experiment 2) and stimulus–response feature compatibility (Experiment 3). We determined that the strength of perceptual grouping among homogenous distractors was attenuated when two targets were present compared with one. We also found that response compatibility between two targets contributed more to the redundancy cost compared with perceptual compatibility. Taken together, our results show how pop-out search involving two targets is modulated by the level of feature processing, perceptual grouping, and compatibility of perceptual and response features.
Article
Marketing scholars often compare groups that occur naturally (e.g., socioeconomic status) or due to random assignment of participants to study conditions. These scholars often report group means, standard deviations, and effect sizes. Although such summary information can be helpful, it can misinform marketing practitioners’ decisions. To avoid this problem, scholars should also report the probability that one group’s member will score higher than another group’s member, and by various amounts. In this vein, newly conceived gain-probability diagrams can depict relevant, concise, and easy-to-comprehend probabilistic information. These diagrams’ nuanced perspective can contradict traditional significance test and effect size implications.
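The gain-probability idea above rests on estimating the probability that a member of one group scores higher than a member of another. A minimal nonparametric sketch of that underlying quantity (the common-language effect size, equivalent to the Mann-Whitney U statistic divided by n_x * n_y; the abstract does not specify this exact estimator, and the sample data are invented):

```python
def prob_superiority(x, y):
    """Estimate P(a random member of x scores higher than a random
    member of y) by exhaustive pairwise comparison; ties count half."""
    wins = sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    return wins / (len(x) * len(y))

print(prob_superiority([3, 4, 5], [1, 2, 3]))  # about 0.94
```

A value of 0.5 means the groups are indistinguishable in this sense; values near 0 or 1 mean one group's members almost always outscore the other's, which is the kind of information gain-probability diagrams make visible.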
Article
The measurement and communication of the effect size of an independent variable on a dependent variable is critical to effective statistical analysis in the Social Sciences. We develop ideas about how to extend traditional methods of evaluating relationships in multivariate models to explain and illustrate the statistical power of a focal independent variable. Even with a growing acceptance of the need to report effect sizes, scholars in the management community have few well-established protocols or guidelines for reporting effect sizes. In this editorial essay, we: (1) review the necessity of reporting effect sizes; (2) discuss commonly used measures of effect size and accepted cut-offs for large, medium, and small effect sizes; (3) recommend standards for reporting effect sizes via verbal descriptions and graphical presentations; and (4) present best practice examples of reporting and discussing effect size. In summary, we provide guidance for authors on how to report and interpret effect sizes, advocating for rigor and completeness in statistical analysis.
Article
The idea that there is a self-controlled learning advantage, where individuals demonstrate improved motor learning after exercising choice over an aspect of practice compared to no-choice groups, has different causal explanations according to the OPTIMAL theory or an information-processing perspective. Within OPTIMAL theory, giving learners choice is considered an autonomy-supportive manipulation that enhances expectations for success and intrinsic motivation. In the information-processing view, choice allows learners to engage in performance-dependent strategies that reduce uncertainty about task outcomes. To disentangle these potential explanations, we provided participants in choice and yoked groups with error or graded feedback (Experiment 1) and binary feedback (Experiment 2) while learning a novel motor task with spatial and timing goals. Across both experiments (N = 228 participants), we did not find any evidence to support a self-controlled learning advantage. Exercising choice during practice did not increase perceptions of autonomy, competence, or intrinsic motivation, nor did it lead to more accurate error estimation skills. Both error and graded feedback facilitated skill acquisition and learning, whereas no improvements from pre-test performance were found with binary feedback. Finally, the impact of graded and binary feedback on perceived competence highlights a potential dissociation of motivational and informational roles of feedback. Although our results regarding self-controlled practice conditions are difficult to reconcile with either the OPTIMAL theory or the information-processing perspective, they are consistent with a growing body of evidence that strongly suggests self-controlled conditions are not an effective approach to enhance motor performance and learning.
Thesis
Full-text available
This study, inspired by the conceptual framework adapted to school settings by Hoy, investigated teacher- and school-level factors predicting school mindfulness, employing an explanatory sequential mixed-methods design. Quantitative data came from 1,354 teachers nested in 69 middle schools in Ankara. The role of teacher-level factors (gender, age, level of education, length of time with the principal, years of experience in the school, total years of teaching experience, number of days of professional development attended) and school-level factors (school size, mode of schooling, class size, organizational trust, collective teacher efficacy, academic press) in the variation of school mindfulness within and between schools was explored with HLM. Results revealed that 12.7% of the total variance in school mindfulness originates from between-school variation, that length of time with the principal is a significant yet negative predictor at the teacher level, and that teacher trust in the principal, teacher trust in colleagues, and collective efficacy in student discipline are significant predictors at the school level. Together, all significant teacher- and school-level predictors explain 96.9% of the between-school variation in school mindfulness. In the second phase, an embedded single-case design was adopted, and 12 semi-structured interviews were conducted with the school principals who scored highest and lowest on the principal-mindfulness subscale of the M-Scale, based on teachers' responses in the first phase. Qualitative data were analyzed with theoretical thematic analysis. For triangulation, unstructured observations and school websites were also analyzed. This study concludes that trust, dynamic collaboration, and mutual communication in schools enhance mindfulness by increasing the capacity for problem solving, shared decision making, and learning together.
Article
Thesis
Full-text available
In today's world, where technology advances rapidly, individuals who can produce creative solutions to the problems encountered in daily life are needed. This has highlighted the importance of teaching algorithmic thinking, which yields quick and manageable solutions by dividing problems into steps. For this reason, programming education based on algorithmic thinking is taught in schools from an early age. Choosing the right materials and teaching methods for programming education is important for achieving the goals of the course, and the impact of those methods, techniques, and materials on students' academic achievement is an important issue for educators. In this thesis, to help high school students overcome the difficulties encountered while learning the basics of programming, an 8-week Arduino-based course was conducted, and its effect on students' academic achievement and attitudes toward programming was examined. First, the importance of programming and the problems encountered were explained, and robotics lessons were designed with Arduino to meet the goals set in the curriculum of the Ministry of National Education. An academic achievement test and an attitude-toward-programming scale were developed as data collection instruments. These instruments were administered to the experimental and control groups before and after the intervention, and the data were analyzed with repeated-measures ANOVA. The results showed a positive gain in academic achievement for the experimental group students who took the robotics course relative to the control group students who did not.
In addition, a retention test administered four weeks after the robotics instruction ended indicated that the learning was retained. There was no significant difference in attitude toward programming between the two groups.
Article
Gaze behaviour is an important component of successful social interactions. Existing research on social gaze and attention has largely focused on gaze detection and following, rather than the two-way communicative component of gaze that operates between individuals. The present study sought to address this in two experiments. First, “hiders” were eye-tracked while they selected hiding places among a grid of boxes on a computer screen; these boxes were either homogeneous or contained a visually unique pop-out item. Importantly, sometimes hiders believed that their gaze would be seen by hypothetical “seekers” whom they might wish to deceive or to whom they might communicate truthful information; at other times, hiders believed that their gaze would be concealed. In a second experiment, seekers were asked to select the hiders' locations after viewing the hiders' gaze behaviour, including the eye movements that hiders had been (falsely) told would be concealed. Results indicate that seekers are most accurate when hiders use their gaze to truthfully communicate their selected locations and least accurate when hiders aim to deceive. Notably, both communication and interpretation strategies were affected by the visual display type (e.g., hiders looked to and preferentially selected pop-out items when communicating truthfully, while seekers interpreted gaze differently when it was allocated to these pop-out items), indicating that the visual context can be integrated with gaze to facilitate mis/communication. Our study illuminates how the gaze of an individual acquires and signals information, and shows that individuals will spontaneously adjust the balance between these two functions based on their current goal and visual environment.
Chapter
Much research in the economics and social sciences is based on nonmeaningful units. A consequence is that substantive researchers often have difficulty in interpreting their data. If there is a one-point difference between the means in two conditions on, say, subjective well-being, it need not be clear how to interpret that difference. In addition, many researchers have criticized the typical conversion to standardized effect size indices, expressed as standard deviation units, such as Cohen’s d. In contrast, we take a completely different approach that features conversion into a probability that a randomly selected person from one population will score higher on the dependent variable than a randomly selected person from the other population. Because most distributions are skewed, the mathematics to be presented fall under the large umbrella of skew normal distributions rather than under the smaller umbrella of normal distributions. Two real data examples are given for illustration of our main results.
Article
Full-text available
The goal of the present study is to examine the cognitive/affective physiological correlates of passenger travel experience in autonomously driven transportation systems. We investigated the social acceptance and cognitive aspects of self-driving technology by measuring physiological responses in real-world experimental settings, using eye-tracking and EEG measures simultaneously on 38 volunteers. A typical test run included human-driven (Human) and Autonomous conditions in the same vehicle, in a safe environment. In the spectrum analysis of the eye-tracking data we found significant differences in the complex patterns of eye movements: the structure of movements of different magnitudes was less variable in the Autonomous drive condition. EEG data revealed less positive affectivity in the Autonomous condition compared to the human-driven condition, while arousal did not differ between the two conditions. These preliminary findings reinforced our initial hypothesis that passenger experience in human- and machine-navigated conditions entails different physiological and psychological correlates, and that those differences are accessible using state-of-the-art in-world measurements. These dimensions of passenger experience may serve as a source of information both for the improvement and design of self-navigating technology and for market-related concerns.
Article
To enable flexible and controlled research on personality, information processing, and interactions in socio-emotional contexts, the availability of highly controlled stimulus material, especially trait words and related attributes, is indispensable. Existing word databases contain mainly nouns and rating dimensions, and their role in studies within socio-emotional contexts is limited. This study aimed to create an English list of traits (ELoT), a database containing 500 trait adjectives rated by a large sample (n = 822, 57.42% female). The rating categories refer to the perceived valence associated with the traits and their social desirability and observability. Participants of different ages (18 to 65 years of age) and educational levels rated the words in an online survey. Both valence and social desirability ratings showed a bimodal distribution, indicating that most traits were rated either positive (respectively, socially desirable) or negative (respectively, socially undesirable), with fewer words rated as neutral. For observability, a bell-shaped distribution was found. Results indicated a strong association between valence and social desirability, whereas observability ratings were only moderately associated with the other ratings. Valence and social desirability ratings were not related to participants’ age or gender, but observability ratings differed between females and males, and among younger, middle-aged, and older participants. The ELoT is an extensive, freely available database of trait norms. The large sample and the balanced age and gender distributions make it possible to account for age- and gender-specific effects during stimulus selection.
Article
Users’ personality traits can take an active role in shaping their behavior when they interact with a computer interface. However, in the area of recommender systems (RS), although personality-based RS has been extensively studied, most works focus on algorithm design, with little attention paid to whether and how personality may influence users’ interaction with the recommendation interface. In this manuscript, we report the results of a user study (with 108 participants) that not only measured the influence of users’ personality traits on their perception and performance when using the recommendation interface but also employed an eye-tracker to reveal in depth how personality may influence users’ eye-movement behavior. Moreover, unlike related work that has mainly been conducted in a single product domain, our user study was performed in three typical application domains (i.e., electronics like smartphones, entertainment like movies, and tourism like hotels). Our results show that mainly three personality traits, i.e., Openness to experience, Conscientiousness, and Agreeableness, significantly influence users’ perception and eye-movement behavior, but the exact influences vary across the domains. Finally, we provide a set of guidelines that might be constructive for designing a more effective recommendation interface based on user personality.
Article
Writing is an important skill for communicating knowledge in science, technology, engineering, and mathematics (STEM) and an aid to developing students' communication skills, content knowledge, and disciplinary thinking. Despite the importance of writing, its incorporation into the undergraduate STEM curriculum is uneven. Research indicates that understanding faculty beliefs is important when trying to propagate evidence-based instructional practices, yet faculty beliefs about writing pedagogies are not yet broadly characterized for STEM teaching at the undergraduate level. Based on a nationwide cross-disciplinary survey at research-intensive institutions, this work aims to understand the extent to which writing is assigned in undergraduate STEM courses and the factors that influence faculty members' beliefs about, and reported use of, writing-based pedagogies. Faculty attitudes about the effectiveness of writing practices did not differ between faculty who assign and do not assign writing; rather, beliefs about the influence of social factors and contextually imposed instructional constraints informed their decisions to use or not use writing. Our findings indicate that strategies to increase the use of writing need to specifically target the factors that influence faculty decisions to assign or not assign writing. It is not faculty beliefs about effectiveness, but rather faculty beliefs about behavioral control and constraints at the departmental level that need to be targeted.
Article
Supervision of automated systems is a ubiquitous aspect of everyday life and is even more necessary in high-risk industries (aeronautics, power plants, etc.). Performance monitoring related to our own error making has been widely studied. Here we propose to assess the neurofunctional correlates of system error detection. We used an aviation-based conflict avoidance simulator with a 40% error rate and recorded the electroencephalographic activity of participants while they were supervising it. Neural dynamics related to the supervision of the system's correct and erroneous responses were assessed in the time and time-frequency domains to address the dynamics of the error detection process in this environment. Two levels of perceptual difficulty were introduced to assess their effect on the evoked activity related to system-error detection. Using a robust cluster-based permutation test, we observed a lower widespread evoked activity in the time domain for the detection of errors compared to correct responses, as well as a higher theta-band activity in the time-frequency domain dissociating the detection of erroneous from correct system responses. We also showed a significant effect of difficulty on time-domain evoked activity, and of the phase of the experiment on spectral activity: a decrease in early theta and alpha at the end of the experiment, as well as interaction effects in the theta and alpha frequency bands. These results improve our understanding of the brain dynamics of performance-monitoring activity in closer-to-real-life settings and are a promising avenue for the detection of error-related components in ecological and dynamic tasks.
Article
The statistical literature is replete with calls to report standardized measures of effect size alongside traditional p-values and null hypothesis tests. While effect-size measures such as Cohen’s d and Hedges’s g are straightforward to calculate for t tests, this is not the case for parameters in more complex linear models, where traditional effect-size measures such as η² and ω² face limitations. After a review of effect sizes and their implementation in Stata, I introduce the community-contributed command mces. This postestimation command reports standardized effect-size statistics for dichotomous comparisons of marginal-effect contrasts obtained from margins and mimrgns, including with complex samples, for continuous outcome variables. mces provides Stata users the ability to report straightforward estimates of effect size in many modeling applications.
Article
Full-text available
Tests for experiments with matched groups or repeated measures designs use error terms that involve the correlation between the measures as well as the variance of the data. The larger the correlation between the measures, the smaller the error and the larger the test statistic. If an effect size is computed from the test statistic without taking the correlation between the measures into account, effect size will be overestimated. Procedures for computing effect size appropriately from matched groups or repeated measures designs are discussed.
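A minimal numerical sketch of this point, assuming the common conversion in which a paired-design t statistic is rescaled by the correlation r between the n pairs of measures (function names are illustrative, not from the article):

```python
import math

def d_from_paired_t(t, r, n):
    # Standardized mean difference from a repeated-measures t statistic,
    # taking the correlation r between the n pairs of scores into account.
    return t * math.sqrt(2.0 * (1.0 - r) / n)

def d_ignoring_correlation(t, n):
    # Naive conversion that treats the paired t as if it came from two
    # independent groups; inflated whenever r > 0.5.
    return t * math.sqrt(2.0 / n)

# With t = 5.0 from n = 20 pairs correlated at r = 0.75:
print(round(d_from_paired_t(5.0, 0.75, 20), 3))   # 0.791
print(round(d_ignoring_correlation(5.0, 20), 3))  # 1.581
```

With r = 0.75, ignoring the correlation doubles the estimated effect size, which is the overestimation the abstract warns against.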
Article
Full-text available
Statistical significance is concerned with whether a research result is due to chance or sampling variability; practical significance is concerned with whether the result is useful in the real world. A growing awareness of the limitations of null hypothesis significance tests has led to a search for ways to supplement these procedures. A variety of supplementary measures of effect magnitude have been proposed. The use of these procedures in four APA journals is examined, and an approach to assessing the practical significance of data is described.
Article
Full-text available
Although the consequences of ignoring a nested factor on decisions to reject the null hypothesis of no treatment effects have been discussed in the literature, typically researchers in applied psychology and education ignore treatment providers (often a nested factor) when comparing the efficacy of treatments. The incorrect analysis, however, not only invalidates tests of hypotheses, but it also overestimates the treatment effect. Formulas were derived and a Monte Carlo study was conducted to estimate the degree to which the F statistic and treatment effect size measures are inflated by ignoring the effects due to providers of treatments. These untoward effects are illustrated with examples from psychotherapeutic treatments.
Article
Article
In the light of continuing debate over the applications of significance testing in psychology journals and following the publication of J. Cohen's (1994) article, the Board of Scientific Affairs (BSA) of the American Psychological Association (APA) convened a committee called the Task Force on Statistical Inference (TFSI) whose charge was "to elucidate some of the controversial issues surrounding applications of statistics including significance testing and its alternatives; alternative underlying models and data transformation; and newer methods made possible by powerful computers" (BSA, personal communication, February 28, 1996). After extensive discussion, the BSA recommended that publishing an article in American Psychologist might be an appropriate way to initiate discussion in the field about changes in current practices of data analysis and reporting. This report follows that request. Following each guideline are comments, explanations, or elaborations assembled by L. Wilkinson for the task force and under its review. The report is concerned with the use of statistical methods only and is not meant as an assessment of research methods in general. The title and format of the report are adapted from an article by J. C. Bailar and F. Mosteller (1988). (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The present paper is concerned with measuring the size of an effect for fixed effects factorial analysis of variance (ANOVA) experimental designs. A brief review of the literature is provided, emphasizing the need to use such a measure in actual research. Measuring strength of effect is discussed in correlational terms, taking advantage of the linear model formulation for fixed effects ANOVA. It is concluded that a squared partial correlation between factor and response (partialing out effects of other factors) is usually to be preferred to the corresponding un-partialled measure (ω²) advocated by Hays (1963) and others. Examples are provided to illustrate the practical implications of this distinction.
Article
Statistics used to estimate the population correlation ratio were reviewed and evaluated. The sampling distributions of Kelley's ε² and Hays' ω² were studied empirically by computer simulation within the context of a three level one-way fixed effects analysis of variance design. These statistics were found to have rather large standard errors when small samples were used. As with other correlation indices, large samples are recommended for accuracy of estimation. Both ε² and ω² were found to be negligibly biased. Heterogeneity of variances had negligible effects on the estimates under conditions of proportional representativeness of sample sizes with respect to their population counterparts, but combinations of heterogeneity of variance and unrepresentative sample sizes yielded especially poor estimates.
Article
Encourages additional analysis of data by providing computational formulas appropriate to estimating the strength of effects in basic 1-way, 2-way, and 3-way ANOVA designs. Issues which develop when such estimation is attempted in repeated-measures designs are examined, and limitations to attempting such analysis in respect of specific treatment levels (and contrasts) are indicated. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The effectiveness of 3 estimators of treatment magnitude were compared numerically for samples from a normal and exponential distribution. The estimators were compared for J. Cohen's (1969) definitions of small, medium, and large population treatment effects. It was found that omega squared was a more accurate estimator, while eta squared had the smallest sampling variability. (17 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Analyzes and clarifies the differences between eta-squared and partial eta-squared in fixed factor analysis of variance (ANOVA) designs. The formulas are presented and discussed, and an example is presented along with the appropriate use and meaning of the 2 coefficients. Finally, a general discussion of the use of eta-squared and partial eta-squared is provided. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
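The distinction between the two coefficients can be made concrete with a small sketch; the sums of squares below are hypothetical, not taken from the article's example:

```python
def eta_squared(ss_effect, ss_total):
    # Classical eta squared: effect variability over total variability.
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    # Partial eta squared: variability due to the other factors is
    # removed from the denominator, so the value is never smaller.
    return ss_effect / (ss_effect + ss_error)

# Hypothetical two-way decomposition (interaction omitted for brevity):
ss_a, ss_b, ss_error = 20.0, 30.0, 50.0
ss_total = ss_a + ss_b + ss_error                     # 100.0
print(eta_squared(ss_a, ss_total))                    # 0.2
print(round(partial_eta_squared(ss_a, ss_error), 3))  # 0.286
```

The same SS_A yields 0.20 or roughly 0.29 depending on whether factor B's variability stays in the denominator, which is why the two coefficients must not be reported interchangeably.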
Article
Discusses the increasing awareness that the mere statistical significance of an experimental effect is insufficient to warrant the conclusion that the effect is large and practically important. A number of related measures of the magnitude of experimental effects which can be applied to the results of a 1-way analysis of variance but not to the results of a more complicated design are available. The proper measure for a complex design depends on whether other factors are fixed or random, and the uncritical following of advice given in the literature can result in serious over- or underestimation of the magnitude of experimental effects. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
A large and thorough book in the intermediate level. Notable for its clarity and coverage. Harvard Book List (edited) 1964 #77 (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Describes computational procedures for determining estimates of magnitude of effect or proportion of variance (ω²) for a variety of analysis of variance designs. Tables are presented summarizing simplified computational formulas for fixed, random, and mixed designs, including both nonrepeated- and repeated-measures cases. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
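For the simplest case such tables cover, a one-way fixed-effects design, the familiar computational form of ω̂² can be sketched as follows (an illustrative sketch with made-up numbers, not a reproduction of the article's tables):

```python
def omega_squared(ss_between, ss_total, df_between, ms_within):
    # Hays' omega squared for a one-way fixed-effects ANOVA:
    # (SS_between - df_between * MS_within) / (SS_total + MS_within).
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)

# k = 3 groups, N = 30 participants, SS_between = 100, SS_within = 200:
k, n_total = 3, 30
ss_between, ss_within = 100.0, 200.0
ms_within = ss_within / (n_total - k)  # MS_within = 200 / 27
est = omega_squared(ss_between, ss_between + ss_within, k - 1, ms_within)
print(round(est, 3))  # 0.277
```

Note that ω̂² (≈ 0.277 here) is smaller than the corresponding η̂² (100/300 ≈ 0.333) because it corrects for the positive bias of the sample proportion of variance.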
Article
Proposes a method for estimating the variance explained by each term of an analysis of variance model. The core of this method is the determination of an equation that expresses the total variance of the dependent variable as a weighted sum of variance components. Previous estimation methods are thus extended to include finite random effects, random effects, and fixed effects models in 1 parsimonious set of formulae. It is shown that previous methods overestimated the variance explained by interactions in mixed models. The proposed procedure is demonstrated for the 2-way design, and a brief historical review is presented. (27 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Examines relationships among 3 ANOVA measures of association—eta squared, epsilon squared, and omega squared. The rationale for each measure is developed within the fixed-effects ANOVA model, and the measures are related to corresponding measures of association in the regression model. Special attention is paid to the conceptual distinction between measures of association in fixed- vs random-effects designs. Limitations of these measures in fixed-effects designs are discussed, and recommendations for usage are provided. (43 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
In meta-analysis, it is possible to average results across multiple studies because the effect sizes estimated from each study are in the same metric (e.g., the standardized mean difference). However, when effect sizes are computed from a factorial analysis of variance, these estimates are influenced by the other factors in the design. A correction developed by G. V. Glass, B. McGaw, and M. L. Smith (1981) solves this problem; however, it requires information (e.g., sums of squares) that is often not available in published research. A reformulated version of the correction is presented, which requires only F values and degrees of freedom. The impact of the correction on effect size estimates is examined. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
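When only an F value and its degrees of freedom are reported, a proportion-of-variance estimate can still be recovered; the sketch below uses the standard F-to-η² identity, not the reformulated Glass, McGaw, and Smith correction the abstract describes:

```python
def eta_squared_from_f(f, df_effect, df_error):
    # Recover eta squared from a reported F statistic:
    # eta^2 = (F * df_effect) / (F * df_effect + df_error).
    return (f * df_effect) / (f * df_effect + df_error)

# F(2, 27) = 4.0 reported for a main effect:
print(round(eta_squared_from_f(4.0, 2, 27), 3))  # 0.229
```

This identity is exact for the sample η², which is one reason F values and degrees of freedom suffice as inputs for meta-analytic corrections of this kind.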
Article
Two different approaches have been used to derive measures of effect size. One approach is based on the comparison of treatment means. The standardized mean difference is an appropriate measure of effect size when one is merely comparing two treatments, but there is no satisfactory analogue for comparing more than two treatments. The second approach is based on the proportion of variance in the dependent variable that is explained by the independent variable. Estimates have been proposed for both fixed-factor and random-factor designs, but their sampling properties are not well understood. Nevertheless, measures of effect size can allow quantitative comparisons to be made across different studies, and they can be a useful adjunct to more traditional outcome measures such as test statistics and significance levels.
Article
Although dissatisfaction with the limitations associated with tests for statistical significance has been growing for several decades, applied researchers have continued to rely almost exclusively on these indicators of effect when reporting their findings. To encourage an increased use of alternative measures of effect, the present paper discusses several measures of effect size that might be used in group comparison studies involving univariate and/or multivariate models. For the methods discussed, formulas are presented and data from an experimental study are used to demonstrate the application and interpretation of these indices. The paper concludes with some cautionary notes on the limitations associated with these measures of effect size.
Article
In recent years, researchers have recognized the importance of the concept of effect size for planning research, determining the significance of research results, and accumulating results across studies. However, the uncritical use of effect-size indicators may lead to different interpretations of similar research findings because of differences in assumptions underlying the nature of the research, aspects of the phenomenon being investigated, or the methodological characteristics of the research. This article reviews the substantive, measurement, and methodological issues that influence the relative magnitude of an empirical effect size. The relationships and transformations between different types of effect-size indicators are presented. It is the thesis of this article that the meaningfulness of an estimated effect size should be interpreted with consideration of the type of research (relational vs. experimental), the anticipated application of the results obtained (effects application vs. theory testing), and the research history in the domain of inquiry. Researchers must be cognizant of the many different causal factors that influence effect size before using the magnitude of an effect for assessing the importance of research results, calculating the statistical power of a test, or synthesizing findings across different studies.
Accepted June 23, 2003

Note. A and C are between-subjects factors, and B is a repeated-measures factor. Ñ is the total number of individuals in the study.

References

Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley's ε² and Hays' ω². Educational and Psychological Measurement, 35, 541–554.
Peters, C. C., & Van Voorhis, W. R. (1940). Statistical procedures and their mathematical bases. New York: McGraw-Hill.