Article

No full-text available

... The difference was significant with a very large effect size (p < .0001, d = 1.80; Sawilowsky, 2009). The ROC curves of the two models are compared in Figure 2A. ...
... The difference was significant with a very large effect size (p < .0001, d = 1.87; Sawilowsky, 2009). The ROC curves of the two models are presented in Figure 2B. ...
... Similarly, the upper bound of the null hypothesis's confidence interval for spontaneous speech analysis was 58.7%, which indicates that the in-lab spontaneous speech DPI was not performing better than random features. In contrast with in-lab DPIs, the average accuracy of the in-field DPI was significantly higher at 73.4% with a very large effect size (Cohen's d = 1.8; Sawilowsky, 2009). Therefore, DPI appears to need more voicing data than what is obtained in a short traditional in-clinic recording to perform appropriately. ...
Article
Full-text available
Purpose The Daily Phonotrauma Index (DPI) can quantify pathophysiological mechanisms associated with daily voice use in individuals with phonotraumatic vocal hyperfunction (PVH). Since DPI was developed based on weeklong ambulatory voice monitoring, this study investigated whether DPI can achieve comparable performance using (a) short laboratory speech tasks and (b) fewer than 7 days of ambulatory data. Method An ambulatory voice monitoring system recorded the vocal function/behavior of 134 females with PVH and vocally healthy matched controls in two different conditions. In the laboratory, the participants read the first paragraph of the Rainbow Passage and produced spontaneous speech (in-lab data). They were then monitored for 7 days (in-field data). Separate DPI models were trained from the in-lab and in-field data using the standard deviation of the difference between the magnitude of the first two harmonics (H1–H2) and the skewness of neck-surface acceleration magnitude. First, 10-fold cross-validation evaluated the classification performance of the in-lab and in-field DPIs. Second, the effect of the number of ambulatory monitoring days on the accuracy of in-field DPI classification was quantified. Results The average in-lab DPI accuracies computed from the Rainbow Passage and spontaneous speech were 57.9% and 48.9%, respectively, which are close to chance performance. The average classification accuracy of the in-field DPI was significantly higher with a very large effect size (73.4%, Cohen's d = 1.8). Next, the average in-field DPI accuracy increased from 66.5% for 1 day to 75.0% for 7 days, with the accuracy gain from each additional day dropping below 1 percentage point after 4 days. Conclusions The DPI requires ambulatory monitoring data, as its discriminative power diminished significantly when computed from short in-lab recordings. Additionally, ambulatory monitoring should sample multiple days to achieve robust performance.
The result of this research note can be used to make an informed decision about the trade-off between classification accuracy and cost of data collection.
... Importantly, high numbers of samples can yield artificially low p-values, so we also quantified effect size, which is more indicative of the power of a model. To this end, we used Cohen's d, which is a signal-to-noise ratio, and we used the effect size rule of thumb suggested by Sawilowsky (2009) [32]. Of the models, only PhenoAge and AdaptAge had statistically higher means of the residuals (p < 0.05) for patients, but both have small effect sizes. ...
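The point made above about sample size and p-values can be illustrated with a short stdlib sketch: for a fixed standardized mean difference d, the two-sample z statistic grows as d·√(n/2), so p shrinks with n even when the effect stays trivially small. This data-free normal approximation is illustrative only, not the analysis used in the cited study.

```python
import math
from statistics import NormalDist

def p_two_sample(d, n):
    """Two-sided p-value for standardized difference d with n per group
    (normal approximation)."""
    z = d * math.sqrt(n / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A "very small" effect (d = 0.1) becomes "significant" purely through n:
for n in (20, 200, 2000, 20000):
    print(n, p_two_sample(0.1, n))
```

The effect size never changes across the four rows; only the p-value collapses, which is exactly why the snippet above reports d alongside p.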
Article
Full-text available
Biological age estimation from DNA methylation and determination of relevant biomarkers is an active research problem which has predominantly been tackled with black-box penalized regression. Machine learning is used to select a small subset of features from hundreds of thousands of CpG probes and to increase generalizability typically lacking with ordinary least-squares regression. Here, we show that such feature selection lacks biological interpretability and relevance in the clocks of the first and next generations and clarify the logic by which these clocks systematically exclude biomarkers of aging and age-related disease. Moreover, in contrast to the assumption that regularized linear regression is needed to prevent overfitting, we demonstrate that hypothesis-driven selection of biologically relevant features in conjunction with ordinary least squares regression yields accurate, well-calibrated, generalizable clocks with high interpretability. We further demonstrate that the interplay of inflammaging-related shifts of predictor values and their corresponding weights, which we term feature shifts, contributes to the lack of resolution between health and inflammaging in conventional linear models. Lastly, we introduce a method of feature rectification, which aligns these shifts to improve the distinction of age predictions for healthy people vs. patients with various chronic inflammation diseases.
... N = 231). The coefficient Cohen's d shows a medium effect size of the one-sample t-test for conflict resolution (t(231) = 9.48, p < .001, d = 0.62) (Cohen, 1988; Sawilowsky, 2009). This result indicates that most employees in the financial sector tend to consider that their conflicts are resolved. ...
... This result indicates that most employees in the financial sector tend to consider that their conflicts are resolved. These findings are consistent with the results of a previous study (Mihaylova, 2022). The interpretation of the coefficient Cohen's d shows the effect size of the one-sample t-test (Cohen, 1988; Sawilowsky, 2009), as follows: These results indicate that: ...
... The main focus should be on the magnitude of the difference (effect size) rather than just statistical significance. Effect size measures, like Cohen's d, can provide more meaningful information about the practical importance of observed differences, and are independent of sample sizes (Kim 2015;Nakagawa and Cuthill 2007;Kelley and Preacher 2012;Sawilowsky 2009). ...
... Cohen's d is a standardized measure of effect size, independent of the sample size, that expresses the observed difference between two populations in units of the pooled standard deviation (Kim 2015;Nakagawa and Cuthill 2007;Kelley and Preacher 2012;Sawilowsky 2009). ...
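As a concrete illustration of that definition, here is a minimal Python sketch of Cohen's d with the pooled standard deviation; the two samples are toy numbers, not data from any of the cited studies.

```python
import math
from statistics import mean, stdev

def cohens_d(x, y):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = math.sqrt(((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
                       / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled

x = [5.0, 6.0, 7.0, 8.0]
y = [3.0, 4.0, 5.0, 6.0]
print(round(cohens_d(x, y), 3))  # → 1.549
```

Because the denominator is a standard deviation rather than a standard error, the statistic does not shrink or grow with sample size, which is what makes it comparable across studies.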
Preprint
Full-text available
The study of viral quasispecies structure and diversity presents unique challenges in comparing samples, particularly when dealing with single experimental samples from different time points or conditions. Traditional statistical methods are often inapplicable in these scenarios, necessitating the use of resampling techniques to estimate diversity and variability. This paper discusses two proposed methods for comparing quasispecies samples: repeated rarefaction with z-test and permutation testing. The authors recommend the permutation test for its potential to reduce bias. The research highlights several key challenges in quasispecies analysis, including the need for high sequencing depth, limited clinical samples, technical inconsistencies leading to coverage disparities, and the sensitivity of diversity indices to sample size differences. To address these issues, the authors suggest using a combination of metrics with varying susceptibilities to large sample sizes, ranging from observed differences and ratios to multitest adjusted p-values. The paper emphasizes the importance of not relying solely on p-values, as the high statistical power resulting from large sample sizes can lead to very low p-values for small, potentially biologically insignificant differences. The authors also stress the need for multiple experimental replicates to account for stochastic variations and procedural inconsistencies, particularly when dealing with complex quasispecies populations.
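A permutation test of the kind recommended above can be sketched in a few lines of stdlib Python. The per-read diversity scores below are invented for illustration, and the test statistic (difference in means) is an assumption, since the abstract leaves the choice of metric open.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random relabeling of all reads
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)             # add-one correction avoids p = 0

# Hypothetical diversity scores from two quasispecies samples:
sample1 = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15]
sample2 = [0.21, 0.24, 0.20, 0.23, 0.22, 0.25, 0.21, 0.24]
print(permutation_test(sample1, sample2))
```

Because the null distribution is built by relabeling the observed reads, the test makes no parametric assumptions, which is why it suits single-sample comparisons across time points or conditions.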
... The magnitude of mean differences was expressed with a standardized Cohen's d effect size (ES) for the parametric variables or with the ES given by the Wilcoxon signed-rank test for the nonparametric variables. Thresholds for qualitative descriptors of Cohen's d were defined as follows: 0.01 to 0.19 as a very small effect [51], 0.20 as a small effect, 0.21 to 0.50 as a moderate effect, 0.51 to 0.80 as a large effect, and greater than 0.80 as a very large effect [50]. The 95% confidence intervals for mean values were also calculated. ...
Article
Full-text available
This study aimed to assess the acute impact of a simulated kumite bout (WKF formula) on peak isometric strength performance of the dominant and non-dominant lower limbs in elite karate athletes of different age categories (U16, U18, Senior), in the context of inter-limb asymmetry. Sixty-one elite male and female athletes (age = 17.48 ± 3.26 [years], body height = 1.72 ± 0.08 [m], body mass = 63.79 ± 10.00 [kg]) participated in this study, which included a randomized crossover design (two experimental sessions under two different conditions). Inter-limb asymmetry was assessed based on the isometric mid-thigh pull test. Friedman's test indicated significant differences in the mean values of the peak vertical force (PVF) between the assessed limbs (test = 10.8; p = 0.013; Kendall's W = 0.059). Elite karate athletes, regardless of the age category, tend to have inter-limb strength asymmetries in the lower extremities; however, the impact of a simulated kumite bout was not fully confirmed. A kumite bout seems to have a favorable impact on bilateral asymmetries in U16 and U18 athletes, but not in Seniors, who seem to be at increased risk of injury after completing the bout (asymmetry > 15%). Limb dominance is not necessarily related to greater values of PVF.
... To assess the importance of the increased perceived stress level in Classes 2 to 7 compared with Class 1, we calculated Cohen's d and utilized his rule of thumb, as expanded by Sawilowsky [60], for interpretation. The effect size of membership of Class 3 compared with Class 1 ...
Article
Full-text available
Objective We aimed to 1) identify distinct segments within the general population characterized by various combinations of stressors (stressor profiles) and to 2) examine the socio-demographic composition of these segments and their associations with perceived stress levels. Methods Segmentation was carried out by latent class analysis of nine self-reported stressors in a representative sample of Danish adults (N = 32,417) aged 16+ years. Perceived stress level was measured by the Perceived Stress Scale (PSS). Results Seven classes were identified: Class 1 was labeled Low Stressor Burden (64% of the population) and the remaining six classes, which had different stressor combinations, were labeled: 2) Burdened by Financial, Work, and Housing Stressors (10%); 3) Burdened by Disease and Death among Close Relatives (9%); 4) Burdened by Poor Social Support and Strained Relationships (8%); 5) Burdened by Own Disease (6%); 6) Complex Stressor Burden Involving Financial, Work, and Housing Stressors (2%); and 7) Complex Stressor Burden Involving Own Disease and Disease and Death among Close Relatives (2%). Being female notably increased the likelihood of belonging to Classes 2, 3, 5, and 7. Higher age increased the likelihood of belonging to Class 3. Low educational attainment increased the likelihood of belonging to Classes 5 and 6. A significant difference was observed in perceived stress levels between the seven latent classes. Average PSS varied from 9.0 in Class 1 to 24.2 in Class 7 and 25.0 in Class 6. Conclusion Latent class analysis allowed us to identify seven population segments with various stressor combinations. Six of the segments had elevated perceived stress levels but differed in terms of socioeconomic composition and stressor combinations. These insights may inform a strategy aimed at improving mental health in the general population by targeting efforts to particular population segments, notably segments experiencing challenging life situations.
... According to Hair et al. (2019), R² values are classified as weak (0.25), moderate (0.50) or strong (0.75), suggesting our model has moderate to strong in-sample predictive power. The effect sizes (f²) (Sawilowsky, 2009) are detailed in Table 4, where H1 showed a small effect (f² = 0.013), H2 had a large effect (f² = 0.759), and H3 exhibited small effects (f² = 0.071). In addition, the Q² values for all endogenous constructs, presented in Table 4, were greater than 0, confirming the model's acceptable predictive relevance. ...
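For readers unfamiliar with f², it is computed from the change in R² when the predictor of interest is included, scaled by the unexplained variance of the full model. The R² values below are hypothetical, chosen only to show the arithmetic.

```python
def f_squared(r2_included, r2_excluded):
    """Cohen's f² for a predictor: change in R² scaled by unexplained variance."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Full model R² = 0.55; R² drops to 0.52 when the predictor is removed:
print(round(f_squared(0.55, 0.52), 3))  # small effect by Cohen's cutoffs (0.02 / 0.15 / 0.35)
```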
Article
Purpose This study aims to investigate the impact of sustainable supply chain practices on sustainability performance in North American and Canadian firms in a business-to-business (B2B) context, specifically focusing on the mediating role of emerging technologies. It aims to deepen the understanding of this complex relationship, contributing to both theoretical knowledge and practical applications. Design/methodology/approach This study collected data from supply chain managers in the USA and Canada using a mixed-methods approach that includes partial least squares structural equation modeling (PLS-SEM), necessary condition analysis (NCA) and importance-performance map analysis (IPMA). PLS-SEM was utilized to model the relationships between sustainable practices, emerging technologies and sustainability performance. NCA identified the essential conditions required for sustainability performance, while IPMA was used to assess the importance and performance of different constructs, helping to pinpoint areas where the managerial focus can yield the most significant improvements. Findings This study reveals that sustainable supply chain practices (SSCP) alone do not directly lead to enhanced sustainability performance. SSCP includes product design, procurement, investment recovery and social sustainability. Sustainability performance includes economic, environmental and social performance. Instead, adopting specific emerging technologies, particularly artificial intelligence, wearable devices and virtual reality, is crucial. A significant threshold identified is these technologies’ 80% adoption rate for substantial performance improvements. Furthermore, this study distinguishes the varying impacts of different technologies on economic, social and environmental aspects of sustainability. Originality/value This research offers new insights by showing that emerging technologies fully mediate the relationship between SSCP and performance. 
It expands on existing literature by detailing the specific impacts of various technologies, moving beyond the generalized approach seen in prior research. Specific impacts of emerging digital technologies on SSCP and performance remain underexplored in a B2B environment, and this research aims to address this gap.
... The findings of this study indicate that the Group Investigation (GI) learning model significantly enhances middle school students' science literacy, particularly on the topic of thermal expansion. The calculated Cohen's d value of 1.457024 falls within the very large category (1.2 ≤ d < 2.0) based on Sawilowsky's (2009) criteria, demonstrating the substantial impact of this learning model. The GI model's emphasis on collaboration, active discussion, and group investigations likely contributed to this improvement, as supported by Sharan and Sharan (1992), who highlighted that GI provides opportunities for students to plan and engage in meaningful learning activities. ...
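Sawilowsky's (2009) expanded benchmarks for |d| (very small 0.01, small 0.2, medium 0.5, large 0.8, very large 1.2, huge 2.0) can be encoded directly; the sketch below uses d values that appear elsewhere on this page.

```python
def sawilowsky_label(d):
    """Map |d| to Sawilowsky's (2009) expanded rule of thumb for Cohen's d."""
    d = abs(d)
    for cutoff, label in [(2.0, "huge"), (1.2, "very large"), (0.8, "large"),
                          (0.5, "medium"), (0.2, "small"), (0.01, "very small")]:
        if d >= cutoff:
            return label
    return "negligible"

print(sawilowsky_label(1.8))   # → very large
print(sawilowsky_label(0.62))  # → medium
print(sawilowsky_label(2.1))   # → huge
```

Ordering the cutoffs from largest to smallest and returning on the first match keeps the mapping unambiguous at the boundaries (each label applies from its cutoff up to the next one).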
... where the subscript m represents the mean of either ρ or ρ′ over the set of videos, and s_p is the pooled standard deviation s_p = √((ρ_s² + ρ′_s²)/2), with the subscript s representing the standard deviation of the subscripted variable. In the plots of results, we include the qualitative ranges for interpreting the numerical values proposed in [31], expanding the original Cohen's ranges [29]. ...
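The equal-weight pooled standard deviation in that formula (appropriate when both score sets have the same size) differs from the n-weighted pooling of the classic two-sample d. A small sketch, with invented per-video correlation scores standing in for the two sets being compared:

```python
import math
from statistics import mean, stdev

def effect_size_equal_pool(scores_a, scores_b):
    """d with the equal-weight pooled SD: s_p = sqrt((s_a**2 + s_b**2) / 2)."""
    s_p = math.sqrt((stdev(scores_a) ** 2 + stdev(scores_b) ** 2) / 2)
    return (mean(scores_a) - mean(scores_b)) / s_p

# Hypothetical per-video correlation scores vs. a random baseline:
scores = [0.42, 0.35, 0.51, 0.44]
baseline = [0.05, -0.02, 0.08, 0.01]
print(effect_size_equal_pool(scores, baseline))
```

With equal group sizes the two pooling conventions coincide, so this form is simply the cheaper one to compute.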
Article
Full-text available
The concept of temporal visual attention in dynamic contents, such as videos, has been much less studied than its spatial counterpart, i.e., visual salience. Yet, temporal visual attention is useful for many downstream tasks, such as video compression and summarisation, or monitoring users’ engagement with visual information. Previous work has considered quantifying a temporal salience score from spatio-temporal user agreements from gaze data. Instead of gaze-based or content-based approaches, we explore to what extent only brain signals can reveal temporal visual attention. We propose methods for (1) computing a temporal visual salience score from salience maps of video frames; (2) quantifying the temporal brain salience score as a cognitive consistency score from the brain signals from multiple observers; and (3) assessing the correlation between both temporal salience scores, and computing its relevance. Two public EEG datasets (DEAP and MAHNOB) are used for experimental validation. Relevant correlations between temporal visual attention and EEG-based inter-subject consistency were found, as compared with a random baseline. In particular, effect sizes, measured with Cohen’s d , ranged from very small to large in one dataset, and from medium to very large in another dataset. Brain consistency among subjects watching videos unveils temporal visual attention cues. This has relevant practical implications for analysing attention for visual design in human-computer interaction, in the medical domain, and in brain-computer interfaces at large.
... Third, due to the exploratory nature of this study, we generated effect sizes reflecting group differences to supplement the formal significance testing results. We calculated Cohen's d using pooled standard deviation (Cohen, 1988) and evaluated the magnitudes of d following the guidelines given by Cohen (1988) and Sawilowsky (2009). All the statistical analyses were conducted via the IBM SPSS Version 27 software. ...
Article
Full-text available
Makerspaces are used to promote classroom change and creativity for the 21st century. Building on the learning theory of Constructionism, this intervention study used a curriculum intervention program, “Making a Makerspace” (MM), to integrate the makerspace into Chinese kindergartens. We used a quasi-experimental research design to evaluate the effects of this curriculum intervention, with 120 children enrolled in the experimental classrooms, while the other 111 children enrolled in the waitlist control classrooms. Teacher-report child performance (N = 231) showed that the MM program resulted in significantly higher scores in children's STEM habits of mind in the intervention group, relative to the control group. Analyses of parent-report child behaviors revealed that there was a significant effect of the MM program on post-intervention temperamental surgency. Our evidence shows that such a scalable program encourages and guides teachers to build a positive learning environment for supporting young children's making and thinking in everyday preschool experiences. The makerspace further sets a solid foundation for the development of children's STEM thinking skills and socioemotional skills in a rapidly changing digital society.
... For continuous variables, the effect size was evaluated by the difference in means relative to the standard deviation, denoted as Cohen's d [54]. A nontrivial effect size for continuous variables was defined as |d| ≥ 0.50 [55]. The effect size for categorical variables was evaluated using Cramer's V. A nontrivial effect size for categorical variables was defined as |V| ≥ 0.30, |V| ≥ 0.21, and |V| ≥ 0.17 for 1, 2, and 3 degrees of freedom, respectively (degrees of freedom were calculated as (r − 1) × (c − 1), ...
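Cramer's V and the df-dependent thresholds above can be computed from a contingency table with nothing but the standard library; the 2 × 2 counts below are hypothetical.

```python
import math

def cramers_v(table):
    """Cramer's V from an r x c contingency table of observed counts."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n          # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(rows), len(cols)) - 1            # df in the sense of min(r, c) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2 x 2 table (df = 1, so the snippet's nontrivial cutoff is 0.30):
table = [[30, 10], [10, 30]]
print(cramers_v(table))  # → 0.5
```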
Preprint
Full-text available
Background: With a trend toward de-escalation of axillary surgery in breast cancer, prediction models incorporating imaging modalities can help reassess the need for surgical axillary staging. Although mammography is routinely performed for breast cancer imaging, its potential in nodal staging remains underutilized. This study aims to employ advancements in deep learning (DL) to comprehensively evaluate the potential of routine mammograms for predicting lymph node metastasis (LNM) in preoperative clinical settings. Methods: This retrospective study included 1,265 cN0 T1-T2 breast cancer patients, comprising 368 node-positive and 897 node-negative cases, diagnosed from 2009-2017 at three Swedish institutions. Patients diagnosed in 2017 were assigned to the independent test set (n=123, site 2) and the external test set (n=103, site 3), while the remaining patients (n=1,039, site 1 and 2) were used for model development and double cross-validation. A neck module, in conjunction with a ResNet backbone pretrained on unlabeled mammograms, was developed to extract global information from full-breast or region-of-interest (ROI) mammograms by predicting five cancer outcomes. Clinicopathological characteristics were combined with the learned mammogram features to predict LNM collaboratively. The models were evaluated using area under the receiver operating characteristic (ROC) curve (AUC), calibration, and decision curve analysis. Results: Compared to models using only clinical variables, incorporating full-breast mammograms with preoperative clinical variables improved the ROC AUC from 0.690 ± 0.063 (SD) to 0.774 ± 0.057 in the independent test set and from 0.584 ± 0.068 to 0.637 ± 0.063 in the external test set. The combined model showed good calibration and, at sensitivity ≥ 90%, achieved a better net benefit, and a higher sentinel lymph node biopsy reduction rate of 41.7% in the independent test set. 
Full-breast mammograms showed comparable ability to tumor ROIs in predicting LNM. Conclusion: Our findings underscore that routine mammograms, particularly full-breast images, can enhance preoperative nodal status prediction. They may substitute key postoperative predictors such as pathological tumor size and multifocality, aiding patient stratification before surgery. Interestingly, the added predictive value of mammography was consistent across all sites, whereas the overall performance varied over time periods and sites, likely due to advancements in equipment and procedures.
... To discover the relationship between the morphological and handgrip neuromuscular characteristics and psychological characteristics (H2), as well as gender- and occupation-based specific patterns (H3), a Spearman's correlation analysis was performed on all four subsamples. The criteria for evaluating the effect size in the Mann-Whitney U test were: 0.01 < η² < 0.06 (small), 0.06 < η² < 0.14 (medium), η² > 0.14 (large) [37]. The effect size of correlation coefficients was defined as 0.20 ≤ weak ≤ 0.49, 0.50 ≤ moderate ≤ 0.79 and strong ≥ 0.80 [38,39]. ...
Article
Full-text available
Background/Objectives: The correlation of handgrip strength (HGS) and morphological characteristics with Big Five personality traits is well documented. However, it is unclear whether these relationships also exist in highly trained and specialized populations, such as tactical athletes, and whether there are specific differences compared to the general population. This study aimed to explore the interplay of handgrip neuromuscular, morphological, and psychological characteristics in tactical athletes and the general population of both genders. Methods: The research was conducted on a sample of 205 participants. A standardized method, procedure, and equipment (Sports Medical solutions) were used to measure the isometric neuromuscular characteristics of the handgrip. Basic morphological characteristics of body height, body mass, and body mass index were measured with a portable stadiometer and the InBody 720 device. Psychological characteristics were assessed with the Mental Toughness Index and Dark Triad Dirty Dozen questionnaires. Results: Numerous significant correlations were obtained, as well as differences between tactical athletes and the general population of both genders. The most prominent correlations were between the excitation index and Psychopathy and the Dark Triad (ρ = −0.41, −0.39) in female tactical athletes, as well as Neuroticism with body height, maximal force, and the maximum rate of force development in the male general population (ρ = 0.49, 0.43, 0.41). The obtained results also revealed gender- and occupation-specific patterns in the researched relationships. Conclusions: Although the results of this study indicate possible correlations between handgrip neuromuscular, morphological, and psychological characteristics in tactical athletes of both genders, the evidence is not yet conclusive, and further research is needed.
An analysis of muscle contractile and time parameters as neuromuscular indicators in the HGS task proved to be a promising method and yielded numerous new insights into the relationships studied. For practical application in the field, based on the obtained results we propose including Mental Toughness and the Dark Triad traits in the selection process for future police officers and national security personnel.
... Comparisons between sexes were performed using Student's t test. In this case, the magnitude of the difference was calculated using Cohen's d, which can be classified as small (d = 0.20), medium (d = 0.50), large (d = 0.80), and very large (d = 1.20) (Sawilowsky, 2009). ...
Article
Full-text available
The objective of the present study was to evaluate the acute physiological responses to MMT sessions in young athletes. The sample was made up of 13 young athletes, aged between 15 and 18 (height 170.9 ± 11.14 cm, body mass 70.20 ± 15.05 kg), who practice rugby, rowing, or taekwondo. Prior to the intervention, maximum dynamic strength, maximum aerobic speed (MAS), and maximum heart rate (HRmax) were measured. As acute physiological measures, blood lactate concentration ([LAC]), HR, maximum oxygen consumption (VO2max), and rating of perceived exertion (RPE) were adopted. There were no significant differences between the sexes for resting HR, HRmax, and performance in muscle strength tests (p > 0.05). MAS and VO2max values were higher in males (p < 0.05). No differences were found between sexes in [LAC] (p > 0.05). Significant differences were identified between the HR intensity ranges when considering the time and percentage of the total session duration (p < 0.001). RPE presented statistically higher values among male participants when compared to female participants (p < 0.05). Throughout the protocol, the athletes maintained high HR values in high-intensity zones, which characterized the MMT session as a vigorous activity. Regarding [LAC], the athletes obtained high values, which indicates high glycolytic demand. The RPE after the session was also high, indicating that, in addition to the high physiological demand, high effort values were also obtained.
... Before presenting this analysis, we measured the statistical power of our hypothesis test. As we have a small sample, we consider a Cohen's d (effect size) of 1.2 (very large), as presented in [19]. By statistical power, we mean the probability that a hypothesis test will find an effect if it exists. ...
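Under a normal approximation, the power of a two-sided two-sample test with an assumed d can be computed directly from the stdlib; the group size of 12 below is an arbitrary illustration, not the paper's sample.

```python
import math
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)       # noncentrality of the z statistic
    return 1 - nd.cdf(z_crit - shift) + nd.cdf(-z_crit - shift)

# With a very large assumed effect (d = 1.2), even small groups give decent power:
print(round(power_two_sample(1.2, 12), 2))  # → 0.84
```

This is why an assumed very large effect is the only way a small-sample study can claim adequate power; the same n with d = 0.5 would be badly underpowered.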
Conference Paper
Full-text available
Procedural Content Generation for multiple game content facets is a challenge for the game industry and academia. A content orches-trator is a software that can manage different procedural content generators, mixing their outputs while maintaining coherence and feasibility. We adapted a content orchestrator, originally meant for a top-down adventure game, to a 2D platformer. Both versions procedurally generate Levels, Rules, and Narrative, while adapting to distinct player profiles. A pre-test questionnaire is used to evaluate the player profile, and a post-test questionnaire is used to evaluate the game prototype we developed for this experimental purpose and the procedurally generated content. Results show the game was fun, challenging, interesting to explore, and with a moderate difficulty. Although with a limited sample, our results indicate the system was able to target content based on profiles. Therefore, this is a first step into understanding how a content orchestrator can be adapted to different game genres.
... Effect sizes for Chi-square comparisons were calculated following Fritz et al. (2012), using the Phi Coefficient for 2 × 2 contingency tables and Cramer's V for larger tables. These effect sizes were categorised according to Sawilowsky's (2009) and Cohen's (2013) conventions, with qualitative descriptors as follows: very weak (0.00-0.20), weak (0.20-0.50), moderate (0.50-0.80), strong (0.80-1.20), very strong (1.20-2.00), and extremely strong (≥2.00). ...
... We addressed the second possibility, inadequate power, by calculating the effect size of each measure and conducting a power analysis to estimate the sample size needed to detect the effect of each measure. We used t-test results and the conventional benchmark for Cohen's d (Cohen, 1988;Sawilowsky, 2009) to estimate effect size because Generalized Linear Mixed-effects Models (GLMMs) present difficulties in defining the coefficient of determination due to their complexity and inherent heteroscedasticity (Nakagawa & Schielzeth, 2013;Schielzeth et al., 2017). Furthermore, the research literature on studies with our experimental design and population did not provide an appropriate effect size benchmark. ...
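The companion calculation, estimating the sample size needed to detect a given d at a target power, can also be sketched with the stdlib normal quantile function. This is a normal approximation, so the results run slightly below exact t-based calculations.

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Sample size per group for a two-sided two-sample test (normal approximation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_b = nd.inv_cdf(power)           # quantile corresponding to the target power
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # classic benchmarks: 393, 63, and 25 per group
```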
Article
Full-text available
An important question in literacy education is whether reading instruction should focus on whole words or subword constituents. We tested whether this question captures something general across writing systems by examining the functionalities of words and characters in learning Chinese. We introduce a character-word dual-focus instructional approach based on the Character-Word Dual Function model and test its predictions with American undergraduate students enrolled in a beginner-level Chinese course. One group learned new words through dual-focus instruction: characters for pronunciation and words for meaning. A second group followed typical word-focus instruction prevalent in classrooms, learning word-level pronunciation and meaning. Results indicated that while both approaches produced comparable levels of word pronunciation and meaning learning, the dual-focus instruction significantly enhanced character pronunciation and transfer to new word learning. The advantages of dual-focus instruction highlight the importance of learning the subword components through acquiring the systematic structure of the writing system in learning to read.
... Further analyses were conducted using standard deviations, effect sizes, and effect size indices. Specifically, Cohen's d was used to calculate the effect size, and the effect size index was determined by selecting the closest match based on the guidelines of Sawilowsky and Cohen [18,19]. When the effect size was substantial, it was classified as a "huge effect size," indicating a significant difference in shitsukan perception between the stimulus pairs. ...
... Prior to testing, testosterone and cortisol were log-transformed to normalize distribution and eliminate nonuniformity bias [4]. To evaluate training efficacy, each variable was examined using a two-way (i.e., time [...]) model, with effect sizes interpreted on a scale up to a huge (2.0+) effect [27]. Modeled data are presented as estimated marginal means with a 95% CI. ...
... The baseline characteristics of this research were the first author, year of publication, location, study design, sample size, mean age, sex, conducted period, follow-up time point, and major and minor complications. The quality of the RCTs was assessed, and effect sizes were classified as "large, d ≥ 0.8" [28], "very large, d ≥ 1.2," or "huge, d ≥ 2.0" [29]. The χ² test was used to explore heterogeneity, with the inconsistency factor (I²), and P < 0.10 or I² > 50% indicated significant heterogeneity [30]. ...
Article
Full-text available
Background Secondary hyperparathyroidism (SHPT) is a common complication of chronic kidney disease (CKD) that affects approximately 90% of end-stage renal disease patients and poses a significant threat to long-term survival and quality of life. Objectives To assess whether radiofrequency ablation (RFA) is an effective and low-risk treatment for hyperparathyroidism secondary to CKD. Methods Embase, Web of Science, Cochrane Library, and PubMed were searched independently by two authors. The results after RFA and baseline biochemical indicators were compared, and parathyroid hormone (PTH), serum calcium, and serum phosphorus levels were the major outcomes. Results Four retrospective studies were screened from 147 original articles and involved 118 cases. After RFA, serum PTH levels (1 d standardized mean difference [SMD] = −2.30, 95% confidence interval [CI] = −3.04 to −1.56, P < 0.0001; 6 months SMD = −2.15, 95% CI = −3.04 to −1.26, P < 0.0001; 12 months SMD = −2.35, 95% CI = −3.52 to −1.17, P < 0.0001), serum calcium levels (1 d SMD = −1.49, 95% CI = −2.18 to −0.81, P = 0.0001; 6 months SMD = −1.09, 95% CI = −1.51 to −0.68, P < 0.0001), and serum phosphorus levels (1 d SMD = −1.37, 95% CI = −1.67 to −1.07, P < 0.0001; 6 months SMD = −1.06, 95% CI = −1.35 to −0.78, P < 0.0001) decreased significantly. Conclusions RFA, the newest thermal ablation technique, can effectively and safely treat hyperparathyroidism secondary to CKD. Hoarseness is the most common complication but is reversed within 6 months.
... The statistical analysis of METTL3, WTAP, and FTO gene expression data, evaluated by RT-qPCR, used the paired parametric t-test for matched samples and bootstrap estimation of the confidence interval (Cohen's d effect size).31 Differences in values were considered significant at p < 0.05. ...
Article
Full-text available
Colorectal cancer (CRC) is a major public health concern, and identifying prognostic molecular biomarkers can help stratify patients based on risk profiles, thus enabling personalized medicine. Epitranscriptomic modifications play a relevant role in controlling gene expression, and N6-methyladenosine (m6A) regulators play crucial roles in cancer progression, but their clinical significance in CRC has thus far not been elucidated. Thus, we aimed to examine, by immunohistochemical techniques and RT-qPCR, the protein levels and RNA expression of m6A writers (METTL3, WTAP) and eraser (FTO) in a cohort of 10 patients affected by CRC. The patients were followed for 5 years, and the METTL3, WTAP and FTO RNA values in alive vs dead patients were compared. Protein and RNA expression showed different trends: METTL3, WTAP and FTO protein expression increased from non-cancerous adjacent (N) tissue to G1-stage carcinoma (CA) tissue, and then decreased from G1 to G2 and G3 stages. The most marked increase was observed for WTAP, which rose from 40% protein expression positivity in N tissue to 81% positivity in G1-stage CA tissue. RNA expression of the METTL3, WTAP and FTO genes differed significantly between N tissue and G1-stage CA tissue; comparison of RNA values in patients alive after 5 years (0.58 ± 0.04) vs patients dead after 5 years (1.69 ± 0.29) showed that only WTAP values were significantly higher in dead patients. The fact that WTAP protein expression decreases while WTAP RNA expression remains high lets us hypothesize a sort of inhibition of protein expression, but further studies are needed to clarify the mechanism. Although the results suggest a relationship between the biological meaning and prognostic utility of WTAP, this prognostic utility must be confirmed by further studies on a larger sample.
... Pre–post changes were quantified using Cohen's d effect size [59], calculated as the mean of the differences divided by the SD of the differences for normally distributed outcomes. Effect sizes were classified according to Sawilowsky's extension [60] of Cohen's criteria as trivial (<0.01), very small (0.01–0.19), small (0.2–0.49), medium (0.5–0.79), large (0.8–1.19), very large (1.2–1.99), or huge (≥2.0); […] considered the minimally important difference for health-related quality of life [61,62]. ...
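The paired-samples Cohen's d described in this excerpt (mean of the pre–post differences divided by their SD) is straightforward to compute. A minimal sketch with invented numbers, purely for illustration:

```python
from statistics import mean, stdev

def cohens_d_paired(pre, post):
    """Cohen's d for paired samples: mean of the pairwise
    differences divided by their sample standard deviation."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)

# Hypothetical pre/post scores for five participants:
pre  = [10.0, 12.0, 9.0, 11.0, 10.5]
post = [12.5, 13.0, 11.0, 13.5, 12.0]
print(round(cohens_d_paired(pre, post), 2))  # → 2.91
```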
Article
Full-text available
(1) Background: Respiratory dysfunction is a debilitating consequence of cervical and thoracic spinal cord injury (SCI), resulting from the loss of cortico-spinal drive to respiratory motor networks. This impairment affects both central and peripheral nervous systems, disrupting motor control and muscle innervation, which is essential for effective breathing. These deficits significantly impact the health and quality of life of individuals with SCI. Noninvasive stimulation techniques targeting these networks have emerged as a promising strategy to restore respiratory function. This study systematically reviewed the evidence on noninvasive electrical stimulation modalities targeting respiratory motor networks, complemented by previously unpublished data from our research. (2) Methods: A systematic search of five databases (PubMed, Ovid, Embase, Science Direct, and Web of Science) identified studies published through 31 August 2024. A total of 19 studies involving 194 participants with SCI were included. Unpublished data from our research were also analyzed to provide supplementary insights. (3) Results: Among the stimulation modalities reviewed, spinal cord transcutaneous stimulation (scTS) emerged as a particularly promising therapeutic approach for respiratory rehabilitation in individuals with SCI. An exploratory clinical trial conducted by the authors confirmed the effectiveness of scTS in enhancing respiratory motor performance using a bipolar, 5 kHz-modulated, and 1 ms pulse width modality. However, the heterogeneity in SCI populations and stimulation protocols across studies underscores the need for further standardization and individualized optimization to enhance clinical outcomes. (4) Conclusions: Developing standardized and individualized neuromodulatory protocols, addressing both central and peripheral nervous system impairments, is critical to optimizing respiratory recovery and advancing clinical implementation.
... Once again, a one-tailed t-test revealed the difference in these averages to be statistically significant (p < 0.0001). This difference in averages also corresponds to a very large effect size (Cohen's d = 1.2), according to the effect size classification scheme proposed by Sawilowsky [30]. Surprisingly, students in high-IAS classes averaged a pre-post improvement in their abilities that was almost an entire logit greater than the average pre-post ability improvement of students in low-IAS classes. ...
Preprint
This paper presents the first item response theory (IRT) analysis of the national data set on introductory, general education, college-level astronomy teaching using the Light and Spectroscopy Concept Inventory (LSCI). We used the difference between students' pre- and post-instruction IRT-estimated abilities as a measure of learning gain. This analysis provides deeper insights than prior publications into both the LSCI as an instrument and into the effectiveness of teaching and learning in introductory astronomy courses. Our IRT analysis supports the classical test theory findings of prior studies using the LSCI with this population. In particular, we found that students in classes that used active learning strategies at least 25% of the time had average IRT-estimated learning gains that were approximately 1 logit larger than students in classes that spent less time on active learning strategies. We also found that instructors who want their classes to achieve an improvement in abilities of average Δθ = 1 logit must spend at least 25% of class time on active learning strategies. However, our analysis also powerfully illustrates the lack of insight into student learning that is revealed by looking at a single measure of learning gain, such as average Δθ. Educators and researchers should also examine the distributions of students' abilities pre- and post-instruction in order to understand how many students actually achieved an improvement in their abilities and whether or not a majority of students have moved to post-abilities significantly greater than the national average.
... However, such a stringent threshold may penalise the statistical power of any test, i.e. its probability of correctly identifying a genuine effect. As a consequence, a thorough evaluation of the results of the statistical tests was crucial, particularly when less numerous subgroups were involved and the p-values were higher than the 0.001 level instead of the conventional one. Taking into account the type of data collected, both Cohen's d (d) and a correlation effect size (r) were calculated to quantify the intensity of the effects, and their values were classified coherently with Sawilowsky's rule of thumb (Sawilowsky, 2009). Data analysis was conducted using the statistical open source software RStudio (https://www.rstudio.com/ ...
Article
Full-text available
Research on concerns about Emergency Remote Teaching has focused on teaching and management strategies, with some studies considering learners’ satisfaction, reactions, learning and overall acceptance. The present case study, based on a survey on 3,183 undergraduate and postgraduate learners, aimed at investigating engineering students’ self-reported experiences of the Emergency Remote Teaching. It identified the empirical factors characterising such experience and the predictors of the students’ responses. Moreover, it focused on their reaction to the innovation in teaching and learning methodologies in an extreme scenario. Quantitative methods, like confirmatory factor analysis and factorial ANOVA, were adopted to analyse data. Our findings highlighted that engineering students assessed their overall online learning experience of Emergency Remote Teaching slightly negatively. This evaluation concerned their opinion about three factors which achieved different assessment. These results did not appear to depend on the learners’ gender or their educational level of degree study, while the academic year of attendance seemed to influence their opinion on teaching. Moreover, the change in the learning approach experienced in the passage from bachelor to master’s programmes was discovered to be a further predictor which might be more critical for females than males. Finally, implications for policy makers and higher education institutions for online learning in the post-pandemic scenario are discussed.
... We remark that both the parametric and nonparametric approaches yielded the same results in terms of statistical significance; see "Appendix" for a detailed description of the complete statistical analysis, including the nonparametric results in Tables 2 and 3. Herein, we mainly illustrate results with the classic repeated ANOVA analysis using the distance variables (Hand Dist and Leg Dist). For each gender, the repeated ANOVA analysis of Hand Dist yielded as significant the main factor method (Uniform vs Fitted), the avatar type (5 models), and also their interaction (all p values < 0.0001); see Table 2 (top) in "Appendix". Additionally, we computed Cohen's f effect sizes (Cohen 1988), which can be interpreted using Sawilowsky's (2009) benchmarks. Regarding male participants, we observed a huge effect for the scaling method factor (f = 2.30), a large effect for the interaction between factors (f = 0.84), and a small effect for the avatar type factor (f = 0.43). ...
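Cohen's f, the ANOVA effect size used in the excerpt above, can be obtained from eta squared via f = sqrt(η² / (1 − η²)). A minimal sketch under that definition, with made-up groups rather than the study's data:

```python
from statistics import mean

def cohens_f(groups):
    """Cohen's f for a one-way design, computed from eta squared:
    eta^2 = SS_between / SS_total, then f = sqrt(eta^2 / (1 - eta^2))."""
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    ss_total = sum((x - grand) ** 2 for x in all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    eta2 = ss_between / ss_total
    return (eta2 / (1 - eta2)) ** 0.5

# Two hypothetical groups with clearly separated means:
print(round(cohens_f([[1, 2, 3], [4, 5, 6]]), 2))  # → 1.84
```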
Preprint
Full-text available
In the era of the metaverse, self-avatars are gaining popularity, as they can enhance presence and provide embodiment when a user is immersed in Virtual Reality. They are also very important in collaborative Virtual Reality to improve communication through gestures. Whether we are using a complex motion capture solution or a few trackers with inverse kinematics (IK), it is essential to have a good match in size between the avatar and the user, as otherwise mismatches in self-avatar posture could be noticeable for the user. To achieve such a correct match in dimensions, a manual process is often required, with the need for a second person to take measurements of body limbs and introduce them into the system. This process can be time-consuming, and prone to errors. In this paper, we propose an automatic measuring method that simply requires the user to do a small set of exercises while wearing a Head-Mounted Display (HMD), two hand controllers, and three trackers. Our work provides an affordable and quick method to automatically extract user measurements and adjust the virtual humanoid skeleton to the exact dimensions. Our results show that our method can reduce the misalignment produced by the IK system when compared to other solutions that simply apply a uniform scaling to an avatar based on the height of the HMD, and make assumptions about the locations of joints with respect to the trackers.
... We follow the advice of a widely cited paper by Sawilowsky (Sawilowsky 2009) as a standard when deciding the value of d in our statistical analysis, which asserts that "small" and "medium" effects can be measured using d = 0.2 and d = 0.5 (respectively). Splitting the difference, we will analyze this data by looking for differences larger than d = (0.5 + 0.2)/2 = 0.35. ...
Article
Full-text available
As software projects rapidly evolve, software artifacts become more complex and defects behind them get harder to identify. The emerging Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due to their self-attention mechanism, which scales quadratically with the sequence length. This paper introduces SparseCoder, an innovative approach incorporating sparse attention and learned token pruning (LTP) method (adapted from natural language processing) to address this limitation. Compared to previous state-of-the-art models (CodeBERT, RoBERTa and CodeT5), our experiments demonstrate that SparseCoder can handle significantly longer input sequences – at least twice as long, within the limits of our hardware resources and data statistics. Additionally, SparseCoder is four times faster than other methods measured in runtime, achieving a 50% reduction in floating point operations per second (FLOPs) with a negligible performance drop of less than 1% compared to Transformers using sparse attention (Sparse Atten). Plotting FLOPs of model inference against token lengths reveals that SparseCoder scales linearly, whereas other methods, including the current state-of-the-art model CodeT5, scale quadratically. Moreover, SparseCoder enhances interpretability by visualizing non-trivial tokens layer-wise.
... In addition, the point estimates of effects were presented as MD with a 95% confidence interval (CI) and standardized mean differences (SMD) with 95% CI, analyzed using Cohen's d. According to the newly presented definition, the Cohen's d effect size can be divided as follows: from 0.01 to 0.19: very small; from 0.2 to 0.49: small; from 0.5 to 0.79: medium; from 0.8 to 1.19: large; from 1.2 to 1.99: very large; more than 2: huge [40]. Cohen's d effect size was also used to evaluate three analysis models and assess the effects of confounders/covariables on the results. ...
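The thresholds listed above translate directly into a small lookup. A minimal sketch (the function name is assumed for illustration):

```python
def sawilowsky_label(d):
    """Classify |d| using Sawilowsky's (2009) extension of Cohen's benchmarks."""
    d = abs(d)
    for threshold, label in [(2.0, "huge"), (1.2, "very large"),
                             (0.8, "large"), (0.5, "medium"),
                             (0.2, "small"), (0.01, "very small")]:
        if d >= threshold:
            return label
    return "trivial"

print(sawilowsky_label(1.8))   # → very large
print(sawilowsky_label(0.35))  # → small
```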
Article
Full-text available
Background Rotator cuff repair (RCR) is one of the most prevalent procedures to manage rotator cuff tears (RCT). Postoperative shoulder pain is a common complication following RCR and may be aggravated by activation of myofascial trigger points (MTrP) associated with the injury to the soft tissues surrounding the surgical incision. This study aimed to describe a preliminary, randomized, sham-controlled trial to evaluate the effectiveness of implementing 4 sessions of myofascial trigger point dry needling (MTrP-DN) as a muscle treatment approach along with 10 sessions of a multimodal rehabilitation protocol (MRh) consisting of therapeutic exercise, manual therapy, and electrotherapy on postoperative shoulder pain, range of motion (ROM), strength, and functional outcome scores for patients following RCR surgery. Methods Forty-six patients aged 40–75 following RCR surgery were recruited and randomly allocated into 2 groups: (1) MTrP-DN plus MRh (experimental group), and (2) sham dry needling (S-DN) plus MRh (control group). This trial had a 4-week intervention period. The primary outcome was the Numeric Pain Rating Scale (NPRS) for postoperative shoulder pain. Secondary outcomes were the Shoulder Pain and Disability Index (SPADI), ROM, and strength. The mentioned outcomes were measured at baseline and week 4. In the current study, adverse events were recorded as well. Results No statistically significant differences were observed between groups when adding MTrP-DN to MRh for postoperative shoulder pain after 4 weeks of intervention (mean difference 0.32, [95% CI −0.41, 1.05], p = 0.37). However, this trial found a small effect size for postoperative shoulder pain. No significant between-group differences were detected in any of the secondary outcomes (p > 0.05) either. We found significant within-group changes in all studied outcome measures (p < 0.001). This study also reported minor adverse events following the needling approach.
Conclusion The lack of statistically significant differences in the outcomes and the small clinical significance in shoulder pain highlight the complexity of pain management, suggesting that alternative methodologies may be needed for meaningful clinical benefits. Future studies should consider different control groups, long-term follow-ups, larger sample sizes, and more MTrP-DN sessions to better understand their potential impact. Trial registration This trial was registered at ( https://www.irct.ir ), (IRCT20211005052677N1) on 19/02/2022.
... The effect size was established with Cohen's d and interpreted using the scale proposed by Sawilowsky (2009). All analyses were conducted using the Statistical Package for the Social Sciences (SPSS, version 28.0.0.0, IBM, Boston, IL, USA). ...
Article
Full-text available
Background: Players, through the different stages of their development, increase their performance due to their maturation process, training, and the accumulation of experience. The college competition in the US allows players to train and compete in a stable context over 4 years in their transition from U18 to senior level. Objective: The objective of the study was to analyse the evolution of game statistics as a function of the year of college of NCAA Division I men's players. Methodology: The sample was 52,852 Division I National Collegiate Athletic Association (NCAA) players in the United States (2010-2021 seasons). The study design was retrospective and non-experimental. The variables studied were games played, games as a starter, points, goals, assists, shot attempts, shots on goal, effectiveness of attempts, effectiveness of shots on goal, fouls, yellow cards, and red cards. To establish the evolution between players from top and bottom teams, one-way ANOVA was used. To analyse the differences according to the top and bottom teams, a t-test and discriminant analysis were performed. Results: The older, more experienced, and better trained players were, the more they participated in the game, both in terms of games played and in terms of game actions (assists, shots, and goals). The increase in the effectiveness of shots and shots on goal shows that the evolution in training and experience leads to higher player skill. Conclusion: These findings highlight the importance of experience, training, and maturity in the performance of U-23 male football players.
Article
The study aimed to evaluate the effect of tinnitus masking on cognitive functioning and the efficacy of tinnitus retraining therapy in reducing the severity of tinnitus handicap. The study recruited 15 subjects (mean age = 47.1 ± 11.5) with unilateral chronic tinnitus (6 participants with right-ear tinnitus and 9 participants with left-ear tinnitus) and mild to moderate sensorineural hearing loss. The study was conducted in two phases: Phase 1 comprised case history, audiometric testing, tinnitus matching, and THI-Bangla assessment. Phase 2 involved P300 administration followed by a 60-session tinnitus retraining therapy (online-offline modality), concluding with post-therapy P300 and THI-Bangla assessments. Therapy outcomes were evaluated by improvements in P300 latency and amplitude and THI-Bangla scores. A significant improvement was observed post-therapy, with reductions in mean P300 latency for both left (p < 0.001) and right (p < 0.001) ears. Mean P300 amplitude also increased significantly post-therapy for left (p < 0.001) and right (p < 0.001) ears. THI scores also declined substantially (p < 0.001), with a post-therapy mean THI score of 17.80 (SD = 4.64) for males and 12.20 (SD = 3.42) for females, revealing a significant difference (p = 0.033). These findings support the efficacy of tinnitus retraining therapy in reducing tinnitus related distress and enhancing cognitive processing, as evidenced by P300 improvements.
Article
Full-text available
Introduction The COVID-19 pandemic and lockdown experience required finding new resources and lifestyle changes. Understanding people’s behaviors during the pandemic and its potential effects on mental health indicators and well-being is essential for future directions. This study aimed to compare mental health symptoms during Portugal’s first lockdown with normative data, analyze the perceived utility of activities during this period, and explore how these perceptions related to individuals’ mental health indicators and emotion regulation processes (e.g., mindfulness and compassion). Methods A sample comprising 238 (186 women and 52 men) participants completed an online survey during the first lockdown of the COVID-19 pandemic in April (M1) and June (M2) 2020. The survey addressed the perceived utility of various activities to deal with the lockdown (e.g., exercise). It also included self-report measures of psychopathological symptoms and emotion regulation processes. Results No differences were found between depression and anxiety scores (at M1 and M2) and Portuguese normative pre-pandemic data. During the lockdown, perceived stress was higher than normative data. Reading, being outdoors, talking with friends, video calls, and helping others were scored as very useful. Mean comparisons of emotion regulation processes and psychopathological symptoms based on the utility of activities to deal with the lockdown were reported. Discussion Activities related to meaningful connections with others and personal activities may be encouraged as protective activities buffering the potentially harmful effect of isolation and external and uncontrollable threats. These findings highlighted the importance of promoting personal health behaviors and emphasize real social connections (e.g., social activities prescriptions).
Preprint
Full-text available
Despite the remarkable coherence of Large Language Models (LLMs), existing evaluation methods often suffer from fluency bias and rely heavily on multiple-choice formats, making it difficult to assess factual accuracy and complex reasoning effectively. LLMs thus frequently generate factually inaccurate responses, especially in complex reasoning tasks, highlighting two prominent challenges: (1) the inadequacy of existing methods to evaluate reasoning and factual accuracy effectively, and (2) the reliance on human evaluators for nuanced judgment, as illustrated by Williams and Huckle (2024)[1], who found manual grading indispensable despite automated grading advancements. To address evaluation gaps in open-ended reasoning tasks, we introduce the EQUATOR Evaluator (Evaluation of Question Answering Thoroughness in Open-ended Reasoning). This framework combines deterministic scoring with a focus on factual accuracy and robust reasoning assessment. Using a vector database, EQUATOR pairs open-ended questions with human-evaluated answers, enabling more precise and scalable evaluations. In practice, EQUATOR significantly reduces reliance on human evaluators for scoring and improves scalability compared to Williams and Huckle's (2024)[1] methods. Our results demonstrate that this framework significantly outperforms traditional multiple-choice evaluations while maintaining high accuracy standards. Additionally, we introduce an automated evaluation process leveraging smaller, locally hosted LLMs. We used LLaMA 3.2B, running on the Ollama binaries to streamline our assessments. This work establishes a new paradigm for evaluating LLM performance, emphasizing factual accuracy and reasoning ability, and provides a robust methodological foundation for future research.
Article
Full-text available
Purpose In China, stringent and long-lasting infection control measures, known as the "dynamic zero-COVID policy", have significantly affected the mental health of college students, particularly concerning depressive and insomnia symptoms. This study aims to investigate how depressive and insomnia symptoms evolved among Chinese college students throughout the pandemic, including the beginning and end of the dynamic zero-COVID policy period. Patients and Methods We conducted a 2-year longitudinal survey involving 1102 college students, collecting data at three key time points. Depressive symptoms were assessed using the Patient Health Questionnaire-9, and insomnia symptoms were measured with the Youth Self-rating Insomnia Scale-8. Three contemporaneous symptom networks and two cross-lagged panel networks were constructed. Results In the current sample, the prevalence of clinically significant depressive symptoms was 6.1%, 8.9%, and 7.7% during the first, second, and third waves, respectively. The prevalence of clinically significant insomnia symptoms was 8.1%, 13.0%, and 14.1%. Over time, the severity of depressive and insomnia symptoms and network density increased, persisting at least one year after the pandemic and control measures ended. "Difficulty initiating sleep" bridged the two disorders, while "anhedonia" played a pivotal role in triggering and sustaining other symptoms. Conclusion This study underscores the lasting impact of the evolving zero-COVID policy on depressive and insomnia symptoms among college students, elucidating the underlying interaction mechanisms. There is a pressing need for a more comprehensive evaluation of the implementation of restrictive public policies, taking into account their potential long-term consequences.
Preprint
Patient-derived cancer organoids (PDCOs) are a valuable model to recapitulate human disease in culture, with important implications for drug development. However, current methods for assessing PDCOs are limited. Label-free imaging methods are a promising tool to measure organoid-level heterogeneity and rapidly screen drug response in PDCOs. The aim of this study was to assess and predict PDCO response to treatments based on mutational profiles using label-free wide-field optical redox imaging (WF ORI). WF ORI provides organoid-level measurements of treatment response without labels or additional reagents by measuring the autofluorescence intensity of the metabolic co-enzymes NAD(P)H and FAD. The optical redox ratio is defined as the fluorescence intensity ratio NAD(P)H / (NAD(P)H + FAD), which measures the oxidation-reduction state of PDCOs. We have implemented WF ORI and developed novel leading-edge analysis tools to maximize the sensitivity and reproducibility of treatment response measurements in colorectal PDCOs. Leading-edge analysis improves sensitivity to redox changes in treated PDCOs (GΔ = 1.462 vs GΔ = 1.233). Additionally, WF ORI resolves FOLFOX treatment effects across all PDCOs better than two-photon ORI, with a ∼7X increase in effect size (GΔ = 1.462 vs GΔ = 0.189). WF ORI distinguishes metabolic differences based on driver mutations in CRC PDCOs, identifying KRAS+PIK3CA double-mutant PDCOs vs wildtype PDCOs with 80% accuracy, and can identify treatment-resistant mutations in mixed PDCO cultures (GΔ = 1.39). Overall, WF ORI enables rapid, sensitive, and reproducible measurements of treatment response and heterogeneity in colorectal PDCOs that will impact patient management, clinical trials, and preclinical drug development.
Statement of Significance Label-free wide-field optical redox imaging of patient-derived cancer organoids enables rapid, sensitive, and reproducible measurements of treatment response and heterogeneity that will impact patient management, clinical trials, and preclinical drug development.
Article
Adolescents are susceptible to developing depression and anxiety, and educational interventions could improve their mental well‐being. This systematic review aimed to evaluate the effectiveness of universal educational prevention interventions in improving mental health literacy, depression, and anxiety among adolescents. Eight electronic databases were searched until June 2024: Cochrane Library, PubMed, EMBASE, CINAHL, PsycINFO, Scopus, Web of Science, and ProQuest Dissertations and Theses Global. Since the included studies assessed various aspects of mental health literacy, the results for mental health literacy were synthesized narratively. In contrast, a meta‐analysis using a random‐effects model was applied to the depression and anxiety outcomes. Heterogeneity was examined using the I² statistic and Cochran's Q Chi‐squared test. The Cochrane risk of bias tool and the GRADE approach were used for quality appraisal at the study and outcome levels, respectively. The review was reported according to the PRISMA guidelines. This review included 34 randomized controlled trials. Universal educational prevention interventions were found to be promising in improving adolescents' mental health literacy but showed limited effects on individual mental health literacy components and on reducing depression (SMD = −0.06, 95% CI [−0.11, −0.02], Z = 2.58, p = 0.01, I² = 45%) and anxiety (SMD = −0.00, 95% CI [−0.06, 0.06], Z = 0.07, p = 0.94, I² = 58%) at post‐intervention. Future trials should consider using a hybrid delivery model utilizing health care and non-health care professionals. These interventions must incorporate skills‐based sessions to develop emotional regulation strategies, complemented by extended follow‐up periods that include booster sessions to reinforce learning. Given the very low quality of evidence as rated by the GRADE approach, current findings need to be interpreted with caution.
Article
Equine-assisted psychotherapy and therapeutic riding for veterans with posttraumatic stress disorder (PTSD) have been increasingly studied but not meta-analyzed. We identified 2,609 records from database, backward (i.e., reference), and forward (i.e., citation) searches to identify 18 studies addressing effectiveness. The studies were published from 2013–2021, with sample sizes ranging from 5–85 and quality ratings from medium to high. Equine-assisted services show large significant effects for change during treatment (n = 14, d = 1.156), follow-up (n = 6, d = 0.994), and compared to waitlist control (n = 1, d = 0.842), and medium effects compared to treatment-as-usual (n = 3, d = 0.465). Examination of these four types of evidence suggests promise for the use of equine-assisted services with veterans for reduction of PTSD symptomatology and a need for more controlled studies, particularly in relation to other specialized treatments also assessed as effective in treating PTSD in veterans.
Article
Metacognitive interventions have received increasing interest over the last decade, and there is a need to synthesize the evidence for this type of intervention. The current study is an updated systematic review and meta-analysis in which we investigated the efficacy of metacognitive interventions for adults with psychiatric disorders. We included randomized controlled trials that investigated either metacognitive therapy (MCT; developed by Wells) or metacognitive training (MCTraining; developed by Moritz). Ovid MEDLINE, Embase OVID, and PsycINFO were searched for articles published until May 2024. The final analyses included 21 MCT and 28 MCTraining studies (3239 individuals in total). Results showed that MCT was more efficacious than both waiting-list control conditions (g = 1.84) and other forms of cognitive behavior therapy (g = 0.43). MCTraining was superior to treatment as usual (g = 0.45), other psychological treatments (g = 0.46), and placebo conditions (g = 0.15). Many of the included studies lacked information on blinding procedures, inter-rater reliability, treatment adherence, competence, treatment expectancy, and pre-registration. We conclude that both MCT and MCTraining are probably efficacious treatments, but future studies need to incorporate more of these quality aspects in their trial designs.
Article
Crowdsourcing contests allow firms to seek ideas from external solvers to address their problems. This research examines solvers’ use of developmental feedback from different sources and of different constructiveness when generating ideas in contests. I theorize a source effect in solvers’ feedback use where they use seeker feedback more than peer feedback, even if both give identical suggestions for their ideas. I also show how the source effect affects solvers’ use of constructive and less constructive feedback from the respective sources. An insight is that compared with their use of peer feedback, solvers’ use of seeker feedback is more extensive at any level of, but less sensitive to, feedback constructiveness. An implication is that solvers may underuse constructive peer feedback and overuse less constructive seeker feedback. Such behaviors can be solver optimal (in terms of improving solvers’ winning prospects) but not seeker optimal (in terms of enhancing ideas for seekers’ problems), as constructive feedback is likely to improve idea quality, whereas less constructive feedback may hurt it. I propose a priming intervention of a feedback evaluation mechanism to mitigate the source effect in solvers’ feedback use—in a way, the intervention can cause solvers to behave more optimally for the seekers. A field survey and three online experiments test the theorizing and proposed intervention. I discuss the contributions and implications of this research for various stakeholders in crowdsourcing contests. Funding: This work was supported by the Hong Kong Research Grants Council [Grant 16502518]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/orsc.2021.15885 .
Article
Full-text available
Sprint interval exercise can cause transient, intense exercise-induced pain (EIP) during and several minutes after the activity. A hypoalgesic strategy against EIP for high-intensity exercise, such as sprint interval exercise, is necessary to maintain exercise habituation and improve training quality/exercise performance. Preexercise caffeine supplementation, a well-known ergogenic strategy, may improve sprint performance and alleviate EIP as a hypoalgesic strategy. However, whether preexercise caffeine supplementation exhibits both an ergogenic effect on sprint interval performance and a hypoalgesic effect on intensive EIP during and several minutes after high-intensity sprint interval exercise remains unknown, so we investigated these points. In this double-blind, randomized, crossover trial, sixteen male collegiate athletes performed 3 sets of 30-s all-out Wingate pedaling exercises at 2-min intervals. Participants ingested 6 mg·kg⁻¹ caffeine or placebo via capsules 60 min prior to exercise. Quadriceps EIP was measured using a visual analogue scale during and up to 20 min after exercise. The results showed that caffeine did not significantly affect peak or mean power during sprint interval exercise (peak power: P = 0.196, ηp² = 0.11; mean power: P = 0.157, ηp² = 0.13; interaction). Likewise, no significant interactions were found for quadriceps EIP during (P = 0.686, ηp² = 0.03) and immediately after exercise (P = 0.112, ηp² = 0.12), nor for changes in physiological responses (blood lactate and ammonia concentrations) or caffeine-induced side effects (all P > 0.05). In conclusion, caffeine had no ergogenic or hypoalgesic effect on sprint interval exercise with intensive EIP.
Book
Full-text available
The Ninth International Conference on Advances and Trends in Software Engineering (SOFTENG 2023), held between April 24th and April 28th, 2023, continued a series of events focusing on these challenging aspects for software development and deployment, across the whole life-cycle. Software engineering exhibits challenging dimensions in the light of new applications, devices, and services. Mobility, user-centric development, smart-devices, e-services, ambient environments, e-health and wearable/implantable devices pose specific challenges for specifying software requirements and developing reliable and safe software. Specific software interfaces, agile organization and software dependability require particular approaches for software security, maintainability, and sustainability.
Article
Background Prior research suggests leader-based interventions are considered to have a much stronger influence on worker safety behavior and climate than worker-based interventions. However, no prior research has evaluated training effectiveness of safety-specific leadership skill development for front-line supervisors on dairy farms. A tailored safety leadership training program targeting dairy farm supervisors was developed, delivered, and evaluated for its training effect on the supervisor’s safety leadership behavior. Methods A 12-module safety leadership training program was developed and delivered in an asynchronous format using e-learning methods to 73 dairy farm supervisors, representing 30 farms across five western U.S. states. We employed the Kirkpatrick Model to evaluate different levels of training effectiveness. Findings Evaluation of knowledge gained among participants revealed significant differences between pre- and post-test scores with medium to very large learning effect sizes across all training modules, particularly with training modules addressing safety culture, workplace conflict, and safety meetings. Safety leadership behavior change evaluation revealed significant pre-post training effects across most training modules, particularly regarding safety dialogue, hazard assessment, safety modeling, and conducting safety meetings. Conclusions Our findings suggest that safety leadership training can result in essential leadership behavior change among front-line dairy farm supervisors. Application to Practice This study provides many insights into the successful implementation of a safety leadership training program in a challenging industrial sector (rural/remote workplaces, immigrant workforce), as well as training effectiveness evaluation using novel data collection methodology. Additional research is needed on the effectiveness and sustainability of safety leadership training in high-risk industrial sectors such as agriculture.
Chapter
Many questions are answered using quantitative data. The main options are to analyse data using a narrative approach of tables and narrative explanation, or the formal pooling of data using meta-analysis. In this chapter, we outline a four-step approach to understanding quantitative data: what is the point estimate, how much variability or uncertainty is there about this, what is the clinical significance, and what is the statistical significance? When undertaking a meta-analysis, there are key decisions such as the outcome measure and meta-analysis model to be used. Key results include the point estimate with its confidence interval, prediction interval, and measures of heterogeneity. When the key assumption of studies being independent is violated, there are different approaches, namely, multilevel and multivariate analyses.
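The pooling steps the chapter outlines (point estimate, uncertainty, heterogeneity) are commonly implemented with a DerSimonian-Laird random-effects model; a minimal sketch, assuming per-study effect estimates and within-study variances are already in hand (the function name and the 1.96 normal-approximation confidence interval are illustrative choices, not the chapter's notation):

```python
import math

def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooled estimate with a 95% CI."""
    # Step 1: fixed-effect weights and Cochran's Q
    w = [1.0 / v for v in variances]
    ybar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    # Step 2: method-of-moments between-study variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Step 3: re-weight with tau^2 added to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (mu - 1.96 * se, mu + 1.96 * se)
    return mu, se, ci, tau2

# Illustrative: two hypothetical studies with conflicting estimates
mu, se, ci, tau2 = random_effects_pool([-0.3, 0.1], [0.01, 0.01])
```

When Q exceeds its degrees of freedom, tau² is positive, the studies are down-weighted toward equality, and the confidence interval widens relative to a fixed-effect pooling, which is the behavior the chapter's distinction between the two model choices turns on.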
Article
Full-text available
The dynamics of the knowledge-based economy are intrinsically linked to the concept of gamification, which has been increasingly discussed in the context of companies’ marketing strategies. It is therefore necessary to understand how gamification can be strategically applied in different contexts to increase the performance and competitiveness of companies. This paper aimed to analyze how gamification impacts customers’ intention to participate in the gamification process and how this strategy influences customers’ attitudes towards the brand. Multiple linear regression analyses were conducted on a sample of 238 Portuguese consumers using the Nike Run Club application. The study revealed that gamification can be an effective tool to increase users’ interaction with brands. Perceived usefulness, perceived social influence, engagement intention, and gamification performance were identified as the main predictors of gamification's effect on brand attitude. Furthermore, social interaction is a key factor in the success of game systems, which should promote interaction between communities of players to share functionality and solve task problems. It was observed that attitude towards the brand is positively influenced by factors such as usefulness, perceived ease of use, social influence, intention to get involved, and gamification of performance, especially when these are mediated by the sensation of pleasure. This finding points to the importance of these elements in shaping consumers’ perceptions of brands. To maximize engagement with brands, gamification tools should allow sharing of content and tasks, promoting discussions and broadening connections within the gaming community. It is important to consider technological resources and innovation in the design of gamification tools, as well as to meet the different expectations of consumers, allowing varied tasks and strong connections between different information-sharing channels.
This study contributes to the development of knowledge on the effect of gamification on brand attitude. Practical implications are suggested to guide companies in implementing a successful gamified marketing strategy.
Article
Full-text available
Background Given the rapidly increasing interest in national futures programmes, and the associated significant increase in resource investment, there is a pressing need for data specific to futures programmes to inform practice across world football. Aim To investigate the differences in the physical and perceptual demands of match-play using Global Positioning System (GPS) technology and Rating of Perceived Exertion (RPE) in traditional youth international teams and age-matched international future teams for biologically late-maturing players over one in-season period. Subjects and methods A total of 18 U15 future team (FT) players and 21 national team (NT) players were examined. Results The results showed that FT players performed 9% greater total distances (p = 0.008, Cohen’s d = 1.29) and accumulated 20% greater total player loads (p < 0.001, Cohen’s d = 1.88) than NT players during matches. In contrast, NT players covered 113% greater sprinting distances (p = 0.033, Cohen’s d = 0.63) and performed 62% more high-intensity accelerations (p = 0.015, Cohen’s d = 0.90) than FT players. There were no differences in high-intensity and very high-intensity running distances, number of accelerations, number of decelerations, high-intensity decelerations, or match-play RPE. When accounting for biological maturation, the adjusted marginal means did not differ between FT and NT players in any physical metric except total player load (p = 0.046) and high-intensity accelerations (p = 0.030). Conclusion We conclude that while several physical performance metrics differ significantly between FT and NT match-play, the most robust differences after controlling for maturation are in sprint performance and high-intensity accelerations.
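Cohen's d values like those reported above are computed from the two group means and a pooled standard deviation; a minimal sketch with hypothetical numbers, not the study's actual data:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    # Pooled SD weights each group's variance by its degrees of freedom
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / sp

# Illustrative: group sizes matching the abstract (18 FT, 21 NT), invented means/SDs
d = cohens_d(10.0, 2.0, 18, 8.0, 2.0, 21)  # -> 1.0, a "large" effect by convention
```

With equal SDs of 2.0 the pooled SD is 2.0, so a 2-unit mean difference yields d = 1.0, which by the usual benchmarks (and the extended Sawilowsky scale cited elsewhere on this page) counts as a large effect.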
Article
Full-text available
Justice-based exposure and response prevention (ERP) has been touted as an alternative to (mis)uses and “Fear Factor” overcorrection applications of ERP for obsessive-compulsive disorder (OCD) with identity-related themes (i.e., sexual orientation, gender identity, racism, age, disability/diagnostic status, and economic themes). Justice-based exposure maintains fidelity to the theoretical and evidence-based mechanisms of ERP while avoiding inadvertent stigmatization of marginalized communities. Justice-based ERP also avoids contributing to societal stigma while potentially increasing buy-in among patients. Despite its widespread support across the OCD field, empirical evidence regarding preferences for justice-based ERP is currently lacking. The present study sought to compare perspectives on justice-based and overcorrection ERP approaches to identity-related OCD themes among 450 individuals with current or past identity-related obsessions (85.6% female, Mage = 32.0). Participants reviewed idiographic, symptom-specific justice-based and overcorrection ERP hierarchies for each endorsed symptom theme and were asked a series of questions regarding their perspectives on each hierarchy. Participants reported a considerable preference for justice-based exposures over overcorrection exposures for these themes. Specifically, while overcorrection exposures were associated with higher anxiety expectancy, participants were much more likely to report a willingness to engage in justice-based exposures, perceived them to be more relevant and effective, and found them to be less derogatory and offensive than overcorrection exposures. Findings support the notion that (mis)uses and overcorrection exposures for identity-related OCD themes represent a “Fear Factor” approach that prioritizes anxiety increase at the expense of functionality, and that a justice-based approach may be better aligned with participants’ values and treatment expectancies.
Article
Full-text available
During adolescence, sleep is essential for physical and cognitive development. However, modern lifestyles often disrupt sleep patterns in young people, potentially impacting athletic performance. Swimming, with its demanding training schedules, presents unique challenges to sleep patterns and may influence performance. This study investigates the role of sleep in the performance of young swimmers during the tapering period. Nineteen athletes participated in this study, but only 15 completed it. Data collection began 21 days before the target competition, when the athletes were already engaged in training. Based on their sleep data, the athletes were divided into a good sleep group (GSG; n = 8) and a poor sleep group (PSG; n = 7). Athletes wore an actigraph daily (for 21 days, during the tapering phase) to assess sleep quality, and their performance results were obtained using the International Point Score (IPS). We observed that 46.67% of the athletes had poor sleep, with no difference between genders. We identified a significant difference (p = 0.001; ES = 1.52; 18.65%) between week 1 and week 3 in total sleep time (TST), with no change in sleep efficiency (SE). Sleep latency (SL) improved in both groups, with a reduction in the difference between the groups (9.73%), and wake after sleep onset (WASO) decreased (p = 0.001; ES = −1.79; 20.91%). The increase in TST, maintenance of SE, reduction of WASO, and the narrowing SL difference between the groups, together with the equal IPS performance of the groups, suggest better sleep quality during the period, with tapering reducing any performance differences that the PSG might have had compared to the GSG.
Research
Full-text available
This study is motivated by the primary research question, "Is Project Lead the Way (PLTW) instruction associated with elementary teachers' perceptions of their relationships with students, and how?" Previous research has attempted to identify predictors that would positively impact teacher-student relationships (TSRs); however, few studies have sought to understand associations between TSRs and instructional styles. The purpose of this mixed methods sequential explanatory study was to explore associations between the teacher participants' perceptions of their TSRs and their use of the PLTW curriculum. The two-phase investigation obtained quantitative results from a 19-item online survey of 83 elementary teachers in a public school district and then followed up with interviews of nine purposefully selected individuals. A mixed regression analysis of the quantitative data found that teachers from the PLTW-trained group had significantly lower STRS-SF scores (B = −2.713, β = −.287, t = −2.625, p < .05). The equation explained about 14 percent of the variance in STRS-SF scores (R² = .139, adj. R² = .095). Two meta-inferences were developed from four themes that emerged from a line-by-line analysis of the qualitative interview data. Overall, this study found that the quantitative and qualitative strands of data combined to show that the significantly lower STRS-SF scores within the PLTW group were related to a supplanting of the teacher's role as facilitator and a limitation of opportunities for student-centered instruction.
Article
Ongoing debate specific to the power properties of the independent samples t test and the Wilcoxon Mann-Whitney test created the need for this study. Researchers chose the t test over the Wilcoxon when testing for shift, claiming that in small treatment conditions the Wilcoxon was erroneously rejecting the null hypothesis due to scale change. Therefore, the purpose of this study was to assess whether, in the presence of a slight scale change, the reason the t test fails to reject while the Wilcoxon does reject is the scale change and not a shift in location.

Applying Monte Carlo techniques, the comparative power and robustness of the t test and the Wilcoxon were investigated. In addition to the Gaussian distribution, two real prototypical data sets, Smooth Symmetric and Extreme Asymmetry, Achievement (Micceri, 1989), were applied. Sample sizes included (n1, n2) = (10, 30), (30, 10), (20, 20), (15, 45), (45, 15), and (30, 30). The ratio of variances for group one and group two ranged from 1.0 to 1.2 (increasing in increments of .05). Shift/change in location parameters increased from 0.0 to 1.1 (in increments of .05). Nominal alpha was set at .05.

Outcomes compared the robustness and power of each test. For what is recognized as the Behrens-Fisher problem, scale change without change in location, the outcomes confirm that neither test is robust. In studying shift while holding variance constant, the power of the two tests is comparable for the Gaussian and Smooth Symmetric distributions; however, with extreme skew, the Wilcoxon maintains much greater power.

The primary focus of this research is slight change in location combined with scale change. Under normality, the t test rejects more than the Wilcoxon. Further, as the variance difference increases, both tests' rejection rates increase. With the introduction of non-normality, both tests reject at a higher rate, with the Wilcoxon rejecting more frequently than the t test.

The outcomes of this study confirm the strength of the t test under normality; however, when the treatment impacts location, researchers can maintain confidence that if the Wilcoxon rejects the null and the t test does not, this rejection reflects a shift in location.
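The Monte Carlo design described above, drawing two samples, applying both tests, and tallying rejection rates, can be sketched as follows. Only the Gaussian case is shown; the replication count, seed, and default sample sizes are illustrative, and the real data sets used in the dissertation (Micceri's distributions) are not reproduced:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def rejection_rates(shift, scale_ratio, n1=30, n2=30, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo rejection rates of the t test and the Wilcoxon Mann-Whitney
    under a location shift and a variance ratio (scale change), Gaussian data."""
    rng = np.random.default_rng(seed)
    t_rej = w_rej = 0
    for _ in range(reps):
        g1 = rng.normal(0.0, 1.0, n1)
        g2 = rng.normal(shift, np.sqrt(scale_ratio), n2)
        if ttest_ind(g1, g2).pvalue < alpha:
            t_rej += 1
        if mannwhitneyu(g1, g2, alternative="two-sided").pvalue < alpha:
            w_rej += 1
    return t_rej / reps, w_rej / reps

# Null case (no shift, equal scale): both rates should sit near nominal alpha
t_rate, w_rate = rejection_rates(shift=0.0, scale_ratio=1.0)
```

Sweeping `shift` over 0.0–1.1 and `scale_ratio` over 1.0–1.2, as in the study's design, would reproduce the kind of power-by-condition grid the dissertation reports.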