TRENDS in Sport Sciences 2014; 1(21): 19-25. ISSN 2299-9590
The need to report effect size estimates revisited.
An overview of some recommended measures of effect size
MACIEJ TOMCZAK1, EWA TOMCZAK2
REVIEW ARTICLE
Recent years have witnessed a growing number of published reports that point out the need for reporting various effect size estimates in the context of null hypothesis (H0) testing, as a response to a tendency to report tests of statistical significance only, with less attention paid to other important aspects of statistical analysis. Despite considerable changes over the past several years, neglecting to report effect size estimates may still be noted in such fields as medical science, psychology, applied linguistics, or pedagogy. Nor have sport sciences managed to totally escape the grips of this suboptimal practice: here statistical analyses in even some current research reports do not go much further than computing p-values. The p-value, however, is not meant to provide information on the actual strength of the relationship between variables, and does not allow the researcher to determine the effect of one variable on another. Effect size measures serve this purpose well. While the number of reports containing statistical estimates of effect sizes calculated after applying parametric tests is steadily increasing, reporting effect sizes with non-parametric tests is still very rare. Hence, the main objectives of this contribution are to promote various effect size measures in sport sciences by, once again, bringing to the readers' attention the benefits of reporting them, and to present examples of such estimates with a greater focus on those that can be calculated for non-parametric tests.
KEY WORDS: sport science, effect size calculation, parametric
tests, non-parametric tests, methodology.
Received: 12 September 2013
Accepted: 15 February 2014
Corresponding author: maciejtomczak5@gmail.com
1 University School of Physical Education in Poznań, Department of Psychology, Poland
2 Adam Mickiewicz University in Poznań, Faculty of English, Department of Psycholinguistic Studies, Poland
What is already known on this topic?
Estimates of effect size allow the assessment of the
strength of the relationship between the investigated
variables. In practice, they permit an evaluation of the
magnitude and importance of the result obtained. An
effect size estimate is a measure worth reporting next
to the p-value in null hypothesis testing. However,
not every research report contains it. After the null
hypothesis has been tested with the use of parametric
and non-parametric tests (statistical significance
testing), measures of effect size can be estimated.
A few remarks on statistical hypothesis testing
Studies in sport sciences have addressed a wide spectrum of topics. Empirical verification in these areas often makes use of correlational models as well as experimental research models. Just like other scholars conducting empirical research, researchers in sport sciences often rely on inferential statistics to test hypotheses. From the point of view of statistics, the hypothesis verification process often comes down to determining the probability value (p-value), and to deciding whether the null hypothesis (H0) is rejected (a test of statistical significance) [1, 2, 3, 4]. In the case of rejecting the null hypothesis (H0), a researcher
will accept an alternative hypothesis (H1), which is often referred to as the so-called substantive hypothesis, as a researcher formulates it based on various criteria applicable to their own studies. Such an approach to hypothesis verification has its origin in Fisher's approach (the p-value approach) and in the Neyman-Pearson framework for hypothesis testing that was developed later (the fixed-α approach). Below, based on Aranowska and Rytel [5, p. 250], we present the two approaches (Table 1).
Rejecting the null hypothesis (H0) when it is in fact true is what Neyman and Pearson call making a Type I error (known as a "false positive" or "false alarm"). To control for Type I error, or in other words, to minimize the chance of finding a difference that is not really there in the data, researchers set an appropriately low alpha level in their analyses. By contrast, failing to reject the null hypothesis (H0) when it is actually false (and should be rejected) is referred to as a Type II error (known as a "false negative"). Here, increasing the sample size is an effective way of reducing the probability of making a Type II error [1, 2, 3].
The presented approach to hypothesis testing has been
a common practice in many disciplines. However,
reporting the p-value alone and drawing inferences
based on the p-value alone is insufficient. Hence,
statistical analyses and research reports should be
supplemented with other essential measures that carry
more information about the meaningfulness of the
results obtained.
Why is the p-value alone not enough? – or On the need to report effect size estimates
Thanks to some of its advantages, the concept of statistical significance testing has prevailed in the empirical verification of hypotheses, to the extent that in many areas other vital statistical measures still go largely unreported. In spite of recommendations not to limit research reports to presenting the null hypothesis test and reporting the p-value only, to this day a relatively large number of published articles have not gone much beyond that. By way of illustration, a meta-analysis of research accounts published in one prestigious psychology journal in the years 2009 and 2010 showed that almost half of the articles reporting an Analysis of Variance (ANOVA) did not contain any measure of effect size, and only a mere quarter of the surveyed research reports supplemented Student's t-test analyses with information about the effect size [6]. Sport sciences have seen comparable practices every now and then. As already pointed out, giving the p-value only to support the significance of the difference between groups or measurements, or the significance of a relationship, is insufficient [7, 8]. The p-value alone merely indicates the probability of obtaining a result as extreme as, or more extreme than, the one actually obtained, assuming that the null hypothesis is true [1]. In many circumstances, the computed p-value depends (also) on the standard error (SE) [9]. It is now well established that the sample size affects the standard error and, as a result, the p-value. As the size of a sample increases, the standard error becomes smaller, and the p-value tends to decrease. Due to this dependence on sample size, p-values are seen as confounded. Sometimes a statistically significant result mainly indicates that a huge sample size was used [10, 11]. For this reason, the p-value does not say whether the observed result is meaningful or important in terms of (1) the magnitude of the difference in the mean scores of the groups on some measure, or (2) the strength of the relationship between the investigated variables.
Table 1. Fisher's and Neyman-Pearson's approaches to hypothesis testing

The Fisher approach to hypothesis testing (also known as the p-value approach):
− formulate the null hypothesis (H0)
− select the appropriate test statistic and specify its distribution
− collect the data and calculate the value of the test statistic for your set of data
− specify the p-value
− if the p-value is sufficiently small (according to the criterion adopted), then reject the null hypothesis. Otherwise, do not reject the null hypothesis.

The Neyman-Pearson approach to hypothesis testing (also known as the fixed-α approach):
− formulate two hypotheses: the null hypothesis (H0) and the alternative hypothesis (H1)
− select the appropriate test statistic and specify its distribution
− specify α (alpha) and select the critical region (R)
− collect the data and calculate the value of the test statistic for your set of data
− if the value of the test statistic falls in the critical (rejection) region, then reject the null hypothesis at the chosen significance level (α). Otherwise, do not reject the null hypothesis.
Relying on the p-value alone for statistical
inference does not permit an evaluation of the magnitude
and importance of the obtained result [10, 12, 13].
In general terms, there are good reasons for researchers to supplement their reports of null hypothesis testing (statistical significance testing: the p-value) with information about effect sizes. A large number of effect size estimates have been developed and remain in use to this day. As reporting effect size estimates is beneficial in more than one way, below we list the benefits that seem most fundamental [6, 12, 14, 15, 16, 17, 18]:
1. They reflect the strength of the relationship between variables and allow the importance (meaningfulness) of such a relationship to be evaluated. This holds both for relationships explored in correlational research and for the magnitude of effects obtained in experiments (i.e. evaluating the magnitude of a difference). By contrast, applying a test of significance only and stating the p-value may solely provide information about the presence or absence of a difference or relationship, leaving aside its importance.
2. Effect size estimates allow the results from different sources and authors to be properly compared. The p-value alone, which depends on the sample size, does not permit such comparisons. Hence, the effect size is critical in research syntheses and meta-analyses that integrate the quantitative findings from various studies of related phenomena.
3. They can be used to calculate the power of a statistical test (power analysis), which in turn allows the researcher to determine the sample size needed for the study (a sketch of such a calculation follows this list).
4. Effect sizes obtained in pilot studies, where the sample size is small, may indicate what results can be expected in subsequent, larger studies.
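By way of illustration, benefit 3 can be scripted in a few lines. The sketch below is our own illustration, not part of the original paper: it uses the statsmodels package to find the per-group sample size for an independent-samples t-test, assuming an invented pilot effect size of d = 0.5, α = 0.05, and a desired power of 0.80.

```python
# A hypothetical a-priori power calculation (values are invented).
from statsmodels.stats.power import TTestIndPower

pilot_d = 0.5  # assumed Cohen's d from a small pilot study
n_per_group = TTestIndPower().solve_power(
    effect_size=pilot_d, alpha=0.05, power=0.80, alternative='two-sided'
)
print(f"Required observations per group: {n_per_group:.1f}")  # about 64
```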
Some recommended effect size estimates
In the present section we provide an overview of a number of effect size estimates for the statistical tests that are most commonly used in sport sciences. Since parametric tests are frequently used, measures of effect size for parametric tests are described first. Then, we describe effect size estimates for non-parametric tests; reporting measures of effect size for the latter is more of a rarity. Aside from that, in the overview below we omit the measures of effect size that are most popular and widely reported for parametric tests. In sport sciences, examples of the most popular estimates of effect size include correlation coefficients for relationships between variables measured on an interval or ratio scale, such as Pearson's correlation coefficient (r). Nor do we present the effect size measures, popular and widely used in sport sciences among other fields, that are calculated for relationships between ordinal variables, such as Spearman's correlation coefficient. Some measures of effect size presented below can be calculated automatically with the help of statistical software such as Statistica, the Statistical Package for the Social Sciences (SPSS), or R. Others can be calculated by hand in a quick and easy way.
Effect size estimates used with parametric tests
The Student's t-test for independent samples is a parametric test that is used to compare the means of two groups. After the null hypothesis is tested, one can easily and quickly calculate the value of the point-biserial correlation coefficient from the Student's t-test value (provided that the t-value comes from comparing groups of relatively similar size). This coefficient is similar to the classical correlation coefficient in its interpretation. Using this coefficient one can also calculate the popular r² (η²). The formulae used in computing the point-biserial correlation coefficient are presented below [1, 6, 19]:
r = \sqrt{t^2 / (t^2 + df)}

r^2 = t^2 / (t^2 + df) = \eta^2

t – the value of Student's t-test
df – the number of degrees of freedom ((n1 – 1) + (n2 – 1))
n1, n2 – the number of observations in groups (group 1, group 2)
r – point-biserial correlation coefficient
r² (η²) – the index assumes values from 0 to 1 and, multiplied by 100%, indicates the percentage of variance in the dependent variable explained by the independent variable
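As a worked illustration (ours, with invented data; not taken from any cited study), the conversion from t to r can be scripted with SciPy as follows:

```python
import numpy as np
from scipy import stats

# Invented scores for two groups of similar size
group1 = np.array([23.0, 25.1, 27.3, 22.8, 26.4, 24.9])
group2 = np.array([20.2, 21.7, 19.8, 22.5, 21.1, 20.9])

t, p = stats.ttest_ind(group1, group2)
df = (len(group1) - 1) + (len(group2) - 1)   # degrees of freedom
r = np.sqrt(t**2 / (t**2 + df))              # point-biserial correlation
print(f"t = {t:.2f}, p = {p:.4f}, r = {r:.2f}, r^2 = {r**2:.2f}")
```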
Often used here are the effect size measures from the so-called d family of effect sizes, which includes, among others, two commonly used measures: Cohen's d and Hedges' g. Below we provide a formula for calculating Cohen's d [1, 19, 20, 21]:
d = (\bar{x}_1 - \bar{x}_2) / \sigma

d – Cohen's index
\bar{x}_1, \bar{x}_2 – means of the first and second sample
σ – standard deviation of the population
Normally, we do not know the population standard deviation, and we estimate it based on the samples. In this case, to estimate the effect size one can compute the g coefficient, which uses the weighted pooled standard deviation [22]:
g = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}

n1, n2 – the number of observations in groups (group 1, group 2)
s1, s2 – standard deviation in groups (group 1, group 2)
Rough arbitrary criteria apply to Cohen's d and Hedges' g values: a d or g of 0.2 is considered small, 0.5 medium, and 0.8 large [21].
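The g formula translates directly into code. The following is a minimal sketch of our own (the function name and data are invented), returning the standardized mean difference based on the weighted pooled standard deviation:

```python
import numpy as np

def hedges_g(x1, x2):
    # Standardized mean difference using the weighted pooled SD
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * x1.var(ddof=1) +
                  (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(pooled_var)

g = hedges_g([23.0, 25.1, 27.3, 22.8], [20.2, 21.7, 19.8, 22.5])
print(f"g = {g:.2f}")  # compare against the 0.2 / 0.5 / 0.8 benchmarks
```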
When it comes to the dependent-samples Student's t-test, it is possible to compute the correlation coefficient r. For this purpose, the above-presented formula for calculating r for independent samples is adopted. However, the r coefficient "is no longer the simple point-biserial correlation, but is instead the correlation between group membership and scores on the dependent variable with indicator variables for the paired individuals partialed out" [23, p. 447]. Additionally, once the dependent-samples Student's t-test has been used, it is possible to calculate the effect size estimate g, where [1, 22]:
g = \bar{D} / \sqrt{SS_D / (n - 1)}

\bar{D} – mean difference score
SS_D – sum of squared deviations (i.e. the sum of squares of deviations from the mean difference score)
n – the number of difference scores (pairs); the denominator is thus simply the standard deviation of the difference scores
In turn, to compare more than two groups on ratio or interval variables, Analysis of Variance (ANOVA) is used, be it one-way or multi-factor ANOVA (provided that the samples meet the criteria). The effect size estimates used here are the coefficients η² or ω². To compute the former (η²), we may use the ANOVA output from popular statistical software packages such as Statistica or SPSS. Below we present the formula [1, 6, 24]:
\eta^2 = SS_{ef} / SS_t

SS_ef – sum of squares for the effect
SS_t – total sum of squares
η² – the index assumes values from 0 to 1 and, multiplied by 100%, indicates the percentage of variance in the dependent variable explained by the independent variable
One of the disadvantages of η² is that the value of each particular effect depends to some extent on the size and number of the other effects in the design [25]. A way around this problem is to calculate the partial eta-squared statistic (ηp²), where a given factor is seen as playing a role in explaining a portion of variance in the dependent variable once the other effects (factors) present in the analysis have been excluded [6]. The formula is presented below [1, 6, 24]:
\eta_p^2 = \frac{SS_{ef}}{SS_{ef} + SS_{er}}

SS_ef – sum of squares for the effect
SS_er – sum of squares for the error
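For a one-way between-subjects design, both indices can be computed directly from the sums of squares. The sketch below is our own (the data are invented); note that in a one-way design SS_t = SS_ef + SS_er, so η² and ηp² coincide, and they differ only in multi-factor designs:

```python
import numpy as np

# Invented data: three groups in a one-way between-subjects design
groups = [np.array([4.0, 5.5, 6.1, 5.0]),
          np.array([6.2, 7.1, 6.8, 7.5]),
          np.array([5.1, 4.8, 5.9, 5.4])]

grand = np.concatenate(groups)
ss_total = ((grand - grand.mean()) ** 2).sum()
ss_effect = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in groups)
ss_error = ss_total - ss_effect

eta2 = ss_effect / ss_total                  # eta-squared
eta2_p = ss_effect / (ss_effect + ss_error)  # partial eta-squared
print(f"eta^2 = {eta2:.3f}, partial eta^2 = {eta2_p:.3f}")  # equal here
```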
In the same way, one can calculate the effect size for within-subject designs (repeated measures). However, both coefficients η² and ηp² are biased, and they estimate the effect for a given sample only. Therefore, we should compute the coefficient ω², which is relatively unbiased. To calculate it by hand one can use the ANOVA output that contains the values of the mean squares (MS), sums of squares (SS), and degrees of freedom (df). For between-subject designs the following formula applies [24]:
\omega^2 = \frac{df_{ef}(MS_{ef} - MS_{er})}{SS_t + MS_{er}}

MS_ef – mean square of the effect
MS_er – mean square error
SS_t – the total sum of squares
df_ef – degrees of freedom for the effect
For within-subject designs, ω² is calculated using the formula [24]:
\omega^2 = \frac{df_{ef}(MS_{ef} - MS_{er})}{SS_t + MS_{sj}}

MS_ef – mean square of the effect
MS_er – mean square error
MS_sj – mean square for subjects
df_ef – degrees of freedom for the effect
The partial omega-squared (ωp²) is computed in the same way for both between-subject designs and within-subject designs (repeated measures), using the formula below [24]:
\omega_p^2 = \frac{df_{ef}(MS_{ef} - MS_{er})}{df_{ef} \cdot MS_{ef} + (n - df_{ef}) MS_{er}}

n – the total number of observations
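The three ω² formulas above translate into small helper functions that take the quantities printed in standard ANOVA output. A sketch of our own (the argument names are ours):

```python
def omega2_between(df_ef, ms_ef, ms_er, ss_t):
    # Omega-squared for between-subject designs
    return df_ef * (ms_ef - ms_er) / (ss_t + ms_er)

def omega2_within(df_ef, ms_ef, ms_er, ss_t, ms_sj):
    # Omega-squared for within-subject (repeated measures) designs
    return df_ef * (ms_ef - ms_er) / (ss_t + ms_sj)

def omega2_partial(df_ef, ms_ef, ms_er, n):
    # Partial omega-squared; n is the total number of observations
    return df_ef * (ms_ef - ms_er) / (df_ef * ms_ef + (n - df_ef) * ms_er)

# Example with invented ANOVA output values
print(omega2_between(df_ef=2, ms_ef=12.4, ms_er=1.8, ss_t=60.0))
```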
Both η² and ω² are interpreted similarly to R². Hence, these measures multiplied by 100% indicate the percentage of variance in the dependent variable explained by the independent variable.
Effect size estimates used with non-parametric tests
Now we turn to non-parametric tests. Various effect size estimates can be quickly calculated for the Mann-Whitney U-test, a non-parametric statistical test used to compare two groups. In addition to the U-value, the Mann-Whitney test report (output) contains the standardized Z-score which, after running the Mann-Whitney U-test on the data, can be used to compute the value of the correlation coefficient r. The interpretation of the calculated r-value coincides with that of Pearson's correlation coefficient (r). Also, the r-value can easily be converted to r². The formulae for calculating r and r² by hand are presented below [6]:
r = Z / \sqrt{n}

r^2 = Z^2 / n = \eta^2

Z – standardized value for the U-value
r – correlation coefficient; r assumes values ranging from –1.00 to 1.00
r² (η²) – the index assumes values from 0 to 1 and, multiplied by 100%, indicates the percentage of variance in the dependent variable explained by the independent variable
n – the total number of observations on which Z is based
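SciPy's implementation reports U and the p-value but not Z, so in the sketch below (ours, with invented data) Z is recovered from the normal approximation to the U distribution, ignoring the correction for ties; with software that prints Z directly, only the last two lines are needed:

```python
import numpy as np
from scipy import stats

x = np.array([12.0, 15.2, 11.8, 16.4, 14.1, 13.3])
y = np.array([10.1, 9.8, 12.2, 8.9, 11.0, 10.5])

u, p = stats.mannwhitneyu(x, y, alternative='two-sided')
n1, n2 = len(x), len(y)
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
z = (u - mu_u) / sigma_u       # sign depends on the order of the groups
r = z / np.sqrt(n1 + n2)
print(f"U = {u:.0f}, Z = {z:.2f}, r = {r:.2f}, r^2 = {r**2:.2f}")
```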
Following the computation of the Mann-Whitney U-statistic, one can also calculate the Glass rank-biserial correlation using the average ranks of the two sets of data (R̄1, R̄2) and the sample size of each group. Some statistical packages produce, next to the test score, the sums of ranks that can be used to calculate the mean ranks. To interpret the calculated value one can draw on the interpretation of the classical Pearson's correlation coefficient (r). Here the following formula applies [1]:
r = \frac{2(\bar{R}_1 - \bar{R}_2)}{n_1 + n_2}

R̄1 – mean rank for group 1
R̄2 – mean rank for group 2
n1 – sample size (group 1)
n2 – sample size (group 2)
r – correlation coefficient; r assumes values ranging from –1.00 to 1.00
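When only raw scores are available, the mean ranks can be obtained by ranking the two samples jointly. A short sketch of ours, reusing the invented groups from the previous example:

```python
import numpy as np
from scipy import stats

x = np.array([12.0, 15.2, 11.8, 16.4, 14.1, 13.3])
y = np.array([10.1, 9.8, 12.2, 8.9, 11.0, 10.5])

ranks = stats.rankdata(np.concatenate([x, y]))  # joint ranking
mean_r1 = ranks[:len(x)].mean()                 # mean rank, group 1
mean_r2 = ranks[len(x):].mean()                 # mean rank, group 2
r_rb = 2 * (mean_r1 - mean_r2) / (len(x) + len(y))
print(f"rank-biserial r = {r_rb:.2f}")
```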
For another non-parametric test, the Wilcoxon signed-rank test for paired samples, the Z-score may again be used to calculate a correlation coefficient, employing the formula given below (where n is the total number of observations on which Z is based) [6]:

r = Z / \sqrt{n}
On the other hand, once the Wilcoxon signed-rank test has been computed, one can also calculate the rank-biserial correlation coefficient using the formula [1]:
r = \frac{4 \left| T - \frac{R_1 + R_2}{2} \right|}{n(n + 1)}

R1 – sum of ranks with positive signs (sum of ranks of positive values)
R2 – sum of ranks with negative signs (sum of ranks of negative values)
T – the smaller of the two values (R1 or R2)
n – the total sample size
r – correlation coefficient (which is the same as the r coefficient in its interpretation)
For the Kruskal-Wallis H-test, a non-parametric test adopted to compare more than two groups, the eta-squared measure (η²) can be computed. The formula for calculating the η² estimate using the H-statistic is presented below [26]:
\eta_H^2 = \frac{H - k + 1}{n - k}

H – the value obtained in the Kruskal-Wallis test (the Kruskal-Wallis H-test statistic)
η² – the eta-squared estimate assumes values from 0 to 1 and, multiplied by 100%, indicates the percentage of variance in the dependent variable explained by the independent variable
k – the number of groups
n – the total number of observations
In addition, once the Kruskal-Wallis H-test has been
computed, the epsilon-squared estimate of effect size
can be calculated, where [1]:
E_R^2 = \frac{H}{(n^2 - 1)/(n + 1)}

H – the value obtained in the Kruskal-Wallis test (the Kruskal-Wallis H-test statistic)
n – the total number of observations
E_R² – the coefficient assumes values from 0 (indicating no relationship) to 1 (indicating a perfect relationship)
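Both estimates follow directly from the H statistic returned by SciPy. A sketch of ours with invented groups:

```python
from scipy import stats

g1 = [4.2, 5.1, 6.0, 5.5]
g2 = [6.8, 7.2, 6.5, 7.9]
g3 = [5.0, 4.7, 5.8, 5.2]

h, p = stats.kruskal(g1, g2, g3)
k = 3                                   # number of groups
n = len(g1) + len(g2) + len(g3)         # total observations

eta2_h = (h - k + 1) / (n - k)
epsilon2 = h / ((n**2 - 1) / (n + 1))   # algebraically H / (n - 1)
print(f"H = {h:.2f}, eta^2_H = {eta2_h:.2f}, E^2_R = {epsilon2:.2f}")
```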
Also, for the Friedman test, a non-parametric statistical test employed to compare three or more paired measurements (repeated measures), an effect size estimate can be calculated (and is referred to as W) [1]:

W = \frac{\chi_w^2}{N(k - 1)}

W – the Kendall's W test value
χw² – the Friedman test statistic value
N – sample size
k – the number of measurements per subject

The Kendall's W coefficient assumes values from 0 (indicating no relationship) to 1 (indicating a perfect relationship).
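Kendall's W follows from the Friedman chi-square in one line. A sketch of ours with invented repeated measurements:

```python
from scipy import stats

# Three measurements on the same eight subjects (invented data)
m1 = [5.1, 6.0, 5.5, 6.2, 5.8, 6.4, 5.0, 5.9]
m2 = [5.9, 6.4, 6.1, 6.8, 6.0, 7.1, 5.6, 6.3]
m3 = [6.5, 7.0, 6.4, 7.3, 6.6, 7.5, 6.1, 6.9]

chi2_w, p = stats.friedmanchisquare(m1, m2, m3)
n_subjects = len(m1)
k = 3                                 # measurements per subject
w = chi2_w / (n_subjects * (k - 1))
print(f"chi2 = {chi2_w:.2f}, Kendall's W = {w:.2f}")
```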
Also, in sport sciences it is quite common practice to use the chi-square (χ²) test of independence. Having tested the null hypothesis (H0) with a χ² test of independence, one may assess the strength of the relationship between nominal variables. In this case, the phi coefficient (φ, Yule's phi, computed for 2 × 2 tables where each variable has only two levels, e.g. the first variable: male/female, the second variable: smoking/non-smoking) can be reported, or one can report Cramer's V (for tables which have more than two rows or columns). The values obtained for these estimates of effect size are similar to correlation coefficients in their interpretation. Again, popular statistical software packages calculate phi and Cramer's V. Below we present the formulae for such calculations [1, 6]:
\varphi = \sqrt{\chi^2 / n}

and for Cramer's V:

V = \sqrt{\frac{\chi^2}{n \cdot df_s}}

df_s – degrees of freedom based on the smaller of the two table dimensions (the number of rows or the number of columns, whichever is smaller, minus 1)
χ² – the calculated chi-square statistic
n – the total number of cases
The Phi coefcient and the Cramer’s V assume the
value from 0 (indicating no relationship) to 1 (indicating
a perfect relationship).
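A sketch of ours for a 2 × 2 table (the counts are invented); Yates' continuity correction is switched off so that the χ² value matches the plain formulas above. For a 2 × 2 table, φ and Cramer's V coincide:

```python
import numpy as np
from scipy import stats

# Invented 2 x 2 table: sex (rows) by smoking status (columns)
table = np.array([[30, 10],
                  [20, 40]])

chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
n = table.sum()
phi = np.sqrt(chi2 / n)                  # for 2 x 2 tables
df_s = min(table.shape) - 1              # smaller dimension minus 1
cramers_v = np.sqrt(chi2 / (n * df_s))   # for larger tables as well
print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, Cramer's V = {cramers_v:.2f}")
```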
Conclusions
In the present contribution we have re-emphasized the need to report estimates of effect size in conjunction with null hypothesis testing, and the benefits thereof. We have presented some of the recommended measures of effect size for the statistical tests that are most commonly used in sport sciences. Additional emphasis has been placed on effect size estimates for non-parametric tests, as reporting effect size measures for these tests is still very rare. The present paper may also serve as a point of departure for further discussion in which the practical (e.g. clinical) magnitude (importance) of results, in light of the conditions specific to a chosen area, will come into focus.
What does this paper add?
This paper highlights the need for including adequate estimates of effect size in research reports in the area of sport sciences. The overview contains various types of effect size measures that can be calculated following the computation of parametric and non-parametric tests. Since reporting effect size estimates when using non-parametric tests is very rare, this section may prove particularly useful for researchers. Some of the effect size measures given can be calculated by hand quite easily; others can be calculated with the help of popular statistical software packages.
References
1. King BM, Minium EW. Statystyka dla psychologów i pedagogów (Statistical reasoning in psychology and education). Warszawa: Wydawnictwo Naukowe PWN; 2009.
2. Cohen J. The earth is round (p < .05). American Psychologist. 1994; 49(12): 997-1000.
3. Cohen J. Things I have learned (so far). American Psychologist. 1990; 45(12): 1304-1312.
4. Jascaniene N, Nowak R, Kostrzewa-Nowak D, et al. Selected aspects of statistical analyses in sport with the use of Statistica software. Central European Journal of Sport Sciences and Medicine. 2013; 3(3): 3-11.
5. Aranowska E, Rytel J. Istotność statystyczna – co to naprawdę znaczy? (Statistical significance – what does it really mean?). Przegląd Psychologiczny. 1997; 40(3-4): 249-260.
6. Fritz CO, Morris PE, Richler JJ. Effect size estimates: current use, calculations, and interpretation. Journal of Experimental Psychology: General. 2012; 141(1): 2-18.
7. Drinkwater E. Applications of confidence limits and effect sizes in sport research. The Open Sports Sciences Journal. 2008; 1(1): 3-4.
8. Fröhlich M, Emrich E, Pieter A, et al. Outcome effects and effects sizes in sport sciences. International Journal of Sports Science and Engineering. 2009; 3(3): 175-179.
9. Altman DG, Bland JM. Standard deviations and standard errors. British Medical Journal. 2005; 331(7521): 903.
10. Sullivan GM, Feinn R. Using effect size – or why the p value is not enough. Journal of Graduate Medical Education. 2012; 4(3): 279-282.
11. Bradley MT, Brand A. Alpha values as a function of sample size, effect size, and power: accuracy over inference. Psychological Reports. 2013; 112(3): 835-844.
12. Brzeziński J. Badania eksperymentalne w psychologii i pedagogice (Experimental studies in psychology and pedagogy). Warszawa: Wydawnictwo Naukowe Scholar; 2008.
13. Durlak JA. How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology. 2009; 34(9): 917-928.
14. Shaughnessy JJ, Zechmeister EB, Zechmeister JS. Research methods in psychology. 5th ed. New York, NY: McGraw-Hill; 2000.
15. Aarts S, van den Akker M, Winkens B. The importance of effect sizes. European Journal of General Practice. 2014; 20(1): 61-64.
16. Ellis PD. The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press; 2010.
17. Lazaraton A. Power, effect size, and second language research. A researcher comments. TESOL Quarterly. 1991; 25(4): 759-762.
18. Hatch EM, Lazaraton A, Jolliffe DA. The research manual: design and statistics for applied linguistics. New York: Newbury House Publishers; 1991.
19. Rosnow RL, Rosenthal R. Effect sizes for experimenting psychologists. Canadian Journal of Experimental Psychology. 2003; 57(3): 221-237.
20. Cohen J. Some statistical issues in psychological research. In: Wolman BB, ed. Handbook of clinical psychology. New York: McGraw-Hill; 1965. pp. 95-121.
21. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.
22. Hedges LV, Olkin I. Statistical methods for meta-analysis. San Diego, CA: Academic Press; 1985.
23. Rosnow RL, Rosenthal R, Rubin DB. Contrasts and correlations in effect-size estimation. Psychological Science. 2000; 11(6): 446-453.
24. Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology. 2013; 4: 863.
25. Tabachnick BG, Fidell LS. Using multivariate statistics. Upper Saddle River, NJ: Pearson Allyn & Bacon; 2001.
26. Cohen BH. Explaining psychological statistics. 3rd ed. New York: John Wiley & Sons; 2008.