Tutorials in Quantitative Methods for Psychology
2009, Vol. 5(1), p. 25-34.
Computing Effect Size Measures with ViSta - The Visual Statistics System
Rubén Daniel Ledesma, Guillermo Macbeth
CONICET / Universidad Nacional de Mar del Plata, CONICET / Universidad del Salvador, Argentina
Nuria Cortada de Kohan
Universidad de Buenos Aires, Argentina
Effect size measures are recognized as a necessary complement to statistical hypothesis testing because they provide important information that such tests alone cannot offer. In this paper we: a) briefly review the importance of effect size measures, b) describe some calculation algorithms for the case of the difference between two means, and c) provide a new and easy-to-use computer program to perform these calculations within ViSta “The Visual Statistics System”. A worked example is also provided to illustrate some practical issues concerning the interpretation and limits of effect size computation. The audience for this paper includes novice researchers as well as ViSta users interested in applying effect size measures.
In psychological research, Effect Size (ES) measures constitute a necessary complement to statistical significance hypothesis testing (Thompson, 1994, 1998). In this work we: (a) review the importance of ES measures; (b) describe some calculation algorithms used to estimate these measures in the case of a difference between two means; and (c) present easy-to-use computer software to perform these calculations within the ViSta statistical system. It is hoped this paper will help increase awareness of these methodologies and facilitate access to the IT tools necessary for their application.
Effect Size Measures
In psychological research, ES represents a way to measure or quantify the effectiveness of an intervention, treatment or program. ES can also be described as the degree of falsity of the null hypothesis (Descôteaux, 2007). This quantification is required for determining sample sizes and to achieve correct statistical decisions (Wilson Van Voorhis & Morgan, 2007). To illustrate the importance of ES, we will analyze an example taken from Moore & McCabe (1993), which is available as a data archive in the ViSta examples folder.
Rubén Daniel Ledesma, Río Negro 3922, Mar del Plata (7600), Argentina, rdledesma@gmail.com, tel: +54 223 4752266. A Spanish tutorial for a preliminary version of this software has been published in Ledesma, Macbeth & Cortada de Kohan (2008).
Suppose we wish to study the effect of a new teaching activity on the reading skills of students. A study using two groups is undertaken. The new teaching activity is applied with the subjects of one group (the experimental group), while the conventional teaching activity is applied with the subjects of the other group (the control group). Afterwards, both groups are given a reading test, with the scores of the reading test constituting the dependent variable Y in the study. Table 1 shows the results of the experiment.
In this case, the results of the t test show a significant difference between the means of the groups, leading the researcher to reject the null hypothesis that predicted equal means, or the “0” effect of the treatment (the new teaching activity). But what is the magnitude of the observed difference? Is this difference significant in practical terms? To what extent is the new teaching activity better? These types of questions can be answered by applying ES measures.
It is worth noting that statistical significance does not necessarily inform the researcher about the importance or magnitude of the effect. The classical hypothesis testing model seeks to determine whether or not to reject the hypothesis that maintains that the effect is nonexistent (Frías-Navarro, Llobell & García-Pérez, 2000; Gigerenzer, 1993). Therefore, if the null hypothesis is rejected, the researcher can only conclude that the effect is significantly different from “0”, which, for all practical matters, is of limited usefulness (Krueger, 2001). Furthermore, statistical significance is not a direct indicator of ES, but rather a functional relation between the sample size, the ES and the p value (Descôteaux, 2007). For this reason, a weak ES may appear as statistically significant if the sample size is sufficiently large; and, conversely, an effective intervention may not appear as statistically significant if the sample size is small (Wilson Van Voorhis & Morgan, 2007).
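The dependence of significance on sample size can be made concrete with a small sketch. It assumes two groups of equal size n and a difference standardized by the pooled standard deviation, in which case the t statistic equals d multiplied by the square root of n/2. The function name `t_from_d` and the sample sizes are illustrative choices, not part of the original text.

```python
import math

def t_from_d(d, n_per_group):
    """t statistic implied by a standardized difference d (pooled SD)
    when both groups have n_per_group observations: t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2.0)

# The same weak effect (d = 0.2) evaluated at increasing sample sizes.
for n in (10, 50, 200, 1000):
    print(n, round(t_from_d(0.2, n), 2))
```

With d held at 0.2, t grows from roughly 0.45 at n = 10 to roughly 4.47 at n = 1000, so the same weak effect moves from well below to well above the usual critical values near 2.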
A better indicator of the impact of the new teaching activity can be obtained through a standardized measure of the difference between the means of the groups. For example, the following measure could be applied (Cohen, 1969, 1988, 1994):
$$ d = \frac{\bar{Y}_e - \bar{Y}_c}{\sigma} \qquad (1) $$
In this equation, $\bar{Y}_e$ and $\bar{Y}_c$ represent the means of the dependent variable Y of the experimental and control groups, respectively, and $\sigma$ is the average standard deviation for both groups, that is, $\sqrt{(11.007^2 + 14.628^2)/2} = 12.945$. In accordance with the example illustrated in Table 1, we obtain:
$$ d = \frac{51.476 - 39.545}{12.945} = 0.922 $$
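The arithmetic of Equation 1 can be checked with a short sketch. It uses the Table 1 summary values and the root-mean-square average of the two standard deviations described in the text; the function name `cohens_d_avg_sd` is a hypothetical label chosen here.

```python
import math

def cohens_d_avg_sd(mean_e, mean_c, sd_e, sd_c):
    """Equation 1: mean difference divided by the average standard deviation,
    taken as the square root of the mean of the two variances."""
    sigma = math.sqrt((sd_e ** 2 + sd_c ** 2) / 2.0)
    return (mean_e - mean_c) / sigma

# Summary values from Table 1 (experimental vs. control group).
d = cohens_d_avg_sd(51.476, 39.545, 11.007, 14.628)
print(round(d, 3))  # 0.922
```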
This standardized measure of the difference between the means, known as Cohen's d, constitutes a possible estimation of the ES, and offers various practical advantages. First, it is easier to work with, since it can be interpreted simply as a z score. It indicates the difference between the groups in units of standard deviation. For example, if d = 1, this means that the mean of the experimental group is 1 standard deviation away from the mean of the control group. If we consider d as a z score, we can also apply the transformation to percentiles and obtain an alternative interpretation. Continuing with the same example, we can state that the average score of the experimental group exceeds about 82% of the scores of the control group, because that is the area under the normal curve that lies below a z score of .92. Another important advantage of this ES measure is that it provides a common measuring stick to compare the relative importance of interventions and programs across different research studies, e.g. in meta-analytical studies (Anderson, 1999).
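The percentile reading of d rests on the standard normal cumulative distribution. A minimal sketch, assuming normally distributed scores, is given here; `normal_cdf` is a helper name introduced for illustration and is built on the standard library's error function.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Proportion of the control group expected to fall below the
# experimental group's mean when d = 0.922.
print(round(normal_cdf(0.922), 2))  # 0.82
```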
Table 1. Result from the hypothetical experiment from Moore & McCabe (1993).
Group          n    Mean     S.D.
Experimental   21   51.476   11.007
Control        22   39.545   14.628
t(41) = 3.01, p < .01
Some Limitations on the Use of Effect Size
Despite the advantages of ES measures, many authors have noted that their use is limited in practice (Coe, 2002; Descôteaux, 2007; Frías-Navarro et al., 2000; Alhija & Levy, 2008). This is so even though some institutions, like the American Psychological Association, have recommended and promoted their use (Thompson, 1998). Similarly, many publications presently require researchers to provide ES measures together with their statistical significance tests (Hunter & Schmidt, 2004). A recent review on ES reporting practices in 10 educational research journals in the years 2003 and 2004 found no difference between journals that require ES reports and journals that have no such policy (Alhija & Levy, 2008). Although the ES estimates were similarly reported in both, the discrepancies between conclusions drawn from p values and conclusions drawn from ES were not often discussed. Sun (2008) conducted a comprehensive review on ES reporting practices of 1,243 studies published in 14 academic journals from 2005 to 2007 and found that 49.1% of the articles reported ES and 56.7% of them interpreted ES. The author concludes that “it is necessary for the academic journals, leading scholars, and academic associations to continue to urge the improvement of effect size reporting and interpreting practices”. In the real world there are likely various explanations for why ES measures are not commonly used. Historical circumstances (Descôteaux, 2007), the lack of ES in the most popular statistical software packages and the absence of the topic in courses and manuals (Coe, 2002) explain, in part, the infrequent use of these methodologies.
Effect Size Measures: The Difference Between Two Means Case
To analyze the magnitude of the effect in our example we could simply compare the mean of the dependent variable Y in the experimental group to its counterpart in the control group in order to determine if there is a difference (di) between them (Equation 2):
$$ \mathit{di} = \bar{Y}_e - \bar{Y}_c \qquad (2) $$
The difference (di) between the means of both groups generated by Equation 2 is not stable and homogeneous because it depends on the unit of measure of the dependent variable. Because it is tied to that unit, this raw difference (di) cannot be compared across studies that use different scales, and so it behooves the researcher to standardize it in some way. As we shall see below, there are various possible ways to achieve this objective.
The Most Common Approaches for Estimating ES
Glass's Delta
The difference di becomes more useful if it is standardized, that is, changed into a z score. One possible approach to standardize the difference is shown in Equation 3, where the difference between the means is divided by the standard deviation of the control group ($S_c$):
$$ \mathrm{Delta} = \frac{\bar{Y}_e - \bar{Y}_c}{S_c} \qquad (3) $$
This formula, known as Glass's Delta (Glass, McGaw & Smith, 1981), can be used as an estimator of the population parameter $\Delta$ of Equation 4:
$$ \Delta = \frac{\mu_e - \mu_c}{\sigma_c} \qquad (4) $$
In Equation 4, the values $\mu_e$ and $\mu_c$ refer to the population means of the dependent variable Y in the experimental and control groups, respectively. $\sigma_c$ refers to the population standard deviation of the control group. $\Delta$ is the population parameter that is being estimated through the calculation of the sample statistic in Equation 3.
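As an illustration of Equation 3, the sketch below applies Glass's Delta to the Table 1 summary values. The function name `glass_delta` is an assumption made for this example; only the control group's standard deviation enters the denominator.

```python
def glass_delta(mean_e, mean_c, sd_c):
    """Equation 3: mean difference divided by the control group's SD."""
    return (mean_e - mean_c) / sd_c

# Table 1 values; the experimental group's SD is deliberately not used.
delta = glass_delta(51.476, 39.545, 14.628)
print(round(delta, 3))  # 0.816
```

The result is smaller than the d of Equation 1 because the control group has the larger standard deviation in this data set.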
Hedges's g
Glass's Delta standardizes the difference between the groups through the standard deviation of the control group $S_c$, as indicated in Equation 3. Nonetheless, the raw difference between the means depends on the variance of both groups, whereas Glass's Delta is only slightly affected by differences in variability between the experimental and control groups. This characteristic can generate bias in the ES estimation when the variability within each group is different. This is why Hedges proposed replacing the standard deviation of the control group $S_c$ with a measure based on the variability of both groups (Grissom & Kim, 2005). This path provides a pooled standard deviation $S_p$ by combining the data from the experimental and control groups in a single measure that weights each group's variance by its degrees of freedom.
The pooled standard deviation $S_p$ in the Hedges' formula is calculated via Equation 5:
$$ S_p = \sqrt{\frac{(n_e - 1) S_e^2 + (n_c - 1) S_c^2}{n_e + n_c - 2}} \qquad (5) $$
$S_p$ accounts for both the internal variability of each group ($S_e^2$, $S_c^2$) as well as the size of each group ($n_e$, $n_c$) when estimating ES. This measure is less biased than Glass's Delta when not assuming equal variances. The use of the pooled standard deviation $S_p$ to calculate ES when comparing two independent groups is known as Hedges' g (Equation 6):
$$ g = \frac{\bar{Y}_e - \bar{Y}_c}{S_p} \qquad (6) $$
Hedges' g is an estimation of the corresponding population G, indicated in Equation 7:
$$ G = \frac{\mu_e - \mu_c}{\sigma} \qquad (7) $$
Both Glass's Delta and Hedges' g have a positive bias, which means they overestimate the ES. To adjust for this bias, Hedges proposed $g_{ajust}$, which is calculated using Equation 8:
$$ g_{ajust} = g \left( 1 - \frac{3}{4\,df - 1} \right) \qquad (8) $$
The greater the degrees of freedom df, the lesser the adjustment necessary to estimate a less biased ES, as can be deduced from Equation 8.
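Equations 5, 6 and 8 chain together naturally, as the following sketch with the Table 1 values suggests. The function names are hypothetical, and the degrees of freedom are taken as n_e + n_c - 2, which is an assumption of this example since the text does not define df explicitly.

```python
import math

def pooled_sd(n_e, sd_e, n_c, sd_c):
    """Equation 5: pooled standard deviation of the two groups."""
    num = (n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(num / (n_e + n_c - 2))

def hedges_g(mean_e, mean_c, n_e, sd_e, n_c, sd_c):
    """Equation 6: mean difference divided by the pooled SD."""
    return (mean_e - mean_c) / pooled_sd(n_e, sd_e, n_c, sd_c)

def hedges_g_adjusted(g, df):
    """Equation 8: small-sample bias adjustment of g."""
    return g * (1.0 - 3.0 / (4.0 * df - 1.0))

g = hedges_g(51.476, 39.545, 21, 11.007, 22, 14.628)
g_adj = hedges_g_adjusted(g, df=21 + 22 - 2)  # df assumed to be n_e + n_c - 2
print(round(g, 3), round(g_adj, 3))
```

For these data g is about 0.92 and the adjusted value about 0.90, which illustrates how modest the correction is once df reaches the forties.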
Cohen's d
Cohen's d (1988, 1994) is one of the most widely used measures in specialized publications to calculate ES, and in meta-analytical studies (Anderson, 1999; Hunter & Schmidt, 2004). To calculate it, see Equation 1. Cohen's d can also be calculated from t test results (Thalheimer & Cook, 2002). For example, knowing the t value and the size of each group, the equation would be:
$$ d = t \sqrt{\left( \frac{n_e + n_c}{n_e n_c} \right) \left( \frac{n_e + n_c}{n_e + n_c - 2} \right)} \qquad (9) $$
This type of conversion is useful to compute ES from research papers that only report results based on the t test.
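A sketch of the Equation 9 conversion follows, using the t value and group sizes reported in Table 1. The function name `d_from_t` is illustrative, and the formula is written with the square root that the Thalheimer & Cook conversion uses.

```python
import math

def d_from_t(t, n_e, n_c):
    """Equation 9: Cohen's d recovered from a t value and the group sizes."""
    n_total = n_e + n_c
    return t * math.sqrt((n_total / (n_e * n_c)) * (n_total / (n_total - 2)))

# Table 1 reports t(41) = 3.01 with n_e = 21 and n_c = 22.
d_t = d_from_t(3.01, 21, 22)
print(round(d_t, 2))  # 0.94
```

The value is close to, but not identical with, the 0.922 obtained from Equation 1, because the two formulas standardize the mean difference with slightly different standard deviations.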
The alternative uses of Cohen's d, Hedges' g or Glass's Delta depend on the properties of the standard deviation of the two compared groups. It is assumed that both standard deviations are estimates of the same population value when d and g are calculated (Coe, 2002). When the difference between both does not depend only on sampling variation, then the standard deviation of the control group and the calculation of Glass's Delta would be a better choice. In this case, the variability of the group that was not affected by any experimental manipulation gives a more accurate approximation to the population standard deviation.
Other ES Measures
The CLES Statistic
McGraw and Wong (1992) propose another method to estimate the ES when comparing two means of independent samples: the CLES (Common Language Effect Size) statistic. This statistic is easier to interpret than the others, given that the magnitude of the difference is expressed as a probability. More precisely, the CLES statistic estimates the probability that a randomly selected individual from the experimental group will have a higher score than a randomly selected individual from the control group (Valera-Espín & Sánchez-Meca, 1997). To calculate it, the z score from Equation 10 is needed:
$$ Z = \frac{\bar{Y}_e - \bar{Y}_c}{\sqrt{S_e^2 + S_c^2}} \qquad (10) $$
Next, the probability of obtaining a value lower than this z score must be found in the standard normal distribution. In the proposed example, this would be:
$$ Z = \frac{51.476 - 39.545}{\sqrt{11.007^2 + 14.628^2}} = 0.652, \quad p(Z < 0.652) = 0.743 $$
This result is easily interpreted, i.e. 74.3% of the time, a randomly picked subject from the experimental group will have a value greater than a randomly picked subject from the control group. Further, this conversion of the ES to a probability could also be applied to other standard forms of ES estimation, such as Cohen's d, so as to have a more universal form of interpretation.
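The two steps of the CLES calculation (Equation 10, then the normal probability) can be sketched as follows with the Table 1 values. The names `normal_cdf` and `cles` are helpers introduced here, and normality of the scores is assumed.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cles(mean_e, mean_c, sd_e, sd_c):
    """Return (z, probability): Equation 10 followed by the normal CDF."""
    z = (mean_e - mean_c) / math.sqrt(sd_e ** 2 + sd_c ** 2)
    return z, normal_cdf(z)

z, p = cles(51.476, 39.545, 11.007, 14.628)
print(round(z, 3), round(p, 3))  # 0.652 0.743
```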
d to r Conversion
Another measure that is direct and simple to interpret is the conversion of Cohen's d to r. The latter is the point-biserial correlation between an independent binary variable X and a dependent numeric variable Y (Cohen, 1988). X has two possible values (for example, 1 and 0), depending on whether it is associated with a participant from the experimental group (X = 1) or the control group (X = 0). The estimation of ES through the use of r has some advantages over the previously mentioned estimations; most notably, it is much easier to interpret. One important advantage of r over d is that the former is bounded: Cohen's d behaves like a z score, but r moves between -1 and +1. This property facilitates the interpretation of r estimates of ES. Cohen (1988) proposes Equation 11 to convert d to r:
$$ r = \frac{d}{\sqrt{d^2 + (1/pq)}} \qquad (11) $$
The p and q values correspond to the proportion of subjects belonging to the experimental and control groups, respectively. In our example, the standardized difference between means is d = 0.922. Inputting the corresponding values in Equation 11, we have:
$$ r = \frac{0.922}{\sqrt{0.922^2 + (1 / (0.488 \times 0.512))}} = \frac{0.922}{\sqrt{0.850 + (1 / 0.249)}} = \frac{0.922}{2.20} = 0.42 $$
It can be observed that the greater the d value, the greater the point-biserial correlation between X and Y. Also, the greater the discrepancy between p and q (which is to say, between the sizes of the experimental and control groups), the greater the value in the denominator in Equation 11, and so the lesser the r correlation.
When the size of the groups is identical ($n_e = n_c$), the value of the term (1/pq) is 4 (1/(0.5 × 0.5) = 1/0.25 = 4). For this reason, when the experimental and control groups are the same size, Equation 11 can be simplified as:
$$ r = \frac{d}{\sqrt{d^2 + 4}} \qquad (12) $$
In the given example, the difference in size between the groups is very small, and so Equation 12 yields the same r value:
$$ r = \frac{0.922}{\sqrt{0.922^2 + 4}} = \frac{0.922}{\sqrt{4.85}} = \frac{0.922}{2.202} = 0.42 $$
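Equations 11 and 12 can be compared side by side in a short sketch. The function `d_to_r` and its default of p = 0.5 (equal group sizes) are illustrative choices, not part of the original text.

```python
import math

def d_to_r(d, p=0.5):
    """Equation 11: convert d to r, where p is the proportion of subjects in
    the experimental group. With p = 0.5 this reduces to Equation 12."""
    q = 1.0 - p
    return d / math.sqrt(d ** 2 + 1.0 / (p * q))

r_unequal = d_to_r(0.922, p=21 / 43)  # group sizes from Table 1
r_equal = d_to_r(0.922)               # equal-size simplification
print(round(r_unequal, 2), round(r_equal, 2))  # 0.42 0.42
```

Because 1/(pq) can never be smaller than 4, the general formula always gives an r that is at most the equal-size value for the same d.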
Rosenthal & Rubin (1982) suggest an alternative form to present and interpret the d to r conversion, which they call the binomial effect size display (BESD). With this method, if the outcome variable is also reduced to a dichotomous variable, r can be interpreted simply as a difference between proportions (Randolph & Edmondson, 2005).
A Non-Parametric Method: Cliff's Delta
In all the above-mentioned cases, the ES measure is sensitive to violations of the assumption of normality. A more robust measure for these cases has been proposed by Cliff (1993). His approach is different, given that neither means nor standard deviations are used in the calculation; instead, what is considered is essentially the ordinal rather than the interval properties of the data (Hess & Kromrey, 2004). Specifically, the Cliff's Delta statistic is expressed as:
$$ \text{Cliff's Delta} = \frac{\#(x_1 > x_2) - \#(x_1 < x_2)}{n_1 n_2} \qquad (13) $$
Where $x_1$ and $x_2$ are scores within group 1 and group 2, and $n_1$ and $n_2$ are the sizes of the sample groups. This statistic estimates the probability that a value selected from one of the groups is greater than a value selected from the other group, minus the reverse probability. Cliff understands that this is a measure of dominance, a concept that refers to the degree of overlap between two distributions. An effect size of 1.0 or -1.0 indicates the absence of overlap between the two groups, whereas a 0.0 indicates the group distributions overlap completely.
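Because Cliff's Delta works on raw scores rather than summary statistics, a sketch needs actual observations. The short lists below are made-up values used only to exercise the counting in Equation 13, and the function name `cliffs_delta` is a hypothetical label.

```python
def cliffs_delta(x1, x2):
    """Equation 13: (#(x1 > x2) - #(x1 < x2)) / (n1 * n2), counting over
    every pair formed by one score from each group. Ties count for neither."""
    greater = sum(1 for a in x1 for b in x2 if a > b)
    less = sum(1 for a in x1 for b in x2 if a < b)
    return (greater - less) / (len(x1) * len(x2))

# Hypothetical scores, not taken from the paper's data set.
print(cliffs_delta([5, 6, 7], [1, 2, 6]))   # 7 wins, 1 loss, 1 tie out of 9 pairs
print(cliffs_delta([8, 9], [1, 2, 3]))      # 1.0, no overlap
print(cliffs_delta([1, 2, 3], [1, 2, 3]))   # 0.0, complete overlap
```

The all-pairs loop costs n1 × n2 comparisons, which is fine for the sample sizes typical of two-group experiments.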
This measure can be used when the data distribution deviates greatly from the normal model, or when the variable being compared corresponds to an ordinal level of measurement. The non-parametric nature of Cliff's Delta reduces the influence of factors such as the groups' variance differences or the presence of outliers.
Table 2. Interpretations of effect sizes (taken from: Coe, 2002).
Columns: (1) Effect size; (2) Percentage of control group who would be below the average person in the experimental group; (3) Rank of the person in a control group of 25 who would be equivalent to the average person in the experimental group; (4) Probability that you could guess which group a person was in from knowledge of their 'score'; (5) BESD; (6) CLES.
(1)    (2)      (3)                        (4)     (5)     (6)
0.0    50%      13th                       0.50    0.00    0.50
0.1    54%      12th                       0.52    0.05    0.53
0.2    58%      11th                       0.54    0.10    0.56
0.3    62%      10th                       0.56    0.15    0.58
0.4    66%      9th                        0.58    0.20    0.61
0.5    69%      8th                        0.60    0.24    0.64
0.6    73%      7th                        0.62    0.29    0.66
0.7    76%      6th                        0.64    0.33    0.69
0.8    79%      6th                        0.66    0.37    0.71
0.9    82%      5th                        0.67    0.41    0.74
1.0    84%      4th                        0.69    0.45    0.76
1.2    88%      3rd                        0.73    0.51    0.80
1.4    92%      2nd                        0.76    0.57    0.84
1.6    95%      1st                        0.79    0.62    0.87
1.8    96%      1st                        0.82    0.67    0.90
2.0    98%      1st (or 1st out of 44)     0.84    0.71    0.92
2.5    99%      1st (or 1st out of 160)    0.89    0.78    0.96
3.0    99.9%    1st (or 1st out of 740)    0.93    0.83    0.98
Interpreting Effect Size
This section summarizes possible ES interpretations according to Coe (2002). Table 2 shows Coe's data (reproduced here with the author's permission). Column 1 lists possible ES values from d, Delta or g (they can be interpreted like z scores). Next follow the percentile conversion (column 2) and the equivalent change in rank order for a group of 25 (column 3). For example, for an effect size of 0.9 (approximately that of the example at the beginning of this paper), the value of 82% indicates that the average person in the experimental group would score higher than 82% of the control group. If the control group consisted of 25 participants, this would be the same as saying that the person ranked 5th in this group would be equivalent to the average person in the experimental group.
The fourth column of Table 2 shows another way of describing the overlap between the two groups. It refers to the probability that one could guess which group a person came from based solely on their test score. This probability equals 0.50 if both groups overlap completely, which means ES equals zero. The probability of guessing correctly increases as the ES increases. In our example, with a difference between the two groups equivalent to an effect size close to 0.90, the probability would be 0.67.
As previously indicated, if the dependent variable is reduced to a variable with two categories, the BESD method can be used to interpret ES as a difference in the proportions in each category. In our example, this value is 0.41, that is, a difference of 41 percentage points in the proportion reaching some threshold of success (for instance, 20% of the control group versus 61% of the treatment group). Lastly, column 6 shows the CLES statistic, which is interpreted as previously mentioned.
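Several columns of Table 2 can be approximated from d alone, as the sketch below suggests for the d = 0.9 row. The closed forms used here (normal CDF of d, of d/2 and of d/√2, plus Equation 12) are assumptions that happen to reproduce that row under normality, equal variances and equal group sizes; they are not stated in the text, and the rank column is left out because no single rounding rule was found to match it. All names are illustrative.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interpret_effect_size(d):
    """Approximate columns 2, 4, 5 and 6 of Table 2 for a given d."""
    return {
        "pct_control_below": normal_cdf(d),        # column 2
        "p_guess_group": normal_cdf(d / 2.0),      # column 4 (assumed form)
        "besd_r": d / math.sqrt(d ** 2 + 4.0),     # column 5, via Equation 12
        "cles": normal_cdf(d / math.sqrt(2.0)),    # column 6 (assumed form)
    }

row = interpret_effect_size(0.9)
for name, value in row.items():
    print(name, round(value, 2))
```

For d = 0.9 the rounded values are 0.82, 0.67, 0.41 and 0.74, in line with the corresponding row of Table 2.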
Besides these statistical criteria, some practical rules for interpreting ES have been suggested. For example, Cohen (1988) describes an ES value of approximately 0.2 as “small”; an ES value of 0.5 as “medium” and “large enough to be visible to the naked eye”; and an ES value of 0.8 as “grossly perceptible and therefore large”. Nevertheless, the value of this rule to applied research has been questioned (Glass et al., 1981), since the practical importance of the effect size also depends on other variables, such as the effectiveness of other, alternative treatments and the cost-benefit analysis of the treatment.
To summarize, ES cannot be interpreted the same way in all cases. A single effect size measure can have different practical meanings depending on the specific problem being evaluated. For this reason, in each case, relevant theoretical and practical aspects should be considered for the problem being studied. In addition, when the ES estimates and the p value interpretations lead to different conclusions, assumptions about the frequency distributions and standard deviation properties should be carefully reviewed (Alhija & Levy, 2008).
Calculating ES Measures in ViSta
Description of the EScalc Module
EScalc is a module that can be integrated into the ViSta (Young, 1996) environment, and that can be used to calculate ES measures from raw data or with a calculator by inputting the means, standard deviations and sample sizes. In both instances, the ViSta statistical system is required. ViSta is a free, expandable statistics program that can be used as a platform for the development of new methods or to expand the system's pre-existing methods. ViSta was created by Professor Forrest W. Young at the L.L. Thurstone Psychometric Laboratory (University of North Carolina, Chapel Hill). It is an open source project on which several developers collaborate.
At a more technical level, we have utilized Xlisp-Stat (Tierney, 1990) to develop the program's calculation functions and graphic user interface. Xlisp-Stat is the programming language underlying the ViSta system. Readers interested in a general review of the capabilities and functionality of ViSta may consult Molina-Ibañez, Ledesma, Valero-Mora & Young (2005). A more detailed review of the program may be found in Young, Valero-Mora & Friendly (2006).
Application Screenshots
Figure 1 shows how to load the effect size functions into the ViSta environment (see Appendix for more details on how to download and install ViSta and EScalc).
Figure 1. Load the EScalc file into ViSta using the Load Lisp command.
Figure 2. Partial screenshot of ViSta data file.
Figure 3. Partial screenshot of statistical report generated in ViSta.
Figure 2 shows a partial screenshot of ViSta with data corresponding to the above-mentioned example. This type of data file can be created in ViSta using the data editor or by importing the data in text format.
In ViSta, an ES estimate is performed automatically when a test for the difference between the means of two independent samples is applied. By its nature, this analysis only accepts data entered with a binary independent variable (the two groups being compared) and a quantitative dependent variable, as the data in the example show. After running the command, ViSta provides a report of results, as shown in Figure 3. This figure shows the report with the basic statistical results for the mean comparison for the example's data. The first part includes descriptive information (group sizes, means, standard deviations, etc.), while the second part shows different ES estimates. Lastly, t test and homogeneity of variance test results are displayed.
Calculating ES from Means, SDs and Sample Sizes
The example we have been using for demonstrative purposes has a complete data set, but in some cases, the researcher may not have this information. This would be expected, for instance, in meta-analytical research. For such instances, the EScalc module makes it possible to perform the analysis by using an option that only requires summary data. This is available from the EScalc item that appears on ViSta's main menu (Figure 4). By selecting “Effect Size from Means and SDs”, a dialog box will open, prompting the user to enter the data required to calculate ES measures (see Figure 5). After providing the data and clicking OK, a report of the results will appear (see Figure 6). As can be seen in Figure 4, the program also has the ability to calculate Cohen's d from the t test value. For this purpose, the Equation 9 conversion is used.
Figure 4. Finding EScalc in ViSta's main menu.
Conclusion
Presently, there is general agreement in highlighting the importance of ES as a necessary complement to hypothesis testing methods and for determining sample sizes (Descôteaux, 2007; Wilson Van Voorhis & Morgan, 2007). For this reason, experts and the editorial requirements of specialized journals are strongly encouraging the use of these techniques. ES measures allow for a more direct appreciation of the magnitude of the phenomena being studied, and provide a way to interpret results more clearly. In the field of psychological research, including ES in the analysis of data allows for more informed decision making and more appropriate evidence-based decisions. Additionally, these measures are a key and necessary factor for the integration of results in meta-analytical studies (Hunter & Schmidt, 2004).
Despite the value of these methods, it is clear that the use of ES is not widely practiced in educational and psychological research (Alhija & Levy, 2008; Sun, 2008; Dunleavy, Barr, Glenn & Miller, 2006). This can be attributed to a lack of awareness about these techniques, among other reasons. Many frequently used applied statistics manuals do not include them in their main content, nor are they often included in graduate and postgraduate educational programs. Similarly, the most popular statistics programs do not always have analysis options for ES measures clearly displayed in their menus.
Figure 5. EScalc dialog box.
Figure 6. EScalc report of results.
In this context, this paper aims to contribute to the efforts of institutions such as the APA to raise awareness and encourage the use of ES in psychological research. For this purpose, we have herewith provided a review for the case of the difference between two means, and presented a software application for the ViSta statistics program that is free and easy to use. We hope that this tool may help complement the application of hypothesis testing methods and in this way promote the inclusion of ES measures in empirical studies. Additionally, thanks to its simplicity, we believe that EScalc can also be useful as an educational tool for teaching statistics. Planned expansions of EScalc include: 1) further ViSta developments towards ES indexes for categorical data, 2) confidence intervals for ES measures, and 3) statistical data analysis tools for meta-analytical applications.
Lastly, we wish to warn about certain problems that could affect the use and interpretation of ES measures in practice (Coe, 2002). With the exception of Cliff's Delta, the other measures are based on the assumption of normal distributions and equal variances in the groups. Furthermore, the results could be affected when repeated measures are used (Algina & Keselman, 2003), the sample has restricted range, the distribution is skewed, or outliers are present in the data set. For these reasons, we recommend that the user examine the data for these types of problems before applying parametric ES measures. ViSta provides many alternative graphic resources to do this, including dynamic histograms, normal probability plots, box plots, etc.
References
Algina, J. & Keselman, H. J. (2003). Approximate Confidence Intervals for Effect Sizes. Educational and Psychological Measurement, 63, 4, 537-553.
Alhija, F. N. & Levy, A. (2008). Effect Size Reporting Practices in Published Articles. Educational and Psychological Measurement (in press).
Anderson, G. (1999). The Role of Meta-Analysis in the Significance Test Controversy. European Psychologist, 4(2), 75-82.
Cliff, N. (1993). Dominance statistics: Ordinal Analyses to Answer Ordinal Questions. Psychological Bulletin, 114, 494-509.
Coe, R. (2002). It's the Effect Size, Stupid. What Effect Size is and Why it is Important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12-14 September 2002.
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. San Diego, CA: Academic Press.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Second Edition. Hillsdale, NJ: LEA.
Cohen, J. (1994). The Earth Is Round (p < .05). American Psychologist, 49, 997-1003.
Cousineau, D. (2007). Computing the Power of a t Test. Tutorials in Quantitative Methods for Psychology, 3, 2, 60-62.
Descôteaux, J. (2007). Statistical Power: An Historical Introduction. Tutorials in Quantitative Methods for Psychology, 3, 2, 28-34.
Dunleavy, E. M., Barr, C. D., Glenn, D. M., & Miller, K. M. (2006). Effect size reporting in applied psychology: How are we doing? The Industrial-Organizational Psychologist, 43, 4, 29-37.
Frías-Navarro, M. D., Llobell, J. P. & García-Pérez, J. F. (2000). Tamaño del Efecto del Tratamiento y Significación Estadística. Psicothema, 12, 236-240.
Gigerenzer, G. (1993). The Superego, the Ego, and the Id in Statistical Reasoning. In G. Keren & C. Lewis (Eds.), A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues (pp. 311-339). Hillsdale, NJ: LEA.
Glass, G. V., McGaw, B. & Smith, M. L. (1981). Meta-Analysis in Social Research. Thousand Oaks, CA: Sage.
Grissom, R. J. & Kim, J. J. (2005). Effect Sizes for Research. A Broad Practical Approach. Mahwah, NJ: LEA.
Hess, M. R. & Kromrey, J. D. (2004). Robust Confidence Intervals for Effect Sizes: A Comparative Study of Cohen's d and Cliff's Delta Under Nonnormality and Heterogeneous Variances. Paper presented at the annual meeting of the American Educational Research Association, San Diego, April 12-16, 2004.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of Meta-Analysis. Correcting Error and Bias in Research Findings. Second Edition. Thousand Oaks, CA: Sage.
Krueger, J. (2001). Null Hypothesis Significance Testing. On the Survival of a Flawed Method. American Psychologist, 56, 1, 16-26.
Ledesma, R., Macbeth, G. & Cortada de Kohan, N. (2008). Tamaño del Efecto: Revisión Teórica y Aplicaciones con el Sistema Estadístico ViSta. Revista Latinoamericana de Psicología, 40, 3, 425-439.
McGraw, K. & Wong, S. (1992). A Common Language Effect Size Statistic. Psychological Bulletin, 111, 361-365.
Molina-Ibañez, J. G., Ledesma, R., Valero-Mora, P. & Young, F. W. (2005). A Video Tour through ViSta 6.4, a Visual Statistical System based on Lisp-Stat. Journal of Statistical Software, 13, 8, 1-10.
Moore, D. S. & McCabe, G. P. (1993). Introduction to the Practice of Statistics. Second Edition. New York: W. H. Freeman & Company.
Randolph, J. & Edmondson, R. S. (2005). Using the Binomial Effect Size Display (BESD) to Present the Magnitude of Effect Sizes to the Evaluation Audience. Practical Assessment Research & Evaluation, 10, 14. Available online: http://pareonline.net/getvn.asp?v=10&n=14
Sun, S. (2008). A Comprehensive Review of Effect Size Reporting and Interpreting Practices in Academic Journals in Education and Psychology. Published master's thesis, University of Cincinnati. Available online: http://www.ohiolink.edu/etd/
Thalheimer, W., & Cook, S. (2002, August). How to Calculate Effect Sizes From Published Research Articles: A Simplified Methodology. Available online: http://work-learning.com/effect_sizes.htm.
Thompson, B. (1994). The Concept of Statistical Significance Testing. Practical Assessment, Research & Evaluation, 4, 5. Available online: http://PAREonline.net/getvn.asp?v=4&n=5.
Thompson, B. (1998). Statistical Significance and Effect Size Reporting: Portrait of a Possible Future. Research in the Schools, 5, 2, 33-38.
Tierney, L. (1990). Lisp-Stat: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics. NY: John Wiley & Sons.
Valera-Espín, A. & Sánchez-Meca, J. (1997). Pruebas de Significación y Magnitud del Efecto: Reflexiones y Propuestas. Anales de Psicología, 13, 85-90.
Wilson Van Voorhis, C. R. & Morgan, B. L. (2007). Understanding Power and Rules of Thumb for Determining Sample Sizes. Tutorials in Quantitative Methods for Psychology, 3, 2, 43-50.
Young, F. W., Valero-Mora, P. M. & Friendly, M. (2006). Visual Statistics: Seeing Data With Dynamic Interactive Graphics. Hoboken, NJ: John Wiley & Sons.
Young, F. W. (1996). ViSta: The Visual Statistics System. UNC L.L. Thurstone Psychometric Laboratory, Research Memorandum 94-1.
Manuscript received October 2nd, 2008
Manuscript accepted March 11th, 2009.
Appendix: Installing ViSta and EScalc
Follow the steps below to install the program:
Step 1. Download and install ViSta. As mentioned earlier, EScalc works only when integrated into ViSta, and so ViSta must first be downloaded and installed. The latest version of ViSta is available at the following URL: http://www.uv.es/visualstats/Book/. From this website, one may download the program's complete code as a compressed folder. Simply decompress the folder and then run the application file ViSta.exe to open the program. The ReadMeFirst.txt file provides a brief description of how to install ViSta.
Step 2. Download EScalc (EScalc.lsp). Download the program file EScalc (EScalc.lsp), available at the following URL: http://www.mdp.edu.ar/psicologia/vista/.
Step 3. Load EScalc into ViSta. Lastly, the user should open ViSta and execute the command “File / Load Lisp” from the main menu (see Figure 1) to load the file EScalc.lsp. This will install EScalc as a ViSta main menu option and add the ES option to the univariate analysis command.
... Estimates of the effect size were made according to the method of Cohen (difference between means divided by the pooled standard deviation) [12], or if there was a substantial difference between the standard deviation of the change from baseline between antioxidant and placebo therapy, using the method of Glass (difference between the means divided by the standard deviation of the control group) [13,14]. Effect sizes of 0.3 or less are weak, those greater than 0.8 are strong, and those greater than 1.2 are very strong. ...
... The combination therapy of four antioxidants reduced SI by −1.7 m/s (placebo-corrected) with an effect size of 1.4 using the method of Glass [13], indicating a very strong effect in reducing PWV. The placebo-corrected percent fall in SI (−19% with 95% confidence limits of −7% and −31%) was impressive. ...
... The combination therapy of four antioxidants reduced SI by −1.7 m/s (placebocorrected) with an effect size of 1.4 using the method of Glass [13], indicating a very strong effect in reducing PWV. The placebo-corrected percent fall in SI (−19% with 95% confidence limits of −7% and −31%) was impressive. ...
Article
Full-text available
Antioxidants reduce arterial stiffness, but the effects previously reported are weak. A systematic review of the antioxidants vitamin E, vitamin C, vitamin A, and beta-carotenes (the most commonly studied antioxidants) on pulse wave velocity (PWV) found an effect size of only −0.20 (approximately −16 m/s or −2.5%). Studies in rats of the potent pro-oxidant substance acetaldehyde have shown that combinations of sulfur-containing antioxidants, including thiamine and l-cysteine, with ascorbic acid potently protect against oxidative-stress-mediated mortality. The effects of these combinations of oxidants on PWV have not been studied. The present study evaluated the effects of 2 weeks of therapy with a combination of sulfur-containing antioxidants (cysteine, thiamine, and pyridoxine) in combination with ascorbic acid on stiffness index (SI), a measure of arterial stiffness that is strongly correlated with PWV, using a Pulse Trace recorder in a diverse group of 78 volunteers. SI fell by −1.7 m/s relative to placebo (95% confidence intervals −0.6 to −2.7 m/s), a reduction of −19% (95% confidence intervals −9% to −31%). The Glass effect size was 1.4, indicating a very strong treatment effect which was substantially greater than the effect size found in previous studies of antioxidants. PWV reduction was correlated significantly with increasing age. Further studies of similar antioxidant combinations are required to determine whether they are of value in the treatment or prevention of cardiovascular disease.
... Given the limitations of p-values, effect sizes and confidence intervals were reported in addition to p-values (Ledesma et al., 2009;Smith, 2018). Effect size is a standardized, scale-free measure independent of sample size that indicates the magnitude of a quantity of interest and is useful for determining the practical significance of an observed difference (i.e., the meaningfulness of the difference) (Cohen, 1988;Ledesma et al., 2009). Cohen's d was used as the effect size measure for paired samples t-tests and Hedges' g, which expresses effect size in standard deviation units and is preferable to ...
Article
Objectives This study tests if femoral and humeral cross‐sectional geometry (CSG) and cross‐sectional properties (CSPs) in an ontogenetic series of wild‐caught chimpanzees ( Pan troglodytes ssp.) reflect locomotor behavior during development. The goal is to clarify the relationship between limb bone structure and locomotor behavior during ontogeny in Pan. Materials and Methods The latex cast method was used to reconstruct cross sections at the midshaft femur and mid‐distal humerus. Second moments of area (SMAs) ( I x , I y , I max , I min ), which are proportional to bending rigidity about a specified axis, and the polar SMA ( J ), which is proportional to average bending rigidity, were calculated at section locations. Cross‐sectional shape (CSS) was assessed from I x / I y and I max / I min ratios. Juvenile and adult subsamples were compared. Results Juveniles and adults have significantly greater femoral J compared to humeral J . Mean interlimb proportions of J are not significantly different between the groups. There is an overall decreasing trend in diaphyseal circularity between the juvenile phase of development and adulthood, although significant differences are only found in the humerus. Discussion Juvenile chimpanzee locomotion includes forelimb‐ and hindlimb‐biased behaviors. Juveniles and adults preferentially load their hindlimbs relative to their forelimbs. This may indicate similar locomotor behavior, although other explanations including a diversity of hindlimb‐biased locomotor behaviors in juveniles cannot be ruled out. Different ontogenetic trends in forelimb and hindlimb CSS are consistent with limb bone CSG reflecting functional adaptation, albeit the complex nature of bone functional adaptation requires cautious interpretations of skeletal functional morphology from biomechanical analyses.
... Specifically, we classified participants into two groups (YES/NO) based on their selection for the DC: "Would you like the chatbot to show empathy?" We ran Wilcoxon Mann-Whitney test [14] for comparing two group means followed by the effect size analysis using the Cliff's delta [29,34], which is widely used to report the effect size of Mann-Whitney U test [29,34]. Cliff's delta runs from -1 to 1. ...
... Thus, 2x2 factorial designs were employed to examine both individual and interaction effects of various factors on the response variable (Gutiérrez and De la Vara, 2008). Sample sizes for each treatment were determined based on Kirk's table (1995), using a power of 0.7 and a moderate effect size, which indicated that 25 articulated buses per treatment were required (Cohen, 1988;Ledesma, Macbeth and de Kohan, 2009;Morales, 2012). ...
... The raw mean difference, however, is not generally stable and homogeneous because it depends on the unit of measurement of the effect variable. Therefore, measures have been developed to quantify the effect size in a standardized manner (Ledesma et al. 2009). For treatment effects based on means, the most commonly used ones include Glass's Delta, Hedges's g, and Cohen's d. ...
Article
The gold standard in medical research to estimate the causal effect of a treatment is the Randomized Controlled Trial (RCT), but in many cases these are not feasible due to ethical, financial or practical issues. Observational studies are an alternative, but can easily lead to doubtful results, because of unbalanced selection bias and confounding. Moreover, RCTs often only apply to a specific subgroup and cannot readily be extrapolated. In response, we present Rod of Asclepius (RoA), a novel visual analytics method that integrates modern techniques designed for identification of causal effects and effect size estimation with subgroup analysis. The result is an interactive display designed to combine exploratory analysis with a robust set of techniques, including causal do-calculus, propensity score weighting, and effect estimation. It enables analysts to conduct observational studies in an exploratory, yet robust way. This is demonstrated by means of a use case involving patients undergoing surgery, for which we collaborated closely with clinical researchers.
... Differences between antioxidant and placebo treatments for change from baseline were analysed using Student's t test. Estimates of the effect size were made according to the method of Cohen (difference between means divided by the pooled standard deviation) (9), or if there was a substantial difference between the standard deviation of the change from baseline between antioxidant and placebo therapy, by the method of Glass (difference between the means divided by the standard deviation of the control group) (10). Effect sizes of 0.3 or less are weak, those greater than 0.8 are strong and those greater than 1.2 are very strong. ...
Preprint
Antioxidants reduce arterial stiffness but the effects previously reported are weak. The present study evaluated the effects of 2 weeks of therapy with a combination of antioxidants (l-cysteine, thiamine, pyridoxine and ascorbic acid) compared to placebo on stiffness index (SI), a measure of arterial stiffness that is strongly correlated with central pulse wave velocity (PWV), using a Pulse Trace recorder in a diverse group of 78 volunteers. SI fell by 1.7 m/sec relative to placebo (95% confidence intervals 0.6 to 2.7 m/sec), a reduction of 19% (95% confidence intervals 9% to 31%). The Glass effect size was 1.4, indicating a very strong treatment effect which was substantially greater than found in previous studies of the effects of antioxidant therapy on arterial stiffness. The magnitude of reduction in SI was positively correlated with increasing age (r = 0.362, P = 0.02). The change in RI, a measure of compliance of small to medium size arteries, did not differ between the antioxidant and placebo groups. The combination of antioxidants studied reduced PWV to a much greater extent than reported for other antioxidants. This combination may be of value in the treatment of cardiovascular disease.
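The citing passage for this entry describes two estimators: Cohen's method (difference between means divided by the pooled standard deviation) and Glass's method (difference between means divided by the standard deviation of the control group). The following is a minimal Python sketch of both. It assumes the common sample-size-weighted pooled variance, which the passage does not spell out, and the function names and sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled standard deviation.

    Assumption: the pooled SD weights each group's sample variance
    by its degrees of freedom (n - 1).
    """
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)


if __name__ == "__main__":
    # Hypothetical change-from-baseline scores, not data from any study.
    treated = [2.0, 4.0, 6.0]
    placebo = [1.0, 2.0, 3.0]
    print(cohens_d(treated, placebo))
    print(glass_delta(treated, placebo))
```

When the two groups have similar spread the two values are close; when the treatment changes the variance, as the passage notes, Glass's version avoids mixing the treated group's spread into the denominator.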
... 22 As extensively reported in the literature, and citing Yang and Dalton's SAS Global Forum 2012 paper, SMD "can be treated as equivalent to a Zscore of a standard normal distribution". [23][24][25][26][27][28][29][30][31] In fact, z-score is measured as: ...
Article
Topic. To provide standardized confidence limits of the transient PERG (tPERG) P50 and N95, and steady state PERG (ssPERG) amplitudes in normal controls as compared to Ocular Hypertension (OHT), Glaucoma Suspect (GS), or Early Manifest Glaucoma (EMG) eyes. Clinical relevance The identification of standardized confidence limits in the context of PERG might overcome the high intrinsic variability of the measure, and it might lead to a more intuitive understanding of the results as well as to an easier comparison of data from multiple tests, sites, and operators.
... Followers are willing to trust charismatic leaders since they are in need and believe the leader is highly skilled (Bedell et al., 2006; Aaltio, 2000; Ledesma, Macbeth, & de Kohan, 2009). Further, supporters are now working together to accomplish the agreed-upon objectives. ...
Article
The goal of this study was to identify the model that best match with the work ethics of public elementary school teachers as estimated by charismatic leadership, supervisory relationship and reciprocity beliefs of school heads in Region XI, Philippines. Conducted from June 2021 to November 2022 using a correlation approach and path analysis, which study employed a quantitative, non-experimental research design. A stratified sampling approach was used to determine the 432 teachers of public elementary schools. Statistics methods employed included mean, Pearson r, and path analysis. Moreover, adapted survey questionnaires were used. The result reveals that the levels of charismatic leadership, supervisory relationship and reciprocity beliefs of schools and work ethics of teachers were very high. Further, when each independent variable correlates with work ethics of teachers, results showed that charismatic leadership was significantly correlated with work ethics. There was also a significant relationship between supervisory relationship and work ethics as well as between reciprocity beliefs and work ethics. Model 3 came out as the best fit model that predicts work ethics. The model showed that charismatic leadership and reciprocity beliefs predicts work ethics among public school teachers.
Article
The aim of this study was to analyze sex differences in Psychological Well-being among college students. Additionally, it was proposed to evaluate the difference between college students and the general population. Psychological Well-being is the result of an evaluative assessment by the subject with respect to their own life. This study worked with a sample of 654 students (457 women and 197 men) of the National University of Mar del Plata. The Spanish adaptation (Díaz et al., 2006) of Ryff's Psychological Well-being Scale was applied. For the analyses of sex differences, Student's t test and the Mann-Whitney U test were carried out. The results showed that women scored significantly higher than men in the scales of Purpose in Life, Positive Relations with Others and Personal Growth, and men scored significantly higher than women in Autonomy. However, the effect sizes of the differences were small. Regarding educational level, a difference between college students and the general population was observed.
Article
Effect sizes are important because they are an accessible way to indicate the practical importance of observed associations or differences. Standardized mean difference (SMD) effect sizes, such as Cohen's d, are widely used in education and the social sciences-in part because they are relatively easy to calculate. However, SMD effect sizes assume normally distributed data, whereas most data in these fields are ordinal and/or non-normal. In these situations, SMD effect sizes can be biased, and a non-parametric measure such as Cliff's delta (δ) is more appropriate. This paper provides a practical guide on how to calculate Cliff's δ. First, we present a conceptual overview and a worked example. Then we present two methods of calculating Cliff's δ: (1) a web-based Shiny application developed to accompany this paper (https://cliffdelta.shinyapps.io/calculator; suitable for all users), and (2) an R tutorial (suitable for R users). This is intended to provide researchers and practitioners with an appropriate and accessible effect size measure for non-normal data.
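As a rough illustration of the non-parametric measure this abstract discusses, Cliff's delta can be computed by comparing every observation in one group with every observation in the other: the proportion of pairs where the first group is larger, minus the proportion where it is smaller. The Python sketch below is not the paper's Shiny application or R tutorial; the function name and sample data are hypothetical.

```python
def cliffs_delta(x, y):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (n_x * n_y).

    Ranges from -1 (every x below every y) to 1 (every x above every y);
    tied pairs count toward neither side.
    """
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical ordinal ratings (e.g. Likert responses) for two groups.
    group_yes = [3, 4, 5]
    group_no = [1, 2, 3]
    print(cliffs_delta(group_yes, group_no))
```

This all-pairs version is quadratic in the sample sizes, which is fine for small samples; larger data sets would normally use a rank-based computation instead.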
Chapter
Where do new ideas come from? What is social intelligence? Why do social scientists perform mindless statistical rituals? This vital book is about rethinking rationality as adaptive thinking: to understand how minds cope with their environments, both ecological and social. The author proposes and illustrates a bold new research program that investigates the psychology of rationality, introducing the concepts of ecological, bounded, and social rationality. His path-breaking collection takes research on thinking, social intelligence, creativity, and decision-making out of an ethereal world where the laws of logic and probability reign, and places it into our real world of human behavior and interaction. This book is accessibly written for general readers with an interest in psychology, cognitive science, economics, sociology, philosophy, artificial intelligence, and animal behavior. It also teaches a practical audience, such as physicians, AIDS counselors, and experts in criminal law, how to understand and communicate uncertainties and risks.
Article
After 4 decades of severe criticism, the ritual of null hypothesis significance testing - mechanical dichotomous decisions around a sacred .05 criterion - still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication.
Book
The goal of this book is to inform a broad readership about a variety of measures and estimators of effect sizes for research, their proper applications and interpretations, and their limitations. Its focus is on analyzing post-research results. The book provides an evenhanded account of controversial issues in the field, such as the role of significance testing. Consistent with the trend toward greater use of robust statistical methods, the book pays much attention to the statistical assumptions of the methods and to robust measures of effect size.
Article
Effect size (ES) reporting practices in a sample of 10 educational research journals are examined in this study. Five of these journals explicitly require reporting ES and the other 5 have no such policy. Data were obtained from 99 articles published in the years 2003 and 2004, in which 183 statistical analyses were conducted. Findings indicate no major differences between the two types of journals in terms of ES reporting practices. Different conclusions could be reached based on interpreting ES versus p values. The discrepancy between conclusions based on statistical versus practical significance is frequently not reported, not interpreted, and mostly not discussed or resolved.