Estimating linear trends in single factor between
subject designs
Gerben Mulder
August 16, 2019
This is an introduction to linear trend analysis from an estimation perspective.
The contents of this introduction are based on Maxwell, Delaney, and Kelley (2017)
and Rosenthal, Rosnow, and Rubin (2000). I have taken the (invented) data
from Haans (2018). The estimation perspective on statistical analysis is aimed
at obtaining point and interval estimates of effect sizes. Here, I will use the
frequentist perspective of obtaining a point estimate and a 95% Confidence
Interval of the relevant effect size. For linear trend analysis, the relevant effect
size is the slope coefficient of the linear trend, so the purpose of the analysis is
to estimate the value of the slope and the 95% confidence interval of that estimate.
We will use contrast analysis to obtain the relevant estimates.
The references cited above are clear about how to construct contrast coefficients
(lambda coefficients) for linear trends (and non-linear trends, for that matter)
that can be used to perform a significance test of the null hypothesis that the
slope equals zero. Maxwell, Delaney, and Kelley (2017) describe how to obtain a
confidence interval for the slope and make clear that, to obtain interpretable
results from the software we use, we should consider how the linear trend contrast
values are scaled. That is, standard software (like SPSS) gives us a point estimate
and a confidence interval for the contrast estimate, but depending on how the
coefficients are scaled, these estimates are not necessarily interpretable in terms
of the slope of the linear trend, as I will make clear momentarily.
So the goal of the data analysis is to obtain a point and an interval estimate of
the slope of the linear trend, and the purpose of this contribution is to show how
to obtain output that is interpretable as such.
A linear trend
Let us have a look at an example of a linear trend to make clear what exactly
we are talking about here. To keep things simple, we suppose the following
context. We have an experimental design with a single factor and a single
dependent variable. The factor we are considering is quantitative and its values
are equally spaced. This may (or may not) differ from the usual experiment, where
the independent variable is a qualitative, nominal variable. An example from
Haans (2018) is the variable location, which is the row in the lecture room where
students attending the lecture are seated. There are four rows and the distance
between the rows is equal. Row 1 is the row nearest to the lecturer, and row 4 is
the row with the largest distance between the student and the lecturer. We will
assign values 1 through 4 to the different rows.
We hypothesize that the distance between the student and the lecturer, where
distance is operationalized as the row in which the student is seated, and the mean
exam scores of the students in each row show a negative linear trend. The
purpose of the data analysis is to estimate how large the (negative) slope of this
linear trend is. Let us first suppose that there is a perfect negative linear trend,
in the sense that each unit increase in the location variable is associated with a
unit decrease in the mean exam score. Let us suppose that the means are 4, 3, 2,
and 1, respectively. This negative linear trend is depicted in Figure 1.
Figure 1: Negative linear trend with slope $\beta_1 = -1$ ($Y$ plotted against $X$)
The equation for this perfect linear relation between location and mean exam
score is $\bar{Y} = 5 + (-1)X$; that is, the slope of the negative trend equals $-1$. So,
if the pattern in our sample means follows this perfect negative trend, we
want our slope estimate to equal $-1$.
Now, following Maxwell, Delaney, and Kelley (2017), with equal sample sizes,
the estimated slope of the linear trend is equal to

$$\hat{\beta}_1 = \frac{\hat{\psi}_{\text{linear}}}{\sum_{j=1}^{k} \lambda_j^2}, \qquad (1)$$

where the lambda weight $\lambda_j = X_j - \bar{X}$. For the intercept of the linear trend
equation we have

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}. \qquad (2)$$

                 Location
  Row 1     Row 2     Row 3     Row 4
    5         7         5         1
    6         9         4         3
    7         8         6         4
    8         5         7         2
    9         6         8         0
$\bar{Y}_1 = 7$   $\bar{Y}_2 = 7$   $\bar{Y}_3 = 6$   $\bar{Y}_4 = 2$

Table 1: The data provided by Haans (2018)
Since the mean of the $X$ values equals 2.5, we have lambda weights
$\Lambda = [-1.5, -0.5, 0.5, 1.5]$. The value of the linear contrast equals
$-1.5 \cdot 4 + -0.5 \cdot 3 + 0.5 \cdot 2 + 1.5 \cdot 1 = -5$, and the sum of the
squared lambda weights equals 5, so the slope estimate equals
$\hat{\beta}_1 = -5/5 = -1$, as it should.
The importance of scaling becomes clear if we use the standard recommended
lambda weights for estimating the negative linear trend. These standard weights
are $\Lambda = [-3, -1, 1, 3]$. Using those weights leads to a contrast estimate of $-10$,
and, since the sum of the squared weights equals 20, to a slope estimate of $-0.50$,
which is half the value we are looking for. For significance tests of the linear
trend, this difference in results doesn't matter, but for the interpretation of the
slope it clearly does. Since getting the "correct" value for the slope estimate
requires an additional calculation (albeit a very elementary one), I recommend
sticking to setting the lambda weights to $\lambda_j = X_j - \bar{X}$.
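The same few lines make the scaling issue concrete (a sketch, reusing the vectors
from the snippet above):

# standard tabled weights for k = 4 groups
lambda_std <- c(-3, -1, 1, 3)
psi_std <- sum(lambda_std * means)   # -10
psi_std / sum(lambda_std^2)          # -10 / 20 = -0.5: half the slope we want
# dividing the standard weights by 2 restores the unit spacing of X,
# so the slope estimate comes out right again
sum((lambda_std / 2) * means) / sum((lambda_std / 2)^2)  # -1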
Estimating the slope
Let us apply this to the imaginary data provided by Haans (2018). The data
are reproduced in Table 1.
The group means of the four rows are $\bar{Y} = [7, 7, 6, 2]$. The lambda
weights are $\Lambda = [-1.5, -0.5, 0.5, 1.5]$. The value of the contrast estimate equals
$\hat{\psi}_{\text{linear}} = -8$, and the sum of the squared lambda weights equals
$\sum_{j=1}^{k} \lambda_j^2 = 5$, so the estimated slope equals $-1.6$. The equation
for the linear trend is therefore $\hat{\mu}_j = 9.5 - 1.6 X_j$. Figure 2 displays the
obtained means and the estimated means based on the linear trend.
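In base R the same numbers follow directly (a sketch using the group means from
Table 1):

# group means from Table 1 and centered lambda weights
means <- c(7, 7, 6, 2)
lambda <- (1:4) - 2.5         # -1.5 -0.5 0.5 1.5
psi <- sum(lambda * means)    # -8
b1 <- psi / sum(lambda^2)     # -8 / 5 = -1.6
b0 <- mean(means) - b1 * 2.5  # 5.5 + 4 = 9.5
# estimated means on the trend line
b0 + b1 * (1:4)               # 7.9 6.3 4.7 3.1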
Note that the slope estimate can also be obtained from $r_{\text{alerting}}$, the product
moment correlation between the group means and the contrast weights. Let
us define the effect $\hat{\alpha}_j = (\bar{Y}_j - \bar{Y})$, and the lambda weights as before; then

$$\hat{\beta}_1 = r_{\text{alerting}} \sqrt{\frac{\sum_{j=1}^{k} \hat{\alpha}_j^2}{\sum_{j=1}^{k} \lambda_j^2}}.$$
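This identity is easy to verify numerically (a sketch, continuing with the vectors
defined in the previous snippet):

# r_alerting: correlation between the group means and the lambda weights
r_alerting <- cor(means, lambda)
alpha <- means - mean(means)  # the effects alpha_j
# the formula above reproduces the slope estimate
r_alerting * sqrt(sum(alpha^2) / sum(lambda^2))  # -1.6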
Figure 2: Obtained group means and estimated group means based on the linear
trend (mean exam score plotted against location)
Obtaining the slope estimate with SPSS
If we estimate the linear trend contrast with SPSS, we will get a point estimate
of the contrast value and a 95% confidence interval estimate. For instance, if we
use the lambda weights $\Lambda = [-1.5, -0.5, 0.5, 1.5]$ and the following syntax, we
get the output presented in Figure 3.
UNIANOVA score BY row
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/LMATRIX = "Negative linear trend?"
row -1.5 -0.5 0.5 1.5 intercept 0
/DESIGN=row.
Figure 3: SPSS output
Figure 3 makes clear that the 95% CI is for the linear trend contrast estimate,
not for the slope. But it is easy to obtain a confidence interval for the slope
estimate by applying equation (1) to the limits of the CI of the contrast estimate.
Since the sum of the squared lambda weights equals 5.0, the confidence interval
for the slope estimate is 95% CI $[-11.352/5, -4.648/5] = [-2.27, -0.93]$. Alternatively,
divide the lambda weights by the sum of the squared lambda weights and use
the results in the specification of the L-matrix in SPSS:
UNIANOVA score BY row
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(0.05)
/LMATRIX = "Negative linear trend?"
row -1.5/5 -0.5/5 0.5/5 1.5/5 intercept 0
/DESIGN=row.
Using the syntax above leads to the results presented in Figure 4.

Figure 4: SPSS output for adjusted lambda weights
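For completeness, the first route (rescaling the contrast CI by hand) is a
one-liner in R; a minimal sketch, taking the estimate and CI limits from the
SPSS output in Figure 3:

# contrast estimate and 95% CI limits from Figure 3
psi <- -8
ci <- c(-11.352, -4.648)
# divide by the sum of the squared lambda weights, equation (1)
psi / 5  # -1.6
ci / 5   # -2.27 -0.93 (rounded)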
Obtaining the estimation results in R
The following R-code accomplishes the same goals.
# load the dataset
load('~\\betweenData.RData')

# load the functions of the emmeans package
library(emmeans)

# set options for the emmeans package so that contrast() reports
# only confidence intervals; use infer = c(TRUE, TRUE) to get both
# CIs and p-values
emm_options(contrast = list(infer = c(TRUE, FALSE)))

# specify the contrast (note: divide by the sum of
# squared contrast weights to obtain the slope)
myContrast = c(-1.5, -0.5, 0.5, 1.5) / 5

# fit the model (this assumes the data are
# available in the workspace)
theMod = lm(examscore ~ location)

# get estimated marginal means
theMeans = emmeans(theMod, "location")
contrast(theMeans, list("Slope" = myContrast))

## contrast estimate        SE df  lower.CL   upper.CL
## Slope         -1.6 0.3162278 16 -2.270373 -0.9296271
##
## Confidence level used: 0.95
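As an aside, if location is stored as a factor with its levels in row order, one
could also fit a simple regression on the numeric row scores. With equal cell
sizes this reproduces the slope point estimate, though the interval comes out
somewhat wider here, because the deviations of the group means from the trend
line (lack of fit) are then pooled into the error term. A sketch, assuming the
same workspace as above:

# alternative: simple regression on the numeric location scores
numMod <- lm(examscore ~ as.numeric(location))
coef(numMod)     # intercept 9.5, slope -1.6
confint(numMod)  # 95% CI; wider than the contrast-based interval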
Interpreting the result
The estimate of the slope of the linear trend equals $\hat{\beta}_1 = -1.60$, 95% CI
$[-2.27, -0.93]$. This means that with each increase in row number (from a given
row to a location one row further away from the lecturer) the estimated exam
score will on average decrease by 1.6 points, but any value between $-2.27$
and $-0.93$ is considered to be a relatively plausible candidate value, with 95%
confidence. (Of course, we should probably not extrapolate beyond the rows
that were actually observed; otherwise students seated behind the lecturer would
be expected to have a higher population mean than students seated in front of
the lecturer.)
In order to aid interpretation one may convert these numbers to a standardized
version (resulting in a standardized confidence interval for the slope estimate)
and use rules-of-thumb for interpretation. The square root of the within-condition
variance may be a suitable standardizer. The value of this standardizer is
$S_W = 1.58$ (I obtained the value $MS_{\text{within}} = 2.5$ from the SPSS ANOVA table).
The standardized estimates are therefore $-1.0$, 95% CI $[-1.43, -0.59]$, suggesting
that the negative effect of moving one row further from the lecturer is somewhere
between medium and very large, with the point estimate corresponding to a
large negative effect.
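The standardization itself is again a one-line computation (a sketch; 2.5 is the
$MS_{\text{within}}$ reported in the SPSS ANOVA table):

# standardize the slope estimate and its CI limits by sqrt(MS_within)
sw <- sqrt(2.5)                        # about 1.58
c(-1.6, -2.270373, -0.9296271) / sw    # about -1.0, -1.4, -0.6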
References
Haans, A. (2018). Contrast analysis: A tutorial. Practical Assessment,
Research & Evaluation, 23(9). Available online:
http://pareonline.net/getvn.asp?v=23&n=9

Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing Experiments
and Analyzing Data: A Model Comparison Perspective (3rd ed.). New
York/London: Routledge.

Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and Effect Sizes
in Behavioral Research: A Correlational Approach. Cambridge, UK: Cambridge
University Press.