Modified Sequentially Rejective Multiple Test Procedures
Author(s): Juliet Popper Shaffer
Source: Journal of the American Statistical Association, Vol. 81, No. 395 (Sep., 1986), pp. 826-831
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2289016
Modified Sequentially Rejective Multiple Test Procedures

JULIET POPPER SHAFFER*
Suppose that n hypotheses H1, H2, ..., Hn with associated test statistics T1, T2, ..., Tn are to be tested by a procedure with experimentwise significance level (the probability of rejecting one or more true hypotheses) smaller than or equal to some specified value α. A commonly used procedure satisfying this condition is the Bonferroni (B) procedure, which consists of rejecting Hi, for any i, iff the associated test statistic Ti is significant at the level α' = α/n. Holm (1979) introduced a modified Bonferroni procedure with greater power than the B procedure. Under Holm's sequentially rejective Bonferroni (SRB) procedure, if any hypothesis is rejected at the level α' = α/n, the denominator of α' for the next test is n - 1, and the criterion continues to be modified in a stagewise manner, with the denominator of α' reduced by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels. Holm proved that the experimentwise significance level of the SRB procedure is ≤ α, as is that of the original B procedure. Often, the hypotheses being tested are logically interrelated so that not all combinations of true and false hypotheses are possible. As a simple example of such a situation suppose, given samples from three distributions, we want to test the three hypotheses of pairwise equality: μi = μi′ (i < i′ = 1, 2, 3), where μi is the mean of distribution i. It is easily seen from the relations among the hypotheses that if any one of them is false, at least one other must be false. Thus there cannot be one false and two true hypotheses among these three. If we are testing all hypotheses of pairwise equality with more than three distributions, there are many such constraints. As another example, consider the hypotheses of independence of rows and columns of all 2 × 2 subtables of a K × L contingency table. It is shown that if one such hypothesis is false, then at least (K - 1)(L - 1) must be false. When there are logical implications among the hypotheses and alternatives, as in the preceding examples, Holm's SRB procedure can be improved to obtain a further increase in power. This article considers methods for achieving such improvement. One way of modifying the SRB method is as follows: Given that j - 1 hypotheses have been rejected, the denominator of α', instead of being set at n - j + 1 for the next test as in the SRB procedure, can be set at tj, where tj equals the maximum number of hypotheses that could be true, given that at least j - 1 hypotheses are false. Obviously, tj is never greater than n - j + 1, and for some values of j it may be strictly smaller, as for j = 2 in the first example. Then this modified sequentially rejective Bonferroni (MSRB) procedure will never be less powerful (and typically will be more powerful) than the SRB procedure while (as is proved in the article) maintaining an experimentwise significance level ≤ α. The MSRB procedure is readily applicable to a wide variety of standard and nonstandard problems. A number of examples are given, and extensions and generalizations are discussed. It is pointed out that the methods may be adapted in some circumstances to the use of non-Bonferroni multiple test procedures.

KEY WORDS: Multiple comparisons; Simultaneous inference; Bonferroni tests; Stagewise multiple tests; Power; Experimentwise error rate.

* Juliet Popper Shaffer is Senior Lecturer, Department of Statistics, University of California, Berkeley, CA 94720.
1. INTRODUCTION

Suppose that n hypotheses H1, H2, ..., Hn with associated test statistics T1, T2, ..., Tn are to be tested by a procedure with an experimentwise significance level smaller than or equal to some specified value α, where the experimentwise significance level is defined as the supremum (over all joint distributions F of the Ti that are possible under the assumed model) of the probability of rejecting one or more true hypotheses. A commonly used procedure satisfying this condition is the Bonferroni (B) procedure, based on the simple Bonferroni inequality. The B procedure consists of rejecting Hi, for any i, if and only if the significance probability of Ti, that is, Pr_Hi(Ti ≥ ti), is ≤ α/n, where ti is the observed value of Ti and the Ti are defined so that large values lead to rejection.

Holm (1977, 1979) introduced a class of sequentially rejective multiple test methods that includes a modified Bonferroni procedure with greater power than the B procedure. Holm's sequentially rejective Bonferroni (SRB) procedure modifies the criterion in a stagewise manner, as follows: Let Yi = Pr_Hi(Ti ≥ ti), let {Y(i)} be the order statistics of the Yi, Y(1) ≤ ··· ≤ Y(n), and let H(i) be the hypothesis with test statistic Y(i), i = 1, ..., n. Then H(1) is rejected iff Y(1) ≤ α/n; given that H(1) is rejected, H(2) is rejected iff Y(2) ≤ α/(n - 1); ...; given that H(j-1) is rejected, H(j) is rejected iff Y(j) ≤ α/(n - j + 1); and so forth. Acceptance of H(k) implies acceptance of H(l) for all l > k, 1 ≤ k, l ≤ n. Holm proved that the experimentwise significance level of the SRB procedure is ≤ α, the same as that of the original B procedure.
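As a concrete sketch of the SRB procedure just described (added in this edition for illustration; it is not code from the original article), the stagewise rule can be written in a few lines of Python, with the p-values playing the role of the Yi.

```python
# Holm's sequentially rejective Bonferroni (SRB) procedure: a minimal illustrative sketch.
def holm_srb(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the SRB procedure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # Y(1) <= Y(2) <= ... <= Y(n)
    rejected = []
    for stage, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (n - stage + 1):         # critical value alpha/(n - j + 1)
            rejected.append(i)
        else:
            break  # acceptance of H(j) implies acceptance of all later hypotheses
    return rejected

# With alpha = .05 and n = 4 the stagewise critical values are .0125, .0167, .025, .05:
print(holm_srb([0.010, 0.013, 0.041, 0.20]))  # -> [0, 1]
```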
It will be assumed that no hypothesis in the set is equivalent to the intersection of any of the others; that is, the hypotheses are minimal (Gabriel 1969). A decision on any intersection hypothesis of interest is made by rejecting it iff at least one of the hypotheses H1, H2, ..., Hn included in the intersection is rejected; clearly, these decisions can be added to the decisions with respect to H1, H2, ..., Hn without changing the experimentwise significance level of the total procedure.
A procedure δ will be called uniformly more powerful than another procedure δ* for testing a specific set S of hypotheses if the probability of rejecting each false hypothesis in S under δ is greater than or equal to the probability of rejecting it under δ*, for all joint distributions of the Ti that are possible under the assumed model, with strict inequality for at least one false hypothesis in S under some distribution. The SRB procedure is obviously uniformly more powerful than the B procedure for H1, H2, ..., Hn and their intersections; in fact it has the stronger property of always rejecting hypotheses that are rejected under B and sometimes rejecting additional ones.

Let I = {i1, i2, ..., im} be the set of indexes of the hypotheses that are true in any particular application. In Holm's terminology, the situation is one of "free combinations" if the set {Hi: i ∈ I} can be any subset of the n hypotheses. If these conditions are not satisfied, Holm's procedure remains valid, but it is possible to improve it to obtain a further increase in power. The purpose of this article is to show how the improvement can be achieved and to illustrate its extent in a number of different applications.
2. A MODIFIED SEQUENTIALLY REJECTIVE BONFERRONI PROCEDURE

The SRB procedure described in Section 1 can be modified in the following way: At stage j, instead of rejecting H(j) if Y(j) ≤ α/(n - j + 1), reject H(j) if Y(j) ≤ α/tj, where tj equals the maximum number of possibly true hypotheses, given that at least j - 1 hypotheses are false. When there are relationships of logical implication among the hypotheses, usually the number m of true hypotheses cannot take on certain values between 0 and n, since the falsity of j - 1 hypotheses implies the falsity of some additional hypotheses for some values of j, as will be illustrated in Section 3. For those values of j, tj will be strictly less than n - j + 1, and since tj is obviously never greater than n - j + 1, the modified SRB (MSRB) procedure will be at least as powerful as the SRB procedure, and in most applications with restricted combinations it will be uniformly more powerful.
Given some specific application, let A = {ai: i = 1, ..., r} be the set of possible numbers of true hypotheses, 0 ≤ a1 < a2 < ··· < ar ≤ n, and let J be the associated set of possible values of tj. Then either J = A or, more typically, J is the set of all nonzero values of A, since tj = max{ai : ai ≤ n - j + 1} for all stages j.
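A minimal sketch of the MSRB procedure (again an illustration added in this edition rather than code from the article), assuming that the set A of possible numbers of true hypotheses has already been determined for the application at hand:

```python
# Modified SRB (MSRB): at stage j the denominator is t_j = max{a in A : a <= n - j + 1}.
def msrb(p_values, A, alpha=0.05):
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    rejected = []
    for stage, i in enumerate(order, start=1):
        # In the applications of Section 3 there is always a nonzero candidate here.
        t_j = max(a for a in A if 0 < a <= n - stage + 1)
        if p_values[i] <= alpha / t_j:
            rejected.append(i)
        else:
            break
    return rejected

# Pairwise comparisons of k = 4 distributions (Section 3.1): n = 6 and A = {0, 1, 2, 3, 6},
# so the MSRB denominators are 6, 3, 3, 3, 2, 1 instead of the SRB values 6, 5, 4, 3, 2, 1.
p = [0.001, 0.009, 0.014, 0.040, 0.200, 0.300]
print(msrb(p, A={0, 1, 2, 3, 6}))  # -> [0, 1, 2]; SRB would stop after two rejections,
                                   # since 0.014 > .05/4 while 0.014 <= .05/3
```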
That the MSRB procedure has experimentwise significance level ≤ α follows directly from Holm's (1979) proof for the SRB procedure. The basic idea behind Holm's proof is that if m hypotheses are true, an error must occur at or before stage n - m + 1. Therefore, Pr(no errors) ≥ Pr(Yi > α/m for all i ∈ I) = 1 - Pr(Yi ≤ α/m for some i ∈ I) ≥ 1 - Σ_{i∈I} α/m = 1 - α.

In Section 3 some applications in which the MSRB procedure may be considerably more powerful than the SRB procedure are presented, in Section 4 possible further modifications to achieve still greater power are given, and in Section 5 specific illustrations of the use of the MSRB are provided.
3. APPLICATIONS

3.1 Comparisons Among k Distributions

Consider a class of distributions G ∈ 𝒢 and a function f defined over 𝒢 and taking on at least k distinct values. Let G1, G2, ..., Gk be k unknown distributions in 𝒢, and consider the k(k - 1)/2 hypotheses

f(Gi) = f(Gi′),  i < i′.   (3.1)

The family may or may not restrict the distributions to some specified form, such as normal; the function may be real-valued, such as the mean or variance, or, at the other extreme, f(G) may equal G. Given any set of distributions, they will be said to be homogeneous or different according to whether or not their values of f are equal. The possible numbers of true hypotheses can be determined from the properties of equivalence relationships, as illustrated in Table 1 for k = 4, in which case the number of hypotheses is n = 6.
By considering all possible configurations of true and false hypotheses, as in Table 1, we see, for example, that all six hypotheses may be true, but that if any hypothesis is false, at least three must be false, since if any two distributions differ, at least one of these must differ from the remaining ones. As shown, it is also possible to have 2, 1, or 0 true hypotheses, so A = {0, 1, 2, 3, 6} in this case.
The possible numbers of true hypotheses, and thus the values of tj, for 3 ≤ k ≤ 10 are given in Table 2. Values for k > 10 can be obtained from the recursion formula

S(k) = ⋃_{j=1}^{k} { j(j - 1)/2 + x : x ∈ S(k - j) },   (3.2)

where S(k) is the set of possible numbers of true hypotheses with k distributions, k ≥ 2, and S(0) = S(1) = {0}.
Table 1. Determining Possible Numbers of True Hypotheses for the Application in Section 3.1, Illustrated for k = 4

Partitions of 4 populations       | Representation  | Number of true hypotheses | Number of false hypotheses | Maximum number of true hypotheses
1. [(1, 2, 3, 4)]                 | (4)             | 4(3)/2 = 6                | 0                          | 6 (partition 1)
2. [(1)(234)], [(2)(134)], etc.   | (3)(1)          | 3(2)/2 = 3                | 1-3                        | 3 (partition 2)
3. [(12)(34)], [(13)(24)], etc.   | (2)(2)          | 1 + 1 = 2                 | 4                          | 2 (partition 3)
4. [(12)(3)(4)], etc.             | (2)(1)(1)       | 1                         | 5                          | 1 (partition 4)
5. [(1)(2)(3)(4)]                 | (1)(1)(1)(1)    | 0                         | 6                          | 0 (partition 5)
General                           | (k1)(k2)···(kt) | Σ over ki ≥ 2 of ki(ki - 1)/2, or 0 if all ki = 1 | |
Table 2. Possible Numbers of True Hypotheses for the Application in Section 3.1, With k Distributions (3 ≤ k ≤ 10)

Number of distributions (k) | Number of hypotheses (n) | Possible numbers of true hypotheses
3  | 3  | 0, 1, 3
4  | 6  | 0-3, 6
5  | 10 | 0-4, 6, 10
6  | 15 | 0-4, 6, 7, 10, 15
7  | 21 | 0-7, 9, 10, 11, 15, 21
8  | 28 | 0-13, 15, 16, 21, 28
9  | 36 | 0-13, 15, 16, 18, 21, 22, 28, 36
10 | 45 | 0-18, 20, 21, 22, 24, 28, 29, 36, 45
NOTE: To use the MSRB procedure for the application in Section 3.1, determine the set A = {ai} of possible numbers of true hypotheses corresponding to the relevant value of k. Then, at stage j, for j = 1, ..., k(k - 1)/2, test the hypothesis H(j) at significance level α/tj, where tj = max{ai : ai ≤ n - j + 1}.
By testing intersections of these pairwise hypotheses as described in Section 1, tests of all of the 2^k - k - 1 hypotheses of subset homogeneity of the Gi can be obtained.
Formula (3.2) can be proved by induction. It obviously holds for k = 2. Assuming it holds for k - 1 distributions, when a new distribution is added to those k - 1, it will be one of a set of j homogeneous distributions for some j ∈ {1, 2, ..., k}, and the other k - j distributions will be different from those. Therefore, the set of possible numbers of true hypotheses (3.1), given j, is { j(j - 1)/2 + x : x ∈ S(k - j) }, and S(k) is the union of these sets over j ∈ {1, 2, ..., k}.
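The recursion (3.2) is easy to compute; the following short Python sketch (added in this edition for illustration, not from the article) reproduces the entries of Table 2.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def S(k):
    """Possible numbers of true pairwise hypotheses (3.1) among k distributions, per (3.2)."""
    if k <= 1:
        return frozenset({0})
    values = set()
    for j in range(1, k + 1):   # j = size of the homogeneous class containing the new distribution
        values.update(j * (j - 1) // 2 + x for x in S(k - j))
    return frozenset(values)

print(sorted(S(4)))   # -> [0, 1, 2, 3, 6], the set A of Section 3.1
print(sorted(S(6)))   # -> [0, 1, 2, 3, 4, 6, 7, 10, 15], matching the k = 6 row of Table 2
```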
Of course, many other methods have been proposed for this situation [see, e.g., Einot and Gabriel (1975), and note the modifications described in Sec. 4 here and other possibilities indicated in Sec. 5]. Some detailed comparisons with other approaches can be found in Shaffer (1984), where it is shown that the method described here is competitive with other methods in general use.
3.2 Comparisons Within Several Sets of Distributions

Let fi be a function, defined over a class of distributions Gi ∈ 𝒢i, which takes on at least K′ distinct values. Suppose there are p sets of unknown distributions Gi1, Gi2, ..., Giki (i = 1, ..., p), where Σ_{i=1}^{p} ki = K′, and consider the Σ_{i=1}^{p} ki(ki - 1)/2 within-set hypotheses

fi(Gij) = fi(Gij′),  j < j′.   (3.3)

From (3.2) we obtain the recursion formula

W(k1, k2, ..., kp) = { x1 + x2 + ··· + xp : xi ∈ S(ki), i = 1, 2, ..., p },   (3.4)

where W(k1, k2, ..., kp) is the set of possible numbers of true within-set hypotheses (3.3) with ki distributions in set i (i = 1, 2, ..., p) and S(ki) is defined as in (3.2).
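Formula (3.4) says that the possible within-set counts are the sums of one value from each S(ki); reusing the S(k) sketch given after (3.2), it can be evaluated directly (again an illustrative addition, not code from the article).

```python
from itertools import product

def W(*ks):
    """Possible numbers of true within-set hypotheses (3.3), per (3.4)."""
    return sorted({sum(xs) for xs in product(*(S(k) for k in ks))})

print(W(3, 3))  # -> [0, 1, 2, 3, 4, 6]: with p = 2 sets of three distributions, at most
                #    four hypotheses can be true once any one is false (used in Illustration 2)
print(W(4, 4))  # p = 2, k1 = k2 = 4, the setting of the illustration in Section 4.2
```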
By testing intersections as in the application in Section 3.1, all within-set hypotheses of subset homogeneity of the Gij may be included.

A specific application would be the tests usually recommended when there is interaction between two factors in a factorial design: The levels of one of the factors are compared separately within each level of the other factor. If i represents one of the p levels of a factor A, j represents one of the ki levels of a factor B, with ki = k for all i, and the fi(Gij) are the means of normal distributions Gij with common variance, then the hypotheses (3.3) are the standard normal-theory analysis-of-variance hypotheses that the effects of B within each level of A equal zero (see also Sec. 5, Illustration 2).
3.3 Comparisons Between Several Sets of Distributions

Given the same situation as in the application in Section 3.2, consider the Σ_{1≤i<i′≤p} ki ki′ pairwise equality hypotheses

fi(Gij) = fi′(Gi′j′),  i < i′.   (3.5)

[This comparison would generally make sense only when fi(Gij) = f(Gij) for all i.] The possible numbers of true hypotheses can be obtained from the recursion formula

B(k1, k2, ..., kp) = ⋃_{(c1,...,cp)∈C} { Σ_{1≤i<i′≤p} ci ci′ + x : x ∈ B(k1 - c1, k2 - c2, ..., kp - cp) },   (3.6)

where B(0, 0, ..., 0) = {0}, B(k1, k2, ..., kp) = the set of possible numbers of true between-set hypotheses (3.5) with ki distributions in set i (i = 1, 2, ..., p), and C = {(c1, c2, ..., cp): 0 ≤ ci ≤ ki for i = 1, 2, ..., p and Σ_{i=1}^{p} ci > 0}. The proof of (3.6) is somewhat similar to that of (3.2) and is omitted. By adding consideration of intersections, one obtains tests of the 2^{K′} - K′ - 1 - Σ_{i=1}^{p} (2^{ki} - ki - 1) hypotheses of equality of all subsets containing distributions from more than one set.
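Recursion (3.6) can be evaluated in the same style; the sketch below (an illustration added in this edition, with the homogeneous classes peeled off one at a time as in the recursion) computes the possible numbers of true between-set hypotheses.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def B(ks):
    """Possible numbers of true between-set hypotheses (3.5); ks = (k1, ..., kp)."""
    if all(k == 0 for k in ks):
        return frozenset({0})
    values = set()
    # (c1, ..., cp): how many distributions of each set fall in one homogeneous class
    for cs in product(*(range(k + 1) for k in ks)):
        if sum(cs) == 0:
            continue
        between = sum(cs[i] * cs[j] for i in range(len(cs)) for j in range(i + 1, len(cs)))
        values.update(between + x for x in B(tuple(k - c for k, c in zip(ks, cs))))
    return frozenset(values)

print(sorted(B((2, 1))))  # -> [0, 1, 2]: two between-set hypotheses, any number may be true
print(sorted(B((2, 2))))  # -> [0, 1, 2, 4]: exactly three true is logically impossible
```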
An important application is to studies comparing treatments with control groups. As pointed out by Cochran (1983), in many observational studies an ideal control group is not available, in which case it is desirable to compare each treatment group with more than one control group, where each control may be vulnerable to different sources of bias.
3.4 Tests of Independence of All 2 × 2 Subtables of a K × L Contingency Table or Tests of Additivity in All 2 × 2 Subparts of a K × L Factorial Design

The sets A of possible numbers of true hypotheses are the same in (a) tests of independence of all 2 × 2 subtables of a K × L contingency table and (b) tests of additivity in all 2 × 2 subparts of a K × L factorial design. If L = 2, they reduce to those in the application in Section 3.1: In (a), the hypotheses are then equivalent to the hypotheses πi1/πi2 = πi′1/πi′2, for i, i′ = 1, 2, ..., K, where πij is the probability of an observation falling in row i and column j; in (b), they are equivalent to the hypotheses μi1 - μi2 = μi′1 - μi′2 for i, i′ = 1, 2, ..., K, where μij is the mean of the distribution at level i of factor A and j of factor B.
Table 3. Possible Numbers of True Hypotheses for the Application in Section 3.4, for Selected Values of K × L

K and L      | Number of hypotheses | Possible numbers of true hypotheses
L = 2, all K | K(K - 1)/2           | Obtain from Table 2 by setting k = K
L = 3, K = 3 | 9                    | 0-3, 5, 9
L = 3, K = 4 | 18                   | 0-10, 12, 18
L = 3, K = 5 | 30                   | 0-16, 18, 22, 30
L = 4, K = 4 | 36                   | 0-21, 24, 27, 36
NOTE: To use the MSRB procedure for the application in Section 3.4, determine the set A = {ai} of possible numbers of true hypotheses corresponding to the relevant values of K and L. Then, at stage j, for j = 1, ..., [K(K - 1)/2][L(L - 1)/2], test the hypothesis H(j) at significance level α/tj, where tj = max{ai : ai ≤ n - j + 1}.
Results for some representative values of K × L are given in Table 3. Adding intersections permits tests of the hypotheses of independence in all subtables of a contingency table or of the hypotheses of additivity under all subsets of factor level combinations in a factorial design.
If L > 2, there is no obvious algorithm for computing the possible numbers of true hypotheses. It can be seen, however, from the sets of possible numbers in Sections 3.1-3.3 and from Table 3 that the main advantage of the MSRB procedure over the SRB procedure appears at the second stage, where the relative difference in criterion significance probabilities is greatest. An explicit expression for t2 in Section 3.4, proved in the Appendix, is

t2 = [K(K - 1)/2][L(L - 1)/2] - (K - 1)(L - 1).   (3.7)

A compromise procedure, possibly applicable also in other situations, would be to set tj = t2 for all 2 ≤ j ≤ n - t2 + 1, and to use the SRB values for all stages j > n - t2 + 1. This approach could also be combined effectively with the modified procedure described in Section 4.1.
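A small sketch of (3.7) and the compromise schedule just described (an illustration added in this edition; the function names are ours, not the article's):

```python
def t2_subtables(K, L):
    """Equation (3.7): maximum number of true 2 x 2 hypotheses in a K x L table,
    given that at least one is false."""
    return (K * (K - 1) // 2) * (L * (L - 1) // 2) - (K - 1) * (L - 1)

def compromise_denominators(K, L):
    """t_1 = n, t_j = t_2 for 2 <= j <= n - t_2 + 1, and the SRB values n - j + 1 afterwards."""
    n = (K * (K - 1) // 2) * (L * (L - 1) // 2)
    t2 = t2_subtables(K, L)
    return [n] + [t2 if j <= n - t2 + 1 else n - j + 1 for j in range(2, n + 1)]

print(t2_subtables(3, 3))             # -> 5, matching the K = L = 3 row of Table 3
print(compromise_denominators(3, 3))  # -> [9, 5, 5, 5, 5, 4, 3, 2, 1]
```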
4. MODIFICATIONS AND GENERALIZATIONS OF THE MSRB PROCEDURE

4.1 A Modified MSRB Procedure Following Initial Rejection of a More Comprehensive Hypothesis

Often the n hypotheses are not tested separately unless a more comprehensive hypothesis has initially been rejected at significance level α, where such rejection implies that at least some number r of the n hypotheses (but not which ones) are false, r = 1, 2, ..., n - 1. It follows directly from the proof in Section 2 that a further improvement in the MSRB is then possible: since at least r hypotheses are known to be false, at most max{ai : ai ≤ n - r} = t_{r+1} hypotheses can be true, so the critical values α/tj for testing H(1), H(2), ..., H(r) can be replaced by α/t_{r+1} without increasing the overall significance level above α.
A typical opportunity to apply this modified procedure would arise in the use of a sequentially rejective procedure in the application in Section 3.1 following rejection of the hypothesis f(G1) = f(G2) = ··· = f(Gk) by a composite test based on a statistic other than Y(1) (e.g., rejection of equality of means with an F test in analysis of variance). In the first stage of the MSRB following rejection of this composite hypothesis, α/n would be replaced by α/t2. For an application of this idea in a somewhat different context, see Shaffer (1979).
4.2 A Modified MSRB Procedure Taking Into Account the Particular Hypotheses Rejected

The power of the MSRB procedure can be increased, at the cost of greater complexity, by substituting for α/tj at stage j the value α/tj*, where tj* is the maximum number of hypotheses that could be true, given that the specific hypotheses H(1), H(2), ..., H(j-1) are false. (The dependence of tj* on the first j - 1 rejected hypotheses is suppressed for convenience in the notation.) To prove that this procedure has an experimentwise significance level ≤ α, let tj,L* be the minimum tj* (for 1 ≤ j ≤ n - m + 1) over all subsets of size j - 1 of false hypotheses. Note that tj,L* ≥ m for all j, where m is the number of true hypotheses. Then Pr(one or more errors) = Pr(Yi ≤ α/tj,L* for some j ≤ n - m + 1 and some i ∈ I) ≤ Pr(Yi ≤ α/m for some i ∈ I) ≤ α.
As an illustration, consider the application in Section 3.2 with p = 2 and k1 = k2 = 4. By referring to Table 1, we see that if the two hypotheses f1(G11) = f1(G14) and f1(G11) = f1(G13) are false, the number of possibly true hypotheses is 9; if the two hypotheses f1(G11) = f1(G14) and f2(G21) = f2(G24) are false, the number of possibly true hypotheses is 6 (see also Sec. 5, Illustration 2).
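For small problems the quantities tj* can be found by brute force. The sketch below (an illustration added in this edition for the Section 3.1 setting) enumerates all partitions of the k distributions into homogeneity classes, keeps those consistent with the specific hypotheses already declared false, and reports the maximum number of hypotheses that can still be true; combined across the two sets it reproduces the counts 9 and 6 quoted above.

```python
def partitions(elements):
    """All partitions of a list into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

def t_star(k, false_pairs):
    """Maximum number of true pairwise hypotheses among k distributions, given that the
    specific pairs in false_pairs are false (Section 4.2, within one set of Section 3.1)."""
    best = 0
    for part in partitions(list(range(1, k + 1))):
        # admissible only if every declared-false pair is split across different blocks
        if any(any(i in block and j in block for block in part) for i, j in false_pairs):
            continue
        best = max(best, sum(len(b) * (len(b) - 1) // 2 for b in part))
    return best

print(t_star(4, [(1, 4), (1, 3)]))  # -> 3; with the other set's 6 hypotheses, 3 + 6 = 9
print(t_star(4, [(1, 4)]))          # -> 3; one false pair in each set gives 3 + 3 = 6
```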
5. ILLUSTRATIONS

When the number of hypotheses is large, the analysis of relationships among them may be complicated. In many situations that arise in practice, however, the number of hypotheses is small and their logical interrelations are transparent. In such cases, the MSRB procedure and its extensions can be easily applied. Illustration 1 is an example of this kind. In Illustration 2, the MSRB is compared with a more familiar approach to the problem described in Section 3.2.
Illustration 1. Information was available on the proportions of (i) passes, (ii) failures, and (iii) incompletes or withdrawals, in a number of mathematics classes, each of which had been taught by one of two different methods. All classes were of approximately the same size and were taught by different instructors. The experimenter was interested in comparing the proportions (i), (ii), and (iii) for the two methods. Let Pijk be the proportion of students in the kth class taught by method i who fall in category j, for i = 1, 2; j = 1, 2, 3; k = 1, 2, ..., ni. Assuming the vectors (Pi1k, Pi2k, Pi3k) are independent observations from trivariate distributions Fi with mean vectors (μi1, μi2, μi3), the three hypotheses to be tested are {Hj: μ1j - μ2j = 0, j = 1, 2, 3}. Since the sum of the three observations for each class equals 1, it follows that if any of the three hypotheses is false, at most one can be true. Thus the following MSRB methods may be considered.
1. Choose an appropriate test statistic for each hypothesis. Order the hypotheses as in Section 1, and reject H(1) (the hypothesis corresponding to the smallest significance probability) if Y(1) ≤ α/3. If H(1) is rejected, reject H(i) if Y(i) ≤ α, for i = 2, 3 (see the sketch following this list).

2. Carry out a level-α test of the hypothesis H0: μ1j - μ2j = 0 for all j. Under appropriate conditions on the proportions, a repeated measures analysis of variance or a multivariate analysis of variance would be a reasonable approximate test in this situation (see Shaffer 1981). In view of the result in Section 4.1, if H0 is rejected, test each of H1, H2, and H3 at level α.
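The numerical sketch referred to under method 1 (an illustration added in this edition, with made-up p-values): here A = {0, 1, 3}, so the stagewise critical values are α/3, α, α.

```python
def msrb_three_proportions(p_values, alpha=0.05):
    """Method 1 of Illustration 1: denominators 3, 1, 1 (since A = {0, 1, 3})."""
    denominators = [3, 1, 1]
    order = sorted(range(3), key=lambda j: p_values[j])
    rejected = []
    for stage, j in enumerate(order):
        if p_values[j] <= alpha / denominators[stage]:
            rejected.append(j)
        else:
            break
    return rejected

# Hypothetical p-values for H1 (passes), H2 (failures), H3 (incompletes/withdrawals):
print(msrb_three_proportions([0.008, 0.030, 0.300]))  # -> [0, 1]: H1 and H2 rejected
```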
Illustration 2. Assume a 2 × 3 balanced factorial design to be analyzed by a fixed-effects analysis of variance. As pointed out in Section 3.2, if the test for interaction is significant, it is often recommended that the effects of any factor of interest be examined separately within each level of the other factor. Suppose the interaction is significant, and assume that we are interested in all pairwise contrasts among the three levels of factor B for each of the two levels of factor A. Letting μij be the mean of the cell for level i of factor A and j of factor B, the six hypotheses to be tested are {Hi(jk): μij - μik = 0; i = 1, 2; j < k = 1, 2, 3}.
Assume that we want the experimentwise significance level to be α. A typical way of accomplishing this aim is to use a multiple range test for each value of i, with significance level α/2 for each. More specifically, given the value of i, the three means are ordered, and the difference between the largest and smallest is considered significant (i.e., the corresponding hypothesis is rejected) if the difference, divided by its estimated standard deviation based on the within-groups mean square, is greater than the α/2 critical value of the studentized range distribution for three means. If the difference is significant, the tests of the remaining two differences are based on the studentized range of two means, with the levels depending on the particular multiple range procedure adopted. The optimal levels, consistent with a maximum Type I error probability of α/2, are α/2 for each of the remaining two differences (see, e.g., Lehmann and Shaffer 1979).
To use the MSRB, note that the significance of the interaction implies that the six hypotheses are not all true. It is then easily seen intuitively by the kind of argument in Section 3.1, and formally from the results of Section 3.2, that at most four of them are true. Thus, ordering the hypotheses as in Section 1, and using the modification of the MSRB discussed in Section 4.1, hypothesis H(1) would be rejected if the difference between the corresponding means were larger than the α/4 critical value of the studentized range of two means. Given a rejection, H(2) would also be tested at α/4. Making use of the modification in Section 4.2, H(3) would be tested at α/4 or α/2, depending on whether H(1) and H(2) referred to the same or different values of i, respectively. At each subsequent stage, the appropriate level for the test would be easily determined.
If the degrees of freedom for error are large, the multiple range and MSRB approaches can be compared by examining critical values of ranges of standard normal random variables. The first test, for example, would be based approximately on the α/2 critical value of the range of three means for the multiple range procedure and the α/4 critical value of the range of two means for the MSRB procedure. For α = .05, the respective values are 3.68 and 3.53. In other words, the probability of finding at least one significant pairwise difference is greater with the modified MSRB procedure than with the range procedure.
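The two critical values quoted above can be checked numerically. For large error degrees of freedom the studentized range of m means behaves approximately like the range of m independent standard normal variables, whose distribution function has a standard integral form; the sketch below (an illustration added in this edition, assuming SciPy is available) recovers values of about 3.68 and 3.53.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import brentq

def range_cdf(w, m):
    """P(range of m iid standard normal variables <= w)."""
    integrand = lambda x: norm.pdf(x) * (norm.cdf(x + w) - norm.cdf(x)) ** (m - 1)
    return m * quad(integrand, -np.inf, np.inf)[0]

def range_critical(p, m):
    """Upper-p critical value of the range of m iid standard normal variables."""
    return brentq(lambda w: range_cdf(w, m) - (1 - p), 0.1, 10.0)

alpha = 0.05
print(f"{range_critical(alpha / 2, 3):.3f}")  # multiple range, first test: about 3.68
print(f"{range_critical(alpha / 4, 2):.3f}")  # MSRB, first test: about 3.53
```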
Some further comparisons are possible by direct consideration of critical values required by the two procedures. For instance, the probability of finding at least one significant difference within each level of i is greater with the modified MSRB than with the multiple range procedure, as is the probability of rejecting all of the hypotheses. Further consideration of the procedures suggests, as a rough approximation, that the multiple range procedure is more powerful when the false hypotheses are all within a single level of factor A, whereas the MSRB procedure has the advantage when true mean differences occur within both levels.
6. DISCUSSION

Note that the improvements in multiple test procedures discussed in this article are based on logical analysis of the relationships among the hypotheses and are independent of the particular test statistics used, except for knowledge of their respective marginal distributions. As in the usual use of the Bonferroni inequality, the methods are, therefore, highly flexible and easily used in nonstandard situations. Other approaches to multiple testing use more powerful methods based on the joint distribution of the test statistics, ranging from the use of improved Bonferroni inequalities that are based on some properties of the joint distribution of subsets of the test statistics (e.g., Worsley 1982), to the full use of the joint distribution, as, for example, when the test statistics are independent or in the comparison of means of normal distributions with equal variance. In many circumstances it may be feasible to combine logical and distributional considerations to obtain multiple testing methods better than those obtainable using either type alone; these would be modifications of the more general class of sequentially rejective methods considered by Holm (1977).
APPENDIX: PROOF OF (3.7)

The proof will be carried out in the contingency table framework. To apply it to factorial designs, substitute means for expected frequencies and substitute equivalence if different only by translation for equivalence if different only by a scale factor.
Consider a K × L contingency table, with entries equal to expected frequencies under the true model, as a row of L column vectors c1, c2, ..., cL of length K. Two vectors will be said to be equivalent if they differ only by a scale factor. Then given any L′ columns, 2 ≤ L′ ≤ L, all 2 × 2 subtables of the K × L′ contingency table consisting of the K rows and those L′ columns satisfy the hypotheses (of independence) iff all column vectors included in the L′ columns are equivalent.
It will be shown that a table satisfying the maximum number of true hypotheses of independence, given that not all are true, is one in which L - 1 vectors are equivalent and the Lth vector would be equivalent to these others if a single element were changed. The number of true hypotheses in such a table is readily seen to be (3.7).
Given a specific table that does not have all column vectors equivalent, let rjj′ = the number of independent 2 × 2 subtables in the K × 2 subtable consisting of the K rows and columns j and j′, 0 ≤ rjj′ ≤ K(K - 1)/2. The number of independent 2 × 2 tables is

Σ_{1≤j<j′≤L} rjj′.   (A.1)

We want to choose vectors to maximize (A.1) subject to the restriction that not all hypotheses are true. Let rM = max_{1≤j<j′≤L} rjj′ among those that are < K(K - 1)/2, and let cj* and cj** be any two column vectors for which rj*j** = rM. Replace each column vector cj by a copy of cj* (if rjj* ≥ rjj**) or cj** (otherwise). Note that with each of these replacements, (A.1) does not decrease: since rjj′ becomes either K(K - 1)/2 (if cj and cj′ become copies of the same vector cj* or cj**) or rM (if cj and cj′ become copies of the two different vectors), no rjj′ can decrease. After this replacement, the vectors are in two groups of L1 and L - L1 vectors, respectively, where the vectors within each group are equivalent; the number of true hypotheses is

[L1(L1 - 1)/2][K(K - 1)/2] + [(L - L1)(L - L1 - 1)/2][K(K - 1)/2] + L1(L - L1)rM.   (A.2)
Since the maximum number of true hypotheses must occur in a table of this form, it remains only to maximize (A.2) with respect to L1 and rM.

Maximization of (A.2) With Respect to L1. Since the sum of the coefficients of K(K - 1)/2 and rM in (A.2) is fixed, and rM < K(K - 1)/2, the maximum is found by maximizing the coefficient of K(K - 1)/2. Since this coefficient is a quadratic in L1 with a minimum at L1 = L/2, its integer-valued maximum occurs for L1 = 1 (or L - 1), in which case (A.2) becomes

[(L - 1)(L - 2)/2][K(K - 1)/2] + (L - 1)rM.   (A.3)
(A.3)
Maximization
of (A.3)
With
Respect
to
rM.
We want to max-
imize
rM,
the
number
of
true
hypotheses
in
a
K
x
2
contingency
table, given
rM <
K(K
-
1)/2.
As
noted
in
Section
3,
the set of
possible
numbers
of
true
hypotheses
in
tests
of
independence
in
K
x
2
contingency
tables is
equivalent
to the set of
possible
numbers of true
hypotheses
in
tests
of
pairwise
equality
among
k
populations
(see the
application in Sec.
3.1).
Considering that
application,
if at least one
of the
hypotheses (3.5) is
false, there
are at
least
two distinct
values of f (G); we
may assume
that there
are exactly
two, since the
number of
true hypotheses
can never
be decreased
if two different
values are
replaced by a
single value.
If the
two distinct
values
are designated as
v,
and
v2,
and d
=
the number of
distributions
i such
that
f
(G,)
=
v1 (O
<
d
<
k)
then
the number of
true
hypotheses is
d(d
-
1)/2
+
(k
-
d)(k
-
d
-
1)/2.
(A.4)
The
expression (A.4) is
a
quadratic in d with
a
minimum at
d
=
k/2 and its
integer-valued
maximum
at
d
=
1
(or
k
-
1).
Sub-
stituting this value
for
d
in
(A.4) gives (k
-
1)(k
-
2)/2
as the
maximum
number of
true
hypotheses
in
(3.1)
with at
least
one
false
hypothesis.
Therefore, the maximum value of
rM
smaller
than
K(K
-
1)/2
is
(K
-
1)(K
-
2)/2,
achieved
when the vector
that is not
equivalent
to the
L
-
1
others differs from such
equivalence
in
a
single
element.
Finally,
(A.3)
with
rM
replaced
by (K
-
1)(K
-
2)/2 equals
(3.7).
[Received August 1984. Revised January 1986.]
REFERENCES

Cochran, W. G. (1983), Planning and Analysis of Observational Studies, New York: John Wiley.

Einot, Israel, and Gabriel, K. R. (1975), "A Study of the Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, 70, 574-583.

Gabriel, K. R. (1969), "Simultaneous Test Procedures - Some Theory of Multiple Comparisons," Annals of Mathematical Statistics, 40, 224-250.

Holm, Sture (1977), "Sequentially Rejective Multiple Test Procedures," Statistical Research Report, University of Umea (Sweden), Institute of Mathematics and Statistics.

Holm, Sture (1979), "A Simple Sequentially Rejective Multiple Test Procedure," Scandinavian Journal of Statistics, 6, 65-70.

Lehmann, E. L., and Shaffer, Juliet Popper (1979), "Optimum Significance Levels for Multistage Comparison Procedures," The Annals of Statistics, 7, 27-45.

Shaffer, Juliet Popper (1979), "Comparison of Means: An F Test Followed by a Modified Multiple Range Procedure," Journal of Educational Statistics, 4, 14-23.

Shaffer, Juliet Popper (1981), "The Analysis of Variance Mixed Model With Allocated Observations: Application to Repeated Measurement Designs," Journal of the American Statistical Association, 76, 607-611.

Shaffer, Juliet Popper (1984), "Issues Arising in Multiple Comparisons Among Populations," in Proceedings of the Seventh Conference on Probability Theory, ed. M. Iosifescu, Bucharest, Romania: Editura Academiei Republicii Socialiste Romania.

Worsley, K. J. (1982), "An Improved Bonferroni Inequality and Applications," Biometrika, 69, 297-302.