## Publications

In a previous experiment, in which subjects rated likability of stimulus persons of both sexes to whom different personality-trait adjectives were ascribed, it was found that ratings were more polarized when stimulus persons were of the opposite sex than when they were of the same sex as the rater; i.e., ascribing positive adjectives resulted in hi...

Ss were asked, for each of 200 adjectives, to rate the likability of a “woman your age” (Group W), a “man your age” (Group M), a “person your age” (Group P), or a “child” (Group C). There were 32 men and 32 women in each group. The ratings of Groups W and M were more polarized for Ss rating the opposite sex than for Ss rating their own sex. In othe...

Previous research has indicated that college students make more polarized likability ratings of college-age stimulus persons of the opposite sex than of those of the same sex, that is, opposite-sex stimulus persons described positively yield higher ratings, and those described negatively yield lower ratings, than same-sex stimulus persons. The two...

In comparing multiple treatments, 2 error rates that have been studied extensively are the familywise and false discovery rates. Different methods are used to control each of these rates. Yet, it is rare to find studies that compare the same methods on both of these rates, and also on the per-family error rate, the expected number of false rejectio...

In two experiments, college students were asked to rate estimated degree of career success for a series of stimulus persons described by personality-trait adjectives. Experiment 1 also included sex of stimulus persons and Experiment 2 included sex and age of stimulus persons in the descriptions. In both experiments, the descriptions were varied sys...

Much of statistical theory and methodology deals with testing a single hypothesis or specifying a single confidence set. But many, probably most, studies deal with a number of questions, requiring more than one test or confidence set. In fact, within the last 20 years, advances in scientific and computer technology have made it possible to collect an...

An exact procedure is developed for testing directional hypotheses concerning pairwise differences between category probabilities when the data can be considered a sample from a multinomial distribution. The procedure is illustrated in the case of three and four categories, and is compared with the exact test of the hypothesis of equal category pro...

Multiple test procedures are usually compared on various aspects of error control and power. Power is measured as some function of the number of false hypotheses correctly identified as false. However, given equal numbers of rejected false hypotheses, the pattern of rejections, i.e. the particular set of false hypotheses identified, may be crucial...

The Newman-Keuls (NK) procedure for testing all pairwise comparisons among a set of treatment means, introduced by Newman (1939) and in a slightly different form by Keuls (1952), was proposed as a reasonable way to alleviate the inflation of error rates when a large number of means are compared. It was proposed before the concepts of different types...

There are many different notions of optimality even in testing a single hypothesis. In the multiple testing area, the number of possibilities is very much greater. The paper first will describe multiplicity issues that arise in tests involving a single parameter, and will describe a new optimality result in that context. Although the example given...

Consider the multiple testing problem of testing k null hypotheses, where the unknown family of distributions is assumed to satisfy a certain monotonicity assumption. Attention is restricted to procedures that control the familywise error rate in the strong sense and which satisfy a monotonicity condition. Under these assumptions, we prove certain...

Often in applied research, confidence intervals (CIs) are constructed or reported only for parameters selected after viewing the data. We show that such selected intervals fail to provide the assumed coverage probability. By generalizing the false discovery rate (FDR) approach from multiple testing to selected multiple CIs, we suggest the false cov...

A combination of hypothesis testing and confidence interval construction is often used in social and behavioral science studies. Sometimes confidence intervals are computed or reported only if a null hypothesis is rejected, perhaps to see whether the range of values is of practical importance. Sometimes they are constructed or reported only if a nu...

Multiple hypothesis testing occurs in a vast variety of fields and for a vast variety of purposes. Optimality results are relatively sparse in this area compared to results for tests of individual hypotheses. This paper restricts consideration to cases in which a finite number of parameters are involved, in which conclusions are desired for each pa...

DNA microarrays are part of a new and promising class of biotechnologies that allow the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in DNA microarray experiments is the identification of differentially expressed genes, that is, genes whose expression levels are associated with a r...

L. V. Jones and J. W. Tukey (2000) pointed out that the usual 2-sided, equal-tails null hypothesis test at level alpha can be reinterpreted as simultaneous tests of 2 directional inequality hypotheses, each at level alpha/2, and that the maximum probability of a Type I error is alpha/2 if the truth of the null hypothesis is considered impossible. T...

Duncan's Bayesian decision-theoretic multiple comparison procedure requires a decision on the relative magnitudes of losses due to Type I and Type II errors. In this paper, the relative losses are chosen so that the procedure results in weak control of familywise error at the 0.05 level, i.e. the probability that all hypotheses are accepted is 0.95...

Currently available methods for determining exact probabilities of the signed-rank statistic without computer assistance are shown to be inadequate. Currently available tables are noted to be either excessively large or of limited coverage. A combination of methods is shown to give exact or very accurate approximations to the p values for most case...

Limitations in currently available methods for producing significance probabilities for the sign test are discussed. Two simple modifications to the continuity corrected normal approximation are derived and presented in a simple form. These modifications are shown to markedly reduce the relative error in approximating exact probabilities. The relat...
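
As a rough illustration of the baseline this abstract starts from, the sketch below compares the exact upper-tail probability of the sign test with the ordinary continuity-corrected normal approximation. It does not implement the two modifications the paper derives, which are not given in the text above. The function names and the example values (n = 20, k = 15) are assumptions chosen for illustration.

```python
import math


def exact_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), the sign-test null distribution."""
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n


def normal_upper_tail(k, n):
    """Continuity-corrected normal approximation to P(X >= k) under the same null."""
    mean = n / 2
    sd = math.sqrt(n) / 2
    z = (k - 0.5 - mean) / sd
    # Upper-tail standard normal probability via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical example: 15 positive signs out of 20 non-tied pairs.
exact = exact_upper_tail(15, 20)
approx = normal_upper_tail(15, 20)
relative_error = (approx - exact) / exact
```

The relative error, rather than the absolute difference, is the quantity the abstract focuses on.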

In basing results on ratios of properties of two groups, it would be undesirable for conclusions to be affected by the arbitrary choice of which group’s properties should be the numerator and which the denominator. From this point of view, using means of ratios is undesirable; medians are somewhat better, and mean logarithms seem the best choice. T...
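
The symmetry argument can be checked numerically. The sketch below assumes paired positive values for the two groups (the pairing and the data are hypothetical, not taken from the paper). It shows that the arithmetic mean of ratios can exceed 1 whichever group is placed in the numerator, while the mean log ratio simply changes sign when the groups are swapped.

```python
import math


def mean_ratio(numer, denom):
    """Arithmetic mean of the pairwise ratios numer_i / denom_i."""
    return sum(x / y for x, y in zip(numer, denom)) / len(numer)


def mean_log_ratio(numer, denom):
    """Mean of log(numer_i / denom_i)."""
    return sum(math.log(x / y) for x, y in zip(numer, denom)) / len(numer)


# Hypothetical paired measurements for two groups.
group_a = [8.0, 1.0, 1.0]
group_b = [1.0, 2.0, 2.0]

ratio_ab = mean_ratio(group_a, group_b)    # A in the numerator
ratio_ba = mean_ratio(group_b, group_a)    # B in the numerator
log_ab = mean_log_ratio(group_a, group_b)
log_ba = mean_log_ratio(group_b, group_a)
```

With these values both `ratio_ab` and `ratio_ba` are greater than 1, so the mean of ratios suggests that whichever group is in the numerator is the larger one, whereas `log_ab` and `log_ba` are exact negatives of each other.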

In the standard linear regression model with independent, homoscedastic errors, the Gauss-Markov theorem asserts that $\hat{\beta} = (X'X)^{-1}(X'y)$ is the best linear unbiased estimator of $\beta$ and, furthermore, that $c'\hat{\beta}$ is the best linear unbiased estimator of $c'\beta$ for all $p \times 1$ vectors $c$. In the corresponding random regressor model, X is a random sample of size n from a...

Four pairwise multiple comparison procedures for achieving approximate familywise Type I error control were investigated when multisample sphericity was violated. The test statistic in all cases was the ratio of the corresponding sample mean difference divided by an estimate of its variance. Bonferroni, Studentized range, and Studentized maximum mo...

In a factorial design with two or more factors, there is nonzero interaction when the differences among the levels of one factor vary with levels of other factors. The interaction is disordinal or qualitative with respect to a specific factor, say A, if the difference between at least two levels of A is positive for some and negative for some levels...

R. Rosenthal and D. B. Rubin (see record 1990-00197-001) proposed a measure of effect size for multiple-choice data that adjusts for differences in percentage correct due to different numbers of response alternatives. The adjustment is based on a logit model. Under an alternative simple guessing model, the appropriate adjustment can differ conside...

By the inverted distribution of a random variable X we mean the distribution of its reciprocal 1/X. Every distribution G that assigns probability 0 to the value 0 is the inverted distribution of some random variable, namely of X = 1/Y, where Y is distributed according to G. Thus it is impossible to assert any special properties of inverted distribu...

Suppose that n hypotheses H1, H2, ..., Hn with associated test statistics T1, T2, ..., Tn are to be tested by a procedure with experimentwise significance level (the probability of rejecting one or more true hypotheses) smaller than or equal to some specified value α. A commonly used procedure satisfying this condition is the Bonferroni (B) procedure...
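
A minimal sketch of the Bonferroni procedure in its usual textbook form follows, assuming each test statistic has already been converted to a p-value (the abstract itself is phrased in terms of the statistics T1, ..., Tn). The function name and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / n, where n is the number of hypotheses.

    This keeps the probability of rejecting one or more true hypotheses
    at or below alpha, whatever the dependence among the tests.
    """
    n = len(p_values)
    threshold = alpha / n
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four hypotheses; the per-test threshold is 0.0125.
decisions = bonferroni_reject([0.001, 0.02, 0.04, 0.3], alpha=0.05)
```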

This article deals with layouts involving one random factor and at least one fixed factor crossed with the random factor, where the sum of the observations on each level of the random factor must equal a specified constant. The constant total for each random level can then be thought of as allocated among the levels of other factors. An allocation...

This article deals with multiple comparison methods for a set of populations which can be described in terms of testing hypotheses of homogeneity of subsets. Criteria for evaluating procedures are defined in terms of the set of outcomes of all tests, termed the pattern of decisions. In particular complexity is introduced as a new criterion based on...

This paper is concerned with jointly testing the hypotheses $\theta_i = \theta_{i0}, i = 1, \cdots, s$, using tests based on independent statistics $T_i$ with distributions $P(T_i \leq t) = F_i(t, \theta_i)$ nonincreasing in $\theta_i$. Holm proposed a sequentially rejective test procedure, applicable to this problem, for which, for fixed $\alpha (...
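
For orientation, Holm's sequentially rejective procedure in its standard p-value form can be sketched as below. This is the baseline procedure the abstract refers to, not any result of the paper, and the function name and example values are assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure on s p-values.

    The smallest p-value is compared with alpha / s, the next smallest
    with alpha / (s - 1), and so on; testing stops at the first
    p-value that exceeds its threshold.
    """
    s = len(p_values)
    order = sorted(range(s), key=lambda i: p_values[i])
    reject = [False] * s
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (s - rank):
            reject[i] = True
        else:
            break  # all hypotheses with larger p-values are retained
    return reject


# Hypothetical p-values, deliberately unsorted.
holm_decisions = holm_reject([0.3, 0.015, 0.001, 0.04], alpha=0.05)
```

Here the thresholds are 0.0125, 0.0167, 0.025 and 0.05 in order, so the hypotheses with p-values 0.001 and 0.015 are rejected and the procedure stops at 0.04.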

If used only when a preliminary F test yields significance, the usual multiple range procedures can be modified to increase the probability of detecting differences without changing the control of Type I error. The modification consists of a reduction in the critical value when comparing the largest and smallest means. Equivalence of modified and u...

The framework for multistage comparison procedures in the present paper is roughly that introduced by Duncan and treated more fully by Tukey. In the present paper we consider the problem of finding the optimum allocation of nominal significance levels for successive stages. The optimum procedure we obtain when the number $s$ of treatments is odd, a...

A result of Tukey (1953), Spjøtvoll (1971), and Einot and Gabriel (1975) concerning multiple-range tests is stated sometimes with and sometimes without an implied condition of monotonicity of the defining critical values. It is shown that this condition is not only sufficient but also necessary for the validity of this and a related result.

Dunnett's procedure for multiple comparisons yields a set of simultaneous confidence intervals for pairwise comparison of the means of each of k - 1 conditions with the mean of a kth condition. As in the case of the Tukey procedure for all pairwise comparisons, the Dunnett procedure can be extended to yield simultaneous confidence intervals for all...

Appropriate reorganization of variables in some analysis of variance designs may make the obtained results more easily interpretable and may also expand the range of experimental designs that can easily be analyzed by standard procedures. A rule is given for determining equivalences of effects in terms of original and reorganized variables, and an...

Intersubject agreement on names (uncertainty) for pictures indexes codability of visual reality in a language community. The time it takes to access permanent memory and retrieve name-words for visual objects was measured by picture naming reaction time (RT). RT is influenced by four fundamental variables: the uncertainty-codability of the display,...

A measure of the magnitude of the effect in a one-factor multivariate analysis of variance design is considered. Cooley and Lohnes have proposed the use of the quantity (1 − |W|/|T|) as a multivariate extension of the correlation ratio, where |W| is the determinant of the within-groups cross-products matrix and |T| is the determinant of the total...

The commonly used multiple-comparisons methods for testing pairwise contrasts among means in analysis of variance can be divided into those which apply a fixed standard to all contrasts regardless of the outcomes of tests on other contrasts (Scheffé, Tukey A) and those which use a stepwise procedure (multiple-range tests) wherein the standard is su...

When the hypothesis H0: θ = θ0 is tested against the alternative Ha: θ ≠ θ0, θ a real-valued parameter, it would often be more appropriate to make simultaneous tests of the two hypotheses 1H0: θ ≤ θ0 (with alternative 1Ha: θ > θ0) and 2H0: θ ≥ θ0 (with alternative 2Ha: θ < θ0). The criterion of unbiasedness in choosing an optimal test of H0 is inap...

Exact methods for the partitioning of Pearson's chi-square are criticized on the grounds that they lead to tests of a set of hypotheses in which, in general, each test is appropriate only on the assumption that other hypotheses in the set are true. The specific case of a 4 × 2 contingency table is examined in detail, and an alternative partitioning...

Describes the log-linear model as a framework for analyzing effects in multidimensional contingency tables, i.e., tables of frequencies formed by 2 or more variables of classification. Variables are considered to have nominal categories. A general purpose analysis is proposed for such tables, similar to analysis of variance. 2 test procedures are c...

The log-linear model for contingency tables expresses the logarithm of a cell frequency as an additive function of main effects, interactions, etc., in a way formally identical with an analysis of variance model. Exact statistical tests are developed to test hypotheses that specific effects or sets of effects are zero, yielding procedures for explo...

Presents a reformulation of the rationale for the t test of the difference between 2 means, similar to but distinct from that proposed by H. Kaiser. Basically, the traditional 2-sided formulation, with the null hypothesis MU1 = MU2 vs. the alternative hypothesis MU1 ≠ MU2, is replaced by a simultaneous test of 2 1-sided hypotheses: MU1 ≤ MU2 vs. MU...

SYNOPSIS. Autoradiographic studies were done which tested the effect of a potent DNA inhibitor, mitomycin C (MC) on the utilization of tritium from exogenous thymidine-methyl-H3 (TMH3) in Entamoeba histolytica grown with Bacteroides sp. in CLG medium. Concentrations of MC (0.0002%) which inhibited growth of amebae by ca. 50%, caused an overall depr...

Incorporation of tritium from thymidine-methyl-H3 (TMH3) in the Laredo strain of Entamoeba histolytica grown at room temperature in the CLG medium, as in the K9 strain grown at 37 C, occurs in both nucleus and cytoplasm. Most extensive cytoplasmic activity is detected during periods of most pronounced growth. Nuclear activity is generally evenly di...

Ss (50 college students) were asked to try to classify correctly into 1 of 2 groups each of a series of cards containing geometric figures, with the "correct" classification (conceptualized as reinforcement of the corresponding response) provided following each trial. Actually, the 2 reinforcements occurred randomly with equal probabilities. Applyi...

The existence of correlated response classes in verbal conditioning experiments means that the response class actually being conditioned cannot, in general, be uniquely determined. Methods of analyzing the data from such experiments in order to reduce the indeterminacy are proposed.

Fisher reported obtaining a positive relationship in males between overestimation of self height and both power aspirations and commitment to the idea of male superiority. Possible explanations for this result are considered. A methodological problem involved in this type of research is discussed, and some issues are mentioned on which further rese...

An R-type factor analysis of Hall and Lindzey's ratings of seventeen personality theories on eighteen variables was conducted. Both oblique and orthogonal rotation yielded similar structures. Five factors emerged, and they seemed potentially meaningful as a way to order and understand contemporary personality theories. A detailed comparison was mad...

A model developed by Burke and Estes generated predictions for a 2-stimulus (T1, T2), 2-response (A1, A2) successive discrimination learning problem, which consisted of predicting which of 2 reinforcing events, E1 or E2, would occur on each of the 324 trials. The probability of E1 was 1.00 on T1 trials and .50 on T2 trial...

Ss were given 2 successive probabilistic discrimination problems. Their performance on a 3rd problem was predicted on the assumption that it would be affected in a specified way by mediating responses, resulting from the training on the 2 initial problems. The precise quantitative predictions were only partially confirmed although there was a signi...

Does the nonparametric version of the N-K procedure control the FDR?
We have partial results at present.