Journal of Educational Statistics

Print ISSN: 0362-9791
This book is about a computer program, TETRAD. The program is designed to assist in the search for causal explanations of statistical data. There are often many millions of possible alternative causal explanations for nonexperimental data; human researchers can only consider and evaluate a handful of these alternatives and hope that the best explanation they find is the best explanation there is. That hope is often in vain. The TETRAD program uses artificial intelligence techniques to help the investigator to perform a systematic search for alternative causal models using whatever relevant knowledge may be available. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Categorical data consist of observations counted and subclassified into one among a set of disjoint, exhaustive groups. The groups may be defined on the basis of a single or a combination of qualitative characteristics and/or quantitative measurements. When such categorizations are to be modeled as probabilistic “responses” or “outcomes”, statistical analysis must be based upon discrete distributions and requires extensive generalization of the classical notions of regression and analysis of variance commonly used for continuous observations. This overview article reviews statistical inference for categorical data, from the basic issues in analyses of 2 × 2 contingency tables through generalized linear mixed models, with emphasis on approaches that have increasingly unified our concepts of categorical and continuous data modeling.
Commends G. W. Joe and J. L. Mendoza (see record 1990-13767-001) on their suggestions for applications for the internal correlation coefficient (ICC). Even without knowledge of its distribution, the ICC can serve as a useful index number, especially as an upper bound on correlation, although additional theoretical and simulation results are needed.
Addresses the problem of grasping the distribution of characteristic roots of a correlation matrix encountered by G. W. Joe and J. L. Mendoza (see record 1990-13767-001) in their work on the internal correlation. Correlating the 2 linear composites that produce the largest internal correlation with the original variables is recommended for future exploration with real-life data.
Comment agrees with Freedman's criticism of the path analysis model for studying education and social stratification. The model makes assumptions about how data are generated that are generally false. (RB)
Includes appendix, bibliographical references, index, and answers (pp. 593-619).
Milkman (1978) accuses Arthur Jensen of misapplying heritability data in speculating on the causes of racial differences in intelligence test scores. He then offers a method for illuminating Jensen's alleged "error". It is contended in this article that Milkman (1978) has misconstrued Jensen's (1973) argument and that, as a consequence, his method is without point.
This article considers surveys where one can observe, after the sample is selected, that each member of the sample belongs to one or more aggregations. The population of aggregations is of interest, and we consider the probability that a given aggregation contains at least one sample member. An expression for the probability is derived as a function of population parameters, many of which can be known only if additional, costly data collection is undertaken. A variety of model-based estimators of those parameters is discussed, and their relative advantages and disadvantages are noted. The estimators are applied to a particular case, the National Educational Longitudinal Study of 1988 (NELS:88), in which the aggregations are tenth-grade schools. Evaluations of the estimators are presented. In NELS:88, a sample of eighth-grade schools and eighth-grade students in those schools was surveyed and 2 years later, when most of the students were in tenth grade, the students were resurveyed. If the tenth-grade school sample (i.e., the set of tenth-grade schools enrolling one or more sampled students) is a probability sample, then we can make inferences to the population of tenth-grade schools. A requirement for a sample to be a probability sample is that the selection probabilities be nonzero for all units and be known for the selected units. This article discusses how those probabilities can be estimated from complete and incomplete data. Various estimators based on incomplete data are derived and empirically evaluated with data from a special test sample of schools and data from the Houston Independent School District.
Researchers concerned with the degree of similarity in pairs of scores sharing a common mean and variance can express similarity in terms of the absolute differences between the scores. This article discusses the distribution of absolute differences between pairs of normally distributed scores having a common mean and variance and illustrates procedures for calculating moments and areas within this distribution.
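A minimal simulation sketch of that distribution (a standard result, not code from the article): if X and Y are independent N(mu, sigma^2) scores, X - Y is N(0, 2 sigma^2), so the absolute difference D = |X - Y| is half-normal with mean 2 sigma / sqrt(pi). The parameter values below are illustrative.

```python
import math
import random

# Illustrative parameters (not from the article): a common mean and variance.
mu, sigma, n = 50.0, 10.0, 200_000

rng = random.Random(0)
# Simulate absolute differences between pairs of i.i.d. normal scores.
diffs = [abs(rng.gauss(mu, sigma) - rng.gauss(mu, sigma)) for _ in range(n)]

sim_mean = sum(diffs) / n
# Half-normal mean of |X - Y|: 2 * sigma / sqrt(pi).
theory_mean = 2 * sigma / math.sqrt(math.pi)
print(round(sim_mean, 2), round(theory_mean, 2))
```

The simulated mean of the absolute differences should agree closely with the closed-form half-normal mean, mirroring the moments the article derives analytically.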
The observed probability p is the social scientist's primary tool for evaluating the outcomes of statistical hypothesis tests. Functions of ps are used in tests of "combined significance," meta-analytic summaries based on sample probability values. This study examines the nonnull asymptotic distributions of several functions of one-tailed sample probability values (from t tests). Normal approximations were based on the asymptotic distributions of z(p), the standard normal deviate associated with the one-sided p value; of ln(p), the natural logarithm of the probability value; and of several modifications of ln(p). Two additional approximations, based on variance-stabilizing transformations of ln(p) and z(p), were derived. Approximate cumulative distribution functions (cdfs) were compared to the computed exact cdf of the p associated with the one-sample t test. Approximations to the distribution of z(p) appeared quite accurate even for very small samples, while other approximations were inaccurate unless sample sizes or effect sizes were very large. Approximations based on variance-stabilizing transformations were not much more accurate than those based on ln(p) and z(p). Generalizations of the results are discussed, and implications for use of the approximations conclude the article.
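As a concrete illustration (a sketch, not the authors' code), the two basic transformations named in the abstract can be computed with the standard library:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def z_of_p(p):
    # z(p): the standard normal deviate of a one-sided p value, Phi^{-1}(1 - p).
    return std_normal.inv_cdf(1.0 - p)

def ln_of_p(p):
    # ln(p): the natural logarithm of the p value.
    return math.log(p)

for p in (0.05, 0.01, 0.001):
    print(p, round(z_of_p(p), 3), round(ln_of_p(p), 3))
# z(0.05) ≈ 1.645, z(0.01) ≈ 2.326, z(0.001) ≈ 3.090
```

Meta-analytic "combined significance" tests then operate on sums or averages of such transformed p values across studies; the article's question is how accurately normal approximations track their nonnull distributions.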
Browne's definitive but complex formulas for the cross-validational accuracy of an OLS-estimated regression equation in the random-effects sampling model are here reworked to achieve greater perspicuity and extended to include the fixed-effects sampling model.
Large-scale field evaluations of education programs typically present complex and competing design requirements that can rarely be satisfied by ideal, textbook solutions. This paper uses a recently completed national evaluation of the federally funded Emergency School Aid Act (ESAA) Program to illustrate in concrete fashion some of the problems often encountered in major program evaluations, and traces the evolution of efforts in that three-year longitudinal study--both in the original design conceptualization and in the actual implementation and data analysis phases--to resolve competing demands and to provide as much methodological rigor as possible under field conditions. Issues discussed here include the selection of experimental versus quasi-experimental designs; the development of sampling procedures to provide maximum precision in treatment-control comparisons; the selection of achievement tests and difficulties in developing and administering other, noncognitive outcome measures; and the importance of ascertaining whether the underlying assumptions of a true experimental design have been met before conclusions about program impact are drawn on the basis of treatment-control comparisons.
A new estimator of the actual error rate for the two-group discriminant problem is presented. The new estimator is based on the concept of a shrunken generalized distance, and it compares favorably with two modified distance estimators and an estimator developed by Okamoto.
A time-saving and space-saving algorithm is presented for computing the sums of squares and estimated cell means under the additive model in a two-way analysis of variance or covariance with unequal numbers of observations in the cells. The algorithm uses matrices of order no larger than min {r, c}, where r = number of rows and c = number of columns. A Fortran program is available; the key computational device is a special subroutine, LS2WAY, whose FORTRAN code appears in Rubin, Stroud & Thayer (1978). The procedure is illustrated using high school and college numerical grade averages for 85 feeder high schools over a period of 6 years.
Simulated data based on five test score distributions indicate that a slight modification of the asymptotic normal theory for the estimation of the p and kappa indices in mastery testing will provide results which are in close agreement with those based on small samples from the beta-binomial distribution. The modification is achieved by multiplying the asymptotic standard errors of estimate by the constant 1 + 1/m^(3/4), where m is the sample size.
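The correction is a one-liner; a sketch follows (the function name and example values are mine, not the article's):

```python
# Small-sample correction described above: inflate an asymptotic standard
# error by the factor 1 + 1/m^(3/4), where m is the sample size.
def corrected_se(asymptotic_se, m):
    return asymptotic_se * (1.0 + m ** -0.75)

# The correction fades as m grows: a 12.5% inflation at m = 16,
# about 1.6% at m = 256.
print(corrected_se(1.0, 16))   # 1.125
print(corrected_se(1.0, 256))  # 1.015625
```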
This paper considers the situation (as in college admissions) where one is given two attributes, X and Y, which one uses to predict a third attribute, Z, by some function Ẑ of X and Y. However, one only retains values of X, Y, and Z for which Ẑ is large. A thorough discussion, under fairly general conditions on the distributions, is given of how the correlation coefficients of X, Y, and Z are affected by this restriction of the range of values. In the case of the normal distribution, where linear prediction is optimal, the role of suppressor variables is discussed.
A method is proposed for modeling developmental data, in general, and, in particular, the developmental relationship between the age of a child and the presence or absence of aggressive (pushing) behavior using a logistic quantal response model. The proposed model was applied to cross-sectional data on 180 first-, second-, and third-grade children from a study by Hapkiewicz (1973). The results tended to support the hypothesized sex differences, and the logistic quantal response model was found to fit the data adequately. An interpretation of the parameter estimates provides interesting conclusions concerning the developmental patterns of aggression for both sexes.
Two approaches are described for the simultaneous determination of passing scores for subtests when the passing score for the total test is known. The minimax approach seeks subtest passing scores such that there is maximum agreement between pass-fail classifications based on each subtest and pass-fail decisions based on the entire test. The Rasch procedure seeks subtest passing scores that occupy approximately the same place on the ability scale as the total test passing score. In the context of a basic skills assessment program, the two approaches yield essentially the same set of passing scores.
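The minimax idea can be sketched as a brute-force search over candidate subtest cut scores (the scores below are hypothetical, and the article's actual procedure may differ in details such as tie-breaking):

```python
# Hypothetical sketch: choose the subtest passing score whose pass/fail
# calls agree most often with the pass/fail decision from the total test.
def best_subtest_cut(subtest_scores, total_scores, total_cut):
    total_pass = [t >= total_cut for t in total_scores]
    best_cut, best_agree = None, -1
    for cut in sorted(set(subtest_scores)):
        # Count examinees classified the same way by subtest and total test.
        agree = sum((s >= cut) == tp
                    for s, tp in zip(subtest_scores, total_pass))
        if agree > best_agree:
            best_cut, best_agree = cut, agree
    return best_cut

# Illustrative subtest and total-test scores for eight examinees.
sub = [3, 5, 7, 8, 2, 9, 6, 4]
tot = [30, 55, 70, 80, 25, 90, 60, 45]
print(best_subtest_cut(sub, tot, 50))  # 5
```

With a total-test passing score of 50, a subtest cut of 5 reproduces every pass-fail decision in this toy data set.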
The authors discuss standardization procedures in discriminant analysis and compare three leading statistical software packages in terms of their calculations of unstandardized and standardized discriminant coefficients. They describe estimation procedures in each and point out inconsistencies that could lead to the misinterpretation of data. They present arguments that favor using within-group rather than total-variance estimates to calculate standardized discriminant coefficients.
Consider k normal distributions having means μ1, ..., μk and variances σ1², ..., σk². Let μ[1] ≤ ... ≤ μ[k] be the means written in ascending order. Dudewicz and Dalal proposed a two-stage procedure for selecting the population having the largest mean μ[k], where the variances are assumed to be unknown and "unequal." This paper considers an approximate but conservative solution for situations where unequal sample sizes are used in the first stage. The paper also considers how to estimate the actual probability of selecting the "best" treatment, that is, the one having mean μ[k], after a heteroscedastic ANOVA has been performed.
This article discusses mastery classification involving the use of latent class and quasi-independence models. Extensions of mastery classification techniques developed by Macready and Dayton are presented. These extensions provide decision rules for assigning individuals to latent classes in complex models involving more than two latent categories. Procedures for identifying the minimally acceptable proportion of misclassified individuals in complex latent class models are also detailed.
Models are constructed to compare trainee and expert assessment strategies. These models detect trainee differences in assessment in terms of location and scale, and detect the presence of bias towards particular groups of a population. Although the assessment may be evaluated by means of hypothesis tests, emphasis is placed upon the use of a criterion (Mallows, 1973). The use of these procedures in assessment training programs is demonstrated in an application.
The use of analysis of covariance (ANCOVA) is examined for attribute-by-treatment interaction (ATI) research, where participants are randomly assigned to treatments but not to levels of the attribute factor. A survey of the ATI literature revealed that ANCOVA was typically not used and was sometimes misunderstood. The current paper demonstrates that, contrary to what some have thought, the use of a covariate does not confound the interpretation of the interaction between attribute and treatment, because the expected value of the interaction effect is independent of the particular covariate chosen. Independence occurs regardless of (1) whether the covariate is measured with perfect or less than perfect reliability, (2) whether the attribute variable is discrete or continuous, and (3) whether sample sizes are equal or proportional. The primary implication of these results is that ANCOVA provides a needed means for researchers to achieve powerful designs for assessing ATIs.
Contents: Introduction to Statistics. Probability. Random Variables, Distributions, and Estimation. Binomial and Normal Distributions. Hypothesis Testing. Student's t, Chi-Square, and F Distributions. Comparing Two Means. One-Way ANOVA. Multiple Comparisons. Two-Way ANOVA. Repeated Measures and Randomized Blocks Designs. Selection Techniques. Correlation and Regression. Categorical Data. Nonparametric Procedures. Appendices: Tables. Elementary Matrix Algebra.
A validity study was conducted to examine the degree to which GMAT scores and undergraduate grade-point average (UGPA) could predict first-year average (FYA) and final grade-point average in doctoral programs in business. A variety of empirical Bayes regression models, some of which took into account possible differences in regressions across schools and cohorts, were investigated for this purpose. Indexes of model fit showed that the most parsimonious model, which did not allow for school or cohort effects, was just as useful for prediction as the more complex models. The three preadmissions measures were found to be associated with graduate school grades, though to a lesser degree than in MBA programs. The prediction achieved using UGPA alone as a predictor tended to be more accurate than that obtained using GMAT verbal (GMATV) and GMAT quantitative (GMATQ) scores together. Including all three predictors was more effective than using only UGPA. The most likely explanation for the lower levels of prediction than in MBA programs is that doctoral programs tend to be more selective. Within-school means on GMATV, GMATQ, UGPA, and FYA were higher than those found in MBA validity studies; within-school standard deviations on FYA tended to be smaller. Among these very select, academically competent doctoral students, highly accurate prediction of grades may not be possible.
Testing the hypothesis H that k > 1 binomial parameters are equal and jointly estimating these parameters are related problems. A Bayesian argument can simultaneously answer these inference questions: to test the hypothesis H, the posterior probability λ = λ(H|x) of H given the experimental data x can be used; to estimate each binomial parameter, their Bayesian estimates under H and the alternative hypothesis H̄ are combined with weights λ and 1 - λ, respectively.
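A hedged sketch of this scheme under conjugate Beta(1, 1) priors and prior odds of 1 for H (both my assumptions, not the article's): the Bayes factor compares the marginal likelihood of the data under a common binomial parameter with that under separate parameters, and the resulting posterior probability λ weights the pooled and separate posterior-mean estimates.

```python
import math

def log_beta(a, b):
    # log of the Beta function, via log-gamma for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def posterior_prob_equal(xs, ns, a=1.0, b=1.0):
    # Marginal likelihoods under H (one common p) and H-bar (separate p's);
    # the binomial coefficients cancel in the Bayes factor.
    log_m_h = log_beta(a + sum(xs), b + sum(ns) - sum(xs)) - log_beta(a, b)
    log_m_hbar = sum(log_beta(a + x, b + n - x) - log_beta(a, b)
                     for x, n in zip(xs, ns))
    bf = math.exp(log_m_h - log_m_hbar)   # Bayes factor in favor of H
    return bf / (1.0 + bf)                # posterior lambda with prior odds 1

def combined_estimates(xs, ns, a=1.0, b=1.0):
    # Weight the pooled estimate (under H) and each separate estimate
    # (under H-bar) by lambda and 1 - lambda, as the abstract describes.
    lam = posterior_prob_equal(xs, ns, a, b)
    pooled = (a + sum(xs)) / (a + b + sum(ns))
    return [lam * pooled + (1 - lam) * (a + x) / (a + b + n)
            for x, n in zip(xs, ns)]

# Illustrative data: 12/40 and 15/40 successes in two groups.
lam = posterior_prob_equal([12, 15], [40, 40])
print(round(lam, 3), [round(e, 3) for e in combined_estimates([12, 15], [40, 40])])
```

Each combined estimate is shrunk from its group-specific posterior mean toward the pooled value, with more shrinkage the more plausible H is.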
The comparison of two regression lines is often meaningful or of interest over a finite interval I of the independent variable. When the prior distribution of the parameters is a natural conjugate, the posterior distribution of the distances between two regression lines at the end points of I is bivariate t. The posterior probability that one regression line lies above the other uniformly over I is numerically evaluated using this distribution.
Top-cited authors
Larry V Hedges
  • Northwestern University
John C. Loehlin
  • University of Texas at Austin
Annick Leroy
Peter Rousseeuw
Gerald Arnold
  • American Board of Internal Medicine