Journal of Modern Applied Statistical Methods (JMASM)

Online ISSN: 1538-9472
Constrained Bayes methodology represents an alternative to the posterior mean (empirical Bayes) method commonly used to produce random effect predictions under mixed linear models. The general constrained Bayes methodology of Ghosh (1992) is compared to a direct implementation of constraints, and it is suggested that the former approach could feasibly be incorporated into commercial mixed model software. Simulation studies and a real-data example illustrate the main points and support the conclusions.
A general simulation procedure is described to validate model-fitting algorithms for the complex likelihood functions used in periodic cancer screening trials. Although screening programs have existed for a few decades, many problems remain unsolved, such as how age or hormones affect the screening sensitivity, the sojourn time in the preclinical state, and the transition probability from the disease-free state to the preclinical state. Simulations are needed to check the reliability and validity of the likelihood function combined with the associated effect functions. One bottleneck in the simulation procedure is the very time-consuming computation of the maximum likelihood estimates (MLEs) from the generated data. A practical procedure is presented, along with results for the case in which both the sensitivity and the transition probability into the preclinical state are age-dependent. The procedure is also suitable for other applications.
Figure: power pattern of HLM under AR(1), UN, and TRM G-matrix structures for the treatment, time, and interaction tests (n = 75, t = 3, λ = .5; 5000 MC samples at α = .10).
The power properties of traditional repeated measures and hierarchical linear models in balanced longitudinal designs have not been clearly determined in the current literature. A Monte Carlo power analysis of traditional repeated measures and hierarchical multivariate linear models is presented under three variance-covariance structures. Results suggest that traditional repeated measures models have higher power than hierarchical linear models for main effects but lower power for interaction effects. Significant power differences are also exhibited when power is compared across the different covariance structures. The results also provide more comprehensive empirical indexes for estimating model precision via bootstrap estimates and the approximate power for both main-effect and interaction tests under standard model assumptions.
Missing data are a common problem in educational research. A promising technique, which can be implemented in SAS PROC MIXED and is therefore widely available, is to use maximum likelihood to estimate model parameters and to base hypothesis tests on these estimates. However, it is not clear which test statistic in PROC MIXED performs better with missing data. The performance of the Hotelling-Lawley-McKeon and Kenward-Roger omnibus test statistics on the means for a single-factor within-subjects ANOVA is compared. The results indicate that the Kenward-Roger statistic performed better in terms of keeping the Type I error rate close to the nominal alpha level.
In regression analysis of count data, independent variables are often modeled by their linear effects under the assumption of log-linearity. In reality, the validity of such an assumption is rarely tested, and its use is at times unjustifiable. A lack-of-fit test is proposed for the adequacy of a postulated functional form of an independent variable within the framework of semiparametric Poisson regression models based on penalized splines. It offers added flexibility in accommodating the potentially non-loglinear effect of the independent variable. A likelihood ratio test is constructed for the adequacy of the postulated parametric form, for example log-linearity, of the independent variable effect. Simulations indicate that the proposed model performs well, while a misspecified parametric model has much-reduced power. An example is given.
Testing for interactions in multivariate experiments is an important task. Studies indicate that much of the data from social science research are not normally distributed, violating that assumption of the ANOVA procedure. The aligned rank transformation test (ART), aligning with the means of columns and rows, has been found, in limited situations, to be robust with respect to Type I error rates and to have greater power than the ANOVA. This study explored a variety of alignments, including the median, Winsorized trimmed means (10% and 20%), the Huber 1.28 M-estimator, and the Harrell-Davis estimator of the median. Results are reported for Type I errors and power.
Whether one should use null hypothesis testing, confidence intervals, and/or effect sizes is a source of continuing controversy in educational research. An alternative to testing for statistical significance, known as equivalence testing, is little used in educational research. Equivalence testing is useful when the researcher wishes to show that two means are practically equivalent, not merely that their difference fails to reach statistical significance. A common equivalence test for comparing the means of two independent samples is reviewed. A simulation study assessed the relationships among effect size, sample size, statistical significance, and statistical equivalence. An example of typical educational research data is reanalyzed using equivalence methodology. A tentative conclusion is drawn about the magnitude of effect size needed to be considered important. (Contains 8 tables, 7 figures, and 42 references.)
Thesis (Ph. D.)--University of Northern Colorado, 2006. Includes bibliographical references (leaves 111-116)
Researchers are frequently chided for choosing the .05 alpha level as the determiner of statistical significance (or non-significance). A partial justification is provided.
Trivials are effect sizes associated with statistically non-significant results. Trivials are like Tribbles in the Star Trek television show. They are cute and loveable. They proliferate without limit. They probably growl at Bayesians. But they are troublesome. This brief report discusses the trouble with trivials.
Table: Monte Carlo parameters of the GARCH effects. Figures: power-size plots of the Granger causality test for VAR(1) with 50, 100, and 1000 observations.
Using Monte Carlo methods, the properties of the Granger causality test in stable VAR models are studied in the presence of different magnitudes of GARCH effects in the error terms. The analysis reveals that substantial GARCH effects influence the size properties of the Granger causality test, especially in small samples. The power functions of the test are usually slightly lower when GARCH effects are imposed on the residuals than in the case of white-noise residuals.
A Visual Basic program that generates all permutations of {1, 2, ..., n} is presented. The procedure for running the program as an Excel macro is described. An application is presented which involves selecting permutations which meet a specific constraint.
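The same generate-and-filter logic can be sketched in Python (the original is a Visual Basic/Excel macro; the derangement constraint below is a hypothetical example of "a specific constraint"):

```python
from itertools import permutations

def constrained_permutations(n, constraint):
    """All permutations of {1, ..., n} satisfying a caller-supplied constraint."""
    return [p for p in permutations(range(1, n + 1)) if constraint(p)]

# Hypothetical constraint: no element in its own position (derangements)
derangements = constrained_permutations(
    4, lambda p: all(p[i] != i + 1 for i in range(4))
)
```

Passing `lambda p: True` recovers the full set of n! permutations.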
Rangen 2.0 is a Fortran 90 module of subroutines used to generate uniform and nonuniform pseudo-random deviates. It includes uni1, a uniform pseudo-random number generator, and non-uniform generators based on uni1. The subroutines in Rangen 2.0 were written in Essential Lahey Fortran 90, a proper subset of Fortran 90. The module includes both the source code for the subroutines and a short description of each subroutine, its purpose, and its arguments, including data type and usage.
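The layered design (a uniform stream with non-uniform generators built on top) can be sketched in Python. The Park-Miller LCG below is illustrative only and is not the uni1 algorithm, whose details are not given here; the exponential generator uses the standard inverse transform:

```python
import math

class Uni:
    """Illustrative uniform generator (Park-Miller LCG); NOT the uni1
    algorithm used in Rangen 2.0."""
    def __init__(self, seed=12345):
        self.state = seed

    def next(self):
        # Multiplicative congruential step modulo the Mersenne prime 2^31 - 1
        self.state = (16807 * self.state) % 2147483647
        return self.state / 2147483647.0

def exponential(u, rate=1.0):
    """Non-uniform deviate layered on the uniform stream via inverse transform."""
    return -math.log(1.0 - u.next()) / rate

gen = Uni(seed=42)
draws = [exponential(gen, rate=2.0) for _ in range(1000)]  # mean ~ 0.5
```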
The purpose of this study was to compare the statistical power of a variety of exact tests in the 2 × C ordered categorical contingency table using StatXact software. The Wilcoxon Rank Sum, Expected Normal Scores, Savage Scores (or its Log Rank equivalent), and Permutation tests were studied. Results indicated that the procedures were nearly the same in terms of comparative statistical power.
Pesarin and Bonnini respond to Anderson's (2013) Conceptual Distinction between the Critical p Value and the Type I Error Rate in Permutation Testing.
A brief discussion of the history and purpose of Fortran for scientific and engineering computing is given. This leads to the role that Fortran, in its various environments, will likely play well into the 21st century.
Testing a point (sharp) null hypothesis is arguably the most widely used statistical inferential procedure in many fields of scientific research, yet also the most controversial and misapprehended. Since 1935, when Buchanan-Wollaston raised the first criticism against hypothesis testing, this foundational field of statistics has drawn increasingly active and stronger opposition, including draconian suggestions that statistical significance testing should be abandoned or even banned. Statisticians should stop ignoring these accumulated and significant anomalies within the current point-null hypothesis paradigm and rebuild healthy foundations for statistical science. A foundation for a paradigm shift in testing statistical hypotheses is suggested: testing interval null hypotheses based on implications of the zero probability paradox. The paradox states that, in real-world research, a point-null hypothesis about a normal mean has zero probability. This implies that a formulated point-null hypothesis about a mean in the context of the simple normal model is almost surely false. Thus, the zero probability paradox points to the root cause of the so-called large-n problem in significance testing, and it discloses that there is no point in searching for a cure under the current point-null paradigm.
A SAS program (SAS 9.1.3 release, SAS Institute, Cary, N.C.) is presented to implement the Hettmansperger and McKean (1983) linear model aligned rank test (nonparametric ANCOVA) for the single covariate and one-way ANCOVA case. As part of this program, SAS code is also provided to derive the residuals from the regression of Y on X (which is step 1 in the Hettmansperger and McKean procedure) using either ordinary least squares regression (proc reg in SAS) or robust regression with MM estimation (proc robustreg in SAS).
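The two-stage structure can be sketched in Python under stated assumptions: step 1 is the OLS regression of Y on X described in the abstract; the Kruskal-Wallis test in step 2 is only an illustrative stand-in for the Hettmansperger-McKean rank statistic, and the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=60)
group = np.repeat([0, 1, 2], 20)
y = 2.0 + 1.5 * x + rng.normal(size=60)  # simulated data, no group effect

# Step 1 (as in the abstract): residuals from the OLS regression of Y on X
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Step 2 (illustrative stand-in): rank-based comparison of the aligned
# residuals across groups; the actual Hettmansperger-McKean statistic differs
h_stat, p_value = stats.kruskal(*[resid[group == g] for g in (0, 1, 2)])
```

The robust-regression variant in the SAS program would replace the `polyfit` step with an MM-estimated fit.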
From the days when statistical calculations were done on mechanical calculators to today, technology has transformed the discipline of statistics. More than just giving statisticians the power to crunch numbers, it has fundamentally changed the way we teach, do research, and consult. In this article, I give some examples of this from my 35 years as an academic statistician.
The aim of this study is to compare different robust regression methods in three main models of multiple linear regression and weighted multiple linear regression. An algorithm for weighted multiple linear regression, using the standard deviation and variance to combine different robust methods, is given in SAS, along with an application.
Homogeneity of variance (HOV) is a critical assumption of ANOVA whose violation may lead to perturbations in Type I error rates. There is minimal consensus on selecting an appropriate test. A SAS macro is presented that implements 14 different HOV approaches for one-way ANOVA. Examples are given and practical issues are discussed.
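A few of the classical HOV procedures can be sketched in Python with SciPy (illustrative only; the macro's 14 SAS variants and the simulated data below are not from the article):

```python
import numpy as np
from scipy import stats

# Simulated groups (illustrative; the third has twice the spread)
rng = np.random.default_rng(2)
groups = [rng.normal(0, 1, 30), rng.normal(0, 1, 30), rng.normal(0, 2, 30)]

# Three of the classical HOV procedures (the macro covers 14 variants)
hov = {
    "Bartlett": stats.bartlett(*groups).pvalue,
    "Levene (mean-centered)": stats.levene(*groups, center="mean").pvalue,
    "Brown-Forsythe (median)": stats.levene(*groups, center="median").pvalue,
}
```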
This syntax program is intended to provide an application, not otherwise readily available, for SPSS users interested in the Pearson product-moment correlation coefficient (r) and bias-adjustment indices for r, such as the Fisher Approximate Unbiased estimator and the Olkin and Pratt adjustment.
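The Olkin-Pratt adjustment has a known closed form via the Gaussian hypergeometric function; a Python sketch with SciPy standing in for the SPSS syntax (the Fisher Approximate Unbiased estimator is omitted here):

```python
from scipy.special import hyp2f1

def olkin_pratt(r, n):
    """Olkin-Pratt approximately unbiased estimator of the correlation:
    r * 2F1(1/2, 1/2, (n - 2)/2, 1 - r^2)."""
    return r * hyp2f1(0.5, 0.5, (n - 2) / 2.0, 1.0 - r * r)

r_adj = olkin_pratt(0.5, 20)  # slightly larger in magnitude than r
```

The adjustment counteracts the tendency of the sample r to underestimate the magnitude of the population correlation in small samples.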
The main purpose of this study is to review calculation algorithms for some of the most common non-parametric and omnibus tests for normality, and to provide them as a compiled MATLAB function. All tests are coded to provide p-values, and the proposed function returns the results as an output table. Only some common tests are computed by existing packages, so researchers are obliged to use these ready-made tests or code them in some language. Some tests are not covered by any of the packages, and most packages give no guidance on which test to use under which assumptions.
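A comparable table of p-values can be sketched in Python with SciPy rather than MATLAB (the test selection and data are illustrative, not the article's full set):

```python
import numpy as np
from scipy import stats

def normality_table(x):
    """p-values from several common normality tests (SciPy stand-in for
    the compiled MATLAB function; test selection is illustrative)."""
    x = np.asarray(x)
    return {
        "Shapiro-Wilk": stats.shapiro(x).pvalue,
        "D'Agostino-Pearson": stats.normaltest(x).pvalue,
        # Parameters are estimated from the data, so this KS p-value is
        # only approximate (no Lilliefors correction applied)
        "Kolmogorov-Smirnov": stats.kstest(
            x, "norm", args=(x.mean(), x.std(ddof=1))
        ).pvalue,
    }

table = normality_table(np.random.default_rng(1).normal(size=200))
```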
Tables: interaction matrix; alias structure; defining relations; patterns and extended patterns for the design I = a1bcDE = a2cDF = a3bEF.
Interaction graphs have been developed for two-level and three-level fractional factorial designs under different design criteria. A catalogue is presented of all possible non-isomorphic interaction graphs for 4^r 2^(n-p) (r = 1; n = 2, ..., 10; p = 1, ..., 8 and r = 2; n = 1, ..., 7; p = 1, ..., 7) fractional factorial designs, together with non-isomorphic interaction graphs for asymmetric fractional factorial designs under the concept of the combined array.
Reliability data are often generated in the form of successes and failures. An attempt was made to model this type of data with the binomial distribution in the Bayesian paradigm. Both analytic and simulation techniques are used to fit the Bayesian model. The Laplace approximation was implemented to approximate the posterior densities of the model parameters. Parallel simulation tools were implemented with extensive use of R and JAGS, and the R and JAGS code is provided. Real data sets are used for illustration.
Re-sampling based statistical tests are computationally heavy but reliable when only small sample sizes are available. Despite their nice theoretical properties, not much effort has been put into making them efficient. A computationally efficient method is proposed for calculating permutation-based p-values for the Pearson correlation coefficient and the two-independent-samples t-test. The method is general and can be applied to other similar two-sample mean or two-mean-vector cases.
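The quantity being computed can be sketched as a plain Monte Carlo permutation test in Python; the article's contribution is computing it efficiently, which this naive sketch (with made-up data) does not attempt:

```python
import numpy as np

def perm_pvalue_corr(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation
    (plain Monte Carlo version; not the article's fast algorithm)."""
    rng = np.random.default_rng(seed)
    obs = abs(np.corrcoef(x, y)[0, 1])
    hits = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= obs
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one rule keeps p > 0

# Illustrative data with a strong linear relationship
rng = np.random.default_rng(1)
x = np.arange(30.0)
y = x + rng.normal(scale=1.0, size=30)
p = perm_pvalue_corr(x, y)
```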
Fortran 77 and 90 modules (REALPOPS.lib) exist for sampling from the eight distributions estimated by Micceri (1989). These modules were created by Sawilowsky et al. (1990) and Sawilowsky and Fahoome (2003), respectively. The MicceriRD (Micceri's Real Distributions) Python package was created because Python is increasingly used for data analysis and, in some cases, Monte Carlo simulations.
This primer is intended to provide the basic information for sampling without replacement from finite populations.
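Two basic ingredients of such a primer, drawing a simple random sample without replacement and applying the finite population correction, can be sketched in Python (the variance formula below uses the N-divisor population variance):

```python
import random

def srswor(population, n, seed=None):
    """Simple random sample without replacement: every size-n subset
    of the population is equally likely."""
    return random.Random(seed).sample(population, n)

def fpc_var_mean(pop_var, N, n):
    """Variance of the sample mean under SRSWOR:
    (sigma^2 / n) * (N - n) / (N - 1),
    where pop_var is the N-divisor population variance sigma^2."""
    return (pop_var / n) * (N - n) / (N - 1)

sample = srswor(list(range(100)), 10, seed=0)
```

Note that a census (n = N) drives the variance to zero, which is exactly what the finite population correction encodes.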
Rubin (1976, and elsewhere) claimed that there are three kinds of “missingness”: missing completely at random; missing at random; and missing not at random. He gave examples of each. The article that now follows takes an opposing view by arguing that almost all missing data are missing not at random.
Figures and tables: weakly informative priors; caterpillar plot for the Lomax model; lung cancer survival data; summary of the simulated results from rstan, where Mean is the posterior mean, se_mean the Monte Carlo standard error of the mean, sd the posterior standard deviation, LB, Median, and UB the 2.5%, 50%, and 97.5% quantiles, n_eff the effective sample size, and Rhat the convergence diagnostic.
Three distributions (the Lomax, exponential Lomax, and Weibull Lomax) are fitted using Bayesian methods to analyze myeloma patient data in Stan. The models are applied to a single real censored survival data set so that all concepts and computations revolve around the same data. Code was developed and refined to implement the censoring mechanism throughout using rstan. Furthermore, parallel simulation tools are implemented with extensive use of rstan.
Detailed is a 20-year arduous journey to develop a statistically viable two-phase (AB) single-case two independent-samples randomization test procedure. The test is designed to compare the effectiveness of two different interventions that are randomly assigned to cases. In contrast to the unsatisfactory simulation results produced by an earlier proposed randomization test, the present test consistently exhibited acceptable Type I error control under various design and effect-type configurations, while at the same time possessing adequate power to detect moderately sized intervention-difference effects. Selected issues, applications, and a multiple-baseline extension of the two-sample test are discussed.
On occasion, the response to treatment in an AB/BA crossover trial is measured on a binary variable: success or failure, with (+) representing a treatment success and (-) a treatment failure. Traditionally, three tests have been used to compare treatment effects: McNemar's, the Mainland-Gart, and Prescott's. An issue arises when there may be a residual (carryover) effect of a previous treatment affecting the current treatment. There is no general consensus as to which procedure is preferable. If both group and carryover effects are absent, Prescott's test is the best one to use; under a model with residual effects, however, Prescott's test is biased. Therefore, a conservative approach includes testing for residual effects first. When there is no period effect, McNemar's test is optimal; when a period effect is present, McNemar's test is biased.
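McNemar's test depends only on the discordant cell counts of the paired 2x2 table; a hedged Python sketch of the continuity-corrected chi-square form (the counts below are hypothetical):

```python
from scipy.stats import chi2

def mcnemar(b, c, correction=True):
    """McNemar's test from the discordant cell counts b and c
    (continuity-corrected chi-square form, 1 degree of freedom)."""
    num = (abs(b - c) - 1) ** 2 if correction else (b - c) ** 2
    stat = num / (b + c)
    return stat, chi2.sf(stat, df=1)

# Hypothetical discordant counts: 15 pairs favoring one treatment, 5 the other
stat, p = mcnemar(15, 5)
```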
Alphabet letter recognition item responses from 1,299 rising kindergarten children from low-income families were used to determine the dimensionality of letter recognition ability. The rising kindergarteners were enrolled in preschool classrooms implementing a research-based early literacy curriculum. Item responses from the TERA-3 subtests were also analyzed. Results indicated alphabet letter recognition was unitary. The ability of boys and younger children was less than that of girls and older children. Child-level letter recognition was highly associated with TERA-3 measures of letter knowledge and conventions of print. Classroom-level mean letter recognition ability accounted for most of the variance in classroom mean TERA-3 scores.
We propose a least absolute deviation estimation method that produces a least absolute deviation estimator of the parameters of the linear regression model. The method is as accurate as existing methods. Keywords: linear regression model, least absolute deviation (LAD), equation of a line, R statistical programming, algorithm
When choosing smoothing parameters in exponential smoothing, the choice can be made by either minimizing the sum of squared one-step-ahead forecast errors or minimizing the sum of the absolute one-step-ahead forecast errors. In this article, the resulting forecast accuracy is used to compare these two options.
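The comparison can be sketched in Python: grid-search the smoothing constant under each criterion, using simple exponential smoothing with the first observation as the initial level (the series below is made up):

```python
def ses_forecast_errors(series, alpha):
    """One-step-ahead forecast errors for simple exponential smoothing,
    initializing the level at the first observation."""
    level = series[0]
    errors = []
    for obs in series[1:]:
        errors.append(obs - level)          # forecast for this step is `level`
        level = alpha * obs + (1 - alpha) * level
    return errors

def best_alpha(series, criterion, grid=None):
    """Grid-search the alpha minimizing the given error criterion."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda a: criterion(ses_forecast_errors(series, a)))

sse = lambda e: sum(x * x for x in e)       # sum of squared errors
sae = lambda e: sum(abs(x) for x in e)      # sum of absolute errors

series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
alpha_sse = best_alpha(series, sse)
alpha_sae = best_alpha(series, sae)
```

The two criteria need not pick the same alpha, which is the comparison the article pursues.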
An application of Markov chain analysis to student flow at Kuwait University is presented, based on a random sample of 1,100 students from the academic years 1996-1997 to 2004-2005. Results were obtained for each college and in total, which allows for a comparative study. The students' mean lifetimes in the different levels of study in the colleges, as well as the percentage dropping out of the system, are estimated.
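Mean lifetimes in an absorbing Markov chain come from the fundamental matrix N = (I - Q)^(-1); a Python sketch with hypothetical transition probabilities (not the Kuwait University estimates):

```python
import numpy as np

# Hypothetical transition matrix among four transient study levels
# (illustrative only). Each row's missing probability mass goes to the
# absorbing states (graduation or dropout).
Q = np.array([[0.1, 0.8, 0.0, 0.0],
              [0.0, 0.1, 0.8, 0.0],
              [0.0, 0.0, 0.1, 0.8],
              [0.0, 0.0, 0.0, 0.1]])

N = np.linalg.inv(np.eye(4) - Q)   # fundamental matrix
mean_lifetimes = N.sum(axis=1)     # expected periods before absorption
```

Dropout percentages would come from the absorption probabilities N·R, where R holds the transient-to-absorbing transition probabilities.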
Assume that the lifetime of any unit follows a lognormal distribution with parameters μ and σ, and that the relationship between μ and the stress level V is given by the power rule model. Several types of bootstrap intervals for the parameters were studied; their performance was evaluated using simulations and compared in terms of attainment of the nominal confidence level, symmetry of the lower and upper error rates, and expected width. Conclusions and recommendations are given.
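One of the interval types typically included in such comparisons, the percentile bootstrap, can be sketched in Python (generic statistic and simulated data; this is not the article's lognormal power-rule setup):

```python
import numpy as np

def percentile_ci(data, stat, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for a statistic:
    resample with replacement and take the matching quantiles."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boots = [stat(rng.choice(data, size=data.size, replace=True))
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [100 * (1 - level) / 2,
                                   100 * (1 + level) / 2])
    return lo, hi

# Illustrative skewed sample
sample = np.random.default_rng(3).lognormal(mean=1.0, sigma=0.5, size=50)
lo, hi = percentile_ci(sample, np.mean)
```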
A single acceptance sampling plan for the three-parameter Lindley distribution under a truncated life test is developed. For various consumer's confidence levels, acceptance numbers, and values of the ratio of the experiment time to the specified average lifetime, the minimum sample size required to assert a certain average lifetime is calculated. The operating characteristic (OC) function values as well as the associated producer's risks are also provided. A numerical example illustrates the suggested acceptance sampling plans.
A new method is proposed based on the construction of perceptual maps using correspondence analysis and interval algebra, which allows the measurement error expected in panel choices to be specified for an evaluation form using an unstructured 9-point hedonic scale.
Simulating studies with right-censored outcomes as functions of time-varying covariates is discussed. Guidelines on the use of an algorithm developed by Zhou and implemented by Hendry are provided. Through simulation studies, the sensitivity of the method to user inputs is considered.
Tables: statistical power for the ANOVA and Welch tests under varied sample sizes, numbers of outliers (0-5 at sample size 20), and effect sizes (standardized group means of 0.0, 0.3, 0.6 and 0.0, -0.3, -0.6); power of parametric significance tests under different outlier accommodations; Type I error rates of nonparametric tests and the Winsorizing method under varied sample sizes and outlier conditions.
The influence of outliers on the power rates of the ANOVA and Welch tests under various conditions was examined and compared with the effectiveness of nonparametric methods and Winsorizing in minimizing the impact of outliers. Results showed that, considering both power and Type I error, a nonparametric test is the safest choice for controlling the inflation of Type I error with a decent sample size while yielding relatively high power.
Methods proposed to solve the missing data problem in estimation procedures should consider the type of missing data, the missing data mechanism, the sampling design and the availability of auxiliary variables correlated with the process of interest. This article explores the use of geostatistical models with multiple imputation to deal with missing data in environmental surveys. The method is applied to the analysis of data generated from a probability survey to estimate Coho salmon abundance in streams located in western Oregon watersheds.
Participants in epidemiologic studies may not represent statistically independent observations. We consider modifications to conventional analyses of 2×2 tables, including Fisher's exact test and confidence intervals, to account for correlated observations in this setting. An example is provided, assessing the robustness of conclusions from a published analysis.
Table: cross-tabulation of the measure of agreement between rater 1 and rater 2 when the component loading was .80.
This article pertains to the accuracy of the scree plot in determining the correct number of components to retain under different conditions of sample size, component loading, and variable-to-component ratio. The study employs Monte Carlo simulations in which the population parameters were manipulated, data were generated, and the scree plot was then applied to the generated scores.
Top-cited authors
Shlomo Sawilowsky
  • Wayne State University
Bruno D Zumbo
  • University of British Columbia - Vancouver
Carl Lee
  • Central Michigan University
Felix Famoye
  • Central Michigan University
Rand R Wilcox
  • University of Southern California