## About

480

Publications

79,387

Reads

13,521

Citations

## Publications

Publications (480)

Sedentary behavior (SB) is associated with cardiometabolic disease and mortality, but its association with dementia is currently unclear. This study investigates whether SB is associated with incident dementia regardless of engagement in physical activity (PA). A total of 146,651 participants from the UK Biobank who were 60 years or older and did n...

Purpose:
To provide a method for determining the vector that, when added to the preoperative astigmatism, results in no prediction error (PE) and to specify statistical methods for evaluating astigmatism and determining the 95% confidence convex polygon.
Setting:
Baylor College of Medicine, Houston, Texas, and University of Southern California,...

When comparing two independent groups, there is now a substantial literature on characterizing the difference between the groups using a measure of effect size that is based on a measure of location in conjunction with some measure of dispersion. Included is a measure of effect size that allows heteroscedasticity. This paper suggests a robust exten...

This note deals with the goal of computing a confidence interval for P=P(X<Y), where X and Y are independent random variables. Extant results indicate that the methods derived by Cliff, as well as Brunner and Munzel, perform relatively well and that generally there is little separating these two techniques. But like all methods, they break down whe...

This note suggests two simple approaches to determining whether it is reasonable to make a decision about which random variable has the smallest or largest measure of location. Both are related to Tukey’s three-decision rule, they are easily adapted to a wide range of situations, and they have certain advantages over extant techniques. The focus is...

Consider a one-way or two-way ANOVA design. Typically, groups are compared based on some measure of location. The paper suggests alternative methods where measures of location are replaced by a robust measure of effect size that is based in part on a robust measure of dispersion. The measure of effect size used here does not assume that the groups...

Being able to remove or weigh down the influence of outlier data is desirable for any statistical model. While magnetic and electroencephalographic (MEEG) data are often averaged across trials per condition, it is becoming common practice to use information from all trials to build statistical linear models. Individual trials can, however, have con...

When comparing two independent groups, a possible appeal of the quantile shift measure of effect size is that its magnitude takes into account situations where one or both distributions are skewed. Extant results indicate that a percentile bootstrap method performs reasonably well given the goal of making inferences about this measure of effect siz...

Introduction:
Physical activity (PA) is recognized as one of the key lifestyle behaviors that reduces risk of developing dementia late in life. However, PA also leads to increased respiration, and in areas with high levels of air pollution, PA may increase exposure to pollutants linked with higher risk of developing dementia. Here, we investigate...

Chapter 2 introduces the basic mathematical tools for judging the robustness of parameters. It is demonstrated that the population mean is not robust. Robust measures of location and scatter are introduced.

Chapter 7 covers one-way, two-way and three-way designs dealing with independent groups. Included are robust, heteroscedastic measures of effect size. Multiple comparisons are covered as are nested designs. R functions are described that can handle the usual global hypotheses associated with main effects and interactions, all pairwise comparisons as wel...

Chapter 4 deals with inferential methods in the one-sample case. It begins by describing when and why Student's t test can be highly unsatisfactory in terms of power, controlling the probability of a type I error, and achieving accurate probability coverage. Methods for making inferences about trimmed means, M-estimators and related measures of loc...

Chapter 10 summarizes a wide range of robust regression estimators. Their relative merits are discussed. Generally, these estimators deal effectively with regression outliers and leverage points. Some can offer a substantial advantage, in terms of efficiency, when there is heteroscedasticity. Included are robust versions of logistic regression and...

Chapter 9 describes robust measures of association. Several types are available and their relative merits are discussed. Inferential methods based on these measures of association are described as well. Although Pearson's correlation is not robust, a method for testing the hypothesis that Pearson's correlation is equal to zero is included that allo...

Chapter 5 describes robust methods for comparing two distributions. Compared to traditional methods for comparing means, modern robust methods offer substantial gains in power under fairly general conditions. Included are methods for comparing both independent and dependent groups. The chapter begins with methods for comparing multiple quantiles. T...

Chapter 11 describes a variety of inferential methods based on the regression estimators described in Chapter 10. The chapter begins with inferential methods that allow heteroscedasticity when using a linear model. Next, methods for comparing the parameters of independent groups are described followed by two of the better methods for testing the hy...

Chapter 6 is generally concerned with issues related to multivariate data. The chapter begins with basic techniques for measuring the overall dispersion of a data cloud and how to measure the extent a particular point is nested within the cloud. These methods have practical value when testing hypotheses. Next, robust multivariate measures of locati...

Chapter 3 describes how to estimate the measures of location and scatter that were introduced in Chapter 2. Some related estimators are described and their relative merits are summarized. Included is a summary of some methods for estimating distributions, including kernel density estimators, which have practical value for reasons illustrated in sub...

Chapter 8 covers one-way, two-way and three-way designs where one or more factors involve dependent groups. R functions for dealing with global tests as well as multiple comparisons are included. Also covered are multivariate rank-based methods. Measures of effect size are described as well.

Chapter 12 describes various robust approaches to comparing groups in a manner that takes into account covariates. In contrast to classic ANCOVA methods, these techniques allow both types of heteroscedasticity in addition to dealing with outliers. The classic assumption of parallel regression lines is not required. Methods based on smoothers are co...

A recent article in this journal proposed a new method for computing a confidence interval for the population mean. Extensive simulations indicated that if the association between the population mean and the geometric mean is known, accurate confidence intervals can be computed for a wide range of situations even for a small sample size. This brief...

Studying how elite athletes satisfy multiple mechanical objectives when initiating well-practiced, goal-directed tasks provides insights into the control and dynamics of whole-body movements. This study investigated the coordination of multiple body segments and the reaction force (RF) generated during foot contact when regulating forward angular i...

Preface to the 5th edition. There are many new and improved methods in this 5th edition. The R package written for this book provides a crude indication of how much has been added. When the 4th edition was published, it contained a little over 1200 R functions. With this 5th edition, there are now over 1700 R functions. All of the chapters have been...

Let ρj be Pearson's correlation between Y and Xj (j = 1, 2). A problem that has received considerable attention is testing H0: ρ1 = ρ2. A well-known concern, however, is that Pearson's correlation is not robust (e.g., Wilcox, 2005), and the usual estimate of ρj, rj, has a finite sample breakdown point of only 1/n. The goal in this paper is...

Consider a two‐way ANOVA design. Generally, interactions are characterized by the difference between two measures of effect size. Typically the measure of effect size is based on the difference between measures of location, with the difference between means being the most common choice. This paper deals with extending extant results to two robust,...

Being able to remove or weigh down the influence of outlier data is desirable for any statistical model. While magnetic and electroencephalographic (MEEG) data used to be averaged across trials per condition, it is now becoming common practice to use information from all trials to build linear models. Individual trials can, however, have considerable weight...

Perceiving speech in noise (SIN) is important for health and well-being and decreases with age. Musicians show improved speech-in-noise abilities and reduced age-related auditory decline, yet it is unclear whether short-term music engagement has similar effects. In this randomized controlled trial we used a pre-post design to investigate whether a 12-...

The percentile bootstrap is the Swiss Army knife of statistics: It is a nonparametric method based on data-driven simulations. It can be applied to many statistical problems, as a substitute to standard parametric approaches, or in situations for which parametric methods do not exist. In this Tutorial, we cover R code to implement the percentile bo...
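
The tutorial itself covers R code; as a rough illustration only, the idea of "data-driven simulations" can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names `trimmed_mean` and `percentile_bootstrap_ci`, the 20% trimming, the 2000 resamples, and the sample values are all assumptions made for the example.

```python
import random
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of sorted values."""
    ordered = sorted(values)
    g = int(proportion * len(ordered))  # number of points trimmed from each tail
    return statistics.fmean(ordered[g:len(ordered) - g])


def percentile_bootstrap_ci(values, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the data with replacement and recompute the estimator each time.
    boot = sorted(
        estimator([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Take the middle (1 - alpha) portion of the sorted bootstrap estimates.
    low_index = round(alpha / 2 * n_boot)
    return boot[low_index], boot[n_boot - low_index - 1]


# Made-up, skewed sample used only to exercise the functions.
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.8, 9.6]
ci = percentile_bootstrap_ci(sample, trimmed_mean)
print(ci)
```

Any estimator that maps a list of numbers to a single number can be passed in place of `trimmed_mean`, which is what makes the approach a substitute for parametric methods in many settings.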

The paper deals with two issues. The first is testing the hypothesis that two dependent variables have a common variance. Currently, a heteroscedastic analog of the Morgan–Pitman test appears to perform relatively well. The paper demonstrates that when the marginal distributions differ in skewness, this method can be highly unsatisfactory. An expla...

Purpose:
To provide a reference for study design comparing intraocular lens (IOL) power calculation formulas, to show that the standard deviation of the prediction error is the single most accurate measure of outcomes, and to provide the most recent statistical methods to determine p-values for Type 1 errors.
Setting:
Baylor College of Medicine,...

In a regression context, consider p independent variables and a single dependent variable. The paper addresses two goals. The first is to determine the extent it is reasonable to make a decision about whether the largest estimate of the Winsorized correlations corresponds to the independent variable that has the largest population Winsorized correl...

Let p1,…, pJ denote the probability of a success for J independent random variables having a binomial distribution and let p(1) ≤ … ≤ p(J) denote these probabilities written in ascending order. The goal is to make a decision about which group has the largest probability of a success, p(J). Let p̂1,…, p̂J denote estimates of p1,…,pJ, respectively. T...

For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal of computing a confidence interval for p(x) is considered. In the logistic regression model, even a slight departure difficult to detect via a goodness-of-fit test can yield inaccurate results. The accuracy of a confidence interval can deteriorate as the samp...

There is an extensive literature dealing with inferences about the probability of success. A minor goal in this note is to point out when certain recommended methods can be unsatisfactory when the sample size is small. The main goal is to report results on the two-sample case. Extant results suggest using one of four methods. The results indicate w...

To summarise skewed (asymmetric) distributions, such as reaction times, typically the mean or the median are used as measures of central tendency. Using the mean might seem surprising, given that it provides a poor measure of central tendency for skewed distributions, whereas the median provides a better indication of the location of the bulk of th...

Recently, a multiple comparisons procedure was derived with the goal of determining whether it is reasonable to make a decision about which of J independent groups has the largest robust measure of location. This was done by testing hypotheses aimed at comparing the group with the largest estimate to the remaining J − 1 groups. It was demonstrated...

When dealing with the association between some random variable and two covariates, extensive experience with smoothers indicates that often a linear model poorly reflects the nature of the association. A simple approach via quantile grids that reflects the nature of the association is given. The two main goals are to illustrate this approach can ma...

A fundamental way of characterizing how two independent groups compare is in terms of the probability that a randomly sampled observation from the first group is less than a randomly sampled observation from the second group. The paper suggests a bivariate analog and investigates methods for computing confidence intervals. An interaction for a tw...

Secondhand smoke exposure is a major public health risk that is especially harmful to the developing brain, but it is unclear if early exposure affects brain structure during middle age and older adulthood. Here we analyzed brain MRI data from the UK Biobank in a population-based sample of individuals (ages 44-80) who were exposed (n = 2510) or une...

The percentile bootstrap is the Swiss Army Knife of statistics: it is a non-parametric method based on data-driven simulations. It can be applied to many statistical problems, as a substitute to standard parametric approaches, or in situations where parametric methods do not exist. In this tutorial, we cover R code to implement the percentile boots...

This paper introduces the R package WRS2 that implements various robust statistical methods. It elaborates on the basics of robust statistics by introducing robust location, dispersion, and correlation measures. The location and dispersion measures are then used in robust variants of independent and dependent samples t tests and ANOVA, including be...

The bootstrap is a versatile technique that relies on data-driven simulations to make statistical inferences. When combined with robust estimators, the bootstrap can afford much more powerful and flexible inferences than is possible with standard approaches such as t-tests on means. In this R tutorial, we use detailed illustrations of bootstrap sim...

The paper describes a nonparametric analog of Cohen's d, Q. It is established that a confidence interval for Q can be computed via a method for computing a confidence interval for the median of D = X1 − X2, which in turn is related to making inferences about P(X1 < X2).

There is a substantial collection of robust analysis of covariance (ANCOVA) methods that effectively deal with non-normality, unequal population slope parameters, outliers, and heteroscedasticity. Some are based on the usual linear model and others are based on smoothers (nonparametric regression estimators). However, extant results are limited to...

When dealing with a logistic regression model, there is a simple method for estimating the strength of the association between the jth covariate and the dependent variable when all covariates are entered into the model. There is the issue of determining whether the jth independent variable has a stronger or weaker association than the kth independe...

A well‐known concern regarding the usual linear regression model is multicollinearity. As the strength of the association among the independent variables increases, the squared standard error of regression estimators tends to increase, which can seriously impact power. This paper examines heteroscedastic methods for dealing with this issue when tes...

Secondhand smoke exposure is a major public health risk that is especially harmful to the developing brain, but it is unclear if early life smoke exposure affects brain structure during middle age and older adulthood. Here we analyzed brain MRI data from the UK Biobank in a population-based sample of individuals (ages 44-80) who were exposed (n=2,5...

Let β1,…,βp be the slope parameters in a linear regression model and consider the goal of testing H0: βj = 0 (j = 1,…,p). A well-known concern is that multicollinearity can inflate the standard error of the least squares estimate of βj, which in turn can result in relatively low power. The paper examines heteroscedastic methods for dealing with this iss...

This study investigates the effect of initial leg angle on horizontal jump performance. Eleven highly skilled male and female long jumpers (national and Olympic level) performed a series of horizontal jumps for distance. Within-jumper differences in initial leg angle, normalized horizontal and net vertical impulses, contact time, and average reacti...

The paper reviews advances and insights relevant to comparing groups when the sample sizes are small. There are conditions under which conventional, routinely used techniques are satisfactory. But major insights regarding outliers, skewed distributions, and unequal variances (heteroscedasticity) make it clear that under general conditions they prov...

A skipped correlation has the advantage of dealing with outliers in a manner that takes into account the overall structure of the data cloud. For p-variate data, p≥2, there is an extant method for testing the hypothesis of a zero correlation for each pair of variables that is designed to control the probability of one or more Type I errors. And the...

A skipped correlation has the advantage of dealing with outliers in a manner that takes into account the overall structure of the data cloud. For p-variate data, p ≥ 2, there is an extant method for testing the hypothesis of a zero correlation for each pair of variables that is designed to control the probability of one or more Type I errors. And t...

Consider three random variables Y, X1 and X2, where the typical value of Y, given X1 and X2, is given by some unknown function m(X1, X2). A goal is to determine which of the two independent variables is most important when both variables are included in the model. Let τ¹ denote the strength of the association associated with Y and X1, when X2 is in...

There is a vast array of new and improved methods for comparing groups and studying associations that offer the potential for substantially increasing power, providing improved control over the probability of a Type I error, and yielding a deeper and more nuanced understanding of data. These new techniques effectively deal with four insights into w...

Chapter 12 describes various robust approaches to ANCOVA (analysis of covariance). These techniques allow both types of heteroscedasticity in addition to dealing with outliers. The classic assumption of parallel regression lines is not required. Methods that allow curvature are described as well. Several methods for plotting the data are described...

Chapter 8 covers one-way, two-way and three-way designs where one or more factors involve dependent groups. R functions for dealing with global tests as well as multiple comparisons are included. Also covered are multivariate rank-based methods.

Chapter 7 covers one-way, two-way and three-way designs dealing with independent groups. Multiple comparisons are covered as are nested designs. R functions are described that can handle the usual global hypotheses associated with main effects and interactions, all pairwise comparisons as well as all linear contrasts relevant to main effects and interac...

Many nonparametric regression estimators (smoothers) have been proposed that provide a more flexible method for estimating the true regression line compared to using some of the more obvious parametric models. A basic goal when using any smoother is computing a confidence band for the true regression line. Let M(Y|X) be some conditional measure of...

There is a vast array of new and improved methods for comparing groups and studying associations that offer the potential for substantially increasing power, providing improved control over the probability of a Type I error, and yielding a deeper and more nuanced understanding of neuroscience data. These new techniques effectively deal with four in...

If many changes are necessary to improve the quality of neuroscience research, one relatively simple step could have great pay-offs: to promote the adoption of detailed graphical methods, combined with robust inferential statistics. Here we illustrate how such methods can lead to a much more detailed understanding of group differences than bar grap...

Paper describing the utility and application of various robust methods in R

## Projects

Projects (3)

Heteroscedastic confidence band for smoothers that have some specified simultaneous probability coverage. Also working on the 2nd ed. of my CRC book.

Let M(Y|X) be some measure of location of Y given X. Smoothers provide a flexible way of estimating M(Y|X) that can improve upon the more obvious parametric models. Confidence intervals for M(Y|X) can be computed, but they do not control the simultaneous probability coverage, and the better-known methods assume homoscedasticity. Working on methods that correct this.