Testing logistic regression coefficients with clustered data and few positive outcomes
ABSTRACT Applications frequently involve logistic regression analysis with clustered data where there are few positive outcomes in some of the independent variable categories. For example, an application is given here that analyzes the association of asthma with various demographic variables and risk factors using data from the third National Health and Nutrition Examination Survey, a weighted multistage cluster sample. Although there are 742 asthma cases in all (out of 18,395 individuals), one category of one of the independent variables contains only 25 asthma cases (out of 695 individuals). Generalized Wald and score hypothesis tests, which use appropriate cluster-level variance estimators, and a bootstrap hypothesis test have been proposed for testing logistic regression coefficients with cluster samples. Simulations presented in this paper show that, when there are few positive outcomes, these tests can sometimes have either inflated or very conservative levels. A simulation-based method is proposed for testing logistic regression coefficients with cluster samples when there are few positive outcomes. This testing methodology is shown to compare favorably with the generalized Wald and score tests and the bootstrap hypothesis test in terms of maintaining nominal levels. The proposed method is also useful when testing goodness-of-fit of logistic regression models using deciles-of-risk tables.
ABSTRACT: The delete-a-group (DAG) jackknife is sometimes used when estimating the variances of statistics based on a large sample. We investigate heavily poststratified estimators for a population mean and a simple regression coefficient, where both full-sample and domain estimates are of interest. The DAG jackknife employing 30, 60, and 100 replicates is found to be highly unstable, even for large sample sizes. The empirical degrees of freedom of these DAG jackknives are usually much less than their nominal degrees of freedom. This analysis calls into question whether coverage intervals derived from replication-based variance estimators can be trusted for highly calibrated estimates.
Communications in Statistics - Simulation and Computation 06/2014; 43(10). DOI:10.1080/03610918.2012.762392 · 0.29 Impact Factor
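For readers unfamiliar with the delete-a-group jackknife mentioned in this abstract, the following is a minimal sketch of the general idea for a weighted mean. It is not the authors' implementation: the function name `dag_jackknife_variance`, its parameters, and the random assignment of individual units to groups are assumptions made for illustration. In practice groups are usually formed from sampling units within the design, and poststratification would be repeated for each replicate.

```python
import random


def dag_jackknife_variance(y, w, n_groups=30, seed=0):
    """Delete-a-group jackknife variance of a weighted mean (illustrative sketch).

    y, w     : equal-length sequences of values and sampling weights
    n_groups : number of replicate groups (hypothetical default of 30)
    """
    if n_groups < 2:
        raise ValueError("n_groups must be at least 2")
    n = len(y)

    # Assumption: units are assigned to groups at random. The unit at
    # shuffled position k goes to group k mod n_groups, so group sizes
    # differ by at most one.
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    group = [0] * n
    for k, i in enumerate(order):
        group[i] = k % n_groups

    def weighted_mean(indices):
        indices = list(indices)
        total_weight = sum(w[i] for i in indices)
        return sum(w[i] * y[i] for i in indices) / total_weight

    full = weighted_mean(range(n))

    # One replicate estimate per group, computed with that group deleted.
    # For a ratio such as the weighted mean, the usual G/(G-1) reweighting
    # of the retained units cancels, so it is omitted here.
    replicates = [
        weighted_mean(i for i in range(n) if group[i] != g)
        for g in range(n_groups)
    ]

    variance = (n_groups - 1) / n_groups * sum(
        (r - full) ** 2 for r in replicates
    )
    return full, variance
```

With equal weights and as many groups as units, this reduces to the ordinary delete-one jackknife, whose variance estimate for a mean equals the sample variance divided by the sample size. With far fewer groups than units, the estimate rests on only `n_groups - 1` nominal degrees of freedom, which is the setting the abstract examines.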