Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression

Department of Epidemiology and Biostatistics, University of California-San Francisco, 185 Berry Street, San Francisco, CA 94107, USA.
American Journal of Epidemiology (Impact Factor: 5.23). 04/2007; 165(6):710-8. DOI: 10.1093/aje/kwk052
Source: PubMed

ABSTRACT: The rule of thumb that logistic and Cox models should be used with a minimum of 10 outcome events per predictor variable (EPV),
based on two simulation studies, may be too conservative. The authors conducted a large simulation study of other influences
on confidence interval coverage, type I error, relative bias, and other model performance measures. They found a range of
circumstances in which coverage and bias were within acceptable levels despite fewer than 10 EPV, as well as other factors
that were as influential as or more influential than EPV. They conclude that this rule can be relaxed, in particular for sensitivity
analyses undertaken to demonstrate adequate control of confounding.
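
The EPV arithmetic behind the rule of thumb can be illustrated with a minimal sketch. It assumes the common convention that, for a binary outcome, the limiting count is the less frequent of events and non-events; the function names and the example counts are hypothetical and do not come from the article.

```python
import math


def events_per_variable(n_events: int, n_nonevents: int, n_predictors: int) -> float:
    """Return EPV, using the less frequent outcome category as the limiting count.

    Assumption: this follows the usual convention for logistic regression.
    For a Cox model, pass the number of observed events as n_events and a
    value at least as large as n_events for n_nonevents.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_nonevents) / n_predictors


def max_predictors(n_events: int, n_nonevents: int, target_epv: float) -> int:
    """Return the largest number of predictors that keeps EPV at or above target_epv."""
    if target_epv <= 0:
        raise ValueError("target_epv must be positive")
    return math.floor(min(n_events, n_nonevents) / target_epv)


# Hypothetical data set: 50 events, 150 non-events, 8 candidate predictors.
epv = events_per_variable(50, 150, 8)
strict_limit = max_predictors(50, 150, target_epv=10)   # classic 10-EPV rule
relaxed_limit = max_predictors(50, 150, target_epv=5)   # relaxed lower limit of 5
print(epv, strict_limit, relaxed_limit)
```

With these made-up counts, relaxing the target from 10 to 5 EPV doubles the number of predictors that can be entered, which is the practical consequence of the conclusion stated in the abstract.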

    • "Various rules of thumb have been proposed for how many subjects one needs to include in a regression analysis, varying from 5 to 25 subjects for each predictor (Green, 1991). Data has showed that when a statistically significant association is found in a logistic regression with a number of events per variable near the lower limit of five, "only a minor degree of extra caution is warranted" (Vittinghoff & McCulloch, 2007, p. 717). A maximum of 10 predictors could thus be analyzed in the present study."
    ABSTRACT: Cognitive behavioral therapy produces significant and long-lasting improvement for individuals with insomnia, but treatment resources are scarce. A "stepped care" approach has therefore been proposed, but knowledge is limited on how to best allocate patients to different treatment steps. In this study, 66 primary-care patients with insomnia attended a low-end treatment step: manual-guided cognitive behavioral therapy (CBT) for insomnia delivered by ordinary primary-care personnel. Based on clinically significant treatment effects, subjects were grouped into treatment responders or nonresponders. Baseline data were analyzed to identify predictors for treatment success. Long total sleep time at baseline assessment was the only statistically significant predictor for becoming a responder, and sleep time may thus be important to consider before enrolling patients in low-end treatments.
    Behavioral Sleep Medicine 08/2015; DOI:10.1080/15402002.2015.1007995 · 2.34 Impact Factor
    • "Past studies comparing three groups of people with similar populations suggested a medium effect size of 0.05 (Clark-Carter, 2010). It has been recommended that a minimum ratio of ten participants to one predictor variable is used for multiple regression analyses (Tabachnick & Fidell, 2001) and for logistic regression consideration is given to potential group size, with the ratio of 10:1 applied to the anticipated smallest group (Vittinghoff & McCulloch, 2007). Applying the null hypothesis of an equal distribution between groups this would suggest with the four predictor variables a sample size of 40 in each group."
    ABSTRACT: Existing evidence suggests that people with intellectual disabilities are vulnerable to low self-esteem leading to additional psychosocial issues such as social exclusion and stress. Previous research into the involvement of Special Olympics (SO) of people with intellectual disabilities has indicated positive psychosocial outcomes. Involvement in sport is known generally to have psychological and social benefits. This study aimed to compare the psychosocial impact of involvement in sport through the SO to no or limited sports involvement, for a sample of people with intellectual disabilities. A cross sectional design was employed comparing three groups, SO, Mencap Sports, and Mencap No Sports on the variables: Self-esteem, quality of life, stress levels and social networks. One hundred and one participants were recruited either through the SO or Mencap. Data were collected through the completion of validated questionnaires by one to one interviews with the participants. Analysis revealed that self-esteem, quality of life, and stress were all significantly associated with SO involvement. Logistic regression analysis was used to explore whether scores on these variables were able to predict group membership. Self-esteem was found to be a significant predictor of group membership, those in the SO having the highest self-esteem. The findings provide further evidence of a positive association between sport involvement and increased psychological wellbeing, especially for those involved in the SO. The implications of these findings for practice and future research into the relationship between sport and psychological wellbeing within the learning disabled population are considered.
    Research in Developmental Disabilities 08/2015; 45-46. DOI:10.1016/j.ridd.2015.07.009 · 3.40 Impact Factor
    • "Others even suggest a stricter 1:20 ratio, though these recommendations may differ according to the model type being used [43] [44]. For example, it has been suggested that the 1:10 rule may actually be relaxed in the case of logistic regression [64], which is noteworthy given the prevalence of this model type in SFRT research. Nevertheless, it is of interest to refer to columns D and E of Table 1"
    ABSTRACT: The field of fall risk testing using wearable sensors is bustling with activity. In this Letter, the authors review publications which incorporated features extracted from sensor signals into statistical models intended to estimate fall risk or predict falls in older people. A review of these studies raises concerns that this body of literature is presenting over-optimistic results in light of small sample sizes, questionable modelling decisions and problematic validation methodologies (e.g. inherent problems with the overly-popular cross-validation technique, lack of external validation). There seem to be substantial issues in the feature selection process, whereby researchers select features before modelling begins based on their relation to the target, and either perform no validation or test the models on the same data used for their training. This, together with potential issues related to the large number of features and their correlations, inevitably leads to models with inflated accuracy that are unlikely to maintain their reported performance during everyday use in relevant populations. Indeed, the availability of rich sensor data and many analytical options provides intellectual and creative freedom for researchers, but should be treated with caution, and such pitfalls must be avoided if we desire to create generalisable prognostic tools of any clinical value.
    08/2015; 2(4):79-88. DOI:10.1049/htl.2015.0019