
Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression

Department of Epidemiology and Biostatistics, University of California-San Francisco, 185 Berry Street, San Francisco, CA 94107, USA.
American Journal of Epidemiology (Impact Factor: 4.98). 04/2007; 165(6):710-8. DOI: 10.1093/aje/kwk052
Source: PubMed

ABSTRACT The rule of thumb that logistic and Cox models should be used with a minimum of 10 outcome events per predictor variable (EPV), based on two simulation studies, may be too conservative. The authors conducted a large simulation study of other influences on confidence interval coverage, type I error, relative bias, and other model performance measures. They found a range of circumstances in which coverage and bias were within acceptable levels despite fewer than 10 EPV, as well as other factors that were as influential as or more influential than EPV. They conclude that this rule can be relaxed, in particular for sensitivity analyses undertaken to demonstrate adequate control of confounding.
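The abstract describes a simulation study that tracks confidence interval coverage and relative bias as EPV falls. A minimal sketch of that kind of experiment for logistic regression follows, using only NumPy. Every setting in it is an illustrative assumption, not the authors' design: five independent standard-normal predictors, a single non-null coefficient of log 2, roughly 10% outcome prevalence, EPV counted on the less frequent outcome class, Wald 95% intervals from a hand-written Newton-Raphson fit, and non-convergent replicates (for example under separation) set aside. The Cox arm of the study and type I error are not covered.

```python
import numpy as np


def events_per_variable(y, n_pred):
    """EPV for a binary outcome. Assumed definition: size of the less
    frequent outcome class divided by the number of predictor parameters."""
    y = np.asarray(y)
    events = int(min(y.sum(), y.size - y.sum()))
    return events / n_pred


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Newton-Raphson fit of a logistic model (X includes an intercept column).
    Returns (beta, se) or None if the fit does not converge, which can
    happen under separation when events are scarce."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        try:
            step = np.linalg.solve(info, X.T @ (y - p))
        except np.linalg.LinAlgError:
            return None
        if not np.all(np.isfinite(step)):
            return None
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            # Recompute the information at the converged estimate for Wald SEs.
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))
            info = X.T @ (X * (p * (1.0 - p))[:, None])
            return beta, np.sqrt(np.diag(np.linalg.inv(info)))
    return None


def simulate(epv=10, n_pred=5, beta1=np.log(2), prevalence=0.1,
             n_sims=500, seed=2007):
    """Coverage of the 95% Wald CI and relative bias for beta1 at a target EPV.
    All design choices here are hypothetical, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n = int(round(epv * n_pred / prevalence))       # expected events ~ epv * n_pred
    beta0 = np.log(prevalence / (1.0 - prevalence))  # sets baseline risk, roughly
    estimates, epvs, covered, failed = [], [], 0, 0
    for _ in range(n_sims):
        x = rng.standard_normal((n, n_pred))
        eta = beta0 + beta1 * x[:, 0]  # only the first predictor is non-null
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
        epvs.append(events_per_variable(y, n_pred))
        with np.errstate(over="ignore"):
            fit = fit_logistic(np.column_stack([np.ones(n), x]), y)
        if fit is None:
            failed += 1
            continue
        b, se = fit
        estimates.append(b[1])
        covered += int(b[1] - 1.96 * se[1] <= beta1 <= b[1] + 1.96 * se[1])
    kept = n_sims - failed
    return {
        "n": n,
        "mean_epv": float(np.mean(epvs)),
        "coverage": covered / kept if kept else float("nan"),
        "relative_bias": float((np.mean(estimates) - beta1) / beta1) if kept else float("nan"),
        "failed": failed,
    }
```

Running `simulate(epv=5)` and `simulate(epv=20)` with the same seed lets you compare coverage and relative bias across EPV levels. Replicates that fail to converge are counted in `failed` and excluded from the coverage and bias denominators, so a high failure count should be read as a problem in its own right.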

  • Source
    • "Others even suggest a stricter 1:20 ratio, though these recommendations may differ according to the model type being used [43] [44]. For example, it has been suggested that the 1:10 rule may actually be relaxed in the case of logistic regression [64], which is noteworthy given the prevalence of this model type in SFRT research. Nevertheless, it is of interest to refer to columns D and E of Table 1 "
    ABSTRACT: The field of fall risk testing using wearable sensors is bustling with activity. In this Letter, the authors review publications which incorporated features extracted from sensor signals into statistical models intended to estimate fall risk or predict falls in older people. A review of these studies raises concerns that this body of literature is presenting over-optimistic results in light of small sample sizes, questionable modelling decisions and problematic validation methodologies (e.g. inherent problems with the overly-popular cross-validation technique, lack of external validation). There seem to be substantial issues in the feature selection process, whereby researchers select features before modelling begins based on their relation to the target, and either perform no validation or test the models on the same data used for their training. This, together with potential issues related to the large number of features and their correlations, inevitably leads to models with inflated accuracy that are unlikely to maintain their reported performance during everyday use in relevant populations. Indeed, the availability of rich sensor data and many analytical options provides intellectual and creative freedom for researchers, but should be treated with caution, and such pitfalls must be avoided if we desire to create generalisable prognostic tools of any clinical value.
  • Source
    • "Our brood survival data contained the lowest number of events. Consequently, we modified selection of brood survival models by only fitting models with three or fewer variables to maintain acceptable model performance (Vittinghoff and McCulloch 2006) at both levels of model selection. Within the nest survival and female survival variable subsets and final set, we did not exceed four variables per model, without modification; because many uninformative predictor variables were already removed through the screening process (85% CI included 0). "
    ABSTRACT: Conserving a declining species that is facing many threats, including overlap of its habitats with energy extraction activities, depends upon identifying and prioritizing the value of the habitats that remain. In addition, habitat quality is often compromised when source habitats are lost or fragmented due to anthropogenic development. Our objective was to build an ecological model to classify and map habitat quality in terms of source or sink dynamics for Greater Sage-Grouse (Centrocercus urophasianus) in the Atlantic Rim Project Area (ARPA), a developing coalbed natural gas field in south-central Wyoming, USA. We used occurrence and survival modeling to evaluate relationships between environmental and anthropogenic variables at multiple spatial scales and for all female summer life stages, including nesting, brood-rearing, and non-brooding females. For each life stage, we created resource selection functions (RSFs). We weighted the RSFs and combined them to form a female summer occurrence map. We modeled survival also as a function of spatial variables for nest, brood, and adult female summer survival. Our survival models were mapped as survival probability functions individually and then combined with fixed vital rates in a fitness metric model that, when mapped, predicted habitat productivity (productivity map). Our results demonstrate a suite of environmental and anthropogenic variables at multiple scales that were predictive of occurrence and survival. We created a source–sink map by overlaying our female summer occurrence map and productivity map to predict habitats contributing to population surpluses (source habitats) or deficits (sink habitat) and low-occurrence habitats on the landscape. The source–sink map predicted that of the Sage-Grouse habitat within the ARPA, 30% was primary source, 29% was secondary source, 4% was primary sink, 6% was secondary sink, and 31% was low occurrence. 
Our results provide evidence that energy development and avoidance of energy infrastructure were probably reducing the amount of source habitat within the ARPA landscape. Our source–sink map provides managers with a means of prioritizing habitats for conservation planning based on source and sink dynamics. The spatial identification of high value (i.e., primary source) as well as suboptimal (i.e., primary sink) habitats allows for informed energy development to minimize effects on local wildlife populations.
    Ecological Applications 06/2015; 25(4):968-990. DOI:10.1890/13-1152.1 · 4.13 Impact Factor
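Two of the citing passages use the rule of thumb as a cap on model size: one mentions a stricter 1:20 ratio, another limits brood-survival models to three predictors because events were scarce. The arithmetic is an integer division of the event count by the chosen events-per-predictor ratio. The helper name and the 45-event study below are hypothetical.

```python
def max_predictors(events: int, events_per_predictor: int = 10) -> int:
    """Largest number of predictor parameters permitted by an EPV rule of thumb."""
    if events_per_predictor <= 0:
        raise ValueError("events_per_predictor must be positive")
    return events // events_per_predictor


# Hypothetical study with 45 outcome events.
for ratio in (10, 20):
    print(f"1:{ratio} rule -> at most {max_predictors(45, ratio)} predictors")
```

Given the paper's conclusion that the 1:10 rule can be relaxed, a cap computed this way is better treated as a starting point than as a hard limit.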