Goodness-of-fit test for logistic regression models when data are collected using a complex sample design

Division of Epidemiology and Biostatistics and Department of Statistics, School of Public Health, The Ohio State University, 320 West Tenth Ave., M200 Starling-Loving Hall, Columbus, OH 43210, USA
Computational Statistics & Data Analysis (Impact Factor: 1.4). 05/2007; 51(9):4450-4464. DOI: 10.1016/j.csda.2006.07.006
Source: RePEc


Logistic regression models are frequently used in epidemiological studies for estimating the associations of demographic, behavioral, and risk factor variables with a dichotomous outcome, such as disease being present versus absent. After the coefficients in a logistic regression model have been estimated, goodness-of-fit of the resulting model should be examined, particularly if the purpose of the model is to estimate probabilities of event occurrences. While various goodness-of-fit tests have been proposed, the properties of these tests have been studied under the assumption that the selected observations are independent and identically distributed. Increasingly, epidemiologists are using large-scale sample survey data, such as the National Health Interview Survey or the National Health and Nutrition Examination Survey, when fitting logistic regression models. Unfortunately, for such situations no goodness-of-fit testing procedures have been developed or implemented in available software. To address this problem, goodness-of-fit tests for logistic regression models when data are collected using complex sampling designs are proposed. Properties of the proposed tests were examined using extensive simulation studies and results were compared to traditional goodness-of-fit tests. A Stata ado function svylogitgof for estimating the F-adjusted mean residual test after svylogit fit is available at the author's website.
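For orientation, the traditional goodness-of-fit tests that the abstract uses as a comparison can be illustrated with the Hosmer-Lemeshow "deciles of risk" statistic. The following is a minimal sketch of that classical test under the independent and identically distributed assumption only; it is not the survey-adjusted F-adjusted mean residual test proposed in the paper, and it is not the svylogitgof implementation. The function names `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical, and the p-value helper relies on a closed form that holds only for even degrees of freedom.

```python
import math
import random


def hosmer_lemeshow(y, p, groups=10):
    """Classical (i.i.d.) Hosmer-Lemeshow statistic.

    y: 0/1 outcomes; p: fitted probabilities from a logistic model.
    Returns (statistic, degrees of freedom = groups - 2).
    Assumes len(y) >= groups and that no group has mean probability 0 or 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        # Nearly equal-sized groups of observations sorted by fitted probability.
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Illustration only: the "fitted" probabilities are the true simulated ones.
    random.seed(0)
    x = [random.random() for _ in range(500)]
    p = [1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * xi))) for xi in x]
    y = [1 if random.random() < pi else 0 for pi in p]
    stat, df = hosmer_lemeshow(y, p)
    print(stat, df, chi2_sf_even_df(stat, df))
```

This sketch ignores sampling weights, strata, and clusters, which is exactly the limitation the abstract identifies when such a test is applied to complex survey data.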



Available from: Kellie Archer, May 19, 2015
  • Source
    • "Due to the complex sampling designs, existing software was not yet developed or implemented for these tests based on the logistic regression. According to Archer et al. [26], available software usually takes the form of simulation studies in which results of analysis were compared with ordinary goodness-of-fit statistics. For instance, the Hosmer and Lemeshow goodness-of-fit test statistic, the Pearson residual, and the deviance residual test are not yet incorporated in the Survey Logistic procedure."
    ABSTRACT: This study aims to determine predictors of food insecurity in a typical setting where resilience of population is weakened as a result of protracted crises. South Sudan is used as a case study. The rationale of the study is anchored on the perception that food insecurity risk is a function of weak resilience, which in turn is a function of the absence of a combination of certain characteristics and livelihood endowments of a household or a population. Analysis explores the use of SAS® SURVEYLOGISTIC procedure, as it has been established to be useful in analysis of data from sample surveys. The procedure is known for its valid statistical inference. Employing a survey logistic model with generalised logit link function determined all fitted fixed effects to be statistically significant. Analysis showed that characteristics of households and agriculture (including livestock and fishing) were typically associated with acceptable level of food consumption and implying that the absence of these factors demonstrated weakened resilience and thus increased risk of food insecurity. Analysis also examined the odds of each level of fixed effect compared to the reference level in relation to the food consumption score (the response or outcome variable). Findings were interesting, but largely confirmed what was expected (see Table 5). For instance, it was found that households headed by younger adults aged 17 years or less fared three times worse than those aged 60 years and above. It was also shown that smaller households fared better than larger ones. The odds of a household with three or fewer members were twice as bad as those with seven or more members. We conclude that the method exerted reasonable statistical efficiency for fulfilling the study end, thus providing sufficient evidence for food security analysts and development policy makers in the course of developing appropriate interventions for early preparedness and crises response.
    Preview · Article · Dec 2016 · Agriculture and Food Security
  • Source
    • "For binary outcomes, we assessed the assumption of correct fit using Hosmer-Lemeshow tests (Hosmer & Lemeshow, 2000). We ran the tests as if the data were unstructured, since the tests are computationally unavailable for survey data (Archer, Lemeshow, & Hosmer, 2007). Overall, non-significant results were obtained, indicating no evidence to suggest that the correct fit assumption was violated."
    ABSTRACT: Communication technologies are often proposed to level the playing field for individuals with disabilities, but the benefits may be magnified for deaf individuals in particular due to the communication barriers experienced by these individuals. In this paper, we set out to test the assumption that increased engagement with communication technology, specifically computer-mediated communication, during adolescence would contribute to actual attainment gains in adult life for deaf individuals in three domains: life, education, and employment. A secondary analysis using the National Longitudinal Transition Study 2 (NLTS2) was conducted, allowing for a longitudinal examination of deaf individuals' experiences in the transition from adolescence to adulthood. Findings revealed that deaf individuals who engaged with computer-mediated communication at higher frequencies during adolescence did not reveal discernible gains in adult life attainments in any domain. We propose that the benefits of communication technology only go so far, and that achieving greater equitable outcomes for deaf individuals requires larger systemic change.
    Full-text · Article · Dec 2015
  • Source
    • "It is defined as an evaluation of how well model predicted outcomes agree with the observed data [2]. The GOF test, on the other hand, is widely referred to as a lack-of-fit test, because it measures how far the model is from the data rather than how good the model is [3]. Omitted predictors, a misspecified form of the predictor, or an inappropriate link function can all result in poor fitting."

    Preview · Article · Oct 2015