Editing data: what difference do consistency checks make?

Chronic Disease Epidemiology Section, Bureau of Epidemiology, Division of Disease Control, Florida Department of Health, Tallahassee 32399-1734, USA.
American Journal of Epidemiology (Impact Factor: 4.98). 06/2000; 151(9):921-6. DOI: 10.1093/oxfordjournals.aje.a010296
Source: PubMed

ABSTRACT: In 1998, the Florida Department of Health undertook a self-administered school-based survey of tobacco use, attitudes, and behaviors among nearly 23,000 public school students in grades 6-12. The survey design did not use skip patterns; therefore, students had multiple opportunities to contradict themselves. By using examples from the high school portion (grades 9-12) of the survey, the authors examined five possible approaches to handling data inconsistencies and the effect that each has on point estimates. Use of these approaches resulted in point estimates of current cigarette use ranging from 25.6% to 29.7%. The number of missing respondents varied from 33 (less than 1%) to 1,374 (13%), depending on which approach was used. After stratification by gender and race, the prevalence estimates changed marginally for girls but strikingly for boys. Non-Hispanic White students were substantially more likely than non-Hispanic Black students to report current cigarette use, but the magnitude of this difference varied significantly according to the analytical approach used. The approach used to check data consistency may influence point estimates and comparability with other studies. Therefore, this issue should be addressed when findings are reported.
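The abstract's core point can be made concrete with a small sketch. This is a hypothetical illustration, not the authors' actual edit rules: two toy editing approaches ("do nothing" and "set contradictory respondents to missing") are applied to the same mock responses, producing different prevalence estimates because the second rule shrinks both the numerator and the denominator.

```python
# Without skip patterns, each respondent answers both questions and can conflict:
#   ever: "Have you ever smoked?"           ("yes"/"no")
#   days: "Days smoked in the past 30 days" (0-30)
records = [
    {"ever": "yes", "days": 5},  # consistent current smoker
    {"ever": "no",  "days": 0},  # consistent non-smoker
    {"ever": "no",  "days": 3},  # contradiction: never smoked, yet smoked recently
    {"ever": "yes", "days": 0},  # consistent former/experimental smoker
]

def is_contradictory(r):
    return r["ever"] == "no" and r["days"] > 0

def prevalence(rows):
    """Share of non-missing rows classified as current smokers (days > 0)."""
    rows = [r for r in rows if r is not None]
    return sum(r["days"] > 0 for r in rows) / len(rows)

# Approach A ("do nothing"): take the current-use question at face value.
p_do_nothing = prevalence(records)

# Approach B ("set to missing"): drop contradictory respondents entirely.
p_missing = prevalence([None if is_contradictory(r) else r for r in records])

print(f"do nothing:     {p_do_nothing:.1%}")  # 2/4 = 50.0%
print(f"set to missing: {p_missing:.1%}")     # 1/3 = 33.3%
```

The spread between the two toy estimates mirrors, in miniature, the 25.6%-29.7% range the authors report across their five approaches.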

  •
    ABSTRACT: To understand how methodological factors influence prevalence estimates of health-risk behaviors obtained from surveys, we examined the effect of varying question wording and honesty appeals while holding other aspects of the surveys constant. A convenience sample of students (n = 4140) in grades 9 through 12 was randomly assigned to complete one of six versions of a paper-and-pencil questionnaire in classrooms. Each questionnaire version represented a different combination of honesty appeal (standard vs. strong) and questionnaire type. The questionnaire types varied in wording and in the number of questions assessing particular types of behaviors. The questionnaires were based on those used in three national surveys--the Youth Risk Behavior Survey, Monitoring the Future, and the National Household Survey on Drug Abuse. Logistic regression analyses examined how responses to each survey question assessing behavior were associated with questionnaire type, honesty appeal, and the interaction of those two variables. Among 32 behaviors with different question wording across questionnaire types, 12 showed a significant effect of questionnaire type. Among 45 behaviors with identical question wording across questionnaire types, five showed a significant main effect of questionnaire type. Among all 77 behaviors, one showed a significant main effect for honesty appeal and two showed a significant interaction between honesty appeal and questionnaire type. When population, setting, questionnaire context, mode of administration, and data-editing protocols are held constant, differences in question wording can create statistically significant differences in some prevalence estimates. Varying honesty appeals does not have an effect on prevalence estimates.
    Journal of Adolescent Health 09/2004; 35(2):91-100. DOI:10.1016/j.jadohealth.2003.08.013 · 2.75 Impact Factor
  •
ABSTRACT: To identify characteristics associated with youth bidi use. The New Jersey Youth Tobacco Survey is a self-administered school-based survey that uses a 2-stage cluster sample design to obtain a representative statewide sample; 9589 students (grades 7-12) participated. Logistic regression was used to generate an adjusted odds ratio (OR) for current bidi use for each variable, controlling for gender, race, and school grade. Higher odds of current bidi use were noted for black and Hispanic students, users of other tobacco products, and students who perceived bidis as safer than cigarettes. These results suggest specific groups that should be targeted for intervention.
American Journal of Health Behavior 03/2004; 28(2):173-9. DOI:10.5993/AJHB.28.2.8 · 1.31 Impact Factor
  •
    ABSTRACT: Accuracy of self-reported data may be improved by data editing, a mechanism to produce accurate information by excluding inconsistent data based on a set number of predetermined decision rules. We compared data editing methods in the Global Youth Tobacco Survey (GYTS) with other editing approaches and evaluated the effects of these on smoking prevalence estimates. We evaluated 5 approaches for handling inconsistent responses to questions regarding cigarette use: GYTS, do-nothing, gatekeeper, global, and preponderance. Compared with GYTS data edits, the do-nothing and gatekeeper approaches produced similar estimates, whereas the global approach resulted in lower estimates and the preponderance approach, higher estimates. Implications for researchers using GYTS include recognition of the survey's data editing methods and documentation in their study methods to ensure cross-study comparability.
Preventing Chronic Disease 03/2013; 10:E38. DOI:10.5888/pcd10.120202 · 1.96 Impact Factor
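Two of the edit rules named in the GYTS comparison above can be sketched as follows. The abstract does not define the rules, so the interpretations here are assumptions for illustration only: a "gatekeeper" rule is taken to mean that a screening question overrides conflicting follow-up answers, and a "preponderance" rule is taken to mean classification by the majority of related responses.

```python
# One mock respondent whose screening answer conflicts with the follow-ups.
respondent = {
    "ever_smoked": "no",           # gatekeeper/screening question
    "smoked_past_30_days": "yes",  # follow-up answers that conflict with it
    "cigs_per_day": 2,
}

def gatekeeper_current_smoker(r):
    # Assumed rule: if the screener says "never smoked",
    # later use reports are discarded.
    if r["ever_smoked"] == "no":
        return False
    return r["smoked_past_30_days"] == "yes"

def preponderance_current_smoker(r):
    # Assumed rule: count each response as a "vote" for or against current use.
    votes_for = (r["smoked_past_30_days"] == "yes") + (r["cigs_per_day"] > 0)
    votes_against = (r["ever_smoked"] == "no")
    return votes_for > votes_against

print(gatekeeper_current_smoker(respondent))     # False
print(preponderance_current_smoker(respondent))  # True
```

The same respondent is classified differently under the two rules, which is consistent with the abstract's finding that the gatekeeper approach tracked the do-nothing estimates while the preponderance approach produced higher ones.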
