Modeling Count Outcomes from HIV Risk Reduction Interventions: A Comparison of Competing Statistical Models for Count Responses

Department of Biostatistics and Computational Biology, Box 630, University of Rochester, 265 Crittenden Boulevard, Rochester, NY 14642, USA.
AIDS Research and Treatment 03/2012; 2012(2):593569. DOI: 10.1155/2012/593569
Source: PubMed


Modeling count data from sexual behavioral outcomes involves many challenges, especially when the data exhibit a preponderance of zeros and overdispersion. In particular, the popular Poisson log-linear model is not appropriate for modeling such outcomes. Although alternatives exist for addressing both issues, they are not widely and effectively used in sexual health research, especially in HIV prevention intervention and related studies. In this paper, we discuss how to analyze count outcomes with excess zeros and overdispersion and introduce appropriate model-fit indices for comparing the performance of competing models, using data from a real study on HIV prevention intervention. This in-depth look at common issues arising from studies involving behavioral outcomes will promote sound statistical analyses and facilitate research in this and other related areas.
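
As a rough illustration of the two issues named in the abstract, the sketch below computes simple diagnostics for overdispersion and excess zeros. It is not the authors' procedure: the function name `poisson_diagnostics` and the count values are hypothetical, invented for illustration only. The checks rely on two properties of the Poisson distribution: its variance equals its mean, and its probability of a zero is exp(-mean).

```python
import math
from statistics import mean, variance


def poisson_diagnostics(counts):
    """Return summaries that flag overdispersion and excess zeros."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for y in counts if y == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(Y = 0) under a Poisson with mean m
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,  # about 1 if the data are Poisson
        "observed_zero_prop": observed_zero,
        "poisson_zero_prop": expected_zero,
    }


# Hypothetical counts (not from the study), e.g. number of events per subject
counts = [0, 0, 0, 0, 0, 0, 1, 2, 0, 5, 0, 12, 0, 3, 0, 0, 8, 0, 1, 20]
diag = poisson_diagnostics(counts)
for name, value in diag.items():
    print(f"{name}: {value:.3f}")
```

A dispersion index well above 1, or an observed share of zeros well above the Poisson-implied share, suggests that a plain Poisson log-linear model may fit poorly and that alternatives are worth considering.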



Available from: Yinglin Xia,
  • Source
    ABSTRACT: We describe a general framework for modeling and stochastic simulation of epidemics in realistic dynamic social networks, which incorporates heterogeneity in the types of individuals, types of interconnecting risk-bearing relationships, and types of pathogens transmitted across them. Dynamism is supported through arrival and departure processes, continuous restructuring of risk relationships, and changes to pathogen infectiousness, as mandated by natural history; dynamism is regulated through constraints on the local agency of individual nodes and their risk behaviors, while simulation trajectories are validated using system-wide metrics. To illustrate its utility, we present a case study that applies the proposed framework towards a simulation of HIV in artificial networks of intravenous drug users (IDUs) modeled using data collected in the Social Factors for HIV Risk survey.
    Simulation 04/2014; 90(4):460-484. DOI:10.1177/0037549714526947 · 0.82 Impact Factor
  • Source
    ABSTRACT: In psychosocial and behavioral studies, count outcomes recording the frequencies of the occurrence of some health or behavior outcomes (such as the number of unprotected sexual behaviors during a period of time) often contain a preponderance of zeroes because of the presence of 'structural zeroes' that occur when some subjects are not at risk for the behavior of interest. Unlike random zeroes (responses that can be greater than zero, but are zero due to sampling variability), structural zeroes are usually very different, both statistically and clinically. False interpretations of results and study findings may result if differences in the two types of zeroes are ignored. However, in practice, the status of the structural zeroes is often not observed and this latent nature complicates the data analysis. In this article, we focus on one model, the zero-inflated Poisson (ZIP) regression model, that is commonly used to address zero-inflated data. We first give a brief overview of the issues of structural zeroes and the ZIP model. We then give an illustration of ZIP with data from a study on HIV-risk sexual behaviors among adolescent girls. Sample codes in SAS and Stata are also included to help perform and explain ZIP analyses.
    Shanghai Archives of Psychiatry 08/2014; 26(4):236-42. DOI:10.3969/j.issn.1002-0829.2014.04.008
  • Source
    ABSTRACT: Typical data in a microbiome study consist of the operational taxonomic unit (OTU) counts that have the characteristic of excess zeros, which are often ignored by investigators. In this paper, we compare the performance of different competing methods to model data with zero inflated features through extensive simulations and application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitude and direction of the covariate effect on structural zeros and the count components. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and bias and effectiveness of parameter estimations. We also evaluate the abilities of model selection strategies using Akaike information criterion (AIC) or Vuong test to identify the correct model. The simulation studies show that hurdle and zero inflated models have well controlled type I errors, higher power, better goodness of fit measures, and are more accurate and efficient in the parameter estimation. In addition, the hurdle models have similar goodness of fit and parameter estimation for the count component as their corresponding zero inflated models. However, the estimation and interpretation of the parameters for the zero components differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero inflated data and implement it in a gut microbiome study of more than 400 independent subjects.
    PLoS ONE 07/2015; 10(7). DOI:10.1371/journal.pone.0129606 · 3.23 Impact Factor
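
The zero-inflated and hurdle models mentioned in the abstracts above, and their comparison by AIC, can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the method of any of the listed papers: the models are intercept-only (no covariates), the function names and the counts are hypothetical, the ZIP model is fitted by a simple EM loop, and the Vuong test is not covered. The ZIP model takes P(Y = 0) = pi + (1 - pi) exp(-lam), while the hurdle model pairs a zero probability p0 with a zero-truncated Poisson for the positive counts.

```python
import math


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass function."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def fit_poisson(counts):
    """Intercept-only Poisson: the MLE of lam is the sample mean."""
    lam = sum(counts) / len(counts)
    return lam, sum(poisson_logpmf(y, lam) for y in counts)


def fit_zip(counts, n_iter=500):
    """Intercept-only ZIP fitted by EM; returns (pi, lam, loglik)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n  # starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is structural
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    loglik = n_zero * math.log(pi + (1 - pi) * math.exp(-lam)) + sum(
        math.log(1 - pi) + poisson_logpmf(y, lam) for y in counts if y > 0
    )
    return pi, lam, loglik


def fit_hurdle(counts, n_iter=200):
    """Intercept-only hurdle Poisson; returns (p0, lam, loglik)."""
    positives = [y for y in counts if y > 0]
    n, n_pos = len(counts), len(positives)
    p0 = (n - n_pos) / n  # MLE of the zero probability
    target = sum(positives) / n_pos
    # The zero-truncated Poisson MLE solves lam / (1 - exp(-lam)) = target.
    # The left side increases in lam and exceeds lam, so the root is < target.
    lo, hi = 1e-9, target
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    loglik = (n - n_pos) * math.log(p0) + n_pos * math.log(1 - p0) + sum(
        poisson_logpmf(y, lam) - math.log(1 - math.exp(-lam)) for y in positives
    )
    return p0, lam, loglik


def aic(loglik, n_params):
    """Akaike information criterion; smaller values indicate better fit."""
    return 2 * n_params - 2 * loglik


# Hypothetical counts (invented for illustration only)
counts = [0, 0, 0, 0, 0, 0, 1, 2, 0, 5, 0, 12, 0, 3, 0, 0, 8, 0, 1, 20]
_, ll_pois = fit_poisson(counts)
zip_pi, zip_lam, ll_zip = fit_zip(counts)
hurdle_p0, hurdle_lam, ll_hurdle = fit_hurdle(counts)
aic_pois, aic_zip, aic_hurdle = aic(ll_pois, 1), aic(ll_zip, 2), aic(ll_hurdle, 2)
print(f"AIC Poisson={aic_pois:.1f} ZIP={aic_zip:.1f} hurdle={aic_hurdle:.1f}")
```

In this covariate-free setting with more zeros than a Poisson would predict, the two models are reparameterizations of each other and reach the same maximized likelihood, which is one way to see why their count-component fits tend to be similar. With covariates in the zero component the two generally differ, and the zero-part parameters carry different interpretations.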