A novel framework for validating and applying standardized small area measurement strategies
ABSTRACT Local measurements of health behaviors, diseases, and use of health services are critical inputs into local, state, and national decision-making. Small area measurement methods can deliver more precise and accurate local-level information than direct estimates from surveys or administrative records, where sample sizes are often too small to yield acceptable standard errors. However, small area measurement requires careful validation, and conventional statistical approaches such as in-sample validation or cross-validation are not sufficient because they do not solve the problem of validating estimates in data-sparse domains.
A new general framework for small area estimation and validation is developed and applied to estimate Type 2 diabetes prevalence in US counties using data from the Behavioral Risk Factor Surveillance System (BRFSS). The framework combines the three conventional approaches to small area measurement: (1) pooling data across time by combining multiple survey years; (2) exploiting spatial correlation by including a spatial component; and (3) utilizing structured relationships between the outcome variable and domain-specific covariates. Together these define four increasingly complex model types, termed the Naive, Geospatial, Covariate, and Full models. The validation framework uses direct estimates of prevalence in large domains as the gold standard and compares model estimates against this standard using (i) all available observations for the large domains and (ii) systematically reduced sample sizes obtained through random sampling with replacement. At each sampling level, the model is rerun repeatedly, and the validity of the estimates from the four model types is then determined by calculating the (average) concordance correlation coefficient (CCC) and (average) root mean squared error (RMSE) against the gold standard. The CCC is closely related to the intraclass correlation coefficient and can be used when the units are organized in groups and it is of interest to measure the agreement between units in the same group (e.g., counties). The RMSE measures the differences between the values predicted by a model or estimator and the values actually observed, and is a useful summary of the precision of the model or estimator.
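The validation loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ccc`, `rmse`, `validate_by_subsampling`) are hypothetical, the CCC is computed in Lin's form with 1/n variances, and a plain subsample mean stands in for the Naive, Geospatial, Covariate, and Full models, whose specifications are not given in the abstract.

```python
import numpy as np


def ccc(estimates, gold):
    """Lin's concordance correlation coefficient between two vectors."""
    x = np.asarray(estimates, dtype=float)
    y = np.asarray(gold, dtype=float)
    mx, my = x.mean(), y.mean()
    # Population (1/n) variances and covariance.
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(2.0 * cov / (vx + vy + (mx - my) ** 2))


def rmse(estimates, gold):
    """Root mean squared error of the estimates against the gold standard."""
    x = np.asarray(estimates, dtype=float)
    y = np.asarray(gold, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def validate_by_subsampling(responses, n_sub, n_reps, rng):
    """Average CCC and RMSE at one reduced sample size.

    responses: dict mapping a large-domain id to a 1-D array of 0/1 outcomes.
    The gold standard is the direct estimate from all observations.
    ASSUMPTION: the subsample mean is a placeholder estimator; a real
    application would refit the small area model on each subsample.
    """
    domains = sorted(responses)
    gold = np.array([np.mean(responses[d]) for d in domains])
    ccc_vals, rmse_vals = [], []
    for _ in range(n_reps):
        # Draw n_sub observations per domain, with replacement.
        est = np.array([
            rng.choice(responses[d], size=n_sub, replace=True).mean()
            for d in domains
        ])
        ccc_vals.append(ccc(est, gold))
        rmse_vals.append(rmse(est, gold))
    return float(np.mean(ccc_vals)), float(np.mean(rmse_vals))


def simulate_domains(n_domains, n_obs, rng):
    """Synthetic 0/1 outcomes with prevalences spread between 5% and 15%."""
    prevalences = np.linspace(0.05, 0.15, n_domains)
    return {i: rng.binomial(1, p, size=n_obs) for i, p in enumerate(prevalences)}
```

Unlike the Pearson correlation, the CCC penalizes systematic offsets: `ccc([1, 2, 3], [2, 3, 4])` is 4/7 (about 0.571) although the two vectors are perfectly correlated. Calling `validate_by_subsampling` on `simulate_domains(...)` at several values of `n_sub` traces how agreement with the gold standard degrades as the sample shrinks, which is the comparison the framework makes across model types.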
All model types have substantially higher CCC and lower RMSE than the direct, single-year BRFSS estimates. In addition, the inclusion of relevant domain-specific covariates generally improves predictive validity, especially at small sample sizes, and their leverage can be equivalent to a five- to tenfold increase in sample size.
Small area estimation of important health outcomes and risk factors can be improved using a systematic modeling and validation framework, which consistently outperformed single-year direct survey estimates and demonstrated the potential leverage of including relevant domain-specific covariates compared to pure measurement models. The proposed validation strategy can be applied to other disease outcomes and risk factors in the US as well as to resource-scarce situations, including low-income countries. These estimates are needed by public health officials to identify at-risk groups, to design targeted prevention and intervention programs, and to monitor and evaluate results over time.
Full text available from: Ali Mokdad, May 29, 2015
Source: available from PubMed Central
ABSTRACT: The Indian Janani Suraksha Yojana (JSY) program is a demand-side program in which the state pays women a cash incentive to deliver in an institution, with the aim of reducing maternal mortality. The JSY has had 54 million beneficiaries since inception 7 years ago. Although a number of studies have demonstrated the effect of JSY on coverage, few have examined the direct impact of the program on maternal mortality. This study examined the impact of JSY on maternal mortality in Madhya Pradesh (MP), one of India's largest provinces. By synthesizing data from various sources, district-level maternal mortality ratios (MMR) from 2005 to 2010 were estimated using a Bayesian spatio-temporal model. Based on these, a mixed effects multilevel regression model was applied to assess the impact of JSY. Specifically, the association between MMR and JSY intensity, as reflected by 1) the proportion of JSY-supported institutional deliveries and 2) total annual JSY expenditure, was examined. The proportion of all institutional deliveries increased from 23.9% in 2005 to 55.9% in 2010 province-wide. The proportion of JSY-supported institutional deliveries rose from 14% (2005) to 80% (2010). MMR declines in the districts varied from 2 to 35% over this period. Despite the marked increase in JSY-supported delivery, our multilevel models did not detect a significant association between JSY-supported delivery proportions and changes in MMR in the districts. The results from the analysis examining the association between MMR and JSY expenditure are similar. Our analysis was unable to detect an association between maternal mortality reduction and the JSY in MP. The high proportion of institutional delivery under the program does not seem to have converted to lower mortality outcomes. The lack of significant impact could be related to supply-side constraints.
Demand-side programs like JSY will have a limited effect if the supply side is unable to deliver care of adequate quality.
Global Health Action 12/2014; 7:24939. DOI:10.3402/gha.v7.24939 · 1.65 Impact Factor
ABSTRACT: Small area estimation is a statistical technique used to produce reliable estimates for smaller geographic areas than those for which the original surveys were designed. Such small area estimates (SAEs) often lack rigorous external validation. In this study, we validated our multilevel regression and poststratification SAEs from 2011 Behavioral Risk Factor Surveillance System data using direct estimates from the 2011 Missouri County-Level Study and American Community Survey data at both the state and county levels. Coefficients for correlation between model-based SAEs and Missouri County-Level Study direct estimates for 115 counties in Missouri were all significantly positive (0.28 for obesity and no health-care coverage, 0.40 for current smoking, 0.51 for diabetes, and 0.69 for chronic obstructive pulmonary disease). Coefficients for correlation between model-based SAEs and American Community Survey direct estimates of no health-care coverage were 0.85 at the county level (811 counties) and 0.95 at the state level. Unweighted and weighted model-based SAEs were compared with direct estimates; unweighted models performed better. External validation results suggest that multilevel regression and poststratification model-based SAEs using single-year Behavioral Risk Factor Surveillance System data are valid and could be used to characterize geographic variations in health indicators at local levels (such as counties) when high-quality local survey data are not available. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
American Journal of Epidemiology 05/2015; DOI:10.1093/aje/kwv002 · 4.98 Impact Factor
ABSTRACT: Despite staggering investments made in unraveling the human genome, current estimates suggest that as much as 90% of the variance in cancer and chronic diseases can be attributed to factors outside an individual's genetic endowment, particularly to environmental exposures experienced across his or her life course. New analytical approaches are clearly required as investigators turn to complicated systems theory and ecological, place-based and life-history perspectives in order to understand more clearly the relationships between social determinants, environmental exposures and health disparities. While traditional data analysis techniques remain foundational to health disparities research, they are easily overwhelmed by the ever-increasing size and heterogeneity of available data needed to illuminate latent gene x environment interactions. This has prompted the adaptation and application of scalable combinatorial methods, many from genome science research, to the study of population health. Most of these powerful tools are algorithmically sophisticated, highly automated and mathematically abstract. Their utility motivates the main theme of this paper, which is to describe real applications of innovative transdisciplinary models and analyses in an effort to help move the research community closer toward identifying the causal mechanisms and associated environmental contexts underlying health disparities. The public health exposome is used as a contemporary focus for addressing the complex nature of this subject.
International Journal of Environmental Research and Public Health 10/2014; 11(10):10419–10443. DOI:10.3390/ijerph111010419 · 1.99 Impact Factor