A novel framework for validating and applying standardized small area measurement strategies

Institute for Health Metrics and Evaluation, University of Washington, 2301 5th Ave, Suite 600, Seattle, WA 98121, USA.
Population Health Metrics (Impact Factor: 2.11). 09/2010; 8(1):26. DOI: 10.1186/1478-7954-8-26
Source: PubMed

ABSTRACT Local measurements of health behaviors, diseases, and use of health services are critical inputs into local, state, and national decision-making. Small area measurement methods can deliver more precise and accurate local-level information than direct estimates from surveys or administrative records, where sample sizes are often too small to yield acceptable standard errors. However, small area measurement requires careful validation, and conventional statistical approaches such as in-sample fit or cross-validation are not sufficient because they do not solve the problem of validating estimates in data-sparse domains.
A new general framework for small area estimation and validation is developed and applied to estimate Type 2 diabetes prevalence in US counties using data from the Behavioral Risk Factor Surveillance System (BRFSS). The framework combines the three conventional approaches to small area measurement: (1) pooling data across time by combining multiple survey years; (2) exploiting spatial correlation by including a spatial component; and (3) utilizing structured relationships between the outcome variable and domain-specific covariates to define four increasingly complex model types - coined the Naive, Geospatial, Covariate, and Full models. The validation framework uses direct estimates of prevalence in large domains as the gold standard and compares model estimates against it using (i) all available observations for the large domains and (ii) systematically reduced sample sizes obtained through random sampling with replacement. At each sampling level, the model is rerun repeatedly, and the validity of the model estimates from the four model types is then determined by calculating the (average) concordance correlation coefficient (CCC) and (average) root mean squared error (RMSE) against the gold standard. The CCC is closely related to the intraclass correlation coefficient and can be used when the units are organized in groups and when it is of interest to measure the agreement between units in the same group (e.g., counties). The RMSE is often used to measure the differences between values predicted by a model or an estimator and the actually observed values. It is a useful measure to capture the precision of the model or estimator.
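The two validity metrics and the downsampling step described above can be illustrated with a minimal sketch. It assumes Lin's standard definition of the CCC and uses a plain sample mean as a stand-in for the small area model; the paper reruns its four model types at each sampling level, which is not reproduced here. The function names, the `responses_by_domain` input, and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def ccc(estimates, gold):
    """Concordance correlation coefficient (Lin's definition, assumed here)."""
    x = np.asarray(estimates, dtype=float)
    y = np.asarray(gold, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    # Population variances (ddof=0). The result is NaN if both inputs are
    # constant and have equal means (0/0).
    return float(2 * cov / (x.var() + y.var() + (mx - my) ** 2))


def rmse(estimates, gold):
    """Root mean squared error of the estimates against the gold standard."""
    x = np.asarray(estimates, dtype=float)
    y = np.asarray(gold, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def validate_by_downsampling(responses_by_domain, sample_size, n_reps, seed=0):
    """Average CCC and RMSE of reduced-sample estimates against the gold standard.

    responses_by_domain: dict mapping a large domain (e.g. a county) to an
    array of 0/1 survey responses. The gold standard is the direct estimate
    from all observations. Each repetition draws `sample_size` responses per
    domain with replacement and estimates prevalence by the sample mean,
    which is only a placeholder for a fitted small area model.
    """
    rng = np.random.default_rng(seed)
    domains = sorted(responses_by_domain)
    gold = np.array([np.mean(responses_by_domain[d]) for d in domains])
    cccs, rmses = [], []
    for _ in range(n_reps):
        est = np.array([
            rng.choice(np.asarray(responses_by_domain[d]),
                       size=sample_size, replace=True).mean()
            for d in domains
        ])
        cccs.append(ccc(est, gold))
        rmses.append(rmse(est, gold))
    return float(np.mean(cccs)), float(np.mean(rmses))
```

Calling `validate_by_downsampling` at several values of `sample_size` gives the kind of CCC and RMSE curves the framework uses to compare estimators as data become sparser.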
All model types have substantially higher CCC and lower RMSE than the direct, single-year BRFSS estimates. In addition, the inclusion of relevant domain-specific covariates generally improves predictive validity, especially at small sample sizes, and their leverage can be equivalent to a five- to tenfold increase in sample size.
Small area estimation of important health outcomes and risk factors can be improved using a systematic modeling and validation framework, which consistently outperformed single-year direct survey estimates and demonstrated the potential leverage of including relevant domain-specific covariates compared to pure measurement models. The proposed validation strategy can be applied to other disease outcomes and risk factors in the US as well as to resource-scarce situations, including low-income countries. These estimates are needed by public health officials to identify at-risk groups, to design targeted prevention and intervention programs, and to monitor and evaluate results over time.


Available from: Ali Mokdad, Sep 26, 2015
    • "We applied previously described small area models to estimate the prevalence of cigarette smoking for US counties [21-23]. In brief, we constructed a family of logistic hierarchical mixed effects regression models for each outcome, stratified by sex. "
    ABSTRACT: Cigarette smoking is a leading risk factor for morbidity and premature mortality in the United States, yet information about smoking prevalence and trends is not routinely available below the state level, impeding local-level action. We used data on 4.7 million adults age 18 and older from the Behavioral Risk Factor Surveillance System (BRFSS) from 1996 to 2012. We derived cigarette smoking status from self-reported data in the BRFSS and applied validated small area estimation methods to generate estimates of current total cigarette smoking prevalence and current daily cigarette smoking prevalence for 3,127 counties and county equivalents annually from 1996 to 2012. We applied a novel method to correct for bias resulting from the exclusion of the wireless-only population in the BRFSS prior to 2011. Total cigarette smoking prevalence varies dramatically between counties, even within states, ranging from 9.9% to 41.5% for males and from 5.8% to 40.8% for females in 2012. Counties in the South, particularly in Kentucky, Tennessee, and West Virginia, as well as those with large Native American populations, have the highest rates of total cigarette smoking, while counties in Utah and other Western states have the lowest. Overall, total cigarette smoking prevalence declined between 1996 and 2012 with a median decline across counties of 0.9% per year for males and 0.6% per year for females, and rates of decline for males and females in some counties exceeded 3% per year. Statistically significant declines were concentrated in a relatively small number of counties, however, and more counties saw statistically significant declines in male cigarette smoking prevalence (39.8% of counties) than in female cigarette smoking prevalence (16.2%). Rates of decline varied by income level: counties in the top quintile in terms of income experienced noticeably faster declines than those in the bottom quintile. 
County-level estimates of cigarette smoking prevalence provide a unique opportunity to assess where prevalence remains high and where progress has been slow. These estimates provide the data needed to better develop and implement strategies at a local and at a state level to further reduce the burden imposed by cigarette smoking.
    Population Health Metrics 03/2014; 12(1):5. DOI:10.1186/1478-7954-12-5 · 2.11 Impact Factor
    • "Estimating health outcomes for small areas is challenging as researchers are faced with large stochastic fluctuations due to small numbers of events or small numbers sampled. Commonly used methods to deal with these issues include pooling multiple years of data, borrowing strength across geospatial units, or using structured relationships with covariates [17]. Kulkarni et al. proposed a method for county life table estimation that integrates these three approaches [11], which we use here. "
    ABSTRACT: The United States spends more than any other country on health care. The poor relative performance of the US compared to other high-income countries has attracted attention and raised questions about the performance of the US health system. An important dimension to poor national performance is the large disparities in life expectancy. We applied a mixed effects Poisson statistical model and Gaussian Process Regression to estimate age-specific mortality rates for US counties from 1985 to 2010. We generated uncertainty distributions for life expectancy at each age using standard simulation methods. Female life expectancy in the United States increased from 78.0 years in 1985 to 80.9 years in 2010, while male life expectancy increased from 71.0 years in 1985 to 76.3 years in 2010. The gap between female and male life expectancy in the United States was 7.0 years in 1985, narrowing to 4.6 years in 2010. For males at the county level, the highest life expectancy steadily increased from 75.5 in 1985 to 81.7 in 2010, while the lowest life expectancy remained under 65. For females at the county level, the highest life expectancy increased from 81.1 to 85.0, and the lowest life expectancy remained around 73. For male life expectancy at the county level, there have been three phases in the evolution of inequality: a period of rising inequality from 1985 to 1993, a period of stable inequality from 1993 to 2002, and rising inequality from 2002 to 2010. For females, in contrast, inequality has steadily increased during the 25-year period. Compared to only 154 counties where male life expectancy remained stagnant or declined, 1,405 out of 3,143 counties (45%) have seen no significant change or a significant decline in female life expectancy from 1985 to 2010. In all time periods, the lowest county-level life expectancies are seen in the South, the Mississippi basin, West Virginia, Kentucky, and selected counties with large Native American populations. 
The reduction in the number of counties where female life expectancy at birth is declining in the most recent period is welcome news. However, the widening disparities between counties and the slow rate of increase compared to other countries should be viewed as a call for action. An increased focus on factors affecting health outcomes, morbidity, and mortality such as socioeconomic factors, difficulty of access to and poor quality of health care, and behavioral, environmental, and metabolic risk factors is urgently required.
    Population Health Metrics 07/2013; 11(1):8. DOI:10.1186/1478-7954-11-8 · 2.11 Impact Factor
    • "Model selection and validation was performed using the approach outlined by Srebotnjak et al wherein county-level predictions are validated against a pooled gold standard [21]. We used this approach to perform variable selection as well as to determine the likely performance of the small area models in sparsely populated counties. "
    ABSTRACT: Hypertension is an important and modifiable risk factor for cardiovascular disease and mortality. Over the last decade, national levels of controlled hypertension have increased, but little information on hypertension prevalence and trends in hypertension treatment and control exists at the county level. We estimate trends in prevalence, awareness, treatment, and control of hypertension in US counties using data from the National Health and Nutrition Examination Survey (NHANES) in five two-year waves from 1999-2008 including 26,349 adults aged 30 years and older and from the Behavioral Risk Factor Surveillance System (BRFSS) from 1997-2009 including 1,283,722 adults aged 30 years and older. Hypertension was defined as systolic blood pressure (BP) of at least 140 mm Hg, self-reported use of antihypertensive treatment, or both. Hypertension control was defined as systolic BP less than 140 mm Hg. The median prevalence of total hypertension in 2009 was estimated at 37.6% (range: 26.5 to 54.4%) in men and 40.1% (range: 28.5 to 57.9%) in women. Within-state differences in the county prevalence of uncontrolled hypertension were as high as 7.8 percentage points in 2009. Awareness, treatment, and control were highest in the southeastern US, and increased between 2001 and 2009 on average. The median county-level control in men was 57.7% (range: 43.4 to 65.9%) and in women was 57.1% (range: 43.0 to 65.46%) in 2009, with highest rates in white men and black women. While control of hypertension is on the rise, prevalence of total hypertension continues to increase in the US. Concurrent increases in treatment and control of hypertension are promising, but efforts to decrease the prevalence of hypertension are needed.
    PLoS ONE 04/2013; 8(4):e60308. DOI:10.1371/journal.pone.0060308 · 3.23 Impact Factor