Electronic copy available at: http://ssrn.com/abstract=1538429
Neighborhood Archetypes for Population Health Research: Is There No Place Like Home?
MARGARET M. WEDEN
CHLOE E. BIRD
JOSE J. ESCARCE
This paper series made possible by the NIA funded RAND Center for the Study
of Aging (P30AG012815) and the NICHD funded RAND Population Research
This product is part of the RAND
Labor and Population working
paper series. RAND working papers
are intended to share researchers’
latest findings and to solicit informal
peer review. They have been approved
for circulation by RAND Labor and
Population but have not been
formally edited or peer reviewed.
Unless otherwise indicated, working
papers can be quoted and cited
without permission of the author,
provided the source is clearly referred
to as a working paper. RAND’s
publications do not necessarily
reflect the opinions of its research
clients and sponsors.
RAND® is a registered trademark.
Neighborhood Archetypes for Population Health Research: Is There No Place Like Home?
Margaret M. Weden, PhD1; Chloe E. Bird, PhD1; José J Escarce, MD, PhD1,2; Nicole
Lurie, MD, MSPH3
1RAND Corporation; 2University of California, Los Angeles; 3U.S. Department of Health
and Human Services. We acknowledge the contributions of Ricardo Basurto-Davila and
Adria Dobkin for helping prepare the data; Paul Steinberg for editing; and Richard
Carpiano, Stephanie Robert, and Erin Ruel for helpful suggestions on earlier versions of
the manuscript. This work was supported by a grant from the National Institute on
Environmental Health Sciences (P50 ES012383).
The principal objective of this study is to characterize the places in which people live by
factors associated with physical and mental wellbeing. We demonstrate a new approach
that employs neighborhood measures such as social environment, built environment,
commuting and migration, and demographics and household composition to classify
neighborhoods into archetypes. The number of neighborhood archetypes, their defining
attributes, and their stability or change between 1990 and 2000 are analyzed using latent class
analysis applied to a rich array of data sources. In both years, six archetypes of U.S.
neighborhoods are differentiated, with prevalences ranging from 13% to 20%: Mobile
single-household, urbanites; Low SES, rural; Poor, urban, minority; Low SES, urban,
minority commuters; High SES, foreign born, new home owners; and Middle-class
suburban/exurban families. Findings show that neighborhoods have remained notably
constant between 1990 and 2000, with 76.4% of the neighborhoods categorized as the
same archetype ten years later. The approach to defining neighborhood archetypes
translates the theoretical aspects of research on neighborhoods and health into a
measurement typology that can be employed in applied research questions such as public
health surveillance and planning and which can be replicated and extended for use in
other historical, geographical, and substantive applications.
Research on neighborhoods and health is motivated by the idea that we live in places that
represent more than physical locations. They are also the manifestation of the social,
cultural, political and geographic cleavages that shape a constellation of risks and
resources. Research on neighborhood effects has reconnected public health with its
earlier population foundations—showing that the social ecology and built environments
are important “upstream” determinants of chronic and infectious disease. This work
documents how social and built environments structure opportunities and barriers to more
proximal social and material determinants of health (Sampson et al. 2002; Cummins et al.
Neighborhoods and health research draws heavily on theory and methodologies from
Chicago School factorial social ecology, which relied on Census data and measures and
developed factor analysis (e.g., Janson 1980; Schwirian 1983; for a critique see
Sampson et al. 2002). This approach conceptualized four primary axes of neighborhood
structure—class, race/ethnicity, density, and life-course stage. The theory and methods
also informed the most commonly employed measures of neighborhoods for
neighborhoods and health research (Sampson et al. 2002).
In this paper, we reconsider the models and measures of neighborhoods that emerged
from the Chicago School factorial social ecology and explore whether there have been
changes in the social ecology of neighborhoods since the four primary cleavages were
identified. We address questions raised in literature reviews on the characteristics of US
neighborhoods, the relevance of the built environment, and the dynamics of
neighborhoods over time (Diez Roux 2001; Sampson et al. 2002; Robert et al.
forthcoming). We develop a new, complementary theoretical and methodological
approach to study neighborhoods that employs latent class analysis to characterize
neighborhood archetypes and assess stability or change. In so doing, we produce a
reliable measure of U.S. neighborhood archetypes that can be employed in future
research on neighborhoods and health.
Neighborhood Social Ecology and Constructs
Researchers have identified how economic, social, demographic, geographic, structural,
and institutional conditions of a neighborhood coalesce to influence physical and mental
wellbeing. While some studies highlight specific neighborhood characteristics—e.g.
neighborhood poverty (Haan et al. 1987), racial and ethnic concentration (Collins &
Williams 1999), or urbanization (Galea & Vlahov 2005)—most indicate that multiple
factors affect neighborhood characteristics. Multidimensional constructs have been used
to examine how socioeconomic affluence or disadvantage, and social disorganization
affect individuals’ health and social wellbeing (e.g., Sampson et al. 2002; Cummins et
al. 2007). Examples of indicators include neighborhood socioeconomic status (NSES),
(concentrated) neighborhood affluence and/or neighborhood disadvantage, and
neighborhood social cohesion, collective efficacy and social disorganization (Sampson et
Neighborhood indicators employed in population health research typically measure social
context using poverty rates, percentage of residents receiving public assistance,
percentage of female-headed families, unemployment ratio, and percentage of African
American residents (for review see Browning & Cagney 2002). Studies have also
extended the NSES construct to describe how neighborhood disadvantage and
neighborhood affluence exert independent effects on individual health, highlighting the
relevance of high income, education, and occupational status (e.g., Weden et
Similarly, work on structural aspects of neighborhoods and the relevance of social
disorganization (e.g., as indicated by boarded-up buildings, vacancy rates, and residential
turnover; Wilson & Kelling 1982) extends a long history of research (beginning with
seminal works by Durkheim and Simmel) that relates rural-urban differentials and
industrialization to social disorganization and, through it, to physical and mental health (see
review by Vlahov & Galea 2002). Neighborhood residential turnover has also been linked to
poor child development, problem behavior, and health-related risks (for review see
Jelleyman & Spencer 2008).
Additionally, urban planning, urban studies and social ecology provide a theoretical
foundation linking social and physical dimensions of the neighborhood (e.g. see reviews
by Vlahov 2002; Corburn 2004). Research on the neighborhood life cycle links shifts in
the demographic composition of communities to changing land-use patterns (i.e. from
residential to commercial; see Downs 1981). Sociological research links residential
turnover and deterioration of physical infrastructure to social disorganization (e.g.
Sampson and Groves 1989).
Recent studies have refocused attention on built environment factors that support active
lifestyles and reduce the risk of chronic disease, such as land use, commuting patterns, and
walkability (see reviews by Frumkin 2003; Srinivasan et al. 2003; Galea et al. 2005). Yet
few studies have reconsidered the linkages between neighborhood social ecology and
the built environment in light of current population health dynamics in chronic disease (see
reviews by Diez Roux 2001; Galea et al. 2005). One notable exception is research on social
capital and the built environment as it relates to physical activity and obesity (e.g. Leyden
2003; Poortinga 2006; Wood & Giles-Corti 2008; Cohen et al. 2008).
Theoretical Motivation for Neighborhood Archetypes
The first wave of studies on neighborhoods and health focused on showing that
‘neighborhoods matter’ and have independent effects beyond individual socioeconomic
characteristics. These studies argued that neighborhoods influence health and behavior
through mechanisms such as collective socialization, peer-group influence, and
institutional capacity. The second wave of studies on neighborhoods and health evaluated
these mechanisms with latent measures of neighborhood characteristics (such as level of
segregation, collective social and economic capacity, or social disorganization) (Sampson
et al. 2002). In this work, factor analysis or structural equation models are used to create
scales for these characteristics and identify a continuum of sociodemographic
disadvantage or affluence on which neighborhoods are located. We call this approach a
‘variable perspective’ to neighborhood research.
Although the variable perspective is useful for answering questions about the
independent effect of specific neighborhood characteristics controlling for individual
characteristics, it is not as well suited to studying how various aspects of neighborhoods
combine to affect health and whether and how the effects differ over the life course.
Rather than being defined by a single dimension, neighborhoods are the synthesis of
different combinations of social, economic, demographic, structural and geographic
conditions, which affect individuals’ lives and health. For example, the impact of
neighborhood poverty depends on the community’s level of urbanization, age
composition, and degree of segregation (Jargowsky 1997; Boardman et al. 2005).
Similarly, neighborhood socioeconomic disadvantage is associated with and can be
exacerbated by environmental risk factors including pollution and environmental hazards
(Cutter & Scott 2000; Ponce et al. 2005).
To date, most work has employed this ‘variable perspective’. For example, Boardman
and colleagues (2005) model the interacting effects of poverty and racial segregation,
addressing two of the four axes of neighborhood in the foundational work on
neighborhood social ecology (Janson 1980; Schwirian 1983). The problem with the variable
approach becomes evident when additional axes are considered (e.g., a simple model with
single dichotomous indicators for each of the four axes requires 4 main effects and 11
interactions), and even more so when additional dimensions of the neighborhood (e.g. the
built environment) are considered. Adequate statistical power for interactions between all
of these neighborhood variables quickly becomes unattainable. The analytical problem of
the ‘variable perspective’ to neighborhood research is analogous to the ‘variable
problem’ in life course research described previously by Singer and colleagues (1998).
Rather than producing neighborhood measures that decompose residential environments
into their parts, we propose identifying archetypal neighborhoods that describe this
interacting synthesis of components.
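The growth in the number of interaction terms can be made concrete with a short count. The sketch below assumes a full-factorial specification with one dichotomous indicator per axis; the function names are illustrative and are not part of the original analysis.

```python
from math import comb


def interaction_terms(k: int) -> int:
    """Count interaction terms of order 2 through k among k dichotomous indicators."""
    return sum(comb(k, order) for order in range(2, k + 1))


def full_factorial_cells(k: int) -> int:
    """Count the distinct indicator combinations that the data must populate."""
    return 2 ** k


# Number of axes, interaction terms, and cells as dimensions are added.
for k in (2, 4, 6, 8):
    print(k, interaction_terms(k), full_factorial_cells(k))
```

Under this counting, four axes give 6 two-way, 4 three-way, and 1 four-way interaction; tallies that differ from this usually reflect a different convention about which terms are counted. Each added axis roughly doubles the number of terms, which is why statistical power is exhausted so quickly.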
Addressing Outstanding Questions about Neighborhood Classification with LCA
There are two areas of research on neighborhoods and health that our approach is well
designed to extend. The first pertains to the interactions between different conceptual
dimensions of the neighborhood. Interactions between conceptual dimensions can be
studied in social science by characterizing archetypes. The empirical method of latent
class analysis (LCA) was designed for characterizing archetypes (e.g., Hagenaars &
Halman 1989) and has been used extensively in social, behavioral, and health research
(Bollen 2002). The second area of research pertains to temporal dynamics including
neighborhood change (e.g. gentrification, racial succession) and neighborhood life cycles
(e.g., Schwirian 1983; Sampson 2002; Robert et al. forthcoming).
To date, LCA has not been applied to neighborhood characterization for population
health research, yet the approach offers distinct analytic advantages over alternative
methods previously employed (e.g., factor analytic methods, including structural equation
modeling (SEM), and cluster analysis techniques). The advantages of LCA are reviewed
elsewhere (Rapkin et al. 1993; Chow 1998), and pertain directly to two new areas of
research on neighborhoods and health. First, LCA can measure how constellations of
characteristics capture distinct neighborhood archetypes. These constellations of
characteristics are described by ‘interactions’ between neighborhood dimensions. Thus
LCA allows a researcher to identify the most statistically robust set of interactions
between dimensions as a constellation of characteristics that describe the places of
interest. Second, like factor analytic methods (e.g., factor analysis, SEM), LCA allows
one to assess the stability or change of neighborhood archetypes independently of the
measurement of the neighborhood archetypes at a given point in time.
Unlike those methods, however, LCA can be used not only to
statistically test whether the distribution of neighborhood archetypes in a population
changes over time, but also to classify neighborhoods discretely into
archetypes at different points in time. Once individual neighborhoods are discretely
characterized, questions about the life cycles of individual neighborhoods can be
explored. In summary, at its minimum, LCA is a data reduction mechanism similar to
cluster analysis. In its full application, LCA becomes a powerful tool for the
characterization of neighborhood archetypes and analysis of neighborhood change.
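Once each tract carries a discrete archetype label at two time points, questions about stability reduce to a cross-tabulation. The following is a minimal sketch under that assumption; the labels and function names are hypothetical and do not come from the study's data.

```python
from collections import Counter


def transition_counts(labels_t0, labels_t1):
    """Cross-tabulate archetype labels for the same tracts at two time points."""
    if len(labels_t0) != len(labels_t1):
        raise ValueError("label sequences must cover the same tracts")
    return Counter(zip(labels_t0, labels_t1))


def share_stable(labels_t0, labels_t1):
    """Fraction of tracts assigned to the same archetype at both time points."""
    if not labels_t0:
        raise ValueError("at least one tract is required")
    counts = transition_counts(labels_t0, labels_t1)
    same = sum(n for (first, second), n in counts.items() if first == second)
    return same / len(labels_t0)


# Hypothetical archetype labels for five tracts (illustration only).
labels_1990 = ["A", "A", "B", "C", "C"]
labels_2000 = ["A", "B", "B", "C", "C"]
print(transition_counts(labels_1990, labels_2000))
print(share_stable(labels_1990, labels_2000))
```

The off-diagonal entries of the cross-tabulation are the transitions between archetypes, which is the raw material for studying neighborhood life cycles.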
Data and Sample
Data on U.S. neighborhoods come from a neighborhood characteristics database
compiled and disseminated by RAND Corporation
(http://www.rand.org/health/centers/pophealth/data.html). The neighborhood
characteristics database contains a rich array of contextual data from the 1990 and 2000
Decennial Census, the Census Topologically Integrated Geographic Encoding and
Referencing (TIGER/Line) files, the Environmental Protection Agency Air Quality
System, and the American Chamber of Commerce Researchers Association, all of which have been compiled,
harmonized, and documented by RAND. U.S. neighborhoods are defined at the
geographical level of the census tract, with harmonization for changes in tract definitions
between 1990 and 2000. Models are estimated using 20% random samples of the
complete set of U.S. census tracts in each year, so that 12,252 tracts are observed in 1990
and 13,261 tracts are observed in 2000.
Neighborhood Characteristics by Domain
We select indicators of neighborhood characteristics that: (1) are theoretically
related to population health; (2) entail previously validated variables of the social and
built environment; and (3) were measured identically in 1990 and 2000. Specific
variables fall into four domains: built environment, migration and commuting,
socioeconomic composition, and demographics and household composition.
Table 1 details how categorical indicators were constructed for each neighborhood
measure described below. Refinement of the indicators was conducted in conjunction
with refinement of the measurement models described below and in Appendix 2. This
included sensitivity analyses to evaluate the improvement of models achieved by
recategorizing continuous variables as categorical or dichotomous indicators, combining
indicators, and dropping redundant indicators (see Appendix 2).
Built Environment
Urbanization is measured using density (population per square kilometer) and urbanicity
(% rural), categorized as exclusively rural (100% rural), exclusively urban (0% rural),
or mixed (suburban, exurban, or urbanizing). Land-use patterns are
measured via mean block size, the number of intersections or “nodes,” and two measures
of walkability (the gamma and alpha indices of street connectivity; Taaffe and Gauthier
1973). The quality and upkeep of neighborhood infrastructure are measured via the mean
value of the housing stock, percent owner-occupied dwellings, the mean housing
construction date, and percent vacant dwellings. Air quality (expressed by the
concentration of particulate matter smaller than 10 micrometers, or PM10) is included as
an indicator of environmental pollution, using a threshold for health-compromising levels
of 50 µg/m³ PM10 (from yearly averages) based on previous findings (Daniels et al.
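The two connectivity indices have standard planar-graph definitions, sketched below. This assumes a connected street network with a given number of intersections (nodes) and street segments (links); whether the RAND database computes the indices in exactly this form is not stated in the text, and the function names are illustrative.

```python
def gamma_index(n_links: int, n_nodes: int) -> float:
    """Observed links as a share of the maximum possible in a planar network."""
    if n_nodes < 3:
        raise ValueError("the gamma index needs at least 3 nodes")
    return n_links / (3 * (n_nodes - 2))


def alpha_index(n_links: int, n_nodes: int) -> float:
    """Observed circuits as a share of the maximum possible in a connected planar network."""
    if n_nodes < 3:
        raise ValueError("the alpha index needs at least 3 nodes")
    return (n_links - n_nodes + 1) / (2 * n_nodes - 5)


# A 3-by-3 street grid: 9 intersections joined by 12 street segments.
print(round(gamma_index(12, 9), 3), round(alpha_index(12, 9), 3))
```

Both indices run from 0 to 1. Higher values indicate a more connected, grid-like street pattern, while a branching network with no loops has an alpha index of 0.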
Migration and Commuting
Indicators of internal and external migratory patterns capture the relevance of instability
in the home environment (Wyly 1999) and are measured through residency and housing
turnover. Commuting patterns are a new dimension of the neighborhood in population
health research that has been related to opportunities for physical activity and exposure to
psychosocial stress (for review see Hamer & Chida 2008). The indicators of commuting
are divided according to the length of commute and are complemented by the mode of
transportation to work.
Socioeconomic Composition
Socioeconomic composition captures an axis of the social ecology model that has been
more recently validated in measures of NSES, neighborhood affluence, and neighborhood
disadvantage, as described earlier. The indicators entail educational attainment, labor
force characteristics, and economic characteristics.
Demographic and Household Composition
The demographic and household composition domain captures the last two axes of
the social ecology model: racial/ethnic composition and life-cycle stage. The
demographic composition is described by race and ethnicity, native language, and age.
Household composition refers to household structure and is captured through the
proportion of singles, large families, and female-headed households.
Latent Class Model
LCA models are used to identify, characterize, and measure the latent, unobserved
categorical variable for the neighborhood archetypes. Neighborhood archetypes are
modeled separately for 1990 and 2000, and then a multigroup LCA model is fit to the
data from both years combined. The multigroup LCA model can be used to assess
whether the distribution of neighborhood archetypes and their characterization changes
over time. On the basis of findings from the LCA models for 1990 and 2000, we hold
characterization of the neighborhood archetypes constant in the final multigroup LCA
model, and we test whether the distribution of neighborhoods across neighborhood
archetypes changes between 1990 and 2000.
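For readers who want the model in symbols, the following is the standard multigroup latent class formulation. The notation is illustrative and not taken from the paper, whose own specification is given in Appendix 1. Tract i observed in year g has J categorical indicators, indicator j has R_j categories, and C is the number of archetypes.

```latex
P(\mathbf{Y}_i = \mathbf{y}_i \mid G_i = g)
  = \sum_{c=1}^{C} \gamma_{c \mid g}
    \prod_{j=1}^{J} \prod_{r=1}^{R_j}
    \rho_{jr \mid c,g}^{\,\mathbb{1}(y_{ij} = r)},
\qquad
\sum_{c=1}^{C} \gamma_{c \mid g} = 1,
\quad
\sum_{r=1}^{R_j} \rho_{jr \mid c,g} = 1 .
```

Here \gamma_{c|g} is the prevalence of archetype c in year g, and \rho_{jr|c,g} is the probability that a tract of archetype c falls in category r of indicator j. In this notation, holding the characterization of the archetypes constant across years corresponds to the restriction \rho_{jr|c,1990} = \rho_{jr|c,2000}. A test of distributional change then compares a model in which \gamma_{c|1990} = \gamma_{c|2000} with one in which the prevalences are free to differ.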
The LCA models are fit using observed data on the indicators of the built environment,
migration and commuting, socioeconomic composition, and demographic and household
composition described earlier (see also Table 1). The data are used to characterize the
unobserved latent variable for the neighborhood archetypes. The structural
component of the LCA models (e.g., the number of neighborhood archetypes) and the
measurement component (e.g., the characteristics of the
neighborhood archetypes) are refined iteratively until the best fitting LCA model is
identified (Hagenaars & McCutcheon 2002). Goodness of fit statistics (e.g., the
Lo-Mendell-Rubin likelihood ratio test, the Bayesian Information Criterion, and entropy
measures) and statistical tests of significance for model parameters are used in the
refinement of the structural and measurement models (Hagenaars 1990; Ramaswamy et al.
1993; Lo et al. 2001).
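Two of the fit statistics named above have simple closed forms, sketched here. The sketch assumes the usual definitions: BIC = -2 ln L + p ln n, and the relative entropy measure associated with Ramaswamy et al. (1993). The function names are illustrative, and the Lo-Mendell-Rubin test is not reproduced because it requires comparing fitted nested models.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def relative_entropy(posteriors) -> float:
    """Classification certainty on a 0-1 scale from posterior class probabilities.

    posteriors[i][c] is the probability that observation i belongs to class c.
    A value of 1 means every observation is assigned with certainty.
    """
    n_obs = len(posteriors)
    n_classes = len(posteriors[0])
    if n_classes < 2:
        raise ValueError("relative entropy needs at least 2 classes")
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the limit of p * ln(p) as p goes to 0 is 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_obs * math.log(n_classes))


print(bic(log_likelihood=-100.0, n_params=5, n_obs=100))
print(relative_entropy([[0.9, 0.1], [0.2, 0.8]]))
```

When candidate models with different numbers of classes are compared, the BIC penalizes the extra parameters of each added class, while the entropy measure indicates how cleanly the tracts separate into the classes.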
Mplus software version 4.2 (Mplus Version 4.2. 2006) is used to fit the LCA models,
accounting for observations with missing values on some variables. It is also employed to predict
latent class membership using the findings from the final LCA multigroup model. These
findings allow us to produce a dataset in which every census tract in the U.S. has been
probabilistically assigned to the best fitting neighborhood archetype.
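The fitting and assignment steps can be illustrated with a small expectation-maximization (EM) routine. This is a sketch under simplifying assumptions, not the Mplus procedure used in the study: it handles only binary indicators, requires complete data, and runs on simulated values. All names in it are hypothetical.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=100, seed=0):
    """EM for a latent class model with binary (0/1) indicators and complete data.

    Returns (prevalence, item_prob, posterior); posterior[i][c] is the
    probability, from the final E-step, that observation i is in class c.
    """
    rng = random.Random(seed)
    n_obs, n_items = len(data), len(data[0])
    prevalence = [1.0 / n_classes] * n_classes
    # Random starting values break the symmetry between classes.
    item_prob = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
                 for _ in range(n_classes)]
    posterior = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities under the current parameters.
        posterior = []
        for row in data:
            log_joint = []
            for c in range(n_classes):
                value = math.log(prevalence[c])
                for j, y in enumerate(row):
                    p = item_prob[c][j]
                    value += math.log(p if y == 1 else 1.0 - p)
                log_joint.append(value)
            top = max(log_joint)  # subtract the maximum for numerical stability
            weights = [math.exp(v - top) for v in log_joint]
            total = sum(weights)
            posterior.append([w / total for w in weights])
        # M-step: update class prevalences and item-response probabilities.
        for c in range(n_classes):
            weight_c = max(sum(post[c] for post in posterior), 1e-12)
            prevalence[c] = weight_c / n_obs
            for j in range(n_items):
                hits = sum(post[c] * row[j] for post, row in zip(posterior, data))
                item_prob[c][j] = min(max(hits / weight_c, 1e-6), 1.0 - 1e-6)
    return prevalence, item_prob, posterior


# Simulated data (not from the study): two classes, four binary indicators.
sim = random.Random(1)
true_class = [sim.randrange(2) for _ in range(400)]
true_prob = [[0.9] * 4, [0.1] * 4]
data = [[1 if sim.random() < true_prob[c][j] else 0 for j in range(4)]
        for c in true_class]

prevalence, item_prob, posterior = fit_lca(data, n_classes=2)
# Modal assignment: each observation goes to its most probable class.
assigned = [max(range(2), key=lambda c: post[c]) for post in posterior]
print([round(p, 2) for p in prevalence])
```

The modal assignment in the last step mirrors the idea of placing each tract in its best fitting archetype, while the posterior probabilities retain the uncertainty of that placement. Class labels from EM are arbitrary, so class 0 in the fitted model need not correspond to class 0 in the simulation.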
Details of the LCA modeling approach are given in the appendices: the
statistical details of the LCA models (Appendix 1) and the refinement of the LCA models and
sensitivity analyses (Appendix 2).
How many neighborhood archetypes are there in 1990 and 2000?
Six neighborhood archetypes best summarize the patterns in the
neighborhood characteristics data, and this number does not change from
1990 to 2000. Table 2 reports findings on the number of neighborhood archetypes for the LCA
models fit to each year; in both years, a six-class model produces the best
goodness of fit statistics. The Lo-Mendell-Rubin
test demonstrates that the improvement in the fit of each of these LCA models is
statistically significant when contrasting the five- and six-class models, but not when
contrasting the fit of the six- and seven-class models.
It is important to note that these goodness-of-fit statistics on the number of neighborhood
archetypes (i.e. the ‘structural component’ of the model) are detailed for the final models
for each year identified after iteratively improving the structural and measurement
components of the LCA models as described in the methods section. Furthermore,
sensitivity analyses (in Appendix 2) validate that six neighborhood archetypes produce
the best fitting model in each year.