
The Impact of Poor Health on Education: New Evidence Using Genetic Markers



Weili Ding
Queen’s University
Steven F. Lehrer
Queen’s University and NBER
J. Niels Rosenquist
University of Pennsylvania and Massachusetts General Hospital
Janet Audrain-McGovern
University of Pennsylvania
January 2007
This paper examines the influence of health conditions on academic performance during
adolescence. To account for the endogeneity of health outcomes and their interactions with
risky behaviors, we exploit natural variation within a set of genetic markers across individuals.
We present strong evidence that these genetic markers serve as valid instruments with good
statistical properties for ADHD, depression and obesity. They help to reveal a new dynamism
from poor health to lower academic achievement with substantial heterogeneity in their impacts
across genders. Our investigation further exposes the considerable challenges in identifying
health impacts due to the prevalence of comorbid health conditions and endogenous health
behaviors, with clear implications for the health economics literature.
We are grateful to seminar participants at the 2005 NBER Summer Institute, 2006 Society of Labor Economics
Annual Meeting, 2006 American Society of Health Economists Meeting, 2006 Canadian Economics Association
meeting, 3rd Economics and Human Biology Conference, 2006 TARGET Conference at University of British Columbia,
University of Toronto, University of Illinois - Chicago, Fudan University, Shanghai University of Finance and
Economics and BU/Harvard/MIT Health Economics seminar for helpful comments and suggestions. We would also like
to thank Paul Wileyto for answering our numerous questions about the data employed in the study. Lehrer wishes
to thank SSHRC for research support. Rosenquist wishes to thank AHRQ for support. We are responsible for all
1 Introduction
The discovery of the human genome, a sequence of approximately three billion chemical “letters”
that make up human DNA, the recipe of human life, is considered a milestone in the history of
science and medicine that might have the potential to influence social science research. Consider
the following question that has been investigated in the psychology, education, economics, sociology
and public health literatures: Does health status affect educational outcomes? While numerous
studies report that students who are obese or depressed perform poorly relative to their classmates,
factors other than health could be responsible for this repeatedly observed, but potentially spurious
association. To credibly claim that obesity and depression have a deleterious effect on student
performance in schools, one must first overcome the inherent endogeneity when considering health
and education. Further, accurate measures of health are difficult to obtain, and overcoming biases
arising from measurement error represents a second hurdle for applied researchers.
This study overcomes these challenges by considering an instrumental variables approach, where
the instruments are selected based on a growing body of evidence in several neuroscientific fields
that have identified genetic markers which possess significant associations with specific diseases
and health behaviors. While there has long been scientific evidence suggesting that the association
between genetic factors and health is substantial, only recently has it been possible to collect
measures of genetic markers. Since genetic markers are formed at conception, they are predetermined
to any outcomes, including those that occur during pregnancy and at birth. Genetic markers truly
fit the definition of “nature”. Using this “nature filter”, the health variables being instrumented
will be isolated from most nurture influences or choice-based inputs such as the schools parents choose
for their kids, the neighborhoods families select to reside in, and the peers kids choose to associate
with, among other factors that threaten the identification of education production function parameters.
When the variations in health variables that include clinical measures of depression, ADHD and obesity
are due only to the differences in genetic coding, these variations are much less likely to be
correlated with the environments surrounding an individual, allowing us to recover consistent estimates
of the impacts of a vector of health measures on academic performance.
While our identification
strategy relies on scientific findings, the results suggest that further study of social environments
might have to be invoked to understand the root of heterogeneous impacts of health on academic
performance, which seems to place the question squarely back under the realm of social sciences.
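
The logic of this identification strategy can be illustrated with a small simulation. The sketch below is not the estimator or the data used in this paper: it simulates a hypothetical binary marker, an unobserved environmental factor that affects both health and grades, and then compares a naive OLS slope with the just-identified instrumental variables (Wald) estimate, which with one instrument and one endogenous regressor equals cov(z, y) / cov(z, x). All variable names, coefficients and the sample size are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def simulate_and_estimate(n=20000, true_effect=-0.45, seed=0):
    """Return (OLS slope, IV slope) of GPA on health in simulated data.

    Every number here is an illustrative assumption, not an estimate
    taken from the paper.
    """
    rng = random.Random(seed)
    g, h, gpa = [], [], []
    for _ in range(n):
        marker = 1.0 if rng.random() < 0.3 else 0.0  # hypothetical marker, fixed at conception
        u = rng.gauss(0, 1)                          # unobserved environment (confounder)
        health = 0.8 * marker + 0.6 * u + rng.gauss(0, 1)
        grade = 3.0 + true_effect * health - 0.5 * u + rng.gauss(0, 0.5)
        g.append(marker)
        h.append(health)
        gpa.append(grade)
    # Naive OLS slope: biased because u moves both health and GPA.
    beta_ols = cov(h, gpa) / cov(h, h)
    # IV slope: uses only the variation in health that comes from the marker.
    beta_iv = cov(g, gpa) / cov(g, h)
    return beta_ols, beta_iv

if __name__ == "__main__":
    ols_slope, iv_slope = simulate_and_estimate()
    print(f"OLS: {ols_slope:.3f}  IV: {iv_slope:.3f}")
```

In this simulated setting the OLS slope overstates the harm from poor health because the confounder lowers grades and worsens health at the same time, while the IV slope stays close to the assumed true effect. The result depends on the simulated marker being unrelated to the confounder and affecting grades only through health, which are the same conditions the text requires of the genetic markers.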
Specifically, our empirical identification strategy is based on a large body of evidence in several
fields that explains the role of specific genes in the operation of a region of the brain along the
medial forebrain bundle which is responsible for reward and pleasure.
This region is distinct from
those that are known to process, develop and retain knowledge. Evidence that different regions of
the brain are activated by (or correlate with) different economic decisions has been found using fMRI
technology in studies of intertemporal choices (e.g. McClure, Laibson, Loewenstein and Cohen
[2004]). The growing evidence in the biomedical literature that presents a significant association
between certain genes in this reward system and particular health behaviors and health status such
as smoking, alcohol usage, obesity, ADHD, depression and schizophrenia cannot be denied.
It is worth stating explicitly that the goal of this analysis is not to report a causal link between
genes and health broadly defined. While we exploit the strong neural correlations between a set of
genetic markers and certain health outcomes and behaviors, we do not wish to delve into the often
complicated and sometimes controversial debate on how genes affect behavior. For example, the
popular press is occasionally filled with stories on the discovery of a gene that specifically codes for
obesity or depression that are often quickly refuted by medical authorities.
This study extends the burgeoning literature in economics that seeks to explain the strong
correlation between education and health in three directions.
First, we present empirical evidence
on a causal link running from health to academic performance. Due to biases associated with
omitted variables, few studies have either empirically estimated the causal impact of health on
education outcomes
or focused on mental health conditions, despite evidence that their incidence
is substantially larger than that of physical disorders in adolescence.
Exceptions include Currie and Stabile
[2005], which presents evidence from sibling fixed effects regressions that the negative impacts on test
scores and educational attainment from a specific mental disorder, hyperactivity, are quantitatively
larger than those from physical health limitations. Behrman and Lavy [1998] as well as Glewwe
and Jacoby [1995] use market instruments such as prices for health. They respectively find that
the impact of child health on cognitive achievement varies as a function of the assumptions made
concerning parental choices and that much of the impact of child health on school enrolment proxies
for unobserved variables. Using an experimental approach, Kremer and Miguel [2004] overcome the
omitted variable bias problem by randomly assigning health treatments to primary schools in Kenya.
Their analysis displays a mixed picture, as improved health from the treatment significantly reduced
school absenteeism but did not yield any gains in academic performance.
Second, we take a close look at empirical measures of health. The dynamic relationships between
health disorders and health behaviors revealed through our analysis clearly present a major empirical
challenge. This challenge has not been clarified earlier since, due to data limitations, the majority
of the literature linking health to education focuses on a single measure or proxy of an individual’s
health such as birth weight.
An individual’s health consists of many physical and mental
health measures, including standing heart rate, blood pressure, mental clarity, etc., that constitute
a rich vector. Not only would this vector be difficult to convert to a single index, but even if such a
single index existed, it is unlikely to be well proxied by measures such as BMI or birth weight.
Third, we make a clear separation of health outcomes from health behaviors. This distinction is
not apparent in earlier empirical studies, which estimate equations derived from models that treat
adolescents either exclusively as a "child" whose parents make all her health and education choices or
as indistinguishable from "adults" who make all the decisions by themselves. In contrast, we introduce
a model that treats adolescents as "adolescents", since they only make a subset of all the decisions.
For example, we postulate that a teenager would make decisions such as whether or not to smoke
or have sex, while their parents make important human capital investment decisions such as which
neighborhood to reside in, which school their child should be sent to, the type of health insurance
to purchase and the number of visits to health care providers. This hybrid in decision-making is not
only more realistic but helps disentangle the impact of health status (a state variable) from health
behavior (a control variable), which are treated as equivalent in the earlier literature. Since health
behaviors only explain a limited amount of the variation in health status, they are poor proxies
for health status and increase biases due to endogeneity, as they may also proxy for non-health
preferences such as peer group composition. Further, health behaviors could result from as well
as cause a particular health state, which has important policy implications. For example,
adolescents may decide to smoke since the nicotine in cigarettes may help self-medicate against
craving for food or some mental illnesses. Accounting for the pathway between health status and
health behavior is necessary for proper interpretation of our coefficient estimates and could reveal
their dynamism, which has been understudied in earlier work.
Our empirical analysis reaches four major conclusions:
1) Genetic markers show a great deal of promise as a means to identify the impact of health
on education. The individual markers and their two-by-two polygenic interactions that we consider
are highly correlated with each health behavior and status in the study. Moreover, consistent with
Mendel’s hypothesis that the hereditary factors for different genes are independent, statistical tests
demonstrate that these markers are not related to each other and only affect academic performance
through health outcomes.
Further, genetic markers offer several advantages as instrumental
variables since concerns regarding reverse causality and spurious correlations are greatly minimized.
While this strategy permits statistical identification, as we discuss in Section 5.2.3, our instrumental
variables estimates should be interpreted as reduced form parameters.
2) The impact of poor health outcomes on academic achievement is substantial. Depression and
obesity both lead to a decrease of 0.45 GPA points on average, which is roughly a one standard
deviation reduction. However, there is substantial heterogeneity in the impact of health on academic
performance across genders. The academic performance of female students is strongly and negatively
affected by poor physical and mental health conditions. The estimated magnitudes are substantially
smaller for male students, and not a single poor health measure has a statistically significant impact.
3) To accurately estimate the impact of health status, it is important to account for endogenous
health-enhancing or health-deteriorating behaviors. We find that treating the stock of lifetime
smoking as exogenous leads to substantially different impacts of adverse health status on education.
Cigarette smoking is endogenous and we find that accounting for this choice reduces the negative
impact of depression, inattention and ADHD by over 50% for the full sample and females. In
addition, ignoring the endogeneity of smoking makes the negative impact of depression on males
statistically significant.
4) The presence of high comorbidity of health disorders is striking, hence the importance of
accounting for it. Comorbidity is defined as having two or more diagnosable conditions at the
same time. For example, research has suggested that between 50 and 65 percent of children with
ADHD have one or more comorbid conditions such as depression (Pliszka et al. [1999]). Unless
the exogenous genetic or environmental factors can be clearly disentangled between these disorders,
estimating the causal impact of one disorder in the absence of related health states may not provide
accurate results. Since many individuals suffer from more than one disorder, ignoring related
illnesses may lead to some misleading conclusions. In our analysis, we find striking differences in
the estimated impacts of depression and obesity when one examines a single health state in isolation,
The rest of the paper is organized as follows. In Section II, we provide an overview of the
scientific literature linking genes to health behaviors and health outcomes. An overview of the data
we employ in this study is provided in Section III. The framework that guides our understanding
of how education and health interact in adolescence is described in Section IV. We also discuss the
identification strategy and estimating equations in that section. Our results are presented and
discussed in Section V. A concluding section summarizes our findings and discusses directions for
future research.
2 Scientific Primer on Genetic Markers
As it was not possible until recently to collect data on genetic markers, empirical researchers in the
social sciences traditionally chose either to ignore the unobserved heterogeneity conferred
by variation in genetic inheritance or to assume that it is fixed over time for the same individual or
across siblings or twins. Yet recent advances in the fields of molecular and behavioral genetics, most
notably through the decoding of the human genome (Venter et al. [2002]), permit researchers to elucidate
how differences in the genetic code correlate with differences in specific behaviors or outcomes across
individuals. While researchers were able to identify the genetic code for a number of inherited traits
and diseases such as eye color, cystic fibrosis, and Huntington’s disease, most products of inheritance
have been found to be polygenic, caused by the interaction of numerous genetic markers. The health
outcomes and behaviors we consider in this paper are thought to be polygenic, with researchers associating
approximately 160 genes with obesity (Perusse et al. [2005]).
For these disorders researchers have
focused their attention on genes involved in the reward pathway of the brain. This pathway is closely
linked to primal drives such as feeding and sex, and has been shown to have a powerful effect on
decision making among higher mammals including humans. For example, in a well-known study
(Olds [1956]), rats that were given the choice of food versus stimulation of their reward system by
electrodes ended up starving to death rather than lessening the stimulation of their pleasure center.
Since the reward system of the brain has been found to be closely linked to numerous human
activities such as addiction, much research has focused on how variation in different components
of the pathway might make an individual more or less predisposed to addiction. In general, this
system operates when activities such as feeding or sex are undertaken. A region of the brain known
as the ventral tegmental area (VTA) is activated and neurons (brain cells) in the VTA release
signaling molecules known as neurotransmitters (in this case dopamine) to another area of the
brain known as the nucleus accumbens (NA). These signals pass through the synapses (small gaps
separating neurons) until they eventually reach the frontal cortex, where most “decisions” are made.
Increases in the synapse of either neurotransmitters or receptor neurons for them allow for a much
stronger signal to be sent.
Since the response of these neurons to nicotine and other substances
has been shown to vary between individuals, it has been hypothesized that genetic differences could
explain why different individuals report different levels of “highs” when smoking cigarettes, which
is the underlying idea of having a genetic predisposition. In addition, since the VTA-NA pathway
is important in regulating pleasure and, therefore, emotion, a number of behavioral traits including
depression, food binging and ADHD have been linked to this pathway.
The genes selected in this study operate either in the liver or in one of the two critical
neurotransmission pathways in the reward pathway, the dopamine and serotonin systems. These markers
include i) the Dopamine Receptor D2 locus (DRD2), ii) the SLC6A3 locus (DAT), iii) the Tryptophan
hydroxylase locus (TPH) and iv) the CYP2B6 locus (CYP). Each person inherits from each parent a
single copy, known as an allele, for each marker. Alleles can differ by the particular building blocks, or
base pairs, that make up all DNA or by the number of repeats, or base pairs in a row that repeat
themselves. An individual who inherits two of the same (different) allele is considered to be homozygous
(heterozygous) for that marker. Different allelic combinations are often called polymorphisms.
The DRD2 gene is believed to code for the density of D2 dopamine receptors on neurons in the
brain, including those in the VTA and NA. The D2 receptor is one of at least five physiologically
distinct dopamine receptors (D1-D5) found on the synaptic membranes of neurons in the brain.
The DRD2-A1 allele has been associated with a reduced density of dopamine receptors.
Several
researchers postulate that the reduced density of dopamine receptors explains the higher associations
that individuals with DRD2-A1 alleles (A1/A1 or A1/A2) have with compulsive and addictive behaviors
including smoking, depression and obesity, relative to individuals with two DRD2-A2 alleles.
synaptic levels of dopamine in the brain.
Variability in the length of the DAT gene is believed to
positively influence levels of the reuptake protein in the brain.
Individuals with shorter variants of
the SLC6A3 gene have diminished dopamine reuptake and greater availability of synaptic dopamine.
It has been suggested that by having more synaptic dopamine these individuals receive smaller
benefits from substances that stimulate dopamine transmission.
The tryptophan hydroxylase gene (TPH) is a member of the serotonergic neurotransmission
system and plays a crucial role in the regulation of mood and impulsivity. This particular gene
is involved in the biosynthesis of serotonin, another neurotransmitter that operates in conjunction
with the brain’s reward system. Serotonin activity has been linked to a number of behavioral and
physical conditions including depression, appetite, and addictive behavior.
The CYP genes as a group code for enzymes present in various body organs, primarily the
liver, which break down a number of drugs and toxins including nicotine. Polymorphisms of these
genes have been linked to across-population differences in smoking, alcoholism, and response to
anti-depression medications.
Finally, different allelic combinations, when interacted, can potentially have powerful effects. For
example, the level of endogenous synaptic dopamine depends not only on the amount of dopamine
released but also on the number of receptors that dopamine can bind to (proxied by the DRD2 gene)
as well as the amount of reuptake protein (proxied by the length of the SLC6A3 allele). Similarly,
one could imagine that the rate of metabolism determined by the CYP2B6 gene interacts with both
the TPH and DRD2 genes.
This paper uses data primarily from the Georgetown Adolescent Tobacco Research (GATOR) study.
GATOR is a unique longitudinal data set of adolescents that combines information from a series of
five survey questionnaires given over four years of high school (1999-2003) along with measures of the
four genetic markers described in the preceding section.
The study began in 1999 when researchers selected five high schools from the same county in
Northern Virginia.
Within each school, administrators provided the names and mailing addresses
of the complete 9th grade class roster of students. Project information packets, consent forms, a
brief demographic/response form and an explanatory cover letter from the school principal were
then mailed to 2120 students’ homes to recruit study participants.
To increase participation rates,
up to three waves of mailings were sent and telephone calls were placed to encourage parents to
respond. Of the 72% of the parents/guardians (1533 of 2120) who responded to the mailings, three
quarters (1151) provided written consent for their adolescent to participate in the study. 99% of
the 1151 adolescents who had parental consent to participate provided assent themselves. These
mailings also asked the responding parent about their smoking history, age, gender, education, and
biological relationship to the survey participant.
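
The recruitment figures above can be checked with a few lines of arithmetic. The sketch below only uses the counts reported in the text (2120 mailed, 1533 responding, 1151 consenting); the function and key names are hypothetical.

```python
def recruitment_rates(mailed=2120, responded=1533, consented=1151):
    """Compute recruitment rates from the counts reported in the text."""
    return {
        "response_rate": responded / mailed,
        "consent_rate_among_responders": consented / responded,
        "consent_rate_among_mailed": consented / mailed,
    }

if __name__ == "__main__":
    for name, value in recruitment_rates().items():
        print(f"{name}: {value:.1%}")
```

The first two ratios reproduce the reported 72% response rate and the "three quarters" consent rate; the third shows that a little over half of the households originally contacted ended up with a consenting adolescent.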
Biological samples were collected using buccal swabs from which DNA was extracted via standard
phenol-chloroform techniques. DNA was extracted from buccal cells to avoid a selective exclusion
of subjects with blood and injection phobia. Since the method to genotype varies across markers,
different assays were conducted.
In all assays, 20% of the samples were repeated for quality
The survey questionnaires provide basic information on student demographic characteristics (e.g.
race and gender), academic performance as measured by GPA (waves 3-5 only), reports on physical
activity, and detailed information on smoking patterns and smoking history within the household and
across a complete set of family members.
Surveys were administered by a GATOR staff member to students who provided assent during a
classroom common to all students.
Participants were initially surveyed in the spring of 9th grade
and resurveyed in both the fall and spring of the 10th grade and in the spring of both the 11th and
12th grades. The rates of participation at the four follow-ups from baseline were about 95%, 96%,
93% and 89% respectively. Participants received $5 gift certificates to media stores to acknowledge
their time and participation in this study.
The GATOR data contains numerous questions on health and health behavior. Each survey
contained standard epidemiological questions related to self-reported experimentation with, and
current use of, cigarettes. Each participant who reported having smoked a cigarette provided
additional information on both recent and lifetime cigarette use. From this information, we constructed
two variables that represented whether an adolescent was currently smoking cigarettes and years
of being a cigarette smoker. A current smoker was defined as having smoked a cigarette within
the past month and over one hundred cigarettes over the lifetime. Using this information on being
a current smoker with self-reported smoking histories we constructed a conservative measure of
number of years of smoking.
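
As a minimal sketch of the current-smoker definition above, the function below flags a respondent who smoked within the past month and has smoked more than one hundred cigarettes over the lifetime. The argument names are hypothetical, and treating "the past month" as 30 days is an assumption; the survey's actual item wording is not given in the text.

```python
def is_current_smoker(days_since_last_cigarette, lifetime_cigarettes,
                      past_month_days=30, lifetime_threshold=100):
    """Apply the two-part current-smoker rule described in the text.

    days_since_last_cigarette is None for respondents who never smoked.
    past_month_days=30 is an assumed operationalization of "past month".
    """
    if days_since_last_cigarette is None:
        return False
    smoked_recently = days_since_last_cigarette <= past_month_days
    heavy_enough = lifetime_cigarettes > lifetime_threshold
    return smoked_recently and heavy_enough
```

Both conditions must hold, so an experimenter who smoked last week but has smoked only a handful of cigarettes is not counted as a current smoker.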
With the exception of the survey in the fifth wave, participants completed the Center for
Epidemiologic Studies-Depression Scale (CES-D), a 20-item self-report measure of depressive
symptoms. Items on the CES-D are rated along a 4-point Likert scale to indicate how frequently in the
past week each symptom occurred (0 = never or rarely; 3 = very often). The sum of these items
is calculated to provide a total score where higher scores indicate a greater degree of depressive
symptoms. To determine whether an individual may be depressed, we followed findings from
earlier research with adolescent samples (Roberts, Lewinsohn, and Seeley [1991]), which suggest using
gender and age appropriate dichotomous cutoff scores (> 24 for female adolescents, > 22 for male
adolescents) to ascertain the presence of clinically significant levels of depressive symptoms.
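
The scoring rule just described can be sketched as follows. The cutoffs (> 24 for females, > 22 for males), the 20 items and the 0 to 3 rating range come from the text; the function names are hypothetical, and the sketch assumes that any reverse-keyed items have already been recoded so that a higher rating always means more symptoms.

```python
CESD_CUTOFF = {"female": 24, "male": 22}  # cutoffs reported in the text

def cesd_total(items):
    """Sum 20 CES-D item ratings, each an integer from 0 to 3.

    Assumes reverse-keyed items were recoded before this call.
    """
    if len(items) != 20:
        raise ValueError("CES-D has 20 items")
    if any(rating not in (0, 1, 2, 3) for rating in items):
        raise ValueError("each rating must be 0, 1, 2 or 3")
    return sum(items)

def is_depressed(items, gender):
    """Return True when the total score strictly exceeds the gender cutoff."""
    return cesd_total(items) > CESD_CUTOFF[gender]
```

Because the cutoffs are strict inequalities, a total score of exactly 24 classifies a male respondent as depressed but not a female respondent.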
The Current Symptoms Scale-Self Report Form (CSSF), a well-standardized, 18-item self-report
measure, was used to assess symptoms of Attention-deficit/hyperactivity disorder (ADHD) from
DSM-IV (Barkley and Murphy [1998]) in the second wave survey.
This form allows participants
to rate their recent behavior regarding how often they experience symptoms of inattention (9 items)
and hyperactivity-impulsivity (9 items) on a 4-point Likert scale (0 = never or rarely; 3 = very
often). Typical diagnostic criteria (endorsement of at least moderate severity on at least six symptoms
from either the inattention or hyperactivity-impulsivity subscale) were used to determine the likely
presence or absence of clinically significant ADHD symptoms. In the final wave of the GATOR
survey participants provided self reports of their height and weight. These measures were used to
construct body mass index and we applied standard definitions for being obese (BMI>30).
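
The two classification rules in this paragraph can be sketched in code. The structure (two 9-item subscales, at least six symptoms on either subscale, BMI > 30) comes from the text. Two points are assumptions: a rating of 2 or more is taken to mean "at least moderate severity", and height and weight are taken in metric units, since the text does not state the units of the self-reports. All function names are hypothetical.

```python
def likely_adhd(inattention, hyperactivity, moderate=2, min_symptoms=6):
    """Flag likely ADHD when either 9-item subscale has at least
    min_symptoms items rated at or above the moderate threshold.

    moderate=2 is an assumed mapping of "at least moderate severity"
    onto the 0-3 rating scale.
    """
    def endorsed(items):
        if len(items) != 9:
            raise ValueError("each subscale has 9 items")
        return sum(1 for rating in items if rating >= moderate)

    return (endorsed(inattention) >= min_symptoms
            or endorsed(hyperactivity) >= min_symptoms)

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in metres squared."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg, height_m, cutoff=30.0):
    """Apply the BMI > 30 definition used in the text."""
    return bmi(weight_kg, height_m) > cutoff
```

The subscales are evaluated separately, so five endorsed symptoms on each subscale (ten in total) do not meet the criterion.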
In total we have information on academic performance as measured by GPA, genetics, health
outcomes and health behaviors for 893 study participants. Approximately 90% of these students
(807 students) completed the survey in all three years. The top panel of Table 1 presents summary
statistics of the time invariant characteristics of the 893 participants in our study. The sample is
predominately Caucasian and the largest minority population is Asian. The percentages of African
Americans and Hispanics in the student body of the schools in our sample vary between 2.07% and
12.20% and between 5.54% and 19.3% respectively. The AD and HD subscale averages fell within standard
ranges for adolescent samples. Over 40% of the students report that at least one of their parents
was either currently smoking or was an active smoker during their childhood. Finally, the majority
of responding parents are biological mothers and possess a college degree.
The bottom panel of Table 1 presents information on time varying controls and outcomes.
Neither GPA nor the percentage of students who have a household member that smokes shows any
substantial change in summary statistics over the three years. In contrast, the number of individuals
who currently smoke and have tried smoking rises rapidly during the same period. The percentage
of daily smokers in this sample is similar to national averages calculated using the NELS 88 (Miller
[2005]). The percentage of depressed adolescents in our sample is slightly higher than the 1999
estimate of the fraction of the adolescent population being clinically depressed (12.5%) from the
U.S. Department of Health and Human Services. Summary statistics on one year lagged smoking
and depression are included because we use these predetermined measures in our empirical analysis:
one could postulate that the answers from the psychological questionnaires used to diagnose
these conditions could be influenced by current academic performance or another factor which
simultaneously affects responses and current academic performance. Finally, we supplemented the
GATOR survey data with information from other sources to improve measures of the students’
neighborhood and school.
4.1 The Dynamics From Health to Education
In this section, we present a three-stage model that guides our empirical analysis. The first two
stages of our model incorporate elements from three competing theories in three distinct disciplines
that explain the heterogeneity in health behaviors across individuals. Economics contributes the
standard model of health investment (starting with Grossman, 1972). This model postulates that
individuals make inter-temporal decisions trading off immediate satisfactions for future benefits.
Different time discount factors and value of life could result in different health choices.
Psychologists claim that the heterogeneous health behaviors arise from different environment or situational
factors that individuals encounter. Natural scientists hypothesize that genetic variations in single
or multiple genes are associated with health differences across the population.
Stage 1, at the beginning of period T (T
), adolescents c h oose whether or not to (continue to)
engage in a risky behavior such as smoking, drinking alcohol or using narcotic drugs given their
demogra phics, discount rates, the value of life, genetic markers and home and sc hool en vironm ents
as well as their curren t health status (H
iT 1
). Adolescent i at time T
c hooses action or beha vior k if
the immed iate satisfaction it prov ides exceeds the aggreg ation of the current cost and the perceived
future cost to her. The immediate satisfaction that adolescent i deriv es from action k could be
aected by her current health status
and her genetic predispositions. The immediate cost of
taking action k includes both pecuniary components suc h as price of cigarette and non-pecuniary
components such as ho w dicult it is to take action k. For instance a teenager may face obstacles
in acquiring cigarettes or narcotic drugs that can be measured as time spen t. The obstacle faced
are determined by neigh borhood, sc h ool and family en vironment inputs. For example, increased
parental monitoring might mak e cigarette smoking more costly; a drug infested neigh borhood migh t
make drug usage less dicult. The perceiv ed future costs usually depend on the discount rates and
the value of life, whic h may vary with current health status (health y people are more patien t in
general) and genetic predispositions. Since the data contain s no informa tion on this matter, wlog
we assume a non-binding monetary budget constraint for ease of exposition. As a result adolescent
i's choice of k is a function of the market price for k that is available to i (p) and the health status
at time T ($H_{iT-1}$), given i's endowed predisposition to taking action k, that is, the set of genes
(G) associated with k, and the environment variables that are included in the matrix X:
$k = k(X, p, H_{iT-1}, G, \varepsilon)$ (1)
where $\varepsilon$ captures an independent random shock. This stage of the model can be easily generalized
to treat k as a vector of behaviors that are either health enhancing (e.g. proper diet and regular
exercise) or health deteriorating (e.g. smoking and drinking).
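The Stage 1 decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the additive cost structure and every number below are hypothetical and are not taken from the paper or its data.

```python
def chooses_behavior(satisfaction, price, time_cost, perceived_future_cost):
    """Return True if the adolescent takes action k.

    Assumption (not from the paper): costs are additive, and the immediate
    cost is split into a pecuniary part (price) and a non-pecuniary part
    (time_cost, e.g. the time spent acquiring cigarettes).
    """
    immediate_cost = price + time_cost
    return satisfaction > immediate_cost + perceived_future_cost


# Hypothetical numbers: stricter parental monitoring raises the time cost of smoking.
lax_monitoring = chooses_behavior(
    satisfaction=5.0, price=1.0, time_cost=0.5, perceived_future_cost=2.0
)
strict_monitoring = chooses_behavior(
    satisfaction=5.0, price=1.0, time_cost=3.0, perceived_future_cost=2.0
)
print(lax_monitoring, strict_monitoring)
```

In this toy setting the same adolescent takes the action under lax monitoring and not under strict monitoring, which mirrors the role the text gives to environment inputs in the immediate cost.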
Stage 2, altruistic parents select a level of health input l for adolescent i, given
the teenager's observed health behavior s (not necessarily equal to k) at the beginning of
this period and revealed health status $H_{iT-1}$, that provides the highest indirect utility for their
household
$V = v(X, C, s, H_{iT-1}, G)$, for each l available to i's family (2)
where X are person-specific and environmental characteristics of the child i; C is the cost of
health input l at time T, which includes the cost of insurance payments and the wage rate forgone
when taking care of child i's sickness etc.; and G is a vector of genetic markers that provide
endowed predispositions to the current state of health status. Given the history of health behaviors
chosen by adolescent i and the health inputs chosen by i's parents, health production functions
translate these elements into a vector of health outputs as follows
$H = g(X, \ldots)$ (3)
where X and the remaining arguments are the full history of individual and environmental
characteristics, health behaviors, health inputs and independent random shocks to health production
respectively. Child i's initial health stock at the start of life is represented by $H_{i0}$.
We assume here a display of single-mindedness in parental preference on child health. That is,
$U(H', \cdot) > U(H'', \cdot)$ if $H' > H''$. (4)
We also assume a discrete set of health input levels (e.g. health insurance packages) all well within
the budget constraint. By this, we leave out extreme cases where parents have to choose between
putting enough food on the table and paying the kid's medical bills. Since our data has no health
input information, this assumption places no constraints on the estimation equations. Under these
two assumptions, parents will always choose the l that leads to the highest possible level of health for
child i.
Stage 3, at the end of period T, parents choose a set of education inputs (e.g. school quality,
employing tutors, etc.) based on the health status of their child. Parents select among these inputs
the optimal school j for child i which provides the highest indirect utility for their household
$V = v(X, C, Q, A_{iT-1}, I)$, for each j available to child i (5)
where X are observable person-specific and family characteristics of the child i; C is the cost of
attending school j, which includes the cost of living in a good school district; Q is school-specific
characteristics; $A_{iT-1}$ indexes child i's measured achievement at the stage of decision making; and I
is child i's innate abilities. The availability of schools to a child is described by the school admission
rules in the local areas where parents can commute to work daily.
Conditional on the selection of school j in the third stage, the standard education production
model states that child i in school j at time T gains human capital as measured by a score on an
achievement test or report card. The general conceptual model depicts this level of achievement
to be a function of the full history of family, community, school inputs and own innate abilities.
Current achievement can be expressed as
$A = f(X, Q, I, \varepsilon)$ (6)
where X is a vector of community variables, individual and family characteristics in year t, Q is
a vector of school characteristics, I is a vector of unobserved heterogeneity including such factors
as student innate abilities, parental tastes, determination, among others and ($\varepsilon$) are the full
history of independent random shocks assumed to have zero mean and no serial correlation.
4.1.1 Health as an Education Input
There are three popular explanations put forth in the health economics literature for the observed
positive relationship between health and education. The first model considers education an investment
in the future that pays larger dividends the longer one lives, thus incentivizing individuals to
stay healthy and live longer (Becker [1993]). The second model postulates that education is a critical
component in a health production function; thus, educated individuals are better equipped to stay
healthy (Grossman [1972]). The third explanation suggests that the relationship exists because both
health status and education are directly related to an unobserved variable such as time discounting
(Fuchs [1982]) or one's family background (Rosenzweig and Schultz [1983]). However, there is no
formal economic model postulating how health enters into the education production process as an
input. As a result, we hypothesize below the possible channels through which health status (H)
potentially affects education.
First, it may affect the physical energy level of a child which determines the time (including
classroom attendance and after school educational activities) that can be used for learning. For
example, obesity has been found to be the largest determinant of absenteeism (Schwimmer et al.
[2003]). Second, it affects the child's mental status that may have a direct impact on academic
performance. For example, obesity is associated with obstructive sleep apnea which impacts energy
levels and neurocognitive impairment (See et al. [2006]) and being obese may also cause low self
esteem which leads to classroom disengagement that may reduce academic performance. Other
health conditions such as being diagnosed with ADHD or clinical depression may directly affect a
child's attention span, which adversely affects her academic outcomes. Third, a child's health
status may affect the way her teachers, parents and peers treat her; this in part shapes the learning
environment that she encounters. For example, obese children are often less popular among their
peers and teachers. Depression in children is associated with personal distress, and if the state lasts a
long time or occurs repeatedly, it can lead to a circumscribed life with fewer friends and sources of
support (Klein et al. [1997]). The first two channels directly affect own health input (both physical
and mental) in the education process while the latter scenario influences a child's education outcome
through other inputs such as peer quality and teacher attention that are the result of a certain health
status.
Ideally we would like to disentangle the effect of obesity on education (the structural parameter)
from that which is due to the impact of the environment resulting from being obese. If parents,
schools or peers are responding to negative health outcomes by increasing investment into other
inputs this may offset the deleterious effects of poor health on achievement. Conversely the response
of these individuals could move in a direction that reinforces the deleterious impact of health, such
as discrimination. For example, parents may decide not to invest or invest less in a child's education
due to the observed health status of their child. Since our data lacks information on family and school
inputs as well as peers, we will obtain a combined (reduced form parameter) impact of health on
education.
4.2 The Estimating Equations
Linearizing the achievement relationship (equation 6) yields
T 1
where δ
= α
for some coefficient α
. The components of equation (7) may include higher
order and interaction terms. We re-express the achievement function as
= β
+ β
+ β
+ β
where the vector X contains individual and family characteristics (gender, race, residential smoking
status, responding parent characteristics), the vector H is a vector of variables that captures current
predetermined health measures.
Similarly both the health production function in equation (3)
and the decision to engage in health behavior equation (1) can be expressed as follows:
= γ
+ γ
+ γ
+ γ
= δ
+ δ
+ δ
+ δ
Instrumental variable methods are used to estimate the above system of equations ((8) - (10)) to
generate consistent estimates of the causal impact of health on education (β
relies on the assumption that the vectors of genetic markers that impact health behaviors and health
outcomes (G
and G
) are unrelated to unobserved components of equation (8). While there is
no evidence that the markers considered in this study have
any impact on the education production process, it remains possible.
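The logic of using a genetic marker as an instrument can be illustrated with a minimal simulated sketch. It is not the authors' estimation code: it uses a single hypothetical binary marker, a single health index and the simple ratio (Wald) form of the instrumental variable estimator, and every coefficient in the simulation is an assumption chosen only for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, seed=0):
    """Simulated data; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    markers, health, gpa = [], [], []
    for _ in range(n):
        g = 1.0 if rng.random() < 0.3 else 0.0        # hypothetical binary genetic marker
        u = rng.gauss(0, 1)                            # unobserved factor driving health and grades
        h = 0.8 * g + u + rng.gauss(0, 0.5)            # poor-health index (first stage)
        a = 3.0 - 0.5 * h + 0.7 * u + rng.gauss(0, 0.5)  # true effect of health on GPA is -0.5
        markers.append(g)
        health.append(h)
        gpa.append(a)
    return markers, health, gpa


def ols_slope(x, y):
    """Slope from regressing y on a constant and x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Ratio (Wald) IV estimator with one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)


markers, health, gpa = simulate()
ols = ols_slope(health, gpa)
iv = iv_slope(markers, health, gpa)
print(f"OLS slope: {ols:.3f}   IV slope: {iv:.3f}")
```

Because the simulated marker shifts health but is independent of the unobserved factor, the IV slope recovers the assumed effect while the OLS slope is pulled toward zero. With several instruments and several health outcomes, as in the system above, the same idea is implemented by two stage least squares.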
5.1 Basic Patterns in the Data
5.1.1 Losing the genetic lottery?
We begin by demonstrating that Mendel's law of independent assortment is supported by the
GATOR data and that there is substantial unique variation from each of the markers and their
interactions. Summary information on the genetic markers in our data is provided in Table 2. The
DAT genotypes are classified with indicator variables for the number of 10-repeat alleles (zero, one,
or two). We include indicator variables for the available AA, AC and CC genotypes of the TPH gene.
Similarly, the DRD2 gene is classified as A1/A1, A1/A2 or A2/A2. Finally, we include indicator
variables for the available CC, CT and TT genotypes of the CYP gene. The first column of Table
2 provides the raw number of individuals who possess each particular marker. Excluding the TPH
gene, the majority of individuals in our data are homozygous for A2/A2 (for the DRD2 gene), CC
(for the CYP gene) and have two 10-repeat alleles of the DAT gene. For each of these genes the
heterozygous combination is the next most populated and the remaining homozygous combinations
of the CYP (TT) and DRD2 (A1/A1) genes are rarest. For the TPH gene there is nearly an equal
number of people who possess either the heterozygous AC or homozygous CC combination, with
AA being the rarest.
The entries in the remaining columns of Table 2 indicate the number of people in each row that
also possess one of the rare polymorphisms of the other genes along with the conditional probability
of possessing this combination. Each cell in the table is populated with at least two individuals
and there does not exist any systematic relationship between the different genetic polymorphisms.
Thus, having a rare polymorphism for one gene does not make it more or less likely that you would
have a rare allele combination in another gene. These results are consistent with Mendel's law of
independent assortment and are encouraging as they do not lend support to correlations between
markers of different genes.
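One common way to check for a systematic relationship between two sets of genotypes is a Pearson chi-square test of independence on their cross-tabulation. The sketch below is an illustrative stand-in rather than the procedure used in the paper, and the counts are hypothetical placeholders, not values from Table 2.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows are genotypes of one
    gene and columns are genotypes of another.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two classifications were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = common/rare genotype of gene 1, columns = same for gene 2.
independent_counts = [[20, 30], [40, 60]]   # column shares identical across rows
dependent_counts = [[30, 20], [30, 70]]     # rare genotypes cluster together

stat_ind, dof_ind = chi_square_independence(independent_counts)
stat_dep, dof_dep = chi_square_independence(dependent_counts)
print(stat_ind, dof_ind, stat_dep, dof_dep)
```

A statistic near zero, as for the first table, is what independent assortment would predict; a large statistic relative to the chi-square distribution with the stated degrees of freedom would instead point to correlated markers.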
5.1.2 Candidate Genes for Adolescent Health
To justify our four sets of genetic markers and their two by two polygenic interactions to explain
health behavior and status we begin by examining whether there are indeed simple differences in
health measures between individuals with different genetic markers. Table 3 presents summary
information on health measures for each genetic marker. Each cell contains the conditional mean,
standard deviation and odds ratio of alternative health outcomes for individuals that possess a
particular marker.
For each genetic marker, there exists a substantial difference in the occurrence rate of at least one
of the health outcomes and behaviors. Individuals with the AA polymorphism of the TPH gene
have substantially higher propensities (relative to the AC and CC markers) for smoking and obesity
respectively. For the CYP gene, those with the rare TT polymorphism are significantly more likely
to be diagnosed with inattention (AD) and hyperactivity (HD) relative to those with the common
CC marker. For the DRD2 gene, individuals with the common A2/A2 allele are significantly less
likely to be diagnosed as depressed or obese relative to DRD2 markers that contain an A1 allele.
with ADHD and less likely to have a depression diagnosis. Individuals that have no 10-repeats (DAT0)
are associated with slightly higher smoking rates. These results clearly demonstrate that the four
sets of genetic markers have statistically significant associations with our health measures.
5.1.3 Health and Education Outcomes in Adolescence
The well known positive association between good health and educational outcomes is also observed
in the data. As indicated in Appendix Table 2, individuals diagnosed with ADHD, depression and
obesity respectively have on average GPA scores that are 0.26, 0.18 and 0.43 lower than their
counterparts. These differences are statistically significant (one sided t-tests). The raw GPA gap
of individuals with ADHD or obesity relative to those not diagnosed increases from grades 10 to
12 by approximately 20%. While the gap between depressed and non-depressed children does not
vary through grades, cigarette smokers close their GPA gap with non-smokers from 0.58 in grade
10 to 0.49 in grade 11 and 0.37 in grade 12. This is somewhat misleading as many adolescents
start smoking over time. These new smokers have substantially higher GPA scores than long-term
smokers. Between grade 10 and grade 12 long-term smokers consistently have GPA scores that are
approximately one half point lower relative to non-smokers.
Not only do smokers have lower GPA scores but they also have a higher propensity of being
diagnosed with negative health status. Individuals with each health disorder are significantly more
likely to be smokers at the 1% significance level. The largest gaps occur for individuals diagnosed
with either inattention or ADHD, whose smoking rate is more than two and a half times that of the
remaining population (33% of individuals with ADHD smoke versus 13% of the remaining individuals and 39%
of individuals with AD smoke versus 12% of the remaining population). The propensity to smoke is
twice as high among adolescents with hyperactivity (HD) relative to those not diagnosed with this
disorder. Lastly, adolescents diagnosed as obese or depressed are associated with approximately
50% greater smoking propensities versus the remaining sample.
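The size of these gaps depends on whether one reports the ratio of the two smoking rates or the percentage by which one exceeds the other. A quick check on the rates quoted above (the helper name is hypothetical; the four rates are the ones in the text):

```python
def relative_rate(rate_diagnosed, rate_rest):
    """Return (ratio of rates, percent by which the first rate exceeds the second)."""
    ratio = rate_diagnosed / rate_rest
    percent_higher = (ratio - 1.0) * 100.0
    return ratio, percent_higher


adhd_ratio, adhd_pct = relative_rate(0.33, 0.13)  # ADHD versus remaining individuals
ad_ratio, ad_pct = relative_rate(0.39, 0.12)      # AD versus remaining population
print(f"ADHD: {adhd_ratio:.2f} times the rate, {adhd_pct:.0f}% higher")
print(f"AD:   {ad_ratio:.2f} times the rate, {ad_pct:.0f}% higher")
```

On these figures the ADHD smoking rate is roughly 2.5 times the comparison rate and the AD rate roughly 3.25 times.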
Comorbid conditions potentially pose a major statistical challenge for identification. Table 4
presents some summary information on the presence of comorbidities in our full sample. Column
1 of Table 4 displays the number of individuals (and marginal distribution) in each wave who smoke
or have been diagnosed with either AD, HD, ADHD, obesity or depression. Across each row we
present the number of individuals (and conditional frequency) who also engage in smoking or suffer
other poor health outcomes. Not only are adolescents who are diagnosed with ADHD more likely
to smoke but they also have a higher rate of being diagnosed as either clinically depressed or obese
than their cohorts (one sided t-tests). This result is not unique to ADHD as we find that individuals
diagnosed with any of these health disorders are significantly more likely to engage in smoking than
those not diagnosed in grade 12.
Since health disorders and risky health behaviors are more common among individuals diagnosed
with one particular disorder than among the remaining population we will investigate whether
estimates of the impacts of a disorder vary if we do not control for comorbidities. The majority of
the literature on the impacts of health includes only a single outcome measure such as obesity,
smoking or birthweight in the analysis. Estimates of the impact of health disorders may vary if
there are both strong correlations between included and omitted health outcomes and if the omitted
health outcomes have a significant impact on the dependent variable. Our genetic instruments are
unlikely to be unique to specific disorders as they are associated with the same region of the brain.
Thus even with the genetic instruments, excluding significant comorbid conditions may result in
estimates of the impacts of the included disorder proxying for the effects of the omitted outcomes.
5.2 Estimates of the Empirical Model
Ordinary least squares estimates of equation (8) that ignore the endogeneity of health outcomes
and smoking behavior are presented in the top panel of Table 5. In our analysis we consider two
different health vectors. The first health vector includes depression, obesity and ADHD. The results
are reported in columns 1 - 3. The second health vector (results reported in columns 4 - 6) includes
depression and obesity but decomposes the diagnosis of ADHD into being clinically inattentive (AD)
or clinically hyperactive / impulsive (HD). Results for the full sample are presented in columns 1
and 4, for the sample of females in columns 2 and 5 and the male sample in columns 3 and 6.
As shown in column 1 of Table 5, each health disorder in the first vector is
negatively and significantly associated with academic performance for the full sample. The negative
impact of obesity is larger than the magnitude of the other health outcomes. On average obese
individuals have a GPA 0.34 points lower, an effect that is larger than that from any race or family
variable. Columns 2 and 3 present the results for the subsample of females and males respectively
and each health outcome is negatively and significantly related to achievement. The negative impact
of obesity is approximately eight times the magnitude of being depressed for females. In contrast to
the results for the girls, the magnitude of the coefficients does not vary across the health outcomes
for boys. Finally, both the negative impact of the household smoking environment variable and the
positive impact of whether the biological parent is present are nearly twice as large for boys as for
girls.
Decomposing the impact of ADHD into its components, columns 4 to 6 of Table 5 indicate that
AD was responsible for the negative coefficient of ADHD in column 1. For the full sample, HD is
positively associated with academic performance but the coefficient is not significant at the 10%
level. The impact of obesity relative to depression remains large for females but for boys in column
6 there is a strong negative association between AD and GPA. Interestingly among Asians, females
performed significantly better than their Caucasian counterparts.
5.2.1 Endogenous Health Outcomes and Health Behaviors: First-stage Estimates
A potential challenge exists in selecting an appropriate subset of the markers in our data to serve
as instruments. The scientific literature provides some (arguably weak) guidance as the evidence
tends to be inconsistent across studies. We present and report results from a parsimonious set
of instruments selected by forward stepwise estimation and we used twelve different sets using
alternative selection criteria to verify the robustness of our findings. We do not vary our instrument
set across gender so that any observed difference in terms of health effects is not the result of the
selection of different instrument sets that are gender variant.
Statistically, for the markers to serve as instruments they must possess two statistical properties.
First, they must have a substantial correlation with the potentially endogenous health variables.
Second, they must be unrelated to unobserved determinants of the achievement equation. Table 6
presents results from two specification tests that examine the statistical performance of the
instruments for each health equation and sample.
In the top panel of Table 6 we present estimates of the F-statistics of the joint significance of the
instruments in the first stage regressions. For each health outcome and health behavior with each
sample, the instrument set is jointly statistically significant at a level above current cutoffs for weak
instruments. Since our 2SLS estimates (presented in the next sub-section) are over-identified, we
use a J-test to formally test the overidentifying restrictions. This test is the principal method to
test whether a subset of instruments satisfies the orthogonality conditions. The associated p-values
for these tests are presented in the bottom panel of Table 6. The smallest of the five p-values is a
reassuring 0.21, which provides little evidence against the overidentifying restrictions. In addition many
of the p-values are large and exceed 0.5.
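The first-stage relevance check can be illustrated for the simplest case of one instrument. The sketch below computes the F-statistic for excluding a single instrument from a first-stage regression on simulated data; the data-generating values are assumptions, and the threshold of 10 mentioned in the comments is a commonly cited rule of thumb, not necessarily the cutoff the authors apply.

```python
import random


def first_stage_f(z, x):
    """F-statistic for H0: slope = 0 in a regression of x on a constant and z."""
    n = len(z)
    mean_z, mean_x = sum(z) / n, sum(x) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mean_x - slope * mean_z
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    tss = sum((xi - mean_x) ** 2 for xi in x)
    # One restriction in the numerator, n - 2 residual degrees of freedom.
    return (tss - rss) / (rss / (n - 2))


def simulate_first_stage(effect, n=2000, seed=1):
    """Hypothetical binary marker z and health index x = effect * z + noise."""
    rng = random.Random(seed)
    z = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
    x = [effect * zi + rng.gauss(0, 1.1) for zi in z]
    return z, x


f_strong = first_stage_f(*simulate_first_stage(effect=0.8))
f_weak = first_stage_f(*simulate_first_stage(effect=0.02))
# A commonly cited rule of thumb treats F below 10 as a sign of weak instruments.
print(f"strong marker F = {f_strong:.1f}, weak marker F = {f_weak:.1f}")
```

With several instruments the same statistic is computed as a joint test on all excluded instruments, which is the form reported in the top panel of Table 6.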
5.2.2 Endogenous Health Outcomes and Health Behaviors: Second-stage Estimates
Two stage least squares (2SLS) results for the achievement equation (8) for the two health vectors
are presented in Table 7. Column one presents results for the full sample: depression and obesity
are significantly related to academic performance. The impact of depression is approximately three
times larger than the OLS estimate presented in Table 5. When ADHD is broken into components
(AD and HD) the impact of depression decreases by roughly a third but remains statistically
significant as shown in column 4. Hyperactivity and impulsiveness is positively related to academic
performance and is significant at the 20% level. In contrast, the portion attributable to AD is
negatively related to GPA and statistically significant at the 20% level. These impacts would
appear drastic in light of the prevalence of the over prescription of behavioral drugs among
school-age children (Eberstadt [2004]).
The results for the subsample of females in columns 2 and 5 are most striking. With health vector
one, only obesity is significantly related to academic performance. With health vector two, both
depression and obesity lead to significant decreases in GPA. The impact of depression is substantially
larger than that obtained using OLS. In contrast, for the subsample of males in columns 3 and 6,
health outcomes are no longer statistically significant once we correct for their endogeneity. For
each sample and health vector we checked whether health status should be treated as endogenous
by testing the null hypothesis that the OLS and 2SLS estimates are equal using a Hausman-Wu
test. We can reject the null of exogeneity of health outcomes for each health vector with each
sample at the 5% level.
There are several additional differences between the estimates for males and females.
girls are associated with higher GPA scores among females. Hispanic boys have significantly lower
GPA among the males. The magnitude in the 2SLS estimates increases relative to OLS for the
boys but diminishes by approximately 40% for girls. We should emphasize that our variable indicating
whether a smoker resides in the household is a proxy for family environment that we lack
direct information on. Concerns regarding whether a smoker residing in the home may represent
inheritability of genes from biological parents were examined. First, the raw association between
biological parents having been regular smokers and the presence of a smoker in the household is
35%; within the households that smoke approximately 65% of the smokers are other family members.
Second, we replicated the analysis in Table 7 excluding this proxy for home environment; the
magnitude as well as the statistical significance of the health disorders were unchanged for all three
samples and two health vectors.
To demonstrate the robustness of our results, Appendix Table 4 presents results for the male
and female subsamples that correspond to their preferred instrument sets using stepwise estimation
on those subsamples. While the first stage properties in Appendix Table 4 are improved, an eyeball
test confirms that there are no important statistical differences between these estimates and those
using the instrument set constructed for the full sample with health vector 1 in Table 7. Similarly,
combining the separate instrument sets for males and females and estimating the system of equations
for the full sample yields no observable differences. For females with health vector 2, the negative
impact of AD increases substantially and becomes statistically significant. Similarly, the impact of
depression and obesity increases by 25% with this alternative instrument set. Overall, the results
continue to demonstrate that females suffer large decreases in their GPA when they have been
diagnosed with depression or are obese, whereas no significant relationships exist for the males.
5.2.3 Discussion
The parameter estimates we obtain should be viewed as reduced form coefficients that might include
dynastic effects. Information on parental and teacher investment as well as peer group composition
is not available to disentangle the impact of the health condition as explained by genes from that
of the response from the environment to the health conditions as explained by genes. While this
appears unsatisfying, this limitation is also implicitly shared by other empirical strategies used
to estimate the impact of health on education which generally either treat genetics as part of a
big black box that can be eliminated under strong assumptions or propose the use of alternative
instrumental variables such as an individual's phenotype. The availability of genes as instrumental
variables for the first time makes crystal clear the level of difficulty in obtaining structural
parameter estimates and the importance of detailed accurate information on health and education
Further, structural parameters of this kind, even if they could be obtained, may quickly
become invalid every time a new (medical) treatment is developed that changes the occurrence rate
or severity of these disorders' negative impacts.
The use of exact measures of genes permits us to enter what traditionally has been a black box in
empirical economics. Studies that exploit variation within siblings or within twins not only assume
that the set of genetic factors do not vary between pairs but implicitly that the impacts of these factors
and unobserved (to the analyst) family investments are constant between family members. Most
unsatisfying is that one cannot test the validity of these two assumptions and if they are refuted
biases could increase from differencing. Increasing scientific evidence shows that monozygotic
human twins are discordant in many physical traits and diseases, which is ascribed not only to
environmental factors but also to epigenetic modifications. Epigenetics refers to DNA and chromatin
modifications that play a critical role in regulation of various genomic functions. Essentially a
substantial degree of epigenetic variation can be generated during the mitotic divisions of a cell
in the absence of any specific environmental factors. This variation, which results primarily from
stochastic events, is either assumed to be the same in the sibling and twin differencing strategies
or assumed to have zero impact on outcomes. In the social sciences, researchers often consider sibling fixed
effects models as they potentially control for (unobserved) parental characteristics and could allow
the researcher to exploit a genetic lottery between family members. Yet, this empirical strategy does
not effectively deal with endogeneity bias that results either from parents adjusting their fertility
patterns in response to the (genetic) quality of their earlier children or from differential
time varying investments across siblings. These two factors have strong empirical support in the
social science literature and within evolutionary biology models of human fertility. Further, this
empirical strategy is inconsistent with many underlying economic models (Rosenzweig and Wolpin
[2000]) and it implicitly imposes an assumption of strict exogeneity on the explanatory variables
in the model which rules out the possibility that predetermined characteristics of the siblings may
either purposely or inadvertently influence each other. This assumption directly contradicts the
available evidence that indicates sibling behavior is a stronger risk factor for health behavior than
parental behavior (Rajan et al. [2003], Vink et al. [2003] and Avenevoli and Merikangas [2003]).
As noted, the use of genes as an instrument presents a challenge in regards to intergenerational
transmission. It is well known that offspring of parents with psychological problems are more likely
to develop these disorders. For example, it has been estimated that 40% of children with depressed
parents experience psychiatric disorders by the age of 20 (Beardslee et al. [1998]). Data from the
Minnesota Twin Family Study find a weak positive association between maternal depression and
offspring depression but do not find any evidence of an association between paternal depression
and either maternal or offspring depression. The mechanism by which parental disorders influence
offspring psychopathology has not been clearly established. While we lack direct information on
parental diagnoses, we use knowledge of comorbidities to construct proxies. That is, we use parental
smoking to proxy for parental health. We estimated variants of our principal empirical model where
we separately as well as jointly included variables on whether the responding parent is currently
smoking or has ever smoked as well as whether the subject reported that at least one of his biological
parents smoked in their lifetime as additional control variables. Our results were both quantitatively
and qualitatively robust to the inclusion of these parental health measures. This result is not a
surprise since our genetic markers possess several properties that increase our confidence in their
conceptual validity as instruments. First, the genes we consider are pleiotropic and second they
cannot credibly account for the majority of the variation in the diagnoses of these health disorders.
Thus, even if a parent possessed the same markers for any of these four genes as their child, this
would neither guarantee that they suffer from the same disorders nor that these particular genes
would affect the parent and child in similar fashion.
Our coefficient estimates may also capture a dynastic effect of the impact of health disorders.
Without more detailed data on parental diagnoses as well as parental genes we cannot separate
out the portion of the impact that is uniquely brought on by the child's condition. As a result,
this effect may include the impact of family environments provided by depressed parents whose
depression can be explained by exactly the same set of genes and genetic interaction terms that we
selected to explain the child's depression in our study. This dynastic effect is of policy relevance
since individuals are in general not randomly assigned to families and policymakers are generally
interested in the total impact of these disorders. Similarly if the assortative mating process is stable,
then the dynastic effect is important to recover since kids with certain disorders will increasingly
come from families that also have this disorder. It is also worth noting that there is limited evidence
that individuals seek out partners with similar genetic makeup. Animal studies on mate choice have
shown that both signals of genetic quality and genetic diversity play important roles whose relative
weight varies according to the respective ranges of these characteristics in the study population.
The pursuit of genetic diversity serves to weaken intergenerational correlations, especially on adverse
health attributes.
A concern may exist regarding the conceptual validity of the instruments since dynastic effects
may suggest that the genetic markers we consider influence academic outcomes through channels
outside of child health status. Conceptually, since the estimating equations used in Table 7 include
predetermined outcomes of the responding parent such as education as explanatory variables, should
the identical markers manifest in the same manner within a family we are directly accounting for
all these predetermined impacts of genes on parental outcomes that subsequently affect the child's
education. Empirically the quantitative and qualitative patterns of our 2SLS results are robust to
the exclusion of information on the parents' and other family members' education, smoking and age,
which further increases our confidence in the validity of the instruments.
To summarize, the genetic markers we employ in our study are predetermined with respect to any
interaction that the adolescents have with the environment, even those interactions such as pre-natal
care that occur in utero and affect measures such as birth weight and APGAR scores. They possess strong
correlations with certain health disorders and health outcomes. At present there is no detectable
evidence that they are correlated with genetic factors that associate with inputs to either innate
ability or the development of intelligence. We are not ruling out the possibility that these genes
affect the acquisition of intelligence but rather we are assuming that these genes neither directly enter
the education production process nor are correlated with genes directly involved in production of these
education outcomes. The assumptions underlying these markers for identification are supported by
both statistical tests and the scientific literature. Not only can these assumptions be tested but we
argue that this strategy imposes substantially weaker assumptions on the relationship between
nature, nurture and adolescent outcomes than other empirical strategies used in the literature. Despite
these advances, substantially richer data would be needed to recover the structural parameter.
5.3 Accounting for Endogenous Cigarette Smoking Matters
With genetic markers as instruments we can investigate the extent to which smoking is a choice
variable. Past research in economics has suggested that smoking could proxy for an individual’s
discount rate and has implicitly assumed that smoking does not reflect a choice. Treating cigarette
smoking as an exogenous input to health outcomes presents striking changes to our results. Table
8 presents 2SLS estimates of equations (8) and (9) that assume this choice is exogenous. Notice
that the magnitudes of the estimates for all health outcomes in Table 8 increase markedly from those
presented in Table 7, where smoking was treated as endogenous. Most surprising is that by treating
smoking as an exogenous behavior, the estimates on the impact of depression, HD and obesity become
statistically significant for males. The results suggest that being obese leads boys to score 0.8 points
higher on their GPA. The sign and magnitude of this estimated impact seem implausible. For the full
sample and subsample of girls, the estimated impact of depression nearly doubles in magnitude.
In addition, ADHD becomes statistically significant for the full sample. Finally, the estimates on
AD and HD for girls become implausibly large but continue to offset one another. The implausible
magnitude of these coefficients is a result of both limited independent variation to separately
identify impacts and the use of smoking as an invalid exclusion restriction.
We conducted a Hausman test of each health status equation for each vector in Table 8 by
comparing it to the corresponding equation in Table 7. We can reject the null of exogeneity for
years of cigarette smoking, suggesting that smoking is indeed a choice variable. Our investigation
into the endogeneity of smoking shows that despite the use of genes as instruments for the health
outcomes, the different ways of accounting for the smoking decision lead to very different results.
This could result from the fact that genes associated with smoking tendency are also associated
with health disorders and that smoking may have a direct impact on our health disorders.
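The logic of an exogeneity test of this kind can be illustrated on simulated data. The sketch below is not the procedure behind Tables 7 and 8: it uses the regression-based (Durbin-Wu-Hausman) variant with a single regressor and a single instrument, and every variable name, coefficient and sample size in it is a hypothetical choice made for illustration only.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))

def dwh_t_stat(y, x, z):
    """Regression-based Durbin-Wu-Hausman statistic for exogeneity of x.

    Step 1: regress x on the instrument z and keep the residual.
    Step 2: regress y on x and that residual; the t-statistic on the
            residual tests the null that x is exogenous.
    """
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    gamma, _ = ols(Z, x)
    v_hat = x - Z @ gamma
    beta, se = ols(np.column_stack([const, x, v_hat]), y)
    return beta[2] / se[2]

# Simulated data; all numbers below are assumptions, not estimates from the paper.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)   # hypothetical instrument (stand-in for a genetic marker)
u = rng.normal(size=n)   # unobserved factor driving both smoking and GPA

smoke_endog = 0.8 * z + u + rng.normal(size=n)   # smoking responds to u
smoke_exog = 0.8 * z + rng.normal(size=n)        # smoking unrelated to u

gpa_endog = -0.3 * smoke_endog + u + rng.normal(size=n)
gpa_exog = -0.3 * smoke_exog + u + rng.normal(size=n)

t_endog = dwh_t_stat(gpa_endog, smoke_endog, z)
t_exog = dwh_t_stat(gpa_exog, smoke_exog, z)
print(f"t-stat when smoking is a choice variable: {t_endog:.2f}")
print(f"t-stat when smoking is exogenous:         {t_exog:.2f}")
```

In the first design the statistic should be far from zero, so exogeneity is rejected; in the second it should be of ordinary sampling magnitude.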
To further investigate whether smoking patterns do indeed have different relationships with
diagnosed health disorders between the genders we present OLS and 2SLS estimates of the impacts
of smoking on each health outcome for each sample and health vector in Appendix Table 5.
Whereas smoking is positively associated with each health outcome when treated as exogenous (in
the bottom panel), the 2SLS estimates present different patterns. Smoking is positively related to
depression and negatively related to obesity once we account for endogeneity as reported in column
1. Further, boys who smoke are significantly less likely to be diagnosed with hyperactivity but more
likely to be diagnosed with depression and inattention. In contrast, females who smoke are less
likely to be diagnosed with depression. These gender differences add a further layer of complexity
and support the possibility that smoking patterns account for some of the gap in the impacts of
health disorders on education between the genders.
5.4 Accounting for Comorbid Health Outcomes Matters
We now consider what, if any, effect it would have on our estimates if we followed the usual practice
of ignoring comorbid conditions and only included one health outcome in the achievement equation
at a time. Two stage least squares estimates are presented in Table 9, where each entry refers to
the point estimate of that health behavior from a system of equations that included the achievement
equation, the particular health outcome and health behavior.
Examining results from separate regressions using the full sample, we would conclude that
inattention is positively and HD negatively related to GPA, which is the opposite of the pattern
reported in Table 7. The results for the subsample of boys completely change when comorbid
conditions are omitted. Obesity, AD and HD are all positively related to academic performance
and the magnitude of the impact for obesity is extremely large. Similarly, for the full sample and
subsample of girls the impact of depression is approximately 40% larger as it may be capturing a
portion of the negative impact of obesity or ADHD. Taken together, the results of Table 8 and Table
9 illustrate the need to account for a greater set of related health outcomes and endogenous behaviors
in any analysis. Even with exogenous instruments such as genes to correct for the endogeneity of
health status, the omission of comorbid conditions and behaviors may present a misleading picture
of the causal relation between particular health states and academic performance among other
Due to the high comorbidity in health conditions as demonstrated in our study and the lack of
exogenous variations that can explain one particular condition only, the coefficient for one particular
health condition such as obesity may reflect the composite effect of several health conditions; thus
the reliability of that coefficient is dependent on the rich controls we have on most of the comorbid
health conditions. Without the rich information on health, most of the exogenous variations cannot
identify the impact of one condition only.
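A small simulation can make this composite-effect point concrete. The sketch below is only an illustration under stated assumptions: two hypothetical markers each shift two hypothetical conditions (labelled depression and obesity), both conditions lower GPA by 0.5, and none of the coefficients or the sample size comes from the paper. It compares 2SLS with both conditions included against 2SLS with only one.

```python
import numpy as np

def tsls(y, X_endog, Z):
    """Two stage least squares with a constant; returns the slope estimates.

    First stage: project each regressor on the instruments.
    Second stage: regress y on the fitted regressors.
    """
    const = np.ones(len(y))
    Zc = np.column_stack([const, Z])
    Xc = np.column_stack([const, X_endog])
    X_hat = Zc @ np.linalg.lstsq(Zc, Xc, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta[1:]

# Simulated data; every number here is an assumption made for illustration.
rng = np.random.default_rng(1)
n = 20000
z1 = rng.normal(size=n)   # hypothetical marker 1
z2 = rng.normal(size=n)   # hypothetical marker 2
u = rng.normal(size=n)    # unobserved factor that makes both conditions endogenous

# Each marker shifts both conditions, so no instrument isolates one condition.
depression = 1.0 * z1 + 0.5 * z2 + u + rng.normal(size=n)
obesity = 0.5 * z1 + 1.0 * z2 + u + rng.normal(size=n)

# Assumed true impacts: -0.5 for each condition.
gpa = -0.5 * depression - 0.5 * obesity - u + rng.normal(size=n)

Z = np.column_stack([z1, z2])
beta_joint = tsls(gpa, np.column_stack([depression, obesity]), Z)
beta_single = tsls(gpa, depression, Z)

print("both conditions included:", np.round(beta_joint, 2))
print("depression only:         ", np.round(beta_single, 2))
```

Under these assumptions the joint system recovers values close to -0.5 for each condition, while the depression-only estimate is pulled toward roughly -0.9 because it absorbs part of the effect of the omitted comorbid condition.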
Understanding the consequences of growing up in poor health for adolescent development is an
important research question. This question is particularly interesting to policymakers since part
of the explicit rationale for programs such as Medicaid is to improve the development of children.
However, it is challenging to address due to endogeneity that arises from omitted variables and
measurement error problems pertaining to health.
In this paper, we use information on genetic markers to overcome these challenges and identify
the causal effect of health on education via an instrumental variables strategy. The explicit use of
genetic markers in empirical social science research is becoming possible due to an ever increasing
understanding of how genetic inheritance relates to individual health outcomes as well as knowledge
from the human genome project. This knowledge increases the conceptual validity of the instrument
since i) the markers are inherited at conception prior to any interaction with the environment,
eliminating concerns related to reverse causality, ii) a large literature reports robust correlation between
the markers and health variables we consider in this study, suggesting that the correlations are not
spurious, iii) studies of genetic inheritance indicate that the assignment of the markers we consider
is independent of hereditary factors associated with the development of intelligence, and iv) while
these genes are pleiotropic they only influence academic outcomes through adolescent health status
channels as we directly account for predetermined parental education outcomes. Empirically,
statistical tests confirm the strong correlations in the first stage relationship and do not reject
the overidentifying restrictions. Hausman tests further speak to the strength of our instruments
as the IV estimates are statistically different from the OLS estimates, which also indicates that we
should treat health as an endogenous input to education. Further, the quantitative and qualitative
patterns of our empirical results are robust to the inclusion of information on the parents and other
family members in the estimating equations.
Using these genes as a novel source of identification we find that the impact of poor health
on academic achievement is large. Depression and obesity both lead to a 0.45 point decrease in
GPA, which is roughly a one standard deviation reduction in performance. There exists substantial
heterogeneity in the impacts of health status on academic performance as female adolescents are
strongly adversely affected by negative physical and mental health conditions, whereas males are
not significantly impacted.
Several results from our empirical investigation have important implications for the health
economics literature. First, we find that in explaining health status researchers must account for comorbid
health disorders. Since many individuals suffer from more than one disorder, ignoring related
illnesses may lead to some misleading conclusions regarding the impacts of one particular disorder
when it is examined in isolation. This issue is particularly challenging since one cannot easily
overcome biases when measuring a particular health state with error using an instrumental
variables strategy, unless there exists an instrument that can clearly disentangle the variation between
related comorbid disorders such as ADHD and depression. Second, we make a clear separation of
health outcomes (a state variable) from health behavior (a control variable) in our theoretical and
empirical analysis. Empirically we find that treating health behaviors as exogenous or ignoring
comorbid conditions would lead to either different signed estimates or substantially larger impacts
of health on education. Since health behaviors only explain a limited amount of the variation in
health status and could result from as well as cause certain particular health states, accounting
for this pathway is important as it could reveal a dynamic relationship that could be examined in
future work.
The results also suggest that future research is needed to improve our understanding of why
females and not males are so adversely affected by poor health outcomes. For example, responses
to a variety of psychological questionnaires can be used to shed light on possible differences
between females and males in their self-perception. Future research could also incorporate additional
dynamics such as how parents, teachers and peers respond to an individual’s changing health state
to explore more deeply some of the sources for this heterogeneity.
Finally, measures of genetic markers could also be used in other lines of research in the social
sciences. One could use them as a source of identification to assess the impact of health as a
form of human capital on many outcomes such as labor market activity, marriage and educational
attainment. Researchers could also investigate whether nurture inputs or family characteristics can
offset the impact of genetic predispositions. In conclusion, recent years have witnessed an explosion
of findings on the causes and correlates of health outcomes and behaviors in neurobiology, which
could offer a promising source of predetermined exogenous variations to help identify the impact of
health on a set of outcomes of great interest to economists.
The importance of genetic factors to behavioral characteristics and health outcomes has been
noted throughout history. The passage of physical and disease traits from parents to offspring
was first explicitly studied and modeled by Gregor Mendel in the 19th century. Since this work,
more sophisticated studies of laboratory animals as well as comparisons between monozygotic and
dizygotic human twins have demonstrated that behavioral characteristics and economic as well as health
outcomes are in part linked to genetic inheritance. Most recently, Cutler and Glaeser [2005]
compare the correlation of health behaviors between monozygotic and dizygotic twins and conclude
that approximately 72% of the variation in obesity and 30% of the variation in cigarette smoking
are due to genetic factors.
These impacts should be viewed as reduced form parameters and our analysis will clarify the
difficulties in estimating the structural health parameter. In Section 5.2.3 we discuss issues
surrounding identification that include intergenerational transmission, potential dynastic effects, assortative
matching and ideal data requirements. We also discuss how using genes as instruments to identify
the impacts of health offers several benefits over alternative empirical approaches; most importantly,
we can directly test the identifying assumptions.
This evidence summarized in Section 2 suggests that possessing the genetic markers considered
in our study indeed increases the sensitivity of individuals to being diagnosed with certain health
disorders. Second, there is no detectable evidence that the markers we consider are correlated with
other genetic factors that associate with either innate ability or the development of intelligence.
Note, we are not ruling out the possibility that these genes affect outcome measures of intelligence
but rather we are assuming that these genes neither directly enter nor correlate with the genes
directly involved in the education production process; they only affect achievement through health.
This correlation has been explained in three ways that are not necessarily mutually exclusive.
The first hypothesis is that education increases health through productive or allocative efficiency
(Grossman [1972], Kenkel [1991]). The second hypothesis is the converse, that poor health results
in little education (Perri [1984], Currie and Hyson [1999]). Finally, others have suggested that this
correlation could be caused by a third unobserved variable (e.g. discount rate) that affects both
education and health (Fuchs [1982]).
Grossman and Kaestner [1997] note that the majority of the empirical literature reports
correlations and focuses on the effect of education on health. Strauss and Thomas [1998] present a survey
of the literature on the relationship between health and income. More recently, Bleakley [2006]
presents evidence that cohorts who were exposed to a large scale public health intervention against
hookworm in childhood were associated with larger gains in income and higher rates of return to
schooling later in life.
Chapter 3 of Mental Health: A Report of the Surgeon General clearly states that "approximately
one in five children and adolescents experiences the signs and symptoms of a DSM-IV disorder during
For example, see Behrman, Rosenzweig, and Taubman [1994], Currie and Hyson [1999], Behrman
and Rosenzweig [2004] or Almond, Chay and Lee [2005].
Note, more recent evidence suggests that not all hereditary factors assort independently but
that those which are located close together on the same chromosome tend to be inherited as a
unit, not as independent entities. This property is termed linkage in the genetics literature. This
does not threaten our analysis as the markers we consider are i) not in close proximity to those
markers believed to associate with intelligence, and ii) not located close to each other. Further,
the National Center for Biotechnology Information, in its online science primer on the genome,
interprets the evidence on linkage as being random across individuals but notes that some regions on
the chromosome are more likely to have links than others.
Our literature survey indicated that studies have examined whether associations exist between
approximately 300 different genes and ADHD (e.g. Comings et al. [2000] examined 42 in their
study alone).
Dopamine has been called the “pleasure” chemical of the brain because people who are
electrically stimulated in the limbic dopaminergic centers of the brain report intense feelings of well-being
and sometimes orgasm.
Certain foods and drugs such as nicotine or caffeine can have an especially powerful effect on the
reward center of the brain as they mimic or potentiate the effects of neurotransmitters that occur
there naturally. This process is often described as a molecular “hijacking” of the reward
pathway. For example, nicotine has been shown to increase levels of synaptic dopamine by stimulating
dopamine release in the VTA (Di Chiara and Imperato [1988]) and inhibiting dopamine reuptake
in the reward pathway (Carr et al. [1992]).
This finding was first reported in Blum et al. [1991].
See Audrain-McGovern [2004] and Epstein et al. [2002] and the references within for evidence
on these associations.
Bannon, Granneman, and Kapatos [1995] present an overview of the SLC6A3 gene. The
SLC6A3 gene has been implicated in Parkinson’s disease (Seeman and Niznik [1990]), attention
deficit disorder (Cook et al. [1995]) and Tourette’s syndrome (Connors et al. [1996]).
The length is associated with the number of variable tandem repeats on each marker. Each
repeat increases the amount of reuptake protein. The majority of individuals have SLC6A3 alleles
with lengths of 9 or 10 base pairs, where the length is positively associated with levels of DAT
protein. Note the SLC6A3 loci may also take the form of 7-repeat, 8-repeat, 11-repeat or
12-repeat, each of which is extremely rare in both the population and our sample.
See Lucki [1998] for evidence of these associations.
See Lerman et al. [2001, 2003] for a discussion.
At present, these are the only genes that have been collected for the full sample. For subsets of
approximately two hundred subjects information on the COM-T, CYP2A6 and OPRM1 genes is
also available. As we discuss later in this section we use specific assays for each gene product as these
methods are substantially more accurate (lower misclassification rates) than newer technologies
which can provide information on large sections of the genome or gene expression.
A total of 21 high schools exist in this county. Using data from the NCES CCD we did not
find any significant differences in student demographics or standard school input measures between
schools included and excluded from the sample. Note, we cannot identify this county by name,
but it is large and affluent as it contained over 950,000 residents with a median household income of
$70,000 in 1995.
Students for whom the principals indicated special class placement, for reasons such as a severe
learning disability or difficulty speaking and understanding the English language, were excluded
from the study. In total 273 students or 11% of the total population were excluded.
For example, in conducting SLC6A3 genotyping the following assay was conducted. DNA (25
ng) was mixed with primers (20 pmol), GeneAmp PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl,
1.5 mM MgCl2, and 0.0001% gelatin; Perkin Elmer, Norwalk, CT), AmpliTaq DNA polymerase
(2.5 μ; Perkin Elmer, Norwalk, CT), and 2’-deoxynucleotides-3’-triphosphates (144 μM; Pharmacia,
Piscataway, NJ) in 50-μl total volume. The reaction conditions included an initial melting step
([…] °C; 4 min) followed by 35 cycles of melting (94 °C; 1 min), annealing (65 °C; 1 min), and
extending ([…] °C; 1 min). The VNTR repeat was then determined with a 4% agarose gel
electrophoresis (3:1 NuSieve:agarose). The authors would be happy to provide full details on the
assays for the other markers by request. Note each assay was validated by confirming a polymorphic
inheritance pattern in seven human family lines encompassing three generations.
Quality control procedures included positive and negative controls with each assay and
independent repeat genotyping for 20% of the results. The rate of discordance was less than 5%, and
ambiguous results were not reported. In total, genetic information was obtained for 1032 subjects.
Students without parental consent completed classroom assignments during the administration
of these surveys. Classroom teachers and school administrative personnel did not participate in the
survey portion of the research, nor were they permitted to view participants’ responses. Students
were identified on the completed survey by an identification number and during each wave a member
of the research team read aloud a set of instructions, emphasizing confidentiality to promote honest
responding, and encouraged questions if survey items were not clear. To minimize missing data,
make-up days were scheduled for those adolescents who were absent during the regular survey
administration. Further, surveys were mailed to the homes of students who had either switched
schools or dropped out of school.
Barkley and Murphy [1998] describe the scoring algorithm. Being diagnosed with ADHD
means that an individual has been diagnosed with either AD or HD. It does not make a
distinction between individuals with one or both disorders. It is important to state explicitly that
we are not focusing on diagnosed cases but rather on responses to questions which are used to
construct a diagnosis known only to researchers.
Our results are robust to alternative cutoffs for obesity, ADHD and depression.
Data at the school level were obtained from the CCD and neighborhood information was obtained
from US census records at the zip code level.
Research has also suggested that individuals with ADHD employ nicotine to enhance cognitive
function (e.g. Coger et al. [1996], Levin et al. [1996] and Pomerleau et al. [1995]).
Boardman and Murnane [1979] present a clear discussion of the model underlying education
production functions.
This model is commonly used in the economics of education literature and alternatively one
could include lagged measures of achievement in the specification. These modelling decisions place
implicit assumptions on the effects of all previous observed and unobserved influences in the
current period. The empirical validity of these alternative assumptions has only recently been tested
(Ding and Lehrer [2005], Todd and Wolpin [2005]). Note that since parents may choose to make
investments in their children based on their health status, our estimates should be viewed as an
upper bound of the health impact on academic performance if the investment is positively related
to good health. Conversely, if the investment is negatively related to good health, our estimates
provide a lower bound.
Plomin et al. [2006] and de Quervain and Papassotiropoulos [2006] present recent surveys
on which genes are believed to be associated with intelligence and memory ability respectively.
Researchers have found no links between several of the genes in this study and either intelligence
(i.e. Moises et al. [2001]) or cognitive ability (e.g. Petrill et al. [1997]).
Statistically, to determine whether there were links between markers of different genes we
conducted regressions and tests for homogeneity of odds ratios to see whether possessing a given
marker increased the odds of possessing a specific marker for a different gene. We did not find any
evidence indicating a systematic relationship between markers of any two of these genes.
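As a simplified illustration of this kind of check, the sketch below computes the odds ratio in a single 2x2 cross-tabulation of two markers, with a Wald z-statistic for the null that the ratio equals one. It is a stand-in for, not a reproduction of, the regressions and homogeneity tests described here, and the cell counts are hypothetical rather than taken from the study sample.

```python
import math

def odds_ratio_test(a, b, c, d):
    """Odds ratio and Wald z-statistic for a 2x2 table.

    Rows: carries marker A (yes / no). Columns: carries marker B (yes / no).
        a = A and B      b = A, not B
        c = B, not A     d = neither
    The z-statistic tests H0: odds ratio = 1 (no association).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, math.log(odds_ratio) / se_log_or

# Hypothetical counts: marker B is equally common with and without marker A.
or_indep, z_indep = odds_ratio_test(120, 180, 200, 300)

# Hypothetical counts: marker B is much more common among carriers of marker A.
or_linked, z_linked = odds_ratio_test(200, 100, 100, 200)

print(f"unrelated markers: OR = {or_indep:.2f}, z = {z_indep:.2f}")
print(f"related markers:   OR = {or_linked:.2f}, z = {z_linked:.2f}")
```

An odds ratio near one with a small z-statistic is the pattern consistent with no systematic relationship between two markers.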
In addition, we conducted simple linear regressions by gene of health outcomes on discrete
indicators for possessing each allele combination. The regression results are available by request.
Several relationships are statistically significant and we denote statistically different odds ratios
with an asterisk in Table 3.
Results from one sided t-tests.
Note the high prevalence of comorbid conditions is not unique to this sample. This is a well
known empirical regularity in the medical literature, particularly among mental health conditions.
For example, Biederman et al. [1995] report that 70% of adults with ADHD are treated for
depression at some point in their life.
Recall from the scientific literature that these disorders are believed to be polygenic and that
there is no unique depression or obesity or ADHD gene. Pharmaceutical companies are now in the
process of examining the use of nicotine patches to deal with ADHD. Ritalin, which is currently
prescribed to children with ADHD, was originally developed as an anti-depressant.
The full set of estimates from the system of equations is available by request.
These studies tend to use very small unrepresentative clinical samples and suffer from low
statistical power. Since it is not possible (and probably unethical) to engage in random mutations
of an individual’s genetic code we argue it is best to treat genetic predispositions as a form of neural
correlates with health behaviors and health status.
To examine the robustness of our results we have now considered twelve different instrument
sets for the equations. One set involved the use of the complete set of the markers in our study,
another set was constructed based on our reading of the neuroscientific literature up to May 2005
and the remaining ten sets were constructed from stepwise estimation using alternative selection
criteria. Our empirical results (available upon request) are robust to the instrument set for the full
sample and sub-sample of males. The statistical significance of the estimates of the negative impact