The Impact of Poor Health on Education: New Evidence Using Genetic Markers


Abstract

This paper examines the influence of health conditions on academic performance during adolescence. To account for the endogeneity of health outcomes and their interactions with risky behaviors we exploit natural variation within a set of genetic markers across individuals. We present strong evidence that these genetic markers serve as valid instruments with good statistical properties for ADHD, depression and obesity. They help to reveal a new dynamism from poor health to lower academic achievement with substantial heterogeneity in their impacts across genders. Our investigation further exposes the considerable challenges in identifying health impacts due to the prevalence of comorbid health conditions and endogenous health behaviors.
Weili Ding
Steven F. Lehrer
J. Niels Rosenquist
Janet Audrain-McGovern
Working Paper 12304
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2006
We are grateful to seminar participants at the 2005 NBER Summer Institute, University of Toronto and
BU/Harvard/MIT Health Economics Seminar for helpful comments and suggestions. We would also like to
thank Paul Wylieto for answering our numerous questions about the data employed in the study. Lehrer
wishes to thank SSHRC for research support. Rosenquist wishes to thank AHRQ for support. We are
responsible for all errors. The views expressed herein are those of the author(s) and do not necessarily reflect
the views of the National Bureau of Economic Research.
©2006 by Weili Ding, Steven F. Lehrer, J. Niels Rosenquist and Janet Audrain-McGovern. All rights
reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission
provided that full credit, including © notice, is given to the source.
NBER Working Paper No. 12304
June 2006
JEL No. I2, I1
Weili Ding
School of Policy Studies and
Department of Economics
Queen's University
Kingston, Ontario K7L 3N6
Steven F. Lehrer
Queen's University
School of Policy Studies
Kingston, Ontario K7L 3N6
and NBER
J. Niels Rosenquist
Health Care Systems Department
Wharton School, University of Pennsylvania
Philadelphia, PA 19104
Janet Audrain-McGovern
The Transdisciplinary Tobacco Use Research
University of Pennsylvania
3535 Market Street, Suite 4100
Philadelphia, PA 19104
1 Introduction
The discovery of the human genome, a sequence of approximately three billion chemical “letters”
that make up human DNA, the recipe of human life, is considered to be a milestone in the history
of science and medicine that might have the potential to influence social science research. Consider
the following question that has been investigated in the psychology, education, economics, sociology
and public health literatures: Does health status affect educational outcomes? While numerous
studies report that students who are obese or depressed perform poorly relative to their classmates,
factors other than health could be responsible for this repeatedly observed, but potentially spurious
association. To credibly claim that obesity and depression have a deleterious effect on student
performance in schools one must first overcome the inherent endogeneity when considering health
and education. Further, accurate measures of health are difficult to obtain and overcoming biases
arising from measurement error represents a second hurdle for applied researchers.
This study overcomes these challenges by considering an instrumental variables approach, where
the instruments are selected based on a growing body of evidence in several neuroscientific fields
that have identified genetic markers which possess significant associations with specific diseases
and health behaviors. While there has long been scientific evidence suggesting that the association
between genetic factors and health is substantial,1 only recently has it been possible to collect
measures of genetic markers. Since genetic markers are formed at conception, they are predetermined
to any outcomes including those that occur during pregnancy and at birth. Genetic markers truly
fit the definition of “nature”. Using this “nature filter”, the health variables being instrumented
will be isolated from most nurture influences or choice-based inputs such as schools parents choose
for their kids, neighborhood families select to reside in, peers kids choose to associate with, among
other factors that threaten the identification of education production function parameters.2 When
the variations in health variables that include clinical measures of depression, ADHD and obesity
are due only to the differences in genetic coding, these variations are much less likely to be correlated
with the environments surrounding an individual, allowing us to recover consistent estimates of the
impacts of a vector of health measures on academic performance.3 While our identification strategy
relies on scientific findings, the results suggest that social environments might have to be invoked in
order to understand the root of heterogeneous impacts of health on academic performance, which
seems to place the question squarely back under the realm of social sciences.
1 The importance of genetic factors to behavioral characteristics and health outcomes has been
noted throughout history and the passage of physical and disease traits from parents to offspring
was first explicitly studied and modeled by Gregor Mendel in the 19th century. Since this work
more sophisticated studies of laboratory animals as well as comparisons between monozygotic and
dizygotic twins demonstrate that behavioral characteristics and economic as well as health outcomes
were in part linked to genetic inheritance. Most recently, Cutler and Glaeser (2005) compare
the correlation of health behaviors between monozygotic and dizygotic twins and conclude that
approximately 72% of the variation in obesity and 30% of the variation in cigarette smoking are
due to genetic factors.
2 This does not exclude the potential of bias from assortative matching, which we discuss in
Section 5.2.2. However, the bias is likely limited as evidence from numerous studies in evolutionary
biology indicates that mate choice is not based simply on genetic quality. Rather increasing evidence
(surveyed in Mays and Hill, 2004) suggests that mate selection is driven predominantly by genetic
diversity which is desired since it increases reproductive success.
3 These impacts should be viewed as reduced form parameters and our analysis will clarify the
difficulties in estimating the structural health parameter. In Section 5.2.2 we discuss issues
surrounding identification that include intergenerational transmission, potential dynastic effects,
assortative matching and ideal data requirements. We also discuss how using genes as instruments to
identify the impacts of health offers several benefits over alternative empirical approaches, most
importantly we can directly test the identifying assumptions.
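The instrumental variables logic described above can be illustrated with a minimal sketch. It is not the estimator used in the paper, which instruments a vector of health measures with several markers and their interactions; it is the textbook single-instrument (Wald) estimator applied to hypothetical toy data. The names `marker`, `confounder`, `health` and `gpa`, and every number in the data, are assumptions made for illustration only.

```python
from statistics import fmean


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) slope of y on x, using z as the instrument."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical toy data, not taken from the paper.
# The marker is balanced against the unobserved confounder, so their
# sample covariance is exactly zero (the exclusion restriction holds).
marker = [0, 0, 1, 1]          # instrument: 1 if the marker is present
confounder = [-1, 1, -1, 1]    # unobserved factor driving health and grades
health = [2 * z + u for z, u in zip(marker, confounder)]   # poor-health index
gpa = [-0.5 * h + 3 * u for h, u in zip(health, confounder)]

naive = ols_slope(health, gpa)                 # contaminated by the confounder
instrumented = iv_slope(marker, health, gpa)   # recovers the assumed effect of -0.5
```

In this toy design the naive regression slope has the wrong sign because the confounder moves health and grades together, while the instrumented slope recovers the effect that was built into the data.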
Specifically, our empirical identification strategy is based on a large body of evidence in several
fields that explain the role of specific genes in the operation of a region of the brain along the medial
forebrain bundle which is responsible for reward and pleasure.4 This region is distinct from those
that are known to process and retain knowledge. Evidence that different regions of the brain are
activated (or correlate) with different economic decisions has been found using fMRI technology in
a study of intertemporal choices (McClure, Laibson, Loewenstein and Cohen, 2004). The growing
evidence in the biomedical literature that presents a significant association between certain genes
in this reward system with particular health behaviors and health status such as smoking, alcohol
usage, obesity, ADHD, depression and schizophrenia cannot be denied.
It is worth stating explicitly that the goal of this analysis is not to report a causal link between
genes and health broadly defined. While we exploit the strong neural correlations between a set of
genetic markers and certain health outcomes and behaviors, we do not wish to delve into the often
complicated and sometimes controversial debate on how genes affect behavior. For example, the
popular press is occasionally filled with stories on the discovery of a gene that specifically codes for
obesity or depression that are often quickly refuted by medical authorities.
4 This evidence summarized in Section 2 suggests that possessing the genetic markers considered
in our study indeed increases the sensitivity of individuals being diagnosed with certain health
disorders. Second, there is no detectable evidence that the markers we consider are correlated with
other genetic factors that associate with either innate ability or the development of intelligence.
Note, we are not ruling out the possibility that these genes affect outcome measures of intelligence
but rather we are assuming that these genes neither directly enter nor correlate with the genes
directly involved in the education production process.
This study extends the burgeoning literature in economics that seeks to explain the strong
correlation between education and health in three directions.5 First, we present empirical evidence
on a causal link running from health to academic performance. Due to biases associated with
omitted variables, few studies have either empirically estimated the causal impact of health on
education outcomes6 or focused on mental health conditions despite evidence that their incidence
is substantially larger than physical disorders in adolescence.7 Exceptions include Currie and Stabile
(2005) which presents evidence from sibling fixed effects regressions that the negative impacts on test
scores and educational attainment from a specific mental disorder, hyperactivity, are quantitatively
larger than those from physical health limitation. Behrman and Lavy (1998) as well as Glewwe
and Jacoby (1995) use market instruments such as prices for health. They respectively find that
the impact of child health on cognitive achievement varies as a function of the assumptions made
concerning parental choices and that much of the impact of child health on school enrolment proxies
for unobserved variables. Using an experimental approach, Kremer and Miguel (2004) overcome
the omitted variable bias problem by randomly assigning health treatments to primary schools in
Kenya. Their analysis displays a mixed picture as improved health from the treatment significantly
reduced school absenteeism but did not yield any gains in academic performance.
5 This correlation has been explained in three ways that are not necessarily mutually exclusive.
The first hypothesis is that education increases health through productive or allocative efficiency
(Grossman, 1972, Kenkel, 1991). The second hypothesis is the converse that poor health results
in little education (Perri 1984, Currie and Hyson 1999). Finally, others have suggested that this
correlation could be caused by a third unobserved variable (e.g. discount rate) that affects both
education and health (Fuchs 1982).
6 Grossman and Kaestner (1997) note that the majority of the empirical literature reports
correlations and focuses on the effect of education on health. Strauss and Thomas (1998) present a
survey of the literature on the relationship between health and income.
7 Currie (2005) provides details and points out that the 1999 U.S. Surgeon General’s Report on
Mental Health stated "approximately one in five children and adolescents experiences the signs and
symptoms of a DSM-IV disorder during the course of a year".
Second, we take a close look at empirical measures of health. The dynamic relationships between
health disorders and health behaviors revealed through our analysis clearly present a major empirical
challenge. This challenge has not been clarified earlier since the majority of the literature linking
health to education focuses on a single measure or proxy of an individual’s health such as birth
weight due to data limitations.8 An individual’s health consists of many physical and mental
health measures, including standing heart rate, blood pressure, mental clarity, etc., that constitute
a rich vector. Not only would this vector be difficult to convert to a single index, but were such a
single index to exist, it is unlikely to be well proxied by measures such as BMI or birth weight.
8 For example, see Behrman, Rosenzweig, and Taubman (1994), Currie and Hyson (1999),
Behrman and Rosenzweig (2004) or Almond, Chay and Lee (2005).
Third, we make a clear separation of health outcomes from health behaviors. This distinction
is not apparent in earlier empirical studies which estimate equations derived from models that
either exclusively treat adolescents as a "child" whose parents make all her health and education
choices or treat them as indistinguishable from "adults" who make all the decisions by themselves.
In contrast, we introduce a model that treats adolescents as "adolescents" since they only make a
subset of all the decisions. For example, we postulate that a teenager would make decisions such as
whether or not to smoke or have sex, while their parents make important human capital investment
decisions such as which neighborhood to reside in, which school their child should be sent to, the
type of health insurance to purchase and number of visits to health care providers. This hybrid in
decision-making
health behavior (a control variable) that are treated as equivalent in the earlier literature. Since
health behaviors only explain a very limited amount of the variation in health status, they are
poor proxies for health status (which increases biases due to endogeneity) since they may reflect
non-health preferences such as the type of peers. Further, some health behaviors result from, rather
than cause, a particular health state, which has important policy implications. For example,
adolescents may decide to smoke since the nicotine in cigarettes may help self-medicate against
craving for food or some mental illnesses. Accounting for the pathway between health status and
health behavior is necessary for proper interpretation of our coefficient estimates and could reveal
their dynamism that has been understudied in earlier work.
Our empirical analysis reaches four major conclusions:
1) Genetic markers show a great deal of promise as a set of instrumental variables. The markers
and their two by two polygenic interactions that we consider are strongly associated with each health
behavior and status in the study. Moreover, statistical tests demonstrate that these instruments
only affect academic performance through the health outcomes.
2) The impact of poor health outcomes on academic achievement is substantial. Depression
and inattention both lead to a decrease of 0.5 GPA points on average, which is roughly a one
standard deviation reduction. However, there is substantial heterogeneity in the impact of health
on academic performance across gender. The academic performance of female students is strongly
and negatively affected by poor physical and mental health outcomes. The estimated magnitudes
are substantially smaller for male students and not a single poor health condition has a statistically
significant impact.
3) To accurately estimate the impact of health status, it is important to account for endogenous
health enhancing or health deteriorating behaviors. We find that treating the stock of lifetime
smoking as exogenous leads to substantially different impacts of adverse health status on education.
Cigarette smoking is endogenous and we find that accounting for this choice reduces the negative
impact of depression, inattention and ADHD by over 50% for the full sample and females. In
addition, ignoring the endogeneity of smoking leads to obesity being positively and significantly
related to achievement for males.
4) The high comorbidity of health disorders is striking, which underscores the importance of
accounting for it. Comorbidity is defined as having two or more diagnosable conditions at the
same time. For example, research has suggested that between 50 and 65 percent of children with
ADHD have one or more comorbid conditions such as depression (Pliszka et al., 1999). Unless the
exogenous genetic or environmental factors can be clearly disentangled between these disorders,
estimating the causal impact of one disorder in the absence of related health states may not provide
accurate results. In our analysis, we estimate a large and significant positive impact of obesity on
for the full health vector. Further, the significant impact of hyperactivity (HD) changes signs when
one controls for the full vector of health states. Since many individuals suffer from more than one
disorder, ignoring related illnesses may lead to some misleading conclusions.
The rest of the paper is organized as follows. In Section 2, we provide an overview of the
scientific literature linking genes to health behaviors and health outcomes. An overview of the data
we employ in this study is provided in Section 3. The framework that guides our understanding
of how education and health interact in adolescence is described in Section 4. We discuss the
identification strategy and estimating equations in this section. Our results are presented and
discussed in Section 5. A concluding section summarizes our findings and discusses directions for
future research.
2 Scientific Primer on Genetic Markers
Since it was not possible to collect data on genetic markers, empirical researchers in the social
sciences traditionally chose either to ignore the unobserved heterogeneity conferred by variation
in genetic inheritance or to assume that it is fixed over time for the same individual or across
siblings or twins. Yet recent advances in the fields of molecular and behavioral genetics, most
notably through the decoding of the human genome (Venter et al., 2002), permit researchers to
elucidate how differences in the genetic code correlate with differences in specific behaviors or
outcomes across individuals.
While researchers were able to identify the genetic code for a number of inherited traits and diseases
such as eye color, cystic fibrosis, and Huntington’s disease, most products of inheritance have been
found to be polygenic, caused by the interaction of numerous genetic markers. The health outcomes
and behaviors we consider are thought to be polygenic with researchers associating approximately
160 genes with obesity (Perusse et al., 2005) and 42 genes with ADHD (Comings et al., 2000). For
these disorders researchers have focused their attention on genes involved in the reward pathway of
the brain. This pathway is closely linked to primal drives such as feeding and sex, and has been
shown to have a powerful effect on decision making among higher mammals including humans.
For example, in a well-known study (Olds, 1956), rats that were given the choice of food versus
stimulation of their reward system by electrodes ended up starving to death rather than lessening
the stimulation of their pleasure center.
Since the reward system of the brain has been found to be closely linked to numerous human
activities such as addiction, much research has focused on how variation in different components of
the pathway might make an individual more or less predisposed to addiction. In general, this system
operates when activities such as feeding or sex are undertaken. A region of the brain known as the
ventral tegmental area (VTA) is activated and neurons (brain cells) in the VTA release signaling
molecules known as neurotransmitters (in this case dopamine9) to another area of the brain known
as the nucleus accumbens (NA). These signals pass through the synapses (small gaps separating
neurons) until they eventually reach the frontal cortex, where most “decisions” are made. Increases
in the synapse of either neurotransmitters or receptor neurons for them allow for a much stronger
signal to be sent.10 Since the response of these neurons to nicotine and other substances has been
shown to vary between individuals, it has been hypothesized that genetic differences could explain
why different individuals report different levels of “highs” when smoking cigarettes, which is the
underlying idea of having a genetic predisposition. In addition, since the VTA-NA pathway is
important in regulating pleasure and, therefore, emotion, a number of behavioral traits including
depression and ADHD have been linked to this pathway.
9 Dopamine has been called the “pleasure” chemical of the brain because people who are
electrically stimulated in the limbic dopaminergic centers of the brain report intense feelings of
well-being and sometimes orgasm.
10 Certain food and drugs such as nicotine or caffeine can have an especially powerful effect on the
reward center of the brain as they mimic or potentiate the effects of neurotransmitters that occur
there naturally. This process is often described as a molecular “hijacking” of the reward pathway.
For example, nicotine has been shown to increase levels of synaptic dopamine by stimulating
dopamine release in the VTA (Di Chiara and Imperato, 1988) and inhibiting dopamine reuptake in
the reward pathway (Carr et al., 1992).
The particular genetic markers included in this study were chosen based upon a large and growing
body of research showing a correlation between their variation and traits such as smoking behavior
and depression, controlling for other relevant factors. These markers include the i) Dopamine
Receptor D2 locus (DRD2), ii) SLC6A3 locus (DAT), iii) Tryptophan hydroxylase locus (TPH) and
iv) CYP2B6 locus (CYP). Each person inherits from each parent a single copy, known as an allele,
for each marker. Alleles can differ by the particular building blocks, or base pairs, that make up
all DNA or the number of repeats, or base pairs in a row that repeat themselves. An individual
who inherits 2 of the same (different) allele is considered to be homozygous (heterozygous) for that
marker. Specifically, correlations between different allelic combinations, also called polymorphisms,
and variation in behaviors and outcomes are studied to assess predispositions.
brain, including those in the VTA. The D2 receptor is one of at least five physiologically distinct
dopamine receptors (D1-D5) found on the synaptic membranes of neurons in the brain. The DRD2-A1
allele has been associated with a reduced density of dopamine receptors.11 Several researchers
postulate that the reduced density of dopamine receptors explains the higher associations individuals
with DRD2-A1 alleles (A1/A1 or A1/A2) have with compulsive and addictive behaviors including
smoking, depression and obesity, relative to individuals with two DRD2-A2 alleles.12
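The genotype vocabulary used above (homozygous versus heterozygous, and A1 carriers versus individuals with two A2 alleles) can be made concrete with a short sketch. The function names and the string coding of alleles are hypothetical and chosen for illustration; the text does not describe how genotypes are coded in the data.

```python
def zygosity(allele_1, allele_2):
    """Classify a genotype from the two alleles inherited at one marker."""
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


def carries_drd2_a1(allele_1, allele_2):
    """True for A1/A1 or A1/A2 genotypes, False for A2/A2.

    Alleles are coded as the strings "A1" and "A2" (an assumed coding).
    """
    return "A1" in (allele_1, allele_2)


# Hypothetical genotypes for three individuals at the DRD2 marker.
genotypes = [("A1", "A1"), ("A1", "A2"), ("A2", "A2")]
summary = [(zygosity(a, b), carries_drd2_a1(a, b)) for a, b in genotypes]
```

Note that the comparison group in the text (two A2 alleles) is itself homozygous, so zygosity and A1 carrier status are distinct classifications.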
synaptic levels of dopamine in the brain.13 Variability in the length of the DAT gene is believed to
positively influence levels of the reuptake protein in the brain.14 Individuals with shorter variants of
the SLC6A3 gene have diminished dopamine reuptake and greater availability of synaptic dopamine.
Since there is more synaptic dopamine it has been suggested that these individuals receive smaller
benefits from substances that stimulate dopamine transmission.
The tryptophan hydroxylase gene (TPH) is a member of the serotonergic neurotransmission
system and plays a crucial role in the regulation of mood and impulsivity. This particular gene
is involved in the biosynthesis of serotonin, another neurotransmitter that operates in conjunction
with the brain’s reward system. Serotonin activity has been linked to a number of behavioral and
physical conditions including depression, appetite, and addictive behavior.15
11 This finding was first reported in Blum et al. (1991).
12 See Audrain-McGovern (2004) and Epstein et al. (2002) and the references within for evidence
on these associations.
13 Bannon, Granneman, and Kapatos (1995) present an overview of the SLC6A3 gene. The
SLC6A3 gene has been implicated in Parkinson’s disease (Seeman and Niznik (1990)), attention
deficit disorder (Cook et al. (1995)), and Tourette’s syndrome (Connors et al. (1996)).
14 The length is associated with the number of variable tandem repeats on each marker. Each
repeat increases the amount of reuptake protein. The majority of individuals have SLC6A3 alleles
with lengths of 9 or 10 base pairs, where the length is positively associated with levels of DAT
protein. Note the SLC6A3 loci may also take the form of 7-repeat, 8-repeat, 11-repeat or 12-repeat;
each of which is extremely rare in both the population and our sample.
The CYP genes as a group code for enzymes present in various body organs, primarily the liver,
which break down a number of drugs and toxins, including nicotine. Polymorphisms of the CYP2A6
gene in particular have been linked to across-population differences in smoking, alcoholism, and
response to anti-depression medications.16
Finally, different allelic combinations when interacted can potentially have powerful effects. For
example, the level of endogenous synaptic dopamine depends not only on the amount of dopamine
released but also on the number of receptors that dopamine can bind to (proxied by the DRD2 gene)
as well as the amount of reuptake protein (proxied by the length of the SLC6A3 allele). Similarly,
one could imagine that the rate of metabolism determined by the CYP2B6 gene interacts with both
the TPH and DRD2 genes.
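As a rough illustration of how two-by-two interactions among the four markers could be formed, the sketch below builds every pairwise product from a set of marker indicators. The 0/1 coding of each marker and the `AxB` naming of the interaction terms are assumptions made for illustration; the text does not specify how the markers or their interactions are coded.

```python
from itertools import combinations

# Marker abbreviations as used in the text.
MARKERS = ("DRD2", "DAT", "TPH", "CYP")


def pairwise_interactions(indicators):
    """Return the product of every pair of marker indicators.

    `indicators` maps each marker name to 0 or 1 (an assumed coding,
    e.g. 1 if the individual carries a particular allele).
    """
    return {
        f"{a}x{b}": indicators[a] * indicators[b]
        for a, b in combinations(MARKERS, 2)
    }


# Hypothetical individual carrying the assumed risk variant at three markers.
person = {"DRD2": 1, "DAT": 1, "TPH": 0, "CYP": 1}
interactions = pairwise_interactions(person)
```

With four markers there are six distinct pairs, so this coding would add six interaction terms to the four main effects.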
This paper uses data primarily from the Georgetown Adolescent Tobacco Research (GATOR) study.
GATOR is a unique longitudinal data set of adolescents that combines information from a series
of 5 questionnaires given over four years of high school (1999-2003) along with the four genetic
markers described in the preceding section.
15 See Lucki (1998) for evidence of these associations.
16 See Lerman et al. (2001, 2003) for a discussion.
The study began in 1999 when researchers selected five high schools from the same county in
Northern Virginia.17 The county contains over 950,000 residents and is one of the most affluent in
the US with a median household income of $70,000 in 1995.18 School administrators provided the
names and mailing addresses of the complete 9th grade class roster of students for each of these
schools. To recruit study participants, project information packets, which included an explanatory
cover letter from the school principal, consent forms, and a brief demographic/response form, were
mailed to 2120 students’ homes.19 To increase participation rates, up to three waves of mailings
were sent and telephone calls were placed to encourage parents to respond. Of the 72% of the
parents/guardians (1533 of 2120) who responded to the mailings, three quarters (1151) provided
written consent for their adolescent to participate in the study. 99% of the 1151 adolescents who
had parental consent to participate provided assent themselves.20
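The recruitment percentages quoted above follow directly from the reported counts, as the short check below shows. Only the three counts come from the text; the variable and function names are illustrative.

```python
def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


mailed = 2120      # information packets mailed to students' homes
responded = 1533   # parents/guardians who responded
consented = 1151   # parents/guardians who gave written consent

response_rate = share(responded, mailed)     # about 0.72, the reported 72%
consent_rate = share(consented, responded)   # about 0.75, "three quarters"
```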
Biological samples were collected using buccal swabs from which DNA was extracted via standard
phenol-chloroform techniques. DNA was extracted from buccal cells to avoid a selective exclusion
of subjects with blood and injection phobia. Since the method to genotype varies across markers,
different assays were conducted.21
17 A total of 21 high schools exist in this county. Using data from the NCES CCD we did not
find any significant differences in student demographics or standard school input measures between
schools included and excluded from the sample.
18 The average household income is twice that of the nation and only 8.7% of households had
incomes below $25,000 in 1995.
19 Students for whom the principals indicated special class placement, such as for a severe learning
disability or difficulty speaking and understanding the English language, were excluded from the
study. In total 273 students or 11% of the total population were excluded.
20 See Audrain et al. (2002) for more details regarding the data collection. We compared the
students who consented to the school population using data from the NCES Common Core of Data
and found no significant differences in race and gender.
In all assays, 20% of the samples were repeated for quality control. Quality control procedures
included positive and negative controls with each assay and independent repeat genotyping for 20%
of the results. The rate of discordance was less than 5%, and ambiguous results were not reported.
In total, full genetic information was obtained for 1032 subjects.
The GATOR study also contains basic information on demographic characteristics (i.e. race,
gender, etc.), academic performance as measured by GPA (waves 3-5 only), reports on physical
activity and information on smoking activity by family and residence members. In the initial
survey this information was collected during mandatory grade 9 health and physical education
classes. These surveys were administered by a GATOR staff member to students who provided
assent. Participants received $5 gift certificates to media stores to acknowledge their time and
participation in this study.
The participants were resurveyed in the fall and spring of the 10th grade and in the spring of
the 11th and 12th grade, for a total of five data collection waves. The rates of participation at the
follow-ups from baseline were about 95%, 96%, 93% and 89% respectively. Similar to the initial
data collection, surveys were completed during a classroom common to all students in the presence
of a member of the research team.22
21 For example in conducting SLC6A3 genotyping the following assay was conducted. DNA (25
ng) was mixed with primers (20 pmol), GeneAmp PCR buffer (10 mM tris-HCl pH 8.3, 50 mM KCl,
1.5 mM MgCl2, and 0.0001% gelatin; Perkin Elmer, Norwalk, CT), Amplitaq DNA polymerase
(2.5 μ; Perkin Elmer, Norwalk, CT), and 2’-deoxynucleotides-3’-triphosphates (144 μM; Pharmacia,
Piscataway, NJ) in 50-μl total volume. The reaction conditions included an initial melting step
(94°C; 4 min) followed by 35 cycles of melting (94°C; 1 min), annealing (65°C; 1 min), and extending
(72°C; 1 min). The VNTR repeat was then determined with a 4% agarose gel electrophoresis (3:1
nusieve:agarose). The authors would be happy to provide full details on the assays for the other
markers by request. Note each assay was validated by confirming a polymorphic inheritance pattern
in seven human family lines encompassing three generations.
Students were identied on the completed survey by an identication number and during each
wave a member of the research team read aloud a set of instructions, emphasizing condentiality to
promote honest responding, and encouraged questions if survey items were not clear. To minimize
missing data, make-up days were scheduled for those adolescents who were absent during the regular
survey administration. Further, surveys were mailed to the homes of students who had either
switched schools or dropped out of school.
The GATOR data contains numerous questions on health and health behavior. Each survey
contained standard epidemiological questions related to self-reported experimentation with, and
current use of, cigarettes. Each participant who reported having smoked a cigarette provided addi-
tional information on both recent and lifetime cigarette use. From this information, we constructed
two variables that represented whether an adolescent was currently smoking cigarettes and years
of being a cigarette smoker. A current smoker was defined as having smoked a cigarette within
the past month and over one hundred cigarettes over the lifetime. Using this information on being
a current smoker with self-reported smoking histories we constructed a conservative measure of
number of years of smoking.
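As an illustration of the definition above, the following Python sketch flags a current smoker. The function and argument names are hypothetical and are not taken from the GATOR data; only the two thresholds (a cigarette within the past month, more than one hundred lifetime cigarettes) come from the text.

```python
def is_current_smoker(smoked_past_month: bool, lifetime_cigarettes: int) -> bool:
    """Return True if the respondent meets both criteria stated in the text.

    Argument names are hypothetical; they stand in for the survey items.
    """
    # Criterion 1: smoked a cigarette within the past month.
    # Criterion 2: smoked over one hundred cigarettes over the lifetime.
    return smoked_past_month and lifetime_cigarettes > 100
```

Requiring both conditions is what makes the measure conservative: an adolescent who has only experimented recently is not counted as a current smoker.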
22Students without parental consent completed classroom assignments during the administration
of these surveys. Classroom teachers and school administrative personnel did not participate in the
survey portion of the research, nor were they permitted to view participants’ responses.
With the exception of the survey in the fifth wave, participants completed the Center for Epidemiologic
Studies-Depression Scale (CES-D), a 20-item self-report measure of depressive symptoms.
Items on the CES-D are rated along a 4-point Likert scale to indicate how frequently in the
past week each symptom occurred (0 = never or rarely; 3 = very often). The sum of these items
is calculated to provide a total score where higher scores indicate a greater degree of depressive
symptoms. To determine whether an individual may be depressed, we followed earlier research with
adolescent samples (Roberts, Lewinsohn, and Seeley, 1991), which suggests using
gender- and age-appropriate dichotomous cutoff scores (>24 for female adolescents, >22 for male
adolescents) to ascertain the presence of clinically significant levels of depressive symptoms.
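The scoring rule just described can be sketched in a few lines of Python. The names below are hypothetical, and the sketch simply sums the twenty 0-3 ratings as the text describes, without any item-level recoding; the cutoffs (>24 for females, >22 for males) are those quoted above.

```python
# Gender-specific cutoffs quoted in the text; scores strictly above them are flagged.
CESD_CUTOFFS = {"female": 24, "male": 22}


def cesd_total(item_ratings):
    """Sum the 20 item ratings, each on the 0-3 scale described in the text."""
    if len(item_ratings) != 20:
        raise ValueError("the CES-D has 20 items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return sum(item_ratings)


def is_depressed(item_ratings, gender):
    """Apply the dichotomous cutoff for the given gender ('female' or 'male')."""
    return cesd_total(item_ratings) > CESD_CUTOFFS[gender]
```

Because the cutoffs differ by gender, the same total score (for example 23) is flagged for a male respondent but not for a female respondent.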
The Current Symptoms Scale-Self Report Form (CSSF), a well-standardized, 18-item self-report
measure, was used to assess symptoms of attention-deficit/hyperactivity disorder (ADHD) from
DSM-IV (Barkley and Murphy, 1998) in the second wave survey.23 This form allows participants to
rate their recent behavior regarding how often they experience symptoms of inattention (9 items)
and hyperactivity-impulsivity (9 items) on a 4-point Likert scale (0 = never or rarely; 3 = very often).
Typical diagnostic criteria (endorsement of at least moderate severity on at least six symptoms
from either the inattention or hyperactivity-impulsivity subscale) were used to determine the likely
presence or absence of clinically significant ADHD symptoms. In the final wave of the GATOR
23Barkley and Murphy (1998) describe the scoring algorithm. The American Psychiatric Association
defines ADHD as a heterogeneous neurobehavioral syndrome that begins in childhood and
is applied to individuals who display developmentally inappropriate levels of attention problems
or hyperactivity-impulsivity, along with impairments in functioning at home, school, or in social
settings. It is important to state explicitly that we are not focusing on diagnosed cases but rather
on responses to questions which are used to construct a diagnosis known only to researchers.
survey participants provided self-reports of their height and weight. These measures were used to
construct body mass index and we applied standard definitions for being obese (BMI>30).24
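The two classification rules above (ADHD symptoms and obesity) can be sketched as follows. This is an illustration under stated assumptions, not the scoring algorithm of Barkley and Murphy (1998): the function names are hypothetical, "at least moderate severity" is assumed to mean a rating of 2 or more on the 0-3 scale, and height and weight are assumed to be in metres and kilograms.

```python
def likely_adhd(inattention, hyperactivity, moderate=2, min_symptoms=6):
    """Classify from two lists of nine 0-3 ratings (one list per subscale).

    `moderate=2` is an assumption about what counts as "at least moderate
    severity"; `min_symptoms=6` is the threshold stated in the text.
    """
    def meets(subscale):
        # Count symptoms endorsed at or above the assumed severity threshold.
        return sum(1 for r in subscale if r >= moderate) >= min_symptoms

    ad = meets(inattention)
    hd = meets(hyperactivity)
    # ADHD is flagged when either subscale meets the criterion.
    return {"AD": ad, "HD": hd, "ADHD": ad or hd}


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by squared height in metres."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m):
    """Obesity indicator using the BMI > 30 cutoff quoted in the text."""
    return bmi(weight_kg, height_m) > 30
```

Returning the AD and HD flags separately mirrors the later analysis, which considers ADHD both as a single indicator and split into its two components.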
In total we have information on academic performance as measured by GPA (collected in waves
3-5 only), genetics, health outcomes and health behaviors for 893 study participants. Approximately
90% of these students (807 students) completed the survey in all three years. The top panel of Table
1 presents summary statistics of the time invariant characteristics of the 893 participants in our
study. The sample is predominantly Caucasian and the largest minority population is Asian. The
percentages of African Americans and Hispanics in the student body of the schools in our sample
vary from 2.07% to 12.20% and from 5.54% to 19.3% respectively. The overall sample’s AD and HD
subscale averages fell within standard ranges (inattention mean = 5.9; hyperactivity/impulsivity
mean = 6.6) for adolescent samples. Over 40% of the students report that at least one of their
parents was either currently smoking or was an active smoker during their childhood.
The bottom panel of Table 1 presents information on time varying controls and outcomes.
Neither GPA nor the percentage of students who have a household member that smokes changes
substantially over the three years when GPA was collected. In contrast,
the number of individuals who currently smoke and have tried smoking rises rapidly during this
period. The percentage of daily smokers in the 10th grade and 12th grade is similar to national
averages calculated using the NELS88 (Miller, 2005). The percentage of depressed adolescents in our
sample is slightly higher than the 1999 estimate of the fraction of the adolescent population being
24We examine the robustness of our results to alternative cutoffs for obesity, ADHD and depres-
clinically depressed (12.5%) from the U.S. Department of Health and Human Services. Summary
statistics on one year lagged smoking and depression are included since we use these predetermined
measures in our empirical analysis. As with our ADHD measure, we need to use predetermined
variables since one could postulate that the answers from the psychological questionnaires used to
diagnose these conditions could be influenced by current academic performance or another factor
which simultaneously affects responses and current academic performance (e.g., divorce).
The GATOR data also contains information on smoking patterns and smoking history within
the household and across a complete set of family members. Finally, we supplemented the original
data with information from other sources to improve measures of the students’ neighborhood and
4.1 The Dynamics From Health to Education
In this section, we present a three-stage model that guides our empirical analysis. The first two
stages of our model incorporate elements from three competing theories in three distinct disciplines
that explain the heterogeneity in health behaviors across individuals. Economics contributes the
standard model of health investment (starting with Grossman, 1972). This model postulates that
individuals make inter-temporal decisions trading off immediate satisfactions for future benefits.
25Data at the school level was obtained from the CCD and neighborhood information was obtained
from US census records at the zip code level.
Different time discount factors and value of life could result in different health choices. Psychologists
claim that the heterogeneous health behaviors arise from different environmental or situational
factors that individuals encounter. Natural scientists hypothesize that genetic variations in single
or multiple genes are associated with health differences across the population.
Stage 1, at the beginning of period $T$ ($T_0$), adolescents choose whether or not to (continue to)
engage in a risky behavior such as smoking, drinking alcohol or using narcotic drugs given their
demographics, discount rates, the value of life, genetic markers and home and school environments
as well as their current health status ($H_{iT-1}$). Adolescent $i$ at time $T_0$ chooses action or behavior
$k$ if the immediate satisfaction it provides exceeds the aggregation of the current cost and the
perceived future cost to her. The immediate satisfaction that adolescent $i$ derives from action $k$
could be affected by her current health status26 and her genetic predispositions. The immediate
cost of taking action $k$ includes both pecuniary parts such as the price of cigarettes and non-pecuniary
parts such as how difficult it is to take action $k$. For instance, a teenager may face obstacles in
acquiring cigarettes or narcotic drugs that can be measured as time spent. The obstacles faced
are determined by neighborhood, school and family environment inputs. For example, increased
parental monitoring might make cigarette smoking more costly; a drug infested neighborhood might
make drug usage less difficult. The perceived future costs usually depend on the discount rates and
the value of life, which may vary with current health status (healthy people are more patient in
general) and genetic predispositions. Since the data contains no information on this matter, wlog
26Research has also suggested that individuals with ADHD employ nicotine to enhance cognitive
function (e.g. Coger et al. (1996), Levin et al. (1996) and Pomerleau et al. (1995)).
we assume a non-binding monetary budget constraint for ease of exposition. As a result adolescent
$i$’s choice of $k$ is a function of the market price for $k$ that is available to $i$ ($p_k$) and the health status
at time $T_0$ ($H_{iT-1}$), given $i$’s endowed predisposition to taking action $k$, that is, the set of genes
($G^k$) associated with $k$, and the environment variables that are included in the matrix $X_{1iT}$.
$$k_{iT} = k(X_{1iT}, H_{iT-1}, p_k, G^k_i, \epsilon^k_{iT}) \qquad (1)$$
where $\epsilon^k_{iT}$ captures an independent random shock. This model can be easily generalized to treat $k$
as a vector of behaviors that are either health enhancing (i.e. proper diet and regular exercise) or
health deteriorating (i.e. smoking and drinking).
Stage 2, at time $T_1$, altruistic parents select a level of health input $l_{iT}$ for adolescent $i$, given
the teenager’s observed health behaviors $K_{iT}$ (not necessarily equal to $k_{iT}$) at the beginning of
this period and revealed health status $H_{iT-1}$, that provides the highest indirect utility for their
household $V^l_{iT}$:
$$V^l_{iT} \equiv V_{iT}(X_{2i}, C_{lT}, H_{iT-1}, K_{iT}, G^H_i), \quad \text{for each } l \text{ available to } i\text{'s family} \qquad (2)$$
where $X_{2i}$ are person-specific and environmental characteristics of the child $i$; $C_{lT}$ is the cost of
health input $l$ at time $T$, which includes the cost of insurance payments and the wage-rate forgone
when taking care of child $i$’s sickness etc.; and $G^H_i$ is a vector of genetic markers that provide
endowed predispositions to the current state of health status.
For empirical identification, the set of genetic markers, home and school environments that
impact health outcomes are not identical to those that determine health behaviors. Given the
history of health behaviors chosen by adolescent $i$ and the health inputs chosen by $i$’s parents,
health production functions translate these elements into a vector of health outputs as follows
$$H_{iT} = g(X_{2iT} \ldots X_{2i0}, k_{iT} \ldots k_{i0}, l_{iT} \ldots l_{i0}, G^H_i, \epsilon^H_{iT} \ldots \epsilon^H_{i0}) \qquad (3)$$
where $X_{2iT} \ldots X_{2i0}$, $k_{iT} \ldots k_{i0}$, $l_{iT} \ldots l_{i0}$ and $\epsilon^H_{iT} \ldots \epsilon^H_{i0}$ are the full history of individual and environmental
characteristics, health behaviors, health inputs and independent random shocks to health production
respectively. Child $i$’s initial health stock at the start of life is represented by $H_{i0}$.
We assume here a display of single-mindedness in parental preference on child health. That is,
$$V(H^1_{it}, \cdot) > V(H^2_{it}, \cdot) \text{ if } H^1_{it} > H^2_{it} \qquad (4)$$
We also assume a discrete set of health input levels (i.e. health insurance packages) all well within
the budget constraint. By this, we leave out the extreme cases that parents have to choose between
putting enough food on the table and paying the kid’s medical and insurance bills. Since our data
has no health input information, this assumption places no constraints on the estimation equations.
Under these two assumptions, parents will always choose the $l$ that leads to the highest possible level
of health for child $i$.
Stage 3, at the end of period $T$, $T_2$, parents choose a set of education inputs (i.e. school quality,
employing tutors, etc.) based on the health status of their child. Parents select among these inputs
the optimal school $j$ for child $i$ which provides the highest indirect utility for their household $V_{ij}$,
$$V_{ij} \equiv V_{ij}(X_{3i}, C_j, Q_j, A_{iT-1}, I_i), \quad \text{for each } j \text{ available to child } i \qquad (5)$$
where $X_{3i}$ are observable person-specific and family characteristics of the child $i$; $C_j$ is the cost of
attending school $j$, which includes the cost of living in a good school district; $Q_j$ is school-specific
characteristics; $A_{iT-1}$ indexes child $i$’s measured achievement at the stage of decision making; and $I_i$
is child $i$’s innate abilities. The availability of schools to a child is described by the school admission
rules in the local areas where parents can commute to work daily.
Conditional on the selection of school $j$ in the third stage, the standard education production
model states that child $i$ in school $j$ at time $T$ gains human capital as measured by a score on an
achievement test or report card. The general conceptual model depicts this level of achievement
$A_{ijT}$ to be a function of the full history of family, community, school inputs and own innate abilities.
These variables interact with each other in a nontrivial, unknown way. This general model expresses
current achievement over time as
$$A_{ijT} = f(X^e_{iT} \ldots X^e_{i0}, Q_{jT} \ldots Q_{j0}, I_i, \epsilon_{iT} \ldots \epsilon_{i0}) \qquad (6)$$
where $X^e_{it}$ is a vector of community variables, individual and family characteristics in year $t$, $Q_{jt}$ is
a vector of school characteristics, $I_i$ is a vector of unobserved heterogeneity including such factors
as student innate abilities, parental tastes, determination, among others and $(\epsilon_{iT} \ldots \epsilon_{i0})$ are the full
history of independent random shocks assumed to have zero mean and no serial correlation.27
There are three popular explanations put forth in the health economics literature for the observed
positive relationship between health and education. The first model considers education an
27This model, which underlies education production function studies, was first discussed in Boardman
and Murnane (1979).
investment in the future that pays large dividends the longer one lives, thus incentivizing individuals
to stay healthy and live longer (Becker, 1993). The second model postulates that education is a
critical component in a health production function; thus, educated individuals are better equipped
to stay healthy (Grossman, 1972). The third explanation suggests that the relationship exists because
both health status and education are directly related to an unobserved variable such as time
discounting (Fuchs, 1982) or one’s family background (Rosenzweig and Schultz, 1983). However,
there is no formal economic model postulating how health enters into the education production
process as an input. As a result, we hypothesize below the possible channels through which health
status ($H_{iT}$) potentially affects education.
First, it may affect the physical energy level of a child, which determines the time (including
classroom attendance and after school educational activities) that can be used for learning. For
example, obesity has been found to be the largest determinant of absenteeism (Schwimmer et al.,
2003). Second, it affects the child’s mental status, which may have a direct impact on academic
performance. For example, obesity may cause low self esteem which leads to classroom disengagement
that may reduce academic performance. Other health conditions such as being diagnosed with ADHD
or clinical depression may directly affect a child’s attention span, which adversely affects her academic
outcomes. Third, a child’s health status may affect the way her teachers, parents and peers
treat her; this in part shapes the learning environment that she encounters. For example, obese
children are often less popular among their peers and teachers. Depression in children is associated
with personal distress, and if the state lasts a long time or occurs repeatedly, it can lead to a
circumscribed life with fewer friends and sources of support (Klein et al., 1997). The first two
channels directly affect own health input (both physical and mental) in the education process while
the latter scenario influences a child’s education outcome through other inputs such as peer quality
and teacher attention that are the result of a certain health status.
Ideally we would like to disentangle the effect of obesity on education (the structural parameter)
from that which is due to the impact of the environment resulting from being obese. If parents,
schools or peers are responding to negative health outcomes by increasing investment into other
inputs this may offset the deleterious effects of poor health on achievement. Conversely the response
of these individuals could move in a direction that reinforces the deleterious impact of health such
as discrimination. For example, parents may decide not to invest or invest less in a child’s education
due to observed health status of their child. Since our data lacks information on family and school
inputs as well as peers, we will obtain a combined (reduced form parameter) impact of health on
4.2 The Estimating Equations
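Linearizing the achievement relationship (equation 6) yields
$$A_{ijT} = \beta_{0T} + \beta_{1T} X^e_{iT} + \beta_{2T} H_{ijT} + \beta_{3T} Q_{jT} + \beta_{4T} I_i + \sum_{t=0}^{T-1} (\alpha_{0t} + \alpha_{1t} X^e_{it} + \alpha_{3t} Q_{jt} + \alpha_{4t} I_i + \delta_{it}) + \epsilon_{iT} \qquad (7)$$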
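where $\delta_{it} = \alpha_{5t} \epsilon_{it}$ for some coefficient $\alpha_{5t}$. The components of equation (7) may include higher
order and interaction terms. We re-express the achievement function as
$$A_{ijT} = \beta_0 + \beta_1 X_{iT} + \beta_2 H_{iT} + \beta_3 Q_{jT} + \tilde{\epsilon}_{iT} \qquad (8)$$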
where the vector $X$ contains individual characteristics (gender, race, residential smoking status),28 and
the vector $H$ is a vector of variables that captures current predetermined health measures.29 Similarly
we linearize and generate equations for both the health production function in equation (3)
and the decision to engage in health behavior equation (1) as follows:
$$H_{iT} = \gamma_0 + \gamma_1 X_{iT} + \gamma_2 k_{iT} + \gamma_3 G^H_i + \epsilon^H_{iT} \qquad (9)$$
$$k_{iT} = \delta_0 + \delta_1 X_{iT} + \delta_2 H_{iT} + \delta_3 G^k_i + \epsilon^k_{iT} \qquad (10)$$
Instrumental variable methods are used to estimate the above system of equations ((8) - (10)) to
generate consistent estimates of the causal impact of health on education ($\beta_2$). Our identification
relies on the assumption that the vectors of genetic markers that impact health behaviors ($G^H_i$)
are unrelated to unobserved components of equation (9). While there is absolutely no evidence for
28Since parents may choose to make investments in their children based on their health status,
our estimates should be viewed as an upper bound of the health impact on academic performance
if the investment is positively related to good health. Conversely, if the investment is negatively
related to good health, our estimates provide a lower bound.
29This model is commonly used in the economics of education literature and it implicitly assumes
that the effects of all previous observed and unobserved influences are zero in the current period.
The empirical validity of this assumption has only recently been tested by Ding and Lehrer (2005) and
Todd and Wolpin (2005), who each find support for it with school but not home inputs. This model
was selected since our data lacks information on home inputs.
the former assumption that the markers considered in this study have any impact on the education
production process, it remains possible.
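To make the instrumental variable logic concrete, here is a minimal Python sketch of two stage least squares for the simplest possible case: one health variable, one genetic marker indicator used as the instrument, and no other covariates. It is a deliberately stripped-down illustration, not the estimator used in the paper, which has several health variables, several instruments and additional controls. The function names and the variable names `g` (marker), `h` (health) and `a` (achievement) are assumptions made for the example.

```python
def ols(x, y):
    """OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(g, h, a):
    """2SLS with one instrument g, one endogenous regressor h, outcome a.

    Returns (intercept, slope); the slope plays the role of beta_2.
    """
    # First stage: project the health variable on the genetic marker.
    c0, c1 = ols(g, h)
    h_hat = [c0 + c1 * gi for gi in g]
    # Second stage: regress achievement on the fitted health values.
    return ols(h_hat, a)
```

With toy data in which an unobserved factor `u` raises both `h` and `a` but is uncorrelated with `g` (for example `g = [0, 0, 1, 1]`, `u = [-1, 1, -1, 1]`, `h = g + u`, `a = -0.5 * h + u`), the sketch recovers the slope of -0.5 while plain OLS of `a` on `h` gives 0.3. Standard errors computed naively from the second-stage regression would not be valid, so in practice a dedicated IV routine is preferable.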
5.1 Basic Patterns in the Data
5.1.1 Do people win or lose the genetic lottery?
Understanding the relationship between the genetic markers in our study provides support for
our identification strategy by demonstrating that there is substantial unique variation from these
markers and their interactions. Summary information on the genetic markers in our data is provided
in Table 2. The DAT genotypes are classified with indicator variables for the number of 10-repeat
alleles (zero, one, or two). We include indicator variables for the available AA, AC and CC genotypes
of the TPH gene. Similarly, the DRD2 gene is classified as A1/A1, A1/A2 or A2/A2. Finally, we
include indicator variables for the available CC, CT and TT genotypes of the CYP gene. The
first column of Table 2 provides the raw number of individuals who possess each particular marker.
Excluding the TPH gene, the majority of individuals in our data are homozygous for A2/A2 (of
the DRD2 gene), CC (of the CYP gene) and have two ten repeat alleles of the DAT gene. For
each of these genes the heterozygous combination is the next most populated and the remaining
homozygous combinations of the CYP and DRD2 genes are rarest. For the TPH gene there is nearly
an equal number of people who possess either the heterozygous AC or homozygous CC combination.
The entries in the remaining columns of Table 2 indicate the number of people in each row
that also possess one of the rare allele combinations of the other genes along with the conditional
probability of possessing this combination. Each cell in the table is populated with at least two
individuals and there does not exist any systematic relationship between the different genetic polymorphisms.30
Thus, having a rare polymorphism for one gene does not make it more likely that
you would have a rare polymorphism in another gene. These results are encouraging as they do not
lend support to correlations between markers of different genes.
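One simple way to check for an association between markers of two genes is a Pearson chi-square test on a 2x2 cross-tabulation (rare versus common marker for each gene). The sketch below is an illustration only: it is a different test from the regressions and odds-ratio homogeneity tests the authors report using, the function name is hypothetical, and any counts passed to it would have to come from Table 2.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]].

    Rows: rare / common marker of gene 1; columns: rare / common marker of gene 2.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of the two markers.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat
```

With one degree of freedom, a statistic below 3.841 fails to reject independence at the 5% level, which is the pattern one would expect if rare polymorphisms of different genes do not cluster together.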
5.1.2 Candidate Genes for Adolescents
To justify using our four sets of genetic markers and their two-by-two polygenic interactions to explain health
behavior and status, we begin by examining whether there are differences in health measures between
individuals with different genetic markers. Table 3 presents information on summary measures for
each genetic marker. That is, each cell contains the conditional mean, standard deviation and odds
ratio of alternative health outcomes for individuals that possess a particular marker.
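For readers less familiar with odds ratios, the following Python sketch shows how one could be computed for a health outcome among carriers of a marker relative to non-carriers. The function and argument names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from Table 3.

```python
def odds_ratio(cases_with, n_with, cases_without, n_without):
    """Odds of the outcome among marker carriers relative to non-carriers.

    cases_with / n_with: affected count and group size among carriers.
    cases_without / n_without: the same for non-carriers.
    """
    odds_with = cases_with / (n_with - cases_with)
    odds_without = cases_without / (n_without - cases_without)
    return odds_with / odds_without
```

For example, if 20 of 100 carriers and 10 of 100 non-carriers had the outcome, the odds ratio would be (20/80)/(10/90) = 2.25. A value of 1 indicates no difference in odds between the two groups.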
For each genetic marker, there exists a substantial difference in the occurrence rate of at least
one of the health outcomes and behaviors.31 Individuals with the AA polymorphism of the TPH
30Statistically, to determine whether there were links between markers of different genes we conducted
regressions and tests for homogeneity of odds ratios to see whether possessing a given marker
increased the odds of possessing a specific marker for a different gene. We did not find any evidence
indicating a systematic relationship between markers of any two of these genes.
31In addition, we conducted simple linear regressions by gene of health outcomes on discrete
indicators for possessing each allele combination. The regression results are available from the
authors on request. Several relationships are statistically significant and we denote statistically
different odds ratios with * in the Table.
gene have 50% and 20% higher propensities (relative to other TPH markers) for smoking and
obesity respectively. For the CYP gene, those with the rare TT polymorphism are more than 85%
more likely to be diagnosed with inattention (AD) and hyperactivity (HD), while those with the
common CC marker are at least 50% more likely to be obese. For the DRD2 gene, individuals
with the common A2/A2 genotype are substantially less likely to be diagnosed as depressed or obese
relative to the other DRD2 markers. For the DAT gene, individuals with one 10-repeat (DAT1)
independently have both higher rates of being diagnosed with ADHD and lower rates of depression.
Individuals that have no 10-repeats (DAT0) are associated with slightly higher smoking rates.
These results clearly demonstrate that the four sets of genetic markers have statistically significant
associations with our health measures.
5.1.3 Health and Education Outcomes in Adolescence
The well known positive association between good health and educational outcomes is also observed
in the data. As indicated in Appendix Table 2, individuals diagnosed with ADHD, depression and
obesity respectively have on average GPA scores that are 0.26, 0.18 and 0.43 lower than their
counterparts. These differences are statistically significant (one sided t-tests). The raw GPA gap of
individuals with ADHD or obesity relative to those not diagnosed increases between grades 10 to
12 by approximately 20%. While the gap between depressed and non-depressed children does not
vary through grades, cigarette smokers close their GPA gap with non-smokers from 0.58 in grade
10 to 0.49 in grade 11 and 0.37 in grade 12. This is somewhat misleading as numerous individuals
start smoking over time. These new smokers have substantially higher GPA scores than long-term
smokers. Between grade 10 and grade 12 long-term smokers consistently have GPA scores that are
approximately one half point lower relative to non-smokers.
Not only do smokers have lower GPA scores but they also have a higher propensity of being
diagnosed with negative health status. Individuals with each health disorder are significantly more
likely to be smokers at the 1% significance level.32 The largest gaps occur for individuals diagnosed
with either inattention or ADHD, whose smoking rate is more than 2.5 times that of the remaining
population (33% of individuals with ADHD smoke versus 13% of the remaining individuals and 39%
of individuals with AD smoke versus 12% of the remaining population). The propensity to smoke is
twice as high among adolescents with hyperactivity (HD) relative to those not diagnosed with this
disorder. Lastly, adolescents diagnosed as obese or depressed are associated with approximately
50% greater smoking propensities versus the remaining sample.
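The smoking percentages quoted in this paragraph can be converted into rate ratios and percentage differences with a short calculation. The sketch below uses a hypothetical helper name; the input rates (33% versus 13% for ADHD, 39% versus 12% for AD) are those reported in the text.

```python
def relative_rate(rate_group, rate_rest):
    """Return (ratio, percent_higher) for a group's rate relative to the rest."""
    ratio = rate_group / rate_rest
    # A ratio of 2.0 means the rate is 100% higher, not 200% higher.
    percent_higher = (ratio - 1) * 100
    return ratio, percent_higher


adhd_ratio, adhd_pct = relative_rate(0.33, 0.13)  # ADHD versus remaining individuals
ad_ratio, ad_pct = relative_rate(0.39, 0.12)      # AD versus remaining population
```

Here `adhd_ratio` is roughly 2.5 (a rate about 154% higher) and `ad_ratio` is 3.25 (a rate 225% higher), which illustrates the difference between "x times the rate" and "x percent higher".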
A major statistical challenge in accounting for these health outcomes is the presence of comorbid
conditions. Comorbid conditions, or comorbidities, are conditions that happen to occur at the same
time. For example, Biederman et al. (1995) report that seventy percent of adults with ADHD are
treated for depression at some point in their life. Table 4 presents some summary information on
the presence of comorbidities in our full sample.33 Column 1 of Table 4 displays the number of
individuals (and marginal distribution) in each wave who smoke or have been diagnosed with either
32Results from one sided t-tests.
33Appendix Table 4 presents the same analysis for each gender. Recall being diagnosed with
ADHD means that an individual has been diagnosed with either AD or HD. It also does not make
a distinction between individuals with one or both disorders.
AD, HD, ADHD, obesity or depression. Across each row we present the number of individuals (and
conditional frequency) who also engage in smoking or suffer other poor health outcomes. Not only
are adolescents who are diagnosed with ADHD more likely to smoke but they also have a higher
rate of being diagnosed as either clinically depressed or obese than their cohorts (one sided t-tests).
This result is not unique to ADHD as we find that individuals diagnosed with any of these health
disorders are significantly more likely to engage in smoking than those not diagnosed in grade 12.
Since health disorders and risky health behaviors are more common among individuals diagnosed
with one particular disorder than among the remaining population, we will investigate whether
estimates of the impacts of a disorder vary if we do not control for comorbidities. The majority
of the literature on the impacts of health generally includes only a single outcome measure such as
obesity, smoking or birthweight in the analysis. Estimates of the impact of health disorders may
vary if there are strong correlations between included and omitted health outcomes and if the
omitted health outcomes have a significant impact on the dependent variable. Our instruments are
unlikely to be unique to specific disorders as they are associated with the same region of the brain.34
Thus, even with the genetic instruments, excluding significant comorbid conditions may result in
estimates of the impacts of the included disorder proxying for the effects of the omitted outcomes.
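The concern can be made concrete with the standard omitted-variable result. As a sketch in simplified notation that is not the paper's, suppose achievement depends on an included condition $H^{(1)}$ and an omitted comorbid condition $H^{(2)}$, with an error term uncorrelated with both conditions and with an instrument $Z$ for $H^{(1)}$:

```latex
% Hypothetical notation for illustration: H^{(1)} is the included condition,
% H^{(2)} the omitted comorbid condition, Z an instrument for H^{(1)}.
A_i = \beta_0 + \beta_2 H^{(1)}_i + \beta_3 H^{(2)}_i + \epsilon_i

% OLS of A on H^{(1)} alone:
\operatorname{plim}\, \hat{\beta}_2^{OLS}
  = \beta_2 + \beta_3 \,
    \frac{\operatorname{Cov}(H^{(1)}_i, H^{(2)}_i)}{\operatorname{Var}(H^{(1)}_i)}

% IV of A on H^{(1)} alone, using Z as the instrument:
\operatorname{plim}\, \hat{\beta}_2^{IV}
  = \beta_2 + \beta_3 \,
    \frac{\operatorname{Cov}(Z_i, H^{(2)}_i)}{\operatorname{Cov}(Z_i, H^{(1)}_i)}
```

The second expression shows why instrumenting does not remove the problem by itself: if the marker $Z$ is also correlated with the omitted condition, the IV estimate of $\beta_2$ absorbs part of $\beta_3$.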
34Recall from the scientific literature that these disorders are believed to be polygenic and that
there is no unique depression or obesity or ADHD gene. Pharmaceutical companies are now in the
process of examining the use of nicotine patches to deal with ADHD. Ritalin, which is currently
prescribed to children with ADHD was originally developed as an anti-depressant.
5.2 Estimates of the Empirical Model
Ordinary least squares estimates of equation (8) that ignore the endogeneity of health outcomes
and smoking behavior are presented in the top panel of Table 5.35 In our analysis we consider two
different health vectors. The first health vector includes depression, obesity and ADHD. The results
are reported in columns 1-3. The second health vector (results reported in columns 4-6) includes
depression and obesity but decomposes the diagnosis of ADHD into being clinically inattentive (AD)
or clinically hyperactive / impulsive (HD). Results for the full sample are presented in columns 1
and 4, for the sample of females in columns 2 and 5 and the male sample in columns 3 and 6.
As shown in column 1 of Table 5, each health disorder in the first vector is
negatively and significantly associated with academic performance for the full sample. The negative
impact of obesity is approximately twice the magnitude of the other health outcomes. On average
obese individuals have a GPA 0.37 points lower, an effect that is larger than that from any race
or family variable. Column 2 shows that female academic performance is significantly negatively
associated with obesity. Obese girls saw a point decrease in their GPA, a magnitude that is five
times as large as that of being depressed. In addition, ADHD does not correlate with females’ academic
outcomes. In contrast, column 3 demonstrates that the health measures were all negatively
and significantly associated with GPA for boys but the coefficients do not vary across the health
measures. Finally, the negative impact of the household environment variable is nearly twice as
35Due to space limitations estimates of equations (9) and (10) are available from the authors by
large for boys as for girls.
Decomposing the impact of ADHD into its components, columns 4 to 6 of Table 5 indicate that
AD was responsible for the negative coefficient of ADHD in column 1. For the full sample, HD is
positively associated with academic performance. Column 6 shows a strong negative association
between AD and GPA for males that is approximately 50% larger than that found in females.
Similarly, the positive impact of HD is 50% larger for boys but is statistically insignificant for both
genders. Interestingly, Asian females performed significantly better than their Caucasian classmates
while there were no differences for Asian boys.
5.2.1 Endogenous Health Outcomes and Health Behaviors: First-stage Estimates
A challenge exists in selecting an appropriate subset of the markers in our data to serve as instruments.
The scientific literature provides some (arguably weak) guidance as the evidence tends to be
inconsistent across studies.36 We present and report results from instruments selected by forward
stepwise estimation for each health outcome and behavior at the 5% level. This set was selected not
only because it has good first stage properties for the full sample by design but also because it is
more parsimonious than the other instrument sets we used to verify the robustness of our findings.37
36These studies tend to use very small unrepresentative clinical samples. Since it is not possible
(and probably unethical) to engage in random mutations of an individual genetic code we argue
it is best to treat genetic predispositions as a form of neural correlates with health behaviors and
health status.
37For robustness, we considered seven different instrument sets for the equations. One set involved
the use of the complete set of the markers in our study, another set was constructed based on our
reading of the neuroscientific literature up to May 2005 and the remaining five sets were constructed
from stepwise estimation using alternative selection criteria.
We do not vary our instrument set across race or gender so that any observed difference in terms
of health effects is not the result of the selection of different instrument sets that are race or gender
For the markers to serve as instruments they must possess two statistical properties. First,
they must have a substantial correlation with the potentially endogenous health variables. Second,
they must be unrelated to unobserved determinants of the achievement equation. Table 6 presents
results from two specification tests that examine the statistical performance of the instruments for
each health equation and sample.
In the top panel of Table 6 we present estimates of the F-statistics for the joint significance of
the instruments in the first stage regressions. For each health outcome and health behavior with
each sample, the instrument set is jointly statistically significant at a level above current cutoffs
for weak instruments.39 Since our estimates are over-identified, we use a J-test to formally test
the overidentifying restrictions. The associated p-values for these tests are presented in the bottom
panel of Table 6. The smallest of the five p-values is a reassuring 0.21, which provides little evidence
against the overidentifying restrictions. In addition many of the p-values are large and exceed 0.5.
However, these tests are known to have poor power properties.
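The two specification tests described above can be illustrated with a minimal sketch on simulated data. This is not the estimation code or data used in the paper: the variable names (`health`, `gpa`, `Z`, `W`), the single endogenous regressor, the three instruments and the homoskedastic Sargan form of the J-test are all assumptions made for illustration.

```python
import math
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def first_stage_F(x_endog, Z, W):
    """F-statistic for the joint significance of the excluded instruments Z
    in a regression of x_endog on [W, Z]; W holds the exogenous controls."""
    n = len(x_endog)
    _, e_r = ols(W, x_endog)                         # restricted: controls only
    _, e_u = ols(np.column_stack([W, Z]), x_endog)   # unrestricted: controls + instruments
    q = Z.shape[1]
    k = W.shape[1] + q
    rss_r, rss_u = e_r @ e_r, e_u @ e_u
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

def tsls(y, x_endog, Z, W):
    """2SLS with one endogenous regressor; returns coefficients on [W, x] and residuals."""
    inst = np.column_stack([W, Z])
    gamma, _ = ols(inst, x_endog)
    x_hat = inst @ gamma                              # first-stage fitted values
    beta, _ = ols(np.column_stack([W, x_hat]), y)
    resid = y - np.column_stack([W, x_endog]) @ beta  # residuals use the actual regressor
    return beta, resid

def sargan_J(resid, Z, W):
    """Sargan statistic: n * R^2 from regressing the 2SLS residuals on all instruments."""
    inst = np.column_stack([W, Z])
    _, e = ols(inst, resid)
    tss = ((resid - resid.mean()) ** 2).sum()
    return len(resid) * (1.0 - (e @ e) / tss)

# Simulated (hypothetical) data: three valid instruments, one confounder u.
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
W = np.ones((n, 1))                                   # constant only
u = rng.normal(size=n)
health = Z @ np.array([0.5, 0.4, 0.3]) + u + rng.normal(size=n)
gpa = 1.0 - 0.5 * health + u + rng.normal(size=n)

F = first_stage_F(health, Z, W)
beta, resid = tsls(gpa, health, Z, W)
J = sargan_J(resid, Z, W)
# With 3 instruments and 1 endogenous regressor J has 2 degrees of freedom,
# and the chi-square(2) survival function is exp(-J / 2).
p_value = math.exp(-J / 2.0)
print(f"first-stage F = {F:.1f}, 2SLS effect = {beta[-1]:.3f}, J p-value = {p_value:.2f}")
```

A large first-stage F speaks to instrument relevance, while a large J-test p-value only fails to reject instrument validity; as noted above, the latter test has limited power.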
38Our results (available upon request) were robust to the instrument set for the full sample and
sub-sample of females. The estimates do not vary substantially either qualitatively or quantitatively.
For the sub-sample of males there were some minor differences with some of the other instrument
39Similarly the F statistic for the full set of instruments for the entire model is above current
cutoffs. We report equation by equation results in Table 6 to demonstrate that the results are not
driven by the instruments performing well in some health equations and not in others.
5.2.2 Endogenous Health Outcomes and Health Behaviors: Second-stage Estimates
Two stage least squares (2SLS) results for the achievement equation (8) for the two health vectors
are presented in Table 7. Column one presents results for the full sample and only depression is
significantly related to academic performance. The impact of depression is approximately four times
larger than the OLS estimate presented in Table 5. When ADHD is broken into components (AD and
HD) both obesity and HD become statistically significant as shown in column 4. Hyperactivity and
impulsiveness is positively related to academic performance. In contrast, the portion attributable
to AD is no longer statistically significant once we correct for endogeneity.
The results for the subsample of females in columns 2 and 5 are most striking. The quantitative
impact of each health behavior is substantial. Both depression and obesity lead to decreases in
GPA. The impact of depression is nearly three times as large as that of obesity in health vector
one. With health vector two, both depression and obesity lead to a 0.8 GPA point decrease. While
the total impact of ADHD is close to zero, the separate effects of AD and HD are statistically
significant. While inattention (AD) leads to lower GPA, the impact of HD is of opposite sign.
In contrast, for the subsample of males in columns 3 and 6, health outcomes are no longer
statistically significant once we correct for their endogeneity. The separate impacts of obesity,
depression and AD are statistically different across the genders.40 For each sample and health vector
40For health vector 1, the t statistics for differences in the coefficient estimates between genders
are 0.502, 2.499 and 2.020 for ADHD, depression and obesity respectively. For health vector 2, the t
statistics for differences in the coefficient estimates between genders are 1.845, 0.537, 1.412, and 1.812
for AD, HD, depression and obesity respectively.
we checked whether health status should be treated as endogenous by testing the null hypothesis
that the OLS and 2SLS estimates are equal using a Hausman-Wu test.41 We can reject the null of
exogeneity of health outcomes for each health vector with each sample at the 5% level.42
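One common way to carry out a test of this kind is the regression-based (control function) form of the Hausman-Wu test: the first-stage residual is added to the outcome regression, and a significant coefficient on that residual is evidence against exogeneity. The sketch below uses simulated data with a single health variable; the function names, the data generating process and the use of homoskedastic standard errors are assumptions for illustration and may differ from the exact procedure behind the results reported here.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))

def wu_hausman_t(y, x, Z):
    """t-statistic on the first-stage residual in the augmented outcome regression.
    Under the null that x is exogenous, the coefficient on the residual is zero."""
    const = np.ones(len(y))
    inst = np.column_stack([const, Z])
    pi, _ = ols_with_se(inst, x)
    v_hat = x - inst @ pi                          # first-stage residual
    beta, se = ols_with_se(np.column_stack([const, x, v_hat]), y)
    return beta[2] / se[2]

# Simulated (hypothetical) data.
rng = np.random.default_rng(1)
n = 5000
Z = rng.normal(size=(n, 3))
pi_true = np.array([0.5, 0.4, 0.3])
u = rng.normal(size=n)                             # unobserved confounder

# Case 1: health is endogenous because u enters both equations.
x_endog = Z @ pi_true + u + rng.normal(size=n)
y_endog = 1.0 - 0.5 * x_endog + u + rng.normal(size=n)

# Case 2: health is exogenous because no confounder is shared.
x_exog = Z @ pi_true + rng.normal(size=n)
y_exog = 1.0 - 0.5 * x_exog + rng.normal(size=n)

t_endog = wu_hausman_t(y_endog, x_endog, Z)
t_exog = wu_hausman_t(y_exog, x_exog, Z)
print(f"t (endogenous case) = {t_endog:.1f}, t (exogenous case) = {t_exog:.1f}")
```

With several endogenous health variables, the same idea applies with one first-stage residual per variable and a joint F-test on the residuals.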
There are several additional differences between the estimates for males and females. Among
females, Asian girls have higher GPA scores. Among males, Hispanic boys have significantly lower
GPA. The negative impact of home smoking environment is statistically significant for
both samples. The magnitude in the 2SLS estimates increases relative to OLS for the boys but
diminishes by approximately 40% for girls. We should emphasize that our variable indicating whether
a smoker resides in the household is a proxy for family environment that we lack direct information
on. We examined concerns that a smoker residing in the home may represent inheritability of genes
from biological parents. First, the raw association between biological parents having
been regular smokers and the presence of a smoker in the household is 35%; within the households
that smoke approximately 65% of the smokers are other family members. Second, we replicated
the analysis in Table 7 excluding this proxy for home environment; the magnitude as well as the
41Note, in the event of weak instruments and / or overfitting of the achievement equation the
2SLS estimates would be biased towards the OLS estimates.
42We also considered the more efficient 3SLS estimation of equation 8 where we accounted for the
one way error component structure of the composite error term in running GLS. The 3SLS results are
consistent with an underlying model which treats the components of this error term as follows: an
individual-specific component can be viewed as a random effect
that is i.i.d. across people and the remaining component is an error term which is assumed i.i.d. across grades for the
same individual. There are limited efficiency gains and no substantial differences in the magnitude
or significance of any of our results in this section moving from 2SLS to 3SLS. For completeness, 3SLS
results that correspond to Table 7 are presented in Appendix Table 3. The only minor change is
that in the full sample with health vector 1, depression is now significant at the 10% rather than the 5%
level but the magnitude is virtually unchanged.
statistical significance of the health disorders were unchanged for all three samples and two health
As indicated in Appendix Table 3, which presents comorbidities by gender, there are substantially
fewer girls diagnosed with both AD and HD relative to boys. Further, there are many more
depressed females, particularly in the early waves. However, unlike males, girls who suffer from
depression have fewer comorbid conditions.
To demonstrate the robustness of our results, Appendix Table 5 presents results for the male
and female subsamples that correspond to their preferred instrument sets using stepwise estimation
on those subsamples. While the first stage properties for these samples are improved, an eyeball
test suggests that there are no important statistical differences between these estimates and those
using the instrument set constructed for the full sample with health vector 1 in Table 7. Similarly,
combining the separate instrument sets for males and females and estimating the system of equations
for the full sample yields no observable differences. For females with health vector 2, the positive
impact of HD and negative impact of AD shrink by approximately 25% with this instrument set.
However, the impact of depression increases by 25% with this alternative instrument set. Overall,
the results continue to demonstrate that females suffer large decreases in their GPA when they have
been diagnosed with AD, depression or as obese; whereas no significant relationships exist for the
5.2.3 Discussion
The parameter estimates we obtain are reduced form coefficients. Information on parental and
teacher investment as well as peer group composition is not available to disentangle the impact
of the health condition as explained by genes from that of the response from the environment
to the health conditions as explained by genes. While this appears unsatisfying, this limitation
is also implicitly shared by other empirical strategies used to estimate the impact of health on
education which generally either treat genetics as part of a big blackbox that can be eliminated under
strong assumptions or propose the use of alternative instrumental variables such as an individual’s
phenotype.43 The availability of genes as instrumental variables makes clear, for the first time,
the level of difficulty in obtaining structural parameter estimates and the importance of detailed
accurate information on health and education inputs. Further, structural parameters of this kind
even if they could be obtained, may quickly become invalid every time a new (medical) treatment
is developed that changes the occurrence rate or severity of these disorders’ negative impacts.
The use of exact measures of genes permits us to enter what traditionally has been a blackbox
in empirical economics. Studies that exploit variation within siblings or within twins not only
assume that the set of genetic factors does not vary between members of a pair but also implicitly
assume that the impacts of these
factors and unobserved (to the analyst) family investments are constant between family members.
Most unsatisfying is that one cannot test the validity of these two assumptions and if they are
43Phenotype reflects the observable manifestation of a person’s genotype in which the variation
across individuals is due to past experiences with the environment.
refuted biases could increase from differencing.44 Increasing evidence that monozygotic human twins
are discordant in many physical traits and diseases is ascribed not only to environmental factors
but also to epigenetic modifications.45 Epigenetics refers to DNA and chromatin modifications that
play a critical role in regulation of various genomic functions. Essentially a substantial degree of
epigenetic variation can be generated during the mitotic divisions of a cell in the absence of any
specific environmental factors. This variation, which results primarily from stochastic events, is
assumed in the sibling and twin differencing strategies either to be the same within pairs or to have zero impact on outcomes.
Our 2SLS estimates, however, do not assume a constant effect on health for individuals with
the same genetic markers. Drawing on Imbens and Angrist (1994), Angrist and Imbens (1995), and
Angrist, Imbens and Rubin (1996), when heterogeneous responses to the instrument and heterogeneous
treatment effects are pervasive, the 2SLS estimate can take a causal interpretation as a local
average treatment effect (LATE) under two assumptions.46 This LATE parameter is simply the
average causal effect on education that can be attributed to the health disorders for the subset of
the population whose health disorders are induced by the chosen set of genetic markers and their
44The notion that estimates with samples of twins may increase biases is discussed in Bound and
Solon (1999) and Neumark (1999) in the context of estimating the returns to education.
45For example, while 80% of the variation in schizophrenia is assumed to be heritable only half
of monozygotic twin pairs in which at least one twin has the disease, share the disorder. In total,
only 10% of diseases are assumed to be due strictly to heritable genetic factors. Gringras and Chen
(2001) discuss the mechanisms that lead monozygotic twins to be genetically different.
46Specifically the exclusion restriction of the traditional IV literature is made stronger as the
instrument is required to be entirely independent of the potential outcomes and potential treatments.
Second a specific monotonicity condition on individuals’ responses to shifts in the instrument is
made. This condition requires that those induced to change their health status by the instruments have
health changes operating in only one direction.
interactions (or, at least, a mechanism that the genetic markers reflect).
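For intuition, the LATE interpretation can be written out for the simplest textbook case of a single binary instrument and a binary health indicator. This is a deliberate simplification of the multi-marker, multi-disorder setting in this paper, and the notation is introduced here for illustration only: $Z_i$ is the instrument, $D_i(z)$ is the potential health status when $Z_i = z$, and $Y_i(d)$ is the potential outcome when health status equals $d$.

```latex
% Assumptions: Z_i independent of (Y_i(0), Y_i(1), D_i(0), D_i(1)),
% monotonicity D_i(1) >= D_i(0) for all i, and a non-zero first stage.
\beta_{\mathrm{IV}}
  = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}
         {E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}
  = E\left[\, Y_i(1) - Y_i(0) \;\middle|\; D_i(1) > D_i(0) \,\right]
```

The right-hand side averages the effect only over individuals whose health status is shifted by the instrument, which is why the estimate need not equal the average effect in the full population.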
As noted, the use of genes as an instrument presents a challenge in regard to intergenerational
transmission. It is well known that offspring of parents with psychological problems are more likely
to develop these disorders. For example, it has been estimated that 40% of children with depressed
parents experience psychiatric disorders by age 20 (Beardslee et al., 1998). Data from the
Minnesota Twin Family Study find a weak positive association between maternal depression and
offspring depression but do not find any evidence of an association between paternal depression
and either maternal or offspring depression. The mechanism by which parental disorders influence
offspring psychopathology has not been established and is hypothesized to involve, among other factors, a
combination of genes and environmental factors.
Our coefficient estimates may also capture a dynastic effect of the impact of health disorders.
Without more detailed data on parental diagnoses as well as parental genes we cannot separate
out the portion of the impact that is uniquely brought on by the child’s condition. As a result,
this effect may include the impact of family environments provided by depressed parents whose
depression can be explained by exactly the same set of genes and genetic interaction terms that we
selected to explain the child’s depression in our study. This dynastic effect may be useful to estimate
since individuals are in general not randomly assigned to families. Similarly if the assortative mating
process is stable, then the dynastic effect is important to recover since kids with certain disorders
will increasingly come from families that also have this disorder. It is also worth noting that there
is limited evidence that individuals seek out partners with similar genetic makeup. Animal studies
on mate choice have shown that both signals of genetic quality and genetic diversity play important
roles whose relative weight varies according to the respective ranges of these characteristics in the
study population.47 The pursuit of genetic diversity serves to weaken intergenerational correlations,
especially on adverse health attributes.
To summarize, the genetic markers we employ in our study are predetermined to any interaction
that the adolescents have with the environment, even those interactions such as pre-natal care that
occur in utero and affect measures such as birth weight and APGAR scores. They possess strong
correlations with certain health disorders and health outcomes. At present there is no detectable
evidence that they are correlated with genetic factors that associate with inputs to either innate
ability or the development of intelligence. We are not ruling out the possibility that the genes affect
the acquisition of intelligence but rather we are assuming that these genes neither directly enter the
education production process nor are correlated with genes involved in production of these education
outcomes. The assumptions underlying these markers for identification are supported by statistical
tests. Not only can these assumptions be tested but we argue that this strategy imposes substantially
weaker assumptions on the relationship between nature, nurture and adolescent outcomes than other
empirical strategies used in the literature. Despite these advances substantially richer data would
be needed to recover the structural parameter.
47Roberts and Gosling (2003) use experiments with rodents to reach this conclusion and note that
genetic diversity is desired since it increases reproductive success.
5.3 Accounting for Endogenous Cigarette Smoking Matters
5.3.1 Are Smoking Patterns Different Between the Sexes?
Our analysis indicates that a substantial gap exists between the genders in the impacts of health
disorders on academic achievement. One potential candidate that can account for this gender
difference is smoking patterns. A strength of our data is that we have detailed information on
the smoking behavior of each individual throughout adolescence. Simple t-tests between the sexes
suggest that there are no systematic differences in tobacco consumption as measured by current
smoking and years smoked. However, boys diagnosed with either depression, ADHD, AD, or HD
smoked cigarettes with significantly more tar and nicotine content than girls diagnosed with the
same disorder.48 Males with mental disorders may use the nicotine in the cigarettes to self-medicate
against these disorders since nicotine is well known to have a positive effect on attention and indirect
effects on the dopaminergic system, potentially reducing symptoms of ADHD and depression.49
This is consistent with the hypothesis that for individuals with limited attention spans there is an
immediate academic benefit or compensation from cigarette smoking.50
While it is unlikely that only males would self-medicate with tobacco, a recent survey in the
48Simple linear regressions controlling for school effects and demographic variables confirm this
49Conners et al. (1996) present research that suggests nicotine does indeed enhance attention
function in adults with ADHD.
50Smoking differs from other health behaviors such as drug or alcohol use as it is not known to
impair judgment and the detrimental health impacts come much later in life relative to drug use,
and thus appears to be less damaging in the present. Tobacco does not alter consciousness and many
smokers claim that by smoking cigarettes they relieve symptoms associated with a variety of health
psychiatric literature (Perkins et al. 1999) concludes that gender differences in the motivation for
tobacco consumption and maintenance exist in both human and animal populations. This finding
in combination with evidence that females are less sensitive to the effects of nicotine is interpreted
in the survey as supporting the hypothesis that females are less likely to self-medicate with tobacco.
If males are more able or inclined to take advantage of the immediate compensating benefits from
smoking this may explain the difference in the impacts of the health disorders.
To investigate whether smoking patterns do indeed have different relationships with diagnosed
health disorders between the genders we present OLS and 2SLS estimates of the impacts of smoking
on each health outcome for each sample and health vector in Appendix Table 6. Whereas smoking
is positively associated with each health outcome when treated as exogenous (in the bottom panel),
the 2SLS estimates present different patterns. Smoking is positively related to ADHD and negatively
related to obesity once we account for endogeneity as reported in column 1. Further, boys who
smoke are significantly less likely to be diagnosed with ADHD, particularly HD. In contrast, females
who smoke are less likely to be obese or be diagnosed with depression although neither impact
is statistically significant at conventional levels. These gender differences add a further layer of
complexity and support the possibility that smoking patterns account for some of the gap in the
impacts of health disorders on education between the genders. We next examine the sensitivity of
our results to treating smoking as a state as opposed to a control variable.
5.3.2 Are Smoking Decisions Exogenous?
With genetic markers as instruments we can investigate the degree to which smoking is a choice
variable. Past research in the economics literature has suggested that smoking could proxy for an
individual’s discount rate. Several studies using this strategy have implicitly assumed that smoking
does not reflect a choice.51
Treating cigarette smoking as an exogenous input to health outcomes presents striking changes
to our results. Table 8 presents 2SLS estimates of equations (8) and (9) which assume this choice
is exogenous. Notice that the magnitude of all health outcomes in Table 8 increases markedly from
those presented in Table 7, where smoking was treated as endogenous. Most surprising is that by
treating smoking as an exogenous behavior, the estimates of the impact of depression, HD and
obesity become statistically significant for males. The results suggest that being obese leads boys
to score 0.8 points higher on their GPA. For the full sample and subsample of girls, the estimated
impact of depression nearly doubles in magnitude. In addition, ADHD becomes statistically
significant for the full sample. Finally, the estimates on AD and HD for girls become implausibly large
but continue to offset one another. The implausible magnitude of these coefficients is a result of
both limited independent variation to separately identify impacts and the use of smoking as an
invalid exclusion restriction.
51This idea is due to Farrell and Fuchs (1982) and subsequent studies such as Evans and Montgomery
(1994) have tried to use smoking as an instrument for education in wage equations. Hamermesh
(2000) argues that smoking behavior is a measure of family background and is unlikely to
be a valid instrument for education.
We conducted a Hausman test of each health status equation for each vector in Table 8 by
comparing it to the corresponding equation in Table 7. We can reject the null of exogeneity for
years of cigarette smoking, suggesting that smoking is indeed a choice variable. Our investigation
into the endogeneity of smoking shows that despite the use of genes as instruments for the health
outcomes, the different ways of accounting for the smoking decision lead to very different results.
This could result from the fact that genes associated with smoking tendency are also associated
with health disorders as well as smoking directly impacting health disorders. By either ignoring
smoking decisions or treating smoking decisions as exogenous the exclusion restriction assumption
of the genetic instrument is violated since individuals with these disorders are more likely to smoke.
5.4 Accounting for Comorbid Health Outcomes Matters
We now consider what, if any, effect it would have on our estimates if we followed the usual practice
of ignoring comorbid conditions and only including one health outcome at a time. The results from
2SLS of achievement equations that include only one health variable are presented in Table 9. Each
entry refers to the point estimate of that health behavior from a system of equations which includes
the achievement equation and that health behavior or status alone.
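Why one-at-a-time estimation can mislead even with valid genetic instruments can be sketched with simulated data. In the hypothetical setup below, two comorbid conditions (`H1`, `H2`) are both shifted by the same three markers (`Z`) and both lower the outcome. All names, coefficients and the data generating process are assumptions for illustration and are not taken from the paper's data.

```python
import numpy as np

def tsls(y, X_endog, Z):
    """2SLS of y on a constant and X_endog, using a constant and Z as instruments.
    Returns coefficients ordered as [constant, endogenous regressors...]."""
    n = len(y)
    const = np.ones((n, 1))
    inst = np.column_stack([const, Z])
    X = np.column_stack([const, X_endog])
    # First stage: project every regressor on the instruments.
    X_hat = inst @ np.linalg.lstsq(inst, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 50_000
Z = rng.normal(size=(n, 3))              # hypothetical genetic markers
u = rng.normal(size=n)                   # unobserved confounder

# The same markers shift both conditions (comorbidity through shared genes).
H1 = Z @ np.array([0.6, 0.2, 0.0]) + u + rng.normal(size=n)
H2 = Z @ np.array([0.4, 0.0, 0.6]) + u + rng.normal(size=n)

# True effects on the outcome: -0.5 for H1 and -0.3 for H2.
y = 1.0 - 0.5 * H1 - 0.3 * H2 + u + rng.normal(size=n)

joint = tsls(y, np.column_stack([H1, H2]), Z)   # both conditions instrumented
single = tsls(y, H1, Z)                         # H2 omitted from the system

print("joint  2SLS (H1, H2):", np.round(joint[1:], 3))
print("single 2SLS (H1 only):", np.round(single[1], 3))
```

In this simulation the joint system recovers both effects, while the single-condition estimate for `H1` absorbs part of the effect of `H2`, because the markers also operate through the omitted condition and the exclusion restriction fails.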
Examining results from separate regressions using the full sample, we would conclude that
inattention is positively and HD negatively related to GPA, which is the opposite of the pattern
reported in Table 7. The results for the subsample of boys completely change when comorbid
conditions are omitted. Obesity, AD and HD are all positively related to academic performance
and the magnitude of the impact for obesity is extremely large. Similarly, for the full sample and
subsample of girls the impact of depression is approximately 40% larger as it may be capturing
a portion of the negative impact of obesity or ADHD. Taken together, the results of Table 8 and
Table 9 illustrate the need to account for a greater set of health outcomes and endogenous behaviors
in any analysis. Even with exogenous instruments such as genes to correct for the endogeneity of
health status, the omission of comorbid conditions and behaviors may present a misleading picture
of the causal relation between particular health states and academic performance among other
Understanding the consequences of growing up in poor health for adolescent development is an
important research question. This question is particularly interesting to policymakers since part
of the explicit rationale for programs such as Medicaid is to improve the development of children.
However, it is challenging to address due to endogeneity that arises from omitted variables and
measurement error problems pertaining to health.
In this paper, we use information on genetic markers to overcome these challenges and identify
the causal effect of health on education via an instrumental variables strategy. The explicit use of
52This may be due to the fact that the genes are associated with more than one health outcome
in a vector. But if genetic markers cannot separate one health outcome from another, it is hard to
imagine that any nurture or environmental factor could break the statistical association between
these disorders. This issue does not have a simple solution.
genetic markers in empirical social science research is becoming possible due to an ever increasing
understanding of how genetic inheritance relates to individual health outcomes as well as knowledge
from the human genome project. While the decoding of the human genome has been compared to
breakthroughs such as Galileo’s celestial searching or sending a man to the moon since it has the
potential to revolutionize medical treatments, we believe that it also has the ability to shed light on
open questions in the social sciences. For example, the interactions and dynamics between health
behavior and health status together with the information on genes might really be important in a
line of research that tries to assess the impact of health as a form of human capital on many outcomes
of interest to economists such as labor market activity, marriage and educational attainment.
We find strong statistical evidence that genetic markers indeed show a great deal of promise as
a set of instrumental variables for several health outcomes and behaviors. Using these genes as a
novel source of identification we find that the impact of poor health on academic achievement is
large. Depression and inattention both lead to a 0.5 point decrease on GPA, which is roughly a one
standard deviation reduction in performance. There exists substantial heterogeneity in the impacts
of health status on academic performance as female adolescents are strongly adversely affected by
negative physical and mental health conditions, whereas males are not significantly impacted. In
addition, we find that it is very important for researchers in explaining health status to account for
comorbid health disorders as well as endogenous health enhancing or health deteriorating behaviors.
Our evidence indicates that either treating these behaviors as exogenous or ignoring comorbid
conditions would lead to either differently signed estimates or substantially larger impacts of health
on education.
Unfortunately, the results also lead to more questions, particularly in understanding why females
and not males are so adversely affected by poor health outcomes. More research is needed to further
our understanding of this issue. For example, responses to a variety of psychological questionnaires
can be used to shed light on possible differences between females and males in their self-perception.
Future research could also incorporate additional dynamics such as how parents, teachers and peers
respond to an individual’s changing health state to explore more deeply some of the sources for this
heterogeneity. In conclusion, recent years have witnessed an explosion of findings on the causes and
correlates of health outcomes and behaviors in neurobiology, which could offer a promising source
of predetermined exogenous variations to help identify the impact of health on a set of outcomes of
great interest to economists.
[1] Almond, Douglas, Kenneth Y. Chay and David S. Lee (2005), “The Costs of Low Birth Weight,”
forthcoming in Quarterly Journal of Economics.
[2] Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin (1996), “Identification of Causal
Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444-
[3] Angrist, Joshua D. and Guido W. Imbens (1995), “Two-Stage Least Squares Estimation of
Average Causal Effects in Models with Variable Treatment Intensity,” Journal of the American
Statistical Association, 90, 431-442.
[4] Audrain-McGovern, Janet, Daniel Rodriguez, Kenneth P. Tercyak, Jocelyn Cuevas, Kelli
Rodgers and Freda Patterson (2004), “Identifying and Characterizing Adolescent Smoking
Trajectories,” Cancer Epidemiology Biomarkers & Prevention, 13, 2023-2034.
[5] Audrain-McGovern, Janet, Kenneth P. Tercyak, Paula Goldman and Angelita Bush (2002),
“Recruiting Adolescents into Genetic Studies of Smoking Behavior,” Cancer Epidemiology Bio-
markers & Prevention, 11, 249-52.
[6] Barkley, Russell A. and Kevin R. Murphy (1998), Attention-Deficit/Hyperactivity Disorder: A
Clinical Workbook (2nd ed), New York: The Guilford Press.
[7] Bannon Michael J., Paola Sacchetti and James G. Granneman (1995), “The Dopamine Trans-
porter: Potential Involvement in Neuropsychiatric Disorders” in Floyd E. Borroni and David
J. Kupfer (eds.), Psychopharmacology: The Fourth Generation of Progress, New York: Raven
Press Ltd., 179-188.
[8] Beardslee William R., S. Swatling, L. Hoke, P. C. Rothberg, P. van de Velde, L. Focht and
D. Podorefsky (1998), “From Cognitive Information to Shared Meaning: Healing Principles in
Prevention Intervention,” Psychiatry, 61(2), 112-130.
[9] Becker, Gary S. (1993), Human Capital: A Theoretical and Empirical Analysis with Special
Reference to Education (3d edition), Chicago: The University of Chicago Press.
[10] Behrman, Jere R. and Mark R. Rosenzweig (2004), “Returns to Birthweight,” Review of Eco-
nomics and Statistics, 86, 586-601.
[11] Behrman, Jere R. and Victor Lavy (1998), “Child Health and Schooling Achievement: Association,
Causality and Household Allocations,” CARESS Working Papers 97-23, University of
[12] Behrman, Jere R., Mark R. Rosenzweig and Paul Taubman (1994), “Endowments and the
Allocation of Schooling in the Family and in the Marriage Market: The Twins Experiment,”
Journal of Political Economy, 102, 1131-1174.
[13] Biederman Joseph, Stephen Faraone, Eric Mick and Elise Lelon (1995), “Psychiatric comor-
bidity among referred juveniles with major depression: fact or artifact?,” Journal of American
Academy of Child and Adolescent Psychiatry, 34, 579-590
[14] Blum, Kenneth, Ernest P. Noble, Paul J. Sheridan, Olivia Finley, Anne Montgomery,
Terry Ritchie, Tulin Ozkaragoz, Robert J. Fitch, Frank Sadlack, Donald Sheffield, Tommie
Dahlmann, Sheryl Halbardier and Harou Nogami (1991), “Association of the A1 Allele of the
D2 Dopamine Receptor Gene with Severe Alcoholism,” Alcohol, 8, 409-416.
[15] Boardman, Anthony E. and Richard J. Murnane (1979), “Using Panel Data to Improve Esti-
mates of the Determinants of Educational Attainment,” Sociology of Education, 52, 113-121.
[16] Bound John and Gary Solon (1999), “Double Trouble: On the Value of Twins-Based Estimation
of the Return to Schooling”, Economics of Education Review, 18, 169-182.
[17] Carr, Laurence A., J. Khristen Basham, Brian K. York and Peter P. Rowell (1992), “Inhibi-
tion of Uptake of l-methyl-4-phenylpyridinium Ion and Dopamine in Striatal Synaptosomes by
Tobacco Smoke Components,” European Journal of Pharmacology, 215, 285-287.
[18] Coger, Roger W., Kathryn L. Moe and E. A. Serafetinides (1996), “Attention deficit disorder in
adults and nicotine dependence: Psychobiological factors in resistance to recovery?”, Journal
of Psychoactive Drugs, 28(3), 229-240.
[19] Comings, David E., Radhika Gade-Andavolu, Nancy Gonzalez, Shijuan Wu, Donn Muhleman,
Hezekiah Blake, Fumin Chiu, Edward Wang, K Farwell, Salima Darakjy, Richard Baker, George
Dietz, Gerard Saucier and James P MacMurray (2000), “Multivariate Analysis of Associations
of 42 Genes in ADHD, ODD and Conduct Disorder,” Clinical Genetics, 58(1), 31-40.
[20] Comings, David E., S. Wu, Connie Chiu, Robert H. Ring, Radhika Gade, Chul Ahn, James
P. MacMurray, George Dietz and Donn Muhleman (1996), “Polygenic Inheritance of Tourette
Syndrome, Stuttering, Attention Deficit Hyperactivity, Conduct, and Oppositional Defiant
Disorder: The Additive and Subtractive Effect of the Three Dopaminergic Genes--DRD2, D
beta H, and DAT1,” American Journal Of Medical Genetics, 67, 264-288.
[21] Conners, Keith C., Edward D. Levin, Elizabeth Sparrow, Sean C. Hinton, D. Erhardt, W.
H. Meck, J. E. Rose and J. March (1996), “Nicotine and attention in adult attention deficit
hyperactivity disorder (ADHD)”, Psychopharmacology Bulletin, 32(1), 67-73.
[22] Cook, Edward H. Jr., Mark A. Stein, Matthew D. Krasowski, Nancy J. Cox, D. M. Olkon, J.
E. Kieffer and Bennett L. Leventhal (1995), “Association of Attention-Deficit Disorder and the
Dopamine Transporter Gene,” American Journal of Human Genetics, 56, 993-998.
[23] Currie, Janet (2005), “Health Disparities and Gaps in School Readiness,” Future of Children,
15(1), 117-138.
[24] Currie, Janet and Mark Stabile (2005), “Child Mental Health and Human Capital Accumula-
tion: The Case of ADHD,” NBER Working Paper: 10435.
[25] Currie, Janet and Rosemary Hyson (1999), “Is the Impact of Health Shocks Cushioned by
Socioeconomic Status? The Case of Low Birthweight,” American Economic Review, 89(2),
[26] Cutler, David and Edward Glaeser (2005), “What Explains Differences in Smoking, Drinking
and Other Health-Related Behaviors?” Cambridge: NBER Working Paper w11100.
[27] Di Chiara, Gaetano and Assunta Imperato (1988), “Drugs Abused by Humans Preferentially
Increase Synaptic Dopamine Concentrations in the Mesolimbic System of Freely Moving Rats,”
Proceedings of the National Academy of Sciences USA, 85, 5274-5278.
[28] Ding, Weili, and Steven F. Lehrer (2005), “Accounting for Unobserved Ability Heterogeneity
within Education Production Functions,” mimeo, Queen’s University.
[29] Evans, William and Edward Montgomery (1994), “Education and Health: Where There’s
Smoke There’s an Instrument.” NBER Working Paper: 4949
[30] Epstein, Leonard H., Jodie L. Jaroni, Rocco A. Paluch, John J. Leddy, Holly E. Vahue, Larry
Genotype as a Risk Factor for Obesity in Smokers,” Obesity Research, 10(12), 1232-1240.
[31] Farrell, Phillip and Victor R. Fuchs (1982), “Schooling and Health: The Cigarette Connection,”
Journal of Health Economics, 1(3), 217-230.
[32] Fuchs, Victor R. (1982), “Time Preference and Health: An Explanatory Study” in Victor R.
Fuchs ed. Economic Aspects of Health, University of Chicago Press for NBER, Chicago.
[33] Glewwe, Paul and Hanan Jacoby (1995), “An Economic Analysis of Delayed Primary School
Enrollment in a Low-Income Country-the Role of Early Childhood Nutrition,” Review of Eco-
nomics and Statistics, 77, 156-169.
[34] Gringras, Paul and Wai Chen (2001), “Mechanisms for Differences in Monozygous Twins,”
Early Human Development, 64(2), 105-117.
[35] Grossman, Michael and Robert Kaestner (1997), “Effects of Education on Health,” in Jere
R. Behrman and Nevzer Stacey eds. The Social Benefits of Education, University of Michigan
Press, Ann Arbor.
[36] Grossman, Michael (1975), “The Correlation between Health and Schooling,” in Household
Production and Consumption, Ed N. E. Terleckyj, Studies in Income and Wealth, Vol. 40,
Conference on Research in Income and Wealth. New York: Columbia University Press for the
National Bureau of Economic Research,
[37] Grossman, Michael (1972), “On the Concept of Health Capital and the Demand for Health,”
Journal of Political Economy, 80(2), 223-255.
[38] Hamermesh, Dan (2000), “The Craft of Labormetrics,” Industrial and Labor Relations Review,
53(3), 363-380.
[39] Imbens, Guido W. and Joshua D. Angrist (1994), “Identification and Estimation of Local
Average Treatment Effects,” Econometrica, 62, 467-475.
[40] Kenkel, Donald (1991), “Health Behavior, Health Knowledge and Schooling,” Journal of Po-
litical Economy, 99(2), 287-305.
[41] Klein, Daniel N., Peter M. Lewinsohn and John R. Seeley (1997a), “Psychosocial Characteris-
tics of Adolescents with a Past History of Dysthymic Disorder: Comparison with Adolescents
with Past Histories of Major Depressive and Non-Affective Disorders, and Never Mentally Ill
Controls,” Journal of Affective Disorders, 42, 127-135.
[42] Kremer, Michael and Edward Miguel (2004), “Worms: Identifying Impacts on Education and
Health in the Presence of Treatment Externalities,” Econometrica, 72(1), 159-217.
[43] Lerman, Caryn, Neil E. Caporaso, Angelita Bush, Yun-Ling Zheng, Janet Audrain, David Main
and Peter G. Shields (2001), “Tryptophan Hydroxylase Gene Variant and Smoking Behavior,”
American Journal of Medical Genetics, 105(6), 518-520.
[44] Lerman, Caryn, and Wade Berrettini (2003), “Elucidating the Role of Genetic Factors in
Smoking Behavior and Nicotine Dependence,” American Journal of Medical Genetics (Neu-
ropsychiatric Genetics), 118B, 48-54.
[45] Levin, Edward D., Keith C. Conners, Elizabeth Sparrow, Sean C. Hinton, D. Erhardt,
W.H. Meck, J.E. Rose and J. March (1996), “Nicotine effects on adults with attention-
deficit/hyperactivity disorder”, Psychopharmacology, 123, 55-63.
[46] Lucki, Irwin (1998), “The Spectrum of Behaviors Influenced by Serotonin,” Biological Psychi-
[47] Mays Jr., Herman L. and Geoffrey E. Hill, “Choosing Mates: Good Genes versus Genes that
are a Good Fit”, Trends in Ecology and Evolution, 10(19), 554-559.
[48] McClure, Samuel M., David I. Laibson, George Loewenstein and Jonathan D. Cohen (2004),
“Separate Neural Systems Value Immediate and Delayed Monetary Rewards,” Science, 306,
[49] Miller, David C. (2005), “Adolescent Cigarette Smoking: A Longitudinal Analysis Through
Young Adulthood,” NCES Working Paper #2005333
[50] Neumark, David (1999), “Biases in Twin Estimates of the Return to Schooling,” Economics of
Education Review, 18, 143-148.
[51] Olds, James, (1967), “The Limbic System and Behavioral Reinforcement,” Progress in Brain
Research, 27, 144-64.
[52] Perri, Timothy J. (1984), “Health Status and Schooling Decisions of Young Men,” Economics of
Education Review, 3, 207-213.
[53] Perusse, Louis, Tuomo Rankinen, Aamir Zuberi, Yvon C. Chagnon, S. John Weisnagel, George
Argyropoulos, Brandon Walts, Eric E. Snyder and Claude Bouchard (2005), “The Human
Obesity Gene Map: The 2004 Update,” Obesity Research, 13, 381-490.
[54] Pliszka S. R., C. L. Carlson, J. M. Swanson (1999), ADHD with Comorbid Disorders: Clinical
Assessment and Management, New York: Guilford Press.
[55] Pomerleau, Ovide F., Karen K. Downey, Fred W. Stelson and Cynthia S. Pomerleau (1995),
“Cigarette smoking in adult patients diagnosed with attention deficit hyperactivity disorder”,
Journal of Substance Abuse, 7(3), 373-378.
[56] Roberts, Craig S. and Morris L. Gosling (2003), “Genetic Quality and Similarity Interact in
Mate Choice Decisions by Female Mice,” Nature Genetics, 35, 103-106.
[57] Roberts, Robert E., Peter M. Lewinsohn, and John R. Seeley (1991), “Screening for Adolescent
Depression: Comparison of Depression Scales,” Journal of the American Academy of Child
and Adolescent Psychiatry, 30, 58-66.
[58] Rosenzweig, Mark R. and Theodore P. Schultz (1983), “Estimating a Household Production
Function: Heterogeneity, the Demand for Health, Inputs, and Their Effects on Birth Weight,”
Journal of Political Economy, 91, 723-746.
[59] Seeman, Philip, and Hyman B. Niznik (1990), “Dopamine Receptors and Transporters in
Parkinson’s Disease and Schizophrenia,” The FASEB Journal, 4, 2737-2744.
[60] Sibley, David R. and Frederick J. Monsma, Jr. (1992), “The Molecular Biology of the Dopamine
Receptors”, Trends in Pharmacological Science, 13, 61-68.
[61] Schwimmer, Jeffrey B., Tasha M. Burwinkle and James W. Varni (2003), “Health-Related
Quality of Life of Severely Obese Children and Adolescents,” Journal of American Medical
Association, 289, 1813-1819.
[62] Strauss, John and Duncan Thomas (1998), “Health, Nutrition, and Economic Development,”
Journal of Economic Literature, 36(2), 766-817.
[63] Todd, Petra E. and Kenneth I. Wolpin (2004), “The Production of Cognitive Achievement in
Children: Home, School and Racial Test Score Gaps,” mimeo, University of Pennsylvania.
[64] Venter, J. Craig, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger
G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D.
Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman,
Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gan-
gadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine
Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder,
Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Ran-
dall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron
Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin
Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Bran-
don, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuom-
ing Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E.
Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J.
Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding
Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Mil-
shina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah
Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan
Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua
Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng
Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. Zhu, Shaying Zhao, Dennis Gilbert, Suzanna
Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali,
Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen
Beeson, Dana Busam, Amy Carver, Angela Center, Ming Lai Cheng, Liz Curry, Steve Dana-
her, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven
Ferriera, Neha Garg, Andres Gluecksmann, Brit Hart, Jason Haynes, Charles Haynes, Cheryl
Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam,
Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David
May, Steven McCawley, Tina McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy,
Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon,
Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard Scott, Cynthia
Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni
Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams,
Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep
F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen V. Sjolander, Brian Karlak, Anish
Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer,
Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell
Schwartz, Brian Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis
Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl
Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham,
Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov,
Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover,
Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl Kraft,
Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William Majoros,
Joe McDaniel, Sean Murphy, Matthew Newman, Trung Nguyen, Ngoc Nguyen, Marc Nodell,
Sue Pan, Jim Peck, Marshall Peterson, William Rowe, Robert Sanders, John Scott, Michael
Simpson, Thomas Smith, Arlan Sprague, Timothy Stockwell, Russell Turner, Eli Venter, Mei
Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, and Xiaohong Zhu
(2001), “The Sequence of the Human Genome,” Science, 291, 1304-1351.
Table 1: Summary Characteristics of the Sample
Time Invariant Variables N=893
Variable Mean Standard Deviation
Male 0.469 0.499
African American 0.073 0.260
Hispanic 0.093 0.291
Asian 0.106 0.308
Caucasian 0.667 0.471
Biological Parent smoked 0.449 0.498
Body Mass Index 23.426 4.410
Obese (BMI>=30) 0.081 0.272
School 1 0.176 0.381
School 2 0.249 0.432
School 3 0.214 0.410
School 4 0.138 0.345
School 5 0.227 0.419
AD diagnosis 0.043 0.202
HD diagnosis 0.040 0.197
ADHD diagnosis 0.063 0.243
Time Varying Variables
Variable                         Grade 10            Grade 11            Grade 12
                                 Mean     SD         Mean     SD         Mean     SD
Tried Smoking                    0.433    0.495      0.483    0.500      0.533    0.499
Current Smoker                   0.091    0.288      0.152    0.359      0.178    0.382
Years as a Regular Smoker        0.116    0.398      0.245    0.680      0.399    0.968
Currently depressed              0.161    0.368      0.117    0.322      N/A      N/A
Smoker in Household              0.241    0.428      0.246    0.431      0.231    0.422
Grade Point Average (GPA)        3.184    0.567      3.148    0.598      3.176    0.571
Age                              16.032   0.399      17.030   0.396      18.034   0.400
Depressed last period            0.168    0.374      0.169    0.375      0.122    0.327
Smoker last period               0.088    0.283      0.095    0.293      0.147    0.354
Lagged number of years smoking   0.071    0.278      0.120    0.406      0.235    0.662
Number of observations           834                 863                 879
Table 2: Summary Information on Genetic Markers in the Sample
Gene Marker Total
of people
also have
of people
also have
Number of
people also
have A1A1
of people
also have
AA 120
[0.135] **** 4
(0.033) 5
(0.042) 16
AC 393
[0.440] **** 15
(0.038) 20
(0.051) 39
CC 380
[0.426] **** 12
(0.032) 27
(0.071) 65
TT 31
[0.035] 4
(0.129) **** 2
(0.065) 3
CT 191
[0.214] 24
(0.126) **** 9
(0.047) 19
CC 671
[0.751] 92
(0.137) **** 41
(0.061) 56
A1A1 52
[0.058] 5
(0.096) 2
(0.038) **** 3
A1A2 286
[0.320] 34
(0.119) 9
(0.031) **** 19
A2A2 555
[0.622] 81
(0.146) 20
(0.036) **** 56
DAT0 72
[0.081] 16
(0.222) 3
(0.042) 3
(0.042) ****
DAT1 317
[0.355] 38
(0.120) 13
(0.041) 17
(0.054) ****
[0.558] 65
(0.131) 15
(0.030) 32
(0.064) ****
Note: Each cell contains the number of individuals that possess the respective row and column
combination of genetic markers. The conditional frequency of having the dual markers is presented
in round parentheses. The marginal frequency of possessing a marker is presented in square
Table 3: Relationship Between Genetic Markers with Health Behaviors and Health Outcomes
During Adolescence
Gene Marker Depression Smoking Obesity BMI ADHD AD HD
AA 0.149
(4.516) 0.067
AC 0.150
(4.140) 0.074
CC 0.156
(4.640) 0.050
TT 0.165
(3.283) 0.129
CT 0.159
(4.195) 0.031
CC 0.150
(4.508) 0.069
A1A1 0.189
(5.998) 0.058
A1A2 0.174
(4.651) 0.049
A2A2 0.138
(4.088) 0.070
DAT0 0.155
(5.310) 0.064
DAT1 0.109
(4.749) 0.091
DAT2 0.172
(4.004) 0.044
Note: Each cell presents the conditional mean, the standard deviation in round parentheses and the
odds ratio for outcomes (excluding BMI) in square brackets. *, **, *** denote that the null of
homogeneity of odds across markers by genotype from a chi-squared test is rejected at the 1%, 5%,
10% level respectively. The tests were conducted with the same sample used to construct Table 1.
Table 4: Relationship Between Health Behaviors and Health Outcomes During Adolescence
Behavior    Total        Nothing      Also         Also         Also         Also         Also         Also
            Number       Else(1)      Smokes       AD           HD           ADHD         Obese        Depressed
Wave 3, N=834
Nothing     471 [0.565]  ***          ***          ***          ***          ***          ***          ***
Smokes      73 [0.088]   36 (0.493)   ***          7 (0.096)    4 (0.055)    8 (0.110)    7 (0.096)    16
AD          33 [0.040]   5 (0.152)    7 (0.212)    ***          14 (0.424)   33 (1.000)   3 (0.091)    15
HD          30 [0.036]   8 (0.267)    4 (0.133)    14 (0.467)   ***          30 (1.000)   2 (0.067)    10
ADHD        [0.059]      25 (0.510)   8 (0.163)    33 (0.673)   29 (0.592)   ***          4 (0.082)    19
Obese       68 [0.082]   39 (0.574)   7 (0.103)    3 (0.044)    2 (0.029)    4 (0.059)    ***          17
Depression  140 [0.168]  93 (0.664)   16 (0.114)   15 (0.107)   10 (0.071)   19 (0.136)   17 (0.121)   ***
Wave 4, N=863
Nothing     477 [0.553]  ***          ***          ***          ***          ***          ***          ***
Smokes      82 [0.095]   42 (0.512)   ***          9 (0.110)    5 (0.061)    10 (0.122)   10 (0.122)   21
AD          37 [0.043]   7 (0.189)    9 (0.243)    ***          17 (0.459)   37 (1.000)   4 (0.108)    15
HD          34 [0.039]   9 (0.265)    5 (0.147)    17 (0.5)     ***          34 (1.000)   3 (0.088)    9
ADHD        [0.063]      25 (0.463)   10 (0.185)   37 (0.685)   33 (0.611)   ***          5 (0.093)    19
Obese       70 [0.081]   34 (0.486)   10 (0.143)   4 (0.057)    3 (0.043)    5 (0.071)    ***          17
Depression  146 [0.169]  96 (0.656)   21 (0.144)   15 (0.103)   9 (0.062)    19 (0.130)   17 (0.116)   ***
Wave 5, N=879
Nothing     483 [0.595]  ***          ***          ***          ***          ***          ***          ***
Smokes      129 [0.147]  60 (0.465)   ***          15 (0.116)   11 (0.085)   18 (0.14)    15 (0.116)   20
AD          38 [0.043]   8 (0.211)    15 (0.395)   ***          18 (0.474)   38 (1.000)   4 (0.105)    10
HD          36 [0.041]   8 (0.222)    11 (0.306)   18 (0.500)   ***          36 (1.000)   3 (0.083)    9
ADHD        [0.064]      30 (0.536)   18 (0.321)   38 (0.679)   36 (0.643)   ***          5 (0.089)    15
Obese       67 [0.076]   28 (0.418)   15 (0.224)   4 (0.06)     3 (0.045)    5 (0.075)    ***          10
Depression  107 [0.122]  66 (0.617)   20 (0.187)   10 (0.093)   9 (0.084)    15 (0.140)   10 (0.093)   ***
Note: Each cell contains the number of individuals diagnosed with the respective row and column
combination. The conditional frequency of dual diagnoses is presented in round parentheses. The marginal
probability of being diagnosed with each outcome is presented in square parentheses.
1 For ADHD nothing else excludes AD and HD.
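The two kinds of frequency described in the note can be reproduced from the raw counts. The following is a minimal sketch using three Wave 3 rows of Table 4; the variable and function names are illustrative and do not come from the paper.

```python
# Wave 3 counts copied from Table 4 (N = 834). Names are illustrative only.
n_wave3 = 834
total = {"Smokes": 73, "AD": 33, "Obese": 68}          # "Total Number" column
nothing_else = {"Smokes": 36, "AD": 5, "Obese": 39}    # "Nothing Else" column

def marginal(count, n):
    """Share of the wave's sample with the row condition (square brackets)."""
    return count / n

def conditional(joint, row_total):
    """Share of the row group that also falls in the column group (round parentheses)."""
    return joint / row_total

freqs = {
    k: (round(marginal(total[k], n_wave3), 3),
        round(conditional(nothing_else[k], total[k]), 3))
    for k in total
}
for k, (m, c) in freqs.items():
    print(f"{k}: marginal={m:.3f}, conditional on row={c:.3f}")
```

For the Smokes row this gives 73/834 ≈ 0.088 and 36/73 ≈ 0.493, matching the bracketed and parenthesized entries in the table.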
Table 5: Ordinary Least Squares Estimates of the Achievement Equation
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
ADHD            -0.198** (0.094)   -0.154* (0.041)    -0.241*** (0.123)  N/A                N/A                N/A
AD              N/A                N/A                N/A                -0.431** (0.172)   -0.350** (0.157)   -0.493**
HD              N/A                N/A                N/A                0.158* (0.033)     0.120 (0.071)      0.177***
Depression      -0.143* (0.022)    -0.097* (0.031)    -0.191* (0.047)    -0.135* (0.021)    -0.094* (0.027)    -0.180*
Obesity         -0.371* (0.051)    -0.529* (0.074)    -0.204* (0.051)    -0.370* (0.050)    -0.533** (0.065)   -0.199*
Smoker in Home  -0.199* (0.022)    -0.135* (0.034)    -0.274* (0.039)    -0.195* (0.021)    -0.135* (0.034)    -0.265*
Age             0.856** (0.381)    0.754** (0.367)    0.918 (0.563)      0.856** (0.357)    0.745** (0.364)    0.917***
Age Squared     -0.027** (0.012)   -0.022*** (0.013)  -0.030*** (0.018)  -0.027** (0.012)   -0.022*** (0.013)  -0.030***
Black           -0.313* (0.034)    -0.276* (0.060)    -0.345* (0.032)    -0.318* (0.033)    -0.283* (0.062)    -0.346*
Hispanic        -0.266* (0.034)    -0.253*** (0.102)  -0.250** (0.127)   -0.255* (0.084)    -0.235** (0.105)   -0.244
Asian           0.095 (0.061)      0.170*** (0.092)   -0.053 (0.071)     0.094 (0.062)      0.163*** (0.094)   -0.041
Male            -0.255* (0.051)    N/A                N/A                -0.249* (0.055)    N/A                N/A
Constant        -3.234 (2.950)     -2.986 (2.520)     -3.636 (4.251)     -3.216 (2.713)     -2.912 (2.456)     -3.587
N               2576               1366               1210               2576               1366               1210
R squared       0.19               0.21               0.14               0.20               0.21               0.16
Note: Corrected standard errors in parentheses. Regressions include school and time period
indicators. *, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Table 6: Summary Information on the Performance of the Instruments
                Full Sample    Females Only   Males Only     Full Sample    Females Only   Males Only
First Stage F statistics
ADHD            9.51           8.12           7.25           N/A            N/A            N/A
AD              N/A            N/A            N/A            13.80          10.25          10.88
HD              N/A            N/A            N/A            9.37           12.83          7.32
Depression      6.95           5.78           6.55           6.95           5.78           6.55
Obesity         7.43           12.55          11.39          7.43           12.55          11.39
Smoking         6.38           9.83           8.81           6.38           9.83           8.81
P-values from Overidentification Tests
ADHD            0.553          0.420          0.236          N/A            N/A            N/A
AD              N/A            N/A            0.817          0.842          0.982          0.440
HD              N/A            N/A            N/A            0.845          0.812          0.266
Depression      0.773          0.822          0.465          0.773          0.822          0.465
Obesity         0.216          0.232          0.817          0.216          0.232          0.817
Achievement     0.267          0.874          0.421          0.524          0.617          0.293
Note: The first stage F statistic is computed from a joint test of significance of the full set of genetic
instruments from individual first stage regressions that also include the full set of control variables
included in the second stage. In each case, the Null is rejected at the 1% level. P-values are
computed from Sargan tests of the joint null hypothesis that the excluded instruments are valid
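The two diagnostics named in the note can be illustrated on simulated data. The sketch below is not the authors' code or data: it assumes one endogenous health variable, three hypothetical binary marker instruments and one control, and computes a first stage F statistic for the excluded instruments, the 2SLS coefficient, and a Sargan statistic as n times the R-squared from regressing the 2SLS residuals on all instruments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data only: three binary "marker" instruments, one control,
# one endogenous health variable, and a GPA-like outcome (all hypothetical).
Z = rng.binomial(1, 0.4, size=(n, 3)).astype(float)
control = rng.normal(size=n)
confounder = rng.normal(size=n)  # unobserved; makes health endogenous
health = Z @ np.array([0.5, 0.4, 0.3]) + 0.2 * control + confounder + rng.normal(size=n)
gpa = 3.0 - 0.3 * health + 0.1 * control - 0.5 * confounder + rng.normal(size=n)

def ols(y, X):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

W = np.column_stack([np.ones(n), control])   # included exogenous regressors
W_all = np.column_stack([W, Z])              # plus the excluded instruments

# First stage F: joint test that the excluded instruments have zero coefficients.
_, res_restricted = ols(health, W)
gamma, res_unrestricted = ols(health, W_all)
q = Z.shape[1]
ssr_r = res_restricted @ res_restricted
ssr_u = res_unrestricted @ res_unrestricted
first_stage_F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - W_all.shape[1]))

# 2SLS: regress the outcome on the controls and the fitted health variable.
health_hat = W_all @ gamma
beta_2sls, _ = ols(gpa, np.column_stack([W, health_hat]))
# Residuals are formed with the actual (not fitted) health variable.
resid_2sls = gpa - np.column_stack([W, health]) @ beta_2sls

# Sargan statistic: n * R^2 from regressing 2SLS residuals on all instruments.
_, res_aux = ols(resid_2sls, W_all)
centered = resid_2sls - resid_2sls.mean()
r2_aux = 1.0 - (res_aux @ res_aux) / (centered @ centered)
sargan = n * r2_aux
# Degrees of freedom = 3 instruments - 1 endogenous regressor = 2.
# For exactly 2 degrees of freedom the chi-squared upper tail is exp(-x/2).
sargan_p = float(np.exp(-sargan / 2.0))

print(f"first stage F = {first_stage_F:.2f}")
print(f"2SLS health coefficient = {beta_2sls[-1]:.3f}")
print(f"Sargan statistic = {sargan:.3f}, p-value = {sargan_p:.3f}")
```

The closed-form p-value only holds for two overidentifying restrictions; with a different number of instruments a chi-squared routine from a statistics library would be needed.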
Table 7: Two Stage Least Squares Estimates of the Achievement Equation
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
ADHD            0.017 (0.275)      -0.074 (0.331)     0.161 (0.331)      N/A                N/A                N/A
AD              N/A                N/A                N/A                -0.644 (0.487)     -1.410** (0.661)   -0.036
HD              N/A                N/A                N/A                1.297*** (0.718)   1.306*** (0.790)   0.753
Depression      -0.574** (0.238)   -1.112* (0.308)    -0.127 (0.246)     -0.520** (0.250)   -0.822** (0.331)   -0.237
Obesity         -0.288 (0.282)     -0.452 (0.275)     0.338 (0.278)      -0.634** (0.294)   -0.838** (0.359)   0.011
Smoker in Home  -0.194* (0.031)    -0.094** (0.043)   -0.306* (0.047)    -0.187* (0.034)    -0.095** (0.043)   -0.296*
Age             0.691 (0.547)      0.378 (0.886)      0.761 (0.799)      0.663 (0.562)      0.611 (0.868)      0.626
Age Squared     -0.022 (0.016)     -0.011 (0.026)     -0.025 (0.024)     -0.021 (0.016)     -0.018 (0.025)     -0.021
Black           -0.323* (0.048)    -0.367* (0.061)    -0.342* (0.078)    -0.319* (0.049)    -0.372* (0.063)    -0.321*
Hispanic        -0.259* (0.047)    -0.137 (0.094)     -0.234* (0.066)    -0.224* (0.054)    -0.021 (0.122)     -0.255
Asian           0.128* (0.039)     0.225* (0.061)     -0.067 (0.062)     0.127* (0.041)     0.150* (0.069)     -0.018
N               2576               1366               1210               2576               1366               1210
Note: Corrected standard errors in parentheses. Regressions include school and time period
indicators. *, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Table 8: Two Stage Least Squares Estimates of the Achievement Equation where Years of
Smoking is Treated as Exogenous
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
ADHD            -0.959** (0.414)   -0.630 (0.447)     -0.138 (0.422)     N/A                N/A                N/A
AD              N/A                N/A                N/A                -1.382 (0.560)     -3.760** (1.529)   -0.441
HD              N/A                N/A                N/A                1.101 (0.993)      3.782 (2.488)      1.689
Depression      -1.297* (0.334)    -1.251* (0.429)    -0.857** (0.347)   -1.304 (0.298)     -0.962 (0.756)     -1.456*
Obesity         -0.158 (0.408)     -0.601*** (0.351)  0.774*** (0.429)   -0.912* (0.353)    -2.080*** (1.095)  0.833
Smoker in Home  -0.123* (0.041)    -0.062 (0.048)     -0.280** (0.053)   -0.113* (0.034)    -0.045 (0.074)     -0.268*
Age             0.787 (0.664)      0.458 (0.969)      0.774 (0.842)      0.734 (0.566)      1.051 (1.350)      0.319
Age Squared     -0.024 (0.019)     -0.014 (0.028)     -0.024 (0.025)     -0.023 (0.017)     -0.030 (0.039)     -0.010
Black           -0.389* (0.061)    -0.386* (0.070)    -0.348* (0.096)    -0.371* (0.049)    -0.429** (0.107)   -0.299*
Hispanic        -0.228* (0.059)    -0.070 (0.107)     -0.270* (0.076)    -0.153* (0.056)    0.345 (0.287)      -0.315*
Asian           0.164* (0.055)     0.219* (0.068)     -0.025 (0.074)     0.157* (0.042)     0.010 (0.160)      0.104
Male            -0.267* (0.031)    N/A                N/A                -0.260* (0.026)    N/A                N/A
Constant        -2.751 (5.686)     -0.184 (8.342)     -2.816 (7.180)     -2.153 (4.835)     -5.355 (11.649)    0.860
N               2576               1366               1210               2576               1366               1210
Note: Corrected standard errors in parentheses. Regressions include school and time period
indicators. *, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Table 9: Two Stage Least Squares Estimates of the Achievement Equation Including A Subset of
Health Outcomes
Include health
behaviors Full Sample
ADHD -0.351
(0.319) -0.319
(0.359) 0.284
AD 1.392***
(0.669) 0.648
(0.633) 0.615
HD -1.966***
(1.183) -1.040
(0.609) 0.237
AD 0.529
(0.304) -0.124
(0.400) 0.766
HD -0.144
(0.517) -0.330
(0.445) 0.972
(0.302) -1.250*
(0.455) -0.032
Obesity -0.331
(0.329) -0.352
(0.235) 1.067
Observations 2576 1366 1210
Note: Corrected standard errors in parentheses. Each cell of the table corresponds to a separate
regression. The dependent variable of the regression differs by row. Columns reflect different
samples. Regressions include the non-health inputs in Table 7, school and time period indicators.
*, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Appendix Table 1: Ordinary Least Squares Estimates of the Cigarette Smoker Equation
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
ADHD            0.092 (0.056)      0.139 (0.077)      0.051 (0.061)      N/A                N/A                N/A
AD              N/A                N/A                N/A                0.155*** (0.060)   0.280* (0.082)     0.096
HD              N/A                N/A                N/A                0.020 (0.034)      0.042 (0.098)      0.006
Depression      0.043 (0.029)      -0.001 (0.052)     0.096 (0.071)      0.039 (0.029)      -0.008 (0.050)     0.093
Obesity         0.019 (0.075)      0.104 (0.136)      -0.060 (0.071)     0.019 (0.074)      0.104 (0.132)      -0.060
Smoker in Home  0.121*** (0.063)   0.198*** (0.092)   0.019 (0.052)      0.119*** (0.064)   0.195** (0.092)    0.017
Age             -0.963** (0.280)   -0.610 (0.309)     -1.031* (0.484)    -0.971* (0.281)    -0.613*** (0.305)  -1.038**
Age Squared     0.030** (0.009)    0.020 (0.011)      0.032* (0.015)     0.030* (0.009)     0.020 (0.011)      0.032**
Black           -0.029 (0.060)     0.003 (0.065)      -0.074 (0.106)     -0.027 (0.061)     0.008 (0.067)      -0.073
Hispanic        -0.074*** (0.040)  -0.002 (0.071)     -0.167* (0.054)    -0.079** (0.038)   -0.014 (0.074)     -0.169*
Asian           -0.071*** (0.029)  -0.065 (0.056)     -0.083*** (0.037)  -0.070*** (0.029)  -0.061 (0.057)     -0.084***
Male            0.028 (0.032)      N/A                N/A                0.026 (0.034)      N/A                N/A
Constant        7.790** (2.232)    4.695*** (2.060)   8.386 (3.948)      7.847* (2.253)     4.732*** (2.078)   8.436
N               2576               1366               1210               2576               1366               1210
R squared       0.08               0.11               0.11               0.08               0.12               0.12
Note: Corrected standard errors in parentheses. Regressions include school and time period
indicators. *, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Appendix Table 2: Summary Statistics on GPA Performance by Health Disorder and Health
                                                              Grade 10         Grade 11         Grade 12
Smokers                                                       2.673 (0.661)    2.626 (0.715)    2.847
Non Smokers                                                   3.233 (0.532)    3.202 (0.557)    3.232
T-statistic for Differences in Mean GPA by Smoking Status     8.388*           8.662*           7.278*
Depression Diagnosis                                          3.035 (0.617)    3.003 (0.647)    3.025
No depression Diagnosis                                       3.213 (0.552)    3.177 (0.583)    3.197
T-statistic for Differences in Mean GPA by Depression Status  3.416*           3.224*           2.921*
Obese                                                         2.830 (0.620)    2.699 (0.729)    2.788
Non Obese (BMI <30)                                           3.215 (0.552)    3.187 (0.568)    3.208
T-statistic for Differences in Mean GPA by Obesity Status     5.453*           6.713*           5.883*
ADHD Diagnosis                                                2.929 (0.694)    2.919 (0.685)    2.919
No ADHD Diagnosis                                             3.200 (0.555)    3.163 (0.589)    3.193
T-statistic for Differences in Mean GPA by ADHD Diagnosis     3.263*           2.911*           3.492*
AD Diagnosis                                                  2.714 (0.703)    2.733 (0.718)    2.754
No AD Diagnosis                                               3.203 (0.553)    3.166 (0.585)    3.195
T-statistic for Differences in Mean GPA by AD Diagnosis       4.921*           4.357*           4.713*
HD Diagnosis                                                  3.155 (0.527)    3.054 (0.587)    3.047
No HD Diagnosis                                               3.185 (0.569)    3.151 (0.598)    3.181
T-statistic for Differences in Mean GPA by HD Diagnosis       0.285            0.937            1.379
Note: Most cells present the mean GPA and standard deviations in parentheses for individuals by
health category. *, **, *** denote statistically significant differences in mean GPA by health
outcome at the 1%, 5%, 10% level respectively.
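A t-statistic of this kind can be recomputed from the reported means and standard deviations once group sizes are known. The sketch below uses the Grade 10 smoker and non-smoker rows. The group sizes (73 smokers out of 834 students, taken from the Wave 3 counts in Table 4) and the pooled-variance form of the test are assumptions, since the table does not state which sizes or test variant were used.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se

# Grade 10 means and SDs from the table above.
# ASSUMPTION: group sizes are 73 smokers and 834 - 73 non-smokers.
n_smokers, n_total = 73, 834
t_stat = pooled_t(3.233, 0.532, n_total - n_smokers, 2.673, 0.661, n_smokers)
print(f"t = {t_stat:.3f}")
```

Under these assumptions the statistic comes out at roughly 8.4, in the neighborhood of the reported 8.388; the small gap is consistent with rounding in the published means and standard deviations or a slightly different smoker count.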
Appendix Table 3: Two Stage Least Squares Estimates of the Achievement Equation by Subsample
with Alternative Preferred instrument Sets
                Females Only       Males Only         Females Only       Males Only
ADHD            -0.222 (0.350)     0.255 (0.311)      N/A                N/A
AD              N/A                N/A                -1.092** (0.541)   -0.036
HD              N/A                N/A                0.580 (0.421)      0.835
Depression      -1.296* (0.349)    -0.207 (0.316)     -1.132* (0.324)    -0.199
Obesity         -0.385 (0.237)     0.166 (0.311)      -0.708* (0.257)    0.055
Smoker in Home  -0.057 (0.052)     -0.246* (0.048)    -0.052 (0.051)     -0.237*
Age             0.291 (0.959)      0.634 (0.740)      0.490 (0.924)      0.587
Age Squared     -0.009 (0.028)     -0.021 (0.022)     -0.015 (0.027)     -0.020
Black           -0.397* (0.080)    -0.324* (0.075)    -0.383* (0.077)    -0.321*
Hispanic        -0.123 (0.085)     -0.263* (0.060)    -0.028 (0.101)     -0.274*
Asian           0.237* (0.062)     -0.054 (0.068)     0.183* (0.063)     -0.017
N               1366               1210               1366               1210
Note: Corrected standard errors in parentheses. Regressions include school and time period
indicators. *, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Appendix Table 4: Three Stage Least Squares Estimates of the Achievement Equation
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
ADHD            0.007 (0.414)      -0.107 (0.449)     0.139 (0.464)      N/A                N/A                N/A
AD              N/A                N/A                N/A                -0.451 (0.571)     -1.570*** (0.980)  -0.075
HD              N/A                N/A                N/A                0.263 (0.390)      1.451 (1.109)      0.802
Depression      -0.603 (0.372)     -1.135* (0.409)    -0.112 (0.418)     -0.478 (0.387)     -0.768*** (0.399)  -0.244
Obesity         -0.225 (0.408)     -0.407 (0.355)     0.228 (0.446)      -0.560 (0.403)     -0.871*** (0.493)  -0.059
Smoker in Home  -0.080** (0.038)   -0.009 (0.131)     -0.177 (0.140)     -0.062 (0.040)     -0.002 (0.048)     -0.137*
Age             0.037 (0.358)      0.751 (1.878)      -0.585 (1.391)     0.013 (0.358)      -0.102 (0.563)     -0.040
Age Squared     -0.297 (1.045)     -0.026 (0.050)     -0.145* (0.051)    -0.257 (1.045)     0.244 (1.627)      -0.228
Black           -0.364* (0.069)    -0.176 (0.111)     -0.271* (0.092)    -0.358* (0.073)    -0.392* (0.102)    -0.388*
Hispanic        -0.302* (0.063)    0.223 (0.081)      -0.092 (0.102)     -0.276* (0.072)    -0.055 (0.162)     -0.294*
Asian           0.116*** (0.061)   -0.281 (0.647)     -0.084 (0.477)     0.108 (0.065)      0.132* (0.097)     -0.036
N               2576               1366               1210               2576               1366               1210
Note: Standard errors in parentheses. Regressions include school and time period indicators.
*, **, *** denote statistical significance at 1%, 5%, 10% level respectively.
Appendix Table 5: Relationship Between Health Behaviors and Health Outcomes During
Adolescence by Gender FEMALES
Behavior    Total Number    Nothing Else    Also Smokes    Also AD    Also HD    Also Obese    Also Depressed
Wave 3, N=438
Nothing 231 *** *** *** *** *** ***
Smokes 33 13 *** 4 3 6 7
AD 11 1 4 *** 4 1 7
HD 13 3 3 4 *** 1 6
Obese 34 19 6 1 1 *** 9
Depression 81 59 7 7 6 9 ***
Wave 4, N=453
Nothing 237 *** *** *** *** *** ***
Smokes 35 8 *** 4 3 8 9
AD 13 2 4 *** 4 2 7
HD 15 5 3 4 *** 2 6
Obese 36 17 8 2 2 *** 10
Depression 88 64 9 7 6 10 ***
Wave 5, N=466
Nothing 243 *** *** *** *** *** ***
Smokes 64 30 *** 7 6 10 7
AD 13 3 7 *** 6 2 3
HD 15 4 6 6 *** 2 4
Obese 35 11 10 2 2 *** 5
Depression 56 41 7 3 4 5 ***
MALES
Behavior    Total Number    Nothing Else    Also Smokes    Also AD    Also HD    Also Obese    Also Depressed
Wave 3, N=389
Nothing 240 *** *** *** *** *** ***
Smokes 39 23 *** 3 1 1 8
AD 22 4 3 *** 10 1 8
HD 16 5 1 10 *** 1 4
Obese 34 22 1 2 1 *** 8
Depression 58 34 8 8 4 8 ***
Wave 4, N=402
Nothing 240 *** *** *** *** *** ***
Smokes 46 27 *** 5 2 2 12
AD 24 5 5 *** 13 2 7
HD 18 4 2 13 *** 1 3
Obese 34 20 2 2 1 *** 7
Depression 58 32 12 7 3 7 ***
Wave 5, N=405
Nothing 240 *** *** *** *** *** ***
Smokes 62 30 *** 8 5 5 10
AD 25 5 8 *** 12 2 7
HD 20 4 5 12 *** 1 5
Obese 32 17 5 2 1 *** 5
Depression 51 25 10 7 5 5 ***
Appendix Table 6: OLS and Two Stage Least Squares Estimates of the Impacts of Cigarette
Smoking on Health Outcomes
                Full Sample        Females Only       Males Only         Full Sample        Females Only       Males Only
Two Stage Least Squares
AD              N/A                N/A                N/A                0.006 (0.046)      0.009 (0.051)      -0.009
HD              N/A                N/A                N/A                -0.050 (0.046)     0.095 (0.058)      -0.111*
ADHD            -0.092 (0.060)     0.092 (0.086)      -0.126* (0.063)    N/A                N/A                N/A
Depressed       0.036 (0.088)      -0.126 (0.129)     0.052 (0.076)      0.028 (0.086)      -0.096 (0.126)     0.044
Obese           -0.097 (0.079)     -0.126 (0.129)     -0.073 (0.077)     0.046 (0.070)      -0.053 (0.090)     -0.043
Ordinary Least Squares
AD              N/A                N/A                N/A                0.031** (0.008)    0.042** (0.010)    0.025
HD              N/A                N/A                N/A                0.014 (0.008)      0.024* (0.011)     0.006
ADHD            0.023* (0.010)     0.034 (0.013)      0.014 (0.015)      N/A                N/A                N/A
Depressed       0.029 (0.015)      0.003 (0.023)      0.046* (0.019)     0.029 (0.015)      0.003 (0.023)      0.046*
Obese           0.007 (0.011)      0.032* (0.016)     -0.020 (0.016)     0.009 (0.011)      0.032* (0.016)     -0.019
Note: Corrected standard errors in parentheses. Each cell contains information on the impact of
smoking on a health outcome from a regression that also controls for all the factors listed in Table
7, genetic markers, school and time period indicators. *, **, *** denote significance at 1%, 5%,
10% level respectively.
... Due to the empirical challenges involved in assessing causality in the relationship, there is less agreement on what the precise mechanisms are that drive this correlation. As fi rst highlighted by Grossman (1973) and more recently by other authors including Cutler and Lleras-Muney (2006); Ding et al. (2006); Gan and Gong (2007), health and education may interact in three not mutually exclusive ways: ...
... These health conditions and risk factors tend to coincide with and affect each other, as indicated by the double-headed arrow connecting the left-hand boxes. One study illustrated this clearly: studying adolescents in the United States, Ding et al. (2006) found striking differences in the estimated impacts of depression and obesity when examining a single health state in isolation. That research also concluded that individuals with health disorders such as obesity or depression were signifi cantly more likely to smoke. ...
... Mediating factors include all those aspects determined by health that in turn can have an impact on educational outcomes e.g. (Ding et al., 2006): • cognitive and learning skills development • treatment received by children in the classroom in connection with their health condition(s) • discrimination by peers • self-esteem • students' physical energy. ...
Full-text available
This paper examines the causal link between parenting style and children’s educational outcomes. The existing literature seems to lack any effort to use a nationally representative data from the United States, to properly address endogeneity, or to examine educational outcomes beyond high school level. This paper attempts to mitigate these shortcomings. Drawing upon the National Longitudinal Survey of Youth 1997, it first used OLS and logit regression. It then applied the maximum simulated likelihood approach to get rid of endogeneity, thereby isolating the causal impact of parenting style on children’s educational outcomes. Findings suggested that parenting style mattered for children academic performance. Authoritative parenting style was found to be the best among all types of parenting style. Particularly, relative to uninvolved parents’ children, authoritatively reared children were predicted to have 1.1 more years of schooling and be 18.5, 13.6, and 16.3 percentage points more likely to obtain at least bachelor’s degree, associate’s degree, and high school diploma, respectively. Also, they had 5.5 percentage points less likelihood of being high school dropouts than children reared by uninvolved parents.
... The impacts of adolescent depression on educational outcomes are more significant in females. Ding et al. (2006) utilize a novel method of incorporating DNA information to control for individual hereditary factors that may affect both education attainment and wages to estimate the relationship between educational outcomes and adolescent depression. The study finds that females with adolescent depression on average have a GPA lower than their nondepressed counterparts, and lower than male adolescents with similar levels of depression. ...
... Information on education and the presence and severity of depressive symptoms during adolescence is collected from 531 randomized individuals who are currently (age 30) undergoing outpatient treatment for chronic depression, while the wage information is obtained from the 1995 US Census Bureau's Current Population Survey. Similar to Fletcher (2010) and Ding et al. (2006), they find that only females who suffered from adolescent depression display a significant reduction in educational attainment. Then, using education, age and gender as matching variables between the two data sources, the study predicts a wage penalty that rises from 12% at age 21 to a peak of 18% at age 55 for a female who experienced depression as an adolescent. ...
It is well recognized that a depressive mental state can persist for a long time, and this can adversely impact labour market outcomes. The aim of this article is to examine the direct association between depression status in late-teenage years and adult wages, as well as the indirect association, operating through accumulated education, experience and occupation choice. Using the National Longitudinal Survey of Youth 1997 data, we find adolescent depression is associated with a wage penalty of around 10–15%, but its mechanics are very different for males and females. For males, about three quarters of the wage penalty is through the direct channel, whilst for females the indirect effect channel is dominant. The indirect channel is driven by lower accumulated education, mostly because depression discourages further study post high school. These results are important because they imply that the association between adolescent depression and wages is stronger than has been estimated in previous cross-sectional studies.
... references are: Norton et al. (1998); Salganik et al. (2008); Aral et al. (2009); Hogan and Lancaster (2004); Land and Deane (1992); Ding et al. (2006); Fletcher and Lehrer (2009); Langbehn et al. (2004); Visal-Puig et al. (1996); Greene (2003); Hardin and Carroll (2003); Hogan et al. (2004); and Noel and Nyhan (2011). weighting for causal inference from longitudinal observational studies. ...
... Specifically, we use genotypes as instruments for phenotypes and behaviors. Two other economics papers use genetic information as instrumental variables to study how health affects education (Ding et al., 2006; Fletcher and Lehrer, 2008). Certain genes are known to be related to obesity and substance use. ...