Who Becomes an Inventor in America? The Importance of Exposure to Innovation*

ALEX BELL
RAJ CHETTY
XAVIER JARAVEL
NEVIANA PETKOVA
JOHN VAN REENEN
We characterize the factors that determine who becomes an inventor in the
United States, focusing on the role of inventive ability (“nature”) versus environ-
ment (“nurture”). Using deidentified data on 1.2 million inventors from patent
records linked to tax records, we first show that children’s chances of becoming
inventors vary sharply with characteristics at birth, such as their race, gender,
and parents’ socioeconomic class. For example, children from high-income (top
1%) families are 10 times as likely to become inventors as those from below-
median income families. These gaps persist even among children with similar
math test scores in early childhood—which are highly predictive of innovation
rates—suggesting that the gaps may be driven by differences in environment
rather than abilities to innovate. We directly establish the importance of envi-
ronment by showing that exposure to innovation during childhood has signifi-
cant causal effects on children’s propensities to invent. Children whose families
move to a high-innovation area when they are young are more likely to become
inventors. These exposure effects are technology class and gender specific. Chil-
dren who grow up in a neighborhood or family with a high innovation rate in a
specific technology class are more likely to patent in exactly the same class. Girls
are more likely to invent in a particular class if they grow up in an area with
more women (but not men) who invent in that class. These gender- and technology
class–specific exposure effects are more likely to be driven by narrow mechanisms,
such as role-model or network effects, than factors that only affect general human
capital accumulation, such as the quality of schools. Consistent with the importance
of exposure effects in career selection, women and disadvantaged youth are
as underrepresented among high-impact inventors as they are among inventors as
a whole. These findings suggest that there are many “lost Einsteins”—individuals
who would have had highly impactful inventions had they been exposed to innovation
in childhood—especially among women, minorities, and children from
low-income families. JEL Codes: O3, I2, R3.
* A preliminary draft of this article was previously circulated under the title
“The Lifecycle of Inventors.” The opinions expressed herein are those of the authors
alone and do not necessarily reflect the views of the Internal Revenue Service, U.S.
Department of the Treasury, or the National Institutes of Health. We particularly
thank Philippe Aghion, with whom we started thinking about these issues, for
inspiration and many insightful comments. We also thank Daron Acemoglu, Ufuk
Akcigit, Olivier Blanchard, Erik Hurst, Danny Kahnemann, Pete Klenow, Henrik
Kleven, Richard Layard, Eddie Lazear, Josh Lerner, Alex Olssen, Jim Poterba,
Scott Stern, Otto Toivanen, Heidi Williams, and numerous seminar participants
for helpful comments and discussions. Trevor Bakker, Augustin Bergeron, Mike
Droste, Jamie Fogel, Nikolaus Hildenbrand, Alexandre Jenni, Benjamin Scuderi,
and other members of the Opportunity Insights research team provided outstanding
research assistance. This research was funded by the National Science Foundation,
the National Institute on Aging Grant T32AG000186, Harvard University,
the European Research Council, the Economic and Social Research Council at CEP,
the Washington Center for Equitable Growth, the Kauffman Foundation, the Bill
and Melinda Gates Foundation, and the Robert Wood Johnson Foundation.
© The Author(s) 2019. Published by Oxford University Press on behalf of the President
and Fellows of Harvard College. All rights reserved. For Permissions, please email:
journals.permissions@oup.com
The Quarterly Journal of Economics (2019), 647–713. doi:10.1093/qje/qjy028.
Advance Access publication on November 29, 2018.
I. INTRODUCTION
Innovation is widely viewed as a central driver of economic
growth (e.g., Romer 1990; Aghion and Howitt 1992). As a result,
many countries use a wide variety of policies to spur innovation,
ranging from tax incentives to investments in education. Most
existing work analyzing the effectiveness of such policies has ex-
amined their effects on the rate of innovation at the firm, industry,
or macroeconomic level (e.g., Becker 2015). In this article, we take
a different approach, focusing on the individuals who become in-
ventors. By analyzing the factors that determine who becomes
an inventor, we identify new approaches to increasing rates of
innovation, especially among subgroups that are currently under-
represented in the innovation sector.1
Although there is a growing body of work studying the
backgrounds of inventors using historical data from the United
States and contemporary data from Scandinavian countries (e.g.,
Khan and Sokoloff 1993; Akcigit, Grigsby, and Nicholas 2017;
Aghion et al. 2017), relatively little is known about the individu-
als who become inventors in the modern era in the United States.
This is because most sources of data on innovation (e.g., patent
records) do not record even basic demographic information, such
as an inventor’s age or gender.
1. For example, it is important to understand whether the “extensive margin”
decision to become an inventor is driven primarily by financial incentives or by
nonfinancial factors, such as the environmental “exposure effects” we investigate
below. More broadly, studying who becomes an inventor also sheds light on the link
between inequality and innovation and the mechanisms that drive career choice.
We present a comprehensive portrait of inventors in the
United States today by linking patent records to income tax
records. Following standard practice in prior work on innovation,
we define an “inventor” as an individual who holds a patent.2 We
link data on the universe of patent applications and grants in
the United States between 1996 and 2014 to federal income tax
returns to construct a panel data set covering 1.2 million inven-
tors (patent applicants or recipients). Using this new data set, we
track inventors’ lives from birth to adulthood to identify factors
that determine who becomes an inventor, focusing on the role of
inventive ability (“nature”) versus environment (“nurture”).3 We
organize our analysis into three parts.
In the first part of the article, we show that children’s charac-
teristics at birth—their socioeconomic class, race, and gender—are
highly predictive of their propensity to become inventors. Chil-
dren born to parents in the top 1% of the income distribution are
10 times as likely to become inventors as those born to families
with below-median income.4 Whites are more than three times
as likely to become inventors as are blacks. And 82% of 40-year-
old inventors today are men. This gender gap in innovation is
shrinking gradually, but at the current rate of convergence, it will
take another 118 years to reach gender parity.
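The 118-year figure can be related to the 82% share with a back-of-the-envelope calculation. The sketch below assumes that parity means a 50% female share and that the female share rises linearly over time; both are simplifying assumptions made here for illustration, and the variable names are hypothetical.

```python
# Back-of-the-envelope check of the convergence arithmetic.
# Assumptions (not stated in this passage): parity = 50% female share,
# and the female share of inventors rises linearly over time.
male_share = 82.0                    # percent of 40-year-old inventors who are men
female_share = 100.0 - male_share    # percent who are women
parity_share = 50.0
years_to_parity = 118.0

gap_pp = parity_share - female_share       # percentage points left to close
implied_rate = gap_pp / years_to_parity    # percentage points per year

print(f"gap to parity: {gap_pp:.0f} pp")
print(f"implied convergence rate: {implied_rate:.2f} pp per year")
```

Under these assumptions, the gap to parity is 32 percentage points and the implied rate of convergence is roughly 0.27 percentage points per year.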
Why do rates of innovation vary so sharply based on charac-
teristics at birth? One potential explanation is that the differences
stem from inherited differences in talents or preferences to pursue
innovation as a career. An alternative explanation is that children
from different backgrounds grow up in different environments and
therefore end up pursuing different careers.
2. The use of patents as a proxy for innovation has well-known limitations
(e.g., Griliches 1990; OECD 2009). In particular, not all innovations are patented
and not all patents are meaningful innovations. We address these measurement
issues by showing that (i) our results hold if we focus on highly cited (i.e., high-impact)
patents and (ii) the mechanisms that lead to the differences in rates of
patenting across subgroups that we document are unlikely to be affected by these
concerns.
3. There is no sharp dichotomy between nature and nurture because behavior
is likely determined by an interaction between the two factors, as emphasized, for
example, in the literature on epigenetics. We therefore focus not on decomposing
the relative importance of these two factors but on investigating whether and how
environmental factors influence rates of innovation.
4. This pattern is not unique to innovation: children from high-income families
are also substantially more likely to enter other high-skilled professional occupations
and, more generally, reach the upper tail of the income distribution. We focus
on innovation here because it is thought to have particularly large positive social
spillovers and because it has methodological advantages in understanding the
mechanisms underlying career choice, as we discuss below.
As a first step toward evaluating whether differences in in-
herited abilities can explain gaps in innovation, we use math test
scores in early childhood as an (imperfect) proxy for innovative
potential. We obtain data on test scores from 3rd to 8th grade
by linking school district records for 2.5 million children who at-
tended New York City public schools to the patent and tax records.
Math test scores in 3rd grade are highly predictive of patent rates
but account for less than one-third of the gap in innovation between
children from high- versus low-income families.5 This is
because children from lower-income families are much less likely
to become inventors even conditional on having test scores at the
top of their 3rd-grade class. Differences in 3rd-grade math scores
explain a small share of the gap in innovation by race and virtually
none of the gap in innovation by gender. The gap in innovation ex-
plained by test scores grows in later grades, consistent with prior
evidence that test score gaps widen as children progress through
school (e.g., Fryer and Levitt 2004; Fryer 2011). Half of the gap
in innovation by parent income can be predicted by differences in
math test scores in 8th grade. Furthermore, gaps in innovation
by parental income are relatively small conditional on the college
a child attends. These results suggest that low-income children
start out on a relatively even footing with their higher-income
peers in terms of innovation ability, but fall behind over time,
perhaps because of differences in their childhood environment.
However, they do not provide conclusive evidence about the role
of environment because test scores are an imperfect measure of
inventive ability. If a child’s ability to innovate is poorly captured
by standardized tests, particularly at early ages, ability could still
account for a substantial share of gaps in innovation.6 Moreover,
this analysis leaves open the possibility that differences in inher-
ited preferences explain gaps in innovation.
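To make concrete what it means for test scores to "account for" a share of the gap, the following toy calculation reweights one group's test-score distribution to match the other's. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the three score bins, all shares and rates, and every name in the code are invented for illustration.

```python
# Toy reweighting exercise: how much of a gap in invention rates between two
# income groups is accounted for by differences in test-score distributions?
# All numbers below are made up for illustration; they are not from the paper.

# Share of children in each test-score bin, by parent income group.
score_dist = {
    "low_income":  {"bottom": 0.50, "middle": 0.40, "top": 0.10},
    "high_income": {"bottom": 0.20, "middle": 0.40, "top": 0.40},
}
# Inventors per 1,000 children within each bin, by parent income group.
invent_rate = {
    "low_income":  {"bottom": 0.2, "middle": 0.5, "top": 2.0},
    "high_income": {"bottom": 0.5, "middle": 1.5, "top": 6.0},
}

def mean_rate(dist, rates):
    """Average invention rate implied by a score distribution and bin rates."""
    return sum(dist[b] * rates[b] for b in dist)

raw_low = mean_rate(score_dist["low_income"], invent_rate["low_income"])
raw_high = mean_rate(score_dist["high_income"], invent_rate["high_income"])

# Counterfactual: low-income children keep their own within-bin rates but are
# given the high-income test-score distribution.
reweighted_low = mean_rate(score_dist["high_income"], invent_rate["low_income"])

share_explained = (reweighted_low - raw_low) / (raw_high - raw_low)
print(f"raw gap: {raw_high - raw_low:.2f} per 1,000")
print(f"share of gap accounted for by test scores: {share_explained:.0%}")
```

The remaining part of the gap in this toy example comes from lower invention rates among low-income children within each score bin, which mirrors the observation that children from lower-income families are less likely to invent even conditional on high test scores.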
5. Although test scores in English are highly predictive of propensities to
invent unconditionally, they have no predictive power conditional on test scores
in math. This suggests that tests in early childhood are diagnostic of the specific
skills that matter for innovation.
6. On the other hand, since children from different socioeconomic backgrounds
are exposed to different environments even before they enter school, these calcu-
lations could overstate the portion of the gap in innovation due to differences in
inventive ability.
In the second part of the article, we study the effects of child-
hood environment directly to address these issues. We show that
exposure to innovation during childhood through one’s family or
neighborhood has a significant causal effect on a child’s propensity
to become an inventor.7 We establish this result—which we view
as the central empirical result of the paper—in a series of steps.
We first show that children who grow up in commuting zones
(CZs) with higher patent rates are significantly more likely to be-
come inventors, even conditional on the CZ in which they work in
adulthood. We show that this pattern holds not just for whether
a child innovates but also for the technology category in which
he or she innovates. For example, among people living in Boston,
those who grew up in Silicon Valley are especially likely to patent
in computers, while those who grew up in Minneapolis—which
has many medical device manufacturers—are especially likely to
patent in medical devices. We find similar patterns at the family
level: children whose parents or parents’ colleagues hold patents
in a technology class are more likely to patent in exactly that field
themselves.
These patterns of transmission hold even across the 445 nar-
rowly defined technology subclasses into which patents can be
classified. For example, a child whose parents hold a patent in am-
plifiers is much more likely to patent in amplifiers than in anten-
nas. Moreover, the patterns are gender specific: women are much
more likely to patent in a specific technology class if female work-
ers in their childhood CZ were especially likely to patent in that
class. Conditional on women’s patent rates, men’s patent rates
have no predictive power for women’s innovation. Conversely,
men’s innovation rates are influenced by male rather than female
inventors in their area.
Under the assumption that differences in genetic abilities do
not generate differences in propensities to innovate across narrow
technology classes in a gender-specific manner, this set of results
on patenting by technology class implies that exposure to innova-
tion during childhood has a causal effect on the type of innovation
one pursues. Intuitively, as long as genetics do not govern one’s
ability to invent an amplifier rather than an antenna in a gender-specific
manner, the close alignment between the subfield in which
children innovate and the type of innovation they were exposed to
in their families or neighborhoods must be driven by causal exposure
effects. Formally, the sharp variation in rates of innovation
across technology classes and gender subgroups provides a set of
overidentifying restrictions that allow us to distinguish exposure
effects from plausible models of selection in observational data.
7. We use the phrase “exposure to innovation” to mean having contact with
someone in the innovation sector, for example, through one’s family or neighbors.
We do not distinguish between the mechanisms through which such exposure
matters, which could range from specific human capital accumulation to changes
in aspirations.
The technology class–level results show that exposure affects
the type of innovation one pursues, but do not necessarily imply
that exposure affects whether one chooses to become an inventor
to begin with. To test whether exposure affects the level of inno-
vation, we study the outcomes of children whose families move
across CZs, exploiting variation in the timing of moves between
areas as in Chetty and Hendren (2018). We find that children
who move to areas with higher rates of innovation (among adults)
earlier in their childhood are more likely to become inventors
themselves. Under the identifying assumption that unobservable
determinants of children’s outcomes in adulthood are uncorre-
lated with the age at which they move to a different area—an
assumption validated by Chetty and Hendren (2018)—this result
implies that neighborhoods have causal effects on the total level
of innovation. The estimates imply that approximately 75% of the
observational correlation between children’s propensity to become
inventors and patent rates among adults in their CZ is driven by
causal effects of environment. It follows that moving a child from
a CZ that is at the 25th percentile of the distribution in terms of
the fraction of adult inventors (e.g., New Orleans, LA) to the 75th
percentile (e.g., Austin, TX) would increase his or her probability
of becoming an inventor by 37%.
The exposure effects we document here are consistent with
recent evidence documenting neighborhood exposure effects on
earnings, college attendance, and other outcomes (Chetty, Hen-
dren, and Katz 2016). Such neighborhood effects have typically
been attributed to factors that affect general human capital
accumulation, such as the quality of local schools or residential
segregation. Our findings show that at least in the context of
innovation, such mechanisms are unlikely to be the sole reason
that childhood environment matters, because it is implausible
that some neighborhoods prepare children to innovate in one
particular technology class, such as amplifiers. Rather, they point
to mechanisms such as transmission of specific human capital,
mentoring, or networks (e.g., through internships) that lead chil-
dren to pursue certain career paths. Children from low-income
families, minorities, and women are less likely to have such
exposure through their families and neighborhoods, which helps
explain why they have significantly lower rates of innovation
overall. For example, our estimates imply that if girls were as
exposed to female inventors as boys are to male inventors in
their childhood CZs, the current gender gap in innovation would
shrink by half.
In the final section of the article, we briefly examine inventors’
career trajectories, focusing on how the returns to innovation vary
across subgroups to learn about which types of individuals appear
to be screened out of innovation.8 We find that inventors from
underrepresented groups (women, minorities, and those from
low-income families) have very similar earnings and citations to other
inventors on average. Put differently, women and disadvantaged
youth are just as underrepresented among high-impact inventors
as they are among inventors as a whole. This result is consistent
with our finding that exposure is a central determinant of innova-
tion. A lack of exposure may prevent some individuals (“lost Ein-
steins”) from pursuing a career in innovation even though they
would have had highly effective innovations had they done so.
Hence, drawing more children from underrepresented groups into
careers in innovation can have substantial impacts not only on the
total number of inventors but also on the number of high-impact,
high-return inventions.
We conclude that increasing exposure to innovation among
children who (i) excel in math and science at early ages and (ii)
are from underrepresented groups can have large effects on aggre-
gate innovation. Indeed, we estimate that if women, minorities,
and children from lower-income families were to invent at the
same rate as white men from high-income (top-quintile) families,
the total number of inventors in the economy would quadruple.
We caution, however, that this finding does not necessarily imply
that aggregate welfare would be higher if these individuals were
to enter innovation, as they might currently be pursuing other
careers that also have substantial social returns.
Although our analysis demonstrates the importance of child-
hood exposure to innovation, it does not provide direct guidance
on specific policies to increase exposure to innovation. To facilitate
future work evaluating such policies, we construct a set of publicly
available data tables that provide statistics on patent rates
and citations by technology category, parent income group, gender,
age, CZ, and college. In addition, we report statistics on inventors’
income distributions by year and citations. These statistics can be
used to study a variety of issues, ranging from the effects of local
economic conditions and policies on rates and types of innovation
to how the returns to innovation have changed over time.
8. We present a more comprehensive analysis of inventors’ labor market careers
in our companion paper (Bell et al. 2019).
I.A. Related Literature
Our results build on and contribute to several literatures.
First, our results relate to the literature on career choice (e.g.,
Topel and Ward 1992; Hall 2002). Some studies in this literature
have used data on specific occupations—such as medicine and
law—to show that children are particularly likely to pursue
their parents’ occupations (e.g., Laband and Lentz 1983; Lentz
and Laband 1989), but they have not separated causal expo-
sure effects from selection effects as we do here. Although the
mechanisms we document may apply to other careers as well,
we focus on innovation because of its importance for economic
growth (e.g., Jones and Williams 1999; Bloom, Schankerman, and
Van Reenen 2013).
Second, our results relate to the literature on the misallo-
cation of talent across occupations (e.g., Murphy, Shleifer, and
Vishny 1991; Hsieh et al. 2016). Our analysis does not directly
show that talent is misallocated, but our finding that the alloca-
tion of talent to innovation is driven partly by differences in expo-
sure rather than inherited abilities is consistent with the premise
of this literature. Indeed, our results raise the possibility that the
welfare costs of distortions in the allocation of talent may be even
greater than predicted by models such as Hsieh et al. (2016), since
some of the individuals who fail to pursue innovation due to a lack
of exposure could have had high-impact patents. More broadly,
our findings suggest that improving opportunities for children
from low-income or minority backgrounds (e.g., Heckman 2006;
Card and Giuliano 2014) could increase not just their own earnings
but also economic growth by improving the allocation of talent.
Third, our study contributes to the nascent literature on the
origins of inventors that sheds light on the “supply” of innovation
(Goolsbee 1998; Romer 2000). For example, Aghion et al.’s
(2017) study of inventors in Finland documents gaps in inno-
vation by parental background consistent with our results and
characterizes the predictive power of other factors that we do not
observe in our data, such as IQ and parental education.9 Our study
also contributes to a related literature on the determinants of entrepreneurship
that analyzes the role of ability (Nicolaou et al.
2008; Shane and Nicolaou 2013) and peer effects (Giannetti and
Simonov 2009; Nanda and Sørensen 2010). Our analysis complements
these studies by (i) identifying different factors that affect
career choice, most important, the causal effect of childhood expo-
sure; and (ii) presenting comprehensive data and publicly avail-
able statistics on inventors’ origins and careers in the United
States.
The article is organized as follows. Section II describes the
data. Section III presents the results on inventors’ characteristics
at birth. Section IV analyzes the role of childhood environ-
ments. Section V presents results on inventors’ career trajectories.
Section VI concludes. Data tables on patent rates by subgroup can
be downloaded from the Equality of Opportunity Project website.
II. DATA
In this section, we describe our data sources, define the sam-
ples and key variables used in our analysis, and present summary
statistics.
II.A. Data Sources
1. Patent Records. We obtain information on patents from
two sources. First, we use information on patent grants from a
database hosted by Google, which contains the full text of all
patents granted in the United States from 1976 to the present.
We focus on the 1.7 million patents granted between 1996 and
2014 to United States residents. Second, we use data on 1.6 million
patent applications between 2001 and 2012 provided by Strumsky
(2014).10
9. Other recent studies in a similar vein include Giuri et al. (2007); Nicholas
(2010); Azoulay, Graff Zivin, and Manso (2011); Toivanen and Vaananen (2012);
Dorner, Harhoff, and Hoisl (2014); Jung and Ejermo (2014), and Lindquist, Sol,
and Van Praag (2015). A forerunner of this recent work was a classic study by
Schmookler (1957) of 57 American inventors.
10. In 2001, the United States began publishing patent applications (and not
just patent grants) 18 months after filing. For a fee, applicants can choose to have
their filing kept secret; 15% of applicants choose to do so. To ensure that this
missing data problem does not generate selection bias, we verify that the results
we report below are all robust to defining inventors purely using patent grants
rather than applications.
We define an individual as an inventor if he or she is listed as
an inventor on a patent application between 2001 and 2012 or grant
between 1996 and 2014; for simplicity, we refer to this outcome as
“inventing by 2014” below. Importantly, we include all individuals
listed as inventors, not just those assigned intellectual property
rights. In particular, inventors employed by companies are listed
as inventors, while their company is typically listed as the assignee.
In addition to inventors’ names, we also extract information
on inventors’ geographic location (city and state) when they
filed the patent and the three-digit technology class to which the
patent belongs, as assigned by the U.S. Patent and Trademark
Office (USPTO). We classify patents into technology categories
using the classification developed in the National Bureau of Economic
Research (NBER) Patent Data Project by Hall, Jaffe, and
Trajtenberg (2001). We assign each inventor in our data a single
technology class based on the class in which he or she has the most
patents, breaking ties randomly. We obtain data on the number of
times each granted patent was cited from its issuance date until
2014 from the USPTO’s full-text issuance files.
2. Tax Records. We use federal income tax records spanning
1996–2012 to obtain information such as an individual’s gender
and age, geographic location, and own and parental income. The
tax records cover all individuals who appear in the Death Master
file produced by the Social Security Administration, which
includes all persons in the United States with a Social Security
Number or Individual Taxpayer Identification Number (ITIN).
The data include both income tax returns (1040 forms) and third-party
information returns (e.g., W-2 forms), which give us information
on the earnings of those who do not file tax returns.
The patent data were linked to the tax data using an inventor’s
name, city, and state. In the tax data, these fields were
obtained from the Death Master file, 1040 forms, and third-party
information returns (see the Online Appendix for a complete description
of the matching procedure). Eighty-eight percent of individuals
who applied for or were granted a patent were successfully
linked, with higher match rates in more recent years because information
returns are unavailable prior to 1999.
We evaluate the quality of our matching algorithm by using
external data on ages for a subset of inventors from Jones (2010).
The age of the inventor recorded in the Death Master file matches
the age reported in Jones’s data set in virtually all cases, con-
firming that our algorithm generates virtually no false matches.
The 12% of inventors who are not matched are individuals with
common names that are difficult to link to unique records (e.g.,
“John Smith”), individuals with spelling errors in their names or
addresses, or individuals who listed different addresses on their
patent applications and tax forms. The observable characteristics
(in the patent data) of unmatched inventors are very similar to
those of matched inventors, suggesting that the individuals we
match are representative of inventors in the United States.
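The construction steps described in this subsection (assigning each inventor a single technology class and linking patent records to tax records on name, city, and state) can be illustrated with a small sketch. The actual matching procedure is documented in the Online Appendix and is not reproduced here; the exact-key rule, the decision to keep only unique matches, and all record layouts, names, and values below are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical patent-side records: (inventor name, city, state, technology class).
patent_rows = [
    ("Ana Ruiz", "Austin", "TX", "Computers"),
    ("Ana Ruiz", "Austin", "TX", "Computers"),
    ("Ana Ruiz", "Austin", "TX", "Medical Devices"),
    ("John Smith", "Boston", "MA", "Chemical"),
    ("Lee Wong", "Minneapolis", "MN", "Medical Devices"),
]
# Hypothetical tax-side records: (person id, name, city, state).
tax_rows = [
    (1, "Ana Ruiz", "Austin", "TX"),
    (2, "John Smith", "Boston", "MA"),
    (3, "John Smith", "Boston", "MA"),   # common name: two candidate records
    (4, "Lee Wong", "St. Paul", "MN"),   # different address on the tax form
]

def modal_class(classes, rng):
    """Class with the most patents; ties are broken at random."""
    counts = Counter(classes)
    top = max(counts.values())
    return rng.choice(sorted(c for c, n in counts.items() if n == top))

def link(patent_rows, tax_rows, seed=0):
    """Link inventors to tax records on an exact (name, city, state) key."""
    rng = random.Random(seed)
    # Index tax records by the linkage key.
    candidates = defaultdict(list)
    for pid, name, city, state in tax_rows:
        candidates[(name, city, state)].append(pid)
    # Collect each inventor's patent classes under the same key.
    classes = defaultdict(list)
    for name, city, state, tech in patent_rows:
        classes[(name, city, state)].append(tech)
    linked = {}
    for key, techs in classes.items():
        ids = candidates.get(key, [])
        if len(ids) == 1:                # keep unambiguous matches only
            linked[ids[0]] = modal_class(techs, rng)
    return linked, len(linked) / len(classes)

linked, match_rate = link(patent_rows, tax_rows)
print(linked, f"match rate: {match_rate:.0%}")
```

In this toy data, one of three inventors is linked: the common name has two candidate tax records and the third inventor lists a different city, which are two of the reasons for non-matches noted above.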
3. New York City School District Records. We use data from
the New York City (NYC) school district to obtain information on
test scores in childhood for the subset of individuals who attended
New York City public schools. These data span the school years
1988–1989 through 2008–2009 and cover roughly 2.5 million chil-
dren in grades 3–8. Test scores are available for English language
arts and math for students in grades 3–8 in every year from the
spring of 1989 to 2009, with the exception of 7th-grade English
scores in 2002. These data were linked to the tax data by Chetty,
Friedman, and Rockoff (2014) with an 89% match rate, and we
use their linked data directly in our analysis.
After these three databases were linked, the data were dei-
dentified (i.e., individual identifiers were removed), and the anal-
ysis was conducted using the deidentified data set.
II.B. Sample Definitions
We use three different samples in our empirical analysis: full
inventors, intergenerational, and NYC schools.
1. Full Inventors Sample. Our first analysis sample consists
of all inventors (individuals with patent grants or applications)
who were successfully linked to the tax data. There are approx-
imately 1.2 million individuals in this sample. This sample is
structured as a panel from 1996 to 2012, with data in each year
on individuals’ incomes, patents, and other variables. We use this
sample to analyze inventors’ labor market careers in Section V.
2. Intergenerational Sample. Much of our empirical analy-
sis compares inventors to noninventors in terms of characteristics
at birth (Section III) and childhood environment (Section IV). To
measure conditions at birth and childhood location, we must link
individuals to their parents. To do so, we use the sample con-
structed by Chetty et al. (2014) to study intergenerational mobil-
ity, focusing on all children in the tax data who (i) were born in
the 1980–1984 birth cohorts, (ii) can be linked to parents, and (iii)
were U.S. citizens as of 2013. Chetty et al. (2014, appendix A) de-
scribe how this intergenerational sample is constructed starting
from the raw tax data; here, we briefly summarize its key features.
We define a child’s parents as the first tax filers between 1996
and 2012 who claimed the child as a dependent and were between the
ages of 15 and 40 when the child was born. Since children begin
to leave the household after age 16, the earliest birth cohort that
we can reliably link to parents is the 1980 birth cohort (who are
16 in 1996, when our data begin). Children are assigned parent(s)
based on the first tax return on which they are claimed, regardless
of subsequent changes in the parents’ marital status or dependent
claiming. Although parents who never file a tax return cannot be
linked to children, we still identify parents for more than 90%
of children, as the vast majority of children are claimed at some
point because of the tax benefits of claiming children. We restrict
the sample to children who are citizens in 2013 to exclude indi-
viduals who are likely to have immigrated to the United States as
adults, for whom we cannot measure parent income. We cannot di-
rectly restrict the sample to individuals born in the United States
because the database only records current citizenship status.11
Because few individuals patent in or before their early twen-
ties, we focus on individuals in the 1980–1984 birth cohorts, who
are between the ages of 28 and 32 in 2012, the last year of our data.
There are 16.4 million individuals in our primary intergenera-
tional analysis sample, of whom 34,973 are inventors. To assess
11. In addition, we limit the sample to parents with positive income (excluding
1.5% of children) because parents who file a tax return—as is required to link them
to a child—yet have zero income are unlikely to be representative of individuals
with zero income, while those with negative income typically have large capital
losses, which are a proxy for having significant wealth.
whether our results are biased by focusing on innovation at rela-
tively early ages (by age 32), we also examine a set of older cohorts
using data from Statistics of Income (SOI) cross sections, which
provide 0.1% stratified random samples of tax returns prior to
1996. The SOI cross sections provide identifiers for dependents
claimed on tax forms starting in 1987, allowing us to link parents
to children back to the 1971 birth cohort (Chetty et al. 2014, ap-
pendix A). There are approximately 11,000 individuals, of whom
131 are inventors, in the 1971–1972 birth cohorts in the SOI sam-
ple that we use to study innovation rates up to age 40.
3. NYC Schools Sample. When analyzing whether test
scores explain differences in rates of innovation (Section III), we
focus on the sample of children in the NYC public schools data
linked to the tax data. We also use this sample when analyzing
differences in innovation rates by race and ethnicity, as race and
ethnicity are only observed in the school district data. We focus
on children in the 1979–1985 birth cohorts for the test score anal-
ysis because the earliest birth cohort observed in the NYC data
is 1979. As in Chetty, Friedman, and Rockoff (2014), we exclude
students who are in classrooms where more than 25% of students
are receiving special education services and students receiving
instruction at home or in a hospital. There are approximately
430,000 children in our NYC schools analysis sample, of whom
452 are inventors.
II.C. Variable Definitions and Summary Statistics
In this subsection, we define the key variables we use in our
analysis and present summary statistics. We measure all mon-
etary variables in 2012 dollars, adjusting for inflation using the
consumer price index (CPI-U).
1. Income. We use two concepts to measure individuals’ in-
comes: wage earnings and total income. Wage earnings are total
earnings reported on a person’s W-2 forms. Total (individual) income
is the sum of wage earnings, self-employment income, and capital
income. Total income is defined for tax filers as Adjusted Gross
Income (as reported on the 1040 tax return) plus tax-exempt in-
terest income and the nontaxable portion of Social Security and
Disability benefits minus the spouse’s W-2 wage earnings (for mar-
ried filers). For nonfilers, total income is defined as wage earnings.
Individuals who do not file a tax return and who have no W-2 forms
are assigned an income of zero.12 Because the database does not
record W-2 and other information returns prior to 1999, we cannot
reliably measure individual earnings before that year, and there-
fore measure individuals’ incomes only starting in 1999. Income
is measured prior to the deduction of individual income taxes and
employee-level payroll taxes.
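The total income definition above can be written out as a short function. This is a minimal sketch for illustration only, not the authors' code; the function name and argument names are hypothetical, and all inputs are assumed to be annual dollar amounts.

```python
from typing import Optional


def total_individual_income(
    own_w2_wages: float,
    agi: Optional[float] = None,
    tax_exempt_interest: float = 0.0,
    nontaxable_ss_di: float = 0.0,
    spouse_w2_wages: float = 0.0,
) -> float:
    """Total individual income as described in the text (hypothetical helper).

    agi is None for a nonfiler; own_w2_wages is 0.0 when no W-2 exists.
    """
    if agi is None:
        # Nonfilers: total income equals W-2 wage earnings, which is
        # zero when the person has no W-2 forms either.
        return own_w2_wages
    # Filers: AGI plus untaxed components, minus the spouse's W-2 wages.
    # Pass spouse_w2_wages=0.0 for unmarried filers.
    return agi + tax_exempt_interest + nontaxable_ss_di - spouse_w2_wages
```

For example, a married filer with AGI of $150,000, $1,000 of tax-exempt interest, and a spouse earning $70,000 in W-2 wages would be assigned $81,000 under this sketch.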
2. Parents’ Incomes. Following Chetty et al. (2014), we mea-
sure parent income as total pretax income at the household level.
In years where a parent files a tax return, we define family income
as Adjusted Gross Income (as reported on the 1040 tax return) plus
tax-exempt interest income and the nontaxable portion of Social
Security and Disability benefits. In years where a parent does not
file a tax return, we define family income as the sum of wage earn-
ings (reported on form W-2), unemployment benefits (reported on
form 1099-G), and gross Social Security and Disability benefits
(reported on form SSA-1099) for both parents.13 In years where
parents have no tax return and no information returns, family
income is coded as zero. As in Chetty et al. (2014), we average
parents’ family income over the five years from 1996 to 2000 to
obtain a proxy for parent lifetime income that is less affected by
transitory fluctuations. We use the earliest years in our sample to
best reflect the economic resources of parents while the children
in our sample are growing up.
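The construction of the parent income proxy can be sketched as follows. This is an illustrative reading of the definition above and of footnote 13, not the authors' implementation; the function names, argument names, and the dictionary input format are assumptions made for the example.

```python
def family_income(year, agi=None, untaxed=0.0, w2=0.0, ui=0.0, ss_di=0.0):
    """Household income in one year (hypothetical helper).

    agi is None when no parent files a return in that year. `untaxed`
    stands for tax-exempt interest plus nontaxable Social Security and
    Disability benefits.
    """
    if agi is not None:
        return agi + untaxed
    if year < 1999:
        # Information returns are not recorded before 1999, so nonfilers
        # are coded as zero in those years (footnote 13).
        return 0.0
    # Nonfilers from 1999 on: sum of information-return amounts, which is
    # zero when no information returns exist.
    return w2 + ui + ss_di


def parent_income_proxy(yearly):
    """Mean family income over the five years 1996-2000.

    yearly maps a year to keyword arguments for family_income; years
    absent from the mapping are treated as having no returns at all.
    """
    years = range(1996, 2001)
    return sum(family_income(y, **yearly.get(y, {})) for y in years) / 5
```

Averaging over five years, including the zero-income years, is what makes the proxy less sensitive to transitory fluctuations than any single year's income.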
3. Geographic Location. In each year, individuals are
assigned ZIP codes of residence based on the ZIP code from
which they filed their tax return. If an individual does not file
in a given year, we search W-2 forms for a payee ZIP code in
that year. Nonfilers with no information returns are assigned
missing ZIP codes. We map ZIP codes to counties and CZs using
the crosswalks and methods described in Chetty et al. (2014,
12. Importantly, these observations are true zeros rather than missing data.
Because the database covers all tax records, we know that these individuals have
no taxable income.
13. Because we do not have W-2s prior to 1999, parent income is coded as 0
prior to 1999 for nonfilers. Assigning nonfiling parents 0 income has little impact
on our estimates because only 3.1% of parents in the full analysis sample do not
file in each year prior to 1999 and most nonfilers have very low W-2 income (Chetty
et al. 2014). For instance, in 2000, the median W-2 income among nonfilers in our
baseline analysis sample was $0.
appendix A). For children whose parents were married when they
were first claimed as dependents, we always track the mother’s
location if marital status changes.
4. College Attendance. Chetty et al. (2017) construct a roster
of attendance at all colleges in the United States from 1999 to 2013
by combining information from IRS Form 1098-T, an information
return filed by colleges on behalf of each of their students to report
tuition payments, with Pell Grant records from the Department of
Education.14 We assign each child in the intergenerational sample
to the college he or she attends (if any) for the most years between
ages 19 and 22. See Chetty et al. (2017, appendix B) for further
details on how colleges are identified.
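The assignment rule above (the college attended for the most years between ages 19 and 22) can be illustrated with a small sketch. The input format and function name are hypothetical, and the tie-breaking rule is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Dict, Optional


def assign_college(attendance_by_age: Dict[int, str]) -> Optional[str]:
    """Return the college attended for the most years between ages 19 and 22.

    attendance_by_age maps age -> college identifier (hypothetical format).
    Ties go to the college first attended at the youngest age; this is an
    assumption, not a rule stated in the text.
    """
    years = Counter()
    for age in sorted(attendance_by_age):
        if 19 <= age <= 22:
            years[attendance_by_age[age]] += 1
    if not years:
        # No attendance recorded in the age window.
        return None
    # max() returns the first maximal key in insertion order, which is the
    # college first seen at the youngest age.
    return max(years, key=years.get)
```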
5. Test Scores. We obtain data on standardized test scores
directly from the NYC school district database. The tests were
administered at the NYC school district level during the period
we study. Following Chetty, Friedman, and Rockoff (2014), we normalize
the official scale scores from each exam (math and English)
to have mean 0 and standard deviation 1 by year and grade to ac-
count for changes in the tests across school years.
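The normalization described above amounts to computing a z-score within each year-by-grade cell. The sketch below shows one way to do this for a single subject; the record format and function name are hypothetical, and the use of the population standard deviation is an assumption because the text does not specify which estimator is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_scores(records):
    """Return z-scores computed within each (year, grade) cell.

    records: list of dicts with keys 'year', 'grade', 'scale_score' for a
    single subject (hypothetical format). Assumes every cell contains at
    least two distinct scores, so the standard deviation is nonzero.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["year"], r["grade"])].append(r["scale_score"])
    # Mean and population standard deviation for each cell.
    stats = {key: (mean(v), pstdev(v)) for key, v in cells.items()}
    z_scores = []
    for r in records:
        mu, sd = stats[(r["year"], r["grade"])]
        z_scores.append((r["scale_score"] - mu) / sd)
    return z_scores
```

Because each cell is rescaled separately, a score of 1.0 means one standard deviation above the mean of students tested in the same grade and year, whatever the scale of that year's exam.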
6. Summary Statistics. Table I presents descriptive statis-
tics for the three analysis samples described above. Column (1)
presents statistics for the full inventors sample; columns (2) and
(3) consider inventors and noninventors in the intergenerational
sample; and columns (4) and (5) consider inventors and noninven-
tors in the NYC schools sample.
In the full inventors sample, the median number of patent
applications between 1996 and 2012 is 1 and the median number
of citations per inventor is also only 1. But these distributions
are very skewed: the standard deviations of the number of patent
applications and citations are 11.1 and 118.1, respectively. In-
ventors have median annual wage earnings of $83,000 and total
income of $100,000. Again, these distributions are very skewed,
with large standard deviations and mean incomes well above the
14. All institutions qualifying for federal financial aid under Title IV of the
Higher Education Act of 1965 must file a 1098-T form in each calendar year for
any student who pays tuition. The Pell Grant records are used to identify students
who pay no tuition.
TABLE I
SUMMARY STATISTICS

Sample                                     Full        Intergenerational          NYC School District
                                           Inventors   Inventors   Noninventors   Inventors   Noninventors
                                           (1)         (2)         (3)            (4)         (5)
Patenting outcomes
  Patent grants                 Mean       3.0         1.4         -              1.3         -
                                Median     1.0         1.0         -              1.0         -
                                Std. dev.  6.5         2.7         -              2.0         -
  Patent applications           Mean       3.2         2.2         -              2.1         -
                                Median     1.0         1.0         -              1.0         -
                                Std. dev.  11.1        4.3         -              3.4         -
  Patent citations              Mean       26.2        1.2         -              1.3         -
                                Median     1.0         0.0         -              0.0         -
                                Std. dev.  118.1       12.3        -              8.7         -
  Number of collaborators       Mean       4.7         4.0         -              3.5         -
                                Median     2.0         3.0         -              2.0         -
                                Std. dev.  8.2         5.0         -              4.4         -
  Age at application            Mean       43.7        27.5        -              27.7        -
                                Std. dev.  11.5        2.3         -              2.7         -
Income in 2012
  Individual wage earnings ($)  Mean       111,457     82,902      -              94,622      -
                                Median     83,000      72,000      -              74,000      -
                                Std. dev.  140,463     91,909      -              127,712     -
  Total individual income ($)   Mean       188,782     111,118     -              173,126     -
                                Median     100,000     74,000      -              75,000      -
                                Std. dev.  567,813     396,673     -              800,082     -
Parent household income ($)     Mean       -           183,303     85,992         108,049     47,509
                                Median     -           109,000     59,000         66,000      33,000
                                Std. dev.  -           662,669     336,387        208,251     81,607
Attended college at Age 20                 -           86.0%       47.7%          -           -
Test Scores
  3rd grade mean math score                -           -           -              1.0         0.1
  3rd grade mean English score             -           -           -              0.8         0.1
  8th grade mean math score                -           -           -              1.3         0.2
  8th grade mean English score             -           -           -              1.0         0.2
Demographics
  Female share                             13.1%       18.5%       49.8%          21.9%       48.8%
  White non-Hispanic share                 -           -           -              44.9%       19.5%
  Black non-Hispanic share                 -           -           -              17.3%       36.0%
  Hispanic share                           -           -           -              8.4%        33.7%
  Asian share                              -           -           -              27.4%       9.6%
Sample size                                1,200,689   34,973      16,360,910     452         433,863
Notes. This table presents summary statistics for the three samples of inventors and corresponding samples of noninventors used in the empirical analysis. We define individuals
as inventors if they were listed as an inventor on a patent application between 2001 and 2012 or grant between 1996 and 2014. The full inventors sample (column (1)) includes all
inventors who were linked to the tax data using the procedure described in the Online Appendix. The intergenerational sample consists of United States citizens born in 1980–1984
matched to their parents in the tax data (columns (2) and (3)). The NYC School District sample includes children in the 1979–1985 birth cohorts who attended NYC public schools
at some point between grades 3 and 8 and were linked to the tax data (columns (4) and (5)). Citations are measured as total patent citations between 1996 and 2014. The number
of collaborators is measured as the number of distinct individuals that the inventor has ever coauthored a patent grant or application with in our linked data set. For individuals
with more than one patent application, age at application is the age at a randomly selected patent application filing. Incomes are measured in 2012. Individual wage earnings is
defined as total earnings reported on an individual’s W-2 forms. Total individual income is defined for tax filers as Adjusted Gross Income (as reported on the 1040 tax return)
minus the spouse’s W-2 wage earnings (for married filers). For nonfilers, total individual income is defined as wage earnings. In this table only, wage earnings are top-coded at $1
million and total individual income is top-coded at $10 million. Parent income is measured as mean household income (AGI) between 1996 and 2000. Median income variables are
rounded to the nearest thousand dollars. College attendance at age 20 is measured using 1098-T forms filed by colleges, as in Chetty et al. (2017). Test scores, which are based on
standardized tests administered at the district level, are normalized to have mean 0 and standard deviation 1 by year and grade. See Section II for further details on sample and
variable definitions.
medians. The mean age of inventors is 44, and 13% of inventors
in the sample are women.
The intergenerational and NYC school samples have younger
individuals because they are restricted to more recent birth co-
horts. As a result, inventors in these subsamples have lower me-
dian incomes, patent applications, and citations than in the full
sample.
III. INVENTORS’ CHARACTERISTICS AT BIRTH
In this section, we study how rates of innovation differ along
three key dimensions determined at birth: parental income, race,
and gender. We first document gaps in rates of innovation and
then use test score data to assess the extent to which these gaps
can be explained by differences in abilities to innovate.
III.A. Gaps in Innovation by Characteristics at Birth
1. Parental Income. Figure I, Panel A plots the fraction of
children who invent by 2014 versus their parents’ income per-
centile using our intergenerational analysis sample (children in
the 1980–1984 birth cohorts). We assign parents percentile ranks
by ranking them based on their mean household income from
1996 to 2000 relative to other parents with children in the same
birth cohort. Children from higher-income families are signifi-
cantly more likely to become inventors. Eight out of 1,000 children
born to parents in the top 1% of the income distribution become
inventors, 10 times higher than the rate among those with below-
median-income parents. The relationship is steeply upward slop-
ing even among high-income families: rates of innovation rise by
22% between the 95th percentile ($193,322) and 99th percentile
($420,028) of the parental income distribution. This pattern sug-
gests that liquidity constraints or differences in resources are un-
likely to fully explain why parent income matters, as liquidity
constraints are less likely to bind at higher income levels and
resources presumably have diminishing marginal returns.
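The percentile ranking used in Figure I can be sketched as below: parents are ranked only against other parents whose children were born in the same cohort. The function name, input format, and the tie-breaking and binning conventions are assumptions for illustration; the text does not describe these details.

```python
from collections import defaultdict


def percentile_ranks(parents):
    """Assign within-cohort percentile ranks (1 to 100).

    parents: list of (child_birth_cohort, mean_household_income) tuples
    (hypothetical format). Returns ranks in the same order as the input.
    Ties are broken by input order, which is an assumption.
    """
    by_cohort = defaultdict(list)
    for i, (cohort, income) in enumerate(parents):
        by_cohort[cohort].append((income, i))
    ranks = [0] * len(parents)
    for members in by_cohort.values():
        members.sort()  # ascending by income, then by input position
        n = len(members)
        for pos, (_, i) in enumerate(members):
            # Map position 0..n-1 within the cohort onto percentiles 1..100.
            ranks[i] = pos * 100 // n + 1
    return ranks
```

Ranking within cohort means a given dollar income can map to different percentiles for children born in different years, which keeps the comparison relative rather than absolute.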
Figure I, Panel B shows that the probability a child has highly
cited patents—defined as having total citations in the top 5% of
his or her cohort’s distribution—has a very similar relationship
to parental income. Hence, the relationship between patenting
and parent income is not simply driven by children from high-
income families filing low-value or defensive patents at higher
FIGURE I
Patent Rates versus Parent Income
This figure characterizes the relationship between patent rates and parental
income using our intergenerational analysis sample, which consists of United
States citizens in the 1980–1984 birth cohorts (see Section II.B for details). Panel
A plots the number of children (per 1,000 individuals) who invent by 2014 versus
their parents’ income percentile. Parents are assigned percentile ranks by ranking
them based on their mean household income from 1996 to 2000 relative to other
parents with children in the same birth cohort. Inventing by 2014 is defined as
being listed as an inventor on a patent application between 2001 and 2012 or grant
between 1996 and 2014 (see Section II.B). Panel B replicates Panel A but plots
as the outcome the chances of being a highly cited inventor, defined as having
total citations in the top 5% of the distribution among inventors in the same birth
cohort.
rates. The pattern in Figure I also remains robust at older ages,
allaying the concern that children from higher-income families
may simply patent earlier than those from low-income families.
In particular, using the Statistics of Income 0.1% sample, we find
that the relationship between rates of innovation between ages 30
and 40 and parental income remains qualitatively similar (Online
Appendix Figure I, Panel A). Defining inventors purely on the
basis of patent grants or patent applications also yields similar
results (Online Appendix Figure I, Panel B).
The relationship between innovation and parental income
is representative of the relationship between achieving profes-
sional success and parental income more generally. Children’s
propensities to reach the upper tail of the income distribution
have a similarly convex and sharply increasing relationship
with parental income (Online Appendix Figure II). For instance,
children with parents in the top 1% of the parent income distri-
bution are 27 times more likely to reach the top 1% of their birth
cohort’s income distribution and 10.6 times more likely to reach
the top 5% of their cohort’s income distribution than those born
to parents below the median. As discussed in the introduction,
we focus on innovation here (rather than professional success in
general) because of innovation’s relevance for economic growth,
its unique risk profile, and its advantages in characterizing
mechanisms more precisely. The results and mechanisms we
establish here may apply to other careers beyond innovation.
2. Race and Ethnicity. Next we turn to gaps in innovation
by race and ethnicity. Because we do not observe race or ethnicity
in the tax data, we use the NYC school district sample for this
analysis. The first set of bars in Figure II shows the fraction of
children who patent by 2014 among white non-Hispanic, black
non-Hispanic, Hispanic, and Asian children. Of children who at-
tend NYC public schools between grades 3 and 8, 1.6 per 1,000
white children and 3.3 per 1,000 Asian children become inventors.
These rates are considerably higher than those of black children
(0.5) and Hispanics (0.2), consistent with evidence from Cook and
Kongcharoen (2010).15
15. The innovation rates are lower than those in Figure I, Panel A because
NYC public schools have predominantly low-income students, with more than
75% of students from families with incomes below the national median. NYC pub-
lic schools also have a much larger share of minorities than the United States
[Bar chart. Vertical axis: Inventors per Thousand (0 to 4). Bar values by group:]
                      Raw rate   Reweighted to match    Reweighted to match
                                 parental incomes       3rd grade test scores
                                 of whites              of whites
White Non-Hispanic    1.6        1.6                    1.6
Black Non-Hispanic    0.5        1.0                    0.6
Hispanic              0.2        0.3                    0.3
Asian                 3.3        4.2                    3.1
FIGURE II
Patent Rates by Race and Ethnicity
This figure presents patent rates by race and ethnicity using our NYC public
schools sample, which consists of children in the 1979–1985 birth cohorts who
attended NYC public schools at some point between grades 3 and 8. Each bar plots
the number of children (per 1,000 individuals) who invent by 2014, as defined in
the notes to Figure I. In each triplet, the first bar shows the raw patent rate for
the relevant subgroup. The second bar plots the patent rate that would prevail if
children in the relevant subgroup had the same distribution of parental income
as white children. To construct these estimates, we divide children into 20 bins
based on their parental incomes and compute mean patent rates across the 20
bins, weighting each bin by the fraction of white children with incomes in that bin.
The third bar in each triplet shows the patent rate that would prevail if children
in the relevant subgroup had the same distribution of 3rd-grade math test scores
as white children. These estimates are constructed by dividing children into 20
bins based on their test scores and computing mean patent rates across the 20
bins, weighting each bin by the fraction of white children with test scores in that
bin.
population: 19.5% of the children in our NYC sample are white, 9.6% are Asian,
33.7% are Hispanic, and 36.0% are black. Although we cannot be sure that the
racial patterns within the NYC schools hold nationally, we do find that the re-
lationship between parental income and innovation in the NYC sample is very
similar to the national pattern in Figure I, Panel A, suggesting that it provides
representative evidence at least on the socioeconomic dimension (Online Appendix
Figure I, Panel C).
Because there are significant differences in parental income
by race and ethnicity, the raw gaps across race and ethnicity partly
reflect the income gradient shown in Figure I. To separate these
two margins, we control for differences in income by nonparamet-
rically reweighting the parental income distributions of blacks,
Hispanics, and Asians to match that of whites in the NYC sam-
ple, following the methodology of DiNardo, Fortin, and Lemieux
(1996). We divide the parental income distribution of children in
the NYC sample into ventiles (20 bins) and compute mean patent
rates across the 20 bins for each racial/ethnic group, weighting
each bin by the fraction of white children whose parents fall in
that income bin (i.e., integrating over the income distribution for
whites).
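The reweighting step described above can be sketched in a few lines. This is an illustration of the binned reweighting idea rather than the authors' code; the function name and input format are hypothetical, the bin assignment is taken as given, and the handling of empty bins is an assumption.

```python
from collections import Counter


def reweighted_patent_rate(group, reference_bins, n_bins=20):
    """Patent rate per 1,000 for `group`, reweighted to a reference group.

    group: list of (bin_index, is_inventor) pairs with bin_index in
        0..n_bins-1 (hypothetical format; bins are parental income ventiles).
    reference_bins: list of bin indices for the reference group
        (white children in the text).
    Bins containing no members of `group` are skipped and the remaining
    weights renormalized; this is an assumption, not stated in the text.
    """
    ref_counts = Counter(reference_bins)
    n_children = Counter()
    n_inventors = Counter()
    for b, is_inventor in group:
        n_children[b] += 1
        n_inventors[b] += int(is_inventor)
    weighted_sum = 0.0
    weight_total = 0.0
    for b in range(n_bins):
        if n_children[b] == 0:
            continue
        # Weight = share of the reference group falling in this bin.
        w = ref_counts[b] / len(reference_bins)
        weighted_sum += w * n_inventors[b] / n_children[b]
        weight_total += w
    return 1000.0 * weighted_sum / weight_total
```

With two bins, a group whose patent rates are 0.25 and 0.5 in bins 0 and 1, and a reference group with 25% of its members in bin 0, the reweighted rate is 0.25 × 0.25 + 0.75 × 0.5 = 0.4375, or 437.5 per 1,000.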
The second set of bars in Figure II plots the resulting innovation
rates. Controlling for income differences does not eliminate
the racial and ethnic gaps, but changes their magnitudes. The
black–white gap falls by a factor of 2 (from 1.1/1,000 to 0.6/1,000).
The white–Asian gap widens from 1.7/1,000 to 2.6/1,000 when
we reweight by income, as Asian parents in NYC public schools
have lower incomes on average than white parents. The Hispanic–
white gap remains essentially unchanged.
3. Gender. Finally we examine gaps in innovation by gender.
Because gender is recorded in the tax data for all individuals in the
population, we use the full inventors sample for this analysis. The
advantage of doing so is that we can examine gender differences
in rates of innovation not just for those born in the 1980s as in
our intergenerational sample but for older cohorts as well.
Figure III plots the fraction of female inventors—individuals
who applied for or were granted a patent between 1996
and 2014—by birth cohort.16 Consistent with prior work
(Thursby and Thursby 2005; Ding, Murray, and Stuart 2006; Hunt
2009; Kahn and Ginther 2017), we find substantial gender differences
in innovation for those in the prime of their careers today;
for instance, 18% of inventors born in 1980 are women. What is
16. Because we examine patenting in a fixed time window, we measure patent
rates at different ages for different cohorts, ranging from ages 56–72 for the 1940
cohort to ages 16–32 for the 1980 cohort. This approach yields consistent estimates
of the gender gap across cohorts if gender differences in patenting do not vary
by age. Although we cannot evaluate the validity of this assumption across all
cohorts, examining patent rates at a fixed age (e.g., age 40) over the 17 cohorts we
can analyze yields similar results (not reported).
[Plot with fitted line. Horizontal axis: Year of Birth (1940 to 1980). Vertical axis: Percentage of Inventors who are Female (0 to 50). Annotations: Average change per year: 0.27% (0.01%); 118 years to reach 50% female share.]
FIGURE III
Percentage of Female Inventors by Birth Cohort
This figure plots the percentage of inventors who are female by year of birth
using our full inventors sample, which consists of all 1.2 million individuals in the
linked patent-tax data. Inventing is defined as being listed as an inventor on a
patent application between 2001 and 2012 or grant between 1996 and 2014 (see
Section II.B for details). The change per year is estimated using an unweighted
OLS regression of the percentage of female inventors on birth year, depicted by
the solid line. The standard error from this regression is shown in parentheses.
less well known from prior work is the rate at which this gap is
changing over time. Figure III shows that the fraction of female
inventors was only 7% in the 1940 cohort and has risen monoton-
ically and linearly over time. However, the rate of convergence is
slow: a 0.27 percentage point increase in the fraction of female
inventors per cohort on average, based on a linear regression. At
this rate, it will take another 118 years to reach gender parity in
innovation.
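The convergence arithmetic can be checked with a short calculation. The sketch below assumes that the extrapolation starts from the 18% female share reported for the 1980 cohort and uses the rounded slope of 0.27 percentage points per cohort; the text does not spell out the exact inputs behind the 118-year figure, so both are assumptions, as are the function names.

```python
def ols_slope(xs, ys):
    """Slope from an unweighted OLS regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def years_to_parity(current_share_pct, change_per_cohort_pct, target_pct=50.0):
    """Cohorts needed to reach target_pct under linear extrapolation."""
    return (target_pct - current_share_pct) / change_per_cohort_pct


# (50 - 18) / 0.27 is roughly 118.5, in line with the figure in the text.
implied_years = years_to_parity(18.0, 0.27)
```

The calculation is a pure linear extrapolation, so it is sensitive to the slope: a slope of 0.30 instead of 0.27 would shorten the horizon by more than a decade.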
III.B. Do Differences in Abilities Explain the Gaps in Innovation?
Why do rates of innovation vary so widely across individuals
with different characteristics at birth? One potential explanation
is that the differences stem from inherited differences in abilities
to innovate or preferences to pursue innovation as a career.
TABLE II
FRACTION OF GAP IN INNOVATION BY PARENTAL INCOME EXPLAINED BY DIFFERENCES IN 3RD-GRADE TEST SCORES

                                    Patent rates for       Patent rates for       High- versus
                                    children with parents  children with parents  low-income
                                    below 80th percentile  above 80th percentile  innovation gap
                                    (1)                    (2)                    (3)
Raw estimates                       0.52                   1.93                   1.41
                                    (0.05)                 (0.20)                 (0.21)
Reweighted to match 3rd-grade       0.96                   1.93                   0.97
scores of high-income children      (0.07)                 (0.20)                 (0.21)
Gap in innovation explained by 3rd-grade test scores: 31.2%
Notes. This table shows how much of the gap in patent rates by parental income can be explained by
3rd-grade math test scores. The statistics in this table are based on the children in the NYC public schools
sample, which consists of children in the 1979–1985 birth cohorts who attended NYC public schools and
were linked to the tax data. We divide children into two groups: those with parents in the top quintile of
the income distribution within the NYC sample (“high-income children”) and all other children in the sample
(“low-income children”). We define a child as an inventor if he or she is listed as an inventor on a patent
application between 2001 and 2012 or grant between 1996 and 2014 (see Section II.B). The first row lists the
fraction of children who become inventors among low-income (column (1)) and high-income children (column
(2)) along with the differences between these two values (column (3)). In the second row, column (1) shows the
patent rate that low-income children would have if they had the same math test scores as the high-income
children. We calculate this counterfactual rate by dividing the math test score distribution into ventiles (20
bins) and then calculating the patent rate for low-income children weighting by the number of high-income
children in each bin. Column (2) repeats the patent rates for high-income children, and column (3) shows
the gap between the high-income patent rate and the counterfactual low-income patent rate in column (1).
This adjusted gap can be interpreted as the difference in patent rates that would remain if test scores were
identical across low- and high-income children. The percentage of the raw gap in innovation explained by
3rd-grade test score is the percentage reduction in the gap from the raw to the reweighted estimates. Standard
errors are reported in parentheses.
In this subsection, we take a step toward evaluating the role
of differences in abilities to invent by using data on childhood test
scores for children in our NYC schools sample. Although students
who attend NYC public schools are a selected subgroup, differ-
ences in innovation rates by parental income (Online Appendix
Figure I, Panel C) and gender (Table I) are very similar in the
NYC school district sample and in the full intergenerational sample.
We consider whether math test scores, an imperfect proxy for
inventive ability that nonetheless proves to be highly predictive
of innovation rates, can account for the gap in innovation within
the NYC sample by income, race, and gender in turn.
1. Parental Income. In Table II, we estimate the fraction of
the gap in innovation by parental income that can be predicted
by math test scores in 3rd grade (the first grade we observe in
the NYC data). We define “high-income” children as those with
parents in the top income quintile within the NYC sample, placing
all others in the “lower-income” category; using other thresholds
to divide the two groups yields similar results. We focus on math
test scores because scores in English do not predict innovation
rates conditional on math scores (Online Appendix Table I).17
The first row of Table II shows that 1.93 out of 1,000 children
from top-quintile families born in 1979–1985 invent
by 2014, as compared with 0.52 out of 1,000 children from lower-
income families. The raw gap in innovation across these income
groups is thus 1.41 inventors per 1,000 children. In the second
row, we reweight the test scores of the lower-income students to
match those of children from high-income families, following the
methodology of DiNardo, Fortin, and Lemieux (1996) as in our
analysis of income and race above. We divide the 3rd-grade math
test score distribution of children in the NYC sample into ventiles
and compute mean patent rates across the 20 bins for the lower-
income group, weighting each bin by the fraction of high-income
children with test scores in that bin. The second row of Table II
shows that according to this statistical decomposition, children
from lower-income families would have a patent rate of 0.96 per
1,000 (rather than 0.52) if they had the same test scores as chil-
dren from high-income families. The patent rate rises because
children from high-income families have higher test scores in 3rd
grade; for instance, children from the top income quintile score
0.65 standard deviations higher on average than children from
lower quintiles (Online Appendix Figure III, Panel A). However,
these differences in test scores account for less than one-third of
the raw gap in innovation, as the gap remains at 0.97 per 1,000
even after adjusting for differences in test scores, as shown in
column (3) of Table II.
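The reweighting arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-bin toy inputs are hypothetical (the paper uses 20 ventile bins and microdata that are not reproduced here), while the final decomposition uses only the rates reported in the text (1.93, 0.52, and 0.96 per 1,000).

```python
def reweighted_rate(low_bin_rates, high_bin_shares):
    """Counterfactual patent rate for the lower-income group.

    low_bin_rates:   mean patent rate of lower-income children in each test-score bin
    high_bin_shares: fraction of high-income children whose scores fall in each bin
    """
    if abs(sum(high_bin_shares) - 1.0) > 1e-9:
        raise ValueError("bin shares must sum to 1")
    return sum(rate * share for rate, share in zip(low_bin_rates, high_bin_shares))


def share_explained(rate_high, rate_low, rate_low_reweighted):
    """Fraction of the raw gap that is closed by the reweighting."""
    raw_gap = rate_high - rate_low
    return (rate_low_reweighted - rate_low) / raw_gap


# Hypothetical four-bin illustration (NOT data from the paper).
toy_low_rates = [0.2, 0.4, 0.6, 1.2]      # inventors per 1,000, by score bin
toy_high_shares = [0.1, 0.2, 0.3, 0.4]    # high-income children are concentrated in top bins
toy_counterfactual = reweighted_rate(toy_low_rates, toy_high_shares)

# Decomposition using the rates reported in the text (inventors per 1,000 children).
explained = share_explained(rate_high=1.93, rate_low=0.52, rate_low_reweighted=0.96)
remaining_gap = 1.93 - 0.96

print(f"share of raw gap predicted by test scores: {explained:.1%}")
print(f"gap remaining after reweighting: {remaining_gap:.2f} per 1,000")
```

With the reported rates, the share predicted by 3rd-grade scores comes out just under one-third, matching the statement in the text.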
Figure IV, Panel A illustrates why test scores fail to fully
predict the gap in innovation by plotting innovation rates versus
test scores for children with parents in the top quintile (circles)
and those with lower-income parents (triangles). Each point in
this figure shows the fraction of inventors within a ventile of
the test score distribution. In high-income families, children who
score highly on 3rd-grade math tests are much more likely to
become inventors than those with lower test scores. By contrast,
17. The same is not true for success on other dimensions: for instance, both
math and English scores are predictive of the probability that a child reaches the
top 1% of the income distribution (Online Appendix Table I).
[Figure IV plots. Panel A: By Parental Income (series: Parent Income Below 80th Percentile; Parent Income Above 80th Percentile). Panel B: By Race and Ethnicity (series: Hispanic, Black, White, Asian). Panel C: By Gender (series: Female, Male). In each panel, x-axis: 3rd Grade Math Test Score (Standardized); y-axis: Inventors per Thousand; a vertical line marks the 90th percentile.]
FIGURE IV
Patent Rates versus 3rd-Grade Math Test Scores
This figure shows the relationship between patent rates and math test scores in
3rd grade for various subgroups. The sample consists of children in the 1979–1985
birth cohorts who attended NYC public schools in 3rd grade. Test scores, which
are based on standardized tests administered at the district level, are normalized
to have mean 0 and standard deviation 1 by year and grade. In Panel A, we divide
children into two groups based on whether their parents’ incomes fall below the
80th percentile of the income distribution of parents’ income in the NYC sample.
The figure presents a binned scatter plot of patent rates versus test scores for these
two subgroups. To construct the figure, we first divide children into 20 equal-sized
bins (ventiles) based on their test scores. We then plot the share of inventors (per
1,000 individuals) versus the mean test score within each bin for each of the two
subgroups. Panels B and C replicate Panel A, dividing children by their race and
ethnicity (Panel B) and gender (Panel C) instead of parental income. We use 10 bins
rather than 20 bins of test scores in Panel B because of smaller sample sizes for
some racial and ethnic groups. The vertical dashed lines depict the 90th percentile
of the test-score distribution.
in lower-income families, children with higher test scores do
not have much higher innovation rates. As a result, among
students with test scores in the top 5% of the distribution,
those from high-income families are more than twice as likely
to become inventors as those from lower-income families. This
result suggests that becoming an inventor in America relies on
two traits: having high inventive ability (as proxied for by math
test scores early in childhood) and being born into a high-income
family.18
To obtain further insight into the role of inventive ability, we
repeat the preceding analysis using test scores in later grades.
Figure V plots the fraction of the raw gap in innovation that is
accounted for by math test scores in each grade from grades 3–8.
As children get older, test scores account for more of the gap in
innovation by parental income. By 8th grade, 48% of the gap can
be predicted by differences in test scores, significantly higher than
the 31% in 3rd grade. Based on a linear regression across the six
grades in which we observe scores, we estimate that on average
an additional 3.2 percentage points of the gap is accounted for by
test scores each year (p<.01).
Extrapolating linearly back to birth, our estimates imply
that only 5.7% of the gap in innovation would be predicted by
math test scores (our proxy for inventive ability) at birth. Con-
versely, test scores at the end of high school would account
for 60.1% of the gap.19 These results suggest that low-income
children start out on an even footing with their higher-income
peers in terms of inventive ability, but fall behind steadily as
they grow older, perhaps because of differences in childhood
environment.
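The extrapolation can be checked for internal consistency with a short calculation. The sketch below is not the authors' code. It assumes, purely for illustration, that "birth" corresponds to grade -5 and "the end of high school" to grade 12 on the grade axis; the text does not state this mapping. The slope (3.2 percentage points per grade) and the 5.7% figure are taken from the text.

```python
SLOPE_PP_PER_GRADE = 3.2   # reported slope of the best-fit line
PCT_AT_BIRTH = 5.7         # reported extrapolated value at birth

# ASSUMPTION (not stated in the paper): position of each milestone on the grade axis.
BIRTH_GRADE = -5
END_OF_HIGH_SCHOOL_GRADE = 12


def pct_gap_explained(grade):
    """Percent of the income gap in innovation predicted by test scores on the fitted line."""
    return PCT_AT_BIRTH + SLOPE_PP_PER_GRADE * (grade - BIRTH_GRADE)


for label, grade in [("grade 3", 3), ("grade 8", 8),
                     ("end of high school", END_OF_HIGH_SCHOOL_GRADE)]:
    print(f"{label}: {pct_gap_explained(grade):.1f}%")
```

Under this assumed mapping, the line reproduces the 60.1% figure for the end of high school and passes close to, though not exactly through, the reported 31% and 48% for grades 3 and 8, as one would expect from a fitted line.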
Consistent with this conclusion, we find that gaps in innova-
tion by parental income are relatively small among children who
attend the same college. Figure VI, Panel A lists the 10 colleges
(among colleges with at least 500 students per cohort) whose
18. This figure also implies that efforts to increase innovation among under-
represented groups are likely to have the biggest effects if they are targeted at
children who excel in math and science at early ages. Because such efforts are
unlikely to raise the innovation rates of children from underrepresented groups
beyond those observed for children with comparable test scores from advantaged
backgrounds, Figure IV, Panel A suggests that there is limited scope to increase
innovation rates among low-income children who score below the 90th percentile
on math tests in 3rd grade. However, there may be substantial potential to do so
among those who score in the top 10%.
19. Naturally, the evolution of gaps in inventive ability may differ at earlier
and later ages, so the results of these extrapolations should be interpreted with
caution. We use these calculations simply to illustrate that the gaps in test scores
expand sufficiently rapidly during childhood that they would account for essen-
tially none of the gap in innovation if (hypothetically) measured at birth, but the
majority of the gap if measured at the end of high school.
[Figure V plot. x-axis: Grade (3 to 8); y-axis: Percent of Gap Explained by Math Test Scores; best-fit slope: 3.20% per grade (std. err. 0.55).]
FIGURE V
Gap in Patent Rates by Parental Income Explained by Test Scores in Grades 3–8
This figure shows how much of the gap in patent rates by parental income can
be explained by math test scores in grades 3–8. The sample consists of children in
our NYC public schools sample (birth cohorts 1979–1985), whom we divide into two
groups: those with parents in the top quintile of the income distribution within
the NYC sample (“high-income children”) and all other children in the sample
(“low-income children”). The gap in innovation explained by math test scores in
grade g is the percentage reduction in the gap in innovation when we reweight
low-income students’ grade g test score distribution to match that of high-income
students. Table II illustrates how we construct this estimate using 3rd-grade test
scores (31.2%); estimates for later grades use the same methodology. The slope and
best-fit line are estimated using an unweighted OLS regression on the six points,
with standard error reported in parentheses.
students are most likely to become inventors.20 Figure VI, Panel B
presents a binned scatter plot of innovation rates versus parental
income rank among students at these 10 high-innovation colleges.
Of children with parents in the top 1% of the national income dis-
tribution, 7.1% become inventors at these colleges, compared with
4.0% of children from below-median-income families. This gap is
20. Innovation rates for every college in the United States that has at least 10
inventors in our sample are provided in Online Data Table III. The college-level
estimates are blurred to protect confidentiality using the procedure in Chetty et al.
(2017, appendix C). The degree of error due to the blurring procedure is smaller
than the degree of sampling error in the estimates.
[Figure VI plots. Panel A: Colleges with the Highest Share of Inventors per Student (bar chart; axis: Inventors per 1000 Students). Colleges shown: Rice University; Rochester Institute of Technology; Georgia Institute of Technology; Clarkson University; Michigan Technological University; Case Western Reserve University; Stanford University; Rensselaer Polytechnic Institute; Carnegie Mellon University; Massachusetts Institute of Technology. Panel B: Patent Rates vs. Parent Income in the 10 Most Innovative Colleges (x-axis: Parents' Percentile Rank in National Income Distribution; y-axis: Inventors per 1000 Students).]
FIGURE VI
Patent Rates by College
This figure presents data on the share of students who become inventors by
2014 (as defined in the notes to Figure I) by the college they attended. The sample
consists of all individuals in the tax data in the 1980–1984 birth cohorts who
are linked to parents. Children are assigned to the college that they attend most
frequently at age 19–22, following the methodology of Chetty et al. (2017). Panel
A lists the 10 colleges that have the highest fraction of students who become
inventors, among colleges with at least 500 students per cohort. This figure is
produced from the college-level estimates in Online Data Table III. These college-level
estimates are blurred to protect confidentiality using the procedure in Chetty
et al. (2017, appendix C). Panel B presents a binned scatterplot of patent rates
versus parental income for students who attended the 10 colleges listed in Panel
A. It is constructed by binning parent income into 20 equal-sized bins (ventiles)
and plotting the mean share of inventors (per 1,000 students) versus the mean
parent rank in the national income distribution within each bin. There are fewer
points on the left because there are fewer students from low-income families than
high-income families at these colleges.
an order of magnitude smaller than the 10 to 1 gap shown in Fig-
ure I, Panel A for the nation as a whole, suggesting that children’s
levels of achievement around age 20 almost fully account for gaps
in innovation. More broadly, this finding suggests that most of
the innovation gap is explained by factors that affect children
before they enter the labor market, as we show in Section IV.
2. Race and Ethnicity. We use analogous methods to those
above to estimate how much of the racial gaps in innovation can
be accounted for by test scores in the NYC schools sample. The
third set of bars in Figure II shows the innovation rates that would
prevail if all children had 3rd-grade math test scores comparable
to those of whites. The gaps shrink modestly, showing that test
scores account for very little of the racial gaps in innovation. For
example, the black–white gap shrinks from 1.1 to 1.0, a change of
less than 10%, while the Asian–white gap falls by 9%. Figure IV,
Panel B illustrates why this is the case by plotting patent rates
versus test scores by race and ethnicity. Even conditional on test
scores, whites and Asians are substantially more likely to become
inventors than blacks and Hispanics. Very few of even the highest-
scoring black and Hispanic children pursue innovation.
Replicating the reweighting analysis by grade, we find that
test scores in later grades account for more of the racial gaps in
innovation, consistent with the patterns for income. For instance,
51% of the gap in patent rates between Asians and other racial
and ethnic groups can be explained by 8th-grade test scores.
3. Gender. Finally, we conduct an analogous exercise for
gender, reweighting girls’ test scores to match those of boys. Math
test scores in 3rd grade account for only 2.4% of the difference
in innovation rates between men and women (Online Appendix
Table II). This is because the distribution of math test scores for
boys and girls is extremely similar in 3rd grade (Online Appendix
Figure III, Panel B). Similar to the patterns by race and parental
income, high-scoring girls are much less likely to become inven-
tors than high-scoring boys (Figure IV, Panel C).
Even in 8th grade, test scores account for only 8.5% of the
gender gap in innovation. One explanation for why the gender
gap in test scores expands less across grades than racial and class
gaps is that boys and girls attend similar schools and grow up in
similar neighborhoods, whereas children with different parental
income and racial backgrounds do not.
Overall, the results in this section are consistent with evi-
dence from other domains that disparities in measurable skills
are small at birth and expand gradually over time (e.g., Fryer
and Levitt 2006; Fryer 2011). One explanation for these patterns
is that differences in childhood environment (for example,
in the quality of schools or the degree of exposure to science and
innovation) affect the amount students learn or the amount of
time they study. However, as noted in prior work, one must be cau-
tious in attributing these results to environmental differences. If
tests at later ages are more effective at capturing intrinsic ability,
one may find the patterns across grades documented above even
in the absence of differences in childhood environment. In light of
this limitation, we directly examine the causal effects of childhood
environment in the next section.
IV. CHILDHOOD ENVIRONMENT AND EXPOSURE TO INNOVATION
In this section, we study how childhood environments affect
innovation, focusing in particular on the role of exposure to inven-
tors. We first exploit variation across technology classes to show
that children’s propensities to invent in a given field are heav-
ily influenced by growing up with parents, parents’ coworkers, or
neighbors who are inventors. We then analyze the outcomes of
children who move across areas to show that childhood environ-
ment affects not just the types of innovation that children pursue
but also the overall fraction who go into innovation.
IV.A. Parents
To characterize the role that children’s parents play in shap-
ing their decision to pursue innovation, we begin by asking
whether children whose fathers are inventors are more likely to
become inventors themselves.21 In our intergenerational analysis
sample (children in the 1980–1984 birth cohorts), 2.0 out of 1,000
children whose parents were not inventors become inventors by
2014. In contrast, 18.0 per 1,000 children of inventors become
21. We focus on fathers here because the vast majority of inventors, partic-
ularly in older generations, are male (Figure III). We examine the role of female
inventors in the context of neighborhood differences, where we have greater power,
in Section IV.B. We define a father as an inventor if he applied for a patent be-
tween 2001–2012 or was granted a patent between 1996–2014, analogous to the
definition for children.
inventors themselves—a nine-fold difference.22 This pattern holds
even conditional on parental income, across the parent income dis-
tribution (not reported).
The intergenerational persistence of innovation could be
driven by the genetic transmission of ability to innovate across
generations or by an exposure effect—the environmental effect
of growing up in a family of innovators, holding one’s intrinsic
invention ability fixed. These exposure effects could reflect the ac-
cumulation of specific human capital, changes in preferences, or
simply increased awareness about innovation as a career pathway.
We distinguish between intrinsic inventive ability and ex-
posure effects by exploiting variation in the specific technology
class in which a child innovates. Following the USPTO’s classi-
fication system and Hall, Jaffe, and Trajtenberg (2001), patents
can be grouped into seven broad categories (chemicals, computers
and communications, drugs and medical, electrical and electronic,
mechanical, design and plant, and other). Within these categories,
patents are further classified into 37 subcategories and 445 spe-
cific technology classes. These technology classes are very narrow:
for instance, within the communications category, there are sepa-
rate classes for modulators, demodulators, and oscillators; within
the resins subcategory, there are separate classes for synthetic
and natural resins.
We isolate the causal effects of exposure by analyzing whether
children are particularly likely to patent in the same technology
classes as their parents. The idea underlying our research design
is that genetic differences in inventive ability are unlikely to lead
to differences in propensities to innovate across similar, narrowly
defined technology classes. For instance, a child is unlikely to
have a gene that codes specifically for ability to invent in modula-
tors rather than oscillators. Under this assumption, the degree
of alignment between the specific technology classes in which
children and their parents innovate can be used to estimate causal
exposure effects.
22. Part of this association reflects the fact that children and their fathers
sometimes are coinventors on the same patent. However, this is relatively rare:
13.7 out of 1,000 children of inventors file patents on which their parent is not a
coinventor, still far higher than the rate for noninventors. In addition, our measure
of parental inventor status suffers from measurement error because we do not
observe parents’ patents prior to 1996 in our data, likely attenuating our estimate
of the difference.
Implementing this research design requires a metric for
the degree of similarity between technology classes. We define
the distance between two technology classes A and B based on the
share of inventors in class A who also invent in class B; the higher
the share of common inventors, the lower the distance between
A and B. Online Appendix Table III gives an example that illustrates
this distance metric by showing the technology classes that
are closest to technology class 375, “pulse or digital communica-
tions.” Pulse or digital communications has a distance of 0 with
itself by definition. Inventors who had a patent in pulse or dig-
ital communications were most likely to have another patent in
demodulators, which is therefore assigned an ordinal distance of
d=1 from the pulse and digital communications class. The next
closest class is modulators (d=2), and so on.
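The ordinal distance described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy inventor-to-class mapping, and the alphabetical tie-breaking rule are all assumptions made for the example, and classes that share no inventors with the origin class are simply left unranked.

```python
from collections import defaultdict


def ordinal_distances(inventor_classes, origin):
    """Rank technology classes by the share of `origin` inventors who also patent in them.

    inventor_classes: dict mapping an inventor id to the set of classes they patent in.
    Returns a dict {class: ordinal distance}, with the origin class at distance 0.
    """
    origin_inventors = [classes for classes in inventor_classes.values()
                        if origin in classes]
    counts = defaultdict(int)
    for classes in origin_inventors:
        for c in classes:
            if c != origin:
                counts[c] += 1

    n = len(origin_inventors)
    shares = {c: k / n for c, k in counts.items()}

    # Higher share of common inventors means lower distance.
    # ASSUMPTION: ties are broken alphabetically.
    ranked = sorted(shares, key=lambda c: (-shares[c], c))

    distances = {origin: 0}
    for rank, c in enumerate(ranked, start=1):
        distances[c] = rank
    return distances


# Hypothetical inventors (NOT data from the paper).
toy = {
    "inv1": {"pulse_or_digital_comm", "demodulators"},
    "inv2": {"pulse_or_digital_comm", "demodulators", "modulators"},
    "inv3": {"pulse_or_digital_comm", "demodulators"},
    "inv4": {"pulse_or_digital_comm", "modulators"},
    "inv5": {"pulse_or_digital_comm"},
    "inv6": {"oscillators", "modulators"},
}

d = ordinal_distances(toy, "pulse_or_digital_comm")
print(d)
```

In this toy example, three of the five pulse-or-digital-communications inventors also patent in demodulators and two in modulators, so the two classes receive distances 1 and 2, mirroring the ordering described in the text.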
Figure VII, Panel A plots the fraction of children who patent
in a technology class d units away from their father’s technology
class, among children of inventors in our intergenerational sam-
ple.23 Nearly 1 in 1,000 children patent in the same technology
class as their father (d=0). In contrast, the probability of invent-
ing in the next closest technology class (with distance d=1) is
less than 0.2 per 1,000, an estimate that is significantly different
from the value at d=0 with p<.01. The child’s probability of
inventing in a given class then falls gradually as d rises, although
the gradient is relatively flat compared to the jump between
d=0 and d=1.
The jump in innovation rates at d=0 suggests that part of
the reason that children of inventors are more likely to become
inventors themselves is because of exposure to innovation rather
than differences in natural talents. To formalize the identification
assumption underlying this conclusion, let $e_{ic} \in \{0, 1\}$ represent an
indicator for whether child $i$’s father has a patent in technology
class $c$ (i.e., if child $i$ is “exposed” to innovation in class $c$) and
$\alpha_{ic}$ represent the child’s intrinsic ability to innovate in class $c$.
Suppose that child $i$ patents in technology class $c$ if $\alpha_{ic} + \beta e_{ic} > 0$.
Here, $\beta$ measures the causal effect of exposure to innovation. Our
identification assumption is that:
$$\lim_{d \to 0} \mathrm{Cov}\left(\alpha_{i,c} - \alpha_{i,c+d},\; e_{i,c} - e_{i,c+d}\right) = 0. \tag{1}$$
23. Children or fathers who patent in multiple technology classes are assigned
the technology class in which they patent most frequently. We omit observations
where a child and his or her father are coinventors on the same patent to eliminate
mechanical effects on the rate of patenting in the same class.
TABLE III
EXPOSURE TO INNOVATION FROM PARENTS’ COLLEAGUES: CHILDREN’S INNOVATION RATES VERSUS PATENT RATES IN FATHER’S INDUSTRY

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Dependent variable | Fraction inventing | Fraction inventing in patent category | Fraction inventing in patent subcategory | Fraction inventing in patent class | Fraction inventing in patent class |
| Patent rate in father’s industry | 0.250 (0.028) | | | | |
| Patent rate in father’s industry in same category | | 0.163 (0.018) | | | |
| Patent rate in father’s industry in same subcategory | | | 0.155 (0.017) | | |
| Patent rate in father’s industry in same class | | | | 0.078 (0.013) | 0.0598 (0.0125) |
| Patent rate in father’s industry in same subcategory but other class | | | | | 0.0044 (0.0008) |
| Patent rate in father’s industry in same category but other subcategory | | | | | 0.0001 (0.0004) |
| Patent rate in father’s industry in other category | | | | | 0.0002 (0.0000) |
| Fixed effects | None | Patent category | Patent subcategory | Patent class | Patent class |
| Unit of observation | Father’s industry | Father’s industry by patent category | Father’s industry by patent subcategory | Father’s industry by patent class | Father’s industry by patent class |
| Number of cells | 345 | 2,415 | 12,765 | 153,525 | 153,525 |
| Mean of dependent variable | 0.002341 | 0.000334 | 0.000063 | 0.000005 | 0.000005 |
| Std. dev. of dependent variable | 0.001063 | 0.000275 | 0.000118 | 0.000018 | 0.000018 |
| Mean of independent variable | 0.001040 | 0.000168 | 0.000034 | 0.000003 | 0.000003 |
| Std. dev. of independent variable | 0.002368 | 0.000654 | 0.000206 | 0.000030 | 0.000030 |
Notes. This table analyzes how a child’s propensity to invent is related to patent rates in his or her father’s industry. The sample consists of children in the intergenerational
sample (1980–1984 birth cohorts) whose parents are not inventors. Each column presents estimates from a separate OLS regression, with standard errors clustered by industry
in parentheses. In column (1), we regress the share of children who become inventors among those with fathers in industry j on the patent rate among workers in industry j,
with one observation per industry (six-digit NAICS code). We measure the patent rate among workers in each industry as the average number of patents issued to individuals
in that industry per year between 1996 and 2012 divided by the average number of workers per year (based on W-2 counts) in each industry between 1999 and 2012. Column
(2) is run at the industry-by-patent-category level. Here, we regress the share of children with fathers in industry j who invent in patent category c on the share of workers in
industry j who have patents in category c. We include patent category fixed effects in this regression to account for differences in patent rates across categories. Columns (3) and (4)
are analogous to column (2), but use more narrowly defined categorizations of patent types: patent subcategories and patent classes. Column (5) replicates column (4) with three
additional controls: the fraction of inventors in (i) the same subcategory but in a different patent class, (ii) the same category but a different subcategory, and (iii) other categories.
All regressions are weighted by the number of children in each cell. There are 10,213,731 children underlying these regressions, the set of children in the intergenerational sample
whose fathers have a nonmissing NAICS code.
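As a quick consistency check on the units of observation, the cell counts in Table III should equal the number of father industries crossed with the number of patent groupings described in Section IV.A (7 categories, 37 subcategories, 445 classes). The short Python sketch below verifies this; the variable names are illustrative only.

```python
N_INDUSTRIES = 345  # six-digit NAICS industries, column (1) of Table III

# Patent taxonomy sizes as described in Section IV.A.
GROUPINGS = {"category": 7, "subcategory": 37, "class": 445}

# Number of cells reported in columns (2) to (5) of Table III.
REPORTED_CELLS = {"category": 2_415, "subcategory": 12_765, "class": 153_525}

computed_cells = {level: N_INDUSTRIES * n for level, n in GROUPINGS.items()}

for level, cells in computed_cells.items():
    status = "matches" if cells == REPORTED_CELLS[level] else "differs from"
    print(f"industry x {level}: {cells:,} cells ({status} Table III)")
```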
[Figure VII plots. Panel A: Patent Rates by Distance from Father’s Technology Class for Children of Inventors (x-axis: Distance from Father's Technology Class; y-axis: Inventors per Thousand). Panel B: Effects of Class-Level Patent Rates within Father’s Industry by Technological Distance (x-axis: Distance Between Technology Classes, in bins 0, 1-10, 11-20, ..., 91-100; y-axis: Regression Coefficient on Class-Level Patent Rate). Panel C: Effects of Class-Level Patent Rates within Childhood CZ by Technological Distance (same axes as Panel B).]
FIGURE VII
Children’s Patent Rates versus Class-Level Patent Rates in Childhood
Environment
This figure shows how children’s propensities to patent in a technology class
vary with the class in which their father (Panel A), father’s colleagues (Panel B),
or childhood neighbors (Panel C) patented. In Panel A, the sample consists of all
children in our intergenerational sample whose fathers are inventors (those who
applied for a patent between 2001 and 2012 or were granted a patent between
1996 and 2014) and who were not listed as coinventors on a patent with their fa-
thers. To construct Panel A, we first assign fathers and children a technology class
based on the class in which they have the most patents and patent applications.
We then define the distance between two technology classes A and B based on the
share of inventors in class A who also invent in class B. Using this distance metric,
for each child, we define d=0 as the class in which his or her father patents,
d=1 as the next closest class, and so on. We then plot the share of children (per
1,000 individuals) who invent in a technology class that is d units away from their
father’s class. Classes in which fewer than 100 inventors have a patent grant or
application between 1996 and 2014 are omitted. In Panels B and C, the sample
consists of all children in our intergenerational sample whose parents are not in-
ventors. Each bar in Panel B plots estimates from a separate regression, with one
observation per father’s industry (six-digit NAICS code) and patent technology
class. In the first bar, we regress the fraction of children who patent in technology
class c among those with fathers in industry j on the patent rate among workers in
industry j in the same technology class c. We measure the class-level patent rate
among workers in each industry as the average number of patents in class c issued
to individuals in that industry per year (between 1996 and 2012) divided by the
average number of workers per year in each industry between 1999 and 2012. In
the second bar, we regress the same dependent variable on the mean patent rate
in the father’s industry in the 10 closest classes (d=1 to 10). The third bar uses
the average patent rate in classes with d=11 to 20, and so on. All regressions
are weighted by the number of children in each cell and include class-level fixed
effects for class c. Panel C replicates Panel B, replacing patent rates in the father’s
industry with patent rates of workers in the CZ where the child grew up. CZ-level
patent rates are defined as the average number of patents issued in class c per
year to individuals from a given CZ between 1980 and 1990 divided by the CZ’s
population between ages 15 and 64 in the 1990 census.
Equation (1) requires that an individual’s intrinsic ability to in-
novate in a technology class does not covary with whether his or
her father innovates in that particular technology class among
technology classes that are very similar. Under this assumption,
we can identify the causal effects of exposure ($\beta$) even though inventive
ability is correlated with exposure ($\mathrm{Cov}(e_{ic}, \alpha_{ic}) > 0$) by
analyzing how a child’s propensity to innovate in a given technol-
ogy class varies with the distance between that class and the class
in which his or her parents patented. In particular, the jump in
rates of innovation at d=0 in Figure VII cannot be generated
by differences in ability under the assumption in equation (1) and
must therefore be driven by the causal effect of exposure.24
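The logic of this identification argument can be illustrated with a small simulation. The sketch below is a stylized toy model, not the authors' estimation procedure. It assumes that every father patents in a single class (d=0), that each child has a general ability component shared across the father's class and the adjacent class (d=1), and that class-specific ability draws are independent of exposure, in the spirit of equation (1). The threshold, the normal distributions, and the value of beta are arbitrary illustrative choices.

```python
import random


def simulate(n_children=20000, beta=1.0, threshold=3.0, seed=0):
    """Return simulated patent rates in the father's class (d=0) and the adjacent class (d=1)."""
    rng = random.Random(seed)
    hits = [0, 0]
    for _ in range(n_children):
        # General ability is common to both classes, so it cannot create a gap between them.
        general_ability = rng.gauss(0.0, 1.0)
        for d in (0, 1):
            # Class-specific ability draw, independent of which class the father patents in.
            alpha = general_ability + rng.gauss(0.0, 1.0) - threshold
            exposed = 1 if d == 0 else 0   # exposure only in the father's own class
            if alpha + beta * exposed > 0:
                hits[d] += 1
    return hits[0] / n_children, hits[1] / n_children


rate_d0_no_effect, rate_d1_no_effect = simulate(beta=0.0)
rate_d0, rate_d1 = simulate(beta=1.0)

print(f"beta = 0: d=0 rate {rate_d0_no_effect:.4f}, d=1 rate {rate_d1_no_effect:.4f}")
print(f"beta = 1: d=0 rate {rate_d0:.4f}, d=1 rate {rate_d1:.4f}")
```

In this toy model, the two rates are close when beta is zero, and a jump at d=0 appears only when exposure has a causal effect, which is the pattern the research design exploits.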
Interpreting the difference in innovation rates between technology
class d=0 and d=1 as purely driven by exposure, we
infer that having a parent who is an inventor in a given technol-
ogy class increases a child’s probability of inventing in that class
by at least a factor of five. This result suggests that exposure
plays a substantial role in determining children’s propensities to
innovate.25
Although this result is useful in establishing that exposure
matters, replicating the level of exposure one obtains through
one’s parents is likely to be challenging from a policy perspec-
tive. Moreover, parents are only one of many potential sources
through which children may acquire knowledge about careers
in innovation. We therefore turn to two broader sources of
exposure outside one’s immediate family: parents’ coworkers and
residential neighbors.
IV.B. Parents’ Coworkers
In this subsection, we examine how exposure to innovation
through parents’ coworkers affects a child’s propensity to become
24. Equation (1) is a convenient way to conceptualize our research design, but
we cannot literally take the limit as $d \to 0$ because of the discreteness of technology
classes. In practice, we effectively assume that $\mathrm{Cov}(\alpha_{i,c} - \alpha_{i,c+1},\; e_{i,c} - e_{i,c+1}) = 0$,
that is, that a child’s ability to invent in a technology class does not covary with
parental exposure across two adjacent classes.
25. More precisely, this research design demonstrates that parental exposure
influences the technology class in which a child innovates. Although this finding
supports the view that children whose parents are inventors are more likely to
invent themselves because of exposure effects, one may be concerned that exposure
affects only the type of innovation a child pursues and not whether or not the child
invents at all. We address this possibility using an alternative research design in
Section IV.D.
an inventor. To do so, we assign each father in our intergenera-
tional sample an industry based on the six-digit NAICS code of
his most frequent employer between 1999 and 2012.26 We then measure
the patent rate among workers in the father’s industry, whom we
call the father’s “coworkers,” as the average number of patents
issued to individuals in that industry per year (between 1996 and
2012) divided by the average number of workers in
that industry per year based on counts of W-2 forms in the tax
data. To ensure that we do not capture the effects of parental
exposure itself, we drop children whose own parents were inven-
tors during our sample period throughout the remainder of this
section.27
In column (1) of Table III, we regress the fraction of children
who become inventors among those with fathers in a given indus-
try on patent rates for workers in that industry. This regression
has one observation for each of the 345 industries and is weighted
by the number of fathers in each industry.28 The estimate of 0.250
(std. err. = 0.028) implies that a 1 percentage point increase in the
patent rate among a father’s coworkers is associated with a 0.25
percentage point increase in the probability that a child becomes
an inventor. This estimate implies that a 1 standard deviation
(0.24 percentage point) increase in the fraction of inventors in
the father’s industry is associated with a 25.3% (0.059 percentage
point) increase in children’s innovation rates.
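The equivalence stated in footnote 28, between the industry-level weighted regression and a child-level regression, can be checked numerically. The sketch below uses simulated toy data: the industry sizes, patent rates, and outcome probabilities are hypothetical, not the paper's. The helper `ols_slope` is written only for this illustration. It compares point estimates only and does not reproduce the clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 industries with different sizes and patent rates.
n_fathers = np.array([200, 500, 1000, 300, 800])               # children per industry
industry_rate = np.array([0.001, 0.004, 0.002, 0.010, 0.006])  # coworkers' patent rate

# Child-level data: one row per child; the regressor is constant within industry.
industry_id = np.repeat(np.arange(n_fathers.size), n_fathers)
x_child = industry_rate[industry_id]
# Toy outcome probabilities, inflated so that the simulated data are not all zeros.
y_child = (rng.random(x_child.size) < 0.05 + 2.0 * x_child).astype(float)


def ols_slope(x, y, w=None):
    """Slope from a (weighted) least-squares regression of y on x with an intercept."""
    w = np.ones_like(x, dtype=float) if w is None else np.asarray(w, dtype=float)
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)


# (a) Child-level regression of the inventor indicator on the industry rate.
slope_child = ols_slope(x_child, y_child)

# (b) Industry-level regression of the inventor share on the industry rate,
#     weighted by the number of children (fathers) in each industry.
share = np.bincount(industry_id, weights=y_child) / n_fathers
slope_industry = ols_slope(industry_rate, share, w=n_fathers)

print(slope_child, slope_industry)
```

The two slopes coincide because the regressor does not vary within an industry, so only the industry means of the outcome, weighted by cell size, enter the estimate.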
The association in column (1) of Table III could reflect either
the causal effect of exposure to innovation through a parent’s
coworkers or a correlation with other unobservables, such as a
child’s intrinsic ability to innovate. As before, we isolate exposure
effects by testing whether children are more likely to innovate in
the same technology classes as their parents’ coworkers. Using the
26. For individuals receiving W-2s from multiple firms in a given year, we
define the employer in that year to be the firm that issued the W-2 with the
highest salary. We exclude fathers working in industries with fewer than 50,000
individuals (5% of fathers), as patent rates are measured imprecisely for these
industries.
27. To ensure that the findings are not driven by mechanical copatenting with
parents’ coworkers, we have verified that restricting the sample to children who
have sole-authored patents yields very similar results.
28. This regression is equivalent to regressing an indicator for whether a child
is an inventor on the rate of innovation in his or her father’s industry in a data
set with one observation per child, clustering standard errors by industry, because
the innovation rate (the right-hand-side variable) does not vary within industries.
same measure of distance $d$ between technology classes defined in
Section IV.A, we estimate OLS regressions of the form:
$$y_{cj} = \kappa_c + b_d P_{c+d,j} + \varepsilon_{cj}, \tag{2}$$
where $y_{cj}$ denotes the patent rate in technology class $c$ of children
with fathers who work in industry $j$, $\kappa_c$ represents a class-specific
intercept, and $P_{c+d,j}$ denotes the patent rate in the class $c+d$
among workers in industry $j$. We estimate these regressions at
the industry-by-technology-class level, weighting by the number
of children with fathers in each industry. We include class fixed
effects ($\kappa_c$) to account for the variation in size across classes and
identify $b_d$ from variation across industries in class-specific patent
rates.
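As a rough illustration of how a regression like equation (2) can be estimated with class fixed effects and industry weights, the sketch below simulates toy data for the d = 0 case and recovers the slope by demeaning within each class. All inputs are hypothetical: the array sizes, `true_b0`, and the noise level are not taken from the paper. The helper `within_class_slope` is an illustrative name, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_industries = 4, 30

# Hypothetical toy inputs (not the paper's data).
kappa = rng.uniform(0.0, 0.01, size=n_classes)               # class intercepts kappa_c
P = rng.uniform(0.0, 0.02, size=(n_classes, n_industries))   # coworkers' rate P_{c,j}
weights = rng.integers(100, 1000, size=n_industries).astype(float)  # children per industry
true_b0 = 0.08                                               # assumed slope for the simulation
y = kappa[:, None] + true_b0 * P + rng.normal(0.0, 1e-4, size=P.shape)


def within_class_slope(y, P, weights):
    """Weighted OLS slope of y on P after absorbing class fixed effects.

    Rows index technology classes and columns index industries. Subtracting
    the weighted class mean from each row removes the class intercept.
    """
    y_dm = y - np.average(y, axis=1, weights=weights)[:, None]
    P_dm = P - np.average(P, axis=1, weights=weights)[:, None]
    w = np.broadcast_to(weights, y.shape)
    return np.sum(w * P_dm * y_dm) / np.sum(w * P_dm ** 2)


b0_hat = within_class_slope(y, P, weights)
print(round(b0_hat, 3))
```

For d other than 0, the same function could be applied after replacing `P` with the patent rate in class c + d (or an average over a band of nearby classes), which is how the separate bars in Figure VII, Panel B are described in the text.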
Figure VII, Panel B plots estimates from regressions analogous
to equation (2). Each bar plots estimates of $b_d$ from a separate
regression, varying the distance $d$ used to define workers’ patent
rates $P_{c+d,j}$ in equation (2). The first bar plots $b_0$, the relationship
between children’s patent rates in a given class and their fathers’
coworkers’ patent rates in the same class ($d = 0$). In the second
bar, we define $P_{c+d,j}$ as the mean patent rate in the father’s industry
in the next 10 closest classes ($d = 1$ to 10). The third bar
uses the average patent rate in classes with $d = 11$ to 20, and so
on. The coefficient $b_d$ on parents’ coworkers’ patent rates drops by
85% from the same class ($d = 0$) to the next closest classes ($p < .01$).
That is, children are much more likely to patent in the same
class as their parents’ coworkers than in very similar classes. This
result implies that an increase in parents’ coworkers’ patent rates
causes an increase in a child’s propensity to innovate under the
following identification assumption:
$$\lim_{d \to 0} \operatorname{Cov}\left(\varepsilon_{c,j} - \varepsilon_{c+d,j},\; P_{c,j} - P_{c+d,j}\right) = 0. \tag{3}$$
This assumption, which is analogous to equation (1), requires that
as the distance $d$ between technology classes grows small, differences
in unobservable determinants of children’s innovation
rates in class $c$ versus $c+d$ are orthogonal to differences in parents’
coworkers’ innovation rates in those classes. Intuitively, we
require that children whose fathers work in an industry where
many workers patent in amplifiers rather than antennas do not
have greater intrinsic ability to invent in amplifiers relative to
antennas themselves. Under this assumption, we can infer from
Figure VII, Panel B that a 1 percentage point increase in patent
rates among parental coworkers in a given class increases a child’s
probability of inventing in that class by $b_0 - b_{1\text{-}10} = 0.065$ percentage
points (83%).
Our measure of distance between technology classes based on
copatenting rates is one of many potential approaches to identify-
ing “similar” patent classes. To assess the sensitivity of our results
to this choice, we use the Hall, Jaffe, and Trajtenberg (2001) hier-
archical classification system, which groups patents into similar
fields (categories, subcategories, and classes), as an alternative
way to identify similar patent classes. In Table III, columns (2)–
(5), we estimate a series of regressions to assess whether children
patent in the same fields as workers in their father’s industry
using the USPTO’s classification system. In column (2), we test
whether children are more likely to invent in the same categories
as their father’s coworkers using a regression specification anal-
ogous to equation (2) estimated at the category by industry level
with d=0. Columns (3) and (4) replicate the specification in col-
umn (2) at the subcategory and technology class levels. Finally, in
column (5), we replicate column (4) with three additional controls:
patent rates in (i) the same subcategory but in a different patent
class, (ii) the same category but a different subcategory, and (iii)
other categories.
At all levels of the hierarchy, we find a strong, statistically
significant association between children’s patent rates and their
parents’ coworkers’ patent rates. Moreover, column (5) shows that
innovation among parents’ coworkers leads to a 10 times larger
increase in innovation in the same technology class (e.g., synthetic
resins) as it does in other classes even within the same subcate-
gory (e.g., natural resins). The coefficient on the own-class patent
rate is not statistically different from the specification in column
(4), while the coefficients on the other-class and category patent
rates are very close to 0. Under our identification assumption (3),
the much smaller estimates for other classes imply that children’s
propensity to invent in the same class as their parents’ coworkers
is driven by the causal effect of exposure.
The class-specificity of the exposure effects also sheds light on
the mechanism through which exposure matters. Transmission of
general human capital or an interest in science would be unlikely
to have impacts that vary so sharply by technology class. Instead,
the data point to mechanisms such as transmission of specific
human capital, access to networks that help children pursue a
certain subfield, acquisition of information about certain careers,
or role model effects.
IV.C. Neighborhoods
In this subsection, we study how rates of innovation in the
neighborhood in which a child grows up affect his or her propensity
to innovate. Following Chetty et al. (2014), we assign children in
our intergenerational sample to CZs based on where they were
first claimed as dependents by their parents.
Figure VIII, Panel A maps rates of innovation across the CZs
where children grew up, with darker colors representing areas
where more children become inventors. Figure VIII, Panel B lists
the 10 CZs where children are the most or least likely to grow up to
become inventors (among the 100 most populated CZs). Children
who grow up in the Northeast, coastal California, and the rural
Midwest have the highest probabilities of becoming inventors,
whereas those in the Southeast have the lowest probability. The
areas where children grow up to become inventors tend to have
higher mean incomes (population-weighted correlation ρ=0.63),
fewer single parents (ρ=−0.39), and higher levels of absolute
upward intergenerational mobility (ρ=0.32), based on the CZ-
level measures defined in Chetty et al. (2014). However, there are
some stark exceptions to these patterns, such as Detroit, where
children have one of the highest likelihoods of becoming inventors
but where income mobility and mean incomes are relatively low.
The spatial analysis in Figure VIII differs from previous anal-
yses of “innovation clusters” and agglomeration (e.g., Porter and
Stern 2001; Kim and Marschke 2005) because it reflects the locations
where inventors grow up, which may differ from where they
work as adults. Nevertheless, children who grow up in the areas
where the most innovation occurs tend to be most likely to go into
innovation themselves. For instance, children who grow up in the
San Jose CZ, which includes Silicon Valley, top the list in terms of
the probability of becoming inventors. To examine this relation-
ship more systematically, we define the patent rate of workers in
each CZ as the average number of patents issued per year (in the
full USPTO data) to individuals from a given CZ between 1980
and 1990 divided by the CZ’s population between the ages of 15
and 64 in the 1990 census. Figure IX presents a scatter plot of
the fraction of children who go on to become inventors versus the
patent rate of workers in their childhood CZ (their “neighbors”)
FIGURE VIII
The Origins of Inventors: Patent Rates by Childhood Commuting Zone
Panel A maps the share of children who become inventors by the commuting
zone (CZ) in which they grew up using our intergenerational sample (United States
citizens in the 1980–1984 birth cohorts) (color version available online). Each child
is assigned a CZ based on the ZIP code from which their parents filed their 1040
tax return in the year they were first claimed as dependents (which is typically
1996, as our data begin in 1996). The map is constructed by dividing the CZs into
unweighted deciles based on patent rates, with darker shades representing areas
where more children grow up to become inventors. Data for CZs with fewer than
1,000 children, which account for 0.3% of the children in the sample, are omitted.
Panel B lists the CZs with the 10 highest and lowest shares of inventors per 1,000
children among the 100 CZs with the largest populations in the 2000 census.
[Figure IX: scatter plot. Vertical axis: Num. of Inventors per 1000 Children who Grow Up in CZ. Horizontal axis: Annual Patent Rate per Thousand Working Age Adults in CZ. Labeled CZs: Newark, Houston, Minneapolis, San Jose, Brownsville, Portland, Madison.]
FIGURE IX
Children’s Patent Rates versus Patent Rates of Workers in their Childhood CZ
The figure plots the patent rates of children who grow up in a given CZ (con-
structed as in Figure VIII) versus the patent rates of workers who live in that CZ.
Patent rates of workers in each CZ are defined as the average number of patents
per year issued to inventors residing in that CZ between 1980 and 1990 (based
on the universe of USPTO data) divided by the CZ’s population between the ages
of 15 and 64 in the 1990 census. We restrict the figure to the 100 CZs with the
largest populations in the 2000 census. The solid best-fit line is estimated using
an unweighted OLS regression on these 100 observations (slope = 4.22, standard
error = 0.40).
among the 100 most populated CZs. There is a clear positive rela-
tionship between these variables, with a correlation of 0.75.
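The CZ-level patent rate defined above is a simple ratio, sketched below for a single made-up commuting zone. The patent counts and population are hypothetical, and the sketch assumes that "between 1980 and 1990" covers the 11 calendar years 1980 through 1990 inclusive, which the text does not state explicitly.

```python
import numpy as np


def annual_patent_rate(patents_by_year, working_age_pop):
    """Average number of patents per year divided by the population aged 15-64."""
    return np.mean(patents_by_year) / working_age_pop


# Hypothetical CZ: patents issued in each year 1980..1990 (assumed inclusive)
# and a made-up 1990 census population between the ages of 15 and 64.
patents_1980_1990 = [40, 42, 38, 45, 50, 47, 52, 55, 53, 60, 58]
rate = annual_patent_rate(patents_1980_1990, working_age_pop=250_000)

# Figure IX reports this rate per thousand working-age adults.
rate_per_thousand = 1000 * rate
print(f"{rate_per_thousand:.3f} patents per year per 1,000 working-age adults")
```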
The correlation in Figure IX is consistent with the hypothesis
that exposure to innovation during childhood through one’s neigh-
bors increases a child’s propensity to innovate, but it could also
reflect geographical sorting. We isolate the causal effect of expo-
sure by estimating the extent to which children invent in the same
narrow technology classes as their neighbors, as in our analysis
of industry-level differences above. Figure VII, Panel C replicates
Figure VII, Panel B, plotting coefficients from regressions of children’s
innovation rates in a given technology class c on class-level
patent rates of workers in their childhood CZs versus the distance
between technology classes. The coefficient on neighbors’ patent
rates drops by 85% from the same class (d=0) to the next closest
classes (p<.01), implying that neighborhoods have substantial
causal exposure effects on the class in which a child innovates
under an identification assumption analogous to assumption (3).
In Table IV, we evaluate the robustness of this result and
the mechanisms underlying it using a set of fixed effects regres-
sion specifications. As a reference, in column (1), we regress the
fraction of children who grow up to be inventors in each CZ on
the patent rate of workers in their childhood CZ, replicating the
analysis in Figure IX including all 741 CZs rather than just the
100 largest ones. The coefficient of 2.9 implies that a 1 standard
deviation (0.02 percentage point) increase in the annual CZ-level
patent rate is associated with a 0.058 percentage point (28.5%)
increase in the fraction of children who become inventors.
One potential explanation for the result in column (1) (and
Figure VII, Panel C) is that children tend to stay near the areas
where they grew up, and may mechanically end up being more
likely to patent if they live in an area like Silicon Valley simply
because the jobs that are available in such areas tend to be in the
innovation sector. To distinguish this supply of jobs mechanism
from childhood exposure effects that change the careers children
choose to pursue, we focus on the subset of children who move to
a different CZ in adulthood from where they grew up. In column
(2), we estimate a regression analogous to that in column (1) at
the childhood CZ by current CZ level, limiting the sample to chil-
dren whose current (2012) CZ differs from their childhood CZ. We
regress the fraction of children who grow up to be inventors in
these cells on the patent rate of the CZ in which they grew up, in-
cluding fixed effects for the child’s 2012 CZ so that the coefficient
of interest is identified purely from comparisons across individu-
als who grew up in different areas but currently live in the same
area. The coefficient on the patent rate in the childhood CZ is only
slightly lower at 2.6 in this specification (compared to 2.9 in col-
umn (1)), showing that most of the relationship in column (1) is
not mechanically driven by the types of jobs available in an area.
In the remaining columns of Table IV, we use the hierarchi-
cal patent classification system to identify similar patent classes
instead of the distance metric used in Figure VII. In columns
(3)–(5), we analyze whether the result in column (1) continues
to hold at the category level: do children go on to patent in the
same categories as their neighbors did while they were growing
TABLE IV
NEIGHBORHOOD EXPOSURE EFFECTS: CHILDREN’S INNOVATION RATES VERSUS PATENT RATES IN CHILDHOOD COMMUTING ZONE

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| Dependent variable | Fraction inventing | Fraction inventing | Fraction inventing in patent category | Fraction inventing in patent category | Fraction inventing in patent category | Fraction inventing in patent subcat. | Fraction inventing in patent class | Fraction inventing in patent class |
| Patent rate in childhood CZ | 2.932 (0.417) | 2.578 (0.531) | | | | | | |
| Patent rate in same category in childhood CZ | | | 1.759 (0.404) | 1.114 (0.341) | 1.722 (0.406) | | | |
| Patent rate in same subcategory in childhood CZ | | | | | | 1.526 (0.375) | | |
| Patent rate in same technology class in childhood CZ | | | | | | | 1.108 (0.181) | 1.017 (0.162) |
| Patent rate in same subcategory, but different technology class in childhood CZ | | | | | | | | 0.0003 (0.0063) |
| Patent rate in same category, but different subcategory in childhood CZ | | | | | | | | 0.0015 (0.0028) |
| Patent rate in different category of childhood CZ | | | | | | | | 0.0054 (0.0006) |
| Fixed effects | None | Current CZ | Category | Current CZ by category | Father’s NAICS by category | Subcategory | Class | Class |
| Unit of observation | Childhood CZ | Childhood CZ by current CZ | Childhood CZ by category | Childhood CZ by current CZ by category | Childhood CZ by father’s NAICS by category | Childhood CZ by subcategory | Childhood CZ by patent class | Childhood CZ by patent class |
| Number of cells | 741 | 221,621 | 5,187 | 1,551,347 | 1,637,706 | 27,417 | 329,745 | 329,745 |
| Mean of dep. var. | 0.002019 | 0.003692 | 0.000289 | 0.000527 | 0.000336 | 0.000055 | 0.000005 | 0.000005 |
| Std. dev. of dep. var. | 0.000905 | 0.010896 | 0.000240 | 0.003908 | 0.002477 | 0.000102 | 0.000017 | 0.000017 |
| Mean of indep. var. | 0.000286 | 0.000273 | 0.000041 | 0.000039 | 0.000042 | 0.000008 | 0.000001 | |
| Std. dev. of indep. var. | 0.000196 | 0.000204 | 0.000046 | 0.000048 | 0.000046 | 0.000013 | 0.000002 | |

Standard errors are reported in parentheses next to each coefficient.

Notes. This table analyzes how a child’s propensity to invent is related to patent rates in his or her childhood commuting zone. The sample consists of children in the
intergenerational sample (1980–1984 birth cohorts) whose parents are not inventors. Each child is assigned a childhood CZ based on the ZIP code from which their parents first
claimed them as dependents. Each column presents estimates from a separate OLS regression, with standard errors clustered by CZ in parentheses. In column (1), we regress
the share of children who become inventors among those who grow up in CZ j on the patent rate among workers in CZ j, with one observation per CZ. We measure the patent
rate among workers in each CZ as the average number of patents issued per year (in the full USPTO data) to individuals in a given CZ between 1980 and 1990 divided by the
CZ’s population between the ages of 15 and 64 in the 1990 census. Column (2) is run at the childhood-CZ-by-current-CZ level, limiting the sample to children whose current (2012)
CZ differs from their childhood CZ. Here, we regress the share of inventors in each cell on the patent rate in the childhood CZ and on fixed effects for the 2012 CZ, so that the
coefficient on childhood CZ patent rates is identified from comparisons across individuals currently living in the same CZ. Column (3) is run at the childhood-CZ-by-patent-category
level. We regress the share of children from CZ j who invent in patent category c on the share of workers in CZ j who have patents in category c. We include patent category
fixed effects in this regression to account for differences in patent rates across categories. Column (4) replicates column (2) at the category level, limiting the sample to children
who move and estimating the model at the childhood-CZ-by-current-CZ-by-category level, with current-CZ-by-category fixed effects. In column (5), we include all children and
replace the CZ-by-category fixed effects with fixed effects for the father’s industry-by-category, estimating the model at the childhood-CZ-by-father’s-industry-by-category level.
This specification isolates variation from one’s neighbors that is orthogonal to the variation from parents’ colleagues. Columns (6) and (7) are analogous to column (3) but use more
narrowly defined categorizations of patent types: patent subcategories and patent classes. Column (8) replicates column (7) with three additional controls: the fraction of inventors
in (i) the same subcategory but in a different patent class, (ii) the same category but a different subcategory, and (iii) other categories. All regressions are weighted by the number
of children in each cell. There are approximately 15.5 million children underlying the regressions in columns (1), (3), (6), (7), and (8). Columns (2) and (4) are based on the subset
of 5.4 million individuals who moved across CZs. Column (5) includes the 10.2 million children whose fathers have nonmissing NAICS codes.
up? We consider three different specifications. In column (3), we
replicate the specification in column (1) at the CZ by patent cat-
egory level, using the same specification as in equation (2) when
d = 0, but letting j index CZs instead of industries. In column (4),
we replicate the specification in column (2) at the category level.
We restrict attention to movers and regress the share patenting
in a given category (with one observation per childhood CZ, cur-
rent CZ, and category) on the childhood CZ patent rate in that
category. We include current-CZ-by-category fixed effects in this
specification. In column (5), we include all children and replace
the CZ-by-category fixed effects with fixed effects for the father’s
industry-by-category, estimating the model at the childhood-CZ-
by-father’s-industry-by-category level. This specification isolates
variation from one’s neighbors that is orthogonal to the variation
from parents’ coworkers examined above in Table III.
In all three of these specifications in Table IV, we find robust
and significant positive relationships between children’s category-
level innovation rates and the corresponding category-level patent
rates of workers in their childhood CZ. Intuitively, these specifica-
tions effectively show that children who grow up in Silicon Valley
are especially likely to patent in computers, while children who
grow up in Minneapolis (which has many medical device manu-
facturers) are especially likely to patent in medical devices. This is
true even among children who live in the same place in adulthood
and whose parents work in the same industry.
In Table IV, columns (6) and (7), we replicate the specification
in column (3) at the subcategory and technology class levels,
respectively. We continue to find substantial positive coefficients
in these specifications, confirming the result in Figure VII, Panel
C that children tend to invent in the same technology classes
that those around them did during their childhood. Column (8)
replicates the specification in column (7) including controls for
patent rates in other classes, subcategories, and categories, as in
column (5) of Table III. The coefficient on the own class patenting
rate is not statistically different from the specification in column
(7), while the coefficients on the other-class and category patent
rates are close to 0. Under our identification assumption, the co-
efficient of 1.02 in column (8) implies that a 1 standard deviation
(0.0002 percentage point) increase in the annual CZ-level patent
rate in a given technology class causes a 0.0002 percentage point
(43%) increase in the fraction of children who become inventors
in the same class.
TABLE V
GENDER-SPECIFIC EXPOSURE EFFECTS: CHILDREN’S INNOVATION RATES VERSUS INNOVATION RATES BY GENDER IN CHILDHOOD CZ

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Dependent variable | Fraction inventing | Fraction of women inventing | Fraction of men inventing | Fraction of women inventing in patent category | Fraction of men inventing in patent category |
| Innovation rate in childhood CZ | 0.986 (0.145) | | | | |
| Innovation rate of women in childhood CZ | | 2.408 (1.265) | 0.356 (4.398) | 2.232 (0.607) | 2.157 (1.300) |
| Innovation rate of men in childhood CZ | | 0.174 (0.154) | 1.784 (0.625) | 0.102 (0.062) | 1.693 (0.295) |
| Fixed effects | None | None | None | Category | Category |
| Unit of observation | Childhood CZ | Childhood CZ | Childhood CZ | Childhood CZ by category | Childhood CZ by category |
| Number of cells | 741 | 741 | 741 | 5,188 | 5,188 |
| p-value from F-test for equality of coefficients | | .113 | .667 | .001 | .015 |
| Mean of dep. var. | 0.002020 | 0.000745 | 0.003282 | 0.000102 | 0.000453 |
| Std. dev. of dep. var. | 0.000905 | 0.000396 | 0.001487 | 0.000117 | 0.000433 |
| Mean of indep. var. | 0.000628 | | | | |
| Std. dev. of indep. var. | 0.000631 | | | | |
| Mean of innov. rate (women) | | 0.000061 | 0.000060 | 0.000008 | 0.000008 |
| Std. dev. of innov. rate (women) | | 0.000066 | 0.000066 | 0.000017 | 0.000017 |
| Mean of innov. rate (men) | | 0.000568 | 0.000567 | 0.000080 | 0.000080 |
| Std. dev. of innov. rate (men) | | 0.000569 | 0.000568 | 0.000139 | 0.000139 |

Standard errors are reported in parentheses next to each coefficient.

Notes. This table analyzes how a child’s propensity to invent is related to the innovation rates of adults of the same gender in his or her childhood commuting zone (CZ). The
sample consists of children in the intergenerational sample (1980–1984 birth cohorts) whose parents are not inventors. Each column presents estimates from a separate OLS
regression, with standard errors clustered by CZ in parentheses. Column (1) replicates the specification in column (1) of Table IV, except that here we define the independent
variable using the linked patent-tax data rather than just the patent data, since we do not observe gender in the patent data itself. Specifically, we define the innovation rate for
workers in CZ j as the total number of patent applications filed by individuals born before 1980 in our full inventors sample divided by the number of individuals between ages 15
and 64 in CZ j in the 1990 census. We convert this measure to an annual rate by dividing by 17, as we observe patents between 1996–2012. In column (2), we regress the fraction
of girls from CZ j who become inventors on the patent rates of female and male workers in CZ j. Column (3) replicates column (2) using the share of boys who become inventors
as the dependent variable. The regression in column (4) is run at the childhood-CZ-by-patent-category level. Here, we regress the share of girls from CZ j who invent in patent
category c on the share of male and female workers in CZ j who have patents in category c. We include patent category fixed effects in this regression to account for differences
in patent rates across categories. Column (5) replicates column (4) using the share of boys who become inventors as the dependent variable. All regressions are weighted by the
number of children in each cell. The last row of the table reports p-values from F-tests for equality of the coefficients on male and female innovation rates in each regression. There
are 15,499,290 individuals underlying each of the regressions.
FIGURE X
Geographical Variation in Gender Gaps in Patent Rates
Panel A maps the percentage of female inventors by the state in which they grew
up using our intergenerational sample (United States citizens in the 1980–1984
birth cohorts) (color version available online). Each child is assigned a state based
on the ZIP code from which their parents filed their 1040 tax return in the year they
were first claimed as dependents (which is typically 1996, as our data begin in
1996). The map is constructed by dividing the states into unweighted quintiles
based on the female inventor share, with darker shades representing areas where
women account for a larger share of inventors. Panel B lists the commuting zones
(CZs) with the 10 highest and lowest female inventor shares among the 100 CZs
with the largest populations in the 2000 census.
1. Gender-Specific Exposure Effects. Next we examine the
heterogeneity of exposure effects by gender, focusing specifically
on whether girls are more likely to go into innovation if they are
exposed to female inventors as children. As a first step, Figure X
shows how gender gaps in innovation vary across the areas in
which children grow up using our intergenerational analysis sam-
ple. Panel A maps the fraction of female inventors by the state in
which inventors grew up, and Panel B shows this statistic for the
top 10 and bottom 10 CZs among the 100 largest CZs.29 Although
no state comes close to gender parity, there is significant variation
in the magnitude of the gender gap: 28.7% of children who grow
up to become inventors in Rhode Island are women, as compared
with 11.3% in Idaho.30
To test whether gender-specific differences in exposure to innovation
lead to the differences in gender gaps in Figure X, we
estimate gender-specific patent rates for workers in each CZ. We
do so using our linked patent-tax sample instead of all patents in
the USPTO data as above because gender is not observed in the
USPTO data.31 As a benchmark, column (1) of Table V replicates
the specification in column (1) of Table IV using this alternative
measure of the CZ-level innovation rate. The raw magnitude of
the coefficient differs because the tax-data-based innovation rate
is scaled differently from the USPTO-based measure. However, a
1 standard deviation increase in the CZ innovation rate is associ-
ated with a 30.8% increase in children’s propensities to innovate,
very similar to the 28.5% estimate obtained above in column (1)
of Table IV.
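The comparison of the two standardized effects can be reproduced from the reported table entries. The sketch below multiplies each column (1) coefficient by the standard deviation of its regressor and divides by the mean of the dependent variable, using the rounded values printed in Tables IV and V. The helper name `standardized_effect` is an illustrative choice, and small discrepancies from the text could arise from rounding in the tables.

```python
def standardized_effect(coef, sd_x, mean_y):
    """Effect of a 1 SD increase in x: (absolute change in y, change relative to mean of y)."""
    delta = coef * sd_x
    return delta, delta / mean_y


# Table IV, column (1): USPTO-based CZ patent rate.
delta_iv, rel_iv = standardized_effect(coef=2.932, sd_x=0.000196, mean_y=0.002019)

# Table V, column (1): tax-data-based CZ innovation rate.
delta_v, rel_v = standardized_effect(coef=0.986, sd_x=0.000631, mean_y=0.002020)

print(f"Table IV col (1): {100 * rel_iv:.1f}% of the mean")  # 28.5%
print(f"Table V  col (1): {100 * rel_v:.1f}% of the mean")   # 30.8%
```

The raw coefficients differ by roughly a factor of three, but the regressors' standard deviations differ in the opposite direction, so the implied effects of a 1 standard deviation change are close.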
In Table V, column (2) we regress the fraction of women who
go on to patent in each CZ on the innovation rates for women
and men in that CZ. The coefficient on female innovation rate is
significant and positive, and the coefficient on the male innovation
29. We present this map at the state level because gender-specific patent rates
are noisy in small CZs due to the small number of female inventors.
30. The gender gap is generally smaller in states that score higher on Pope
and Sydnor (2010)’s gender stereotype adherence index on standardized tests in
8th grade, which measures the extent to which children in a state adhere to the
stereotype that boys are better at math and science while girls are better at English
(population-weighted correlation = 0.21; Online Appendix Figure IV).
31. Specifically, we define the innovation rate for gender $g$ in CZ $j$ as the total
number of patent applications filed by individuals of gender $g$ born before 1980
in our full inventors sample divided by the number of individuals between ages
15 and 64 of gender $g$ in CZ $j$ in the 1990 census. We convert this measure to an
annual rate by dividing by 17, as we observe patent applications between 1996 and
2012. We restrict attention to inventors born before 1980 to avoid overlap with the
intergenerational analysis sample we use to study outcomes. Pooling genders, the
population-weighted correlation across CZs between this measure of innovation
rates and the USPTO-based measure used above is 0.65.
rate is small and statistically insignificant. Symmetrically, column
(3) shows that male innovation rates are more predictive of boys’
propensities to become inventors than female innovation rates.32
These estimates imply that if girls were as exposed to female
inventors in their childhood CZs as boys are to male inventors,
female innovation rates would rise by 164% and the gender gap
in innovation would fall by 55%.33
One potential concern with the analysis in columns (2) and
(3) is that women may have particularly strong tastes or abilities
to innovate in certain fields (e.g., biology). This could generate
the gender-specific associations in columns (2) and (3) even in the
absence of exposure effects if children live in the same areas as
their parents and the types of jobs (e.g., biology versus informa-
tion technology) vary across places. Columns (4) and (5) evaluate
this concern by examining variation in innovation rates across
patent categories, using a specification with one observation per
CZ by patent category with category fixed effects, as in column (3)
of Table IV. We find very similar patterns in these specifications:
women are more likely to innovate in a particular category if there
were more women innovating in that category in the area where
they grew up. We reject the null hypothesis that the coefficients
are the same for both genders with p<.02 in both of these speci-
fications, implying that the findings in columns (2) and (3) are not
due to selection across categories.
32. We find similar patterns at the individual level—daughters are more likely
to become inventors if their mothers are inventors, while sons are more likely to
become inventors if their fathers are inventors—but the coefficients are impre-
cisely estimated because there are so few female inventors among parents in our
intergenerational sample.
33. We estimate the counterfactual innovation rate for girls by adding to the
current innovation rate for girls the difference in exposure to own-sex inventors for
boys versus girls multiplied by the coefficient of 2.408 in column (2). To calculate
the difference in the gender gap, we similarly use the estimates of the effect of
exposure to adult female inventors on both boys and girls (columns (2) and (3)) to
predict how the patenting rates of both genders would change if exposure to fe-
male inventors were as high as it is to male inventors. Naturally, these estimates
should be interpreted with caution as they rely on out-of-sample linear extrap-
olations. We defer quantification of the extent to which exposure explains gaps
in innovation by parental income and race to future work, as we lack analogous
measures of exposure along these dimensions because we only observe race in
the NYC school district sample and there are very few inventors who come from
low-income families.
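The linear extrapolation described in footnote 33 can be sketched in Python as follows. Only the coefficient 2.408 comes from the text; the current rate and the two exposure levels are made-up placeholders, so the printed percentage is not meant to reproduce the 164% figure.

```python
# Coefficient on the female innovation rate, Table V column (2), as quoted in footnote 33.
COEF_FEMALE_EXPOSURE = 2.408


def counterfactual_rate(current_rate, exposure_boys, exposure_girls,
                        coef=COEF_FEMALE_EXPOSURE):
    """Current rate plus the coefficient times the gap in own-sex exposure."""
    exposure_gap = exposure_boys - exposure_girls
    return current_rate + coef * exposure_gap


# Placeholder inputs (hypothetical, not from the paper).
current = 0.0005          # share of girls who become inventors
boys_exposure = 0.0004    # male innovation rate boys are exposed to
girls_exposure = 0.0001   # female innovation rate girls are exposed to

cf = counterfactual_rate(current, boys_exposure, girls_exposure)
pct_increase = 100 * (cf - current) / current
print(f"counterfactual rate = {cf:.7f}, increase = {pct_increase:.1f}%")
```

Because the calculation multiplies a regression coefficient by an exposure gap that lies far outside the observed range for women, it inherits the caveat about out-of-sample linear extrapolation noted in the footnote.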
In sum, Table V further supports the hypothesis that expo-
sure to innovation in childhood through one’s neighbors has a
causal effect on children’s propensities to pursue innovation by
providing an additional overidentification test of that hypothesis.
In particular, the results in Table V imply that any confounding
variable would have to vary not just across technology classes but
also in a gender-specific manner. Moreover, these findings suggest
that the differences in rates of innovation across areas where chil-
dren grow up are unlikely to be driven purely by factors such as
schools or segregation emphasized in prior work on neighborhood
effects, as such factors would be unlikely to generate effects that
vary so sharply by gender and technology class.
IV.D. Neighborhood Effects on the Level of Innovation
The technology class–level results in the preceding subsec-
tions show that exposure affects the type of innovation one pur-
sues, but they do not necessarily imply that exposure matters
for whether one chooses to become an inventor to begin with. In
this section, we examine whether exposure also affects the level
of innovation. To do so, we study how the patent rates of chil-
dren who move across areas vary with the age at which they
move. Chetty and Hendren (2018) use this timing-of-move design
to establish that neighborhoods have causal effects on children’s
earnings. Here we use the same design to study the impacts of
neighborhoods on the fraction of children who patent in adulthood.
Intuitively, we ask: “are children who move to high-innovation ar-
eas at younger ages more likely to become inventors (in any field)
themselves?” Under the assumption that the children who make
a given move at earlier versus later ages are comparable to each
other, the answer to this question reveals the extent to which
neighborhoods have causal effects on children’s propensities to
invent.34
1. Empirical Specification. We study the outcomes of chil-
dren who move across CZs exactly once during their childhood
using our intergenerational sample, which we extend to include
birth cohorts 1980–1988 to expand the range of ages at move that
we can observe. Let $i$ index children. In the sample of one-time
34. Critically, this research design does not require that where people move is
orthogonal to their potential outcomes; it simply requires that the timing of those
moves is unrelated to potential outcomes.
movers, let $m_i$ denote the age at which child $i$ moves from origin
CZ $o$ to destination CZ $d$. Chetty and Hendren (2018) show that
neighborhoods have causal exposure effects on earnings and a va-
riety of other outcomes before age 24; we therefore focus on moves
that occur at or before age 24 in our analysis.35
As in the previous subsection, we define the patent rate
among adults in each CZ as the average number of patents issued
per year (in the full USPTO data) to individuals from a given CZ
between 1980 and 1990, divided by the CZ’s population between
the ages of 15 and 64 in the 1990 census. Let $\bar{p}_d$ and $\bar{p}_o$ denote the
patent rates in the destination and origin CZs and $\Delta_{od} = \bar{p}_d - \bar{p}_o$
denote the difference in patent rates in the destination versus
origin CZ.
After computing these variables, we regress an indicator for
whether the child becomes an inventor by 2012 ($y_i$) on the measures
of origin and destination patent rates interacted with the
child’s age at move:
(4) $y_i = a + \beta m_i \Delta_{od} + \gamma_1 \Delta_{od} + \gamma_2 X_i + \varepsilon_i,$
where $X_i$ denotes a control vector that includes age at move fixed
effects, birth cohort fixed effects, and other controls that we vary
across specifications. The key parameter of interest is $\beta$, which
captures how a child’s propensity to become an inventor varies
with the age at which he or she moves to an area with higher
patent rates.
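As an illustration of how a specification like equation (4) can be set up, the following Python sketch builds the design matrix on simulated movers and recovers the interaction coefficient by least squares. Everything here is an assumption made for illustration: the data are synthetic, the outcome is a continuous stand-in rather than the binary inventor indicator, cohort and origin fixed effects are omitted, and NumPy's `lstsq` is simply a convenient solver, not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic one-time movers; every number below is made up for illustration.
raw_age = rng.integers(9, 33, size=n)          # ages 9..32 at the time of the move
age_at_move = np.minimum(raw_age, 24)          # recode moves after age 24 to m_i = 24
delta_od = rng.normal(0.0, 0.00026, size=n)    # destination minus origin patent rate

# Assumed "true" parameters, used only to generate the fake outcome.
beta_true, gamma1_true = -0.08, 3.1
age_fe_true = 0.001 + 0.00001 * age_at_move    # hypothetical age-at-move effects
noise = rng.normal(0.0, 1e-6, size=n)
y = beta_true * age_at_move * delta_od + gamma1_true * delta_od + age_fe_true + noise

# Design matrix: m_i * Delta_od, Delta_od, and one dummy per age at move.
# The full set of age dummies absorbs the intercept a.
ages = np.arange(9, 25)
age_dummies = (age_at_move[:, None] == ages[None, :]).astype(float)
X = np.column_stack([age_at_move * delta_od, delta_od, age_dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat, gamma1_hat = coef[0], coef[1]
print(f"beta_hat = {beta_hat:.4f}, gamma1_hat = {gamma1_hat:.3f}")
```

In the sketch, a negative coefficient on the interaction term means that each additional year of age at the time of the move reduces the gain from moving to a higher-patenting CZ, which is the pattern the timing-of-move design is meant to detect.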
2. Identification Assumptions. We can interpret $\beta$ as the
causal effect of one additional year of exposure to a higher-
innovation area (i.e., an area with higher observed patent rates)
during childhood, under the assumption that the potential
outcomes of children who move to better versus worse areas do
not vary with the age at which they move. Chetty and Hendren
(2018) present a series of tests supporting this orthogonality con-
dition: controlling for unobserved heterogeneity across families
using sibling comparisons in models with family fixed effects,
35. More precisely, Chetty and Hendren (2018, figure IV) demonstrate that
children’s earnings (and other outcomes) decline linearly with age at move ($m$)
up to age 24 and are constant thereafter. Motivated by this functional form, we
include moves that occur after age 24 by defining $m_i = 24$ for such moves to
maximize the precision of our estimates. Excluding moves above age 24 yields
qualitatively similar but less precise estimates.
implementing a set of placebo tests exploiting heterogeneity
in predicted causal effects across subgroups, and validating
the results using experimental designs, for example, from the
Moving to Opportunity Experiment (Chetty, Hendren, and Katz
2016). They also show that the relationship between children’s
outcomes and age at move declines linearly up to age 23 and is
flat thereafter, justifying the linear specification in equation (4).
Furthermore, Chetty and Hendren (2018) provide evidence that
estimates of place effects among movers are externally valid to
the broader population because they find similar results among
those who self-select to move as compared with families displaced
by idiosyncratic events such as hurricanes. Building on these
results, we take the validity of the research design and empirical
specification in equation (4) as given here and apply it to identify
the causal effects of neighborhoods on patent rates.36
3. Results. Table VI reports estimates of $\beta$ for several variants
of equation (4). In column (1), we estimate equation (4) including
origin fixed effects, effectively comparing children who
start in the same CZ but move to different CZs. We obtain an estimate
of $\beta = -0.08$ ($p < .01$). This estimate implies that if a child
grows up for 20 years in a CZ with a patent rate among adults
that is one standard deviation (0.02 percentage points) above the
mean, then his or her likelihood of becoming an inventor increases by
$20 \times 0.08 \times 0.02 = 0.032$ percentage points (22%).
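The back-of-the-envelope calculation above can be written as a short Python sketch. The function name is a hypothetical label; the inputs are the magnitudes quoted in the text (the absolute value of the estimate, 20 years of exposure, and a 0.02 percentage point gap in adult patent rates).

```python
def implied_effect_pp(beta_abs: float, years_of_exposure: float, rate_gap_pp: float) -> float:
    """Percentage-point change in the probability of inventing implied by
    a per-year exposure effect, years of exposure, and a patent-rate gap."""
    return years_of_exposure * beta_abs * rate_gap_pp


# Inputs quoted in the text: |beta| = 0.08, 20 years, 1 SD = 0.02 percentage points.
effect = implied_effect_pp(beta_abs=0.08, years_of_exposure=20, rate_gap_pp=0.02)
print(round(effect, 3))  # 0.032
```

The relative effect reported in parentheses depends additionally on the baseline share of children who become inventors, which the sketch does not model.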
Columns (2) and (3) of Table VI present variants of the
specification in column (1) to assess the robustness of the
estimates. In column (2), we control for the origin patent rate
instead of including origin fixed effects. This more parsimonious
specification yields a very similar estimate of $\beta = -0.08$. In
column (3), we include interactions of the change in patent rates
and the origin patent rate with indicators for the child’s birth
cohort to account for the fact that children’s propensities to invent
36. Because patenting is a relatively rare outcome, we lack the precision to
replicate the nonparametric specifications and additional tests implemented by
Chetty and Hendren (2018); for instance, specifications that include family fixed
effects yield point estimates similar to our baseline estimates but are statistically
insignificant. However, given that Chetty and Hendren (2018) establish the valid-
ity of the design for several outcomes that are highly correlated with innovation,
such as earnings and college attendance, we believe the design is likely to be valid
for patenting as well.
TABLE VI
EXPOSURE EFFECTS ON LEVEL OF INNOVATION: ESTIMATES BASED ON MOVERS

Dependent variable: Indicator for inventing by 2014

                                                 (1)         (2)         (3)         (4)
Difference in patent rates × age at move     −0.0797     −0.0806     −0.0872
                                            (0.0170)    (0.0164)    (0.0172)
Difference in patent rates                     3.137       3.166       4.612
                                             (0.339)     (0.301)     (0.631)
Origin patent rate                                         3.123       6.369       2.044
                                                         (0.146)     (0.671)     (0.282)
Origin FE                                          ×
Cohort FE × difference in patent rates                                     ×
Cohort FE × origin patent rate                                             ×
Age at move FE                                     ×           ×           ×
Cohort FE                                          ×           ×           ×
N                                          3,637,481   3,637,481   3,637,481  28,798,471
Mean of dep. var.                            0.00139     0.00139     0.00139     0.00138
Std. dev. of dep. var.                       0.03732     0.03732     0.03732     0.03707
Mean of difference in patent rates           0.00004     0.00004     0.00004
Std. dev. of difference in patent rates      0.00026     0.00026     0.00026
Mean of origin patent rate                               0.00028     0.00028     0.00029
Std. dev. of origin patent rate                          0.00020     0.00020     0.00020
Notes. This table analyzes how a child’s propensity to innovate varies with the amount of time spent during
childhood (before age 24) in a neighborhood with a low versus high fraction of inventors among adults in
the area. The sample consists of children in an extended intergenerational sample (1980–1988 birth cohorts)
whose parents are not inventors. Each column presents estimates from a separate OLS regression run at
the individual level. The dependent variable in each regression is an indicator for whether the child is an
inventor. Columns (1)–(3) include children whose parents moved across CZs exactly once between 1996 and
2014. Children’s origin and destination CZs are coded based on the ZIP codes from which their parents filed
taxes in each year. As in Table IV, each CZ’s mean patent rate among adults is defined as the average number
of patents issued per year (in the full USPTO data) to individuals in a given CZ between 1980 and 1990
divided by the CZ’s population between the ages of 15 and 64 in the 1990 census. The variable difference in
patent rates is the patenting rate of adults in the destination CZ minus that in the origin CZ. Age at move
refers to the child’s age at the time of the parent’s move; if the age at move is above 24, it is recoded to 24 given
Chetty and Hendren’s (2018) finding that neighborhood exposure matters only up to age 24. The youngest
moves in this sample occur at age 9 and the oldest, prior to recoding, at 32. The coefficient on difference in
patent rates ×age at move can be interpreted as the causal effect of one additional year of exposure to a
higher-innovation area (i.e., an area with higher observed patent rates) during childhood. Column (1) includes
indicators for the child’s birth cohort and age at move as well as origin CZ fixed effects as additional controls.
Column (2) controls for origin patent rates among adults rather than origin fixed effects. Column (3) shows
robustness of the estimates to interacting the controls in column (2) with birth cohort. Finally, column (4)
replicates the specification in column (1) of Table IV in the extended intergenerational sample as a reference.
Here, we regress an indicator for being an inventor on the patent rates of adults in the first CZ in which
we observe the child, which we call the Origin CZ for the purpose of this table. Standard errors, reported in
parentheses, are unclustered in columns (1)–(3) and are clustered by Origin CZ in column (4).
by 2012 will naturally vary across cohorts. This specification
again yields quite similar estimates.
To gauge the magnitude of these exposure effect estimates, in
column (4) of Table VI we report estimates from a cross-sectional
regression of an indicator for whether a child invents on the
patent rate of adults in the first CZ in which we observe the child
living, including both movers and nonmovers. This specification
replicates the cross-sectional regression presented above in
column (1) of Table IV using the extended set of birth cohorts
(1980–1988) that we use in our movers analysis. The coefficient
of 2.04 in column (4) implies that a 1 percentage point increase in
the annual patent rate among adults in a CZ is associated with
a 2.04 percentage point increase in the fraction of children who
become inventors. Under our identification assumptions, columns
(1)–(3) imply the causal effect of growing up in a neighborhood
(for 20 years of childhood) with 1 percentage point higher patent
rates among adults increases children’s patent rates in adulthood
by $20 \times 0.08 = 1.6$ percentage points. Hence, approximately 75%
($1.6/2$) of the cross-sectional relationship between innovation rates
and children’s probability of inventing documented above in Fig-
ure IX is due to neighborhood-level exposure effects on the level
of innovation.37
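The comparison between the movers-based estimate and the cross-sectional coefficient can be checked with a few lines of Python. This is only a sketch of the arithmetic using the rounded magnitudes quoted in the text; the variable names are illustrative.

```python
beta_abs = 0.08            # per-year exposure effect, columns (1)-(3), absolute value
years_of_childhood = 20
cross_sectional = 2.04     # column (4) coefficient

# Effect of 20 years of exposure to a 1 percentage point higher patent rate.
causal_effect = years_of_childhood * beta_abs

# Share of the cross-sectional relationship attributed to exposure.
share = causal_effect / cross_sectional
print(f"{causal_effect:.1f} {share:.2f}")  # 1.6 0.78
```

With the unrounded coefficient 2.04 the ratio is about 0.78, and with the rounded denominator 2 shown in the text it is 0.8, so the exact share depends on rounding.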
The estimates above imply that moving a child from a CZ
that is at the 25th percentile of the distribution in terms of in-
ventors per capita (e.g., New Orleans, LA) to the 75th percentile
(e.g., Austin, TX), a 1.4 standard deviation change, would increase
his or her probability of becoming an inventor by
$1.4 \times 0.032 = 0.045$ percentage points (37%). Exposure to innovation
thus has substantial impacts not just on the types of innovation
children pursue but also on whether or not they become inventors
at all.
37. These results show that moving to an area with higher rates of innovation
earlier in childhood causes children to be more likely to become inventors, but they
do not themselves establish that this causal effect is due to exposure to innovation
itself, rather than exposure to other correlated factors, such as better schools or
higher levels of income. Investigating this, we find that controlling for measures
such as average household income by CZ (interacted with age at move) does not
affect the innovation exposure estimates significantly. Moreover, as discussed in
Section IV.C, the technology class- and gender-specificity of the exposure effects
we document using our first research design indicate that the central driver is
exposure to innovation itself rather than other broader factors.
V. INVENTORS’ CAREERS: THE POTENTIAL FOR LOST EINSTEINS
Are the children from low-income families who do not pursue
innovation (e.g., because of a lack of exposure) those who would
have ended up having highly impactful innovations? Or do the
most productive “stars” overcome the hurdles they face and be-
come inventors regardless of their background, as predicted by
economic models of career selection with barriers to entry (Hsieh
et al. 2016)? In this section, we address this question by analyzing
how the returns to innovation vary with inventors’ characteristics
at birth.
We consider two measures of returns to innovation: inventors’
earnings (a measure of private returns) and patent citations
(a proxy for social impact). As a reference, we plot the income
distribution of inventors between ages 40 and 50 in our sample in
Online Appendix Figure V, Panel A. The distribution is extremely
skewed: the median annual income (in 2012 dollars) is $114,000,
the mean is $192,000, and the 99th percentile is $1.6 million.
The private returns to innovation are highly correlated with
their social impact, as measured by citations (Online Appendix
Figure V, Panel B). Notably, inventors who have patents in the
top 1% of the citation distribution earn more than $1 million a
year between ages 40 and 50, confirming that highly cited patents
are highly valued by the market.
Section III.B suggests that the ability to innovate does not
vary significantly with children’s characteristics at birth (race,
gender, or parental income). Under the assumption that ability
does not vary across groups, inventors from underrepresented
groups will have higher observed returns on average if the in-
dividuals who are screened out tend to be those who would have
had the lowest returns (Hsieh et al. 2016). We test whether this is
the case in Figure XI. In Panel A, we compare the mean incomes
of inventors with different characteristics at birth. The first pair
of bars compares individuals from families with incomes above
versus below the 80th percentile of the parental income distribu-
tion using inventors in our intergenerational analysis sample. The
second pair compares minorities (blacks and Hispanics) to nonmi-
norities using inventors in the NYC schools sample. The third pair
compares men and women using the full inventors sample. In all
cases, inventors from the underrepresented groups have similar
or lower earnings on average than those from more advantaged
backgrounds—challenging the view that the individuals from
[Figure XI. Panel (A) Mean Income; axis: Mean Income in 2012 ($1000). Panel (B) Fraction with Highly Cited Patents; axis: Pct. of Inventors in Top 5% of Citation Distribution. Bars in both panels: Par. Inc. Above p80, Par. Inc. Below p80, Non-Minority, Minority, Male, Female.]
FIGURE XI
Income and Citations of Inventors by Characteristics at Birth
This figure presents how two measures of inventor productivity (income and ci-
tations) differ across various demographic groups. Panel A plots the mean incomes
of inventors in 2012 by their parents’ income, race/ethnicity, and gender. The first
pair of bars uses our intergenerational sample (1980–1984 birth cohorts), divided
into two subgroups based on whether parents’ household income is below or above
the 80th percentile of the parent income distribution. The second pair of bars uses
our NYC schools sample, divided into two subgroups based on race and ethnic-
ity: minorities (blacks and Hispanics) and nonminorities. The third pair of bars
uses our full inventors sample divided by gender. The vertical lines depict 95%
confidence intervals. Panel B replicates Panel A using the fraction of highly cited
inventors as the outcome. Highly cited inventors are defined as inventors whose
patents have citations per coauthor in the top 5% of the distribution among those
in their birth cohort.
underrepresented groups who do not pursue innovation would
have had low returns.
Figure XI, Panel B replicates this analysis using the proba-
bility of having a highly cited patent (in the top 5% of the distri-
bution of citations among inventors in a given birth cohort) as the
outcome. The patterns are analogous: inventors from underrepre-
sented groups also do not have higher-impact inventions.
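The "highly cited" indicator used in Panel B (top 5% of citations per coauthor within a birth cohort) can be sketched in Python as below. The record layout, the function name, and the handling of ties and rounding are assumptions made for illustration; the paper does not describe its implementation at this level of detail.

```python
from collections import defaultdict


def flag_highly_cited(inventors, top_percent=5):
    """Return ids of inventors in the top `top_percent` of citations per
    coauthor within their birth cohort.

    `inventors` is a list of dicts with hypothetical keys
    'id', 'birth_cohort', and 'citations_per_coauthor'.
    """
    by_cohort = defaultdict(list)
    for inv in inventors:
        by_cohort[inv["birth_cohort"]].append(inv)

    flagged = set()
    for members in by_cohort.values():
        ranked = sorted(members, key=lambda inv: inv["citations_per_coauthor"],
                        reverse=True)
        n_top = len(ranked) * top_percent // 100  # floor; tie handling is an assumption
        flagged.update(inv["id"] for inv in ranked[:n_top])
    return flagged


# Toy data: two cohorts of 20 inventors each (made up).
toy = [{"id": i, "birth_cohort": 1980, "citations_per_coauthor": i} for i in range(20)]
toy += [{"id": 100 + i, "birth_cohort": 1981, "citations_per_coauthor": 19 - i}
        for i in range(20)]
top = flag_highly_cited(toy)
print(sorted(top))
```

Ranking within cohort rather than in the pooled sample matters because older inventors have had more time to accumulate citations.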
Figure XI implies that the probability that an individual be-
comes a star (high-return) inventor is just as sensitive to his or her
conditions at birth as the probability that he or she innovates at all, as
shown in Figure I, Panel B in the context of parental income. This
finding is consistent with our conclusion above that differences in
exposure to innovation play a key role in generating these gaps.
A lack of exposure (e.g., awareness of innovation as a potential
career) is likely to reduce the probability that individuals pursue
innovation uniformly across all levels of productivity. In contrast,
this result challenges standard economic models that explain dif-
ferences in occupational choice purely by differences in barriers
to entry across subgroups (e.g., Hsieh et al. 2016), because such
models predict that the marginal inventors who are screened out
are those with lower potential. To explain the patterns, the fac-
tors that generate barriers to entry must also reduce individuals’
productivity after entering innovation (e.g., discrimination).38
Regardless of the explanation, the key implication
of Figure XI is that there are many “lost Einsteins”—individuals
who do not pursue a career in innovation even though they could
have had highly impactful innovations had they done so. To quan-
tify the amount of lost innovation, we consider a counterfactual
under which women, minorities, and children from low-income
(bottom 80%) families invent at the same rate as white men from
high-income (top 20%) families.39 In this scenario, there would be
4.04 times as many inventors in the United States as there are
38.