Using Extended Genealogy to Estimate Components of
Heritability for 23 Quantitative and Dichotomous Traits
Noah Zaitlen1*, Peter Kraft2,3,4, Nick Patterson4, Bogdan Pasaniuc5, Gaurav Bhatia2,3,4,
Samuela Pollack2,3,4, Alkes L. Price2,3,4*
1Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, California, United States of America, 2Department of Epidemiology,
Harvard School of Public Health, Boston, Massachusetts, United States of America, 3Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts,
United States of America, 4Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America,
5Interdepartmental Program in Bioinformatics Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, California, United States of America
Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of
heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of
extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to
examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous
estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by
epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs
of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range
of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is
predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly
estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach
permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of
estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs
explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping
arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining
heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.
Citation: Zaitlen N, Kraft P, Patterson N, Pasaniuc B, Bhatia G, et al. (2013) Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative
and Dichotomous Traits. PLoS Genet 9(5): e1003520. doi:10.1371/journal.pgen.1003520
Editor: Peter M. Visscher, The University of Queensland, Australia
Received September 27, 2012; Accepted April 6, 2013; Published May 30, 2013
Copyright: © 2013 Zaitlen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was funded by NIH grant R03HG005732 (NZ and ALP), NIH fellowship 5T32ES007142-27 (NZ), and the Rose Traveling Fellowship Program in
Chronic Disease Epidemiology and Biostatistics (NZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of
the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: firstname.lastname@example.org (NZ); email@example.com (ALP)
Although genome-wide association studies (GWAS) have
resulted in the discovery of thousands of novel associations of loci
to hundreds of phenotypes, concerns have been raised about
the finding that these loci appear to explain a relatively small
proportion of the estimated heritability, the fraction of phenotypic
variation in a population that is due to genetic variation. This
has led to considerable speculation by researchers about the
genetic basis of complex human phenotypes and the ‘‘missing
heritability’’, i.e. the fraction of heritability not accounted for by
the associations discovered to date [3,4,5,6,7,8,9]. Among the
proposed explanations for missing heritability is the existence of
many presently unidentified common variants with small effect
sizes, rare variants not captured by current genotyping platforms,
structural variants, epistatic interactions, gene-environment
interactions, parent-of-origin effects, or inflated heritability estimates
[3,5,10]. Studies that examine the sources of missing heritability
can help researchers to evaluate the prospects of future studies
focusing on common versus rare variation and thereby devise
effective strategies to discover the remaining sequence variants that
affect disease risk and other aspects of phenotypic variation in
The narrow-sense heritability of a phenotype (h2) is the fraction
of phenotypic variance that can be described by an additive model
over the set of SNPs that are functionally related to the phenotype
(i.e. the causal SNPs). It is commonly estimated by comparing
the phenotypic correlation of monozygotic (MZ) to that of
dizygotic (DZ) twins. The difference between h2 and the fraction
of phenotypic variance accounted for by variants discovered by
means of GWAS (h2_gwas) is the so-called missing heritability.
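As a concrete illustration of the twin comparison mentioned above, the classical Falconer estimator doubles the difference between the MZ and DZ phenotypic correlations. The sketch below is not part of this study's method; the function names and all input values are hypothetical and serve only to show the arithmetic behind a twin-based h2 and the resulting missing heritability.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Classical twin estimate: h2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


def missing_heritability(h2: float, h2_gwas: float) -> float:
    """Heritability not accounted for by GWAS-discovered variants."""
    return h2 - h2_gwas


# Hypothetical correlations and GWAS variance explained, for illustration only.
h2_twin = falconer_h2(r_mz=0.80, r_dz=0.45)
gap = missing_heritability(h2_twin, h2_gwas=0.10)
print(round(h2_twin, 2), round(gap, 2))
```

Because the estimator attributes the whole MZ/DZ difference to additive genetic effects, any excess MZ similarity from shared environment, dominance, or epistasis feeds directly into the estimate.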
Recently, Yang et al. developed a method to estimate the
variance explained by all SNPs on a genotyping platform, including
those that are not genome-wide significant (h2_g), representing the
limit of h2_gwas for infinite sample size.
There are two major challenges in comparing h2 and h2_g to
quantify missing heritability. First, there is the potential for
inflation of h2 estimates based on closely related individuals such as
MZ/DZ twins. It is well known that epistatic interactions can
inflate heritability estimates in studies of related individuals.
PLOS Genetics | www.plosgenetics.org1May 2013 | Volume 9 | Issue 5 | e1003520
Recent work from Zuk et al. has examined this in detail.
Other factors that could also lead to inflated estimates of h2 using
closely related pairs of individuals include dominance and shared
environment. Second, there is a tradeoff between inflation and
sampling variance when estimating h2_g. The recent variance
component approach described by Yang et al. results in inflated
estimates of h2_g when related individuals are included
[12,14,15,16,17]. However, removing related individuals reduces
the sample size, resulting in a larger standard error around the
estimate [18,19]. Both of these issues can adversely affect estimates
of missing heritability.
Here, we analyze the heritability of 23 complex phenotypes in
an Icelandic cohort of 38,167 individuals, leveraging both a
population-wide genealogical database and genotype data from
over 300,000 SNPs that have been long-range phased across and
between chromosomes (i.e. where not only the phase, but also the
parental origin of alleles has been determined). Importantly,
we develop an approach that allows h2 to be estimated on the basis
of both closely and distantly related pairs of individuals. We find,
for all of the quantitative phenotypes, that our estimates of h2 are
smaller than those from the literature that were based on MZ/DZ
twins. Our results indicate that previous estimates were
inflated by the impact of epistasis or shared environment.
We further introduce a new variance components method that
provides simultaneous estimates of h2 and h2_g. This method has two
principal advantages. First, by adequately taking account of both
closely and distantly related pairs of individuals, it minimizes the
standard error of the estimates, whilst avoiding the upward bias that
can result from calculations based on closely related pairs. Second, it
produces both estimates of heritability for the same population
sample, ensuring that h2 and h2_g are directly comparable.
For most of the 23 phenotypes examined here, our results show
that h2_g accounts for more than half of h2. As GWAS have not
identified many SNPs with large effect sizes (i.e. h2_gwas is small), and
h2_g is greater than h2_gwas by a considerable margin, it follows that
there must be many associated sequence variants that remain to be
discovered, i.e. these phenotypes are highly polygenic. Currently,
only common variants are well captured by the genotyping arrays
used in most GWAS studies. As the difference between h2 and h2_g
is likely due to common and rare variants not captured by the
genotyping array, it may be assumed that a fair number of
association signals remain to be identified through more comprehensive
approaches, such as whole-genome sequencing. However,
our estimates of h2_g show that GWAS genotyping arrays capture a
greater proportion of h2 than indicated by previous twin-based
estimates of h2.
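The relationship between the three quantities discussed above can be made explicit with a small calculation. The sketch below assumes the idealized ordering h2_gwas <= h2_g <= h2; the function name `partition_heritability` and the input values are hypothetical and are not results from this study.

```python
def partition_heritability(h2: float, h2_g: float, h2_gwas: float) -> dict:
    """Split h2 into the parts discussed in the text.

    Assumes 0 <= h2_gwas <= h2_g <= h2 <= 1 with h2 > 0. Real estimates
    carry sampling error and need not satisfy this ordering exactly.
    """
    if not (0.0 <= h2_gwas <= h2_g <= h2 <= 1.0) or h2 == 0.0:
        raise ValueError("expected 0 <= h2_gwas <= h2_g <= h2 <= 1 and h2 > 0")
    return {
        # Variance explained by variants already found by GWAS.
        "found_by_gwas": h2_gwas,
        # Captured by array SNPs but not yet genome-wide significant.
        "on_array_undiscovered": h2_g - h2_gwas,
        # Not captured by the genotyping array at all.
        "not_on_array": h2 - h2_g,
        # Share of h2 captured by genotyped SNPs.
        "fraction_captured_by_array": h2_g / h2,
    }


# Made-up values, for illustration only.
parts = partition_heritability(h2=0.5, h2_g=0.3, h2_gwas=0.05)
print(parts)
```

Under this decomposition, a lower h2 with an unchanged h2_g raises the fraction captured by the array, which is the comparison drawn above against twin-based estimates.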
Overview of methods
Below, we provide an overview of the approaches we used to
estimate various components of heritability. The details of these
approaches are provided in the Methods section.
We used a linear mixed model approach to estimate components
of heritability. In this approach, each phenotype is modeled
using a multivariate normal distribution. Each of the components
of heritability that we estimated corresponds to a different model
of the phenotypic covariance.
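In its simplest single-component form, a model of this kind can be written as follows. The notation is an assumption made here for illustration: y is a centered phenotype vector, K is a relatedness matrix scaled so that its diagonal entries are close to 1, and the two variances are the genetic and residual components. The Methods section gives the authors' exact formulation.

```latex
\mathbf{y} \sim \mathcal{N}\!\left(\mathbf{0},\; \sigma_g^2 \mathbf{K} + \sigma_e^2 \mathbf{I}\right),
\qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
```

Changing K, or adding further terms of the same form with their own variance parameters, yields the different models of the phenotypic covariance referred to above.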
Narrow-sense heritability (h2) estimates from variance component
models rely on covariance matrices specifying the genome-wide
genetic relatedness of individuals in the data set. An estimate
of h2 can be obtained by using an identity-by-descent (IBD) based
covariance matrix, which is trivial to obtain from long-range
phased genotype data (see below).
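To make the single-matrix case concrete, the following sketch fits h2 by maximum likelihood for one relatedness matrix using a simple grid search. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a centered phenotype, a positive semi-definite matrix K, and plain (not restricted) maximum likelihood. The function name `ml_heritability` is hypothetical, and the toy matrix uses the expected sharing of full siblings (0.5) rather than sharing measured from phased genotypes.

```python
import numpy as np


def ml_heritability(y, K, grid=None):
    """Grid-search ML estimate of h2 under y ~ N(0, s2 * (h2*K + (1-h2)*I))."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                      # center the phenotype
    n = y.size
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    evals, evecs = np.linalg.eigh(K)      # K = U diag(evals) U^T
    z2 = (evecs.T @ y) ** 2               # squared rotated phenotype
    best_h2, best_ll = 0.0, -np.inf
    for h2 in grid:
        d = h2 * evals + (1.0 - h2)       # eigenvalues of the scaled covariance
        s2 = np.mean(z2 / d)              # profiled total variance
        ll = -0.5 * (n * np.log(s2) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_h2, best_ll = h2, ll
    return float(best_h2)


# Toy data: 1,000 simulated sibling pairs with a true h2 of 0.6 (hypothetical).
rng = np.random.default_rng(0)
n_pairs = 1000
K = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
true_h2 = 0.6
L = np.linalg.cholesky(K)                 # L @ L.T == K
genetic = np.sqrt(true_h2) * (L @ rng.standard_normal(2 * n_pairs))
noise = np.sqrt(1.0 - true_h2) * rng.standard_normal(2 * n_pairs)
y = genetic + noise
h2_hat = ml_heritability(y, K)
print(h2_hat)
```

The eigendecomposition is computed once, so each candidate h2 costs only a pass over the eigenvalues; this is a common way to keep such likelihood evaluations cheap.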
The fine-scale estimates of IBD used here rely on long-range
phasing data that are not available in most data sets. An estimate
of h2 can also be obtained by using an identity-by-state with
threshold (IBS.t) based covariance matrix with all values below a
threshold t set to 0, i.e. focusing on closely related individuals. An
alternative is to use the full IBS based covariance matrix to obtain
an estimate of the heritability explained by genotyped SNPs (h2_g);
however, this requires removing related individuals. If related
individuals are included, the resulting estimate is neither an
estimate of h2 nor an estimate of h2_g.
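The thresholding step described above amounts to zeroing the small entries of the IBS matrix. The sketch below shows this on a toy matrix; the function name `threshold_relatedness`, the matrix values, and the threshold of 0.05 are all hypothetical and chosen only for illustration.

```python
import numpy as np


def threshold_relatedness(K, t):
    """Return a copy of K in which every entry below t is set to 0."""
    K = np.asarray(K, dtype=float)
    return np.where(K < t, 0.0, K)


# Toy IBS matrix: individuals 0 and 1 are close relatives, 2 is distantly related.
K_ibs = np.array([
    [1.00, 0.50, 0.02],
    [0.50, 1.00, 0.01],
    [0.02, 0.01, 1.00],
])
K_thresh = threshold_relatedness(K_ibs, t=0.05)
print(K_thresh)
```

Only the close-relative entry survives off the diagonal, so a model fitted with the thresholded matrix draws its information from closely related pairs.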
Previous approaches to estimating the heritability explained by
genotyped SNPs (h2_g) required filtering related individuals, thereby
increasing the standard error of the estimates. However, joint
estimates of h2 and h2_g can be obtained using two covariance
matrices based on IBS.t and IBS. The first component provides
an estimate of h2, and the second provides an estimate of h2_g. This
approach removes the need to filter related individuals. Alternatively,
joint estimates of h2 and h2_g can be obtained using two
covariance matrices based on IBD and IBS, where here IBD
replaces IBS.t to estimate h2.
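A two-matrix fit of the kind described above can be sketched with a simple moment-based regression. Note the substitution: this is a Haseman-Elston-style regression of pairwise phenotype products on the entries of two covariance matrices, not the likelihood-based variance component fit used in this study. The function names and toy matrices are hypothetical, and how the two fitted variances map onto h2 and h2_g follows the authors' definitions in the Methods section, which are not reproduced here.

```python
import numpy as np


def regress_products(P, mats):
    """Least-squares fit of P[i, j] ~ sum_k s_k * mats[k][i, j] over pairs i < j."""
    iu = np.triu_indices(P.shape[0], k=1)          # distinct pairs only
    X = np.column_stack([M[iu] for M in mats])     # one column per covariance matrix
    coef, *_ = np.linalg.lstsq(X, P[iu], rcond=None)
    return coef


def fit_two_components(y, K_thresh, K_ibs):
    """Standardize y, then regress products y_i * y_j on both matrices jointly."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    return regress_products(np.outer(y, y), [K_thresh, K_ibs])


# Toy check on an exact covariance: the known weights are recovered.
K_ibs = np.array([
    [1.00, 0.50, 0.02, 0.10],
    [0.50, 1.00, 0.03, 0.01],
    [0.02, 0.03, 1.00, 0.25],
    [0.10, 0.01, 0.25, 1.00],
])
K_thresh = np.where(K_ibs < 0.05, 0.0, K_ibs)      # hypothetical threshold of 0.05
P = 0.2 * K_thresh + 0.3 * K_ibs
coef = regress_products(P, [K_thresh, K_ibs])
print(coef)
```

Both matrices enter the same regression, so related pairs inform the thresholded component without having to be removed from the data set.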
Broad-sense heritability (H2) is the sum of additive, dominant,
and epistatic components of heritability. The additive, dominant,
environmental (ADE) model can be used to obtain joint estimates
of dominance and additive components of heritability, using two
covariance matrices based on IBD2 (two copies shared IBD) and IBD.
Below, we investigate all of these modeling approaches. Table
S1 contains definitions of all parameters quantifying components
of heritability that are used in the text.
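For the ADE model mentioned above, the expected contributions of additive and dominance variance to the phenotypic correlation of relatives follow from standard pedigree expectations: the additive coefficient is the expected fraction of the genome shared IBD, and the dominance coefficient is the probability of sharing two copies IBD. The sketch below tabulates these expectations for a few relationships in outbred pedigrees. It ignores shared environment and epistasis; the names `RELATIONSHIPS` and `expected_correlation` and the input values are hypothetical.

```python
# (additive coefficient, dominance coefficient) under standard pedigree expectations
RELATIONSHIPS = {
    "mz_twins":        (1.0,   1.0),
    "full_siblings":   (0.5,   0.25),
    "parent_offspring": (0.5,  0.0),
    "half_siblings":   (0.25,  0.0),
    "first_cousins":   (0.125, 0.0),
}


def expected_correlation(relationship: str, h2_add: float, h2_dom: float) -> float:
    """Expected phenotypic correlation from additive and dominance variance only."""
    a, d = RELATIONSHIPS[relationship]
    return a * h2_add + d * h2_dom


# Hypothetical variance fractions, for illustration only.
for rel in RELATIONSHIPS:
    print(rel, expected_correlation(rel, h2_add=0.4, h2_dom=0.2))
```

Among these relationships, only full siblings and MZ twins have a nonzero dominance coefficient, which is why comparing siblings with half-siblings and first cousins helps separate dominance from other sources of excess correlation.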
Estimates of narrow-sense heritability (h2)
Ideally, estimates of narrow-sense heritability of a particular
phenotype would be based on a genetic relationship matrix
Author Summary
Phenotype is a function of a genome and its environment.
Heritability is the fraction of variation in a phenotype
determined by genetic factors in a population. Current
methods to estimate heritability rely on the phenotypic
correlations of closely related individuals and are potentially
upwardly biased, due to the impact of epistasis and
shared environment. We develop new methods to
estimate heritability over both closely and distantly related
individuals. By examining the phenotypic correlation
among different types of related individuals such as
siblings, half-siblings, and first cousins, we show that
shared environment is the primary determinant of inflated
estimates of heritability. For a large number of phenotypes,
it is not known how much of the heritability is
explained by SNPs included on current genotyping
platforms. Existing methods to estimate this component
of heritability are biased in the presence of related
individuals. We develop a method that permits the
inclusion of both closely and distantly related individuals
when estimating heritability explained by genotyped SNPs
and use it to make estimates for 23 medically relevant
phenotypes. These estimates can be used to increase our
understanding of the distribution and frequency of
functionally relevant variants and thereby inform the
design of future studies.
Components of Heritability via Extended Genealogy
8. Hill WG, Goddard ME, Visscher PM (2008) Data and theory point to mainly
additive genetic variance for complex traits. PLoS Genet 4: e1000008.
9. Wray NR, Purcell SM, Visscher PM (2011) Synthetic associations created by
rare variants do not explain most GWAS results. PLoS Biol 9: e1000579.
10. Zuk O, Hechter E, Sunyaev SR, Lander ES (2012) The mystery of missing
heritability: Genetic interactions create phantom heritability. Proc Natl Acad
Sci U S A.
11. Visscher PM, Hill WG, Wray NR (2008) Heritability in the genomics era–
concepts and misconceptions. Nat Rev Genet 9: 255–266.
12. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. (2010)
Common SNPs explain a large proportion of the heritability for human height.
Nat Genet 42: 565–569.
13. Falconer DS (1986) Introduction to quantitative genetics. Burnt Mill, Harlow,
Essex, England New York: Longman Scientific & Technical; Wiley. viii, 340 p.
14. Lango Allen H, Lettre G, Estrada K, Berndt MN, Weedon MN, Abecasis GR,
Boehnke M, Gieger C, Gudbjartsson D, Heard-Costa NL, Jackson AU,
McCarthy MI, Rivadeneira F, Smith A, Soranzo N, Uitterlinden AG, Frayling
TM, Hirschhorn JN, GIANT Consortium. The identification of over 135 loci
involved in adult height variation provides important insights into the
contribution of common variation to a model complex trait. Talk presented at
the 59th annual meeting of the American Society of Human Genetics, October
22, 2009, Honolulu, HI.
15. Pasaniuc B, Zaitlen N, Lettre G, Chen GK, Tandon A, et al. (2011) Enhanced
statistical tests for GWAS in admixed populations: assessment using African
Americans from CARe and a Breast Cancer Consortium. PLoS Genet 7:
16. So HC, Li M, Sham PC (2011) Uncovering the total heritability explained by all
true susceptibility variants in a genome-wide association study. Genet Epidemiol
17. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, et al. (2010) Variance
component model to account for sample structure in genome-wide association
studies. Nat Genet 42: 348–354.
18. Deary IJ, Yang J, Davies G, Harris SE, Tenesa A, et al. (2012) Genetic
contributions to stability and change in intelligence from childhood to old age.
Nature 482: 212–215.
19. Vattikuti S, Guo J, Chow CC (2012) Heritability and Genetic Correlations
Explained by Common SNPs for Metabolic Syndrome Traits. PLoS Genet 8:
20. Kong A, Masson G, Frigge ML, Gylfason A, Zusmanovich P, et al. (2008)
Detection of sharing by descent, long-range phasing and haplotype imputation.
Nat Genet 40: 1068–1075.
21. Kendler KS, Neale MC, Kessler RC, Heath AC, Eaves LJ (1993) A test of the
equal-environment assumption in twin studies of psychiatric illness. Behav Genet
22. Falconer DS (1989) Introduction to quantitative genetics. Burnt Mill, Harlow,
Essex, England New York: Longman Wiley. xii, 438 p.
23. Powell JE, Visscher PM, Goddard ME (2010) Reconciling the analysis of IBD
and IBS in complex trait studies. Nat Rev Genet.
24. Browning SR, Browning BL (2010) High-resolution detection of identity by
descent in unrelated individuals. Am J Hum Genet 86: 526–539.
25. Gusev A, Lowe JK, Stoffel M, Daly MJ, Altshuler D, et al. (2009) Whole
population, genome-wide mapping of hidden relatedness. Genome Res 19: 318–
26. Visscher PM, Medland SE, Ferreira MA, Morley KI, Zhu G, et al. (2006)
Assumption-free estimation of heritability from genome-wide identity-by-descent
sharing between full siblings. PLoS Genet 2: e41. doi:10.1371/journal.p-
27. Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, et al. (2007)
Genome partitioning of genetic variation for height from 11,214 sibling pairs.
Am J Hum Genet 81: 1104–1110.
28. Price AL, Helgason A, Thorleifsson G, McCarroll SA, Kong A, et al. (2011)
Single-tissue and cross-tissue heritability of gene expression via identity-by-
descent in related or unrelated individuals. PLoS Genet 7: e1001317.
29. Browning SR, Browning BL (2013) Identity-by-descent-based heritability
analysis in the Northern Finland Birth Cohort. Hum Genet 132: 129–138.
30. Visscher PM, McEvoy B, Yang J (2010) From Galton to GWAS: quantitative
genetics of human height. Genet Res (Camb) 92: 371–379.
31. Silventoinen K, Sammalisto S, Perola M, Boomsma DI, Cornes BK, et al. (2003)
Heritability of adult body height: a comparative study of twin cohorts in eight
countries. Twin Res 6: 399–408.
32. Hayes BJ, Visscher PM, Goddard ME (2009) Increased accuracy of artificial
selection by using the realized relationship matrix. Genet Res (Camb) 91: 47–60.
33. Lee SH, Wray NR, Goddard ME, Visscher PM (2011) Estimating Missing
Heritability for Disease from Genome-wide Association Studies. Am J Hum
Genet 88: 294–305.
34. Stahl EA, Wegmann D, Trynka G, Gutierrez-Achury J, Do R, et al. (2012)
Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis.
35. Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits.
Sunderland, Mass.: Sinauer. xvi, 980 p.
36. Helgason A, Palsson S, Gudbjartsson DF, Kristjansson T, Stefansson K (2008)
An association between the kinship and fertility of human couples. Science 319:
37. Deary IJ, Yang J, Davies G, Harris SE, Tenesa A, et al. (2012) Genetic
contributions to stability and change in intelligence from childhood to old age.
38. Goldstein DB (2009) Common genetic variation and human traits. N Engl J Med
39. McClellan J, King MC (2010) Genetic heterogeneity in human disease. Cell 141:
40. Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A Tool for
Genome-wide Complex Trait Analysis. Am J Hum Genet 88: 76–82.
41. Wasserman L (2005) All of Statistics: Springer.
42. Pilia G, Chen WM, Scuteri A, Orru M, Albai G, et al. (2006) Heritability of
cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet 2: e132.
43. Towne B, Czerwinski SA, Demerath EW, Blangero J, Roche AF, et al. (2005)
Heritability of age at menarche in girls from the Fels Longitudinal Study.
Am J Phys Anthropol 128: 210–219.
44. Murabito JM, Yang Q, Fox C, Wilson PW, Cupples LA (2005) Heritability of
age at natural menopause in the Framingham Heart Study. J Clin Endocrinol
Metab 90: 3427–3430.
45. Feitosa MF, Borecki I, Hunt SC, Arnett DK, Rao DC, et al. (2000) Inheritance
of the waist-to-hip ratio in the National Heart, Lung, and Blood Institute Family
Heart Study. Obes Res 8: 294–301.
46. Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N, et al. (2011)
Genome partitioning of genetic variation for complex traits using common
SNPs. Nat Genet 43: 519–525.