
PROCEEDINGS Open Access

Estimating heritability using family and unrelated individuals data

Priya B Shetty, Huaizhen Qin, Junghyun Namkung, Robert C Elston, Xiaofeng Zhu*

From Genetic Analysis Workshop 17

Boston, MA, USA. 13-16 October 2010

Abstract

For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1

and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates

heritability through the estimation of variance components. The covariate-adjusted mean heritability was 0.650 for

Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of

total variance that can be accounted for by all single-nucleotide polymorphisms under an additive model. We

examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated

restricted maximum-likelihood method. We applied the different methods to all 200 replicates for Q1. We observed

that the ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.100, and the calibrated restricted maximum-likelihood

method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies in heritability estimates using

the family data and the unrelated individuals data.

Background

The heritability of a trait is usually calculated using

family data. The identified genetic variants found through

genome-wide association studies account for only a small

portion of heritability for most complex traits [1] com-

pared with the heritability estimated from family data.

This discrepancy in the estimates, the missing heritability, is of great interest because the sources of this difference are still unknown [1]. Recently, Yang et al. [2], using a novel statistical method, suggested that the missing heritability can be recovered using the genome-wide associations of unrelated samples. Because the

Genetic Analysis Workshop 17 (GAW17) data set

included family data and unrelated individuals data for

the same traits [3], we estimated the “heritability” of Q1

with the unrelated individuals data and estimated the

“heritability” of Q1 and Q4 with the family data.

For the family data, the heritability is the narrow-sense heritability, estimated with the polygenic effect model; we applied a George-Elston transformation [4] when estimating the heritability. For the unrelated data, the heritability is the proportion of the total variance in a phenotype that can be explained by all single-nucleotide polymorphisms (SNPs) under an additive model; we estimated it using the ordinary least-squares (OLS) method suggested by Yang et al. [2], a naïve restricted maximum-likelihood (REML) method, and a calibrated REML method. In all our analyses, the heritability estimates were obtained after adjustments for age, sex, and smoking status.

Methods

PEDINFO and ASSOC

For the family data, we chose to use quantitative traits

Q1 and Q4 of four randomly selected data set replicates

(Table 1). We used the Statistical Analysis for Genetic

Epidemiology (S.A.G.E.) software and the PEDINFO and

ASSOC programs. The PEDINFO program calculates

summary statistics about the family data set. The ASSOC

program performs a family-based association test using a polygenic mixed-effects model for a quantitative trait, and it estimates the heritability as the proportion of the total trait variance attributable to the polygenic component. In our analysis, the heritability estimates were obtained after adjustments for age, sex, and smoking status. The George-Elston transformation was applied to normalize the residual distribution [4]. We did not include any genotype variables in the model.

* Correspondence: zhu1@darwin.epbi.cwru.edu
Case Western Reserve University School of Medicine, 2103 Cornell Road, Cleveland, OH 44106, USA

Shetty et al. BMC Proceedings 2011, 5(Suppl 9):S34
http://www.biomedcentral.com/1753-6561/5/S9/S34

© 2011 Shetty et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

OLS and REML estimates

For the unrelated data, we used the OLS method sug-

gested by Yang et al. [2] and the two REML methods to

estimate the heritability of Q1 with all 200 data set

replicates. Here, the heritability refers to the proportion

of the variance in Q1 that can be accounted for by all

SNPs under an additive model [2]. We fitted the mixed

effects model:

$$y = X\gamma + Zu + e \tag{1}$$

where $y = (y_1, \ldots, y_n)'$ consists of trait values of $n$ unrelated individuals, $X = [(1, x_1)', \ldots, (1, x_n)']'$, where $x_i = (x_{i1}, \ldots, x_{i3})$ consists of the sex, age, and smoking status of the $i$th individual, respectively, $\gamma = (\gamma_0, \ldots, \gamma_3)'$ summarizes effect sizes of the covariates, $Z = [z_1', \ldots, z_n']'$ consists of the genotype data of $m$ unknown causal variants such that $z_i = (z_{i1}, \ldots, z_{im})$, and

$$z_{ij} = -2 f_j s_j^{-1}, \quad (1 - 2 f_j) s_j^{-1}, \quad \text{or} \quad 2 (1 - f_j) s_j^{-1}$$

if the genotype of the $i$th individual at the $j$th causal variant is aa, aA, or AA, respectively, $f_j$ is the frequency of allele A and $s_j^2 = 2 f_j (1 - f_j)$. Here the prime indicates the transpose of a vector or matrix.

Let the effects of $m$ causal variants be:

$$u = (u_1, \ldots, u_m)' \sim N(0, \sigma_u^2 I_m) \tag{2}$$

where $\sigma_u^2$ is the variance and the residuals be:

$$e = (e_1, \ldots, e_n)' \sim N(0, \sigma_e^2 I_n) \tag{3}$$

where $\sigma_e^2$ is the residual variance, $I_n$ is the identity matrix of order $n$. Then the variance-covariance matrix of $y$ is:

$$\operatorname{var}(y) = G \sigma_g^2 + \sigma_e^2 I_n \tag{4}$$

where $G = \frac{1}{m} Z Z'$ is the genetic relationship matrix of causal SNPs and $\sigma_g^2 = m \sigma_u^2$. Let $X$ have the rank $r$ (= 4 for the GAW17 unrelated individuals data), and let $P = [p_1, \ldots, p_{n-r}]$, where $p_1, \ldots, p_{n-r}$ are all orthogonal eigenvectors corresponding to eigenvalue 1 of idempotent matrix $I_n - X (X'X)^{-1} X'$. Let $\tilde{y} = P'y$, $\tilde{Z} = P'Z$, and $\tilde{e} = P'e$. It follows that:

$$\tilde{y} = \tilde{Z} u + \tilde{e} \sim N(0, V), \tag{5}$$

where:

$$V = \operatorname{var}(\tilde{y}) = \tilde{G} \sigma_g^2 + \sigma_e^2 I_{n-r} \tag{6}$$

and

$$\tilde{G} = \frac{1}{m} \tilde{Z} \tilde{Z}' = P' G P \tag{7}$$

Note that

$$E\left[ (\tilde{y}_i - \tilde{y}_j)^2 \right] = 2 \sigma_e^2 + \sigma_g^2 (p_i - p_j)' G (p_i - p_j). \tag{8}$$

Thus the slope and intercept of the regression of:

$$\Delta \tilde{y}_{ij} = (\tilde{y}_i - \tilde{y}_j)^2 \tag{9}$$

on $(p_i - p_j)' G (p_i - p_j)$ are $\sigma_g^2$ and $2 \sigma_e^2$, respectively. Because $G$ is unknown, it is replaced with an estimate. One naïve estimate is $A$, the genetic relationship of genome-wide SNPs. Yang et al. [2] established an unbiased estimate $A^*$ for $G$ by calibrating the prediction error of genetic relationship $G$ of unobserved causal SNPs. Replacing $G$ with $A^*$ in the regression, we can estimate the heritability as:

$$h^2(A^*) = \frac{\sigma_g^2(A^*)}{\sigma_g^2(A^*) + \sigma_e^2(A^*)}. \tag{10}$$
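The regression in equations (8) to (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `ols_heritability` is hypothetical, and the pairwise quantities $(p_i - p_j)' A^* (p_i - p_j)$ and $\Delta \tilde{y}_{ij}$ are assumed to have been computed beforehand for each pair $i < j$.

```python
def ols_heritability(x, delta_y):
    """Simple linear regression of delta_y on x.

    x[k]       : (p_i - p_j)' A* (p_i - p_j) for the k-th pair (assumed precomputed)
    delta_y[k] : squared difference of the projected trait for the same pair
    Returns (sigma_g2, sigma_e2, h2) following equations (8) and (10).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(delta_y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_d) for a, b in zip(x, delta_y))
    slope = sxy / sxx                      # estimates sigma_g^2
    intercept = mean_d - slope * mean_x    # estimates 2 * sigma_e^2
    sigma_g2 = slope
    sigma_e2 = intercept / 2.0
    h2 = sigma_g2 / (sigma_g2 + sigma_e2)
    return sigma_g2, sigma_e2, h2
```

Nothing in this calculation constrains the slope or the intercept to be nonnegative. A negative slope gives an estimate below 0, and a negative intercept gives an estimate above 1, which is one way the out-of-range values reported for the OLS method can arise.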

Because this estimate is based on OLS, it does not need iteration. By replacing $G$ with $A$ and $A^*$ in the model given by $\tilde{y} = \tilde{Z} u + \tilde{e}$, we can construct the naïve and calibrated REML estimates by maximizing the likelihood of $(\sigma_g^2, \sigma_e^2)$.

Table 1 Heritability estimates for Q1 and Q4 using the family data

Replicate number   Q1 heritability   Q1 standard error   Q4 heritability   Q4 standard error
1                  0.608             0.063               0.754             0.106
2                  0.640             0.067               0.687             0.061
52                 0.698             0.103               0.773             0.117
137                0.655             0.105               0.766             0.104
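The REML step described in the Methods, maximizing the likelihood of $(\sigma_g^2, \sigma_e^2)$ for the projected trait, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that the eigenvalues of $\tilde{G}$ (with $A$ or $A^*$ substituted for $G$) and the projected trait rotated onto the corresponding eigenvectors are already available, and it replaces an iterative REML algorithm with a grid search over $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$ on $[0, 1)$. The function name `reml_grid` is hypothetical.

```python
import math

def reml_grid(eigenvalues, rotated_trait, steps=100):
    """Grid-search maximizer of the Gaussian likelihood of the projected trait.

    eigenvalues   : eigenvalues of the projected relationship matrix (assumed precomputed)
    rotated_trait : projected trait expressed in the eigenvector basis (assumed precomputed)
    Returns (sigma_g2, sigma_e2, h2).
    """
    n = len(eigenvalues)
    best = None
    for i in range(steps):
        h2 = i / steps
        # Variance of each rotated component, up to the total variance sigma_g2 + sigma_e2.
        d = [h2 * lam + (1.0 - h2) for lam in eigenvalues]
        if min(d) <= 0.0:
            continue  # skip grid points where the covariance is not positive definite
        # Total variance that maximizes the likelihood for this h2.
        total = sum(w * w / dk for w, dk in zip(rotated_trait, d)) / n
        loglik = -0.5 * (n * math.log(total) + sum(math.log(dk) for dk in d) + n)
        if best is None or loglik > best[0]:
            best = (loglik, h2, total)
    _, h2, total = best
    return h2 * total, (1.0 - h2) * total, h2
```

Because the trait has already been projected with $P$, the ordinary likelihood of $\tilde{y}$ coincides with the restricted likelihood of $y$. Restricting the grid to $[0, 1)$ is a choice made in this sketch; it keeps every estimate inside the unit interval, unlike the OLS estimate.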

Results

Heritability estimates using the family data

In the family data, 697 individuals (202 founders and 495

nonfounders) form eight pedigrees. The pedigrees all have

four generations of family members and a mean size of

87.13 individuals (range, 73–128). The pedigrees include

194 sibships with a mean size of 2.55 (range, 1–9). In the

four randomly selected replicates, the heritability estimates

for Q1 ranged from 0.608 to 0.698 with an average of

0.650; the heritability estimates for Q4 ranged from 0.687

to 0.773 with an average of 0.745 (Table 1).
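The ranges and averages quoted above follow directly from Table 1. The short Python check below reproduces them; the variable and function names are illustrative only.

```python
from statistics import mean

# Heritability estimates from Table 1, keyed by replicate number: (Q1, Q4).
table1 = {
    1: (0.608, 0.754),
    2: (0.640, 0.687),
    52: (0.698, 0.773),
    137: (0.655, 0.766),
}

def summarize(values):
    """Return (minimum, maximum, mean rounded to three decimals)."""
    return min(values), max(values), round(mean(values), 3)

q1_summary = summarize([q1 for q1, _ in table1.values()])
q4_summary = summarize([q4 for _, q4 in table1.values()])
print("Q1:", q1_summary)
print("Q4:", q4_summary)
```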

Heritability estimates using the unrelated individuals data

The unrelated individuals data consist of genotypes of

24,487 SNPs and 200 replicates of 697 individuals for Q1.

The OLS estimates of the heritability were apparently

unstable (Figure 1), because many of them were outside

the interval [0, 1]. We computed the mean and standard

deviation of all 200 heritability estimates, including those

greater than 1 or less than 0. Over the 200 replicates, the

average heritability estimate for Q1 was μ = 0.555 with

standard deviation s = 0.480 after correcting for age, sex,

and smoking status.

We found that the REML estimates for Q1 were more

stable than estimates obtained using the OLS method

(Figure 2). After accounting for age, sex, and smoking

status, the 200 naïve REML estimates yielded an average

heritability estimate of 0.4618 ± 0.0999, and the calibrated

REML estimates yielded an average heritability estimate

of 0.5351 ± 0.1206 for Q1.

We were unable to obtain REML estimates for Q4 because the REML algorithm converged extremely slowly. Convergence failed because no SNP contributed any phenotypic variation to Q4 in the simulated model [3].

Discussion and conclusions

In our analyses, we estimated heritability using both the

family data for Q1 and Q4 and the unrelated individuals

data for Q1. The heritability estimates for Q1 and Q4

using the family data appeared stable and reasonable. In

the simulation, Q1 has a heritability of 0.575, where

0.135 is due to the 39 causal SNPs and 0.440 is due to a

polygenic component, and Q4 has a heritability of 0.70

resulting from a polygenic effect. The mean heritability

estimates for Q1 and Q4 with the family data were 0.650

and 0.745, respectively.

Figure 1 OLS estimates of the heritability of Q1. [Plot of heritability estimate (vertical axis, -1 to 2.5) against replicate (horizontal axis, 0 to 200), with reference lines at μ, μ - 3σ, and μ + 3σ.] The estimates at many of the 200 replicates were greater than 1 or less than 0. Over the 200 estimates, the average heritability estimate for Q1 was μ = 0.5549 with standard error s = 0.4803.

The heritability estimates using the unrelated individuals data seem less reasonable. The OLS method did not work well for the GAW17 unrelated individuals data because the method was designed for genome-wide common SNPs. In the GAW17 unrelated individuals data, most of the SNPs are rare variants and a few of them are causal variants. The genetic relationships estimated using many rare variants may be unreliable, and

this results in the instability of the OLS estimates. The

REML approaches appear to be more stable than the

OLS method for Q1. We observed that the heritability

estimates using the unrelated individuals data were less

than those using the family data on average. For exam-

ple, the mean of the heritability estimates for Q1 for the

unrelated individuals data was 0.462 (by naïve REML),

which was 0.188 less than the mean for the family data.

One possible reason is that the polygenic component

(0.440) in Q1 is not due to any SNPs in the GAW17

sequence data set. We therefore should not expect to recover
the polygenic effect using unrelated samples. However,
the mean naïve REML estimate (0.462) is much larger
than the heritability attributable to the causal SNPs (0.135).

The reason is that we used all 24,487 SNPs to estimate

the relationships among individuals. There might be

other sources contributing to the heritability estimates.

Finally, we failed to estimate the heritability for Q4 using the unrelated samples because of the convergence problem, which arose because no genotyped exonic SNPs in the data contribute to the phenotypic variation.
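The comparisons made in this discussion can be laid out with a few lines of Python. All numbers are taken from the text above; the dictionary and function names are illustrative only.

```python
# Simulated heritabilities stated in the text.
simulated = {"Q1": 0.135 + 0.440, "Q4": 0.70}   # Q1: causal SNPs + polygenic component
# Mean family-based estimates (Table 1).
family = {"Q1": 0.650, "Q4": 0.745}
# Mean estimates for Q1 from the unrelated individuals data.
unrelated_q1 = {"OLS": 0.555, "naive REML": 0.462, "calibrated REML": 0.535}

def gap(a, b):
    """Difference a - b, rounded to three decimals."""
    return round(a - b, 3)

for trait in ("Q1", "Q4"):
    print(trait, "family minus simulated:", gap(family[trait], simulated[trait]))
for method, estimate in unrelated_q1.items():
    print(method, "family minus unrelated:", gap(family["Q1"], estimate))
print("naive REML minus causal-SNP heritability:", gap(unrelated_q1["naive REML"], 0.135))
```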

Acknowledgments

The Genetic Analysis Workshop is supported by National Institutes of Health

(NIH) grant R01 GM031575 from the National Institute of General Medical

Sciences. This work was supported by National Cancer Institute grant P30

CAD43703 and NIH grants HL074166, HL086718, R01 HG003054 and R01

HG005854. Some of the results of this paper were obtained by using the

program package S.A.G.E., which is supported by U.S. Public Health Service

Resource Grant RR03655 from the National Center for Research Resources.

We thank the other members of Xiaofeng Zhu’s laboratory for their critiques

and comments.

This article has been published as part of BMC Proceedings Volume 5

Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the
supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S9.

Authors’ contributions

PBS performed the statistical analysis of family data and HQ performed the

statistical analysis of the unrelated individuals data. PBS, HQ, JN and XZ
drafted and revised the manuscript. XZ conceived the project, and RCE critiqued
and edited the manuscript. All authors read and approved the final

manuscript.

Competing interests

The authors declare that there are no competing interests.

Published: 29 November 2011

Figure 2 REML estimates of heritability of Q1. [Two panels, (a) Naive REML and (b) Calibrated REML, each plotting heritability estimate (vertical axis, 0 to 1) against replicate (horizontal axis), with reference lines at μ, μ - 3σ, and μ + 3σ.] (a) The relationship A of genome-wide SNPs was used to estimate the relationship G at unobserved causal SNPs. Over the 200 replicates, the average heritability estimate was μ = 0.4618 with standard error s = 0.0999 after correcting for age, sex, and smoking status. (b) The calibrated relationship A* was used to estimate the relationship G at unobserved causal SNPs. Over the 200 replicates, the average heritability estimate was μ = 0.5351 with standard error s = 0.1206 after correcting for age, sex, and smoking status.

References

1. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al: Finding the missing heritability of complex diseases. Nature 2009, 461:747-753.

2. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al: Common SNPs explain a large proportion of the heritability for human height. Nat Genet 2010, 42:565-569.

3. Almasy LA, Dyer TD, Peralta JM, Kent JW Jr, Charlesworth JC, Curran JE, Blangero J: Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc 2011, 5(suppl 9):S2.

4. George V, Elston RC: Generalized modulus power transformations. Commun Stat Theory Meth 1988, 17:2933-2952.

doi:10.1186/1753-6561-5-S9-S34

Cite this article as: Shetty et al.: Estimating heritability using family and

unrelated individuals data. BMC Proceedings 2011 5(Suppl 9):S34.
