
From the SelectedWorks of Shuo Jiao

July 2011

The use of imputed values in the meta-analysis of

genome-wide association studies.


Available at:http://works.bepress.com/shuo_jiao/6


Genetic Epidemiology (2011)

The Use of Imputed Values in the Meta-Analysis of Genome-Wide

Association Studies

Shuo Jiao,1* Li Hsu,2 Carolyn M. Hutter,1 and Ulrike Peters1,3

1Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, Washington

2Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, Washington

3Department of Epidemiology, School of Public Health, University of Washington, Seattle, Washington

In genome-wide association studies (GWAS), it is a common practice to impute the genotypes of untyped single nucleotide

polymorphism (SNP) by exploiting the linkage disequilibrium structure among SNPs. The use of imputed genotypes

improves genome coverage and makes it possible to perform meta-analysis combining results from studies genotyped on

different platforms. A popular way of using imputed data is the "expectation-substitution" method, which treats the imputed
dosage as if it were the true genotype. In current practice, the estimates given by the expectation-substitution method are
usually combined using an inverse variance weighting (IVM) scheme in meta-analysis. However, the IVM is not optimal as the

estimates given by the expectation-substitution method are generally biased. The optimal weight is, in fact, proportional to the

inverse variance and the expected value of the effect size estimates. We show both theoretically and numerically that the bias

of the estimates is very small under practical conditions of low effect sizes in GWAS. This finding validates the use of the

expectation-substitution method, and shows the inverse variance is a good approximation of the optimal weight. Through

simulation, we compared the power of the IVM method with several methods including the optimal weight, the regular

z-score meta-analysis and a recently proposed ‘‘imputation aware’’ meta-analysis method (Zaitlen and Eskin [2010] Genet

Epidemiol 34:537–542). Our results show that the performance of the inverse variance weight is always indistinguishable from

the optimal weight and similar to or better than the other two methods. Genet. Epidemiol. 2011.

© 2011 Wiley-Liss, Inc.

Key words: GWAS; imputation; bias; meta-analysis; weight

Contract grant sponsor: National Institutes of Health; Contract grant numbers: 5R01 CA059045; 5U01 CA137088; R01AG14358;

P01CA53996; Contract grant sponsors: Division of Cancer Prevention; National Cancer Institute; National Institutes of Health;

Department of Health and Human Services; NIH GEI; Contract grant numbers: Z01 CP 010200; U01 HG 004438.

*Correspondence to: Shuo Jiao, Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, Washington.

E-mail: sjiao@fhcrc.org

Received 6 April 2011; Revised 2 June 2011; Accepted 3 June 2011

Published online in Wiley Online Library (wileyonlinelibrary.com).

DOI: 10.1002/gepi.20608

INTRODUCTION

The advance of high-throughput technology makes it

possible to genotype hundreds of thousands of single

nucleotide polymorphisms (SNPs) simultaneously, which
allows researchers to examine genetic variation across the
whole genome in genome-wide association studies
(GWAS). By testing the association between SNPs and

complex traits and diseases, GWAS have successfully

uncovered hundreds of novel susceptibility loci to date

[Hindorff et al., 2009].

Even though current GWAS platforms include markers

for hundreds of thousands or even millions of SNPs, they

still only directly assay a proportion of the whole genome.

Obviously, if only directly genotyped SNPs are considered,
associated SNPs can go undetected. Another

drawback of the partial coverage is that the selected SNP

panel often varies for different platforms [Barrett and

Cardon, 2006]. When different studies use different

platforms, combining across studies will lead to a much

reduced set of SNPs genotyped in all the studies. For

example, the overlap between the Affymetrix SNP Array
6.0 and Illumina OmniExpress genotyping array is less

than 30%. An effective approach to overcome the afore-

mentioned problems is to impute the untyped SNPs based

on a common reference panel.

The basic idea behind genotype imputation is to take

advantage of the linkage disequilibrium (LD) information

among SNPs. Because of the LD and haplotype structure,

genotyped variants can provide information about untyped

SNPs. It is feasible to use data on genotyped SNPs along

with an appropriate reference panel containing informa-

tion on a larger set of SNPs to predict the genotypes of the

ungenotyped SNPs. Currently, the HapMap project [The

International HapMap Consortium, 2005, 2007] provides

such reference panels, and future studies are likely to

extend to the 1,000 Genomes Project [The 1,000 Genomes

Project Consortium, 2010] or other whole genome or

exome sequence data. The most popular imputation

programs include MACH [Li et al., 2010], IMPUTE

[Marchini et al., 2007], and Beagle [Browning and

Browning, 2009], among others.

There are several approaches to using imputed values in

the association analysis. Suppose a SNP of a given subject i

has genotype $g_i$, where $g_i$ takes one of the three values 0, 1,


and 2, the number of copies of one of the alleles (typically

the ‘‘minor’’ or lower frequency allele). The output of an

imputation program usually includes three probabilities:

$p_{i0} = P(g_i = 0)$; $p_{i1} = P(g_i = 1)$; $p_{i2} = P(g_i = 2)$. One method is

to use the most likely genotype (the genotype with the

highest probability) as if it were the true genotype.

However, it has been shown in Lin and Huang [2007] that

this method leads to intrinsically biased estimates because

of the unavoidable discrepancy between the most likely
genotype and the true genotype. Another popular
approach is the so-called expectation-substitution method.
Instead of using the most likely genotype, this method
uses the dosage, the expected number of minor
alleles $= p_{i1} + 2p_{i2}$, as if it were the true genotype. In the

haplotype analysis framework, several studies [Kraft et al.,

2005; Kraft and Stram, 2007; Cordell, 2006] have shown

through a series of simulation experiments that the

expectation-substitution method has no noticeable bias

under practical settings. It is also possible to use Bayesian
methods [Marchini et al., 2007; Servin and Stephens, 2007]
to perform the imputation and the association test at the
same time; however, these methods are usually computationally
intensive and hence not feasible on a genome-wide
scale. Therefore, in the remainder of the article, we will

focus on the expectation-substitution method.
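The two ways of turning imputation output into a single genotype value described above can be sketched as follows. This is a minimal illustration, not code from any imputation program: the function names and the example probabilities are hypothetical.

```python
def dosage(p0, p1, p2):
    """Expected number of minor alleles, p1 + 2 * p2."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def most_likely_genotype(p0, p1, p2):
    """Genotype (0, 1, or 2) with the highest imputation probability."""
    probs = (p0, p1, p2)
    return max(range(3), key=lambda g: probs[g])


# Hypothetical imputation output for one subject at one SNP.
probs = (0.125, 0.5, 0.375)
print(dosage(*probs))                # 1.25
print(most_likely_genotype(*probs))  # 1
```

The dosage keeps the uncertainty (1.25 rather than 1), which is what the expectation-substitution method then plugs into the regression model.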

If multiple studies are imputed using the same

reference, then the different studies have data on a

common set of SNPs, making meta-analysis across studies

possible. Because combining studies increases sample size,

meta-analysis increases power and allows detection of loci

not found in individual studies. One way of performing

meta-analysis is to use the regular z-score meta-analysis

(MetaZ), which combines z-scores weighted by square root

of sample sizes. Alternatively, the effect size meta-analysis

(MetaBeta) combines effect sizes by computing a weighted

average of the estimates. For meta-analysis that involves

imputed genotypes, the imputation quality is an important

factor. Hence, it seems natural that the imputation quality

should also be reflected in the weight for meta-analysis.

For MetaZ, de Bakker et al. [2008] suggested scaling the

weighted sum of z-scores by the imputation quality

measure. Based on this idea, Zaitlen and Eskin [2010]

have recently proposed an ‘‘imputation aware’’ method to

combine z-scores. In the "imputation aware" method, the
weight for the z-score of each study is proportional to $R\sqrt{n}$,
where $R^2$ is the imputation quality measure and n is the
sample size. Results have shown that the "imputation aware"
method is more powerful than the regular z-score
meta-analysis when the imputation quality varies among

studies [Zaitlen and Eskin, 2010].
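The two z-score weighting schemes just described differ only in the weight attached to each study. A minimal sketch, assuming the usual normalization of a weighted sum of z-scores by the square root of the sum of squared weights (the function names and numbers are illustrative only):

```python
import math


def combine_z(z_scores, weights):
    """Weighted z-score meta-analysis: sum(w * z) / sqrt(sum(w^2))."""
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den


def metaz_weights(sample_sizes):
    """Regular MetaZ: weight proportional to sqrt(n)."""
    return [math.sqrt(n) for n in sample_sizes]


def imputation_aware_weights(sample_sizes, r2_values):
    """'Imputation aware' weights: proportional to R * sqrt(n) = sqrt(R^2 * n)."""
    return [math.sqrt(r2 * n) for n, r2 in zip(sample_sizes, r2_values)]


# Hypothetical example: study 1 is well imputed and carries the signal,
# study 2 is poorly imputed and shows nothing.
z = [3.0, 0.0]
n = [1000, 1000]
r2 = [1.0, 0.25]
print(combine_z(z, metaz_weights(n)))
print(combine_z(z, imputation_aware_weights(n, r2)))
```

In this example the imputation-aware statistic is larger because the poorly imputed study is down-weighted.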

For MetaBeta, most studies use the traditional inverse

variance weighting (IVM) to combine estimates from

imputed and genotyped SNPs in current practice [Soranzo

et al., 2009; Willer et al., 2008]. However, it is unknown

whether the IVM is the optimal weighting scheme under

this situation. In this article, we address this question. For

imputed SNPs, we find that the optimal weight is

proportional to both the expected value and inverse

variance of estimates given by the expectation-substitution

method. While the expectation-substitution method does

not give unbiased estimators in general, the bias is usually

very small under practical situations of GWAS. Based on

this finding, we show that the inverse-variance weighting

scheme is a good approximation of the optimal weight for

the meta-analysis of imputed SNPs. These results are
important because they validate that the expectation-substitution
method and the IVM scheme currently
used in GWAS meta-analysis are adequate and close to
optimal in GWAS settings.

MATERIALS AND METHODS

MODELS

Consider a case-control study of n individuals. For a
given SNP, suppose that for subject i, $i = 1,\ldots,n$, the true
genotype is $g_i = 0$, 1, or 2 and the disease status is $d_i = 0$ or 1,
where 0 indicates control and 1 indicates case. The
standard logistic model for the association
between the SNP and disease status is then:

$$\mu(g_i;\beta_0,\beta_1) \equiv P(d_i = 1;\beta_0,\beta_1) = \frac{\exp(\beta_0+\beta_1 g_i)}{1+\exp(\beta_0+\beta_1 g_i)}. \qquad (1)$$

Note that model (1) is designed for a prospective study

where subjects are first selected, then followed up for

disease development. However, in many GWAS, the study

design is retrospective. In a seminal article by Prentice and

Pyke [1979], the authors showed that it is valid to apply

model (1) to a case-control study as if the data were

prospectively collected and the resulting estimator of $\beta_1$
is consistent for the true value and asymptotically
normal. Because of its simplicity and the appealing
interpretation of $\exp(\beta_1)$, which approximates relative risk

in rare disease, model (1) has been widely used in practice

and will be used throughout this article.

If the genotype for this given SNP is unknown, the
expectation-substitution method replaces the unknown
genotype by the dosage from the imputation, $\tilde g_i = p_{i1} + 2p_{i2}$.
In this case, model (1) becomes

$$\mu(\tilde g_i;\beta_0,\beta_1) = P(d_i = 1;\beta_0,\beta_1) = \frac{\exp(\beta_0+\beta_1\tilde g_i)}{1+\exp(\beta_0+\beta_1\tilde g_i)}. \qquad (2)$$

The likelihood function can be written as:

$$L(\beta_0,\beta_1) = \prod_{i=1}^{n} \mu(\tilde g_i;\beta_0,\beta_1)^{d_i}\{1-\mu(\tilde g_i;\beta_0,\beta_1)\}^{1-d_i}. \qquad (3)$$

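By Taylor's expansion, the maximum likelihood estimator
$(\hat\beta_0,\hat\beta_1)$ for $(\beta_0,\beta_1)$ satisfies

$$(\hat\beta_0,\hat\beta_1)' = (\beta_0,\beta_1)' + I(\beta_0^*,\beta_1^*)^{-1}U(\beta_0,\beta_1), \qquad (4)$$

where

$$U(\beta_0,\beta_1) = n^{-1}\left[\sum_{i=1}^{n}\{d_i-\mu(\tilde g_i;\beta_0,\beta_1)\},\ \sum_{i=1}^{n}\tilde g_i\{d_i-\mu(\tilde g_i;\beta_0,\beta_1)\}\right]', \qquad (5)$$

$$I(\beta_0^*,\beta_1^*) = -n^{-1}\begin{pmatrix}\dfrac{\partial^2\log L(\beta_0^*,\beta_1^*)}{\partial\beta_0^2} & \dfrac{\partial^2\log L(\beta_0^*,\beta_1^*)}{\partial\beta_0\,\partial\beta_1}\\[2ex] \dfrac{\partial^2\log L(\beta_0^*,\beta_1^*)}{\partial\beta_1\,\partial\beta_0} & \dfrac{\partial^2\log L(\beta_0^*,\beta_1^*)}{\partial\beta_1^2}\end{pmatrix}, \qquad (6)$$

and $(\beta_0^*,\beta_1^*)$ is on the line segment joining $(\hat\beta_0,\hat\beta_1)$ and $(\beta_0,\beta_1)$.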
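The score and information terms above are the same quantities a Newton-Raphson routine uses to fit model (2). The following is a minimal sketch of such a fit, not the authors' implementation; the function names and toy data are hypothetical, and the common factor $n^{-1}$ is dropped because it cancels in $I^{-1}U$.

```python
import math


def mu(g, b0, b1):
    """Logistic probability of being a case given genotype or dosage g."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def score_and_information(dosages, status, b0, b1):
    """Unscaled score vector and information matrix entries for model (2)."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for g, d in zip(dosages, status):
        m = mu(g, b0, b1)
        h = m * (1.0 - m)
        u0 += d - m
        u1 += g * (d - m)
        i00 += h
        i01 += g * h
        i11 += g * g * h
    return (u0, u1), (i00, i01, i11)


def fit_expectation_substitution(dosages, status, n_iter=25):
    """Newton-Raphson fit; returns (b0_hat, b1_hat, estimated var of b1_hat)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(dosages, status, b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U, with the 2x2 inverse written out.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    _, (i00, i01, i11) = score_and_information(dosages, status, b0, b1)
    var_b1 = i00 / (i00 * i11 - i01 * i01)
    return b0, b1, var_b1


# Hypothetical toy data; with real imputed data the dosages are fractional.
dosages = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
status = [0, 0, 0, 1, 0, 1, 1, 1]
print(fit_expectation_substitution(dosages, status))
```

With these binary toy dosages the fit can be checked by hand: one of four subjects is a case at dosage 0 and three of four at dosage 1, so the slope estimate is the log odds ratio $\log 9 = 2\log 3$.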


Taking the expectation of $U(\beta_0,\beta_1)$ in Equation (4), we have

$$E\{U(\beta_0,\beta_1)\} = \begin{pmatrix} n^{-1}\sum_{i=1}^{n}\{E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\tilde g_i;\beta_0,\beta_1)\}\\[1ex] n^{-1}\sum_{i=1}^{n}\tilde g_i\{E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\tilde g_i;\beta_0,\beta_1)\}\end{pmatrix}. \qquad (7)$$

When $\beta_1 = 0$ (no association) or one of $p_{i0}$, $p_{i1}$, $p_{i2}$ is 1
(perfectly imputed), it is obvious that

$$E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\tilde g_i;\beta_0,\beta_1) = p_{i0}\,\mu(0;\beta_0,\beta_1) + p_{i1}\,\mu(1;\beta_0,\beta_1) + p_{i2}\,\mu(2;\beta_0,\beta_1) - \mu(p_{i1}+2p_{i2};\beta_0,\beta_1) = 0$$

and $\hat\beta_1$ is unbiased. Therefore, the expectation-substitution
method does not cause potential inflation in type I error
rate. On the other hand, if $\beta_1 \neq 0$ and the imputation is
imperfect, $\hat\beta_1$ from (4) is biased, which, as we show below,
could cause potential problems.
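The term that drives the bias, $E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\tilde g_i;\beta_0,\beta_1)$, is easy to evaluate numerically. The sketch below (hypothetical function name and example probabilities) shows that it vanishes under no association or perfect imputation and is otherwise small when $\beta_1$ is small.

```python
import math


def mu(g, b0, b1):
    """Logistic probability from model (1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def substitution_gap(p0, p1, p2, b0, b1):
    """E(d | p0, p1, p2) minus mu evaluated at the dosage p1 + 2 * p2."""
    expected_d = p0 * mu(0, b0, b1) + p1 * mu(1, b0, b1) + p2 * mu(2, b0, b1)
    return expected_d - mu(p1 + 2.0 * p2, b0, b1)


uncertain = (0.25, 0.5, 0.25)   # hypothetical, poorly imputed subject
print(substitution_gap(*uncertain, 0.0, 0.0))            # no association
print(substitution_gap(0.0, 1.0, 0.0, 0.0, math.log(2)))  # perfect imputation
print(substitution_gap(*uncertain, 0.0, math.log(1.2)))   # small effect
print(substitution_gap(*uncertain, 0.0, math.log(2.0)))   # larger effect
```

The gap comes from the curvature of the logistic function, so it grows with the effect size and with the spread of the genotype probabilities.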

OPTIMAL WEIGHT FOR META-ANALYSIS

WITH IMPUTED VALUES

Suppose for a given imputed SNP, the $\beta_1\ (\neq 0)$ estimate
from (4) in the ith study ($i = 1,\ldots,M$) is $\hat\beta_1^{i}$; the estimated
variance for $\hat\beta_1^{i}$ is $\hat V_i$; the weight for the ith study is $w_i$, then
the estimate for $\beta_1$ from the meta-analysis is

$$\hat\beta_1^{\mathrm{meta}} = \sum_{i=1}^{M} w_i\hat\beta_1^{i}.$$

Denote $E(\hat\beta_1^{i})$ by $\mu_i$, the test statistic is

$$\hat\beta_1^{\mathrm{meta}}/\mathrm{se}(\hat\beta_1^{\mathrm{meta}}) = \frac{\sum_{i=1}^{M} w_i\hat\beta_1^{i}}{\sqrt{\sum_{i=1}^{M} w_i^2\hat V_i}} \to N\left(\frac{\sum_{i=1}^{M} w_i\mu_i}{\sqrt{\sum_{i=1}^{M} w_i^2 V_i}},\ 1\right). \qquad (8)$$

Based on (8), the optimal weight to maximize the power
to detect the association is equivalent to maximizing

$$\left(\frac{\sum_{i=1}^{M} w_i\mu_i}{\sqrt{\sum_{i=1}^{M} w_i^2 V_i}}\right)^{2}. \qquad (9)$$

A simple derivation shows that $w_i$ needs to be proportional
to $\mu_i/V_i$ in order to maximize (9). Note that even if the
effect size is the same across studies, $\mu_i$ may still vary
among studies because variation in imputation quality
between studies will yield a different degree of bias in the $\beta_1$
estimates. This contrasts with directly genotyped data,
where $\mu_i = \beta_1$ for all studies, so $w_i$ needs only to be
proportional to $1/\hat V_i$. However, this optimal weight, which
incorporates both the variance and $\mu_i$, is hard to estimate in
practice because of the difficulty in estimating $\mu_i$.

Fortunately, we can show theoretically that the bias of
$\hat\beta_1$ is very small when the true $\beta_1$ is small, regardless of
the imputation quality. For example, when $\beta_0 = 0$ and
$\beta_1 = \log(1.2)$, the bias of $\hat\beta_1$ satisfies $|E(\hat\beta_1) - \beta_1| < 0.002$; when
$\beta_1 = \log(1.5)$, $|E(\hat\beta_1) - \beta_1| < 0.02$. Further theoretical details
showing the upper bound of the bias are provided in
Appendix A. The theoretical results about the approximate
unbiasedness are also verified by extensive simulations in
the Results section.

Given the approximate unbiasedness of the $\beta_1$ estimators,

the optimal weight can therefore be approximated by the

regular inverse variance weight.
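The claim that weights proportional to $\mu_i/V_i$ maximize (9), and that inverse variance weights come close when the $\mu_i$ are nearly equal, can be checked numerically. The sketch below uses hypothetical values for two studies; the maximum of the noncentrality in (8) is $\sqrt{\sum_i \mu_i^2/V_i}$ by the Cauchy-Schwarz inequality. The pooled estimate normalizes the weights to sum to one, which is the usual convention and does not change the test statistic.

```python
import math


def noncentrality(weights, means, variances):
    """Mean of the test statistic in (8) for given weights."""
    num = sum(w * m for w, m in zip(weights, means))
    den = math.sqrt(sum(w * w * v for w, v in zip(weights, variances)))
    return num / den


def optimal_weights(means, variances):
    """Weights proportional to mu_i / V_i."""
    return [m / v for m, v in zip(means, variances)]


def inverse_variance_weights(variances):
    """Weights proportional to 1 / V_i."""
    return [1.0 / v for v in variances]


def ivm_estimate(estimates, variances):
    """Inverse-variance-weighted pooled estimate and its standard error."""
    w = inverse_variance_weights(variances)
    total = sum(w)
    beta = sum(wi * b for wi, b in zip(w, estimates)) / total
    return beta, math.sqrt(1.0 / total)


# Hypothetical values: the second study is slightly more attenuated
# (smaller mu) and less precise (larger V).
means = [0.18, 0.17]
variances = [0.01, 0.02]
print(noncentrality(optimal_weights(means, variances), means, variances))
print(noncentrality(inverse_variance_weights(variances), means, variances))
print(ivm_estimate([0.2, 0.1], variances))
```

With these inputs the two noncentrality values differ only in the fourth significant digit, which illustrates why the inverse variance weight is a good stand-in for the optimal weight when bias is small.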

INVERSE VARIANCE INCORPORATES IMPUTATION QUALITY

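We have shown the inverse variance weight can
approximate the optimal weight. For imputed SNPs, it
seems natural that the weight for $\hat\beta_1$ should increase as
imputation quality increases. For this reason, we will
explore whether the IVM scheme incorporates imputation
quality. In the expectation-substitution method, the
variance of $(\hat\beta_0,\hat\beta_1)'$ can be estimated by $I^{-1}(\hat\beta_0,\hat\beta_1)$. Let
$h(g;\hat\beta_0,\hat\beta_1) = \mu(g;\hat\beta_0,\hat\beta_1)\{1 - \mu(g;\hat\beta_0,\hat\beta_1)\}$, we have

$$\widehat{\mathrm{var}}(\hat\beta_1) = \frac{\sum_{i=1}^{n} h(\tilde g_i;\hat\beta_0,\hat\beta_1)}{\sum_{i=1}^{n} h(\tilde g_i;\hat\beta_0,\hat\beta_1)\sum_{i=1}^{n}\tilde g_i^2\,h(\tilde g_i;\hat\beta_0,\hat\beta_1) - \left\{\sum_{i=1}^{n}\tilde g_i\,h(\tilde g_i;\hat\beta_0,\hat\beta_1)\right\}^2}. \qquad (10)$$

The first derivative of $h(g;\hat\beta_0,\hat\beta_1)$ with respect to g is
$\hat\beta_1\exp(\hat\beta_0+\hat\beta_1 g)\{1-\exp(\hat\beta_0+\hat\beta_1 g)\}/\{1+\exp(\hat\beta_0+\hat\beta_1 g)\}^3$, which
is approximately 0 when $\hat\beta_1$ is sufficiently small. Hence,
we can consider $h(\tilde g_i;\hat\beta_0,\hat\beta_1)$ as a constant c, and write
Equation (10) as

$$\mathrm{var}(\hat\beta_1) \approx (nc)^{-1}\frac{1}{n^{-1}\sum_{i=1}^{n}\tilde g_i^2 - \left(n^{-1}\sum_{i=1}^{n}\tilde g_i\right)^2} \approx (nc)^{-1}\{\mathrm{var}(g_i)\,R^2\}^{-1}, \qquad (11)$$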

where $R^2$ is the imputation quality measure in MACH [Li
et al., 2010], defined as the ratio of the sample variance of $\tilde g_i$
to the expected variance of $g_i$, which is equivalent to the
squared correlation between true and imputed genotypes.
From Equation (11), we can see that the inverse variance of
$\hat\beta_1$ is approximately proportional to the imputation quality.

Thus, we show that the current IVM scheme automatically

incorporates imputation quality in the meta-analysis.

Simulation results confirm the positive correlation between

the imputation quality and inverse variances (see Results

section).
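The relation between the variance of $\hat\beta_1$ and the spread of the dosages can be illustrated with a few lines of code. This is a sketch under stated assumptions: the dosages are hypothetical, the constant c is taken to be h evaluated at the mean dosage (the text only requires that h be roughly constant), and poorer imputation is mimicked by shrinking the dosages toward their mean, which lowers their sample variance and hence $R^2$ for a fixed var($g_i$).

```python
import math


def mu(g, b0, b1):
    """Logistic probability from model (2)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def variance_exact(dosages, b0, b1):
    """Model-based variance of beta1-hat, as in Equation (10)."""
    s0 = s1 = s2 = 0.0
    for g in dosages:
        m = mu(g, b0, b1)
        h = m * (1.0 - m)
        s0 += h
        s1 += g * h
        s2 += g * g * h
    return s0 / (s0 * s2 - s1 * s1)


def variance_approx(dosages, b0, b1):
    """First approximation in (11): h replaced by a constant c."""
    n = len(dosages)
    mean = sum(dosages) / n
    sample_var = sum(g * g for g in dosages) / n - mean * mean
    m = mu(mean, b0, b1)
    c = m * (1.0 - m)   # assumption: h evaluated at the mean dosage
    return 1.0 / (n * c * sample_var)


def shrink(dosages, factor):
    """Pull dosages toward their mean; sample variance scales by factor^2."""
    mean = sum(dosages) / len(dosages)
    return [mean + factor * (g - mean) for g in dosages]


dosages = [0.0, 0.5, 1.0, 1.5, 2.0] * 200   # hypothetical dosages, n = 1,000
b0, b1 = 0.0, math.log(1.2)
print(variance_exact(dosages, b0, b1), variance_approx(dosages, b0, b1))
halved = shrink(dosages, 0.5)               # dosage variance drops by a factor of 4
print(variance_approx(halved, b0, b1) / variance_approx(dosages, b0, b1))
```

Quartering the dosage variance multiplies the approximate variance of $\hat\beta_1$ by four, so the inverse variance weight falls in proportion to the imputation quality.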

Another interesting observation is that there is a
connection between the IVM scheme and the "imputation
aware" method in Zaitlen and Eskin [2010] through (11).
Note that the IVM estimator can be written as

$$\sum_{i=1}^{M}\hat V_i^{-1}\hat\beta_1^{i} \approx \sum_{i=1}^{M} nR^2\,\mathrm{var}(g_i)\,c\,\hat\beta_1^{i}, \qquad (12)$$

and the "imputation aware" method can be written as

$$\sum_{i=1}^{M} R\sqrt{n}\,z_i = \sum_{i=1}^{M} R\sqrt{n}\,\hat\beta_1^{i}\Big/\sqrt{\hat V_i} \approx \sum_{i=1}^{M} nR^2\sqrt{\mathrm{var}(g_i)\,c}\,\hat\beta_1^{i}. \qquad (13)$$

We can see that the only difference between (12) and (13) is
the var($g_i$) part. Since var($g_i$) depends on minor allele
frequency (MAF), we expect those two methods to perform
similarly when the MAFs of the SNP across studies are
similar. Generally, we do not expect the MAF to vary
much for studies with similar ethnicity. However, if

meta-analysis was conducted across different ethnic

groups [Xiong et al., 2009; Chapman et al., 2008], the

MAF variation can be substantial. In such cases, we expect

the IVM method to have better power.
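The comparison between the IVM scheme and the "imputation aware" method can be made concrete by computing the mean of each combined test statistic. The sketch below is illustrative only. It assumes unbiased estimates, takes each study's variance from the approximation in (11) as $V_i \approx 1/\{n_i c\,\mathrm{var}(g_i) R_i^2\}$ with var($g_i$) = 2 MAF (1 - MAF) under Hardy-Weinberg equilibrium and c = 0.25, and uses hypothetical sample sizes and imputation qualities.

```python
import math
from statistics import NormalDist


def approx_variance(n, r2, maf, c=0.25):
    """Approximate var(beta1-hat): 1 / (n * c * var(g) * R^2), var(g) under HWE."""
    var_g = 2.0 * maf * (1.0 - maf)
    return 1.0 / (n * c * var_g * r2)


def noncentrality(beta, z_weights, variances):
    """Mean of sum(a_i z_i) / sqrt(sum(a_i^2)) when E(z_i) = beta / sqrt(V_i)."""
    num = sum(a * beta / math.sqrt(v) for a, v in zip(z_weights, variances))
    return num / math.sqrt(sum(a * a for a in z_weights))


def power(ncp, alpha=5e-8):
    """Two-sided power of a N(ncp, 1) statistic at significance level alpha."""
    nd = NormalDist()
    crit = nd.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - nd.cdf(crit - ncp)) + nd.cdf(-crit - ncp)


def compare(beta, ns, r2s, mafs):
    """Noncentrality of each scheme, expressed as weights on the z-scores."""
    vs = [approx_variance(n, r2, maf) for n, r2, maf in zip(ns, r2s, mafs)]
    schemes = {
        "IVM": [1.0 / math.sqrt(v) for v in vs],
        "imputation aware": [math.sqrt(r2 * n) for n, r2 in zip(ns, r2s)],
        "MetaZ": [math.sqrt(n) for n in ns],
    }
    return {name: noncentrality(beta, a, vs) for name, a in schemes.items()}


beta = math.log(1.5)
same_maf = compare(beta, [2000, 2000], [0.25, 0.98], [0.2, 0.2])
diff_maf = compare(beta, [2000, 2000], [0.25, 0.98], [0.1, 0.4])
for label, result in (("equal MAF", same_maf), ("unequal MAF", diff_maf)):
    for name, ncp in result.items():
        print(label, name, round(power(ncp), 3))
```

Under these assumptions the two methods coincide when the MAFs are equal, and the IVM scheme has the larger noncentrality when they differ, in line with the argument above.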

RESULTS

In this section, we first use simulation to demonstrate

the finite sample properties of $\hat\beta_1$ given by the expectation-substitution
method, such as the approximate unbiasedness
and the relationship between $\mathrm{var}(\hat\beta_1)$ and imputation
quality. Then, we compare the power of the IVM method
in the meta-analysis with various other methods.

FINITE SAMPLE PROPERTIES OF $\hat\beta_1$

Simulation situations. We generated the genotypes
of two SNPs, considering a range of MAF combinations
$(f_1,f_2)$ of the two SNPs, and a range of LD, measured as $D'$. To
mimic the imputation scenario, we assumed that genotypes
of the second SNP are unknown, and imputed its dosage
based on the genotypes of the first SNP. We varied the
imputation quality by changing the LD measure $D'$.
A population of 10,000 was generated based on the logistic
regression model in Equation (1) with genotypes at the
second SNP as the $g_i$'s, $\beta_0 = 0$, and $\beta_1 = \log(1.2)$,
$\log(1.5)$, $\log(2)$, corresponding to odds ratios 1.2, 1.5,
and 2. Then 1,000:1,000 case-control samples were randomly
selected from this population of 10,000. We fit
model (2) to the case-control samples with the imputed
dosage at the second SNP as $\tilde g_i$. For comparison, we also
fitted model (1) with the true genotype $g_i$. For each
parameter setting, we replicated the above procedure
10,000 times. The results are summarized in Table I.

When $\beta_1 = 0$, all the estimated type I error rates are well
controlled at the nominal $\alpha$ level 0.05. When $\beta_1 = \log(1.2)$
and $\log(1.5)$, the relative bias of $\hat\beta_1$ is very small (< 2%),
regardless of the MAF of both SNPs. In contrast, when $\beta_1$ is

TABLE I. Simulation results of the expectation-substitution method under various parameter settings based on 10,000
simulated data sets, each with 1,000 cases and 1,000 controls

$\beta_1$   $f_1$  $f_2$  $D'$   Bias (%)  SE     SD     SD*    95% CP  $R^2$  Power    Power*
0          0.2    0.2    0.5    NA^a      0.160  0.158  0.079  0.948   0.250  0.052^b  0.048^b
                         0.7    NA^a      0.112  0.113  0.079  0.953   0.490  0.047^b  0.048^b
                         0.9    NA^a      0.088  0.088  0.079  0.952   0.810  0.048^b  0.05^b
                         0.99   NA^a      0.080  0.080  0.079  0.951   0.980  0.049^b  0.048^b
           0.4    0.3    0.5    NA^a      0.174  0.172  0.069  0.949   0.161  0.051^b  0.049^b
                         0.7    NA^a      0.123  0.123  0.069  0.947   0.315  0.053^b  0.051^b
                         0.9    NA^a      0.094  0.096  0.069  0.954   0.521  0.046^b  0.052^b
                         0.99   NA^a      0.087  0.087  0.069  0.952   0.630  0.048^b  0.048^b
log(1.2)   0.2    0.2    0.5    -0.2      0.159  0.159  0.080  0.950   0.250  0.211    0.640
                         0.7    -0.5      0.115  0.113  0.080  0.948   0.490  0.362    0.626
                         0.9    -0.1      0.088  0.088  0.080  0.951   0.810  0.536    0.624
                         0.99   0.5       0.080  0.080  0.080  0.950   0.980  0.630    0.634
           0.4    0.3    0.5    0.0       0.172  0.172  0.069  0.948   0.161  0.185    0.753
                         0.7    0.1       0.124  0.123  0.069  0.949   0.315  0.312    0.750
                         0.9    0.1       0.095  0.096  0.069  0.951   0.521  0.475    0.756
                         0.99   -0.1      0.088  0.087  0.069  0.948   0.630  0.548    0.748
log(1.5)   0.2    0.2    0.5    -1.5      0.158  0.160  0.081  0.950   0.250  0.711    1.000
                         0.7    -0.5      0.115  0.115  0.081  0.949   0.490  0.942    1.000
                         0.9    -0.6      0.091  0.090  0.081  0.947   0.810  0.995    1.000
                         0.99   0.1       0.082  0.082  0.081  0.948   0.980  0.999    0.999
           0.4    0.3    0.5    -1.4      0.172  0.173  0.071  0.950   0.161  0.637    1.000
                         0.7    -1.7      0.124  0.124  0.071  0.949   0.315  0.898    1.000
                         0.9    -1.4      0.097  0.097  0.071  0.949   0.521  0.987    1.000
                         0.99   -1.5      0.089  0.088  0.071  0.948   0.630  0.996    1.000
log(2)     0.2    0.2    0.5    -4.1      0.165  0.161  0.085  0.941   0.250  0.984    1.000
                         0.7    -2.5      0.118  0.117  0.085  0.947   0.490  1.000    1.000
                         0.9    -0.8      0.094  0.093  0.085  0.946   0.810  1.000    1.000
                         0.99   0.2       0.086  0.085  0.085  0.947   0.980  1.000    1.000
           0.4    0.3    0.5    -5.1      0.174  0.174  0.074  0.942   0.161  0.965    1.000
                         0.7    -4.1      0.125  0.125  0.074  0.945   0.315  1.000    1.000
                         0.9    -3.6      0.099  0.099  0.074  0.940   0.521  1.000    1.000
                         0.99   -3.0      0.090  0.090  0.074  0.942   0.630  1.000    1.000

$\beta_1$ is the true value; $f_1$ and $f_2$ are the MAFs for SNP 1 (the marker) and 2 (the disease-causing SNP with missing genotypes), respectively;
$D'$ is the LD measure; Bias (%) is the percentage of relative bias ($100\{E(\hat\beta_1) - \beta_1\}/\beta_1$); SE and SD are standard error and standard deviation
estimates of $\hat\beta_1$ from 10,000 replicates, respectively; 95% CP is the estimated coverage probability for the 95% confidence interval; $R^2$ is the
imputation quality measure; Power is obtained at a significance level of 0.05. SD* and Power* are the counterparts of SD and Power when
fitting the model with true genotypes. MAF, minor allele frequency; SNP, single nucleotide polymorphism; LD, linkage disequilibrium.
^a The percentage of relative bias $100\{E(\hat\beta_1) - \beta_1\}/\beta_1$ is not defined when $\beta_1 = 0$.
^b Estimated type I error rate under the null.

larger, log(2), $\hat\beta_1$ slightly underestimates the true $\beta_1$ and the
bias is greater as the imputation quality worsens. Under
the simulation settings in Table I, for any given MAF
combinations $(f_1,f_2)$, $\beta_0$, $\beta_1$, and $D'$, we obtained a numeric
solution of $\beta_1^*$, where $\hat\beta_1 \to \beta_1^*$, by solving the following
system of equations:

$$E\left[d_i - \frac{\exp(\beta_0^*+\beta_1^*\tilde g_i)}{1+\exp(\beta_0^*+\beta_1^*\tilde g_i)}\right] = 0,$$

$$E\left[\tilde g_i\left\{d_i - \frac{\exp(\beta_0^*+\beta_1^*\tilde g_i)}{1+\exp(\beta_0^*+\beta_1^*\tilde g_i)}\right\}\right] = 0. \qquad (14)$$

Figure 1 shows that even with the worst imputation
quality in Table I (when $D' = 0.5$), the bias of $\hat\beta_1$ is still
less than 5% for an odds ratio as large as 2. Since it is

less common for the associated alleles identified by

GWAS to have an odds ratio greater than 2 [Hindorff

et al., 2009, 2011], this bias is not really problematic in

GWAS settings.
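A system like (14) can be solved with a few Newton steps once the joint distribution of the marker and the untyped SNP is written down. The sketch below is not the authors' procedure. It assumes Hardy-Weinberg equilibrium, positive LD, imputation probabilities equal to the true conditional genotype probabilities given the marker, and expectations taken under prospective sampling of the population rather than case-control sampling; all function names are hypothetical.

```python
import math


def mu(g, b0, b1):
    """Logistic disease probability from model (1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def marker_groups(f1, f2, d_prime, b0, b1):
    """For each marker genotype m: (P(m), dosage E[g | m], E[d | m])."""
    d = d_prime * min(f1 * (1 - f2), (1 - f1) * f2)
    # Haplotype (marker allele, causal allele) -> frequency.
    hap = {(1, 1): f1 * f2 + d, (1, 0): f1 * (1 - f2) - d,
           (0, 1): (1 - f1) * f2 - d, (0, 0): (1 - f1) * (1 - f2) + d}
    joint = [[0.0] * 3 for _ in range(3)]   # joint[m][g]
    for (m1, g1), p1 in hap.items():
        for (m2, g2), p2 in hap.items():
            joint[m1 + m2][g1 + g2] += p1 * p2
    groups = []
    for row in joint:
        pm = sum(row)
        dose = (row[1] + 2.0 * row[2]) / pm
        e_d = sum(row[g] * mu(g, b0, b1) for g in range(3)) / pm
        groups.append((pm, dose, e_d))
    return groups


def imputation_r2(f1, f2, d_prime):
    """Variance of the dosage divided by the variance of the true genotype."""
    groups = marker_groups(f1, f2, d_prime, 0.0, 0.0)
    mean = sum(pm * dose for pm, dose, _ in groups)
    var_dose = sum(pm * (dose - mean) ** 2 for pm, dose, _ in groups)
    return var_dose / (2.0 * f2 * (1.0 - f2))


def limiting_beta1(f1, f2, d_prime, b0, b1, n_iter=50):
    """Solve the two equations in (14) for beta1* by Newton's method."""
    groups = marker_groups(f1, f2, d_prime, b0, b1)
    s0 = s1 = 0.0
    for _ in range(n_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for pm, dose, e_d in groups:
            m = mu(dose, s0, s1)
            h = m * (1.0 - m)
            u0 += pm * (e_d - m)
            u1 += pm * dose * (e_d - m)
            i00 += pm * h
            i01 += pm * dose * h
            i11 += pm * dose * dose * h
        det = i00 * i11 - i01 * i01
        s0 += (i11 * u0 - i01 * u1) / det
        s1 += (i00 * u1 - i01 * u0) / det
    return s1


b1 = math.log(2.0)
b1_star = limiting_beta1(0.2, 0.2, 0.5, 0.0, b1)
print(round(imputation_r2(0.2, 0.2, 0.5), 3))   # 0.25, matching Table I
print(round(100.0 * (b1_star - b1) / b1, 1))    # relative bias (%)
```

Because the sampling scheme here is a simplification, the relative bias printed by this sketch need not match Figure 1 exactly, but it shows the same qualitative behaviour: no bias at $\beta_1 = 0$ or under perfect LD, and mild attenuation otherwise.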

In Table I, the mean of standard errors (SE) and the

standard deviation (SD) of the estimates over 10,000

simulated data sets agree with each other very well,

suggesting that the SE estimates are reliable. Furthermore,

the SE of $\hat\beta_1$ decreases as the imputation quality $R^2$ increases;
as a result, the power (Power) increases. As a comparison,
we also show the standard deviations of parameter estimates
(SD*) and power (Power*) if the genotypes for SNP 2 are
known. As we can see, SD* is never greater than SD and,
when $\beta_1 \neq 0$, Power* is never less than Power, which implies that
there is efficiency loss using imputed genotypes. For
example, when $\beta_1 = \log(1.2)$ and $f_1 = f_2 = 0.2$, the power loss
decreases from 67% to 0.6% as the imputation quality
increases. Taken together, we can see that even with very
small $R^2$, the power is still acceptable in many cases using

imputed genotypes. The estimated coverage probabilities are

all very close to the nominal value 0.95, indicating that the

confidence interval estimates are very accurate.
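The power loss figures quoted above follow directly from the Power and Power* columns of Table I for $\beta_1 = \log(1.2)$ and $f_1 = f_2 = 0.2$. A short check, with a hypothetical helper name:

```python
def relative_power_loss(power_imputed, power_true):
    """Fraction of the true-genotype power lost by using imputed dosages."""
    return (power_true - power_imputed) / power_true


# Table I, beta1 = log(1.2), f1 = f2 = 0.2.
print(round(100 * relative_power_loss(0.211, 0.640), 1))  # D' = 0.5:  67.0
print(round(100 * relative_power_loss(0.630, 0.634), 1))  # D' = 0.99: 0.6
```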

Real imputation data. In order to explore the
performance of the expectation-substitution method in a
more realistic setting, we used GWAS scans from the Prostate,
Lung, Colorectal, and Ovarian Cancer Screening Trial
(PLCO) [Prorok et al., 2000; Hayes et al., 2000]. PLCO is a
randomized, two-arm trial coordinated by the NCI in 10
US centers.

The PLCO data include 2,520 samples, genotyped on

Illumina Human Hap 300k&240k, 550k and 610k plat-

forms. We randomly selected 1,000 genotyped SNPs on

chromosome 22 and masked their genotypes. Then we

used MACH to impute the genotypes of the 1,000 SNPs as

if they were untyped, using HapMap II release 24 as the

reference panel. In this way, we have both the true

genotypes and the imputed dosages. Similarly, as in the

previous section, case-control samples were generated

based on model (1) using true genotypes and $\hat\beta_1$ was
estimated by fitting the model (2) with imputed dosages.
We set $\beta_1$ to be 0, log(1.2), log(1.5), and log(2). For each
value of $\beta_1$, we replicated the procedure 50 times for each of
the 1,000 SNPs. Figure 2 shows a boxplot of the percent
bias of $\hat\beta_1$ for SNPs grouped by MAF and $R^2$. We can see that
$\hat\beta_1$ is approximately unbiased regardless of the imputation
quality $R^2$, which agrees with the theoretical results. On the
other hand, the variability of the estimates is much greater
when $R^2 < 0.3$ and MAF < 0.05.

PERFORMANCE OF IVM IN THE META-ANALYSIS

We generated the data in the same way as the previous

section. Here, we let $\beta_1$ take 10 equally spaced values from
0.05 to log(2), the MAFs $(f_1,f_2)$ of the two SNPs be (0.2, 0.2) for
both studies, and the LD measure $D'$ be 0.5 and 0.99 for the two

studies, respectively. We conducted meta-analysis for the

two studies using the following four methods and

compared their power:

1. The optimal weighting, which is proportional to $\mu_i/\hat V_i$.
In practice, it is usually impossible to estimate $\mu_i$.
However, with $\beta_1$, $f_1$, $f_2$, and $D'$ known in the simulation,
we can compute $\mu_i$ from (14). Hence, we can estimate
the optimal weight for the purpose of comparison.

2. The IVM method, which is an approximation of the

optimal weighting under practical situations in GWAS.

3. The ‘‘imputation aware’’ method by Zaitlen and Eskin

[2010].

4. The regular z-score meta-analysis (MetaZ) method

without correcting for imputation quality.

Fig. 1. The theoretical relative bias (%) of $\hat\beta_1$ as a function of
the true $\beta_1$. The biases are computed from (14) with different $f_1$, $f_2$,
and $\beta_1$. $\beta_0$ is fixed at 0 and $D'$ is fixed at 0.5.
[Plot not reproduced. x-axis: $\beta_1$ at 0, log(1.2), log(1.5), log(2); y-axis: Bias (%); curves: $f_1 = 0.4$, $f_2 = 0.3$ and $f_1 = 0.2$, $f_2 = 0.2$.]

Fig. 2. Boxplot of the bias of $\hat\beta_1$ of SNPs grouped by different
MAF and $R^2$ categories. MAF, minor allele frequency; SNP,
single nucleotide polymorphism.
[Plot not reproduced. y-axis: Bias; categories: 1: MAF > 0.05 and $R^2$ < 0.3; 2: MAF > 0.05 and 0.3 < $R^2$ < 0.6; 3: MAF > 0.05 and $R^2$ > 0.6; 4: MAF < 0.05 and $R^2$ < 0.3; 5: MAF < 0.05 and 0.3 < $R^2$ < 0.6; 6: MAF < 0.05 and $R^2$ > 0.6.]

As we can see from Figure 3, the optimal weighting,

IVM and the ‘‘imputation aware’’ method have indis-

tinguishable performance. In addition, they are all more

powerful than the regular MetaZ method which does not

account for imputation quality. This confirms that the IVM

method is a good approximation of the optimal weight

and it automatically incorporates the imputation quality.

We also simulated a situation where the MAFs are

different between the two studies, which results in

different var($g_i$). Instead of letting the MAF = 0.2 for both
studies, we let the MAF = 0.1 for the first study and 0.4 for
the second study. The power comparison is shown in

Figure 4. As we expected, the IVM method has better

performance than the ‘‘imputation aware’’ method in this

case because it is an approximation to the optimal weight.

In practice, we would not expect MAFs to differ substantially

for studies of similar populations. However, for a cross-

ethnicity meta-analysis, the IVM is superior to the

‘‘imputation aware’’ method since it accounts for the

MAF variation among different ethnic groups.

DISCUSSION

As imputation has been widely used to recover

information from GWAS data, the expectation-substitution

method is the most commonly used method to analyze

imputed SNPs while accounting for genotype uncertainty.

Our work shows, both numerically and theoretically, that

the expectation-substitution method gives approximately

unbiased estimates under practical conditions of low effect

sizes for GWAS of common diseases. We also show
that the IVM scheme approximates the optimal weight
well and, in our simulations, has power similar to or better
than that of the other meta-analysis methods compared.

Two recent articles have outlined the advantages of

using meta-analysis, and discussed study design, quality

control, and analysis issues to consider when implement-

ing meta-analysis of GWAS data [Cantor et al., 2010;

Zeggini and Ioannidis, 2009]. These articles address

weighting schemes for combining results, but focus more

on random-effects vs. fixed-effects analysis, rather than on

methods to include imputation quality.

The different imputation software packages provide

information not only on the probability of each genotype

but also an overall imputation quality measure. This

measure is typically defined as the ratio of the sample

variance of the genotype to the expected variance, with

lower scores indicating less well-imputed SNPs. Studies

often exclude SNPs with either low $R^2$ or low MAF.
A threshold of imputation $R^2 = 0.3$ has been recommended
by MACH as the imputation quality cut-off for estimates
[MACH Homepage]. Our results show that in terms of
bias, the combination of imputation quality and MAF
seems to be most relevant. In particular, we show that the
variability of estimates is large for lower imputation
quality and lower MAF. In current practice, rare variants
(MAF < 0.05) are often excluded from imputation and
subsequent meta-analysis. In this situation, either not
using a filter, or using a filter based only on $R^2$, is likely
sufficient. However, as meta-analyses grow larger and
data become available to impute rare variants, we
recommend using both the imputation quality and the
MAF to set filtering criteria. For example, in our
simulation results (Fig. 2), the optimal filter appears to
be excluding SNPs with both MAF < 0.05 and $R^2 < 0.3$,
rather than all SNPs with $R^2 < 0.3$. In this article, we used
the imputation quality measure $R^2$ defined by MACH [Li
et al., 2010], which is the squared correlation between true

Fig. 3. The power of optimal weighting (Optimal), IVW method,
"imputation aware" method ("Imputation Aware" Z), and the
regular z-score meta-analysis without imputation quality
(MetaZ) from the meta-analysis of two studies. The MAFs of
the disease-causing SNP in both studies are 0.2. A commonly
used genome-wide P-value cut-off $5\times10^{-8}$ is used as the
significance level. IVW, inverse variance weighting; MAF,
minor allele frequency; SNP, single nucleotide polymorphism.
[Plot not reproduced. x-axis: $\beta_1$ from 0.1 to 0.7; y-axis: power from 0.0 to 1.0.]

Fig. 4. The power of optimal weighting (Optimal), IVW method,
"imputation aware" method ("Imputation Aware" Z), and the
regular z-score meta-analysis without imputation quality
(MetaZ) from the meta-analysis of two studies. The MAFs of
the disease-causing SNP in the two studies are 0.1 and 0.4,
respectively. A commonly used genome-wide P-value cut-off
$5\times10^{-8}$ is used as the significance level. MAF, minor allele
frequency; IVW, inverse variance weighting; SNP, single
nucleotide polymorphism.
[Plot not reproduced. x-axis: $\beta_1$ from 0.1 to 0.7; y-axis: power from 0.0 to 0.8.]

genotypes and imputed dosages. In Beagle [Browning and
Browning, 2009], $R^2$ is defined as the squared correlation
between true and the most likely genotypes. To investigate
whether the choice of different quality measures makes
much difference, we randomly chose 10,000 imputed
SNPs on chromosome 22 in the PLCO data [Prorok et al.,
2000; Hayes et al., 2000] and computed their MACH $R^2$
and Beagle $R^2$. It turns out that the two $R^2$'s are highly
correlated (r > 0.99). Thus, although the cut-offs for the two
$R^2$'s could be slightly different, the general conclusion

should still hold.
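The filtering rule suggested in this paragraph, excluding only SNPs that are both rare and poorly imputed, can be written as a one-line predicate. The function name is hypothetical; the default cut-offs are the MAF < 0.05 and $R^2$ < 0.3 thresholds discussed above.

```python
def keep_snp(maf, r2, maf_cut=0.05, r2_cut=0.3):
    """Keep a SNP unless it is both rare and poorly imputed."""
    return not (maf < maf_cut and r2 < r2_cut)


# Hypothetical SNPs: (MAF, R^2).
for maf, r2 in [(0.20, 0.25), (0.02, 0.25), (0.02, 0.80)]:
    print(maf, r2, keep_snp(maf, r2))
```

A common SNP with low $R^2$ is kept under this rule, whereas a filter on $R^2$ alone would drop it.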

As we move into the post-GWAS era, our results

provide important guidance for investigators on how to

optimally conduct meta-analysis in the presence of

imputed genotypes for marginal SNP associations. We

support the current practice of using the expectation-

substitution method and the IVM in meta-analysis.

Additional theoretical and numerical work is needed to

evaluate the use of imputed data in more sophisticated

analysis, including proposed methods for gene-gene and

gene-environment interactions.

ACKNOWLEDGMENTS

We thank two reviewers for their helpful comments.

This work was supported by the National Institutes of Health (5R01CA059045 and 5U01CA137088, R01AG14358, P01CA53996). Genotype data included in these analyses from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial were supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics and supported by contracts from the Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Department of Health and Human Services. The authors thank Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention, National Cancer Institute, the Screening Center investigators and staff of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, Mr. Tom Riley and staff, Information Management Services, Inc., Ms. Barbara O’Brien and staff, Westat, Inc., and Drs. Bill Kopp, Wen Shao, and staff, SAIC-Frederick. Most importantly, we acknowledge the study participants for their contributions to making this study possible.

Data included in these analyses were also generated from the GWAS of Lung Cancer and Smoking. Funding for this work was provided through the National Institutes of Health Genes, Environment and Health Initiative [NIH GEI] (Z01 CP 010200). The human subjects participating in the GWAS were from The Environment and Genetics in Lung Cancer Etiology (EAGLE) case-control study and the Prostate, Lung, Colon and Ovarian Screening Trial, and these studies are supported by intramural resources of the National Cancer Institute. Assistance with genotype cleaning, as well as with general study coordination, was provided by the Gene Environment Association Studies, GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01 HG 004438). The data sets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number ph000093 v2.p2.c1.

In addition, data generated from the Cancer Genetic Markers of Susceptibility (CGEMS) [CGEMS] prostate cancer scan were also included in this analysis. The data sets used for the analyses described in this manuscript were accessed with appropriate approval through the dbGaP online resource (http://www.cgems.cancer.gov/data) through dbGaP accession number 000207 v.1p1.c1.

REFERENCES

Barrett JC, Cardon LR. 2006. Evaluating coverage of genome-wide

association studies. Nat Genet 38:659–662.

Browning BL, Browning SR. 2009. A unified approach to genotype

imputation and haplotype phase inference for large data sets of

trios and unrelated individuals. Am J Hum Genet 84:210–223.

Cancer Genetic Markers of Susceptibility (CGEMS) Data. 2009. http://cgems.cancer.gov/data/. May 10, 2009.

Cantor RM, Lange K, Sinsheimer JS. 2010. Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am J Hum Genet 86:6–22.

Chapman K, Takahashi A, Meulenbelt I, Rodriguez J, Egli R, Tsezou A, Malizos KN, Kloppenburg M, Southam L, Breggen R, Donn R, Qin J, Doherty M, Slagboom PE, Wallis G, Kamatani N, Jiang Q, Gonzalez A, Loughlin J, Ikegawa S. 2008. A meta-analysis of European and Asian cohorts reveals a global role of a functional SNP in the 5′UTR of GDF5 with osteoarthritis susceptibility. Hum Mol Genet 17:1497–1504.

Cordell HJ. 2006. Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures. Genet Epidemiol 30:259–275.

de Bakker PIW, Ferreira MAR, Jia X, Neale BM, Raychaudhuri S, Voight BF. 2008. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17:R122–R128.

Hayes RB, Reding D, Kopp W, Subar AF, Bhat N, Rothman N, Caporaso N, Ziegler RG, Johnson CC, Weissfeld JL, Hoover RN, Hartge P, Palace C, Gohagan JK, Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial Project Team. 2000. Etiologic and early marker studies in the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 21:349S–355S.

Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. 2009. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106:9362–9367.

Hindorff LA, Junkins HA, Hall PN, Mehta JP, Manolio TA. 2011. A catalog of published genome-wide association studies. Available at: www.genome.gov/gwastudies. Accessed March 29.

Kraft P, Stram OD. 2007. RE: the use of inferred haplotypes in downstream analysis. Am J Hum Genet 81:863–865.

Kraft P, Cox DG, Paynter RA, Hunter D, De Vivo I. 2005. Accounting for haplotype uncertainty in matched association studies: a comparison of simple and flexible techniques. Genet Epidemiol 28:261–272.

Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. 2010. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34:816–834.

Lin DY, Huang BE. 2007. The use of inferred haplotypes in downstream analyses. Am J Hum Genet 80:577–579.

MACH Homepage. http://www.sph.umich.edu/csg/yli/mach/tour/imputation.html.

Marchini J, Howie B, Myers S, McVean G, Donnelly P. 2007. A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet 39:906–913.

Prentice RL, Pyke R. 1979. Logistic disease incidence models and case-control studies. Biometrika 66:403–411.

Prorok PC, Andriole GL, Bresalier RS, Buys SS, Chia D, Crawford ED,

Fogel R, Gelmann EP, Gilbert F, Hasson MA, Hayes RB,

Johnson CC, Mandel JS, Oberman A, O’Brien B, Oken MM,

Rafla S, Reding D, Rutt W, Weissfeld JL, Yokochi L, Gohagan JK,

Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial

Project Team. 2000. Design of the prostate, lung, colorectal and

ovarian (PLCO) cancer screening trial. Control Clin Trials 21:

273S–309S.

Servin B, Stephens M. 2007. Imputation-based analysis of association

studies: candidate regions and quantitative traits. PLoS Genet 3:

e114.

Soranzo N, Rivadeneira F, Chinappen-Horsley U, et al. 2009. Meta-

analysis of genome-wide scans for human adult stature identifies

novel loci and associations with measures of skeletal frame size.

PLoS Genet 5:e1000445.

The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437:1299–1320.

The International HapMap Consortium. 2007. A second generation

human haplotype map of over 3.1 million SNPs. Nature 449:

851–861.

The 1000 Genomes Project Consortium. 2010. A map of human

genome variation from population-scale sequencing. Nature 467:

1061–1073.

Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, Heid IM, Berndt SI, Elliott AL, Jackson AU, Lamina C, Lettre G, Lim N, Lyon HN, McCarroll SA, Papadakis K, Qi L, Randall JC, Roccasecca RM, Sanna S, Scheet P, Weedon MN, Wheeler E, Zhao JH, Jacobs LC, Prokopenko I, Soranzo N, Tanaka T, Timpson NJ, Almgren P, Bennett A, Bergman RN, Bingham SA, Bonnycastle LL, Brown M, Burtt NP, Chines P, Coin L, Collins FS, Connell JM, Cooper C, Smith GD, Dennison EM, Deodhar P, Elliott P, Erdos MR, Estrada K, Evans DM, Gianniny L, Gieger C, Gillson CJ, Guiducci C, Hackett R, Hadley D, Hall AS, Havulinna AS, Hebebrand J, Hofman A, Isomaa B, Jacobs KB, Johnson T, Jousilahti P, Jovanovic Z, Khaw KT, Kraft P, Kuokkanen M, Kuusisto J, Laitinen J, Lakatta EG, Luan J, Luben RN, Mangino M, McArdle WL, Meitinger T, Mulas A, Munroe PB, Narisu N, Ness AR, Northstone K, O’Rahilly S, Purmann C, Rees MG, Ridderstråle M, Ring SM, Rivadeneira F, Ruokonen A, Sandhu MS, Saramies J, Scott LJ, Scuteri A, Silander K, Sims MA, Song K, Stephens J, Stevens S, Stringham HM, Tung YC, Valle TT, Van Duijn CM, Vimaleswaran KS, Vollenweider P, Waeber G, Wallace C, Watanabe RM, Waterworth DM, Watkins N, Wellcome Trust Case Control Consortium, Witteman JC, Zeggini E, Zhai G, Zillikens MC, Altshuler D, Caulfield MJ, Chanock SJ, Farooqi IS, Ferrucci L, Guralnik JM, Hattersley AT, Hu FB, Jarvelin MR, Laakso M, Mooser V, Ong KK, Ouwehand WH, Salomaa V, Samani NJ, Spector TD, Tuomi T, Tuomilehto J, Uda M, Uitterlinden AG, Wareham NJ, Deloukas P, Frayling TM, Groop LC, Hayes RB, Hunter DJ, Mohlke KL, Peltonen L, Schlessinger D, Strachan DP, Wichmann HE, McCarthy MI, Boehnke M, Barroso I, Abecasis GR, Hirschhorn JN, Genetic Investigation of ANthropometric Traits Consortium. 2008. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41:25–34.

Xiong DH, Liu XG, Guo YF, Tan LJ, Wang L, Sha BY, Tang ZH, Pan F, Yang TL, Chen XD, Lei SF, Yerges LM, Zhu XZ, Wheeler VW, Patrick AL, Bunker CH, Guo Y, Yan H, Pei YF, Zhang YP, Levy S, Papasian CJ, Xiao P, Lundberg YW, Recker RR, Liu YZ, Liu YJ, Zmuda JM, Deng HW. 2009. Genome-wide association and follow-up replication studies identified ADAMTS18 and TGFBR3 as bone mass candidate genes in different ethnic groups. Am J Hum Genet 84:388–398.

Zaitlen N, Eskin E. 2010. Imputation aware meta-analysis of genome-wide association studies. Genet Epidemiol 34:537–542.

Zeggini E, Ioannidis JP. 2009. Meta-analysis in genome-wide association studies. Pharmacogenomics 10:191–201.

APPENDIX A

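First, we introduce some notation. Let $\mu(\bar g_i;\beta_0,\beta_1) = \exp(\beta_0+\beta_1\bar g_i)/(1+\exp(\beta_0+\beta_1\bar g_i))$. For convenience, we will interchangeably use the notation $\mu(\cdot;\beta_0,\beta_1)$ and $\mu(\cdot)$ in the Appendix. Denote the first derivative of $\mu(g)$ with respect to $g$ as $\mu'(g)$. Define

\[ U_{\mathrm{int}} = \sup_{g\in\mathrm{int}}(\mu'(g)); \qquad L_{\mathrm{int}} = \inf_{g\in\mathrm{int}}(\mu'(g)); \]

\[ \Delta_U[x_1,x_2] = \left| U[x_1,x_2] - \frac{\mu(x_2)-\mu(x_1)}{x_2-x_1} \right|; \qquad \Delta_L[x_1,x_2] = \left| L[x_1,x_2] - \frac{\mu(x_2)-\mu(x_1)}{x_2-x_1} \right|, \]

where $U[x_1,x_2]$ and $L[x_1,x_2]$ are $U_{\mathrm{int}}$ and $L_{\mathrm{int}}$ for the interval $\mathrm{int}=[x_1,x_2]$.

Lemma 1 shows that the extrema of $|E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i;\beta_0,\beta_1)|$ can only be achieved on the boundary. It also computes the extrema of $|E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i;\beta_0,\beta_1)|$ on each boundary condition and chooses the maximum one as the upper bound for $|E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i;\beta_0,\beta_1)|$. Lemma 2 shows that there exists some $\delta(\beta_0,\beta_1)$ (which depends on the upper bound given by Lemma 1), such that when $\tilde\beta_1 \ge \beta_1+\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) > 0$ for any $\bar g_i \in [0,2]$; when $\tilde\beta_1 \le \beta_1-\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) < 0$ for any $\bar g_i \in [0,2]$. As a result, $\beta_1^*$, the root of the score equation $\sum_{i=1}^{n} \bar g_i\,(\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2})) = 0$, lies between $\beta_1-\delta(\beta_0,\beta_1)$ and $\beta_1+\delta(\beta_0,\beta_1)$. Given that $\hat\beta_1 \to \beta_1^*$, the theorem is proved.

Lemma 1. $|E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i;\beta_0,\beta_1)| \le M(\beta_0,\beta_1)\,\min(\bar g_i, 2-\bar g_i)$, where

\[ M(\beta_0,\beta_1) = \max(\Delta_U[0,2], \Delta_L[0,2], \Delta_U[0,1], \Delta_L[0,1], \Delta_U[1,2], \Delta_L[1,2]). \]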

Proof. We can rewrite $E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i)$ in terms of $p_{i0}$ and $\bar g_i$ by following the constraints $p_{i0}+p_{i1}+p_{i2} = 1$ and $p_{i1}+2p_{i2} = \bar g_i$. This gives

\[ f(\bar g_i, p_{i0}) = E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i) = p_{i0}\,\mu(0) + (2-2p_{i0}-\bar g_i)\,\mu(1) + (p_{i0}+\bar g_i-1)\,\mu(2) - \mu(\bar g_i). \]

The extrema of $f(\bar g_i, p_{i0})$ occur when the derivative equals 0 or at the boundary. Taking the first derivative of $f(\bar g_i, p_{i0})$ w.r.t. $p_{i0}$, we can see that there is no solution for the derivative equaling 0. So, the extrema can only occur at the boundary: $p_{i0} = 1-\bar g_i/2$ or $p_{i0} = 1-\bar g_i$ or $p_{i0} = 0$. We can calculate the extrema for each boundary condition.

When $p_{i0} = 1-\bar g_i/2$, $f(\bar g_i, p_{i0}) = (1-\bar g_i/2)\,\mu(0) + (\bar g_i/2)\,\mu(2) - \mu(\bar g_i)$. We can see that the value of $\mu(\bar g_i)$ is between $[\mu(0)+L[0,2]\,\bar g_i,\ \mu(0)+U[0,2]\,\bar g_i]$ and also $[\mu(2)-U[0,2]\,(2-\bar g_i),\ \mu(2)-L[0,2]\,(2-\bar g_i)]$. Plugging the upper and lower bounds of $\mu(\bar g_i)$ into $f(\bar g_i, p_{i0})$, we have $|f(\bar g_i, p_{i0})| < \max(\Delta_U[0,2], \Delta_L[0,2])\,\min(\bar g_i, 2-\bar g_i)$.

Similarly, we can show that when $p_{i0} = 1-\bar g_i$, $|f(\bar g_i, p_{i0})| < \max(\Delta_U[0,1], \Delta_L[0,1])\,\min(\bar g_i, 2-\bar g_i)$; when $p_{i0} = 0$, $|f(\bar g_i, p_{i0})| < \max(\Delta_U[1,2], \Delta_L[1,2])\,\min(\bar g_i, 2-\bar g_i)$.

Combining all the results above, we have

\[ |E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i;\beta_0,\beta_1)| \le M(\beta_0,\beta_1)\,\min(\bar g_i, 2-\bar g_i). \tag{A1} \]
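As a numerical sanity check of the bound in Equation (A1), the sketch below evaluates both sides on a grid of posterior probabilities. It assumes, as the expression for f in the proof implies, that E(d | p0, p1, p2) = p0 mu(0) + p1 mu(1) + p2 mu(2). The function names, the grid resolution, and the parameter values are illustrative choices, and the sup and inf of the derivative are approximated on a grid that includes the interval endpoints.

```python
import math


def mu(g, b0, b1):
    """Logistic mean mu(g; b0, b1) from the Appendix notation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def mu_prime(g, b0, b1):
    """Derivative of mu with respect to g."""
    m = mu(g, b0, b1)
    return b1 * m * (1.0 - m)


def delta_pair(x1, x2, b0, b1, steps=2000):
    """(Delta_U, Delta_L) on [x1, x2]; sup and inf of mu' are taken over a
    grid that includes both endpoints (a numerical stand-in for the exact
    sup and inf)."""
    slopes = [mu_prime(x1 + (x2 - x1) * k / steps, b0, b1)
              for k in range(steps + 1)]
    secant = (mu(x2, b0, b1) - mu(x1, b0, b1)) / (x2 - x1)
    return abs(max(slopes) - secant), abs(min(slopes) - secant)


def bound_constant(b0, b1):
    """M(b0, b1): the largest of the six Delta terms in Lemma 1."""
    return max(d for x1, x2 in [(0, 2), (0, 1), (1, 2)]
               for d in delta_pair(x1, x2, b0, b1))


def worst_excess(b0, b1, steps=50):
    """Largest value of |E(d | p) - mu(g_bar)| - M * min(g_bar, 2 - g_bar)
    over a grid of posterior probabilities (p0, p1, p2)."""
    m_const = bound_constant(b0, b1)
    mu0, mu1, mu2 = (mu(j, b0, b1) for j in range(3))
    worst = -math.inf
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            p0, p1, p2 = a / steps, b / steps, (steps - a - b) / steps
            g_bar = p1 + 2.0 * p2  # imputed dosage
            gap = abs(p0 * mu0 + p1 * mu1 + p2 * mu2 - mu(g_bar, b0, b1))
            worst = max(worst, gap - m_const * min(g_bar, 2.0 - g_bar))
    return worst


excess = worst_excess(0.0, math.log(1.5))
print(f"largest excess over the bound: {excess:.3e}")
```

An excess that never rises above zero (up to rounding) on the grid is consistent with Lemma 1; the check is illustrative and does not replace the proof.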

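Lemma 2. Let $\delta(\beta_0,\beta_1) = \sup_{\bar g_i\in[0,2]} M(\beta_0,\beta_1)/[(1-\mu(\bar g_i;\beta_0,\beta_1))\,\mu(\bar g_i;\beta_0,\beta_1)]$. Then when $\tilde\beta_1 \ge \beta_1+\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) > 0$ for any $\bar g_i \in [0,2]$; when $\tilde\beta_1 \le \beta_1-\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) < 0$ for any $\bar g_i \in [0,2]$.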


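Proof. Consider the following equation of $\tilde\beta_1$:

\[ \mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) = 0. \tag{A2} \]

The root for this equation would be

\[ \tilde\beta_1 = \frac{-\log(E(d_i \mid p_{i0},p_{i1},p_{i2})^{-1} - 1) - \beta_0}{\bar g_i}. \tag{A3} \]

Denote $E(d_i \mid p_{i0},p_{i1},p_{i2}) - \mu(\bar g_i)$ by $\Delta_i$. When $\Delta_i$ is small, $-\log(E(d_i \mid p_{i0},p_{i1},p_{i2})^{-1} - 1)$ in Equation (A3) can be approximated by $-\log(\mu(\bar g_i)^{-1} - 1) + \Delta_i/[(1-\mu(\bar g_i))\,\mu(\bar g_i)]$ following Taylor's expansion. As a result, $\tilde\beta_1 \approx \beta_1 + (\Delta_i/\bar g_i)/[(1-\mu(\bar g_i))\,\mu(\bar g_i)]$. From Equation (A1), $|\Delta_i| \le M(\beta_0,\beta_1)\,\min(\bar g_i, 2-\bar g_i)$. It follows that $|\tilde\beta_1 - \beta_1| < M(\beta_0,\beta_1)/[(1-\mu(\bar g_i))\,\mu(\bar g_i)]$.

Let $\delta(\beta_0,\beta_1) = \sup_{\bar g_i\in[0,2]} M(\beta_0,\beta_1)/[(1-\mu(\bar g_i))\,\mu(\bar g_i)]$. As $\mu(\bar g_i;\beta_0,\tilde\beta_1)$ is an increasing function of $\tilde\beta_1$, combining with the fact that the root for Equation (A2) is between $[\beta_1-\delta(\beta_0,\beta_1),\ \beta_1+\delta(\beta_0,\beta_1)]$, we can see that when $\tilde\beta_1 \ge \beta_1+\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) > 0$ for any $\bar g_i \in [0,2]$; and when $\tilde\beta_1 \le \beta_1-\delta(\beta_0,\beta_1)$, $\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2}) < 0$ for any $\bar g_i \in [0,2]$.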
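The root in Equation (A3) and its first-order approximation can be compared directly for a single individual. In the sketch below the posterior probabilities are hypothetical, the function names are illustrative, and E(d | p) is again taken to be p0 mu(0) + p1 mu(1) + p2 mu(2), as in the proof of Lemma 1.

```python
import math


def mu(g, b0, b1):
    """Logistic mean mu(g; b0, b1) from the Appendix notation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def exact_root(e_d, g_bar, b0):
    """Equation (A3): the slope that solves mu(g_bar; b0, slope) = E(d | p)."""
    return (-math.log(1.0 / e_d - 1.0) - b0) / g_bar


def taylor_root(e_d, g_bar, b0, b1):
    """First-order approximation b1 + (Delta_i / g_bar) / [(1 - mu) mu]."""
    m = mu(g_bar, b0, b1)
    delta_i = e_d - m
    return b1 + (delta_i / g_bar) / ((1.0 - m) * m)


b0, b1 = 0.0, math.log(1.2)
post = (0.5, 0.3, 0.2)            # hypothetical posterior probabilities
g_bar = post[1] + 2.0 * post[2]   # imputed dosage
e_d = sum(p * mu(j, b0, b1) for j, p in enumerate(post))

root = exact_root(e_d, g_bar, b0)
approx = taylor_root(e_d, g_bar, b0, b1)
print(f"exact root  : {root:.6f}")
print(f"Taylor root : {approx:.6f}")
print(f"true beta_1 : {b1:.6f}")
```

For effect sizes of this magnitude the exact and approximate roots should agree closely, and both stay near the true slope, which is the behaviour the proof relies on.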

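Theorem. $|E(\hat\beta_1) - \beta_1| < \delta(\beta_0,\beta_1)$.

Proof. $\beta_1^*$ is the root of the equation of $\tilde\beta_1$:

\[ \sum_{i=1}^{n} \bar g_i\,\{\mu(\bar g_i;\beta_0,\tilde\beta_1) - E(d_i \mid p_{i0},p_{i1},p_{i2})\} = 0. \tag{A4} \]

Applying Lemma 2, when $\tilde\beta_1 \ge \beta_1+\delta(\beta_0,\beta_1)$, the LHS of Equation (A4) will be positive; when $\tilde\beta_1 \le \beta_1-\delta(\beta_0,\beta_1)$, the LHS of Equation (A4) will be negative. As the LHS of Equation (A4) is also an increasing function of $\tilde\beta_1$, the root of Equation (A4), $\beta_1^*$, must lie between $[\beta_1-\delta(\beta_0,\beta_1),\ \beta_1+\delta(\beta_0,\beta_1)]$. Given that $\hat\beta_1 \to \beta_1^*$, we have $|E(\hat\beta_1) - \beta_1| < \delta(\beta_0,\beta_1)$.

To show the magnitude of $\delta(\beta_0,\beta_1)$, which is the upper bound of the bias of $\hat\beta_1$, we tried a few different values of $\beta_1$. For example, when $\beta_0 = 0$, $\beta_1 = \log(1.2)$, $\delta(\beta_0,\beta_1) = \Delta_U([0,2])/[(1-\mu(2;\beta_0,\beta_1))\,\mu(2;\beta_0,\beta_1)] = 0.002$; when $\beta_1 = \log(1.5)$, $\delta(\beta_0,\beta_1) = \Delta_U([0,2])/[(1-\mu(2;\beta_0,\beta_1))\,\mu(2;\beta_0,\beta_1)] = 0.02$. Those upper bounds of bias have also been confirmed by the simulation studies.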
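The two numerical values quoted above can be reproduced with a few lines of code. The sketch below evaluates only the expression given in the text, Delta_U([0,2]) / [(1 - mu(2)) mu(2)], with beta_0 = 0; it does not evaluate the full maximum over the six Delta terms in Lemma 1. The function names are illustrative, and the supremum of the derivative is approximated on a grid that includes both endpoints of [0, 2].

```python
import math


def mu(g, b0, b1):
    """Logistic mean mu(g; b0, b1) from the Appendix notation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))


def mu_prime(g, b0, b1):
    """Derivative of mu with respect to g."""
    m = mu(g, b0, b1)
    return b1 * m * (1.0 - m)


def delta_worked_example(b0, b1, steps=2000):
    """Delta_U([0, 2]) / [(1 - mu(2)) mu(2)], the expression quoted in the text."""
    upper = max(mu_prime(2.0 * k / steps, b0, b1) for k in range(steps + 1))
    secant = (mu(2, b0, b1) - mu(0, b0, b1)) / 2.0
    delta_u = abs(upper - secant)
    m2 = mu(2, b0, b1)
    return delta_u / ((1.0 - m2) * m2)


for odds_ratio in (1.2, 1.5):
    value = delta_worked_example(0.0, math.log(odds_ratio))
    print(f"beta_1 = log({odds_ratio}): delta = {value:.4f}")
```

The expression evaluates to roughly 0.0021 for beta_1 = log(1.2) and 0.0245 for beta_1 = log(1.5), in line with the rounded values 0.002 and 0.02 reported in the text.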
