
Optimal Methods for Meta-analysis of Genome-wide Association Studies

Baiyu Zhou1, Jianxin Shi2, and Alice S. Whittemore1,*

1Department of Health Research and Policy, Stanford University, Stanford, California

2Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland

Abstract

Meta-analysis of genome-wide association studies involves testing single nucleotide polymorphisms (SNPs) using summary statistics that are weighted sums of site-specific score or Wald statistics. This approach avoids having to pool individual-level data. We describe the weights that maximize the power of the summary statistics. For small effect-sizes, any choice of weights yields summary Wald and score statistics with the same power, and the optimal weights are proportional to the square roots of the sites' Fisher information for the SNP's regression coefficient. When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its standard deviation. We give simple approximations to the optimal weights for various phenotypes, and show that weights proportional to the square roots of study sizes are suboptimal for data from case-control studies with varying case-control ratios, for quantitative trait data when the trait variance differs across sites, for count data when the site-specific mean counts differ, and for survival data with different proportions of failing subjects. Simulations suggest that weights that accommodate inter-site variation in imputation error give little power gain compared with weights that ignore imputation uncertainty. We note advantages to combining site-specific score statistics, and we show how they can be used to assess effect-size heterogeneity across sites. The utility of the summary score statistic is illustrated by application to a meta-analysis of schizophrenia data in which only site-specific p-values and directions of association are available.

Keywords

combining GWAS; effect-size heterogeneity; meta-analysis; noncentrality parameter; score statistics; Wald statistics

Introduction

Combining data from multiple genome-wide association studies (GWAS) of a common outcome has emerged as a major tool for identifying susceptibility loci for human disease and other conditions [Scott et al., 2007; Zeggini et al., 2007; Shi et al., 2009]. The goals are to discover new variants missed by the individual studies, to identify variants for outcomes not considered when the data were collected (e.g., cancer survival), and, for associated variants, to assess effect size and its possible variation across sites. A challenge to achieving these goals is the need to assess and accommodate inter-site differences in study characteristics, such as demographic attributes of the populations studied, phenotypic aspects such as disease severity and extent of censoring of survival data, the SNPs included in the genotyping platforms used, and the choice and coding of covariates in need of adjustment.

*Correspondence to: Alice S. Whittemore, Department of Health Research and Policy, Stanford University School of Medicine, 259 Campus Drive, Stanford, CA 94305-5405, alicesw@stanford.edu, Tel: 650-723-5460.

NIH Public Access

Author Manuscript

Genet Epidemiol. Author manuscript; available in PMC 2012 November 1.

Published in final edited form as: Genet Epidemiol. 2011 November; 35(7): 581–591. doi:10.1002/gepi.20603.

The pooled analysis of individual-level data on phenotypes, genotypes and covariates has several advantages. Quality control can be implemented uniformly for all sites, and SNP genotypes can be imputed using data from all sites. Covariates common to multiple studies can be coded uniformly, and their regression coefficients can be estimated with all available data. Optimal likelihood-based methods can be used to test whether SNP regression coefficients are nonzero and whether they vary across study sites. Offsetting these advantages are the labor involved in assembling the raw data from each site and coding common covariates, and privacy issues that could limit the sharing of genotypes.

An alternative to pooled analysis is the meta-analysis of site-specific test statistics, each having approximately a standard normal distribution under the null hypothesis of no association. Here each site types or imputes genotypes for a common set of SNPs, and then calculates a covariate-adjusted statistic for each SNP. The site-specific statistics are either Wald statistics (ratios of regression coefficient estimates to their estimated standard deviations (SDs)) or score statistics (ratios of efficient scores to estimates of their null SDs) [Soranzo et al., 2009; Tanaka et al., 2009]. The coordinating center then computes a weighted sum of these statistics as a summary test statistic for the SNP [Cantor et al., 2010]. For reviews of issues in GWAS meta-analysis, see [Ioannidis 2007; Ioannidis et al., 2007; de Bakker et al., 2008; Guan and Stephens, 2008; Zeggini and Ioannidis, 2009; Cantor et al., 2010].
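As a minimal sketch, with illustrative numbers and hypothetical function names `weighted_z` and `ivw_z`, we assume the weighted sum is normalized as $\sum_k w_k Z_k / \sqrt{\sum_k w_k^2}$. With independent sites, that normalization keeps the summary statistic approximately standard normal under $H_0$. The sketch also checks that weights proportional to each site's reciprocal standard error reproduce the inverse-variance-weighted Wald statistic mentioned in the next paragraph. Those weights are the square roots of the sites' information.

```python
import math

def weighted_z(z, w):
    """Summary statistic sum_k w_k Z_k / sqrt(sum_k w_k^2).

    If the site statistics Z_k are independent and approximately N(0, 1)
    under H0, the summary statistic is also approximately N(0, 1) under H0.
    """
    if len(z) != len(w):
        raise ValueError("z and w must have the same length")
    num = sum(wk * zk for wk, zk in zip(w, z))
    return num / math.sqrt(sum(wk * wk for wk in w))

def ivw_z(beta_hat, se):
    """Inverse-variance-weighted mean of beta_hat divided by its SD."""
    inv_var = [1.0 / (s * s) for s in se]
    beta_bar = sum(b * v for b, v in zip(beta_hat, inv_var)) / sum(inv_var)
    return beta_bar * math.sqrt(sum(inv_var))   # dividing by SD = 1/sqrt(sum inv_var)

# Hypothetical site-level estimates and standard errors.
beta_hat = [0.10, 0.05, 0.12]
se = [0.04, 0.03, 0.06]
z_wald = [b / s for b, s in zip(beta_hat, se)]

print(ivw_z(beta_hat, se))
print(weighted_z(z_wald, [1.0 / s for s in se]))
```

The two printed values agree up to floating-point rounding. Algebraically, both reduce to $\sum_k \hat\beta_k/s_k^2 \big/ \sqrt{\sum_k 1/s_k^2}$.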

When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its estimated standard deviation [Cochran 1954]. However, there are practical advantages to combining score statistics rather than Wald statistics for GWAS meta-analyses. First, score statistics do not require iterative parameter estimation for each SNP to be tested. Estimates of the site-specific covariate effects need be calculated just once, under the global null hypothesis of no effect for any SNP; thus score statistics provide fast assessment of the statistical significance of large numbers of SNPs. Second, a SNP score statistic can be computed even when the site has provided only the SNP's p-value and direction of association. Third, the summary score statistic may be easier to interpret when the sites differ with respect to the covariates included in their regression models or with respect to their phenotype definition, measurement or coding. For example, a GWAS of nicotine addiction among current and former smokers might include sites whose outcome is a count of smoking cessation attempts, and other sites that simply classify subjects according to presence or absence of a successful cessation attempt. For a meta-analysis such as this, a combination of site-specific score statistics may be easier to interpret than a combination of site-specific Wald statistics, since the SNP regression coefficients in the latter are not comparable. Finally, site-specific score statistics have straightforward extensions to accommodate uncertainty due to imputation of unobserved genotypes, and we shall show that they can be used to assess heterogeneity of effect size across sites.
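The second advantage can be sketched as follows. We assume the reported two-sided p-value came from a test statistic that is approximately $N(0, 1)$ under $H_0$, and we assume the direction is coded as +1 or -1. The helper name `z_from_p` is hypothetical. The sketch computes $|Z| = -\Phi^{-1}(p/2)$ rather than $\Phi^{-1}(1 - p/2)$. For the very small p-values typical of GWAS, $1 - p/2$ rounds to exactly 1 in floating point, so the first form avoids that loss of precision.

```python
from statistics import NormalDist

def z_from_p(p_two_sided, direction):
    """Signed statistic recovered from a two-sided p-value and a direction (+1 or -1)."""
    if not 0.0 < p_two_sided <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    # |Z| = -Phi^{-1}(p/2), which stays accurate for tiny p-values.
    return -direction * NormalDist().inv_cdf(p_two_sided / 2.0)

print(z_from_p(0.05, +1))    # a site reporting p = 0.05 with positive direction
print(z_from_p(1e-100, -1))  # a very strong negative association
```

For example, a site reporting p = 0.05 with a positive direction contributes $Z \approx 1.96$.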

Here we focus on optimal weights for combining site-specific statistics to identify new variants and, for associated variants, to assess possible effect-size variation across sites. Asymptotically optimal weights for combining Wald statistics have been known for more than half a century [Cochran 1954; DerSimonian and Laird, 1986]. Surprisingly, however, despite their practical advantages, optimal weights for combining score statistics are not well known. In fact, some investigators suggest or use weights proportional to the square roots of the study-specific sample sizes [Soranzo et al., 2009; Willer et al., 2010; Hu et al., 2011]. We show that this strategy can be suboptimal when combining data from sites whose phenotypes have different null distributions. Instead, for the small effect sizes expected in GWAS meta-analysis, the optimal weights for combining score statistics are essentially the same as those used to combine Wald statistics. We provide explicit forms for the optimal weights for a general class of phenotypes that includes binary outcomes, quantitative traits, counts of events and censored survival outcomes.
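The suboptimality of $\sqrt{n}$ weights can be checked numerically under simple assumptions. We assume independent sites, a common effect $\beta$, and site statistics $Z_k$ that are approximately $N(\beta\sqrt{i_k}, 1)$, where $i_k$ is site k's information. The summary statistic $\sum_k w_k Z_k / \sqrt{\sum_k w_k^2}$ then has NCP $\sum_k w_k \beta\sqrt{i_k} / \sqrt{\sum_k w_k^2}$. By the Cauchy-Schwarz inequality, this NCP is maximized when $w_k \propto \sqrt{i_k}$, and its maximum is $\beta\sqrt{\sum_k i_k}$. The information values, sample sizes and effect size below are hypothetical.

```python
import math

def summary_ncp(weights, site_ncps):
    """NCP of sum_k w_k Z_k / sqrt(sum_k w_k^2) for independent unit-variance Z_k."""
    num = sum(w * x for w, x in zip(weights, site_ncps))
    return num / math.sqrt(sum(w * w for w in weights))

beta = 0.15                   # hypothetical common per-allele effect
info = [40.0, 15.0, 25.0]     # hypothetical site information i_k
sizes = [1000, 1000, 1500]    # hypothetical site sample sizes
site_ncps = [beta * math.sqrt(i) for i in info]

ncp_optimal = summary_ncp([math.sqrt(i) for i in info], site_ncps)
ncp_sqrt_n = summary_ncp([math.sqrt(n) for n in sizes], site_ncps)
print(f"sqrt(information) weights: NCP = {ncp_optimal:.3f}")
print(f"sqrt(sample size) weights: NCP = {ncp_sqrt_n:.3f}")
```

The $\sqrt{n}$ weights reach the optimum only when $i_k$ is proportional to $n_k$, that is, when the per-subject information is the same at every site.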

In the following Methods section we begin by reviewing the score and Wald statistics for testing the null hypothesis of no SNP-phenotype association at a single site, with application to case-control studies, quantitative traits, count data and censored survival data. We also show how the site-specific score statistics can be extended to handle imputation uncertainties, as noted by Marchini et al. [2007]. Then, for each of these phenotypes, we describe the optimal weights for combining site-specific score or Wald statistics. We also extend the weights to handle SNP effect sizes that vary across sites, and we show how the site-specific score statistics can be used to assess inter-site heterogeneity. The Methods section is followed by an evaluation of power for various summary score and Wald statistics, based on simulated case-control and censored survival data. An application to schizophrenia data shows that for binary outcomes, the score statistics can be used to assess inter-site effect-size heterogeneity even when only site-specific SNP p-values and directions of association are available. The final section concludes with a brief discussion.

Methods

Score and Wald Statistics for One Site

For simplicity we model the effects of each SNP as additive on an appropriate scale (the so-called trend or gene dosage model), though other genotype models can be used. Let $(y, x, g) = \{(y_j, x_j, g_j), j = 1, \ldots, n\}$ represent the data for the n study subjects. For subject j, $y_j$ is the phenotype, $x_j$ is a column vector of covariates whose first component may be one to accommodate an intercept, and $g_j = (g_{j1}, g_{j2}, \ldots)$ is a vector of minor allele counts for each of the SNPs under study. We focus on testing association for just one of these SNPs, and omit its subscript. We assume that the loglikelihood $\ell(y \mid x, g; \theta)$ for the phenotypes y conditional on covariates x and genotypes g depends on a parameter $\theta = (\alpha, \beta)$. Here α is an s-dimensional vector of parameters corresponding to an intercept and/or any covariates used by the site, and β specifies the trend relating phenotype to SNP minor allele count.

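We wish to test the null hypothesis $H_0: \theta = \theta_0 = (\alpha, 0)$. To do so, we introduce the Fisher information for θ, given by

$$ i(\theta) = \begin{pmatrix} i_{\alpha\alpha} & i_{\alpha\beta} \\ i_{\beta\alpha} & i_{\beta\beta} \end{pmatrix}. \tag{1} $$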

Here $i_{\alpha\alpha}$ is the $s \times s$ matrix whose entries are the negative expectations of the second derivatives of the loglikelihood function for θ with respect to α, and the other submatrices are defined analogously. Also let

$$ U(\theta) = \big(U_\alpha(\theta), U_\beta(\theta)\big) = \frac{\partial \ell(y \mid x, g; \theta)}{\partial \theta} $$

denote the $(s + 1)$-dimensional score vector, whose entries are the derivatives of the loglikelihood function with respect to $\theta = (\alpha, \beta)$. The covariate-adjusted efficient score for testing $\beta = 0$ is $U_\beta = U_\beta(\hat\alpha(0), 0)$, where $\hat\alpha(0)$ satisfies $U_\alpha(\hat\alpha(0), 0) = 0$. Under $H_0$ and mild regularity conditions [Cox and Hinkley, 1979], $U_\beta$ has an asymptotic Gaussian distribution with mean zero and variance

$$ i_\beta = i_{\beta\beta} - i_{\beta\alpha}\, i_{\alpha\alpha}^{-1}\, i_{\alpha\beta}, \tag{2} $$

evaluated at $(\hat\alpha(0), 0)$.
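Equation (2) is the Schur complement of $i_{\alpha\alpha}$ in (1). Equivalently, $i_\beta = 1/[i(\theta)^{-1}]_{\beta\beta}$. Below is a minimal NumPy sketch. It assumes β occupies the last row and column of the matrix, and the function name is hypothetical.

```python
import numpy as np

def efficient_information(info):
    """Efficient information (2) for beta from the full matrix (1).

    Assumes the first s rows/columns belong to alpha and the last to beta.
    Uses a linear solve rather than forming i_aa^{-1} explicitly.
    """
    info = np.asarray(info, dtype=float)
    i_aa, i_ab = info[:-1, :-1], info[:-1, -1]
    i_ba, i_bb = info[-1, :-1], info[-1, -1]
    return float(i_bb - i_ba @ np.linalg.solve(i_aa, i_ab))

# Hypothetical 3 x 3 information matrix with s = 2 nuisance parameters.
info = np.array([[5.0, 1.0, 2.0],
                 [1.0, 4.0, 1.0],
                 [2.0, 1.0, 3.0]])
print(efficient_information(info))
print(1.0 / np.linalg.inv(info)[-1, -1])   # the block-inverse identity
```

The subtracted term is the information about β that is lost because α must be estimated. With no nuisance parameters, $i_\beta$ is simply $i_{\beta\beta}$.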

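The score and Wald statistics are

$$ Z_{\text{score}} = \frac{U_\beta}{\sqrt{\hat i_\beta}}, \qquad Z_{\text{Wald}} = \hat\beta \sqrt{\tilde i_\beta}, \tag{3} $$

where $\hat i_\beta$ is a consistent estimate of $i_\beta$ evaluated at $\beta = 0$, $\hat\beta$ is the maximum likelihood estimate (MLE) of β, and $\tilde i_\beta$ is a consistent estimate of $i_\beta$ evaluated at $\hat\beta$. The two statistics are locally asymptotically equivalent, i.e., they are asymptotically equivalent when $\beta = 0$ and are approximately so when β is small in absolute value [Cox and Hinkley, 1979]. Their noncentrality parameters (NCPs) are $\beta\sqrt{i_\beta}$, with $i_\beta$ evaluated at $\theta_0 = (\alpha, 0)$ for $Z_{\text{score}}$ and at $\theta = (\alpha, \beta)$ for $Z_{\text{Wald}}$. In the following we shall approximate $i_\beta$ for the Wald statistic by evaluating it at its null value, and refer to a common NCP $\xi = \beta\sqrt{i_\beta}$ for both statistics.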
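The NCP determines power. For a two-sided level-α test based on $Z \sim N(\xi, 1)$, power is approximately $\Phi(\xi - z_{1-\alpha/2}) + \Phi(-\xi - z_{1-\alpha/2})$. This minimal sketch uses a default of $\alpha = 5 \times 10^{-8}$, the conventional genome-wide threshold, as an assumption rather than a value taken from this paper. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def ncp(beta, i_beta):
    """Common NCP xi = beta * sqrt(i_beta) of the score and Wald statistics."""
    return beta * math.sqrt(i_beta)

def power_two_sided(xi, alpha=5e-8):
    """Approximate power of a two-sided level-alpha test when Z ~ N(xi, 1)."""
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2.0)   # upper alpha/2 quantile of N(0, 1)
    return nd.cdf(xi - z_crit) + nd.cdf(-xi - z_crit)

# Hypothetical example: beta = 0.2 per allele, i_beta = 1000.
xi = ncp(0.2, 1000.0)
print(f"xi = {xi:.2f}, power = {power_two_sided(xi):.3f}")
```

Because power depends on the data only through ξ, two summary statistics with the same NCP have the same approximate power. For that reason the weights below are compared through their NCPs.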

We now specialize the score and Wald statistics to specific phenotypes of relevance to GWAS. We first consider phenotypes whose relation to genotypes and covariates can be described by generalized linear models (GLMs) [McCullagh and Nelder, 1989]. Special cases of particular relevance are the binary phenotypes of case-control studies, quantitative traits such as height or weight, and counts of events such as attempts to quit smoking. We then consider censored survival data.

Example: generalized linear models

GLMs assume that the mean, but not the dispersion parameter, of the phenotype distribution depends on genotypes and covariates, so that their effects can be modeled as a function of a linear predictor $\eta_j = \alpha^\top x_j + \beta g_j$. The first component of the covariate vector $x_j$ may equal one, in order to accommodate an intercept. The loglikelihood is

$$ \ell(y \mid x, g; \theta) = \sum_{j=1}^{n} \left[ \frac{y_j \eta_j - b(\eta_j)}{a(\phi)} + c(y_j; \phi) \right], \tag{4} $$

where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions, and ϕ is the dispersion parameter of the model. The score for β corresponding to (4) is

$$ U_\beta = \sum_{j=1}^{n} \frac{g_j\,(y_j - \hat\mu_j)}{a(\hat\phi)}, \tag{5} $$
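Consider a binary phenotype with no covariates. Then $\hat\mu_j$ in (5) is the case fraction $\bar y$, and $a(\hat\phi) = 1$. This minimal sketch of $Z_{\text{score}}$ in (3) assumes the empirical null variance $\bar y(1 - \bar y)\sum_j (g_j - \bar g)^2$. That expression follows from (2) and (6) with $x_j = 1$, and the resulting statistic is closely related to the Cochran-Armitage trend test. The function name and toy data are hypothetical.

```python
import math

def score_stat_binary(y, g):
    """Z_score for a binary phenotype (y in {0, 1}) and allele counts g, no covariates."""
    n = len(y)
    if n == 0 or n != len(g):
        raise ValueError("y and g must be non-empty and of equal length")
    ybar = sum(y) / n            # fitted null mean: the case fraction
    gbar = sum(g) / n
    u = sum(gj * (yj - ybar) for yj, gj in zip(y, g))          # score (5), a(phi) = 1
    var_u = ybar * (1.0 - ybar) * sum((gj - gbar) ** 2 for gj in g)
    return u / math.sqrt(var_u)

# Toy data: four cases followed by four controls (hypothetical).
y = [1, 1, 1, 1, 0, 0, 0, 0]
g = [2, 1, 1, 1, 1, 0, 0, 1]
print(score_stat_binary(y, g))
```

Only the null fit, here a single proportion, is needed. This is why score statistics scale easily to millions of SNPs.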

where $\hat\mu_j = b'(\hat\alpha(0)^\top x_j)$ is the fitted mean of $y_j$ under $H_0$, and $\hat\phi$ is a consistent estimate of ϕ. The null Fisher information (1), with $\theta = \theta_0$ and $\eta_j = \alpha^\top x_j$, is

$$ i(\theta_0) = \frac{1}{a(\phi)} \sum_{j=1}^{n} b''(\eta_j) \begin{pmatrix} x_j x_j^\top & x_j g_j \\ g_j x_j^\top & g_j^2 \end{pmatrix}. \tag{6} $$
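Applying (2) to (6) gives a useful interpretation. The quantity $i_\beta$ equals $a(\phi)^{-1}\sum_j b''(\eta_j)(g_j - \hat g_j)^2$, where $\hat g_j$ is the fitted value from a weighted least-squares regression of genotype on the covariates with weights $b''(\eta_j)$. The NumPy sketch below checks this identity on simulated data. The simulation settings and function names are hypothetical: logistic weights, MAF 0.3, and one Gaussian covariate. Any positive weights would do for the identity itself.

```python
import numpy as np

def null_information(x, g, w, a_phi=1.0):
    """Matrix (6): (1/a(phi)) * sum_j w_j z_j z_j^T, z_j = (x_j, g_j), w_j = b''(eta_j)."""
    z = np.column_stack([x, g])
    return (z.T * w) @ z / a_phi

def weighted_residual_ss(x, g, w, a_phi=1.0):
    """(1/a(phi)) * sum_j w_j (g_j - ghat_j)^2, with ghat from a WLS fit of g on x."""
    xtw = x.T * w
    coef = np.linalg.solve(xtw @ x, xtw @ g)
    resid = g - x @ coef
    return float(np.sum(w * resid ** 2) / a_phi)

# Hypothetical simulated data: intercept plus one covariate, MAF 0.3.
rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])
g = rng.binomial(2, 0.3, size=n).astype(float)
eta = -1.0 + 0.5 * x[:, 1]            # null linear predictor alpha^T x_j
mu = 1.0 / (1.0 + np.exp(-eta))
w = mu * (1.0 - mu)                   # b''(eta_j) for the logistic model

info = null_information(x, g, w)
# Schur complement (2), with beta in the last row and column.
i_beta = float(info[-1, -1] - info[-1, :-1] @ np.linalg.solve(info[:-1, :-1], info[:-1, -1]))
print(i_beta, weighted_residual_ss(x, g, w))
```

The identity shows that covariates reduce $i_\beta$ only to the extent that they predict genotype.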

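In the absence of covariates, $x_j = 1$, $\eta_j = \alpha_1$ where $\alpha_1$ is the intercept, and (6) reduces to

$$ i(\theta_0) \approx \frac{n\, b''(\alpha_1)}{a(\phi)} \begin{pmatrix} 1 & 2p \\ 2p & 2pq + 4p^2 \end{pmatrix}, \tag{7} $$

where $p = 1 - q$ is the SNP minor allele frequency (MAF). Here the genotype moments $\sum_j g_j/n$ and $\sum_j g_j^2/n$ have been replaced by their Hardy-Weinberg expectations $2p$ and $2pq + 4p^2$. Using (7) in (2), we find that the null Fisher information for β is

$$ i_\beta = \frac{2npq\, b''(\alpha_1)}{a(\phi)}. \tag{8} $$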
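The step from (7) to (8) is a direct application of (2). Writing it out shows where the factor $2pq$ comes from: it is the variance of the allele count.

$$ i_\beta = \frac{n\, b''(\alpha_1)}{a(\phi)} \Big[ (2pq + 4p^2) - (2p)\cdot 1^{-1} \cdot (2p) \Big] = \frac{n\, b''(\alpha_1)}{a(\phi)}\, 2pq. $$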

For binary phenotypes, $y_j$ is coded as one if subject j has the trait and zero otherwise, and the logistic model corresponds to $\phi = a(\phi) = 1$, $c(y; \phi) = 0$ and $b(\eta_j) = \ln(1 + e^{\eta_j})$. In a case-control study with $n_1$ cases and $n_0 = n - n_1$ controls, $b''(\alpha_1)$ can be estimated consistently by $n_0 n_1 / n^2$. Substituting these expressions in (8) gives

$$ \hat i_\beta = \frac{2pq\, n_0 n_1}{n}. \tag{9} $$
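Equation (9) makes the abstract's point about case-control ratios concrete. In the sketch below, two hypothetical sites have the same total size but different case-control ratios. Weights proportional to $\sqrt{n}$ treat the two sites equally, while weights proportional to $\sqrt{\hat i_\beta}$ do not. The MAF and the sample sizes are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

def info_case_control(n_cases, n_controls, maf):
    """Approximate null information (9) for one site: 2 p q n0 n1 / n."""
    n = n_cases + n_controls
    return 2.0 * maf * (1.0 - maf) * n_controls * n_cases / n

maf = 0.3                              # hypothetical common MAF
sites = [(1000, 1000), (200, 1800)]    # hypothetical (cases, controls); n = 2000 each
info = [info_case_control(n1, n0, maf) for n1, n0 in sites]

# Relative weights: sqrt(information) versus sqrt(sample size).
w_info = [math.sqrt(i) for i in info]
w_size = [math.sqrt(n1 + n0) for n1, n0 in sites]
print("information:", info)
print("site 1 : site 2 weight, sqrt(information):", w_info[0] / w_info[1])
print("site 1 : site 2 weight, sqrt(n):", w_size[0] / w_size[1])
```

Under these assumptions, the balanced site receives 5/3 (about 1.67) times the weight of the unbalanced site, whereas $\sqrt{n}$ weighting gives the two sites equal weight.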

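For quantitative traits whose distribution is (possibly after transformation) Gaussian with mean $\mu_j = \eta_j$ and variance ϕ, we have $a(\phi) = \phi$, $b(\eta_j) = \eta_j^2/2$ and $c(y; \phi) = -y^2/(2\phi) - \tfrac{1}{2}\ln(2\pi\phi)$. Thus $b''(\alpha_1) = 1$. Substitution of these values for $a(\phi)$ and $b''(\alpha_1)$ into (8) gives

$$ \hat i_\beta = \frac{2npq}{\hat\phi}, \tag{10} $$

where $\hat\phi$ is a consistent estimate of the phenotype variance.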

For Poisson count data, $a(\phi) = 1$, $b(\eta_j) = e^{\eta_j}$ and $c(y; \phi) = -\ln(y!)$. Thus $b''(\alpha_1) = e^{\alpha_1} = \mu$, where μ is the null mean of the counts. Substituting these values into (8) gives

$$ i_\beta = 2npq\mu. \tag{11} $$
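Equations (10) and (11) show that optimal weights can differ across sites of equal size. This happens when the trait variance differs from site to site, or when the mean count does. Below is a small sketch with hypothetical site settings and illustrative helper names. As in (8), it assumes no covariates.

```python
import math

def info_gaussian(n, maf, variance):
    """Approximate null information (10): 2 n p q / phi_hat."""
    return 2.0 * n * maf * (1.0 - maf) / variance

def info_poisson(n, maf, mean_count):
    """Approximate null information (11): 2 n p q mu."""
    return 2.0 * n * maf * (1.0 - maf) * mean_count

def relative_weights(infos):
    """Weights proportional to sqrt(i_k), scaled so that the first weight is 1."""
    roots = [math.sqrt(i) for i in infos]
    return [r / roots[0] for r in roots]

# Two quantitative-trait sites of equal size whose trait variances differ.
w_quant = relative_weights([info_gaussian(1000, 0.2, 1.0), info_gaussian(1000, 0.2, 4.0)])
# Two count-data sites of equal size whose mean counts differ.
w_count = relative_weights([info_poisson(1000, 0.2, 2.0), info_poisson(1000, 0.2, 0.5)])
print(w_quant, w_count)
```

Weighting by $\sqrt{\text{information}}$ gives the second site half the weight of the first in each case. Weights proportional to $\sqrt{n}$ would weight them equally.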

Example: censored survival data

Here the data are $\{(t_j, \varepsilon_j, x_j, g_j), j = 1, \ldots, n\}$, where for subject j, $t_j$ is survival time and $\varepsilon_j$ is a failure indicator. We assume independent times to failure and censoring, each
