
ARTICLE

A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals

Brian L. Browning¹,* and Sharon R. Browning¹

We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient, so the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R², and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.

Introduction

Genotype imputation and haplotype-phase inference are important approaches for improving the power of genome-wide association (GWA) studies.¹ Imputation has resulted in the detection of additional associations, particularly when combining data from multiple studies genotyped on different platforms.²⁻⁵ Haplotype-based association testing with phased haplotype data can also detect additional associations.⁶ Imputation can be used for identifying association between known, ungenotyped genetic variants and a trait. In contrast, haplotype-based association testing is not limited to testing known genetic variants, but the interpretation of haplotype-based association analysis is typically more difficult.

Imputation can be used for inferring genotypes at markers that have not been genotyped in one's sample. This is possible by using patterns of haplotypic variation seen in another data set (the reference panel) that includes the larger set of markers. There are a variety of existing methods for imputation or testing of ungenotyped markers.⁷⁻¹² Until now, the reference panels used for imputation have been small, which has limited imputation accuracy. However, much larger reference panels are now, or will soon be, available for many populations because of large-scale sequencing and genotyping projects (e.g., HapMap phase 3 and the 1000 Genomes Project; see Web Resources). We show that larger reference panels substantially increase imputation accuracy, particularly for low-frequency variants. Our previous work has shown that the performance of the haplotype-frequency models that support imputation can depend on reference panel size.¹³ Methods that perform exceptionally well for small data sets may have suboptimal performance for large data sets, particularly when computational constraints limit the complexity of the haplotype-frequency model. Existing imputation methods have been tested and used with small reference panels of 60 phased individuals. New imputation methods are needed that can accommodate large reference panels and combinations of unrelated and parent-offspring data.

We present new methods for imputation of ungenotyped markers in which the sample and reference panel contain data for parent-offspring trios, parent-offspring pairs, and unrelated individuals. Our methods use a haplotype-frequency model that is computationally efficient and that can make full use of the information in large reference panels.¹³ We have implemented our methods in a software package, BEAGLE. We show that BEAGLE scales easily to large reference panels with thousands of individuals, whereas IMPUTE,⁷ one of the best-performing methods for reference panels with 60 phased individuals from the HapMap,¹⁴ does not scale well to larger reference panels.

Our current work also extends our haplotype-phase-inference methods for unrelated individuals to large trio data sets. Trios contain additional information on haplotype phase compared to unrelated individuals, in the form of constraints imposed by the rules of Mendelian inheritance. Thus, using specific trio-phasing methods leads to extremely accurate estimates of haplotype phase.¹⁵ Our trio-phasing method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy.

We also present extensive results of data analyses, investigating not only the performance of our methodology but also questions of wider interest. In particular, we demonstrate the power advantages of large reference panels for association testing, even when the reference panels are unphased.

¹Department of Statistics, University of Auckland, Auckland 1142, New Zealand

*Correspondence: b.browning@auckland.ac.nz

DOI 10.1016/j.ajhg.2009.01.005. © 2009 by The American Society of Human Genetics. All rights reserved.

The American Journal of Human Genetics 84, 210–223, February 13, 2009

Researchers must be able to assess the accuracy of imputed genotypes when the true genotype is unknown, so that poorly imputed markers can be identified prior to downstream analysis. To this end, we introduce a measure of imputation accuracy, allelic R², the squared correlation between the allele dosage with the highest posterior probability and the true allele dosage. We discuss the advantages of the allelic R² measure, and we show that it can be estimated from the posterior genotype probabilities (see Appendix 1).

Material and Methods

Hidden Markov Model

We present a unified framework for inferring haplotype phase and missing data that is applicable to a general class of hidden Markov models (HMMs), which we call haplotype HMMs (see Appendix 2 and Rabiner¹⁶). In Appendix 2, we show that haplotype HMMs can be generalized in a straightforward way to produce HMMs for genotype data for individuals, parent-offspring pairs (one parent and one child), and parent-offspring trios (two parents and one child). Analysis of haplotype HMMs can be used for inferring haplotypes and imputing missing genotypes for individuals, parent-offspring pairs, and parent-offspring trios conditional upon the observed genotype data. For example, with parent-offspring trios, the haplotype HMM provides a model of haplotype frequencies for the four independent haplotypes in a parent-offspring trio. The four independent haplotypes are the transmitted and untransmitted haplotypes from each parent, and each set of four haplotypes corresponds to a possible trio phasing. The observed genotype data for a trio constrain the possible phasings of that trio. These constraints are incorporated in the emission probabilities for the HMM.

In Appendix 3, we present our methods for building a haplotype HMM from phased genotype data. Haplotypes from any combination of individuals or parent-offspring trios (with or without an ungenotyped parent) can be used for building the model, if haplotypes shared by parent and child are counted as a single haplotype. Individuals, parent-offspring pairs, and parent-offspring trios contribute two, three, and four independent haplotypes, respectively. We use an iterative algorithm for fitting a haplotype HMM to genotype data that alternates between model building and sampling. In the model-building step, current estimates of phased haplotypes are used for building a new haplotype HMM. In the sampling step, new haplotypes are sampled for each individual, parent-offspring pair, or parent-offspring trio conditional upon the genotype data and the current haplotype HMM. The iterative algorithm begins with model building. Estimated phased haplotypes for the initial iteration are obtained by imputing missing genotypes at random according to allele frequencies and randomly phasing heterozygous genotypes. With our methods, typically ten iterations of the model-building and sampling steps are sufficient to obtain a very accurate haplotype HMM.
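The initialization step described above can be illustrated with a minimal Python sketch for one unrelated individual. It is not BEAGLE's code: the function name `initial_haplotype_pair`, the representation of genotypes as allele tuples (with `None` for a missing genotype), and the per-marker allele-frequency dictionaries are all assumptions made for illustration.

```python
import random


def initial_haplotype_pair(genotypes, allele_freqs, rng):
    """Build a starting haplotype pair for one unrelated individual (sketch).

    genotypes: per-marker allele pairs such as ("A", "G"), or None if missing.
    allele_freqs: per-marker dicts mapping allele -> frequency.
    Missing genotypes are imputed by drawing two alleles at random according
    to the allele frequencies; each genotype is then phased at random.
    """
    hap1, hap2 = [], []
    for genotype, freqs in zip(genotypes, allele_freqs):
        if genotype is None:
            alleles = list(freqs)
            weights = [freqs[a] for a in alleles]
            genotype = tuple(rng.choices(alleles, weights=weights, k=2))
        first, second = genotype
        if rng.random() < 0.5:  # random phase; has no effect for homozygotes
            first, second = second, first
        hap1.append(first)
        hap2.append(second)
    return hap1, hap2
```

Passing an explicit `random.Random` instance keeps the starting configuration reproducible; the model-building and sampling steps that follow depend on the haplotype HMM of Appendix 3 and are not sketched here.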

We found that we were able to greatly improve the performance of our method by including haplotype weights and adjusting these weights during the first few iterations of the algorithm. When the only missing data are sporadic, each haplotype is assigned a unit weight when building the model (see Appendix 3). When imputing ungenotyped markers in a sample with a reference panel, we assign reference panel haplotypes a weight of 1, and we down-weight the haplotypes in the sample during the model-building phase for the first five iterations of the algorithm. If there are $N$ haplotypes in the sample, we assign each haplotype a weight of $1/N$ for the first two iterations and a weight of $(1/N)^{(6-k)/4}$ for iterations $k = 3$, 4, and 5. For iterations $k \ge 6$, all haplotypes in the sample and reference panel are assigned a weight of 1. This weighting scheme forces the initial estimates of haplotype phase and missing data in the sample to be primarily determined by the reference panel data. Our experiments with simulated data indicate that if down-weights are not used, hundreds of iterations are required to achieve the imputation accuracy obtained when using down-weights with ten iterations (data not shown).

Our methods also permit one to sample multiple haplotypes for each individual, parent-offspring pair, and parent-offspring trio and to use the multiple sampled haplotypes when building the haplotype HMM. When multiple sampled haplotypes are used, the multiple sampling is accounted for by down-weighting each haplotype. For example, if k haplotype pairs are sampled for an unrelated individual, each haplotype is given weight w/k, where w is the weight per haplotype when only one haplotype pair is sampled for the individual.
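The weighting rules in the two preceding paragraphs can be collected into one function. This is a minimal sketch, assuming a 1-based iteration counter; the function and parameter names are hypothetical, and `n_draws` stands for the number of sampled haplotype pairs per individual (the k of the w/k rule, renamed here to avoid a clash with the iteration index).

```python
def haplotype_weight(iteration, in_reference_panel, n_sample_haplotypes, n_draws=1):
    """Weight of one haplotype when building the haplotype HMM (sketch).

    iteration: 1-based iteration number k of the model-building/sampling loop.
    in_reference_panel: True for reference panel haplotypes (base weight 1).
    n_sample_haplotypes: N, the number of haplotypes in the sample.
    n_draws: haplotype pairs sampled per individual; each sampled haplotype's
        weight is divided by this count (the w/k rule). Use 1 for haplotypes
        that are not resampled, such as a phased reference panel.
    """
    if in_reference_panel or iteration >= 6:
        base = 1.0
    elif iteration <= 2:
        base = 1.0 / n_sample_haplotypes
    else:  # iterations 3, 4, 5: weight rises geometrically toward 1
        base = (1.0 / n_sample_haplotypes) ** ((6 - iteration) / 4)
    return base / n_draws
```

The exponent $(6-k)/4$ equals 1 at $k = 2$ and 0 at $k = 6$, so the schedule joins the $1/N$ phase and the unit-weight phase without jumps. With the default of four samples per individual, `n_draws=4`.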

When imputing diallelic markers with alleles A and B in unrelated individuals, we calculate posterior genotype probabilities by summing the probabilities of the HMM states that correspond to the AA, AB, and BB genotypes. The imputed posterior genotype probabilities can be used in downstream analyses. We have found that averaging the posterior genotype probabilities over multiple iterations of the algorithm increases the imputation accuracy. When imputing ungenotyped markers with a reference panel, we average the posterior genotype probabilities obtained from iterations $k \ge 6$.
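A minimal sketch of these two steps follows, assuming that each HMM state at a marker has already been labeled with the genotype it implies and that per-iteration posteriors are kept in a dictionary keyed by iteration number. Both data layouts and the function names are hypothetical.

```python
from collections import defaultdict


def genotype_posteriors(state_probs, state_genotypes):
    """Sum HMM state probabilities by the genotype each state implies.

    state_probs: posterior probability of each HMM state at one marker.
    state_genotypes: genotype label ("AA", "AB", or "BB") for each state.
    Returns (pAA, pAB, pBB).
    """
    totals = defaultdict(float)
    for prob, genotype in zip(state_probs, state_genotypes):
        totals[genotype] += prob
    return tuple(totals[g] for g in ("AA", "AB", "BB"))


def averaged_posteriors(posteriors_by_iteration, first_iteration=6):
    """Average (pAA, pAB, pBB) over iterations k >= first_iteration."""
    kept = [p for k, p in posteriors_by_iteration.items() if k >= first_iteration]
    if not kept:
        raise ValueError("no iterations to average")
    return tuple(sum(p[i] for p in kept) / len(kept) for i in range(3))
```

With the default of ten iterations, this averages iterations 6 through 10.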

Our methods for haplotype-phase inference and genotype imputation are implemented in BEAGLE 3.0. BEAGLE produces most likely haplotypes and sampled haplotypes for each individual with all missing data imputed. When imputing genotypes in samples of unrelated individuals, BEAGLE produces posterior genotype probabilities for imputed genotypes. BEAGLE 3.0 also includes an option for reducing memory usage with a two-level "checkpoint" algorithm.¹⁷,¹⁸ Checkpoint algorithms store probabilities in HMM calculations for a subset of markers (called checkpoints) and then recalculate probabilities from the checkpoints as needed. Using BEAGLE's optional checkpoint algorithm increases running time by a factor of less than two and reduces memory usage during HMM sampling from O(M) to O(√M), where M is the number of markers.
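The checkpoint idea can be sketched for a generic HMM forward pass. This is a simplified single-level scheme, not BEAGLE's two-level algorithm: forward vectors are stored only at every ⌊√M⌋-th marker, and a block of vectors is recomputed from its checkpoint when a later sweep (for example, backward sampling that visits blocks from last to first) needs it. The class and function names are hypothetical, and the forward probabilities are left unscaled for brevity, whereas a practical implementation would rescale them to avoid underflow.

```python
import math


def forward_step(prev, trans, emission):
    """alpha_t[j] = (sum_i alpha_{t-1}[i] * trans[i][j]) * emission[j]."""
    n = len(prev)
    return [sum(prev[i] * trans[i][j] for i in range(n)) * emission[j]
            for j in range(n)]


def forward_full(init, trans, emissions):
    """Standard forward pass that stores all M forward vectors."""
    alpha = [init[j] * emissions[0][j] for j in range(len(init))]
    alphas = [alpha]
    for emission in emissions[1:]:
        alpha = forward_step(alpha, trans, emission)
        alphas.append(alpha)
    return alphas


class CheckpointedForward:
    """Forward pass that keeps only every `step`-th forward vector."""

    def __init__(self, init, trans, emissions):
        self.trans = trans
        self.emissions = emissions
        self.step = max(1, math.isqrt(len(emissions)))
        self.checkpoints = {}
        alpha = [init[j] * emissions[0][j] for j in range(len(init))]
        for t in range(len(emissions)):
            if t > 0:
                alpha = forward_step(alpha, trans, emissions[t])
            if t % self.step == 0:
                self.checkpoints[t] = alpha

    def block(self, start):
        """Recompute the forward vectors for markers start .. start + step - 1."""
        alpha = self.checkpoints[start]
        out = [alpha]
        end = min(start + self.step, len(self.emissions))
        for t in range(start + 1, end):
            alpha = forward_step(alpha, self.trans, self.emissions[t])
            out.append(alpha)
        return out
```

Memory holds about √M checkpoint vectors plus one block of at most ⌊√M⌋ vectors, and every forward vector is computed twice, which is why checkpointing trades extra running time for reduced memory.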

All analyses in this study were performed with BEAGLE 3.0 with default parameter settings (i.e., four samples per individual and ten iterations). Computing runs were performed on a Linux server with eight dual-core AMD Opteron 8220 SE processors (running at 2.8 GHz, with a 1 MB cache, and using a 64-bit architecture) and a total of 64 GB of RAM. All reported computational times were obtained by adding user and system times from the Linux "time" command, and they thus are equivalent to those that would be obtained with only a single CPU core.

Real Data Sets

We used unphased trio data from HapMap release 21 for 30 trios of Utah residents with ancestry from northern and western Europe (CEU panel) and 30 trios of Yoruba sampled from Ibadan, Nigeria (YRI panel).¹⁴ If a marker exhibited a Mendelian inconsistency in the unphased HapMap data for a trio, the genotypes for that marker were set to missing in both the parents and the child for that trio. We assessed the accuracy of our methods for inferring haplotype phase and missing data in parent-offspring trios by applying our methods to unphased HapMap CEU and YRI data and comparing our results with the HapMap's published phasing for these data generated with the PHASE program.¹⁹ We also used the HapMap CEU data to compare the accuracy of genotype imputation with a phased reference panel, an unphased unrelated reference panel, and an unphased trio reference panel.
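The Mendelian-inconsistency filter applied to the HapMap trios can be sketched as follows. The genotype representation (unordered allele tuples, `None` for missing) and the function names are assumptions for illustration; a missing parent is treated as able to transmit either allele.

```python
def mendelian_consistent(father, mother, child):
    """Check whether a child's genotype can arise from the parents' genotypes.

    Genotypes are unordered allele pairs such as ("A", "G"); None means missing.
    A missing parent places no constraint on the allele it transmits.
    """
    if child is None:
        return True
    any_allele = set(child)
    paternal = set(father) if father is not None else any_allele
    maternal = set(mother) if mother is not None else any_allele
    c1, c2 = child
    return (c1 in paternal and c2 in maternal) or (c2 in paternal and c1 in maternal)


def mask_inconsistent_markers(trio):
    """trio: (father, mother, child) lists of per-marker genotypes.

    Returns copies in which every marker with a Mendelian inconsistency is set
    to missing (None) in both parents and the child.
    """
    father, mother, child = (list(x) for x in trio)
    for m in range(len(child)):
        if not mendelian_consistent(father[m], mother[m], child[m]):
            father[m] = mother[m] = child[m] = None
    return father, mother, child
```

Masking the whole trio at an inconsistent marker, rather than guessing which member is in error, matches the filtering rule stated above.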

We used genotype data from the Affymetrix GeneChip Human Mapping 500K Array (the Affymetrix 500K chip) generated by the Wellcome Trust Case Control Consortium (WTCCC).²⁰ The WTCCC study included approximately 2000 cases for each of seven diseases (bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, and type 2 diabetes) and approximately 3000 shared controls. The shared controls comprised 1500 individuals selected from a UK sample of blood donors and 1500 individuals from the 1958 British Birth Cohort.²¹ We also used genotype data generated by the Wellcome Trust Sanger Institute with the Illumina Infinium HumanHap550 SNP BeadChip (the Illumina 550K chip) for the 1958 British Birth Cohort samples. Genotypes for the Affymetrix 500K chip were called with Chiamo,²⁰ and genotypes for the Illumina 550K chip were called with Illumina's GenCall software. We excluded all individuals who were excluded by the WTCCC in their primary analysis.²⁰ For the 1958 British Birth Cohort, we limited our analyses to 1388 individuals who had been genotyped on both the Affymetrix and Illumina platforms.

Our previous multilocus analysis of WTCCC data demonstrated that multilocus analysis can be particularly sensitive to intercohort differences in genotype error rates.⁶ We excluded all markers that were excluded in the WTCCC's analysis,²⁰ and we imposed additional data-quality filters designed to increase genotype accuracy and to exclude markers with problematic data. Genotypes for the Affymetrix 500K chip were set to missing if the Chiamo posterior probability for the genotype was <0.99. Genotypes for the Illumina 550K chip were set to missing if the GenCall score was <0.6. For members of the 1958 British Birth Cohort, genotypes were set to missing if the Affymetrix and Illumina platforms produced conflicting genotypes. Markers were excluded for a cohort if the missing rate was >2% in that cohort or if the Hardy-Weinberg equilibrium p value for the marker was <10⁻⁷. We excluded any marker with minor-allele frequency <0.01 in the 1958 British Birth Cohort.
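The marker-level filters can be sketched with the thresholds stated above. The text does not say which Hardy-Weinberg test was used; this sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test (exact tests are a common alternative). Genotypes are coded as counts of allele B with `None` for missing, and all function names are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Approximate Hardy-Weinberg p value from a 1-df chi-square test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) * (1 - p))
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_cohort_filters(dosages, max_missing=0.02, min_hwe_p=1e-7):
    """dosages: per-individual counts of allele B (0, 1, 2), or None if missing."""
    called = [d for d in dosages if d is not None]
    missing_rate = 1 - len(called) / len(dosages)
    if missing_rate > max_missing or not called:
        return False
    counts = [called.count(g) for g in (0, 1, 2)]
    return hwe_chisq_pvalue(*counts) >= min_hwe_p


def minor_allele_frequency(dosages):
    """Minor-allele frequency among called genotypes (compare against 0.01)."""
    called = [d for d in dosages if d is not None]
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1 - freq_b)
```

The missing-rate and Hardy-Weinberg filters are applied per cohort, whereas the minor-allele-frequency filter is applied only in the 1958 British Birth Cohort, so the two are kept as separate functions.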

Because the interpretation of a genotype depends on the chromosome strand used to define the alleles, we checked that the chromosome strand was consistent between data sets and changed alleles to their complementary alleles when necessary. Markers were excluded if the genomic position in NCBI Build 35 coordinates in the marker annotation files for the Affymetrix data or for the Illumina data was not consistent with the position given for the marker in the HapMap data set. A decision to change alleles to their complementary alleles was based on three sources of information: observed alleles (A/C/G/T), minor-allele frequency, and linkage disequilibrium correlation patterns within a 100-marker radius. A difference in minor-allele frequency between data sets was considered a discrepancy if the difference was >0.2 and was significant at the 0.01 level. If changing an allele to the complementary allele for a marker in a data set did not resolve the discrepancy between data sets, the marker was excluded from one of the non-HapMap data sets.
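The allele-based and frequency-based parts of the strand check can be sketched as below; the linkage-disequilibrium comparison is omitted. The significance test for the frequency difference is not specified in the text, so this sketch assumes a two-proportion z test on allele counts. Function names and return labels are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_relation(ref_alleles, other_alleles):
    """Compare one marker's allele pair between two data sets.

    Returns "same", "flip" (the other data set uses the opposite strand),
    "ambiguous" (A/T or C/G marker; alleles alone cannot decide), or
    "mismatch" (no strand assignment reconciles the alleles).
    """
    ref, other = set(ref_alleles), set(other_alleles)
    flipped = {COMPLEMENT[a] for a in other}
    if ref == other:
        return "ambiguous" if ref == flipped else "same"
    if ref == flipped:
        return "flip"
    return "mismatch"


def maf_discrepant(p1, n1, p2, n2, min_diff=0.2, alpha=0.01):
    """Flag a minor-allele-frequency difference between two data sets.

    p1, p2: minor-allele frequencies; n1, n2: numbers of individuals.
    Uses a two-sided two-proportion z test on 2n alleles per data set.
    """
    diff = abs(p1 - p2)
    pooled = (2 * n1 * p1 + 2 * n2 * p2) / (2 * n1 + 2 * n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / (2 * n1) + 1 / (2 * n2)))
    p_value = math.erfc(diff / se / math.sqrt(2)) if se > 0 else 1.0
    return diff > min_diff and p_value < alpha
```

A/T and C/G markers are exactly the cases where the observed alleles look identical on both strands, which is why frequency and linkage-disequilibrium information is needed to decide them.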

For chromosome 1, after data-quality filtering, there were 53,683 markers genotyped on the 1958 British Birth Cohort with one or both of the Affymetrix 500K and Illumina 550K chips. A subset of 24,705 of these markers were present on the Illumina 550K chip and in HapMap phase 2 data¹⁴ but absent from the Affymetrix 500K chip. From the 1388 individuals in the 1958 British Birth Cohort, we selected three random samples of 188, 788, and 1088 individuals, and these 24,705 markers were masked and imputed in each sample. For each sample, the remaining 1200, 600, or 300 individuals, respectively (or a subset of these remaining individuals), were used as a reference panel with genotype data from both the Affymetrix 500K and Illumina 550K chips. Although a proportion of Illumina genotypes for the imputed markers will be incorrect, this proportion is expected to be small, and Illumina genotypes are considered to be the true genotypes when computing measures of imputation accuracy in this study.

Comparison with IMPUTE

We compared BEAGLE 3.0 with IMPUTE⁷ version 0.5.0 in terms of imputation accuracy and computational efficiency. We evaluated imputation accuracy by using chromosome 1 markers imputed in a sample of 188 individuals with reference panels of 60 phased individuals (CEU HapMap), 300 unphased individuals, and 600 unphased individuals. A comparison using larger reference panels was not practical for the full chromosome 1 data because of IMPUTE's much greater computational requirements. Because IMPUTE requires a phased reference panel, the unphased reference panels were phased with BEAGLE¹³ for use in the IMPUTE analysis. As a result, the accuracy of inferred haplotypes in the reference panel was similar when imputing genotypes with BEAGLE or IMPUTE.

We compared the computational efficiency of BEAGLE and IMPUTE for increasingly large reference panels by using a subset of chromosome 1 data comprising a 5 Mb region with 1356 markers genotyped in the reference panel, of which 746 markers were genotyped in the sample. Computational times were measured when imputing ungenotyped markers in a sample of 188 individuals with reference panels of 300, 600, and 1200 individuals. For BEAGLE, the reference panel was unphased, whereas for IMPUTE, the reference panel was phased (with phase inferred by BEAGLE).

Simulated Data Sets

We evaluated our trio-phasing methods on large sample sizes with realistic, simulated trio data. We generated simulated data by using Cosi²² with parameters calibrated to empirical human data for individuals with European ancestry (CEU) or with ancestry of Yoruba in Ibadan, Nigeria (YRI). Each simulated data set has a recombination rate sampled from a distribution matching the Decode map.²³ Three sample sizes were simulated: 30, 300, and 3000 trios. Four parental chromosomes were simulated per trio, and one of the chromosomes of each parent was selected to be transmitted to the offspring. The simulated regions were all 1 Mb in length. For each data set, we randomly selected markers with minor-allele frequencies greater than 0.05 to achieve an average marker density of one marker per 6 kb or one marker per 1 kb. One hundred data sets were simulated for each sample size, ethnicity, and marker density.

In each data set, 0.5% of individual genotypes, chosen at random, were set to missing. In addition, 0.5% of trios were set to missing (i.e., the three genotypes for the trio were all set to missing, as might be done when a Mendelian inconsistency is found). These rates of missingness are somewhat different from those seen in the unphased HapMap Phase II data. In the HapMap data, after setting trios with Mendelian inconsistencies to missing, there were 13 single-nucleotide polymorphisms (SNPs) per Mb per trio with the entire trio missing in the CEU panel (19 in the YRI panel), compared to 20 in the 1 SNP per kb simulated data. The rate of sporadic missing data (one or two individuals in the trio missing at the SNP) was 78 SNPs per Mb per trio in the CEU panel (83 in the YRI panel), compared to 20 in the simulated data at the 1 SNP per kb density.

Allelic Association Tests

We investigated the effect of using imputed genotype data on power to detect disease associations by comparing p values computed with true genotype data with p values computed with imputed posterior genotype probabilities. For this analysis, p values were computed after excluding the 300 individuals in the 1958 British Birth Cohort that were used as an unphased reference panel. p values were computed for markers that the WTCCC reported as showing evidence of disease association, excluding any marker that had more than 2% missing data in either of the two control cohorts or the case cohort. p values were computed three ways: with genotype data, with imputed data generated from a phased reference panel of 60 individuals (HapMap CEU), and with imputed data generated from an unphased reference panel with 300 individuals. The data for each marker were imputed after masking that marker in the sample. Standard chi-square, allelic trend, and Fisher exact tests are not valid when applied to the posterior genotype probabilities for imputed data. Hence, we compared estimated allele dosage in cases and controls with a two-sample t test. For large sample sizes, the central limit theorem ensures that the test statistic has approximately the appropriate null distribution. For genotype data, the allele dosage for each individual was obtained from the observed genotype data. For imputed data, the estimated allele dosage for each individual was obtained from the imputed posterior genotype probabilities. For imputed data, p values were computed with only those individuals who had nonmissing genotype data, so that the p values from imputed data and from observed genotype data are derived from the same set of individuals.
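A minimal sketch of the dosage comparison follows. The text does not specify the form of the two-sample t statistic; this sketch assumes the unequal-variance (Welch) form and, following the central-limit argument above, takes the p value from the standard normal distribution. The function names are hypothetical.

```python
import math
import statistics


def expected_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles given posterior genotype probabilities."""
    return p_ab + 2 * p_bb


def dosage_t_test(case_dosages, control_dosages):
    """Two-sample t statistic with a large-sample two-sided normal p value."""
    m1 = statistics.fmean(case_dosages)
    m2 = statistics.fmean(control_dosages)
    se = math.sqrt(statistics.variance(case_dosages) / len(case_dosages)
                   + statistics.variance(control_dosages) / len(control_dosages))
    t = (m1 - m2) / se
    # P(|Z| > |t|) for a standard normal Z.
    return t, math.erfc(abs(t) / math.sqrt(2))
```

For genotyped data, the dosages are simply the observed B-allele counts; for imputed data, they are `expected_dosage` values restricted to individuals with nonmissing true genotypes, as described above.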

Metrics for Trio Phasing

We used four metrics to measure accuracy of trio phasing. The transmission error rate is the proportion of nonmissing parental genotypes with ambiguous phase that were incorrectly phased. The denominator of the transmission error rate is the number of parent genotypes for which the parent is heterozygous and the transmission is ambiguous (because of missing or heterozygous genotypes for the child and the other parent). The numerator of the transmission error rate is the number of such parent genotypes for which the phasing is incorrect (i.e., the incorrect allele is recorded as having been transmitted). For example, if both parents and the child of a trio have the same heterozygous genotype, the trio will contribute either 0 or 2 parents to the numerator (because the child's genotype forces the two transmissions to be either both correct or both incorrect) and 2 parents to the denominator of the transmission error rate.

The missing trio error rate is the proportion of parental alleles in trios with missing data for both parent and child that are incorrectly imputed. The missing trio error rate has as its denominator twice (because there are two alleles per genotype) the number of parent genotypes for which the parent and child genotypes are missing. The numerator of the missing trio error rate is the number of alleles in such phased parent genotypes that are incorrectly imputed. For example, if the true phased parental genotype is AG and the imputed phased parental genotype is AA, this would count as one error, whereas if the imputed phased parental genotype is GA, this would count as two errors.

The sporadic missing error rate is the proportion of incorrectly imputed alleles in parents with missing genotype data for themselves and nonmissing genotype data for their child. The sporadic missing error rate has as its denominator twice the number of parent genotypes for which the parent genotype is missing but the child genotype is nonmissing. The numerator for the sporadic missing error rate is the number of alleles in such phased parent genotypes that are incorrectly imputed (as for the missing trio error rate).

We also calculated an error rate per trio per SNP, which is the sum of the numerators of the three types of error (transmission, missing trio, and sporadic missing error rates), divided by the number of trios and by the total number of SNPs.
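The allele-counting convention for the missing trio and sporadic missing error rates, and the combined per-trio-per-SNP rate, can be sketched as follows. A phased parental genotype is represented here as an ordered allele pair (for example, transmitted allele first); this layout and the function names are assumptions.

```python
def allele_errors(true_phased, imputed_phased):
    """Count position-wise allele mismatches between two phased genotypes.

    With the true phased genotype ("A", "G"), the imputed genotype ("A", "A")
    counts as one error and ("G", "A") counts as two.
    """
    return sum(t != i for t, i in zip(true_phased, imputed_phased))


def imputed_parent_error_rate(pairs):
    """Missing trio or sporadic missing error rate.

    pairs: (true_phased, imputed_phased) parent genotypes; the denominator is
    twice the number of genotypes because each genotype carries two alleles.
    """
    return sum(allele_errors(t, i) for t, i in pairs) / (2 * len(pairs))


def error_rate_per_trio_per_snp(n_transmission, n_missing_trio, n_sporadic,
                                n_trios, n_snps):
    """Sum of the three error numerators divided by trios and by SNPs."""
    return (n_transmission + n_missing_trio + n_sporadic) / (n_trios * n_snps)
```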

Metrics for Imputation

We assessed the calibration and precision of estimated posterior genotype probabilities for imputed genotypes. The metrics we describe below are applied at multiple levels: the genotype level (genotype concordance rate), the marker level (allelic R² and standardized allele-frequency error), and the study level (allele-frequency correlation). We also use a Wilcoxon signed-rank test to compare accuracy of estimated allele frequencies for pairs of data sets that were imputed with different reference panels.

Genotype Concordance Rate

The calibration of imputed genotypes was evaluated by calculating the concordance rate between the most likely imputed genotype and the true genotype. For imputed genotypes whose most likely genotype has posterior probability a, we expect the genotype concordance rate to be approximately a.
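One way to tabulate this calibration check is to bin imputed genotypes by the posterior probability of their most likely genotype and compute the concordance rate within each bin. The bin edges below are an assumption (the largest of three posterior probabilities is always at least 1/3), and the function name is hypothetical.

```python
from bisect import bisect_right

GENOTYPES = ("AA", "AB", "BB")


def calibration_table(posteriors, truths,
                      edges=(0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Concordance of most likely imputed genotypes, binned by posterior.

    posteriors: per-genotype (pAA, pAB, pBB); truths: true genotype labels.
    Returns (bin_low, bin_high, count, concordance_rate) for nonempty bins.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for probs, truth in zip(posteriors, truths):
        best = max(range(3), key=lambda g: probs[g])
        b = min(bisect_right(edges, probs[best]) - 1, n_bins - 1)
        counts[b] += 1
        hits[b] += GENOTYPES[best] == truth
    return [(edges[i], edges[i + 1], counts[i], hits[i] / counts[i])
            for i in range(n_bins) if counts[i]]
```

For well-calibrated posteriors, each bin's concordance rate should fall within that bin's probability range.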

Allelic R²

We assessed the accuracy of imputed genotypes in terms of the squared correlation between the allele dosage (number of minor alleles) of the most likely imputed genotype and the allele dosage of the true genotype. We call this quantity the allelic R². Allelic R² has several desirable properties that make it an excellent metric for evaluating imputation accuracy. Allelic R² has a simple interpretation in terms of statistical power, similar to the interpretation of the squared correlation between two diallelic markers.²⁴ Under Hardy-Weinberg equilibrium, if an allele confers risk for a disease, N cases and N controls with genotype data for the marker have approximately the same statistical power to detect association as N/r² cases and N/r² controls with imputed data for the marker, where r² is the allelic R² for the imputed data. Thus, allelic R² measures the loss of power when the most likely imputed genotypes are used in place of the true genotypes for a marker. Association analyses using posterior imputed genotype probabilities can be more powerful than analyses using most likely imputed genotypes because posterior genotype probabilities contain more information. Consequently, the loss of power measured by allelic R² is an upper bound on the loss of power when imputed posterior genotype probabilities are used in place of the true genotypes for a marker. Another advantage of allelic R² is that its interpretation does not depend on allele frequency.
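The actual allelic R² (which requires the true genotypes, unlike the estimate in Appendix 1, which is not reproduced here) and the sample-size interpretation can be sketched as follows. Posteriors are assumed to be given as probabilities of carrying 0, 1, and 2 minor alleles; the function names are hypothetical.

```python
def best_guess_dosage(probs):
    """Minor-allele dosage (0, 1, or 2) with the highest posterior probability.

    probs: posterior probabilities of carrying 0, 1, and 2 minor alleles.
    """
    return max(range(3), key=lambda d: probs[d])


def squared_correlation(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def allelic_r2(posteriors, true_dosages):
    """Actual allelic R^2 between most likely and true minor-allele dosages."""
    return squared_correlation([best_guess_dosage(p) for p in posteriors],
                               true_dosages)


def equivalent_imputed_sample_size(n_genotyped, r2):
    """Cases (and controls) with imputed data giving roughly the power of N genotyped."""
    return n_genotyped / r2
```

For example, with allelic R² = 0.8, about 1000 / 0.8 = 1250 cases and 1250 controls with imputed data give roughly the power of 1000 cases and 1000 controls with genotype data.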

We show that allelic R² can be estimated from the imputed posterior genotype probabilities without knowledge of the true genotypes (see Appendix 1). The ability to estimate allelic R² from imputed posterior genotype probabilities is an important feature because the true genotype is generally unknown. The estimated allelic R² can be used for identifying or excluding markers with poor imputation accuracy prior to downstream analysis.

Another estimate of imputation accuracy is the ratio of the variance of the imputed allele dosage to the variance of the true allele dosage. The variance of the true allele dosage is unknown, but it can be estimated as 2p(1 − p) under Hardy-Weinberg equilibrium, where p is the estimated allele frequency. This ratio of variances has also been called r²,²⁵ but it does not directly estimate allelic R² and thus is different from the allelic R² estimate presented in Appendix 1.
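The variance-ratio measure can be sketched as below. The sketch assumes that the imputed allele dosage is the expected B-allele count pAB + 2pBB, that p is estimated from those dosages, and that the population variance is used; these are assumptions, and the function name is hypothetical. As noted above, this ratio is not the allelic R² estimate of Appendix 1.

```python
import statistics


def variance_ratio_r2(posteriors):
    """Variance of imputed dosage divided by its HWE expectation 2p(1 - p).

    posteriors: per-individual (pAA, pAB, pBB) posterior genotype probabilities.
    """
    dosages = [p_ab + 2 * p_bb for _, p_ab, p_bb in posteriors]
    p = statistics.fmean(dosages) / 2
    expected_variance = 2 * p * (1 - p)
    return statistics.pvariance(dosages) / expected_variance
```

Certain posteriors that follow Hardy-Weinberg proportions give a ratio near 1, and completely uninformative posteriors, which give every individual the same dosage, give 0.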

Standardized Allele-Frequency Error

For each imputed marker, we define the allele-frequency error as the difference between the true allele frequency in the sample and the estimated allele frequency in the sample computed from the posterior genotype probabilities. If the three posterior genotype probabilities for an individual are denoted $p_{AA}$, $p_{AB}$, and $p_{BB}$, then the estimated A allele frequency is found by summing $(2p_{AA} + p_{AB})$ over all individuals and dividing by twice the number of individuals.

However, allele-frequency error is difficult to interpret unless the true allele frequency and sample size are known. An allele-frequency error of 0.01 is more serious when the allele frequency is 0.01 than when the allele frequency is 0.5. An allele-frequency error of 0.01 is also more serious when the sample size is 10,000 than when the sample size is 100 because the larger sample size gives a much more precise population allele-frequency estimate from genotype data. This motivates us to standardize the allele-frequency error by the standard error of the population allele-frequency estimate from the true genotype data. If $p_A$ is the allele frequency in the sample of $n$ individuals from a population in Hardy-Weinberg equilibrium, the standard error of the population allele-frequency estimate is approximately $\sqrt{p_A(1 - p_A)/(2n)}$. If $q_A$ is the estimated allele frequency obtained from the imputed posterior genotype probabilities, we define the standardized allele-frequency error to be

$$\frac{|p_A - q_A|}{\sqrt{p_A(1 - p_A)/(2n)}}$$

Thus, a standardized allele-frequency error of $z$ indicates that the error in estimated allele frequency from imputed data is approximately $z$ times the standard deviation of the estimated population allele frequency obtained from the true genotypes.
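Both quantities in this subsection translate directly into code. This is a minimal sketch with hypothetical function names, taking posteriors as per-individual (pAA, pAB, pBB) tuples.

```python
import math


def estimated_allele_freq(posteriors):
    """Estimated A-allele frequency: sum of (2*pAA + pAB) divided by 2n."""
    return (sum(2 * p_aa + p_ab for p_aa, p_ab, _ in posteriors)
            / (2 * len(posteriors)))


def standardized_af_error(p_true, q_imputed, n):
    """|p - q| divided by the HWE standard error sqrt(p(1 - p) / (2n))."""
    return abs(p_true - q_imputed) / math.sqrt(p_true * (1 - p_true) / (2 * n))
```

With n = 1250, an error of 0.01 at p = 0.5 gives a standardized error of about 1.0, whereas the same error at p = 0.01 gives about 5.0. This illustrates why a raw error of 0.01 is more serious for a rare allele.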

Allele-Frequency Correlation

The allele-frequency correlation is the correlation, over the set of imputed markers, between the estimated sample minor-allele frequency from imputed posterior genotype probabilities and the true sample minor-allele frequency. The allele-frequency correlation can be used for comparing imputation accuracy under different scenarios, with different reference panels or different samples.

Wilcoxon Signed-Rank Test

We used a two-sided Wilcoxon signed-rank test to test for differences in imputation accuracy for markers imputed with two different reference panels but the same sample. For each imputed marker $m$, let $X_m$ be the absolute allele-frequency error using reference panel 1, and let $Y_m$ be the absolute allele-frequency error using reference panel 2. The null hypothesis of the Wilcoxon signed-rank test is that the median of $X_m - Y_m$ equals 0. Rejecting the null hypothesis implies that there are differences in accuracy of the estimated sample allele frequencies derived from the two reference panels.
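A minimal sketch of the signed-rank test on paired per-marker errors follows. It uses the large-sample normal approximation and drops zero differences. Tied absolute differences receive average ranks, and the variance is corrected for ties. The text does not state which variant was used, so these are assumptions. Library implementations such as `scipy.stats.wilcoxon` offer exact and corrected variants.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    x, y: paired values, e.g. absolute allele-frequency errors per marker for
    reference panels 1 and 2. Returns (z, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A positive z indicates that the differences $X_m - Y_m$ tend to be positive, that is, reference panel 2 tends to give smaller allele-frequency errors.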

Results

Calibration of Posterior Genotype Probabilities

The posterior genotype probabilities produced by our methods are well calibrated. Figure 1 presents the genotype accuracy rate for the imputed genotype with the highest posterior probability. Genotypes were imputed in a sample of 1088 individuals with a phased reference panel of 60 individuals, and imputed genotypes were binned according to their posterior probability. For each bin, the proportion of imputed genotypes concordant with the called genotype was approximately equal to the posterior genotype probability for the bin. Similar results were obtained when imputation was performed with an independent unphased reference panel of 300 individuals (data not shown).

We also found that our estimate of allelic R², calculated from posterior genotype probabilities (see Appendix 1), had good accuracy. Allelic R² was estimated for each imputed marker in a sample of 1088 individuals. Markers were imputed with a phased reference panel of 60 individuals (HapMap CEU panel) and imputed with an unphased reference panel of 300 individuals. For the phased reference panel (60 individuals), the correlation was 0.938 between the estimated allelic R² (estimated without knowledge of the true genotypes) and the actual allelic R² (calculated from the true genotypes). For the unphased reference panel (300 individuals), the correlation was 0.986 between the estimated and actual allelic R². When markers were imputed with the

Figure 1. Calibration of Posterior Genotype Probabilities

Genotypes for chromosome 1 markers on the Illumina 550K chip, but not the Affymetrix 500K chip, were imputed with a phased reference panel of 60 individuals (HapMap CEU panel) in a sample of 1088 individuals genotyped on the Affymetrix 500K chip. Imputed genotypes are divided into bins according to their posterior genotype probability. The proportion of imputed genotypes that are consistent with the Illumina genotype is given for each bin. The line is the set of points with equal posterior genotype probability and accuracy rate.