
Original Paper

Hum Hered 2008;66:67–86

DOI: 10.1159/000119107

Review and Evaluation of Methods Correcting for Population Stratification with a Focus on Underlying Statistical Principles

Hemant K. Tiwari (a), Jill Barnholtz-Sloan (c), Nathan Wineinger (a), Miguel A. Padilla (a), Laura K. Vaughan (a), David B. Allison (a, b)

(a) Department of Biostatistics, Section on Statistical Genetics, and (b) Clinical Nutrition Research Center, University of Alabama at Birmingham, Birmingham, Ala.; (c) Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA

Key Words

Admixture · Ancestry · Association · Covariance-based tests · Genomic control · Linkage · Marginal-based tests · QTL · RAM · Randomization · SAT · Structure · Sufficient statistics · TDT

Abstract

When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping that allow genetic association studies to be conducted while controlling for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.

Copyright © 2008 S. Karger AG, Basel

Published online: March 31, 2008

Hemant K. Tiwari, PhD
Department of Biostatistics, Section on Statistical Genetics
Ryals Public Health Building, 420D, University of Alabama at Birmingham
1665 University Blvd., Birmingham, AL 35294 (USA)
Tel. +1 205 934 4907, Fax +1 205 975 2541, E-Mail htiwari@uab.edu

Introduction

Theoretical developments, computer simulations, and empirical evidence from population studies continue to indicate that population stratification due to genetic admixture, as well as other departures from random mating, can confound genetic association studies and produce false positive results [1–4]. Population admixture, however, can also 'mask' true genotype-phenotype associations and produce false negative results. In either case, departures from random mating can result in biased estimates and faulty conclusions. This form of population heterogeneity is often regarded as an impediment to genetic association studies given its potential to confound statistical analyses and induce spurious genotype-phenotype associations.

Experimentally controlling mating type in plant and

animal studies is the most extreme way to control for this

confounding effect and is accomplished with the use of

recombinant inbred strains. However, this is not neces-

sarily feasible for all plant and animal studies and is im-

possible in human genetic research. Concerns about the

effects of population stratification led to the recommen-

dation of using familial data and to the development of

the seminal paper on the transmission-disequilibrium

test (TDT) [5] , based on a related idea proposed by Ru-

binstein et al. [6] and later by Falk and Rubinstein [7] . The

TDT is a family-based association test designed for test-

ing linkage disequilibrium by comparing the proportion

of alleles transmitted versus the proportion not transmit-

ted from informative parental matings (i.e., matings with

at least one heterozygous parent) to affected offspring. By

focusing on affected offspring (i.e., case-only), the TDT

assesses whether the distribution of alleles among affect-

ed children conditional on parental genotypes differs

from what is expected under the null hypothesis of no

linkage and/or no association.

Although effective at eliminating false positives due to

stratification and genetic admixture, TDT type designs

may result in substantially lower power relative to other

types of association studies since they utilize only those

individuals who are informative for allelic transmission

and exclude all others. Population-based association

studies (e.g. the case-control study design) usually have

greater power than family-based and case-only designs

as long as correction for population stratification is prop-

erly modeled. Recently, significant advances have been

made in statistical methodology to control for the poten-

tial confounding effects of population admixture via use

of measures of ‘individual admixture’ and related tech-

niques [8–16] . We refer to such methods as structured as-

sociation testing (SAT). Of equal interest are exciting new

developments in the use of individual admixture esti-

mates for what we call regional admixture mapping

(RAM) [16–21] . In principle, these methods allow re-

searchers to localize genomic regions containing trait-in-

fluencing genes in samples of unrelated individuals.

With novel procedures being proposed at such a rapid

pace, it is difficult for investigators to keep abreast of the

latest methods and their utility. Thus, here we review

many of the statistical procedures which aim to create

valid test statistics for linkage and disequilibrium map-

ping studies that control for confounding due to popula-

tion stratification.

Review of TDT – Association Testing Methods for

Family Data

In the late 1980s and early 1990s, several approaches

were proposed to identify disease genes that combined the

advantages of linkage and population association ap-

proaches [6, 7, 22–24] . These methods typically compared

alleles transmitted from parents to affected offspring

against alleles that were not transmitted, considering the

parental alleles that were not transmitted as ‘pseudo con-

trols’. For example, Rubinstein et al. [6] and later Falk and

Rubinstein [7] proposed a method for calculating the

odds ratio of transmitted vs. non-transmitted alleles to

offspring from parents. They termed this ‘Haplotype Rel-

ative Risk’ (HRR) because they were investigating HLA

haplotypes, and an odds ratio is similar to relative risk if

the disease prevalence is low. It is an unmatched case-con-

trol design comparing frequencies of transmitted alleles

vs. non-transmitted alleles from parents. A similar method was proposed by Terwilliger and Ott [24]. Ott [23] studied the properties of HRR and theoretically derived the expected frequencies of transmitted and non-transmitted alleles assuming a recessive disease. Although the test proposed by Falk and Rubinstein [7] was not a valid test for linkage [5], Spielman et al. [5] proposed a valid test for linkage in the presence of association (footnote 1) based on the idea of Falk and Rubinstein [7]. The transmission disequilibrium test or TDT is a McNemar [25] test for a matched

case-control design that compares transmitted alleles

from heterozygous parents to an affected offspring with

the expected non-transmitted alleles, assuming there is

no transmission distortion. Here, transmitted and non-

transmitted alleles from heterozygous parents are consid-

ered both as cases and controls, creating a matched case-

control design. Tiwari et al. [26] noted that the informa-

tive families used in TDT designs can be viewed as a

mixture of experimental backcrosses (one heterozygous

parent) and F2 intercrosses (two heterozygous parents) as

an analogy to experimental crosses.
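The McNemar form of the TDT described above can be illustrated with a short sketch. It assumes a di-allelic marker with genotypes coded as the number of copies (0, 1 or 2) of one allele, here called A; the function name `tdt_from_trios` and the coding are illustrative choices, not taken from the original papers. The statistic is (b - c)^2 / (b + c), where b and c count heterozygous parents that did and did not transmit A to the affected child, and it is referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def tdt_from_trios(trios):
    """Compute a McNemar-type TDT statistic from parent-offspring trios.

    Each trio is (father, mother, child), with genotypes coded as the
    number of copies (0, 1 or 2) of allele A. This coding is an
    assumption of the sketch.
    """
    b = 0  # heterozygous parents that transmitted A
    c = 0  # heterozygous parents that transmitted the other allele
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # non-informative mating: no heterozygous parent
        # Homozygous AA parents always transmit A, so the remaining copies
        # of A in the child came from the heterozygous parent(s).
        from_het = child - ((father == 2) + (mother == 2))
        if not 0 <= from_het <= n_het:
            raise ValueError("genotypes inconsistent with Mendelian inheritance")
        b += from_het
        c += n_het - from_het
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b, c, chi2, p_value


# Hypothetical data: 30 transmissions of A and 10 non-transmissions from
# heterozygous parents, plus 5 non-informative trios that are skipped.
trios = [(1, 0, 1)] * 30 + [(1, 0, 0)] * 10 + [(2, 0, 1)] * 5
b, c, chi2, p_value = tdt_from_trios(trios)
print(b, c, chi2, p_value)
```

Only heterozygous parents contribute, which is why trios without a heterozygous parent are dropped; this is the source of the power loss discussed for TDT type designs.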

The original TDT design requires the collection of family trios that include two parents and an affected offspring and is limited to di-allelic marker loci and dichotomous traits. Although the TDT method is a valid test for linkage, it only has power in the presence of population association and is robust against population admixture [27]. There are more than two hundred publications describing extensions and variations of the original TDT. Figure 1 shows the distribution of 223 published extensions and variations of the TDT from 1993 to 2007. In supplemental table 1 (www.karger.com/doi/10.1159/000119107), we summarize some (but not all) of the extensions or variations of the TDT type procedures.

Footnote 1: By valid test of linkage in the presence of association, we mean a test that (A) yields p values less than or equal to α no more than 100 * α% of the time when the marker is either unlinked to or not associated with a locus causing variation in the phenotype; and (B) yields p values less than or equal to α more than 100 * α% of the time when the marker is both linked to and associated with a locus causing variation in the phenotype.

The extensions to the TDT fall mainly in four catego-

ries: (1) relaxing the requirement of only two alleles at the

marker locus; (2) relaxing the requirement of the trait to

be dichotomous; (3) relaxing the requirement of a parent/

offspring trio design, and (4) extension to using genotype

information from the X chromosome (X-linked TDT).

Other extensions to the TDT include multiple loci, Bayes-

ian TDT, multiple phenotypes, parent of origin/imprint-

ing effects, inbreeding, TDT for haplotypes, censored

data, simultaneous and separate modeling of the linkage and association parameters, and other variations to

increase power; we choose to focus this review mostly on

the four main categories listed above with some discus-

sion of the other extensions.

Relaxing the Requirement of Two Alleles at the

Marker Locus

Several extensions to the TDT have been proposed to allow for multiple alleles at the marker locus. Bickeboller and Clerget-Darpoux [28] extended the TDT to multi-allelic markers by comparing the genotypes formed by the two transmitted alleles (genotype of index) with the genotypes formed by the two nontransmitted alleles (internal control genotype), similar to Terwilliger and Ott [24], thus using the information on both parents simultaneously. This test of transmission patterns of genotypes (T_g) was based on the homogeneity test for a contingency table of genotype frequencies. Bickeboller and Clerget-Darpoux [28] also proposed an allelic test (T_c) based on testing the complete symmetry of the contingency table of allele frequencies. In addition, Rice et al. [29] proposed an extension of the TDT that allows analysis with multi-allelic markers, and at about the same time Sham and Curtis [30] introduced an extended TDT (ETDT) based on a logistic regression procedure. The advantage of the ETDT is that it can be easily programmed in any standard statistical software. Other adaptations have followed: Morris et al. [31] used a likelihood ratio test. Spielman and Ewens [32] proposed an alternative test of marginal homogeneity (T_mhet) that is similar to that of Bickeboller and Clerget-Darpoux [28], allowing for multi-allelic markers. Kaplan et al. [33] used a Monte Carlo approach, called the MC-T_m statistic, and showed that MC-T_m is more powerful than T_mhet and ETDT. Cleves et al. [34] proposed an exact test which is implemented using an exact algorithm and Markov chain Monte Carlo (MCMC) simulation. Finally, Schaid [35] proposed testing each allele separately and then using the maximal TDT as the test statistic to infer linkage. He also proposed a class of model-based approaches using conditional likelihood, analyzing all alleles simultaneously under specific genetic models [35]. The maximal TDT statistic, however, does not follow a chi-square distribution. Betensky and Rabinowitz [36] provided a refinement to Bonferroni's correction for multiple testing based on maximal spanning trees to calculate accurate upper bounds for type 1 error and p values for the maximal TDT.
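The idea of testing each allele separately and taking the maximum can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of (transmitted, untransmitted) allele pairs from heterozygous parents, the function names are hypothetical, and the p value uses the plain Bonferroni bound rather than the spanning-tree refinement of [36], so it is a conservative upper bound only.

```python
import math


def per_allele_tdt(transmissions, alleles):
    """Return a 1-df TDT statistic for each allele versus all others.

    `transmissions` holds one (transmitted, untransmitted) pair per
    heterozygous parent; this input layout is an assumption of the sketch.
    """
    stats = {}
    for allele in alleles:
        b = sum(1 for t, u in transmissions if t == allele and u != allele)
        c = sum(1 for t, u in transmissions if u == allele and t != allele)
        stats[allele] = (b - c) ** 2 / (b + c) if b + c > 0 else 0.0
    return stats


def maximal_tdt(transmissions, alleles):
    """Maximal per-allele TDT with a plain Bonferroni upper bound."""
    stats = per_allele_tdt(transmissions, alleles)
    best = max(stats, key=stats.get)
    # Single-test p value from a chi-square with 1 df.
    p_single = math.erfc(math.sqrt(stats[best] / 2.0))
    # The maximum is not chi-square distributed, so bound it by Bonferroni.
    p_bound = min(1.0, len(alleles) * p_single)
    return best, stats[best], p_bound


# Hypothetical transmissions for a three-allele marker.
data = ([("1", "2")] * 20 + [("2", "1")] * 5
        + [("3", "2")] * 5 + [("2", "3")] * 5)
print(maximal_tdt(data, ["1", "2", "3"]))
```

Because the per-allele statistics are correlated, the Bonferroni bound can be noticeably conservative, which is the motivation for sharper bounds.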

Fig. 1. Distribution of TDT type methods manuscripts published since 1993. [Bar chart; x-axis: Years (1993–2007); y-axis: Number of publications (0–45).]

Relaxing the Requirement of the Trait to Be Dichotomous

Extensions of the original TDT for a dichotomous trait to quantitative traits are mainly based on a regression framework in which covariates can be easily modeled. Allison [37] proposed five TDTs for quantitative traits, sequentially called TDTQ1 to TDTQ5. The first four of these TDTQs were based on extreme-threshold sampling, whereas TDTQ5 uses the full distribution of a quantitative trait. TDTQ5 is the most flexible in the sense that it can be easily extended to multiple alleles, multiple loci, gene-environment interaction, etc., and it is also the most powerful of the five. TDTQ5 requires family trios consisting of at least one heterozygous parent and one child. In TDTQ5, the quantitative trait is regressed on offspring genotypes while controlling for parental mating types defined by their genotypes. The test statistic for TDTQ5 is an F ratio that compares the fit of two models, with and without the genetic effect, in a regression framework that includes the offspring's genotype and parental mating type. Xiong et al. [38] developed a similar approach that allows for more than one child per family.
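The model comparison behind a TDTQ5-style F ratio can be sketched in a few lines. The sketch makes simplifying assumptions that are not from the source: one child per family, the offspring genotype coded additively as an allele count (a single degree of freedom), and parental mating type entered as a categorical factor. With a single genotype term, fitting mating-type indicators is equivalent to centering trait and genotype within each mating type, which is what the hypothetical function `tdtq5_style_f_ratio` does.

```python
from collections import defaultdict


def tdtq5_style_f_ratio(records):
    """F ratio for an additive genotype effect, adjusting for mating type.

    `records` is a list of (mating_type, offspring_genotype, trait)
    tuples; the layout and the additive coding are assumptions.
    """
    groups = defaultdict(list)
    for mating_type, genotype, trait in records:
        groups[mating_type].append((float(genotype), float(trait)))

    sxx = sxy = syy = 0.0
    for members in groups.values():
        g_mean = sum(g for g, _ in members) / len(members)
        y_mean = sum(y for _, y in members) / len(members)
        for g, y in members:
            # Within-mating-type deviations remove the mating-type effect.
            sxx += (g - g_mean) ** 2
            sxy += (g - g_mean) * (y - y_mean)
            syy += (y - y_mean) ** 2
    if sxx == 0.0:
        raise ValueError("no within-mating-type genotype variation")

    rss_reduced = syy                  # mating type only
    rss_full = syy - sxy ** 2 / sxx    # mating type + offspring genotype
    df_num = 1
    df_den = len(records) - len(groups) - 1
    if df_den <= 0 or rss_full <= 0.0:
        raise ValueError("not enough residual information for an F ratio")
    f_ratio = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_ratio, df_num, df_den


# Hypothetical data: two mating types, four offspring each.
records = [
    ("Aa x aa", 0, 1.0), ("Aa x aa", 1, 2.0),
    ("Aa x aa", 0, 1.5), ("Aa x aa", 1, 2.5),
    ("Aa x Aa", 0, 1.0), ("Aa x Aa", 1, 2.0),
    ("Aa x Aa", 2, 3.5), ("Aa x Aa", 1, 2.5),
]
print(tdtq5_style_f_ratio(records))
```

Families in which neither parent is heterozygous have no within-mating-type genotype variation, so they add nothing to the numerator, consistent with the requirement of at least one heterozygous parent.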

A non-parametric TDT for quantitative traits was in-

troduced independently by Rabinowitz [39] . The advan-

tage of this test lies in its flexibility in modeling multiple

alleles at the marker locus, inclusion of other siblings, and

incorporation of covariates. Sun et al. [40] extended Rabi-

nowitz’s [39] approach to include families with only one

parent available. All these tests assume that model resid-

uals are independent, and therefore they are applicable,

as a test for linkage, only for nuclear family data.

George et al. [41] proposed a regression-based TDT for

linkage between a marker locus and a quantitative trait

locus, treating the trait as the dependent variable and

transmission status along with other predictors and con-

founders, as independent variables. This method does not

require independence of observations, thus allowing for

analysis of extended pedigree data as well, and modeling

any number of covariates. Zhu and Elston [42] proposed

conditional likelihood-ratio test statistics that allow

multi-generational data as well as a test either for linkage

in the presence of allelic association or for allelic associa-

tion in the presence of linkage. Abecasis et al. [43, 44]

proposed a general test of association for quantitative

traits in nuclear families (QTDT) based on Fulker et al.’s

[45] variance components approach. Monks and Kaplan

[46] introduced three extensions to the TDT for quantitative traits: (1) the T_QP statistic uses genotype information for parents and their children; (2) T_QS uses genotypes for at least two siblings having different genotypes in the absence of parental genotypes, and (3) T_QPS is a combination of T_QP and T_QS. Note that the T_QP statistic is similar to the statistic proposed by Rabinowitz [39]. Waldman et al. [47] proposed a logistic regression framework

instead of the ordinary linear regression for continuous

and categorical data. This framework can be easily ex-

tended to include multiple phenotypes by simply includ-

ing phenotypes as predictors in the regression model, and

it can easily accommodate multiple offspring per nuclear

family. No phenotype distributional assumptions are required with this approach. Lastly, it does not require stand-alone software; any standard statistical software such as SAS or SPSS can be used for the analysis.

Liu et al. [48] offered a unified framework for TDT

analysis for discrete and continuous traits based on a con-

ditional score test that maximizes power to detect small

effects for any distribution in the exponential family, regardless of skewness or kurtosis. Kistner and Weinberg [49] proposed a quantitative trait extension of their log-linear approach for qualitative traits [50]. Like the log-linear approach for qualitative traits, their quantitative trait extension allows for population admixture by conditioning on parental genotypes.

Relaxing the Requirement of a Parent/Offspring Trio

Design

Parental genotype data are often difficult or impossible to obtain when studying diseases with adult or late-life onset. Several approaches have been developed to alleviate the problems that arise from missing and incomplete parental genotypic data.

Using Information from Unaffected Siblings

When unaffected siblings are available for the study,

their genotype information can be used in tests for al-

lelic transmission. Curtis [51] proposed an extension to

the TDT utilizing only discordant sibling pairs for both

phenotype and genotype. S-TDT, a similar approach developed by Spielman and Ewens [52], requires (1) at least one affected and one unaffected sibling in each sibship, and (2) that not all members of the sibship have the same genotype at the marker locus. With these requirements met, the

S-TDT can be used to analyze linkage disequilibrium be-

tween a marker allele and a putative disease allele without

reconstructing parental genotypes and without relying

on allele frequency estimates. Statistically, the S-TDT

tests for significant marker allele frequency differences in

affected offspring compared to their unaffected siblings

[52] . Generally, the S-TDT is less powerful than the TDT

when parental genotypes are available because data on

the preferential transmission of parental alleles is more

informative. In fact, the S-TDT can be used jointly with

the TDT to construct a combined test (C-TDT) using nu-

clear families, trios, and discordant siblings. Schaid and

Rowland [53] showed that the S-TDT is equivalent to the

conditional likelihood with the log-additive effects of the

marker alleles.
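A sibship-based comparison of this kind can be sketched as a z score. For each informative sibship, the count of a chosen allele among affected siblings is compared with its expectation and variance when affection status is permuted within the sibship. The sketch assumes a di-allelic marker, genotypes coded as allele counts, and the hypothetical function name `s_tdt_z`; the variance is written in its finite-population sampling form.

```python
import math


def s_tdt_z(sibships):
    """z score comparing allele counts in affected vs. unaffected siblings.

    `sibships` is a list of sibships, each a list of (affected, genotype)
    pairs with genotype coded as 0, 1 or 2 copies of the allele of
    interest; this layout is an assumption of the sketch.
    """
    observed = expected = variance = 0.0
    for sibs in sibships:
        counts = [g for _, g in sibs]
        t = len(sibs)
        a = sum(1 for affected, _ in sibs if affected)
        u = t - a
        # Requirement (1): at least one affected and one unaffected sibling.
        # Requirement (2): not all siblings share the same genotype.
        if a == 0 or u == 0 or len(set(counts)) == 1:
            continue
        g_mean = sum(counts) / t
        g_var = sum((g - g_mean) ** 2 for g in counts) / t
        observed += sum(g for affected, g in sibs if affected)
        # Mean and variance of the affected-sibling allele count when the
        # a affected labels are assigned at random among the t siblings.
        expected += a * g_mean
        variance += a * u / (t - 1) * g_var
    if variance == 0.0:
        raise ValueError("no informative sibships")
    z = (observed - expected) / math.sqrt(variance)
    return observed, expected, variance, z


# Hypothetical discordant sib pairs: the affected sibling carries the
# allele in 16 pairs and the unaffected sibling carries it in 4 pairs.
pairs = [[(True, 1), (False, 0)]] * 16 + [[(True, 0), (False, 1)]] * 4
print(s_tdt_z(pairs))
```

No parental genotypes or population allele frequencies enter the calculation, which matches the property emphasized for the S-TDT.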

The sibling TDT method by Curtis [51] requires randomly selecting one affected sibling and then selecting one unaffected sibling whose marker genotype is different from that of the affected sibling. To include all available

siblings from the same family, Horvath and Laird [54]

proposed a sibling disequilibrium test (SDT) based on a

standard nonparametric sign test. The SDT is effective in

cases where parental information is not available. The

data design requirement is the same as S-TDT, with the

only difference being that the SDT is a non-parametric

test. In 1998, Boehnke and Langefeld [55] introduced seven association tests for multi-allelic markers, which they represent using a 2 × k contingency table (k is the number of alleles at the marker locus). The rows represent the disease status and the columns represent marker alleles. In some cases these discordant-alleles tests (DATs) (AC_1, AC_2, and AC_ws) are identical to each other as well as equivalent to the S-TDT, but the AC_2 statistic has the best power overall. Boehnke and Langefeld [55] proposed obtaining p values for these DATs by a permutation procedure that randomly permutes the affection status of the siblings. Risch and Teng [56, 57] noted that one can derive

additional information from the sample by analyzing the

relative frequency of different sibship genotype configu-

rations. This information can then be used to estimate

the proportion of mating type frequencies for a di-allelic

marker. Weinberg [58] proposed a likelihood approach

for families with incomplete parental data. Schaid and

Rowland [59] proposed a score test statistic using parents

as controls, siblings as controls, or unrelated individuals

as controls. Note that their method generalizes the S-

TDT and the DAT. In 2000, Siegmund et al. [60] intro-

duced a test of association in the presence of linkage using

multivariate regression for correlated outcome data to

analyze sibship data.

Using Information from Nuclear Families in the

Absence of Unaffected Siblings and Only One Parent

Available

Bias can arise in the TDT statistic when information

is only available from one heterozygous parent, leading

to higher false positive rates [30]. Sun et al. [61] introduced the 1-TDT to detect linkage between a candidate locus and a disease locus using genotypes of affected individuals and only one available parent of each affected individual. The 1-TDT is a valid test of the null hypothesis of no linkage or association. In 2000, Wang and Sun derived the sample sizes needed to detect linkage disequilibrium with the S-TDT and the 1-TDT, finding that the sample size required for the 1-TDT is roughly the same as for the S-TDT with one affected and one unaffected sibling, and is about twice the sample size needed for the original TDT [62]. Clayton

[63] , Weinberg [58] , and Cervino and Hill [64] , also pro-

vided extensions to TDT when one parent is missing. Al-

len et al. [65] extended parental controlled association

tests for a di-allelic marker and disease that are valid

when parental genotype data are informatively missing

(i.e. when the genotype of a parent influences the probability of that parent's genotype data being observed).

Also, Allen et al. [66] proposed a multi-allelic extension

of their missingness model [65] which also incorporated

a bootstrap calibration of missing at random (MAR) pro-

cedures to account for informative missingness.

Using Sibship Data Only and Reconstructing Missing

Parental Genotypes

For some families it might be possible to reconstruct

the genotypes of missing parents. However, Curtis [51] ,

Spielman and Ewens [67] and Knapp [68] pointed out that

reconstructing genotypes to achieve more power for the

TDT procedure can introduce bias. Knapp [68] proposed

a statistical procedure to overcome the potential bias in-

duced by the parental genotype reconstruction. Knapp

[68] incorporated a reconstruction approach that cor-

rects for bias into C-TDT and called the resulting proce-

dure the reconstruction combined TDT (RC-TDT). Com-

parisons showed that RC-TDT is more powerful than the

S-TDT.

Using Information on Non-Informative Mating Types

Because no inference on linkage disequilibrium can be

obtained from homozygous parents or other cases of

non-informative transmissions, these types of nuclear

families are not included in the classical TDT analysis.

This problem is often encountered when using binary

markers, such as single-nucleotide polymorphisms

(SNPs), which are highly abundant throughout the ge-

nome and cost effective. The maximum frequency of het-

erozygotes at a binary marker locus in Hardy-Weinberg

equilibrium is 0.5. In this scenario, at least half of the par-

ents would be non-informative in a traditional TDT. An-

alyzing marker haplotypes is a relatively straightforward

solution. However, the haplotype phase is often uncer-

tain, and restricting analyses to pedigrees where the

phase is known may lead to bias. As a result, Clayton [63]

proposed a new approach to TDT methods using tests

based upon score vectors which are averaged over all pos-

sible parental haplotypes and transmissions consistent

with the observed data (TRANSMIT 2.5.4 documenta-

tion: www-gene.cimr.cam.ac.uk/clayton/software). At its

implementation, this approach possessed three distinct

advantages over earlier TDT methods: (1) it could use any