Genetic Epidemiology 17(Suppl 1): S169-S173 (1999)

An Empirical Test of the Significance of an

Observed Quantitative Trait Locus Effect

that Preserves Additive Genetic Variation

Stephen J. Iturria, Jeff T. Williams, Laura Almasy, Thomas D. Dyer, and John

Blangero

Department of Genetics, Southwest Foundation for Biomedical Research, San

Antonio, Texas

We propose a constrained permutation test that assesses the significance of an observed

quantitative trait locus effect against a background of genetic and environmental variation.

Permutations of phenotypes are not selected at random, but rather are chosen

in a manner that attempts to maintain the additive genetic variability in phenotypes.

Such a constraint maintains the nonindependence among observations under the null

hypothesis of no linkage. The empirical distribution of the lod scores calculated using

permuted phenotypes is compared to that obtained using phenotypes simulated from

the assumed underlying multivariate normal model. We make comparisons of univariate

analyses for both a quantitative phenotype that appears consistent with a multivariate

normal model and a quantitative phenotype containing pronounced outliers.

An example of a bivariate analysis is also presented. © 1999 Wiley-Liss, Inc.

Key words: permutation test, power, quantitative trait locus, statistical genetics

INTRODUCTION

Consider the random effects model for phenotypic variation in which the quantitative

phenotype of an individual, y, is determined as

$$y = \mu + \gamma + g + e,$$

where μ is the population mean phenotype, γ is the effect due to a single quantitative trait

locus (QTL), g is a background polygenic effect, and e is a random environmental effect.

Address reprint requests to Dr. Stephen J. Iturria, Department of Genetics, Southwest Foundation for

Biomedical Research, 7620 NW Loop 410, P.O. Box 760549, San Antonio, TX 78245-0549.

Phenotypic variances and covariances are of the form

$$\mathrm{Var}(y) = \sigma_a^2 + \sigma_g^2 + \sigma_e^2$$

and

$$\mathrm{Cov}(y_1, y_2) = \pi\sigma_a^2 + 2\phi\sigma_g^2,$$

where $\sigma_a^2$ and $\sigma_g^2$ are, respectively, the genetic variances due to the QTL and the background

polygenic effect, both assumed to be additive, $\sigma_e^2$ is the variance of the random environmental

effect, π is the probability of randomly selected QTL alleles being identical by descent

(IBD), and φ is the kinship coefficient. In matrix form, the variance-covariance matrix of

observed phenotype vector y is

$$\Omega = \Pi\sigma_a^2 + 2\Phi\sigma_g^2 + I\sigma_e^2$$

[Amos, 1994]. Test statistics used for testing $H_0: \sigma_a^2 = 0$ vs. $H_1: \sigma_a^2 > 0$ are often a

function of the likelihood ratio, LR, where LR is the ratio of the maximized likelihoods

under $H_1$ and $H_0$, respectively, assuming a multivariate normal distribution for y. For

example, the lod score test statistic is given by $\log_{10}(LR)$. Provided that the data are indeed

multivariate normal, under $H_0$ the large-sample distribution of $2\ln(LR)$ is known to be a

$\tfrac{1}{2}:\tfrac{1}{2}$ mixture of a chi-square distribution with one degree of freedom and a point mass

at zero [Self and Liang, 1987]. If the underlying distribution of the phenotypes departs

significantly from the assumed model, however, p-values based on the chi-square mixture

distribution may be misleading.
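
As a point of reference for the asymptotic result above, the following minimal sketch converts a lod score into a p-value under the equal-weight mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The function name `lod_to_mixture_pvalue` is hypothetical and introduced here only for illustration; the sketch assumes the multivariate normal model holds.

```python
import math


def lod_to_mixture_pvalue(lod):
    """Asymptotic p-value for a lod score under the 1/2:1/2 mixture of a
    point mass at zero and a chi-square(1) distribution (illustrative sketch)."""
    if lod <= 0:
        # The statistic is never negative, so a lod of zero is always attained.
        return 1.0
    # lod = log10(LR), so 2 ln(LR) = 2 ln(10) * lod.
    lr_stat = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    chi2_1_tail = math.erfc(math.sqrt(lr_stat / 2.0))
    # Only the chi-square half of the mixture contributes to the tail.
    return 0.5 * chi2_1_tail
```

Under this approximation a lod score of 3 corresponds to a p-value of roughly 1e-4. It is exactly this conversion that may mislead when the phenotype distribution departs from the assumed model.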

METHODS

The empirical method we propose for assessing the significance of a lod score measuring

the QTL effect while preserving the background additive genetic variation is carried

out as follows:

(1) Fit the multivariate normal model of the last section under the constraint of no QTL

effects ($\sigma_a^2 = 0$) to the observed data.

(2) Use the parameter estimates in (1) to simulate phenotypes from the model without

QTL effects.

(3) Pair the simulated phenotypes from (2) with the observed phenotypes in a one-to-

one manner, attempting to minimize the total of the distances between simulated and

observed phenotypes.

(4) Take the permutation of observed phenotypes defined by the pairing in (3) and recompute

the lod score.

(5) Return to (2) until a "large" number of lod scores under the QTL-free model has been

computed.

(6) Report the p-value for the original lod score as the proportion of permutation-based

lod scores that are larger than the original lod.
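
Steps (2) through (6) can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the authors' implementation: `lod_fn` (which refits the variance component model and returns a lod score for a phenotype vector) and `pair_fn` (which returns the permutation defined by the pairing in step 3) are hypothetical callables supplied by the user, and `mu0`, `sigma2_g`, and `sigma2_e` stand for the null-model estimates from step (1).

```python
import numpy as np


def null_covariance(phi, sigma2_g, sigma2_e):
    """Covariance under the QTL-free model: 2 * Phi * sigma_g^2 + I * sigma_e^2."""
    n = phi.shape[0]
    return 2.0 * phi * sigma2_g + np.eye(n) * sigma2_e


def empirical_pvalue(y_obs, mu0, omega0, lod_fn, pair_fn, n_rep=1000, seed=0):
    """Constrained-permutation p-value (sketch).

    lod_fn(y)                -> lod score for phenotype vector y (hypothetical).
    pair_fn(y_obs, y_sim, r) -> index array perm, where perm[i] is the index of
                                the observed phenotype given to individual i
                                (hypothetical).
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs, dtype=float)
    mean = np.full(y_obs.shape[0], mu0)
    lod_obs = lod_fn(y_obs)
    null_lods = np.empty(n_rep)
    for r in range(n_rep):
        # Step 2: simulate phenotypes from the fitted model without QTL effects.
        y_sim = rng.multivariate_normal(mean, omega0)
        # Step 3: pair simulated with observed phenotypes.
        perm = pair_fn(y_obs, y_sim, rng)
        # Step 4: recompute the lod score on the permuted observed phenotypes.
        null_lods[r] = lod_fn(y_obs[perm])
    # Step 6: proportion of permutation-based lods larger than the original.
    return float(np.mean(null_lods > lod_obs))
```

Passing the pairing rule in as a function keeps the loop identical for univariate and multivariate phenotypes; only `pair_fn` changes.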

The pairing of observed phenotypes with simulated phenotypes in (3) is done without

regard to family identity. Our aim is simply to match simulated phenotypes with observed

phenotypes in a manner that makes paired values as close as possible. In this way the

permuted phenotypes maintain much of their familial correlation, which is desirable since

our goal is the assessment of the statistical significance of an observed QTL in the presence of

background genetic variation. If completely random permutations of phenotypes were used

for recomputing lod scores, all of the genetic variation in phenotypes would be removed.

Pairing phenotypes is easy in the case of univariate phenotypes: one simply orders the

simulated and observed phenotypes and pairs them based on ranks. If any ties occur during

pairing due to equal values of observed phenotypes, the simulated phenotype is paired at

random to one of the tied observed phenotypes.
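
The rank-based pairing just described might be sketched as follows. The function name `rank_pair` and the convention that `perm[i]` is the index of the observed phenotype assigned to individual i are assumptions made for illustration; ties among observed values are broken at random by shuffling before a stable sort.

```python
import numpy as np


def rank_pair(y_obs, y_sim, rng):
    """Pair observed and simulated univariate phenotypes by rank (sketch).

    Returns perm such that y_obs[perm][i] is the observed phenotype whose rank
    matches the rank of the simulated phenotype of individual i.
    """
    y_obs = np.asarray(y_obs)
    y_sim = np.asarray(y_sim)
    n = y_obs.shape[0]
    # Shuffle first so that a stable sort orders tied observed values at random.
    shuffle = rng.permutation(n)
    obs_order = shuffle[np.argsort(y_obs[shuffle], kind="stable")]
    sim_order = np.argsort(y_sim, kind="stable")
    perm = np.empty(n, dtype=int)
    # The individual holding the k-th smallest simulated value receives the
    # k-th smallest observed value.
    perm[sim_order] = obs_order
    return perm
```

For example, with observed values (3, 1, 2) and simulated values (0.1, 0.5, -0.2), the permuted observed vector is (2, 3, 1): each individual receives the observed phenotype with the same rank as that individual's simulated phenotype.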

Pairing multivariate phenotypes is more difficult than in the univariate case because

there is no obvious way to "rank" phenotypes in multidimensional space in a manner that

ensures optimal pairing based on proximity. We have formulated an ad hoc algorithm

for pairing multivariate phenotypes that we have found preserves much of the additive

genetic heritability in multivariate phenotypes, although it is not guaranteed to optimize

any measure of proximity per se. The algorithm is as follows:

(1) Identify the observed phenotype having the greatest mean distance (in multidimensional

space) from the set of simulated phenotypes.

(2) Pair the observed phenotype identified in (1) with the simulated phenotype to which it

is closest.

(3) Remove the phenotype pair in (2) from consideration and return to (1) until all phenotypes

have been paired.
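
The three steps above can be sketched as a greedy loop. The text does not specify the distance measure, so Euclidean distance is assumed here; the function name `greedy_pair` and the return convention (`perm[i]` is the index of the observed phenotype assigned to individual i) are likewise illustrative assumptions.

```python
import numpy as np


def greedy_pair(obs, sim):
    """Greedy pairing of multivariate phenotypes (sketch, Euclidean distance assumed).

    obs, sim: arrays of shape (n, d). Returns perm with perm[i] equal to the
    index of the observed phenotype paired with simulated phenotype i.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n = obs.shape[0]
    # dist[j, i] is the distance between observed j and simulated i.
    dist = np.linalg.norm(obs[:, None, :] - sim[None, :, :], axis=2)
    obs_left = list(range(n))
    sim_left = list(range(n))
    perm = np.empty(n, dtype=int)
    while obs_left:
        sub = dist[np.ix_(obs_left, sim_left)]
        # Step 1: observed phenotype farthest, on average, from the
        # simulated phenotypes that remain unpaired.
        j_local = int(np.argmax(sub.mean(axis=1)))
        # Step 2: pair it with its closest remaining simulated phenotype.
        i_local = int(np.argmin(sub[j_local]))
        perm[sim_left[i_local]] = obs_left[j_local]
        # Step 3: remove the pair from further consideration.
        obs_left.pop(j_local)
        sim_left.pop(i_local)
    return perm
```

Handling the most isolated observed phenotype first keeps it from being left with whatever simulated value remains at the end. As noted in the text, the procedure is not guaranteed to minimize any overall measure of proximity.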

COGA EXAMPLES

To illustrate the effect of pronounced outliers on the sampling distribution of the

lod score under the QTL-free model, we have applied our method to the P300 Cz event-

related brain potential (Cz) phenotype and to the platelet monoamine oxidase (MAO) level

phenotype. After adjusting for covariates age, age², sex, sex × age, race (black/white/other),

and smoking status (Y/N), the distribution of the 599 Cz phenotypes was found to be highly

symmetric about the mean, with all phenotypes within four standard deviations of the

mean and all but four within three standard deviations of the mean, presenting no evidence

that the population distribution of the Cz phenotype is inconsistent with a random effects

multivariate normal model. For the 869 MAO phenotypes, however, there were three

pronounced outliers all at least nine standard deviations from the mean, indicating that such

a model is not appropriate. Using the covariate-adjusted Cz data, a genome scan for QTLs

was performed using the variance component method as implemented in the computer

program SOLAR [Blangero and Almasy, 1997]. The peak lod score of 3.68 was found

on chromosome 6 at 198 cM. Employing the IBD matrix calculated at this position, for

each phenotype 7,000 data sets were simulated for the purpose of computing the empirical

cumulative distribution function (CDF) of the lod score.

To illustrate the degree to which the sampling distribution of the lod scores for the

Cz and MAO phenotypes differ from that expected under the assumed multivariate normal

model, note the empirical cumulative distribution plots in Figure 1. Each plot contains two

empirical CDFs for 7,000 lod scores: one for phenotypes simulated under the multivariate

normal model and one for phenotypes permuted to pair up with the simulated phenotypes.

These CDFs are labeled "Simul." and "Mapped," respectively. For the Cz phenotype,

[Figure 1. Panels: (A) C.D.F. plot for Cz; (B) blowup of C.D.F. plot for Cz; (C) C.D.F. plot for MAO; (D) blowup of C.D.F. plot for MAO. Each panel shows the "Simul." and "Mapped" curves against LOD.]

Fig. 1. Lod score empirical distributions for the Cz and MAO phenotypes.

the strong correspondence between the two functions indicates that the Cz phenotype is

modeled very well by a multivariate normal model. This is not the case, however, for

the MAO phenotype, where the discrepancy between the empirical CDFs suggests that the

p-value of a lod score computed with the chi-square mixture distribution is an inappropriate

measure of significance.

In addition to computing test statistics, estimates of additive genetic heritability were

retained using both simulated and permuted phenotypes. For Cz, the simulated phenotypes

yielded a mean estimated heritability of 0.356 and the permuted phenotypes a mean

estimated heritability of 0.353, so essentially all of the additive genetic variation was maintained.

For MAO, the mean estimated heritability was 0.763 for simulated phenotypes

and 0.546 for permuted phenotypes, so approximately 70% of the genetic variation was

preserved.
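
The retained fractions quoted above appear to be the ratios of the mean heritability estimates, permuted over simulated. Under that reading, which is an assumption since the text does not state the calculation, the arithmetic is:

```latex
\frac{0.353}{0.356} \approx 0.99 \quad \text{(Cz)}, \qquad
\frac{0.546}{0.763} \approx 0.72 \quad \text{(MAO)}.
```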

To illustrate our permutation method for a multivariate phenotype we considered

jointly both the Cz and Pz P300 event-related brain potential phenotypes. Similar to

Cz, after adjusting for covariates the 599 measurements of the Pz phenotype were highly

symmetric about the mean, all were within four standard deviations of the mean, and

all but three were within three standard deviations of the mean, giving no evidence that

the underlying distribution of the Pz phenotype is inconsistent with a normal distribution.

Mean estimated heritabilities for Cz and Pz computed with 5,000 data sets simulated from

a bivariate normal were 0.404 and 0.487, respectively. Mean heritabilities for Cz and Pz

after permuting phenotypes with our multivariate mapping algorithm were 0.343 and 0.409,

respectively. Therefore for each phenotype, on average, about 85% of the additive genetic

variation was preserved by the permutation. A plot of the empirical distribution functions

for the 5,000 lod scores calculated using both the simulated and permuted phenotypes is

[Figure 2. Panels: (A) C.D.F. plot for (Cz, Pz); (B) blowup of plot for (Cz, Pz). Axes: LOD (horizontal) and Probability (vertical); curves labeled "Simul." and "Mapped."]

Fig. 2. Lod score empirical distributions for the bivariate phenotype (Cz, Pz).

given in Figure 2. We see that the empirical distribution functions are virtually the same,

suggesting that hypothesis tests for QTL effects based on multivariate normal theory are

appropriate for the (Cz, Pz) bivariate phenotype.

DISCUSSION

We are not suggesting that one need use a procedure such as ours for all but the most

well-behaved phenotypes. Our experience has been that the classical approach to p-value

estimation based on the chi-square distribution works very well for most of the data sets

we have encountered in practice. If one must include individuals with phenotypes that are

not consistent with the assumed underlying model, however, an empirical procedure may

be more appropriate. We would encourage the use of an empirical procedure such as the

one we have proposed as a diagnostic to check on model assumptions. If the empirical

distribution of the test statistic computed from phenotypes simulated under the assumed

model differs substantially from that based on the permuted phenotypes, one should at least

consider basing inferences on the latter and perhaps also reconsider the model assumptions.
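
One simple way to quantify how much the two empirical distributions differ is the largest vertical gap between their empirical CDFs. This summary is an assumption added here for illustration; the comparisons in this paper are made graphically, and the function names `ecdf` and `max_cdf_gap` are hypothetical.

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size


def max_cdf_gap(lods_sim, lods_perm):
    """Largest absolute difference between the two empirical CDFs (sketch)."""
    lods_sim = np.asarray(lods_sim, dtype=float)
    lods_perm = np.asarray(lods_perm, dtype=float)
    # Evaluating at every observed lod score is enough, because both step
    # functions change only at those points.
    grid = np.sort(np.concatenate([lods_sim, lods_perm]))
    return float(np.max(np.abs(ecdf(lods_sim, grid) - ecdf(lods_perm, grid))))
```

A gap near zero corresponds to curves that nearly coincide, as for Cz, while a large gap corresponds to the kind of discrepancy seen for MAO. What counts as "substantial" is left to the analyst.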

ACKNOWLEDGMENTS

This research was supported in part by National Institutes of Health grants HL45522,

HL28972, GM31575, and MH59490.

REFERENCES

Amos CI (1994): Robust detection of genetic linkage by variance components analysis. Am J Hum Genet

54:535-543.

Blangero J, Almasy L (1997): Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol

14:959-967.

Self SG, Liang K-Y (1987): Asymptotic properties of maximum likelihood estimators and likelihood ratio

tests under nonstandard conditions. J Am Stat Assoc 82:605-610.