
Am. J. Hum. Genet. 62:1198–1211, 1998


Multipoint Quantitative-Trait Linkage Analysis in General Pedigrees

Laura Almasy and John Blangero

Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio

Summary

Multipoint linkage analysis of quantitative-trait loci

(QTLs) has previously been restricted to sibships and

small pedigrees. In this article, we show how variance-

component linkage methods can be used in pedigrees of

arbitrary size and complexity, and we develop a general

framework for multipoint identity-by-descent (IBD)

probability calculations. We extend the sib-pair multi-

point mapping approach of Fulker et al. to general rel-

ative pairs. This multipoint IBD method uses the pro-

portion of alleles shared identical by descent at

genotyped loci to estimate IBD sharing at arbitrary

points along a chromosome for each relative pair. We

have derived correlations in IBD sharing as a function

of chromosomal distance for relative pairs in general

pedigrees and provide a simple framework whereby

these correlations can be easily obtained for any relative

pair related by a single line of descent or by multiple

independent lines of descent. Once calculated, the mul-

tipoint relative-pair IBDs can be utilized in variance-

component linkage analysis, which considers the likeli-

hood of the entire pedigree jointly. Examples are given

that use simulated data, demonstrating both the accu-

racy of QTL localization and the increase in power pro-

vided by multipoint analysis with 5-, 10-, and 20-cM

marker maps. The general pedigree variance component

and IBD estimation methods have been implemented in

the SOLAR (Sequential Oligogenic Linkage Analysis

Routines) computer package.

Introduction

Methods of linkage analysis that exploit identity-by-de-

scent (IBD) allele sharing between pairs of relatives are

Received January 27, 1998; accepted for publication March 13, 1998; electronically published April 17, 1998.

Address for correspondence and reprints: Dr. Laura Almasy, Department of Genetics, Southwest Foundation for Biomedical Research, 7620 Northwest Loop 410, P.O. Box 760549, San Antonio, TX 78245-0549. E-mail: almasy@darwin.sfbr.org

© 1998 by The American Society of Human Genetics. All rights reserved.

0002-9297/98/6205-0026$02.00

widely used in the genetic analysis of complex traits as

these methods generally require few assumptions about

the genetic model underlying expression of the trait.

There is a limited range of IBD allele-sharing methods
that can be used for quantitative-trait linkage analysis.
The best known of these is the sib-pair approach of
Haseman and Elston (1972). Recently, variance-component
linkage analysis methods, which are more powerful
than relative pair–based approaches and have the

added advantage of providing reasonable estimates of

the magnitude of effect of the detected locus, have been

developed (Goldgar 1990; Schork 1993; Amos 1994;

Blangero and Almasy 1997). These variance-component

methods have been extended to accommodate general

pedigrees of arbitrary size and complexity (Comuzzie et

al. 1997) and to allow analyses that include genotype

× environment interaction (Blangero 1993; Towne et

al. 1997), epistasis (Stern et al. 1996; Mitchell et al.

1997), threshold models for discrete traits (Duggirala et

al. 1997), and pleiotropy (Almasy et al. 1997c), as well

as multivariate and oligogenic analyses (Schork 1993;

Almasy et al. 1997c; Blangero and Almasy 1997; Wil-

liams et al. 1997).

Multipoint linkage analysis increases the power to de-

tect true linkages and decreases the false-positive rate.

When linkage is detected, multipoint analysis also allows

support or confidence intervals to be determined for the

location of a gene. To date, practical application of mul-

tipoint IBD methods has been confined to sibships or

small pedigrees (Fulker et al. 1995; Kruglyak and Lander

1995; Kruglyak et al. 1996; Todorov et al. 1997), al-

though there have been some recent promising devel-

opments utilizing computer-intensive Monte Carlo–

based techniques (Sobel and Lange 1996; Heath 1997;

Heath et al. 1997) in large pedigrees.

The development of variance-component linkage

methodologies for use in extended families has created

a need for a multipoint IBD method suitable for use in

such pedigrees. In general, the computational burden for

exact multipoint calculations is considerable even in nu-

clear families and is prohibitive in large pedigrees. To

alleviate this problem, Fulker et al. (1995) developed a

multipoint approximation for sib pairs that uses a linear

function of IBD values at genotyped markers to estimate

IBD sharing at arbitrary chromosomal locations. The

Fulker method is based on the evaluation of average


Almasy and Blangero: Multipoint QTL Analysis in Pedigrees


number of alleles shared IBD for a pair of siblings and,

although much less computationally expensive, has been

shown to be as effective as maximum-likelihood esti-

mation of the exact multipoint IBD distribution (Fulker

and Cherny 1996). In this article, we extend this simple

approach to allow multipoint analysis in pedigrees of

unlimited size and complexity. After presenting a general

variance-components framework for oligogenic quan-

titative-trait linkage analysis in arbitrary pedigrees, we

derive a series of functions for the correlation between

loci in IBD sharing as a function of chromosomal dis-

tance in relative pairs found in extended families, in-

cluding pairs as distant as third cousins (seventh-degree

relatives) and relatives related through multiple lines of

descent, such as double–first cousins and double–second

cousins. We then demonstrate the power and accuracy

of the method by using simulation techniques.

Method

Variance-Component Linkage Analysis in General

Pedigrees

The pedigree-based variance-component linkage
method uses an extension of the strategy developed by
Amos (1994) to estimate the genetic variance attributable
to the region around a specific genetic marker.
Goldgar (1990) and Schork (1993) have proposed similar

variance-component models. This approach is based on

specifying the expected genetic covariances between ar-

bitrary relatives as a function of the IBD relationships

at a quantitative-trait locus (QTL). The modeling framework
used in variance-component analysis is remarkably

general (Lange et al. 1976; Hopper and Mathews 1982),

although it is also parsimonious with regard to the num-

ber of parameters that are required to be estimated rel-

ative to that needed in penetrance model–based linkage

analysis. Also, unlike most penetrance model–free link-

age analysis methods, the variance-component method

can be used both for localization of QTLs and for ob-

taining good estimates of the relative importance of the

QTL in determining phenotypic variance in the popu-

lation (Amos et al. 1996; Blangero and Almasy 1997;

Williams et al. 1997).

Let the quantitative phenotype, $y$, be written as a linear function of the $n$ QTLs that influence it:

$$y = \mu + \sum_{i=1}^{n} g_i + e , \tag{1}$$

where $\mu$ is the grand mean, $g_i$ is the effect of the $i$th QTL, and $e$ represents a random environmental deviation. Assume $g_i$ and $e$ are uncorrelated random variables with expectation 0 so that the variance of $y$ is $\sigma_y^2 = \sum_{i=1}^{n} \sigma_{gi}^2 + \sigma_e^2$. We also allow for both additive and dominance effects, and therefore $\sigma_{gi}^2 = \sigma_{ai}^2 + \sigma_{di}^2$, where $\sigma_{ai}^2$ is the additive genetic variance due to the $i$th locus and $\sigma_{di}^2$ is the dominance variance. If we assume two allelic variants, $Q$ and $q$, with frequencies of $p_Q$ and $(1 - p_Q)$ at a given QTL, the genotype-specific means are given by $\mu_{QQ} = \mu + a$, $\mu_{Qq} = \mu + d$, and $\mu_{qq} = \mu - a$, and the QTL-specific genetic variances are given by $\sigma_{ai}^2 = 2p_Q(1 - p_Q)[a + (1 - 2p_Q)d]^2$ and $\sigma_{di}^2 = [2p_Q(1 - p_Q)d]^2$.

For such a simple random effects model, we can easily obtain the expected phenotypic covariance between the trait values of any pair of relatives as

$$\mathrm{Cov}(y_1,y_2) = \sum_{i=1}^{n} \left[ (k_{1i}/2 + k_{2i})\,\sigma_{ai}^2 + k_{2i}\,\sigma_{di}^2 \right] , \tag{2}$$

where $\mathrm{Cov}(y_1,y_2) = E[(y_1 - \mu)(y_2 - \mu)]$, and the $k$ terms represent the $k$ coefficients of Cotterman (1940), with $k_{ji}$ being the $i$th QTL-specific probability of the pair of relatives sharing $j$ alleles IBD. Similarly, the expected phenotypic correlation between any pair of relatives is given by

$$\rho(y_1,y_2) = \sum_{i=1}^{n} \left[ (k_{1i}/2 + k_{2i})\,h_{ai}^2 + k_{2i}\,d_i^2 \right] , \tag{3}$$

where $h_{ai}^2$ is the proportion of the total phenotypic variance due to the additive genetic contribution of the $i$th QTL, and $d_i^2$ is the proportion due to the dominance effect. In the classical quantitative genetic variance-component model, we do not have information on specific QTLs but utilize the expectation of the $k$ probabilities over the genome to obtain the following approximation:

$$\mathrm{Cov}(y_1,y_2) \approx 2\phi\,\sigma_a^2 + \delta_t\,\sigma_d^2 , \tag{4}$$

where $\sigma_a^2 = \sum_{i=1}^{n} \sigma_{ai}^2$ is the total additive genetic variance, $\sigma_d^2 = \sum_{i=1}^{n} \sigma_{di}^2$ is the total dominance genetic variance, $\phi = \frac{1}{2}E[(k_{1i}/2 + k_{2i})]$ is the expected kinship coefficient over the genome with $2\phi = R$ giving the expected coefficient of relationship, and $\delta_t = E[k_{2i}]$ is the expected probability of sharing 2 alleles IBD. Because we are generally interested in the examination of one or a few QTLs at a time, we exploit the above approximation to reduce the number of parameters that need to be considered. For example, if we are focusing on the analysis of the $i$th QTL in equation (1), we can absorb the effects of all of the remaining QTLs in residual components of covariance. Employing these residual covariance terms, the expected phenotypic covariance between relatives is well approximated by

$$\mathrm{Cov}(y_1,y_2) = \pi_i\,\sigma_{ai}^2 + k_{2i}\,\sigma_{di}^2 + 2\phi\,\sigma_g^2 + \delta_t\,\sigma_d^2 , \tag{5}$$

where $\pi_i = (k_{1i}/2 + k_{2i})$ is the coefficient of relationship


or the probability of a random allele being IBD at the $i$th QTL, $\sigma_g^2$ represents the residual additive genetic variance, and $\sigma_d^2$ now represents the residual dominance genetic variance. The $\pi$ and $k_2$ coefficients and their expectations effectively structure the expected phenotypic covariances and are the basis for much of quantitative-trait linkage analysis, such as the sib-pair difference method of Haseman and Elston (1972). For any given chromosomal location, $\pi$ and $k_2$ can be estimated from genetic marker data and information on the genetic map.
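As an illustration of how the QTL-specific variances feed into the pairwise covariance of equation (5), the following minimal Python sketch may help. It assumes the standard biallelic parameterization with $Q$ as the allele carrying the $+a$ effect; the function and argument names are hypothetical and are not taken from any published software.

```python
def qtl_variances(p_q, a, d):
    """Additive and dominance variance for a biallelic QTL.

    Assumes genotype means mu + a (QQ), mu + d (Qq), mu - a (qq),
    with p_q the frequency of allele Q.
    """
    het = 2.0 * p_q * (1.0 - p_q)
    var_add = het * (a + (1.0 - 2.0 * p_q) * d) ** 2
    var_dom = (het * d) ** 2
    return var_add, var_dom


def pair_covariance(k1, k2, var_add, var_dom, phi, delta, var_g, var_d):
    """Expected trait covariance for one relative pair, following equation (5).

    k1, k2 : probabilities of sharing 1 or 2 alleles IBD at the QTL
    phi    : expected kinship coefficient; delta : expected k2 over the genome
    var_g, var_d : residual additive and dominance variances
    """
    pi_i = k1 / 2.0 + k2  # proportion of alleles shared IBD at the QTL
    return pi_i * var_add + k2 * var_dom + 2.0 * phi * var_g + delta * var_d


# Hypothetical sib pair sharing both alleles IBD at a purely additive QTL.
var_add, var_dom = qtl_variances(p_q=0.5, a=1.0, d=0.0)
cov = pair_covariance(k1=0.0, k2=1.0, var_add=var_add, var_dom=var_dom,
                      phi=0.25, delta=0.25, var_g=0.2, var_d=0.0)
```

With these illustrative inputs the QTL contributes its full additive variance to the covariance, because the pair shares both alleles IBD at that locus, while the residual polygenic term is weighted by the genome-wide expectation $2\phi$.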

Given the simple model for phenotypic variation described above, it is possible to use data from pedigree structures of arbitrary complexity to make inferences regarding the localization and effect sizes of QTLs. For the simple additive model in which $n$ QTLs and an unknown number of residual polygenes influence a trait, the covariance matrix for a pedigree can be written

$$\Omega = \sum_{i=1}^{n} \hat{\Pi}_i\,\sigma_{ai}^2 + 2\Phi\,\sigma_g^2 + I\,\sigma_e^2 , \tag{6}$$

where $\hat{\Pi}_i$ is the matrix whose elements ($\hat{\pi}_{ijl}$) provide the predicted proportion of genes that individuals $j$ and $l$ share IBD at a QTL that is linked to a genetic marker locus, $\Phi$ is the kinship matrix, and $I$ is an identity matrix. $\hat{\Pi}_i$ is a function of the estimated IBD matrix for a genetic marker itself ($\hat{\Pi}_m$) and a matrix of correlations ($B$) between the proportions of genes IBD at the marker and at the QTL:

$$\hat{\Pi}_i = 2\Phi + B(r,\theta) \odot (\hat{\Pi}_m - 2\Phi) , \tag{7}$$

where $\odot$ denotes the elementwise (Hadamard) product, $\theta$ is the recombination frequency between marker locus $m$ and QTL $i$, and the elements $b_{ij} = \rho(\pi_i, \pi_m \mid r, \theta)$ are the correlations between the IBD probabilities, where $r$ denotes the $r$th type of kinship relationship. Equation (7) is a matrix generalization of the results provided by Amos (1994). The $\rho$-functions provide the autocorrelation functions between IBD probabilities as a function of genetic distance, and they also allow prediction of the $\hat{\Pi}$ matrix at any chromosomal location given $\hat{\Pi}$ estimates at correlated locations (e.g., when $\theta < .5$). Derivation of the $\rho$ functions for arbitrary pedigree relationships is provided below.
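Equations (6) and (7) can be sketched in a few lines of plain Python for a single QTL. The sketch below is illustrative only: the function names, the two-person example, and the variance values are assumptions, and the sib-pair correlation $(1 - 2\theta)^2$ is the one listed for sibs in table 3.

```python
def sib_rho(theta):
    """Sib-pair IBD correlation, (1 - 2*theta)^2, as listed for sibs in table 3."""
    return (1.0 - 2.0 * theta) ** 2


def predict_qtl_ibd(pi_marker, two_phi, b):
    """Equation (7): shrink marker IBD toward its expectation, element by element."""
    n = len(pi_marker)
    return [[two_phi[j][l] + b[j][l] * (pi_marker[j][l] - two_phi[j][l])
             for l in range(n)] for j in range(n)]


def covariance_matrix(pi_qtl, two_phi, var_qtl, var_g, var_e):
    """Equation (6) with one QTL: Omega = Pi*var_qtl + 2*Phi*var_g + I*var_e."""
    n = len(pi_qtl)
    return [[pi_qtl[j][l] * var_qtl + two_phi[j][l] * var_g
             + (var_e if j == l else 0.0)
             for l in range(n)] for j in range(n)]


# Hypothetical sib pair that shares both alleles IBD at the marker.
two_phi = [[1.0, 0.5], [0.5, 1.0]]        # twice the kinship matrix
pi_marker = [[1.0, 1.0], [1.0, 1.0]]      # estimated marker IBD matrix
rho = sib_rho(0.1)                        # QTL assumed 10% recombination away
b = [[1.0, rho], [rho, 1.0]]              # correlation matrix B(r, theta)
pi_qtl = predict_qtl_ibd(pi_marker, two_phi, b)
omega = covariance_matrix(pi_qtl, two_phi, var_qtl=0.3, var_g=0.2, var_e=0.5)
```

The predicted sharing at the QTL falls between the observed marker value and the prior expectation $2\phi$, and moves toward $2\phi$ as $\theta$ approaches .5.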

By assuming multivariate normality as a working model within pedigrees, the likelihood of any pedigree can be easily written and numerical procedures can be used to estimate the variance-component parameters. For the model in equation (6), the ln-likelihood of a pedigree of $t$ individuals with phenotypic vector $y$ is given by

$$\ln L(\mu,\sigma_{ai}^2,\sigma_g^2,\sigma_e^2,\beta \mid y,X) = -\frac{t}{2}\ln(2\pi) - \frac{1}{2}\ln|\Omega| - \frac{1}{2}\Delta'\Omega^{-1}\Delta , \tag{8}$$

where $\mu$ is the grand trait mean, $\Delta = (y - \mu - X\beta)$, $X$ is a matrix of covariates, and $\beta$ is the matrix of regression coefficients associated with these covariates. Likelihood estimation assuming multivariate normality can be shown to yield consistent parameter estimates even when the distributional assumptions are violated (Beaty et al. 1985; Amos 1994). By performing an extensive series of simulations, we have confirmed the consistency of variance-component estimates of genetic effect size (Blangero and Almasy 1997; J. T. Williams and J. Blangero, unpublished data).

Using the variance-component model, we can test the null hypothesis that the additive genetic variance due to the $i$th QTL equals zero (no linkage) by comparing the likelihood of this restricted model with that of a model in which the variance due to the $i$th QTL is estimated. The difference between the two $\log_{10}$ likelihoods produces a LOD score that is the equivalent of the classical LOD score of linkage analysis. Twice the difference in $\log_e$ likelihoods of these two models yields a test statistic that is asymptotically distributed as a $\frac{1}{2}{:}\frac{1}{2}$ mixture of a $\chi_1^2$ variable and a point mass at zero (Self and Liang 1987). When multiple QTLs are jointly considered, the resulting likelihood-ratio test statistic has a more complex asymptotic distribution that continues to be a mixture of $\chi^2$ distributions.

This basic model has been extended to incorporate a number of more complex genetic models by allowing for additional sources of genetic and nongenetic variance. In multilocus models, an additive × additive component of epistatic variance can be estimated by use of the Hadamard product of the $\hat{\Pi}$ matrices for each locus as the coefficient matrix that structures the expected covariances among pedigree members (Mitchell et al. 1997). Dominance × dominance, additive × dominance, and dominance × additive variance components also can be specified by Hadamard products of appropriate $\hat{\Pi}$ and $K_2$ coefficient matrices. For example, allowing for additive × additive interactions between two QTLs leads to the following equation for the phenotypic covariance matrix of a pedigree:

$$\Omega = \hat{\Pi}_1\sigma_{a1}^2 + \hat{\Pi}_2\sigma_{a2}^2 + (\hat{\Pi}_1 \odot \hat{\Pi}_2)\,\sigma_{a1 \times a2}^2 + \sum_{i=3}^{n} \hat{\Pi}_i\,\sigma_{ai}^2 + 2\Phi\,\sigma_g^2 + I\,\sigma_e^2 . \tag{9}$$
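To make the likelihood of equation (8) and the resulting linkage test concrete, the following sketch evaluates the multivariate normal ln-likelihood with a hand-written Cholesky factorization and converts a pair of maximized ln-likelihoods into a LOD score and a mixture-based P value. It is a minimal illustration under stated assumptions, not the FISHER/SEARCH optimization used in SOLAR; all function names are hypothetical, and the residual vector is assumed to have been formed already as $y - \mu - X\beta$.

```python
import math


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = m[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            low[r][c] = math.sqrt(s) if r == c else s / low[c][c]
    return low


def mvn_loglik(resid, omega):
    """Equation (8) for a residual vector resid = y - mu - X*beta."""
    low = cholesky(omega)
    n = len(resid)
    z = []  # forward substitution: solve low * z = resid
    for r in range(n):
        z.append((resid[r] - sum(low[r][k] * z[k] for k in range(r))) / low[r][r])
    log_det = 2.0 * sum(math.log(low[r][r]) for r in range(n))
    quad = sum(v * v for v in z)  # resid' * omega^{-1} * resid
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def lod_and_pvalue(lnl_null, lnl_alt):
    """LOD score and P value from a 1/2:1/2 mixture of chi-square(1) and zero."""
    lrt = max(0.0, 2.0 * (lnl_alt - lnl_null))
    lod = lrt / (2.0 * math.log(10.0))
    # P(chi-square with 1 df > x) equals erfc(sqrt(x / 2)).
    p = 1.0 if lrt == 0.0 else 0.5 * math.erfc(math.sqrt(lrt / 2.0))
    return lod, p
```

In practice the two ln-likelihoods would each be maximized over the variance components, once with the QTL variance fixed at zero and once with it estimated; the sketch only shows how the comparison is scored.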

A household or shared environment effect can be added by an additional variance component with a coefficient matrix ($H$) whose elements are 1 if the relative pair in question shares the environmental exposure or 0 otherwise. This simple incorporation of a shared household component leads to the following model for the phenotypic covariance matrix:

$$\Omega = \sum_{i=1}^{n} \hat{\Pi}_i\,\sigma_{ai}^2 + 2\Phi\,\sigma_g^2 + H\,\sigma_h^2 + I\,\sigma_e^2 . \tag{10}$$

Figure 1  Half–grand-avuncular pair (blackened symbols)

The general pedigree variance-component linkage method, with all of the above described extensions, has been implemented in a computer analysis package called Sequential Oligogenic Linkage Analysis Routines (SOLAR), which employs the computer programs FISHER and SEARCH (Lange et al. 1988) for likelihood optimization in quantitative-trait analysis. For any of the complex genetic models described above, SOLAR can also incorporate covariate effects as well as multivariate quantitative traits (Almasy et al. 1997c); discrete traits, by use of a threshold model (Duggirala et al. 1997); mixed discrete/quantitative-trait analyses; and genotype × environment interaction (Towne et al. 1997).
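The two additional coefficient matrices used in equations (9) and (10) are simple to construct. The sketch below is illustrative only and does not reproduce SOLAR's internal routines; the household labels and function names are hypothetical.

```python
def hadamard(a, b):
    """Elementwise product of two equally sized matrices (epistasis coefficients)."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def household_matrix(household_ids):
    """H matrix: 1 if two individuals share a household label, 0 otherwise."""
    return [[1.0 if hj == hl else 0.0 for hl in household_ids]
            for hj in household_ids]


# Hypothetical IBD matrices for two QTLs in a two-person pedigree.
pi_1 = [[1.0, 0.5], [0.5, 1.0]]
pi_2 = [[1.0, 0.25], [0.25, 1.0]]
epistasis_coeff = hadamard(pi_1, pi_2)   # coefficient matrix for the a1 x a2 variance

# Hypothetical household labels for three individuals.
h = household_matrix(["A", "A", "B"])
```

Each matrix is then multiplied by its own variance component and added to the other terms of the covariance matrix.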

Estimation of the IBD Probability Matrix for a Genetic

Marker

In the above formulation, all of the information regarding linkage is a function of the estimated $\hat{\Pi}_i$ matrices. For a given genetic marker, a number of methods have been proposed to calculate this IBD probability matrix (Amos et al. 1990; Curtis and Sham 1994; Whittemore and Halpern 1994). One simple and effective approach is to perform pairwise likelihood-based estimation of the elements of a $\hat{\Pi}_i$ matrix by calculating the posterior probability of genotypes at a completely linked pseudomarker at which there is an extremely rare allele (i.e., an allele frequency less than the expected mutation rate). With this approach, the $\pi$ for each pair of individuals is evaluated by randomly assigning the rare homozygous pseudomarker genotype to one of the individuals and then calculating the likelihoods of seeing the three possible pseudomarker genotypes in the other individual conditional on the marker information in the complete pedigree. From the resulting posterior probabilities, it is simple to calculate the three locus-specific $k$ coefficients for any marker and then to calculate the $\pi$ estimate. This method is relatively rapid for simple pedigrees but can become tedious in complex pedigrees, especially ones with multiple inbreeding loops. Any software that can calculate two-point pedigree likelihoods can be used for calculating IBD probabilities.

A second alternative for pedigrees of arbitrary size and complexity is to calculate an estimate of all the elements of the $\Pi_i$ matrix jointly by Monte Carlo techniques. When there is no missing genetic marker information in a pedigree, exact IBD probabilities can rapidly be calculated by use of the algorithm of Davis et al. (1996). Therefore, Monte Carlo methods can be used to impute marker genotypes for individuals not typed in a pedigree conditional on all other marker and pedigree information. Once the marker genotype vector is filled in by such a process, the exact maximum likelihood estimate of $\pi$ can be obtained immediately. The results of many such imputations can be averaged by use of the likelihood of the imputed marker genotype vector as a weighting factor. There are many possible variations of such a Monte Carlo approach, but all methods require substantial computing for large pedigrees.
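The two estimation steps described above reduce to small arithmetic operations once the pedigree likelihoods are available. The sketch below assumes those likelihoods have been computed elsewhere; the function names and numeric inputs are hypothetical, and the log-scale weighting is one possible way to apply the likelihood weights without numeric underflow.

```python
import math


def pi_from_k(k1, k2):
    """Proportion of alleles shared IBD from the locus-specific k coefficients."""
    return k1 / 2.0 + k2


def weighted_ibd(pi_values, log_likelihoods):
    """Average imputation-specific pi estimates, weighted by genotype likelihood.

    pi_values       : pi estimate from each imputed marker genotype vector
    log_likelihoods : ln-likelihood of each imputed vector (assumed given)
    """
    top = max(log_likelihoods)                      # rescale to avoid underflow
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, pi_values)) / total


# Hypothetical posterior k coefficients for one pair at one marker.
pi_pair = pi_from_k(k1=0.5, k2=0.25)

# Two hypothetical imputations, the second three times as likely as the first.
pi_avg = weighted_ibd([0.0, 0.5], [math.log(1.0), math.log(3.0)])
```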

We use both of these approaches in our computer

program, SOLAR, and have noticed few differences be-

tween them across a wide range of practical applications

including extensive computer simulations. A practical

benefit of both approaches is the independence of major

aspects of the calculations, which renders the estimation

problem infinitely scalable with regard to parallel

computation.

Although it is comparatively straightforward to obtain
an estimate of the $\Pi$ matrix for any genetic marker, exact
calculation of multipoint IBD probabilities given a number
of genetic markers is formidable except for relatively

small and simple pedigrees. Since it is well known that

exploitation of multipoint information can dramatically

improve the power to detect QTLs, fast and accurate

approximate methods would be of great benefit. In the

next section, we outline our approach to obtaining such

approximate multipoint IBD probabilities for any chro-

mosomal location.

Derivation of IBD Correlation Formulas for Multipoint

Analysis

Table 1

Possible Two-Locus Combinations of $\pi$ for Relative Pairs Able to Share Only One Allele IBD

                                FIRST LOCUS
SECOND LOCUS                    $\pi_1 = 0$, $i = 0$    $\pi_1 = 1/2$, $i = 1$    TOTAL $\pi$
$\pi_2 = 0$, $j = 0$            $p_{00}$                $p_{10}$                  $1 - 2E(\pi)$
$\pi_2 = 1/2$, $j = 1$          $p_{01}$                $p_{11}$                  $2E(\pi)$
Total $\pi$                     $1 - 2E(\pi)$           $2E(\pi)$

Table 2

Formulas for $p_{11}$ in Relative Pairs Related by a Single Line of Descent

Type of Relative Pair             $p_{11}$
Direct descent                    $(1/2^{d-1})(1 - \theta)^{d-1}$
Half-avuncular or half-cousin     $(1/2^{d-1})(1 - \theta)^{d-2}[\theta^2 + (1 - \theta)^2]$
Full avuncular                    $(1/2^{d})(1 - \theta)^{d-2}(2 - 5\theta + 8\theta^2 - 4\theta^3)$
Full cousin                       $(1/2^{d})(1 - \theta)^{d-3}(2 - 8\theta + 15\theta^2 - 12\theta^3 + 4\theta^4)$

NOTE: $d$ represents the degree of relationship.

Given the simplicity and accuracy of the Fulker method (Fulker et al. 1995) for approximating multipoint calculations for sib pairs, we decided to generalize this approach to arbitrary pedigree relationships. Such a general average sharing method requires that we formulate all possible $\rho(\pi_i, \pi_j \mid r, \theta)$ functions (i.e., IBD probability autocorrelation functions), which can be used to provide the expected correlation in IBD between genotyped marker loci and any chromosomal location with a known position relative to these marker loci. The correlation in the proportion of alleles shared IBD by a relative pair over some chromosomal distance can be expressed with a simple formula:

$$\rho(\pi_1,\pi_2 \mid r,\theta) = \frac{\mathrm{Cov}(\pi_1,\pi_2)}{\sigma(\pi_1)\,\sigma(\pi_2)} , \tag{11}$$

where $\mathrm{Cov}(\pi_1,\pi_2)$ is the covariance in IBD allele sharing between locus 1 (the genotyped marker) and locus 2 (the arbitrary chromosomal location at which IBD sharing is being estimated), and $\sigma(\pi_1)$ and $\sigma(\pi_2)$ are the expected standard deviations in IBD allele sharing at the two loci. These standard deviations depend on the degree of relationship between the relative pair under consideration and will be the same for the two loci. Thus, the denominator reduces to the expected variance in the proportion of alleles shared IBD for the type of relative pair, $\mathrm{Var}(\pi)$. This variance in IBD sharing can be calculated by use of the formula $\mathrm{Var}(\pi) = E(\pi^2) - E(\pi)^2$, where $E(\pi) = 2\phi$ is the expected IBD sharing for the relative pair.

The covariance is a simple function involving each possible value of $\pi$ at locus 1 and locus 2, adjusted by $E(\pi)$ and weighted by the probability of observing the two-locus combination of $\pi$, $p_{ij}$:

$$\mathrm{Cov}(\pi_1,\pi_2) = \sum_{ij} p_{ij}\,[\pi_1 - E(\pi)]\,[\pi_2 - E(\pi)] . \tag{12}$$

For unilineal relative pairs, the number of alleles shared IBD ($i$ and $j$) at locus 1 and locus 2 will take the values 0 and 1, with the resulting IBD probabilities $\pi_1$ and $\pi_2$ being $i/2$ and $j/2$, which yields four possible two-locus combinations of $\pi$. For bilineally related pairs able to share two alleles IBD, such as siblings, $i$ and $j$ may also be 2, resulting in nine possible combinations. If inbreeding is present, $i$ and $j$ may equal 4 when both members of a pair are autozygous for the same ancestral allele. This results in 16 potential two-locus combinations of allele sharing.

The process of obtaining the $\rho$ function for any class of unilineal relationship is straightforward and may best be described by example.
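Equations (11) and (12) translate directly into a short routine that takes a table of two-locus sharing probabilities and returns the IBD correlation. The sketch below is a hedged illustration with hypothetical names; it assumes the same marginal distribution of $\pi$ at both loci, as stated in the text, and uses half-sibs as the worked case, for whom sharing at both loci requires the common parent's two transmissions to be both recombinant or both nonrecombinant.

```python
def ibd_correlation(joint, pi_values):
    """Correlation in IBD sharing from a two-locus probability table.

    joint[i][j] : probability of sharing state i at locus 1 and state j at locus 2
    pi_values   : proportion of alleles shared IBD in each state, e.g. [0, 0.5]
    """
    states = range(len(pi_values))
    marginal = [sum(joint[i][j] for j in states) for i in states]
    mean = sum(p * v for p, v in zip(marginal, pi_values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(marginal, pi_values))
    cov = sum(joint[i][j] * (pi_values[i] - mean) * (pi_values[j] - mean)
              for i in states for j in states)
    return cov / var


def half_sib_table(theta):
    """Two-locus sharing table for half-sibs at recombination fraction theta."""
    p11 = 0.5 * (theta ** 2 + (1.0 - theta) ** 2)
    p01 = 0.5 - p11            # marginal probability of sharing one allele is 1/2
    p00 = 1.0 - 2.0 * p01 - p11
    return [[p00, p01], [p01, p11]]


rho_half_sibs = ibd_correlation(half_sib_table(0.1), [0.0, 0.5])
```

For half-sibs the result agrees with the closed form $1 - 4\theta + 4\theta^2$, which gives a simple check on any table built by hand.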

An Example: Half–Grand-Avuncular Pairs

A half–grand-avuncular pair (fig. 1) are fourth-degree relatives for whom $E(\pi) = 1/16$ and $\mathrm{Var}(\pi) = 7/256$. They may share 0 or 1 alleles IBD, yielding $\pi$ values of 0 and $\frac{1}{2}$. To obtain the covariance for this relative pair, we will need the probabilities of observing the four possible two-locus combinations of $\pi$ (table 1).

These probabilities can be determined by calculating the probability of $\pi_2$ equaling 0 or $\frac{1}{2}$, given $\pi_1$ and all possible patterns of recombination between the two loci. For example, let $p_{11}$ be the probability that 1 allele is shared IBD at the second locus ($\pi_2 = \frac{1}{2}$), given that 1 allele is shared at the first locus ($\pi_1 = \frac{1}{2}$) and taking into account all possible patterns of recombination. In figure 1, individuals 4 and 9 represent a half–grand-avuncular pair. The probability that they share 1 allele IBD at the first locus ($\pi_1 = \frac{1}{2}$) is $\frac{1}{8}$. Any allele shared IBD by 4 and 9 is necessarily also shared by intervening individuals 5 and 7. For 4 and 9 to share an allele at the second locus, transmission from 2, the father of the half-sibs, to his sons 4 and 5 must be either both nonrecombinant with probability $(1 - \theta)^2$, or both recombinant with probability $\theta^2$. In addition, transmissions from 5 to his son 7 and from 7 to 9 must both be nonrecombinant with probability $(1 - \theta)^2$. Thus, $p_{11} = \frac{1}{8}[\theta^2 + (1 - \theta)^2](1 - \theta)^2$.

For pairs related by a single line of descent, $p_{11}$ can be calculated simply from one of four formulas provided in table 2. These formulas use the degree of relationship between the members of the pair and differ by whether the pair is related through a direct line of descent (grandparental relationships), a half-sibling pair (half-avuncular and half-cousin relationships), a full sibling pair descending through only one sib (full avuncular relationships), or a full sibling pair descending through both sibs (full-cousin relationships).

Table 3

Correlation Coefficients for IBD Allele Sharing in Various Types of Relative Pairs

Relation                        $E(\pi)$    $\mathrm{Var}(\pi)$    Correlation in Proportion of Alleles Shared IBD
Sibs                            1/2         1/8          $1 - 4\theta + 4\theta^2$
Half-sibs                       1/4         1/16         $1 - 4\theta + 4\theta^2$
Avuncular                       1/4         1/16         $1 - 5\theta + 8\theta^2 - 4\theta^3$
Grandparent                     1/4         1/16         $1 - 2\theta$
First cousin                    1/8         3/64         $1 - \frac{16}{3}\theta + 10\theta^2 - 8\theta^3 + \frac{8}{3}\theta^4$
Half-avuncular                  1/8         3/64         $1 - 4\theta + \frac{16}{3}\theta^2 - \frac{8}{3}\theta^3$
Grand-avuncular                 1/8         3/64         $1 - \frac{14}{3}\theta + \frac{26}{3}\theta^2 - 8\theta^3 + \frac{8}{3}\theta^4$
Great-grandparent               1/8         3/64         $1 - \frac{8}{3}\theta + \frac{4}{3}\theta^2$
Half–first cousin               1/16        7/256        $1 - \frac{32}{7}\theta + 8\theta^2 - \frac{48}{7}\theta^3 + \frac{16}{7}\theta^4$
First cousin, once removed      1/16        7/256        $1 - \frac{40}{7}\theta + \frac{92}{7}\theta^2 - \frac{108}{7}\theta^3 + \frac{64}{7}\theta^4 - \frac{16}{7}\theta^5$
Half–grand-avuncular            1/16        7/256        $1 - \frac{32}{7}\theta + 8\theta^2 - \frac{48}{7}\theta^3 + \frac{16}{7}\theta^4$
Great-grand-avuncular           1/16        7/256        $1 - \frac{36}{7}\theta + \frac{80}{7}\theta^2 - \frac{100}{7}\theta^3 + \frac{64}{7}\theta^4 - \frac{16}{7}\theta^5$
Great-great-grandparent         1/16        7/256        $1 - \frac{24}{7}\theta + \frac{24}{7}\theta^2 - \frac{8}{7}\theta^3$
Second cousin                   1/32        15/1024      $1 - \frac{32}{5}\theta + \frac{88}{5}\theta^2 - \frac{80}{3}\theta^3 + \frac{344}{15}\theta^4 - \frac{32}{3}\theta^5 + \frac{32}{15}\theta^6$
Half-cousin, once removed       1/32        15/1024      $1 - \frac{16}{3}\theta + \frac{176}{15}\theta^2 - \frac{208}{15}\theta^3 + \frac{128}{15}\theta^4 - \frac{32}{15}\theta^5$
First cousin, twice removed     1/32        15/1024      $1 - \frac{32}{5}\theta + \frac{88}{5}\theta^2 - \frac{80}{3}\theta^3 + \frac{344}{15}\theta^4 - \frac{32}{3}\theta^5 + \frac{32}{15}\theta^6$
Half–second cousin              1/64        31/4096      $1 - \frac{192}{31}\theta + \frac{512}{31}\theta^2 - \frac{768}{31}\theta^3 + \frac{672}{31}\theta^4 - \frac{320}{31}\theta^5 + \frac{64}{31}\theta^6$
Second cousin, once removed     1/64        31/4096      $1 - \frac{224}{31}\theta + \frac{720}{31}\theta^2 - \frac{1328}{31}\theta^3 + 48\theta^4 - \frac{1008}{31}\theta^5 + \frac{384}{31}\theta^6 - \frac{64}{31}\theta^7$
Third cousin                    1/128       63/16384     $1 - \frac{512}{63}\theta + \frac{1888}{63}\theta^2 - \frac{4096}{63}\theta^3 + \frac{5632}{63}\theta^4 - \frac{1664}{21}\theta^5 + \frac{928}{21}\theta^6 - \frac{128}{9}\theta^7 + \frac{128}{63}\theta^8$
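For relative pairs that can share at most one allele IBD, the four single-line formulas for $p_{11}$ are enough to obtain the correlation, because the remaining cells of the two-locus table follow from the marginal sharing probability $2E(\pi) = 1/2^{d-1}$. The sketch below encodes the table 2 formulas as transcribed in the code and standardizes them; the function names and the `kind` labels are hypothetical.

```python
def p11(kind, d, theta):
    """p11 for a pair related by a single line of descent (table 2 formulas).

    kind : "direct", "half", "avuncular", or "cousin" (hypothetical labels)
    d    : degree of relationship; theta : recombination fraction
    """
    u = 1.0 - theta
    if kind == "direct":
        return u ** (d - 1) / 2 ** (d - 1)
    if kind == "half":
        return u ** (d - 2) * (theta ** 2 + u ** 2) / 2 ** (d - 1)
    if kind == "avuncular":
        poly = 2 - 5 * theta + 8 * theta ** 2 - 4 * theta ** 3
        return u ** (d - 2) * poly / 2 ** d
    if kind == "cousin":
        poly = 2 - 8 * theta + 15 * theta ** 2 - 12 * theta ** 3 + 4 * theta ** 4
        return u ** (d - 3) * poly / 2 ** d
    raise ValueError("unknown relationship kind: " + kind)


def unilineal_correlation(kind, d, theta):
    """IBD correlation for a pair able to share only 0 or 1 alleles.

    With pi in {0, 1/2} and q = 2E(pi), Cov = (p11 - q^2)/4 and Var = q(1 - q)/4.
    """
    q = 1.0 / 2 ** (d - 1)
    return (p11(kind, d, theta) - q * q) / (q * (1.0 - q))


# Half-grand-avuncular pair (fourth degree) at theta = 0.1.
rho_hga = unilineal_correlation("half", 4, 0.1)
```

Evaluating the function over a grid of $\theta$ values reproduces the polynomial entries of table 3 numerically, which is a convenient way to check a transcription of any single row.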

The probabilities for the remaining two-locus sharing states may be similarly derived from the possible patterns of recombination, or for pairs that can share only 0 or 1 alleles IBD, they can be obtained by subtracting from the marginal totals for single-locus sharing of 0 or 1 allele IBD (table 1). For the half–grand-avuncular pair described above,

$$p_{01} = p_{10} = \frac{1}{8} - p_{11} = \frac{1}{8} - \frac{1}{8}\left[\theta^2 + (1 - \theta)^2\right](1 - \theta)^2$$

and

$$p_{00} = \frac{7}{8} - p_{01} = \frac{3}{4} + p_{11} = \frac{3}{4} + \frac{1}{8}\left[\theta^2 + (1 - \theta)^2\right](1 - \theta)^2 .$$

When these values are used, the covariance for a half–grand-avuncular pair is

$$\mathrm{Cov}(\pi_1,\pi_2) = p_{00}\left(0 - \frac{1}{16}\right)^2 + 2p_{01}\left(0 - \frac{1}{16}\right)\left(\frac{1}{2} - \frac{1}{16}\right) + p_{11}\left(\frac{1}{2} - \frac{1}{16}\right)^2 , \tag{13}$$

and, after standardization and gathering of terms, the correlation is given by

$$\rho(\pi_1,\pi_2 \mid \text{half–grand-avuncular},\theta) = \frac{\mathrm{Cov}(\pi_1,\pi_2)}{7/256} = 1 - \frac{32}{7}\theta + 8\theta^2 - \frac{48}{7}\theta^3 + \frac{16}{7}\theta^4 . \tag{14}$$

Table 3 shows $E(\pi)$, $\mathrm{Var}(\pi)$, and the correlation between

IBD probabilities for other unilineal classes of relative

pairs. These relationships are the most common ob-


Table 4

Correlation Coefficients for IBD Allele Sharing in Relative Pairs with Multiple or Compound Relationships

Relation    Correlation in Proportion of Alleles Shared IBD    $E(\pi)$    $\mathrm{Var}(\pi)$

Double–first cousin

Double–first cousin, once

removed

Double–second cousin (fig.

2a)

Double–second cousin (fig.

2b)

Double–second cousin (fig.

2c)

First cousin and second

cousin

Half-sib and first cousin

Half-sib and half-avuncular

Double–half-first cousin

Double–half-avuncular

Half-sib and half–first

cousin

16

3

8

3

234

1 ?

v ? 10v ? 8v ? v

1

4

3

32

143

18

731

9

1226

9

5557

36

1058

9

152

9

20

9

23456789

1 ?

v ? 32v ?

v ?

v ?

v ?

v ? 58v ?

v ?

v

1

8

3

64

176

21

785

21

2320

21

674

3

2248

7

4553

14

4828

21

2284

21

656

21

88

21

2345678910

1 ?

v ?

v ?

v ?

v ?

v ?

v ?

v ?

v ?

v ?

v

1

16

7

256

32

5

88

5

80

3

344

15

32

3

32

15

23456

1 ?

v ?

v ?

v ?

v ?

v ?

v

1

16

15

512

46

7

130

7

208

7

104

7

24

7

23456

1 ?

v ?

v ?

v ? 28v ?

v ?

v

1

16

7

256

352

63

32

7

248

21

46

7

32

v ? v

7

2

v ? 8v ?

16

2

v ? v

3

112

9

3

v ? v

472

63

4

160

63

32

63

23456

1 ?

1 ?

1 ? 4v ?

32

1 ?

1 ? 4v ?

v ?

v ?

v ?

v ?

v ?

8

7

v ?

v ?

v

10

64

3

8

3

8

1

8

1

4

63

1024

7

64

7

64

7

128

3

32

24

7

2

8

7

48

7

8

3

23

16

7

34

v ?

v

7

3

96

23

120

23

48

23

16

23

234

1 ?

v ?

v ?

v ?

v

5

16

23

256

Table 5

Probabilities of Two-Locus Combinations of $\pi$ for Bilineal Relative Pairs

                              FIRST LOCUS
SECOND LOCUS                  $\pi_1 = 0$, $i = 0$               $\pi_1 = 1/2$, $i = 1$                               $\pi_1 = 1$, $i = 2$
$\pi_2 = 0$, $j = 0$          $x_{00}y_{00}$                     $x_{10}y_{00} + x_{00}y_{10}$                        $x_{10}y_{10}$
$\pi_2 = 1/2$, $j = 1$        $x_{01}y_{00} + x_{00}y_{01}$      $x_{11}y_{00} + 2x_{10}y_{01} + x_{00}y_{11}$        $x_{11}y_{10} + x_{10}y_{11}$
$\pi_2 = 1$, $j = 2$          $x_{01}y_{01}$                     $x_{11}y_{01} + x_{01}y_{11}$                        $x_{11}y_{11}$

NOTE: $x$ and $y$ represent the two-locus sharing probabilities for the two independent lines of relationship.

served in human extended family studies. Table 4 provides the same information for a variety of relative pairs related by multiple lines of descent. Note that while some of these pairs have the same $E(\pi)$ as pairs in table 3, the variances may be different, since some pairs in table 4 can share both alleles IBD ($\pi = 1$). This leads to nine possible IBD allele sharing states, rather than four, and complicates the calculation of the probability of each sharing state. However, when a pair is related through two independent lines of descent, the elements of the $2 \times 2$ matrices of sharing probabilities for each independent relationship can be multiplied to form a $3 \times 3$ matrix of sharing-state probabilities for the compound relationship (table 5). The first formula shown for double–second cousins (table 4) applies only to pairs related through double–first cousins (fig. 2a). Double–second cousins also occur when two sets of first cousins marry (fig. 2b) or when one person’s parent is cousin to both of the other person’s parents (fig. 2c). Each of these double–second cousin pairs has a different correlation formula, since the possible $\pi$ values are not the same (for the pair in fig. 2b, $\pi$ may equal 1, while in figs. 2a and 2c it cannot) and the possible patterns of recombination also differ. The IBD sharing probability matrix for the double–second cousins in figure 2b can be calculated by multiplying the elements from the basic sharing probability matrix for second cousins as described above. However, the probabilities of the sharing states for the double–second cousins in figures 2a and 2c cannot make use of the formulas above, since the two lines of descent pass through the same individual(s) and are not independent. Thus, the two-locus sharing probabilities for these types of second cousins were derived by examining the possible patterns of recombination as described for the half–grand-avuncular pair.
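The element-wise multiplication of two independent $2 \times 2$ sharing matrices can be read as a two-dimensional convolution: the number of alleles shared at each locus is the sum of the contributions from the two lines of descent. The following is a minimal sketch of that reading, not the authors' implementation. The function names are hypothetical, and the half-sib-type matrix used for each parental line of a full-sib pair (both loci share the same IBD status with probability $\psi = \theta^2 + (1-\theta)^2$) is an assumption made for illustration.

```python
from fractions import Fraction


def compound_sharing_matrix(x, y):
    """Combine two 2x2 two-locus sharing matrices into a 3x3 matrix.

    x[s][t] (and y[s][t]) is the probability that a pair shares s alleles IBD
    at locus 1 and t alleles IBD at locus 2 through one line of descent
    (s, t in {0, 1}). The two lines are assumed to be independent.
    """
    z = [[Fraction(0)] * 3 for _ in range(3)]
    for s1 in (0, 1):
        for t1 in (0, 1):
            for s2 in (0, 1):
                for t2 in (0, 1):
                    # Alleles shared at each locus add across the two lines.
                    z[s1 + s2][t1 + t2] += x[s1][t1] * y[s2][t2]
    return z


def half_sib_matrix(theta):
    """Assumed 2x2 sharing matrix for one parental line (half-sib type)."""
    psi = theta ** 2 + (1 - theta) ** 2  # same IBD status at both loci
    half = Fraction(1, 2)
    return [[half * psi, half * (1 - psi)],
            [half * (1 - psi), half * psi]]


# Full sibs treated as two independent parental lines, theta = 0.1.
theta = Fraction(1, 10)
line = half_sib_matrix(theta)
sibs = compound_sharing_matrix(line, line)
```

The row sums of `sibs` recover the familiar single-locus sib-pair probabilities of sharing 0, 1, or 2 alleles IBD, and `sibs[2][2]` is the probability of sharing both alleles at both loci.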

k2-Correlation Functions for Incorporating Dominance

Extension of the above results to allow for dominance effects via the location-specific $k_2$ probabilities requires that we formulate the possible $\rho(k_{2i},k_{2j} \mid r,\theta)$ functions (i.e., the $k_2$-autocorrelation functions). These can be expressed as

$$\rho(k_{2i},k_{2j} \mid r,\theta) = \frac{\sum_{s=0}^{1}\sum_{t=0}^{1} f_{(2 \cdot s)(2 \cdot t)}\,(s - \delta_{7r})(t - \delta_{7r})}{\mathrm{Var}(\delta_{7r})} , \qquad (15)$$

where the summations over s and t are performed over the possible values (i.e., 0 and 1) of $k_2$ so that the necessary probabilities are limited to $f_{22}$, $f_{02}$, $f_{20}$, and $f_{00}$. The

probabilities designated by f can be obtained from those

derived above for the p-autocorrelations for bilineal rel-


Figure 2

Three types of double–second cousins (blackened symbols)

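Table 6

Correlation Coefficients for $k_2$ as a Function of $\theta$ and Relationship

| Relation | $\rho(k_{2i},k_{2j} \mid r,\theta)$ | $E(k_2)$ | $\mathrm{Var}(k_2)$ |
|---|---|---|---|
| Siblings | $1 - \tfrac{16}{3}\theta + \tfrac{32}{3}\theta^2 - \tfrac{32}{3}\theta^3 + \tfrac{16}{3}\theta^4$ | 1/4 | 3/16 |
| Double–first cousin | $1 - \tfrac{128}{15}\theta + \tfrac{496}{15}\theta^2 - \tfrac{384}{5}\theta^3 + \tfrac{1732}{15}\theta^4 - \tfrac{1696}{15}\theta^5 + \tfrac{352}{5}\theta^6 - \tfrac{128}{5}\theta^7 + \tfrac{64}{15}\theta^8$ | 1/16 | 15/256 |
| Double–second cousin (fig. 2b) | $1 - \tfrac{1024}{85}\theta + \tfrac{5888}{85}\theta^2 - \tfrac{63488}{255}\theta^3 + \tfrac{157504}{255}\theta^4 - \tfrac{282368}{255}\theta^5 + \tfrac{373376}{255}\theta^6 - \tfrac{365824}{255}\theta^7 + \tfrac{87744}{85}\theta^8 - \tfrac{27136}{51}\theta^9 + \tfrac{15872}{85}\theta^{10} - \tfrac{2048}{51}\theta^{11} + \tfrac{1024}{255}\theta^{12}$ | 1/256 | 255/65536 |
| First cousin and second cousin | $1 - \tfrac{640}{63}\theta + \tfrac{1024}{21}\theta^2 - \tfrac{9088}{63}\theta^3 + \tfrac{18128}{63}\theta^4 - \tfrac{8416}{21}\theta^5 + \tfrac{8240}{21}\theta^6 - \tfrac{16768}{63}\theta^7 + \tfrac{7552}{63}\theta^8 - \tfrac{2048}{63}\theta^9 + \tfrac{256}{63}\theta^{10}$ | 1/64 | 63/4096 |
| Half-sib and first cousin | $1 - \tfrac{48}{7}\theta + 20\theta^2 - \tfrac{232}{7}\theta^3 + \tfrac{232}{7}\theta^4 - \tfrac{128}{7}\theta^5 + \tfrac{32}{7}\theta^6$ | 1/8 | 7/64 |
| Half-sib and half-avuncular | $1 - \tfrac{40}{7}\theta + \tfrac{96}{7}\theta^2 - \tfrac{128}{7}\theta^3 + \tfrac{96}{7}\theta^4 - \tfrac{32}{7}\theta^5$ | 1/8 | 7/64 |
| Double–half-first cousin | $1 - \tfrac{512}{63}\theta + \tfrac{640}{21}\theta^2 - \tfrac{4352}{63}\theta^3 + \tfrac{6464}{63}\theta^4 - \tfrac{6400}{63}\theta^5 + \tfrac{4096}{63}\theta^6 - \tfrac{512}{21}\theta^7 + \tfrac{256}{63}\theta^8$ | 1/64 | 63/4096 |
| Double–half-avuncular | $1 - \tfrac{32}{5}\theta + \tfrac{272}{15}\theta^2 - \tfrac{448}{15}\theta^3 + \tfrac{448}{15}\theta^4 - \tfrac{256}{15}\theta^5 + \tfrac{64}{15}\theta^6$ | 1/16 | 15/256 |
| Half-sib and half–first cousin | $1 - \tfrac{32}{5}\theta + \tfrac{272}{15}\theta^2 - \tfrac{448}{15}\theta^3 + \tfrac{448}{15}\theta^4 - \tfrac{256}{15}\theta^5 + \tfrac{64}{15}\theta^6$ | 1/16 | 15/256 |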

atives. Specifically,

$f_{22} = {}_{\pi}f_{22}$ (the corresponding probability from the $\pi$ derivation), $f_{02} = f_{20} = \delta_{7r} - f_{22}$, and $f_{00} = 1 - 2\delta_{7r} + f_{22}$.

Table 6 provides most of the required $k_2$-autocorrelation functions that are encountered in studies of extended human families. In general, for a given relationship class, we find that $\rho(\pi_i,\pi_j \mid r,\theta) > \rho(k_{2i},k_{2j} \mid r,\theta)$. In other words, the correlation between $k_2$ values decays more rapidly with genetic distance than does that for the $\pi$ values. For example, comparing the appropriate correlation functions for sib pairs, we find that $\rho(\pi_i,\pi_j \mid \text{sibling},\theta) - \rho(k_{2i},k_{2j} \mid \text{sibling},\theta) = \tfrac{1}{3}(4\theta - 20\theta^2 + 32\theta^3 - 16\theta^4)$, which is $> 0$ for all $\theta > 0$. Therefore, the incorporation of dominance effects into a variance-component model will be most useful when the QTL is comparatively close to a genetic marker.
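Equation (15) and the sibling entry of table 6 can be checked against each other with exact rational arithmetic. The sketch below is illustrative only. The function names are hypothetical, the variance term is taken to be that of a 0/1 indicator with mean $\delta_{7r}$, and the sib-pair value $f_{22} = (\psi/2)^2$ with $\psi = \theta^2 + (1-\theta)^2$ assumes that the paternal and maternal lines are independent.

```python
from fractions import Fraction


def k2_autocorrelation(f22, delta7):
    """Evaluate equation (15) from f22 and the expected k2 (delta7)."""
    f02 = f20 = delta7 - f22
    f00 = 1 - 2 * delta7 + f22
    # Keys are (s, t), the 0/1 indicators of sharing both alleles IBD.
    joint = {(0, 0): f00, (0, 1): f02, (1, 0): f20, (1, 1): f22}
    cov = sum(p * (s - delta7) * (t - delta7) for (s, t), p in joint.items())
    var = delta7 * (1 - delta7)  # assumed: variance of a 0/1 indicator
    return cov / var


def sib_f22(theta):
    """Assumed sib-pair probability of sharing both alleles at both loci."""
    psi = theta ** 2 + (1 - theta) ** 2
    return (psi / 2) ** 2


def sib_rho_k2(theta):
    """Sibling polynomial as listed in table 6."""
    return (1 - Fraction(16, 3) * theta + Fraction(32, 3) * theta ** 2
            - Fraction(32, 3) * theta ** 3 + Fraction(16, 3) * theta ** 4)


def sib_rho_pi(theta):
    """Sib-pair pi-autocorrelation, (1 - 2*theta)^2."""
    return (1 - 2 * theta) ** 2


thetas = [Fraction(1, 100), Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)]
gaps = [sib_rho_pi(t) - sib_rho_k2(t) for t in thetas]
```

Each entry of `gaps` is the amount by which the $\pi$-autocorrelation exceeds the $k_2$-autocorrelation at that recombination fraction. Under these assumptions the gap closes only at $\theta = 1/2$, where both correlations are zero.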

Estimation of $\Pi$ and $K_2$ Matrices by Use of Multipoint Information

Given the $\pi$- and $k_2$-correlation functions provided in tables 3, 4, and 6, it is possible to estimate the $\Pi$ matrix at any chromosomal location conditional on all of the available genetic marker information and the map locations of the markers. A Haldane mapping function is


Table 7

Phenotyped Relative Pairs Informative for Linkage in the Simulated Pedigrees

| Degree (Coefficient) of Relationship and Relationship Type | No. of Pairs |
|---|---|
| First (1/2): | |
| Sibs | 771 |
| Parent-offspring | 801 |
| Second (1/4): | |
| Avuncular | 1,485 |
| Grandparent-grandchild | 151 |
| Half-sibs | 26 |
| Third (1/8): | |
| Cousins | 2,761 |
| Grand-avuncular | 497 |
| Half-avuncular | 64 |
| Fourth (1/16): | |
| Cousins once removed | 3,051 |
| Half-cousins | 27 |
| Great-grand-avuncular | 19 |
| Half–grand-avuncular | 13 |
| Fifth (1/32): | |
| Cousins twice removed | 423 |
| Second cousins | 169 |
| Total | 10,258 |

employed to relate genetic distances to $\theta$. To estimate IBD probabilities at any chromosomal location, we have chosen to generalize the regression-based averaging method of Fulker et al. (1995) to arbitrary relationships. Basically, for any pair of individuals of relationship r, we find the vector of regression coefficients $(\beta_{r\ell})$ that predict $\pi_\ell$ on the vector of available estimated marker-specific $\hat{\pi}_z$, where the subscripts now refer to chromosomal locations in centimorgans. This is done by the standard regression method in which

$$\beta_{r\ell} = V(\hat{\pi}_z)^{-1}\,\mathrm{Cov}(\hat{\pi}_z,\pi_\ell) , \qquad (16)$$

where $\beta_{r\ell}$ is a vector of n regression coefficients (assuming that we have typed n markers on the chromosome), $V(\hat{\pi}_z)$ is the $n \times n$ covariance matrix of the marker IBD probabilities, and $\mathrm{Cov}(\hat{\pi}_z,\pi_\ell)$ is a vector of the expected covariances between the marker IBD probabilities and those at the chromosomal location $\ell$. As shown by Fulker et al. (1995), the elements of $V(\hat{\pi}_z)$ are determined by the genetic distances between the markers, the $\rho(\hat{\pi}_i,\hat{\pi}_j \mid r,\theta)$ functions derived above, and the empirical variances of the $\hat{\pi}_i$. Likewise, the elements of the vector $\mathrm{Cov}(\hat{\pi}_z,\pi_\ell)$ are given by the product of $\rho(\hat{\pi}_i,\hat{\pi}_\ell \mid r,\theta)$ and the empirical variances of the marker $\hat{\pi}_i$ values. Once obtained, the $\beta_{r\ell}$ vector is used to estimate $\pi_\ell$ for the ijth pair of relatives by

$$\hat{\pi}_{\ell ij} = 2\phi_r + \beta_{r\ell}'\,(\hat{\pi} - \bar{\hat{\pi}}) , \qquad (17)$$

where the symbol $\hat{\pi}$ without a subscript indicates the vector of marker IBD probabilities, and $\bar{\hat{\pi}}$ is its empirical mean vector. Subject to constraints on the acceptable parameter space that are r dependent, equation (16) can be used to estimate each pairwise element of the $\hat{\Pi}_\ell$ matrix, which is then used to structure the expected phenotypic covariances between relatives as shown in equation (6). The similarity of equation (7) and equation (17) is also apparent, since equation (7) is the matrix prediction equation when there is only a single marker.

A similar approach can be employed to obtain multipoint estimates of $k_{2\ell}$ by substituting the appropriate expectations, $k_2$-autocorrelations, empirical variances, and means in equations (16) and (17).
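The regression in equations (16) and (17) can be sketched for a single sib pair as follows. This is a hedged illustration rather than the SOLAR implementation. All function and parameter names are hypothetical. The sib-pair correlation $(1 - 2\theta)^2$ and $\mathrm{Var}(\pi) = 1/8$ are assumed, the off-diagonal elements of $V$ are assumed to take the form $\rho_{ij}\,\mathrm{Var}(\hat{\pi}_i)\,\mathrm{Var}(\hat{\pi}_j)/\mathrm{Var}(\pi)$, and the final clamp to [0, 1] is a simple stand-in for the relationship-dependent constraints mentioned above.

```python
import math


def haldane_theta(distance_cm):
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_cm) / 100.0))


def rho_sib(theta):
    """Assumed sib-pair pi-autocorrelation."""
    return (1.0 - 2.0 * theta) ** 2


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multipoint_pi(marker_cm, pi_hat, pi_mean, pi_var, loc_cm,
                  expected_pi=0.5, var_pi=0.125, rho=rho_sib):
    """Return (beta, estimate) for one relative pair at location loc_cm."""
    n = len(marker_cm)
    # Covariance matrix of the marker IBD estimates (assumed form).
    v = [[pi_var[i] if i == j else
          rho(haldane_theta(marker_cm[i] - marker_cm[j]))
          * pi_var[i] * pi_var[j] / var_pi
          for j in range(n)] for i in range(n)]
    # Covariances between marker estimates and pi at the target location.
    cov = [rho(haldane_theta(marker_cm[i] - loc_cm)) * pi_var[i]
           for i in range(n)]
    beta = solve(v, cov)                      # equation (16)
    estimate = expected_pi + sum(             # equation (17)
        b * (p - m) for b, p, m in zip(beta, pi_hat, pi_mean))
    return beta, min(1.0, max(0.0, estimate))


markers = [0.0, 10.0, 20.0]          # marker positions in cM
pi_hat = [1.0, 0.5, 0.5]             # marker-specific IBD estimates
pi_mean = [0.5, 0.5, 0.5]            # empirical means
pi_var = [0.125, 0.125, 0.125]       # fully informative markers
beta_at_marker, est_at_marker = multipoint_pi(markers, pi_hat, pi_mean, pi_var, 10.0)
beta_between, est_between = multipoint_pi(markers, pi_hat, pi_mean, pi_var, 5.0)
```

With a target location on top of a marker, the regression weights collapse onto that marker. For a location between the first two markers, the third marker receives essentially no weight under these assumptions, because the assumed sib-pair correlation decays exponentially with map distance.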

Simulations

To evaluate the utility of this multipoint variance-component method for detecting QTLs, we performed a series of computer simulations to assess its properties and

accuracy. In the first set of simulations, six quantitative

traits and genotype data were simulated for 200 repli-

cates of a data set containing 1,497 total individuals,

1,000 phenotyped, based approximately on the pedigree

structure of the San Antonio Family Heart Study. These

are extended pedigrees, including all available first-, sec-

ond-, and third-degree relatives of a proband and the

proband’s spouse as well as the married-in parents of

any descendants. Pedigree size ranges from 37 to 128 individuals ($\bar{x} = 65$); thus, multipoint quantitative-trait linkage analysis of these pedigrees would not be possible

with any previously published method. The number of

relative pairs with both members phenotyped is shown

in table 7 for each type of relative pair present in these

pedigrees. Although the SOLAR general pedigree vari-

ance-component linkage analysis uses IBD allele sharing

between these relative pairs, it should be noted that it

is not a relative-pair method as likelihoods are maxi-

mized over entire families considered jointly. The num-

ber of relative pairs of various types is shown in order

to illustrate the depth and complexity of these pedigrees.

Fully informative markers were simulated at a posi-

tion of 33 cM on a 100-cM chromosome. The alleles of

this fully informative marker were grouped together into “high” and “low” bins in various ways to obtain biallelic QTLs whose most common allele took one of three possible generating values, 0.5, 0.7, or 0.9. Two generating values of the additive effect parameter $a = \tfrac{1}{2}(\mu_{QQ} - \mu_{qq})$ were considered that produced either a 2- or 2.5-SD difference between the contrasting QTL genotypes. For these simulations, dominance effects were not included. Using the six sets of generating parameters, we simulated six quantitative traits in which the relative variance due to the QTL (i.e., the heritability due to the QTL) ranged from .15 (where $p_Q = .9$ and $a = 1$) to .44 (where $p_Q = .5$ and $a = 1.25$). With CHRSIM (Speer et al.


Table 8

Percentage of Simulation Replicates with a Maximum LOD Score ≥3.0 and Mean Maximum LOD Score

Cell entries give the percentage of replicates with maximum LOD ≥3.0 (mean maximum LOD).

| $h^2$ Due to QTL | Allele Frequency | Displacement | Fully Informative Marker at QTL | 5-cM Map, Two-point | 5-cM Map, Multipoint | 10-cM Map, Two-point | 10-cM Map, Multipoint | 20-cM Map, Two-point | 20-cM Map, Multipoint |
|---|---|---|---|---|---|---|---|---|---|
| .44 | .5 | 2.5 | 99.5 (13.14) | 98.5 (6.45) | 98.9 (7.63) | 96.0 (6.05) | 97.2 (6.86) | 72.0 (4.24) | 84.5 (5.05) |
| .40 | .5 | 2.0 | 97.5 (10.44) | 81.5 (5.11) | 87.5 (5.99) | 78.5 (4.81) | 82.5 (5.32) | 52.0 (3.31) | 63.0 (3.88) |
| .33 | .7 | 2.5 | 95.5 (7.30) | 68.0 (3.89) | 75.0 (4.39) | 57.0 (3.59) | 67.0 (4.01) | 35.0 (2.62) | 47.5 (2.99) |
| .30 | .7 | 2.0 | 86.0 (6.00) | 49.5 (3.27) | 58.5 (3.61) | 40.0 (2.95) | 51.0 (3.31) | 21.5 (2.14) | 35.5 (2.49) |
| .22 | .9 | 2.5 | 54.0 (3.71) | 32.0 (2.45) | 34.5 (2.52) | 24.5 (2.17) | 28.0 (2.26) | 10.5 (1.58) | 16.5 (1.72) |
| .15 | .9 | 2.0 | 27.0 (2.10) | 8.0 (1.60) | 11.5 (1.59) | 6.0 (1.39) | 10.0 (1.45) | 2.5 (1.05) | 3.5 (1.12) |

NOTE.—Simulated QTLs were biallelic and accounted for 15%–44% of the trait variance. The second and third columns provide the frequency of the more common QTL allele and the displacement between homozygote means in standard deviation units, respectively.

1992; Terwilliger et al. 1993), marker loci were simu-

lated every 5 cM, based on allele number and frequency

patterns drawn from a commercially available screening

set. For each of the six independent traits/generating

models, two-point LOD scores were assessed at each of

the marker loci and at the fully informative marker un-

derlying the trait. Multipoint analysis was performed

with 5-, 10-, and 20-cM maps drawn from the 21 sim-

ulated markers, with IBD sharing estimated every 2 cM

for every relative pair. Both two-point and multipoint

linkage analyses were performed by use of the variance-

component linkage methods described above and im-

plemented in SOLAR.
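The two QTL heritabilities quoted for the extreme generating models (.15 and .44) can be reproduced with a short calculation. This sketch is an illustration under stated assumptions, not a description of the simulation code: the QTL is taken to be additive and biallelic in Hardy-Weinberg proportions, so that its variance is $2pqa^2$, the additive effect $a$ is expressed in units of the residual standard deviation, and the residual variance is set to 1. The function name is hypothetical.

```python
def qtl_heritability(p, a, residual_var=1.0):
    """Proportion of trait variance due to an additive biallelic QTL.

    p is the frequency of the more common allele and a is half the
    difference between homozygote means (assumed residual-SD units).
    """
    q = 1.0 - p
    qtl_var = 2.0 * p * q * a * a  # additive variance, no dominance
    return qtl_var / (qtl_var + residual_var)


lowest = qtl_heritability(0.9, 1.0)    # 2-SD displacement, p_Q = .9
highest = qtl_heritability(0.5, 1.25)  # 2.5-SD displacement, p_Q = .5
```

Under these assumptions `lowest` and `highest` round to the .15 and .44 given in the text.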

Table 8 provides the mean maximum LOD scores and

the percentage of LOD scores >3.0 obtained for each

generating model. The fourth column of table 8 shows

the mean LOD obtained when the fully informative

marker directly on the QTL location was used. This

value reflects the maximum LOD scores obtainable in

these pedigrees under ideal conditions of marker place-

ment and heterozygosity and serves as a gold standard

against which to compare the other linkage analyses.

For all three densities of marker maps, multipoint var-

iance-component analysis, as compared to the best two-

point variance-component result, improved both the

mean maximum LOD score and the percentage of maximum LOD scores >3.0. For example, with a 5-cM map, the mean LOD for multipoint analysis was, on average, 0.5 LOD units higher than the best two-point LOD when considered across all generating values. In addition, the percentages of maximum LOD scores >3.0, which have standard errors ranging

from 0.1 to 1.8, are improved under all six generating

models. Table 8 also shows that a substantial amount

of linkage information is unavailable even at the 5-cM

density, which can be seen by the difference in the mean

LOD scores when the fully informative marker at the

QTL is compared to the mean multipoint LOD (13.14

vs. 7.63). Because we have arbitrarily placed the QTL

at the midpoint of the 5-cM interval, simply adding

another marker within the interval would substantially

improve the LOD.
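The 0.5-LOD average gain quoted above for the 5-cM map can be recomputed directly from the mean maximum LOD scores in table 8. The sketch below simply averages the multipoint minus two-point differences over the six generating models. The variable names are illustrative, and the values are transcribed from the 5-cM columns of the table.

```python
# Mean maximum LOD scores for the 5-cM map (table 8), one entry per model.
two_point = [6.45, 5.11, 3.89, 3.27, 2.45, 1.60]
multipoint = [7.63, 5.99, 4.39, 3.61, 2.52, 1.59]
fully_informative = [13.14, 10.44, 7.30, 6.00, 3.71, 2.10]

gains = [m - t for m, t in zip(multipoint, two_point)]
mean_gain = sum(gains) / len(gains)

# Share of the fully informative LOD recovered by the multipoint scan
# for the first generating model (7.63 vs. 13.14).
recovered = multipoint[0] / fully_informative[0]
```

The gain is concentrated in the models with larger QTL effects, and `recovered` quantifies how much linkage information remains unavailable at the 5-cM density for the strongest model.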

The increase in power with both multipoint variance-

component analysis and denser marker maps as well as

the accuracy of multipoint localization of the QTL are

illustrated in figure 3, which compares the LOD profiles,

averaged over the 200 simulations, for one of the sim-

ulated traits. Even with a sparse map with an intermar-

ker distance of 20 cM, multipoint analysis provided a

noticeable improvement in LOD score over the two-

point analyses, as well as an unbiased estimate of QTL

location.

For all of the generating models, the multipoint analysis produced excellent estimates of QTL location. For example, for the model in which the QTL heritability was 0.44, the estimated locations were 33.21 ± 0.33, 33.03 ± 0.53, and 34.31 ± 0.59 for the 5-, 10-, and 20-cM scans, respectively. Similarly, for the model in which the QTL heritability was 0.30, the estimated locations were 32.67 ± 0.58, 32.15 ± 0.65, and 34.82 ± 0.98. In all cases, the estimated chromosomal location was not significantly different from the generating value.

Additional evidence that our multipoint variance-com-

ponent procedure yields unbiased estimates of QTL lo-

cation is provided elsewhere (Almasy et al. 1997c; Dug-

girala et al. 1997; Towne et al. 1997; Williams et al.

1997; J. T. Williams and J. Blangero, unpublished data).

The six sets of generating parameters used in these

simulations are effectively single major gene models in

which there are two QTL alleles acting in a simple co-

dominant manner. This straightforward model does not

take advantage of the strengths of the variance-com-

ponent linkage method. The existence of a single major

gene inherently violates the assumption of multivariate

normality on which the variance-component linkage

method is based. However, it has been demonstrated that

the method is robust to violations of this assumption

(Beaty et al. 1985). In addition, the use of a biallelic


Figure 3

Two-point and multipoint LOD score profiles for 5-, 10-, and 20-cM marker maps averaged over 200 simulations for a QTL at 33 cM and with an additive genetic heritability of .33.

QTL is somewhat limiting, since the variance-compo-

nent methodology is capable of exploiting the greater

information content in a multiallelic QTL system.

In order to test the accuracy of our estimates of genetic effect size, we performed a second set of simulations in which, given a QTL allele frequency of $p_Q = .5$, we chose a to produce a series of generating models in which the additive genetic heritability due to the QTL $(h_q^2)$ varied from 0.05 to 0.50 in increments of 0.05 units. In this simulation, we also allowed for a residual genetic heritability of 0.20. For each generating model, 100 replicates were assessed and quantitative-trait linkage analysis was performed on each. Figure 4 shows a plot of the expected $h_q^2$ and the mean of the maximum likelihood estimates of $h_q^2$ at the expected QTL location. Figure 4 clearly shows that the variance-component procedure yields outstanding estimates of genetic effect size. These simulations were also performed with a QTL allele frequency of $p_Q = .9$ with similar results (not shown).

Discussion

This powerful variance-components method makes it

possible to perform multipoint linkage analysis with

quantitative-trait data in pedigrees of arbitrary size and

complexity. Such an analysis would previously have re-

quired either fragmentation of any large pedigrees into

smaller subsets, resulting in a reduction in power to detect linkage, or the application of one of the new computer-intensive Monte Carlo–based parametric linkage

methods (e.g., the method of Heath 1997). The multi-

point IBD estimation method presented in this article

has already been utilized successfully in variance-com-

ponent linkage analyses of simulated data from Genetic

Analysis Workshop 10 (Almasy et al. 1997c) as well as

such quantitative traits as serum leptin (Comuzzie et al.

1997) and HDL-cholesterol levels (Almasy et al. 1997b)

in the extended pedigrees of the San Antonio Family

Heart Study and event-related brain potentials in the

Collaborative Study on the Genetics of Alcoholism (Al-

masy et al. 1997a; Porjesz et al. 1997; Begleiter et al.,

in press).

The IBD estimation procedure is quite efficient and

compares favorably to other multipoint methods suit-

able for use in pedigrees. In contrast to the Elston-Stew-

art algorithm (1971), in which computation increases

exponentially with the number of markers, or the

Lander-Green Hidden Markov Model (Lander and

Green 1987; Kruglyak et al. 1996), in which compu-

tation increases exponentially with the number of non-

founders in a pedigree, because the suggested multipoint

algorithms are linear functions of previously computed

IBDs, processing time increases only linearly for addi-

tional individuals or additional loci. For an input file

containing IBD information on 16 genotyped marker

loci for 20,854 relative pairs, SOLAR, running on a Sun

workstation, required only 1 min 10 s to estimate the


Figure 4

Plot of expected vs. estimated additive genetic heritability due to the QTL. Bars indicate ±1 standard error.

IBD matrix at an arbitrary chromosomal location. Such

computational speed makes it feasible to estimate mul-

tipoint IBD matrices every 1 cM along an entire chro-

mosome, even for very large data sets with many gen-

otyped markers. SOLAR was recently used to analyze

complex pedigree data from Genetic Analysis Workshop

10 (Almasy et al. 1997c), with >1,000 genotyped individuals and as many as 50 markers on a chromosome.

Similarly, we are employing this method on a large com-

plex baboon pedigree (with a pedigree size of 750 ani-

mals) and an extremely large pedigree of individuals

from an isolated human population (with a pedigree size

of 1,200 individuals). An additional benefit of this ap-

proach is that, once a marker data set is deemed final,

IBD calculations need be performed only once and the

resulting matrices stored for all future analyses. This

feature is particularly useful in large studies of complex

disease where many different phenotypes have been measured and each needs to be processed through genomewide linkage analysis.

IBD correlation formulas have previously been derived by a number of authors for limited classes of relative pairs. Amos (1988) derived the IBD correlations for half-sibling, grandparent-grandchild, avuncular, and first-cousin pairs by methods similar to those described above. Feingold (1993) and colleagues (Feingold et al. 1993) employed a different strategy, using a Markov approximation to derive these same four formulas for use in affected relative pair–based linkage analysis using IBD status. These authors and Lander and Kruglyak (1995) were primarily interested in the $\pi$-autocorrelations in order to assess the importance of correlated test statistics in genome scanning. In this regard, it is useful to point out that the Lander and Kruglyak crossover rate parameter, which is central to their method for evaluating genomewide significance levels, is given by $-\tfrac{1}{2}\lim_{\theta \to 0}[d\rho(\pi_i,\pi_j \mid r,\theta)/d\theta]$. Thus, the crossover rate parameter is an approximate measure of how rapidly the $\pi$-autocorrelations decay with genetic distance and can be obtained for any pairwise relationship simply as half the absolute value of the coefficient associated with $\theta^1$ in the appropriate $\rho$-function. For example, from table 4, we can immediately determine that the crossover rate parameter for third cousins is $\tfrac{1}{2}(512/63) = 256/63$. Our results in tables 3 and 4 can be used to extend observations on the behavior of correlated test statistics for linkage methods based on extended pairwise relationships.

The present study extends the IBD correlation formulas to many other classes of relative pairs and provides a simple framework for deriving similar formulas for any relative pair related by a single line of descent or by multiple independent lines of descent. Simulations suggest that multipoint variance-component linkage analyses with IBDs calculated based on these correlations recover an unbiased estimate of the location of a gene and provide increased power to detect linkage even with intermarker distances as widely spaced as 20 cM.

These multipoint IBD estimates remove an impediment

to making full use of the recent expansions of variance-

component linkage methodology, improving the power

to examine a wide variety of complex genetic models

for both quantitative and discrete traits in general

pedigrees.
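The crossover rate parameter discussed above, half the absolute value of the linear coefficient of the relevant $\rho$-function, is easy to compute once that coefficient is known. The following sketch is illustrative. The function names are hypothetical, the third-cousin coefficient 512/63 is taken from the text, and the sib-pair function $(1 - 2\theta)^2$ is assumed in order to show the equivalent limit definition numerically.

```python
from fractions import Fraction


def crossover_rate(linear_coefficient):
    """Half the absolute value of the coefficient of theta^1."""
    return abs(Fraction(linear_coefficient)) / 2


def numeric_crossover_rate(rho, h=1e-6):
    """Approximate -1/2 * d(rho)/d(theta) as theta approaches 0."""
    return -0.5 * (rho(h) - rho(0.0)) / h


def rho_sib(theta):
    """Assumed sib-pair pi-autocorrelation."""
    return (1.0 - 2.0 * theta) ** 2


third_cousin_rate = crossover_rate(Fraction(-512, 63))
sib_rate = numeric_crossover_rate(rho_sib)
```

Here `third_cousin_rate` reproduces the value 256/63 given in the text, and `sib_rate` is close to 2, which is half of the linear coefficient 4 in the expansion $1 - 4\theta + 4\theta^2$.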

Applications of quantitative-trait linkage analysis are

increasing rapidly. Because of the superior information

content of quantitative traits, genetic analysis of quan-

titative risk factors serves as a powerful tool for eluci-

dating the genetic mechanisms influencing common dis-

eases. Numerous strategies and sampling designs are

being formulated, and each has its own strengths and

weaknesses. It is well known that, in many situations,

extended pedigrees will dramatically outperform smaller

family units such as sib pairs, sibships, or nuclear fam-

ilies with regard to the power to detect and accurately

localize QTLs (Wijsman and Amos 1997). Unfortu-

nately, although quantitative data have often been col-

lected in more extended kindreds, the lack of adequate

linkage tools has generally led to such rich data sets

being leached of their potential linkage information by

truncation to smaller familial units. Recently, direct comparisons of pedigree-based and nuclear family–based samples consisting of the same number of phenotyped individuals in the same distribution of sibship sizes have underscored the loss of power resulting from fragmentation of a large pedigree-based sample (Duggirala et al.

1997; Towne et al. 1997; Williams et al. 1997). How-

ever, with the advent of the multipoint variance-com-

ponent linkage method, the superior power of extended

pedigrees can now be routinely and fully exploited for

the localization of QTLs.

The SOLAR software, which incorporates the pedi-

gree-based variance-component and multipoint IBD

methods described here, is freely available to interested

investigators in a compiled version. SOLAR can be ob-

tained through the Southwest Foundation for Biomed-

ical Research (http://www.sfbr.org).

Acknowledgments

This research was supported by NIH grants GM18897,

HL45522, GM31575, and DK44297. Pedigree drawings were

produced with the program Pedigree/Draw (Mamelka et al.

1988). The authors gratefully acknowledge the expert assis-

tance of T. Dyer in simulation of data analyzed in this article.

References

Almasy L, Blangero J, Porjesz B, Begleiter H, and COGA Col-

laborators (1997a) Genetic analysis of the N100 event-re-

lated brain potential. Am J Med Genet 74:595–596

Almasy L, Blangero J, Rainwater DL, VandeBerg JL, Mahaney

MC, Stern MP, MacCluer JW, et al (1997b) Two major genes influence levels of unesterified cholesterol in an HDL subfraction of HDL2a. Atherosclerosis 134:76

Almasy L, Dyer TD, Blangero J (1997c) Bivariate quantitative

trait linkage analysis: pleiotropy versus coincident linkages.

Genet Epidemiol 14:953–958

Amos CI (1988) Robust methods for the detection of genetic

linkage for data from extended families and pedigrees. PhD

thesis, Louisiana State University, New Orleans

——— (1994) Robust variance-components approach for as-

sessing genetic linkage in pedigrees. Am J Hum Genet 54:

535–543

Amos CI, Dawson DV, Elston RC (1990) The probabilistic

determination of identity-by-descent sharing for pairs of relatives from pedigrees. Am J Hum Genet 47:842–853

Amos CI, Zhu DK, Boerwinkle E (1996) Assessing genetic

linkage and association with robust components of variance

approaches. Ann Hum Genet 60:143–160

Beaty TH, Self SG, Liang KY, Connolly MA, Chase GA, Kwit-

erovich PO (1985) Use of robust variance components mod-

els to analyse triglyceride data in families. Ann Hum Genet

49:315–328

Begleiter H, Porjesz B, Reich T, Edenberg H, Goate A, Blangero

J, Almasy L, et al. Quantitative trait linkage analysis of

human event-related brain potentials: P3 voltage. Electroen-

ceph Clin Neurophysiol (in press)

Blangero J (1993) Statistical genetic approaches to human

adaptability. Hum Biol 65:941–966

Blangero J, Almasy L (1997) Multipoint oligogenic linkage

analysis of quantitative traits. Genet Epidemiol 14:959–964

Comuzzie AG, Hixson JE, Almasy L, Mitchell BD, Mahaney

MC, Dyer TD, Stern MP, et al (1997) A major quantitative

trait locus determining serum leptin levels and fat mass is

located on human chromosome 2. Nat Genet 15:273–275

Cotterman CW (1940) A calculus for statistico-genetics. PhD

thesis, Ohio State University, Columbus

Curtis D, Sham PC (1994) Using risk calculation to implement

an extended relative pair analysis. Ann Hum Genet 58:

151–162

Davis S, Schroeder M, Goldin LR, Weeks DE (1996) Non-

parametric simulation-based statistics for detecting linkage

in general pedigrees. Am J Hum Genet 58:867–880

Duggirala R, Williams JT, Williams-Blangero S, Blangero J

(1997) A variance component approach to dichotomous

trait linkage analysis using a threshold model. Genet Epi-

demiol 14:987–992

Elston RC, Stewart J (1971) A general model for the genetic

analysis of pedigree data. Hum Hered 21:523–542

Feingold E (1993) Markov processes for modeling and ana-

lyzing a new genetic mapping method. J Appl Prob 30:

766–779

Feingold E, Brown PO, Siegmund D (1993) Gaussian models

for genetic linkage analysis using complete high-resolution

maps of identity by descent. Am J Hum Genet 53:234–251

Fulker DW, Cherny SS (1996) An improved multipoint sib-

pair analysis of quantitative traits. Behav Genet 26:527–532

Fulker DW, Cherny SS, Cardon LR (1995) Multipoint interval

mapping of quantitative trait loci, using sib pairs. Am J Hum

Genet 56:1224–1233

Goldgar DE (1990) Multipoint analysis of human quantitative

genetic variation. Am J Hum Genet 47:957–967

Haseman JK, Elston RC (1972) The investigation of linkage

between a quantitative trait and a marker locus. Behav Genet

2:3–19


Heath SC (1997) Markov chain Monte Carlo segregation and

linkage analysis for oligogenic models. Am J Hum Genet

61:748–760

Heath SC, Snow GL, Thompson EA, Tseng C, Wijsman EM

(1997) MCMC segregation and linkage analysis. Genet Ep-

idemiol 14:1011–1016

Hopper JL, Mathews JD (1982) Extensions to multivariate

normal models for pedigree analysis. Ann Hum Genet 46:

373–383

Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Par-

ametric and nonparametric linkage analysis: a unified mul-

tipoint approach. Am J Hum Genet 58:1347–1363

Kruglyak L, Lander ES (1995) Complete multipoint sib-pair

analysis of qualitative and quantitative traits. Am J Hum

Genet 57:439–454

Lander ES, Green P (1987) Construction of multilocus genetic

maps in humans. Proc Natl Acad Sci USA 84:2363–2367

Lander ES, Kruglyak L (1995) Genetic dissection of complex

traits: guidelines for interpreting and reporting linkage re-

sults. Nat Genet 11:241–247

Lange K, Weeks D, Boehnke M (1988) Programs for pedigree

analysis: MENDEL, FISHER, and dGENE. Genet Epidemiol

5:471–472

Lange K, Westlake J, Spence MA (1976) Extensions to pedigree

analysis. III. Variance components by the scoring method.

Ann Hum Genet 39:485–491

Mamelka PM, Dyke B, MacCluer JW (1988) Pedigree/Draw

for the Apple Macintosh. Southwest Foundation for Bio-

medical Research, San Antonio

Mitchell BD, Ghosh S, Schneider JL, Birznieks G, Blangero J

(1997) Power of variance component linkage analysis to

detect epistasis. Genet Epidemiol 14:1017–1022

Porjesz B, Begleiter H, Blangero J, Almasy L, Reich T, COGA Collaborators (1997) QTL analysis of visual P3 component of the event-related brain potential in humans. Am

J Med Genet 74:573

Schork NJ (1993) Extended multipoint identity-by-descent

analysis of human quantitative traits: efficiency, power, and

modeling considerations. Am J Hum Genet 53:1306–1319

Self SG, Liang K-Y (1987) Asymptotic properties of maximum

likelihood estimators and likelihood ratio tests under non-

standard conditions. J Am Stat Assoc 82:605–610

Sobel E, Lange K (1996) Descent graphs in pedigree analysis:

applications to haplotyping, location scores, and marker-

sharing statistics. Am J Hum Genet 58:1323–1337

Speer M, Terwilliger JD, Ott J (1992) A chromosome-based

method for rapid computer simulation. Am J Hum Genet

Suppl 51:A202

Stern M, Duggirala R, Mitchell B, Reinhart JL, Shivakumar

S, Shipman PA, Uresandi OC, et al (1996) Evidence for

linkage of regions on chromosomes 6 and 11 to plasma

glucose concentrations in Mexican Americans. Genome Res

6:724–734

Terwilliger JD, Speer M, Ott J (1993) Chromosome-based

method for rapid computer simulation in human genetic

linkage analysis. Genet Epidemiol 10:217–224

Todorov AA, Siegmund KD, Gu C, Borecki IB, Elston RC

(1997) Probabilities of identity-by-descent patterns in sib-

ships when the parents are not genotyped. Genet Epidemiol

14:909–913

Towne B, Siervogel RM, Blangero J (1997) Effects of genotype-by-sex interaction on quantitative trait linkage analysis. Genet Epidemiol 14:1053–1058

Whittemore AS, Halpern J (1994) Probability of gene identity

by descent: computation and applications. Biometrics 50:

109–117

Wijsman EM, Amos CI (1997) Genetic analysis of simulated

oligogenic traits in nuclear families and extended pedigrees:

summary of GAW10 contributions. Genet Epidemiol 14:

719–735

Williams JT, Duggirala R, Blangero J (1997) Statistical prop-

erties of a variance-components method for quantitative

trait linkage analysis in nuclear families and extended ped-

igrees. Genet Epidemiol 14:1065–1070