Multipoint quantitative-trait linkage analysis in general pedigrees.
ABSTRACT Multipoint linkage analysis of quantitative-trait loci (QTLs) has previously been restricted to sibships and small pedigrees. In this article, we show how variance-component linkage methods can be used in pedigrees of arbitrary size and complexity, and we develop a general framework for multipoint identity-by-descent (IBD) probability calculations. We extend the sib-pair multipoint mapping approach of Fulker et al. to general relative pairs. This multipoint IBD method uses the proportion of alleles shared identical by descent at genotyped loci to estimate IBD sharing at arbitrary points along a chromosome for each relative pair. We have derived correlations in IBD sharing as a function of chromosomal distance for relative pairs in general pedigrees and provide a simple framework whereby these correlations can be easily obtained for any relative pair related by a single line of descent or by multiple independent lines of descent. Once calculated, the multipoint relative-pair IBDs can be utilized in variance-component linkage analysis, which considers the likelihood of the entire pedigree jointly. Examples are given that use simulated data, demonstrating both the accuracy of QTL localization and the increase in power provided by multipoint analysis with 5-, 10-, and 20-cM marker maps. The general pedigree variance component and IBD estimation methods have been implemented in the SOLAR (Sequential Oligogenic Linkage Analysis Routines) computer package.

Am. J. Hum. Genet. 62:1198–1211, 1998
Multipoint Quantitative-Trait Linkage Analysis in General Pedigrees
Laura Almasy and John Blangero
Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio
Received January 27, 1998; accepted for publication March 13, 1998; electronically published April 17, 1998.
Address for correspondence and reprints: Dr. Laura Almasy, Department of Genetics, Southwest Foundation for Biomedical Research, 7620 Northwest Loop 410, P.O. Box 760549, San Antonio, TX 78245-0549. E-mail: almasy@darwin.sfbr.org
© 1998 by The American Society of Human Genetics. All rights reserved. 0002-9297/98/6205-0026$02.00
Introduction
Methods of linkage analysis that exploit identity-by-descent (IBD) allele sharing between pairs of relatives are widely used in the genetic analysis of complex traits, as these methods generally require few assumptions about the genetic model underlying expression of the trait. There is a limited range of IBD allele-sharing methods that can be used for quantitative-trait linkage analysis. The best known of these is the sib-pair approach of Haseman and Elston (1972). Recently, variance-component linkage analysis methods, which are more powerful than relative pair–based approaches and have the added advantage of providing reasonable estimates of the magnitude of effect of the detected locus, have been developed (Goldgar 1990; Schork 1993; Amos 1994; Blangero and Almasy 1997). These variance-component methods have been extended to accommodate general pedigrees of arbitrary size and complexity (Comuzzie et al. 1997) and to allow analyses that include genotype × environment interaction (Blangero 1993; Towne et al. 1997), epistasis (Stern et al. 1996; Mitchell et al. 1997), threshold models for discrete traits (Duggirala et al. 1997), and pleiotropy (Almasy et al. 1997c), as well as multivariate and oligogenic analyses (Schork 1993; Almasy et al. 1997c; Blangero and Almasy 1997; Williams et al. 1997).
Multipoint linkage analysis increases the power to detect true linkages and decreases the false-positive rate. When linkage is detected, multipoint analysis also allows support or confidence intervals to be determined for the location of a gene. To date, practical application of multipoint IBD methods has been confined to sibships or small pedigrees (Fulker et al. 1995; Kruglyak and Lander 1995; Kruglyak et al. 1996; Todorov et al. 1997), although there have been some recent promising developments utilizing computer-intensive Monte Carlo–based techniques (Sobel and Lange 1996; Heath 1997; Heath et al. 1997) in large pedigrees.
The development of variance-component linkage methodologies for use in extended families has created a need for a multipoint IBD method suitable for use in such pedigrees. In general, the computational burden for exact multipoint calculations is considerable even in nuclear families and is prohibitive in large pedigrees. To alleviate this problem, Fulker et al. (1995) developed a multipoint approximation for sib pairs that uses a linear function of IBD values at genotyped markers to estimate IBD sharing at arbitrary chromosomal locations. The Fulker method is based on the evaluation of average number of alleles shared IBD for a pair of siblings and, although much less computationally expensive, has been shown to be as effective as maximum-likelihood estimation of the exact multipoint IBD distribution (Fulker and Cherny 1996). In this article, we extend this simple approach to allow multipoint analysis in pedigrees of unlimited size and complexity. After presenting a general variance-components framework for oligogenic quantitative-trait linkage analysis in arbitrary pedigrees, we derive a series of functions for the correlation between loci in IBD sharing as a function of chromosomal distance in relative pairs found in extended families, including pairs as distant as third cousins (seventh-degree relatives) and relatives related through multiple lines of descent, such as double–first cousins and double–second cousins. We then demonstrate the power and accuracy of the method by using simulation techniques.
Method
Variance-Component Linkage Analysis in General Pedigrees
The pedigree-based variance-component linkage method uses an extension of the strategy developed by Amos (1994) to estimate the genetic variance attributable to the region around a specific genetic marker. Goldgar (1990) and Schork (1993) have proposed similar variance-component models. This approach is based on specifying the expected genetic covariances between arbitrary relatives as a function of the IBD relationships at a quantitative-trait locus (QTL). The modeling framework used in variance-component analysis is remarkably general (Lange et al. 1976; Hopper and Mathews 1982), although it is also parsimonious with regard to the number of parameters that are required to be estimated relative to that needed in penetrance model–based linkage analysis. Also, unlike most penetrance model–free linkage analysis methods, the variance-component method can be used both for localization of QTLs and for obtaining good estimates of the relative importance of the QTL in determining phenotypic variance in the population (Amos et al. 1996; Blangero and Almasy 1997; Williams et al. 1997).
Let the quantitative phenotype, $y$, be written as a linear function of the $n$ QTLs that influence it:
$$y = \mu + \sum_{i=1}^{n} g_i + e, \tag{1}$$
where $\mu$ is the grand mean, $g_i$ is the effect of the $i$th QTL, and $e$ represents a random environmental deviation. Assume $g_i$ and $e$ are uncorrelated random variables with expectation 0 so that the variance of $y$ is $\sigma_y^2 = \sum_{i=1}^{n}\sigma_{gi}^2 + \sigma_e^2$. We also allow for both additive and dominance effects, and therefore $\sigma_{gi}^2 = \sigma_{ai}^2 + \sigma_{di}^2$, where $\sigma_{ai}^2$ is the additive genetic variance due to the $i$th locus and $\sigma_{di}^2$ is the dominance variance. If we assume two allelic variants, $Q$ and $q$ with frequencies of $p_Q$ and $(1 - p_Q)$, at a given QTL, the genotype-specific means are given by $\mu_{QQ} = \mu + a$, $\mu_{Qq} = \mu + d$, and $\mu_{qq} = \mu - a$, and the QTL-specific genetic variances are given by $\sigma_{ai}^2 = 2p_Q(1 - p_Q)[a + (1 - 2p_Q)d]^2$ and $\sigma_{di}^2 = [2p_Q(1 - p_Q)d]^2$. For such a simple random effects model, we can easily obtain the expected phenotypic covariance between the trait values of any pair of relatives as
$$\operatorname{Cov}(y_1,y_2) = \sum_{i=1}^{n}\left[(k_{1i}/2 + k_{2i})\sigma_{ai}^2 + k_{2i}\sigma_{di}^2\right], \tag{2}$$
where $\operatorname{Cov}(y_1,y_2) = E[(y_1 - \mu)(y_2 - \mu)]$, and the $k$ terms represent the $k$ coefficients of Cotterman (1940) with $k_{ji}$ being the $i$th QTL-specific probability of the pair of relatives sharing $j$ alleles IBD. Similarly, the expected phenotypic correlation between any pair of relatives is given by
$$\rho(y_1,y_2) = \sum_{i=1}^{n}\left[(k_{1i}/2 + k_{2i})h_{ai}^2 + k_{2i}d_i^2\right], \tag{3}$$
where $h_{ai}^2$ is the proportion of the total phenotypic variance due to the additive genetic contribution of the $i$th QTL, and $d_i^2$ is the proportion due to the dominance effect. In the classical quantitative genetic variance-component model, we do not have information on specific QTLs but utilize the expectation of the $k$ probabilities over the genome to obtain the following approximation:
$$\operatorname{Cov}(y_1,y_2) \approx 2\phi\sigma_a^2 + \delta_t\sigma_d^2, \tag{4}$$
where $\sigma_a^2 = \sum_{i=1}^{n}\sigma_{ai}^2$ is the total additive genetic variance, $\sigma_d^2 = \sum_{i=1}^{n}\sigma_{di}^2$ is the total dominance genetic variance, $\phi = \tfrac{1}{2}E[(k_{1i}/2 + k_{2i})]$ is the expected kinship coefficient over the genome with $2\phi = R$ giving the expected coefficient of relationship, and $\delta_t = E[k_{2i}]$ is the expected probability of sharing 2 alleles IBD. Because we are generally interested in the examination of one or a few QTLs at a time, we exploit the above approximation to reduce the number of parameters that need to be considered. For example, if we are focusing on the analysis of the $i$th QTL in equation (1), we can absorb the effects of all of the remaining QTLs in residual components of covariance. Employing these residual covariance terms, the expected phenotypic covariance between relatives is well approximated by
$$\operatorname{Cov}(y_1,y_2) = \pi_i\sigma_{ai}^2 + k_{2i}\sigma_{di}^2 + 2\phi\sigma_g^2 + \delta_t\sigma_d^2, \tag{5}$$
where $\pi_i = (k_{1i}/2 + k_{2i})$ is the coefficient of relationship or the probability of a random allele being IBD at the $i$th QTL, $\sigma_g^2$ represents the residual additive genetic variance, and $\sigma_d^2$ now represents the residual dominance genetic variance. The $\pi$ and $k_2$ coefficients and their expectations effectively structure the expected phenotypic covariances and are the basis for much of quantitative-trait linkage analysis such as the sib-pair difference method of Haseman and Elston (1972). For any given chromosomal location, $\pi$ and $k_2$ can be estimated from genetic marker data and information on the genetic map.
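The variance decomposition and equation (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the brute-force check assumes Hardy-Weinberg genotype frequencies, which the text does not state explicitly.

```python
def qtl_variances(a, d, p_q):
    """Additive and dominance variance of a biallelic QTL (allele Q at frequency p_q)."""
    het = 2.0 * p_q * (1.0 - p_q)
    var_a = het * (a + (1.0 - 2.0 * p_q) * d) ** 2
    var_d = (het * d) ** 2
    return var_a, var_d


def genotypic_variance(a, d, p_q):
    """Brute-force variance of genotype means +a, d, -a (assumes Hardy-Weinberg)."""
    freqs = [p_q ** 2, 2.0 * p_q * (1.0 - p_q), (1.0 - p_q) ** 2]
    values = [a, d, -a]
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


def expected_covariance(k1, k2, var_a, var_d):
    """Equation (2): sum over QTLs of (k1/2 + k2)*var_a + k2*var_d."""
    return sum((k1_i / 2.0 + k2_i) * va + k2_i * vd
               for k1_i, k2_i, va, vd in zip(k1, k2, var_a, var_d))
```

For a sib pair at a single QTL with the expected sharing probabilities `k1 = [0.5]` and `k2 = [0.25]`, `expected_covariance` reduces to half the additive variance plus a quarter of the dominance variance.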
Given the simple model for phenotypic variation described above, it is possible to use data from pedigree structures of arbitrary complexity to make inferences regarding the localization and effect sizes of QTLs. For the simple additive model in which $n$ QTLs and an unknown number of residual polygenes influence a trait, the covariance matrix for a pedigree can be written
$$\Omega = \sum_{i=1}^{n}\hat{\Pi}_i\sigma_{ai}^2 + 2\Phi\sigma_g^2 + I\sigma_e^2, \tag{6}$$
where $\hat{\Pi}_i$ is the matrix whose elements ($\hat{\pi}_{ijl}$) provide the predicted proportion of genes that individuals $j$ and $l$ share IBD at a QTL that is linked to a genetic marker locus, $\Phi$ is the kinship matrix, and $I$ is an identity matrix. $\hat{\Pi}_i$ is a function of the estimated IBD matrix for a genetic marker itself ($\hat{\Pi}_m$) and a matrix of correlations ($B$) between the proportions of genes IBD at the marker and at the QTL:
$$\hat{\Pi}_i = 2\Phi + B(r,\theta)\odot(\hat{\Pi}_m - 2\Phi), \tag{7}$$
where $\theta$ is the recombination frequency between marker locus $m$ and QTL $i$, and the elements $b_{ij} = \rho(\pi_i,\pi_m \mid r,\theta)$ are the correlations between the IBD probabilities, where $r$ denotes the $r$th type of kinship relationship. Equation (7) is a matrix generalization of the results provided by Amos (1994). The $\rho$ functions provide the autocorrelation functions between IBD probabilities as a function of genetic distance, and they also allow prediction of the $\hat{\Pi}$ matrix at any chromosomal location given $\hat{\Pi}$ estimates at correlated locations (e.g., when $\theta < .5$). Derivation of the $\rho$ functions for arbitrary pedigree relationships is provided below.
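Equation (7) can be read as an elementwise shrinkage of the marker IBD estimate toward its expectation $2\Phi$. The sketch below is illustrative only: the function name is hypothetical, matrices are plain nested lists, and the product with $B$ is assumed to be elementwise (Hadamard).

```python
def predict_qtl_ibd(pi_marker, phi, corr):
    """Equation (7): pi_qtl[j][l] = 2*phi[j][l] + corr[j][l] * (pi_marker[j][l] - 2*phi[j][l])."""
    n = len(pi_marker)
    return [[2.0 * phi[j][l] + corr[j][l] * (pi_marker[j][l] - 2.0 * phi[j][l])
             for l in range(n)]
            for j in range(n)]


# Example: a sib pair sharing both alleles IBD at the marker (pi = 1),
# with the sib-pair correlation (1 - 2*theta)^2 evaluated at theta = 0.1.
theta = 0.1
rho_sibs = (1.0 - 2.0 * theta) ** 2
phi = [[0.5, 0.25], [0.25, 0.5]]
pi_marker = [[1.0, 1.0], [1.0, 1.0]]
corr = [[1.0, rho_sibs], [rho_sibs, 1.0]]
pi_qtl = predict_qtl_ibd(pi_marker, phi, corr)
```

As $\theta$ approaches .5 the correlation goes to zero and the prediction falls back to the pedigree expectation $2\Phi$, which is why unlinked markers carry no information about the QTL.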
By assuming multivariate normality as a working model within pedigrees, the likelihood of any pedigree can be easily written and numerical procedures can be used to estimate the variance-component parameters. For the model in equation (6), the ln-likelihood of a pedigree of $t$ individuals with phenotypic vector $y$ is given by
$$\ln L(\mu,\sigma_{ai}^2,\sigma_g^2,\sigma_e^2,\beta \mid y,X) = -\frac{t}{2}\ln(2\pi) - \frac{1}{2}\ln|\Omega| - \frac{1}{2}\Delta'\Omega^{-1}\Delta, \tag{8}$$
where $\mu$ is the grand trait mean, $\Delta = (y - \mu - X\beta)$, $X$ is a matrix of covariates, and $\beta$ is the matrix of regression coefficients associated with these covariates. Likelihood estimation assuming multivariate normality can be shown to yield consistent parameter estimates even when the distributional assumptions are violated (Beaty et al. 1985; Amos 1994). By performing an extensive series of simulations, we have confirmed the consistency of variance-component estimates of genetic effect size (Blangero and Almasy 1997; J. T. Williams and J. Blangero, unpublished data).
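Equation (8) is the ordinary multivariate normal log-likelihood, so it can be evaluated for a small pedigree with nothing more than a Cholesky factorization. This is a sketch under stated assumptions: the function names are hypothetical, `mean` is assumed to hold the fitted value $\mu + X\beta$ for each individual, and $\Omega$ is assumed to be positive definite.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def pedigree_loglik(y, mean, omega):
    """Equation (8) for one pedigree with trait vector y and covariance matrix omega."""
    t = len(y)
    delta = [yi - mi for yi, mi in zip(y, mean)]
    low = cholesky(omega)
    # Forward substitution solves low * z = delta, so delta' omega^-1 delta = z'z.
    z = []
    for i in range(t):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((delta[i] - s) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(t))
    quad = sum(v * v for v in z)
    return -0.5 * t * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad
```

In practice the likelihood is summed over independent pedigrees and maximized numerically over the variance components; that optimization step is not shown here.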
Using the variance-component model, we can test the null hypothesis that the additive genetic variance due to the $i$th QTL equals zero (no linkage) by comparing the likelihood of this restricted model with that of a model in which the variance due to the $i$th QTL is estimated. The difference between the two $\log_{10}$ likelihoods produces a LOD score that is the equivalent of the classical LOD score of linkage analysis. Twice the difference in $\log_e$ likelihoods of these two models yields a test statistic that is asymptotically distributed as a $\tfrac{1}{2}{:}\tfrac{1}{2}$ mixture of a $\chi_1^2$ variable and a point mass at zero (Self and Liang 1987). When multiple QTLs are jointly considered, the resulting likelihood-ratio test statistic has a more complex asymptotic distribution that continues to be a mixture of $\chi^2$ distributions.
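The single-QTL test can be sketched as follows. The function names are hypothetical, and the p-value uses the identity that the upper tail of a $\chi_1^2$ variable at $x$ equals $\operatorname{erfc}(\sqrt{x/2})$; treating a non-positive statistic as p = 1 is a convention assumed here rather than something stated in the text.

```python
import math


def lod_score(lnl_null, lnl_linkage):
    """LOD score: difference in log10 likelihoods, computed from natural-log likelihoods."""
    return (lnl_linkage - lnl_null) / math.log(10.0)


def mixture_p_value(lnl_null, lnl_linkage):
    """Asymptotic p-value under a 1/2:1/2 mixture of chi-square(1) and a point mass at zero."""
    lrt = 2.0 * (lnl_linkage - lnl_null)
    if lrt <= 0.0:
        return 1.0  # assumed convention: the point mass at zero carries no evidence
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))
```

Under this mixture a LOD score of 3 corresponds to a p-value of roughly 0.0001.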
This basic model has been extended to incorporate a number of more complex genetic models by allowing for additional sources of genetic and nongenetic variance. In multilocus models, an additive × additive component of epistatic variance can be estimated by use of the Hadamard product of $\Pi$ matrices for each locus as the coefficient matrix that structures the expected covariances among pedigree members (Mitchell et al. 1997). Dominance × dominance, additive × dominance, and dominance × additive variance components also can be specified by Hadamard products of appropriate $\Pi$ and $K_2$ coefficient matrices. For example, allowing for additive × additive interactions between two QTLs leads to the following equation for the phenotypic covariance matrix of a pedigree:
$$\Omega = \hat{\Pi}_1\sigma_{a1}^2 + \hat{\Pi}_2\sigma_{a2}^2 + (\hat{\Pi}_1\odot\hat{\Pi}_2)\sigma_{a1\times a2}^2 + \sum_{i=3}^{n}\hat{\Pi}_i\sigma_{ai}^2 + 2\Phi\sigma_g^2 + I\sigma_e^2. \tag{9}$$
A household or shared environment effect can be added by an additional variance component with a coefficient matrix ($H$) whose elements are 1 if the relative pair in question shares the environmental exposure or 0 otherwise. This simple incorporation of a shared household component leads to the following model for the phenotypic covariance matrix:
$$\Omega = \sum_{i=1}^{n}\hat{\Pi}_i\sigma_{ai}^2 + 2\Phi\sigma_g^2 + H\sigma_h^2 + I\sigma_e^2. \tag{10}$$
Figure 1. Half–grand-avuncular pair (blackened symbols).
The general pedigree variance-component linkage method, with all of the above described extensions, has been implemented in a computer analysis package called Sequential Oligogenic Linkage Analysis Routines (SOLAR), which employs the computer programs FISHER and SEARCH (Lange et al. 1988) for likelihood optimization in quantitative-trait analysis. For any of the complex genetic models described above, SOLAR can also incorporate covariate effects as well as multivariate quantitative traits (Almasy et al. 1997c); discrete traits, by use of a threshold model (Duggirala et al. 1997); mixed discrete/quantitative-trait analyses; and genotype × environment interaction (Towne et al. 1997).
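Every model above has the same shape: a sum of coefficient matrices, each scaled by its variance component. A minimal sketch of that assembly is shown below, with hypothetical function names and made-up variance values for a single sib pair; it combines the epistatic term of equation (9) with the household term of equation (10) purely for illustration and is not how SOLAR itself is organized.

```python
def hadamard(a, b):
    """Elementwise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def covariance_matrix(components):
    """Sum variance * coefficient_matrix over (matrix, variance) pairs."""
    n = len(components[0][0])
    omega = [[0.0] * n for _ in range(n)]
    for matrix, variance in components:
        for j in range(n):
            for l in range(n):
                omega[j][l] += variance * matrix[j][l]
    return omega


# Illustrative inputs for two sibs; all numbers are invented for the example.
pi_1 = [[1.0, 0.5], [0.5, 1.0]]       # estimated IBD sharing at QTL 1
pi_2 = [[1.0, 1.0], [1.0, 1.0]]       # estimated IBD sharing at QTL 2
two_phi = [[1.0, 0.5], [0.5, 1.0]]    # twice the kinship matrix
household = [[1.0, 1.0], [1.0, 1.0]]  # both sibs share a household
identity = [[1.0, 0.0], [0.0, 1.0]]

omega = covariance_matrix([
    (pi_1, 0.20),
    (pi_2, 0.10),
    (hadamard(pi_1, pi_2), 0.05),  # additive x additive epistasis
    (two_phi, 0.30),               # residual additive genetic variance
    (household, 0.10),
    (identity, 0.25),              # individual-specific environment
])
```

Keeping the coefficient matrices separate from their variances makes it straightforward to add or drop a component when comparing nested models.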
Estimation of the IBD Probability Matrix for a Genetic Marker
In the above formulation, all of the information regarding linkage is a function of the estimated $\hat{\Pi}_i$ matrices. For a given genetic marker, a number of methods have been proposed to calculate this IBD probability matrix (Amos et al. 1990; Curtis and Sham 1994; Whittemore and Halpern 1994). One simple and effective approach is to perform pairwise likelihood-based estimation of the elements of a $\hat{\Pi}_i$ matrix by calculating the posterior probability of genotypes at a completely linked pseudomarker at which there is an extremely rare allele (i.e., an allele frequency less than the expected mutation rate). With this approach, the $\pi$ for each pair of individuals is evaluated by randomly assigning the rare homozygous pseudomarker genotype to one of the individuals and then calculating the likelihoods of seeing the three possible pseudomarker genotypes in the other individual conditional on the marker information in the complete pedigree. From the resulting posterior probabilities, it is simple to calculate the three locus-specific $k$ coefficients for any marker and then to calculate the $\hat{\pi}$ estimate. This method is relatively rapid for simple pedigrees but can become tedious in complex pedigrees, especially ones with multiple inbreeding loops. Any software that can calculate two-point pedigree likelihoods can be used for calculating IBD probabilities.
A second alternative for pedigrees of arbitrary size and complexity is to calculate an estimate of all the elements of the $\Pi_i$ matrix jointly by Monte Carlo techniques. When there is no missing genetic marker information in a pedigree, exact IBD probabilities can rapidly be calculated by use of the algorithm of Davis et al. (1996). Therefore, Monte Carlo methods can be used to impute marker genotypes for individuals not typed in a pedigree conditional on all other marker and pedigree information. Once the marker genotype vector is filled in by such a process, the exact maximum likelihood estimate of $\pi$ can be obtained immediately. The results of many such imputations can be averaged by use of the likelihood of the imputed marker genotype vector as a weighting factor. There are many possible variations of such a Monte Carlo approach, but all methods require substantial computing for large pedigrees.
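Both estimation routes end in simple arithmetic once the hard part (the pedigree likelihoods or imputations) is done. The sketch below uses hypothetical function names and covers only that final step: converting posterior $k$ coefficients to $\hat{\pi}$ via $\pi = k_1/2 + k_2$, and averaging per-imputation estimates with likelihood weights.

```python
def pi_from_k(k0, k1, k2):
    """Proportion of alleles shared IBD from the probabilities of sharing 0, 1, or 2 alleles."""
    if abs(k0 + k1 + k2 - 1.0) > 1e-9:
        raise ValueError("k coefficients must sum to 1")
    return k1 / 2.0 + k2


def weighted_pi(estimates, likelihoods):
    """Likelihood-weighted average of pi estimates from repeated genotype imputations."""
    total = sum(likelihoods)
    if total <= 0.0:
        raise ValueError("likelihood weights must have a positive sum")
    return sum(p * w for p, w in zip(estimates, likelihoods)) / total
```

For example, `pi_from_k(0.25, 0.5, 0.25)` gives the sib-pair expectation of 0.5, and `weighted_pi([0.5, 1.0], [3.0, 1.0])` gives 0.625.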
We use both of these approaches in our computer program, SOLAR, and have noticed few differences between them across a wide range of practical applications including extensive computer simulations. A practical benefit of both approaches is the independence of major aspects of the calculations, which renders the estimation problem infinitely scalable with regard to parallel computation.
Although it is comparatively straightforward to obtain an estimate of the $\Pi$ matrix for any genetic marker, exact calculation of multipoint IBD probabilities given a number of genetic markers is formidable except for relatively small and simple pedigrees. Since it is well known that exploitation of multipoint information can dramatically improve the power to detect QTLs, fast and accurate approximate methods would be of great benefit. In the next section, we outline our approach to obtaining such approximate multipoint IBD probabilities for any chromosomal location.
Derivation of IBD Correlation Formulas for Multipoint Analysis
Given the simplicity and accuracy of the Fulker method (Fulker et al. 1995) for approximating multipoint calculations for sib pairs, we decided to generalize this approach to arbitrary pedigree relationships. Such a general average sharing method requires that we formulate all possible $\rho(\pi_i,\pi_j \mid r,\theta)$ functions (i.e., IBD probability autocorrelation functions), which can be used to provide the expected correlation in IBD between genotyped marker loci and any chromosomal location with a known position relative to these marker loci. The correlation in the proportion of alleles shared IBD by a relative pair over some chromosomal distance can be expressed with a simple formula:
$$\rho(\pi_1,\pi_2 \mid r,\theta) = \frac{\operatorname{Cov}(\pi_1,\pi_2)}{\sigma(\pi_1)\,\sigma(\pi_2)}, \tag{11}$$
where $\operatorname{Cov}(\pi_1,\pi_2)$ is the covariance in IBD allele sharing between locus 1 (the genotyped marker) and locus 2 (the arbitrary chromosomal location at which IBD sharing is being estimated), and $\sigma(\pi_1)$ and $\sigma(\pi_2)$ are the expected standard deviations in IBD allele sharing at the two loci. These standard deviations depend on the degree of relationship between the relative pair under consideration and will be the same for the two loci. Thus, the denominator reduces to the expected variance in the proportion of alleles shared IBD for the type of relative pair, $\operatorname{Var}(\pi)$. This variance in IBD sharing can be calculated by use of the formula $\operatorname{Var}(\pi) = E(\pi^2) - E(\pi)^2$, where $E(\pi) = 2\phi$ is the expected IBD sharing for the relative pair.
The covariance is a simple function involving each possible value of $\pi$ at locus 1 and locus 2, adjusted by $E(\pi)$ and weighted by the probability of observing the two-locus combination of $\pi$, $P_{ij}$:
$$\operatorname{Cov}(\pi_1,\pi_2) = \sum_{ij} P_{ij}\,[\pi_1 - E(\pi)]\,[\pi_2 - E(\pi)]. \tag{12}$$
Table 1
Possible Two-Locus Combinations of $\pi$ for Relative Pairs Able to Share Only One Allele IBD
First locus | Second locus: $\pi_2 = 0$, $j = 0$ | Second locus: $\pi_2 = 1/2$, $j = 1$ | Total $P$
$\pi_1 = 0$, $i = 0$ | $P_{00}$ | $P_{01}$ | $1 - 2E(\pi)$
$\pi_1 = 1/2$, $i = 1$ | $P_{10}$ | $P_{11}$ | $2E(\pi)$
Total $P$ | $1 - 2E(\pi)$ | $2E(\pi)$ |
Table 2
Formulas for $P_{11}$ in Relative Pairs Related by a Single Line of Descent
Type of relative pair | $P_{11}$
Direct descent | $(1/2^{d-1})(1 - \theta)^{d-1}$
Half-avuncular or half-cousin | $(1/2^{d-1})(1 - \theta)^{d-2}[\theta^2 + (1 - \theta)^2]$
Full avuncular | $(1/2^{d})(1 - \theta)^{d-2}(2 - 5\theta + 8\theta^2 - 4\theta^3)$
Full cousin | $(1/2^{d})(1 - \theta)^{d-3}(2 - 8\theta + 15\theta^2 - 12\theta^3 + 4\theta^4)$
NOTE: $d$ represents the degree of relationship.
For unilineal relative pairs, the number of alleles shared IBD ($i$ and $j$) at locus 1 and locus 2 will take the values 0 and 1, with the resulting IBD probabilities $\pi_1$ and $\pi_2$ being $i/2$ and $j/2$, which yields four possible two-locus combinations of $\pi$. For bilineally related pairs able to share two alleles IBD, such as siblings, $i$ and $j$ may also be 2, resulting in nine possible combinations. If inbreeding is present, $i$ and $j$ may equal 4 when both members of a pair are autozygous for the same ancestral allele. This results in 16 potential two-locus combinations of allele sharing.
The process of obtaining the $\rho$ function for any class of unilineal relationship is straightforward and may best be described by example.
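For a unilineal pair, equations (11) and (12) reduce to a sum over four two-locus states. The sketch below uses a hypothetical function name and takes two inputs: the single-locus probability of sharing one allele, $2E(\pi)$, and the joint probability $P_{11}$ of sharing one allele at both loci; the remaining cells follow from the marginal totals.

```python
def ibd_correlation(p_share, p11):
    """Correlation in pi between two loci for a pair able to share only 0 or 1 alleles IBD.

    p_share: single-locus probability of sharing one allele IBD, i.e. 2*E(pi).
    p11:     joint probability of sharing one allele IBD at both loci.
    """
    p01 = p10 = p_share - p11          # from the marginal totals
    p00 = 1.0 - p11 - p01 - p10
    mean = p_share / 2.0               # E(pi), since pi is 0 or 1/2
    joint = {(0.0, 0.0): p00, (0.0, 0.5): p01, (0.5, 0.0): p10, (0.5, 0.5): p11}
    cov = sum(p * (pi1 - mean) * (pi2 - mean)         # equation (12)
              for (pi1, pi2), p in joint.items())
    var = 0.25 * p_share - mean ** 2                  # E(pi^2) - E(pi)^2
    return cov / var                                  # equation (11)
```

As a check, a grandparent-grandchild pair has $2E(\pi) = 1/2$ and, from the direct-descent case with $d = 2$, $P_{11} = \tfrac{1}{2}(1 - \theta)$; the function then returns $1 - 2\theta$.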
An Example: Half–Grand-Avuncular Pairs
A half–grand-avuncular pair (fig. 1) are fourth-degree relatives for whom $\operatorname{Var}(\pi) = 7/256$ and $E(\pi) = 1/16$. They may share 0 or 1 alleles IBD, yielding $\pi$ values of 0 and $\tfrac{1}{2}$. To obtain the covariance for this relative pair, we will need the probabilities of observing the four possible two-locus combinations of $\pi$ (table 1). These probabilities can be determined by calculating the probability of $\pi_2$ equaling 0 or $\tfrac{1}{2}$, given $\pi_1$ and all possible patterns of recombination between the two loci. For example, let $P_{11}$ be the joint probability that 1 allele is shared IBD at the second locus ($\pi_2 = \tfrac{1}{2}$) and 1 allele is shared at the first locus ($\pi_1 = \tfrac{1}{2}$), taking into account all possible patterns of recombination. In figure 1, individuals 4 and 9 represent a half–grand-avuncular pair. The probability that they share 1 allele IBD at the first locus ($\pi_1 = \tfrac{1}{2}$) is $\tfrac{1}{8}$. Any allele shared IBD by 4 and 9 is necessarily also shared by intervening individuals 5 and 7. For 4 and 9 to share an allele at the second locus, transmission from 2, the father of the half-sibs, to his sons 4 and 5 must be either both nonrecombinant with probability $(1 - \theta)^2$, or both recombinant with probability $\theta^2$. In addition, transmissions from 5 to his son 7 and from 7 to 9 must both be nonrecombinant with probability $(1 - \theta)^2$. Thus, $P_{11} = \tfrac{1}{8}[\theta^2 + (1 - \theta)^2](1 - \theta)^2$.
For pairs related by a single line of descent, $P_{11}$ can be calculated simply from one of four formulas provided in table 2. These formulas use the degree of relationship between the members of the pair and differ by whether the pair is related through a direct line of descent (grandparental relationships), a half-sibling pair (half-avuncular and half-cousin relationships), a full sibling pair descending through only one sib (full avuncular relationships), or a full sibling pair descending through both sibs (full-cousin relationships).
Table 3
Correlation Coefficients for IBD Allele Sharing in Various Types of Relative Pairs
Relation | Correlation in Proportion of Alleles Shared IBD | $E(\pi)$ | $\operatorname{Var}(\pi)$
Sibs | $1 - 4\theta + 4\theta^2$ | 1/2 | 1/8
Half-sibs | $1 - 4\theta + 4\theta^2$ | 1/4 | 1/16
Avuncular | $1 - 5\theta + 8\theta^2 - 4\theta^3$ | 1/4 | 1/16
Grandparent | $1 - 2\theta$ | 1/4 | 1/16
First cousin | $1 - \frac{16}{3}\theta + 10\theta^2 - 8\theta^3 + \frac{8}{3}\theta^4$ | 1/8 | 3/64
Half-avuncular | $1 - 4\theta + \frac{16}{3}\theta^2 - \frac{8}{3}\theta^3$ | 1/8 | 3/64
Grand-avuncular | $1 - \frac{14}{3}\theta + \frac{26}{3}\theta^2 - 8\theta^3 + \frac{8}{3}\theta^4$ | 1/8 | 3/64
Great-grandparent | $1 - \frac{8}{3}\theta + \frac{4}{3}\theta^2$ | 1/8 | 3/64
Half–first cousin | $1 - \frac{32}{7}\theta + 8\theta^2 - \frac{48}{7}\theta^3 + \frac{16}{7}\theta^4$ | 1/16 | 7/256
First cousin, once removed | $1 - \frac{40}{7}\theta + \frac{92}{7}\theta^2 - \frac{108}{7}\theta^3 + \frac{64}{7}\theta^4 - \frac{16}{7}\theta^5$ | 1/16 | 7/256
Half–grand-avuncular | $1 - \frac{32}{7}\theta + 8\theta^2 - \frac{48}{7}\theta^3 + \frac{16}{7}\theta^4$ | 1/16 | 7/256
Great-grand-avuncular | $1 - \frac{36}{7}\theta + \frac{80}{7}\theta^2 - \frac{100}{7}\theta^3 + \frac{64}{7}\theta^4 - \frac{16}{7}\theta^5$ | 1/16 | 7/256
Great-great-grandparent | $1 - \frac{24}{7}\theta + \frac{24}{7}\theta^2 - \frac{8}{7}\theta^3$ | 1/16 | 7/256
Second cousin | $1 - \frac{32}{5}\theta + \frac{88}{5}\theta^2 - \frac{80}{3}\theta^3 + \frac{344}{15}\theta^4 - \frac{32}{3}\theta^5 + \frac{32}{15}\theta^6$ | 1/32 | 15/1024
Half-cousin, once removed | $1 - \frac{16}{3}\theta + \frac{176}{15}\theta^2 - \frac{208}{15}\theta^3 + \frac{128}{15}\theta^4 - \frac{32}{15}\theta^5$ | 1/32 | 15/1024
First cousin, twice removed | $1 - \frac{32}{5}\theta + \frac{88}{5}\theta^2 - \frac{80}{3}\theta^3 + \frac{344}{15}\theta^4 - \frac{32}{3}\theta^5 + \frac{32}{15}\theta^6$ | 1/32 | 15/1024
Half–second cousin | $1 - \frac{192}{31}\theta + \frac{512}{31}\theta^2 - \frac{768}{31}\theta^3 + \frac{672}{31}\theta^4 - \frac{320}{31}\theta^5 + \frac{64}{31}\theta^6$ | 1/64 | 31/4096
Second cousin, once removed | $1 - \frac{224}{31}\theta + \frac{720}{31}\theta^2 - \frac{1328}{31}\theta^3 + 48\theta^4 - \frac{1008}{31}\theta^5 + \frac{384}{31}\theta^6 - \frac{64}{31}\theta^7$ | 1/64 | 31/4096
Third cousin | $1 - \frac{512}{63}\theta + \frac{1888}{63}\theta^2 - \frac{4096}{63}\theta^3 + \frac{5632}{63}\theta^4 - \frac{1664}{21}\theta^5 + \frac{928}{21}\theta^6 - \frac{128}{9}\theta^7 + \frac{128}{63}\theta^8$ | 1/128 | 63/16384
The probabilities for the remaining twolocus sharing
states may besimilarly derivedfromthepossiblepatterns
of recombination, or for pairs that can share only 0 or
1 alleles IBD, they can be obtained by subtracting from
the marginal totals for singlelocus sharing of 0 or 1
allele IBD (table 1). For the half–grandavuncular pair
described above,
1
8
1
8
1
8
222
[]
p
? p
?? p
??
v ? (1 ? v) (1 ? v)
011011
and
7
8
3
4
p
?? p
?? p
000111
3
4
1
8
222
[]
??
v ? (1 ? v) (1 ? v) .
When these values are used, the covariance for a
half–grandavuncular pair is
2
1
16
Cov(p ,p ) ? p
1
0 ?
200()
1 1
)(
16 2
)
16
1
16
? 2p
0 ??
01()
2
1
2
1
?p
?
,(13)
11(
and, after standardization and gathering of terms, the
correlation is given by
r(p ,p dhalf ? grand ? avuncular,v)
12
Cov(p ,p )
7/256
12
?
32
7
48
7
16
7
234
? 1 ?
v ? 8v ?
v ?
v .(14)
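The recombination-pattern argument used for the half–grandavuncular pair can be checked numerically. The sketch below is illustrative only: the `kind`/`degree` parametrization and the closed forms for the conditional sharing probability are assumptions worked out from the reasoning above (they are not copied from table 2), and exact rational arithmetic is used so that the result can be compared with equation (14) term by term.

```python
from fractions import Fraction as F

def conditional_share(kind, degree, theta):
    """P(allele shared at locus 2 | allele shared at locus 1), unilineal pair.

    kind   : "lineal", "half_sib", "avuncular" (descent through one sib)
             or "cousin" (descent through both sibs) -- hypothetical labels
    degree : degree of relationship (>= 2; grandparent = 2, first cousin = 3)
    """
    q = 1 - theta
    # Two transmissions from one parent are both recombinant or both not.
    psi = theta ** 2 + q ** 2
    if kind == "lineal":
        return q ** (degree - 1)
    if kind == "half_sib":
        return psi * q ** (degree - 2)
    if kind == "avuncular":
        return (q * psi + theta / 2) * q ** (degree - 2)
    if kind == "cousin":
        return (q ** 2 * psi + theta ** 2 / 2) * q ** (degree - 3)
    raise ValueError(kind)

def rho_pi(kind, degree, theta):
    """Correlation in pi between two loci separated by recombination fraction theta."""
    s = F(1, 2 ** (degree - 1))      # P(one allele shared IBD at a single locus)
    p11 = s * conditional_share(kind, degree, theta)
    mean = s / 2                     # E(pi), since pi is 0 or 1/2
    var = s / 4 - mean ** 2          # Var(pi)
    cov = p11 / 4 - mean ** 2        # E(pi1 * pi2) - E(pi)^2
    return cov / var

theta = F(1, 10)
half_grandavuncular = rho_pi("half_sib", 4, theta)
```

For a half–grandavuncular pair (`kind="half_sib"`, `degree=4`) the function reproduces the polynomial in equation (14) exactly at any rational $\theta$.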
Table 3 shows $E(\pi)$, $\mathrm{Var}(\pi)$, and the correlation between
IBD probabilities for other unilineal classes of relative
pairs. These relationships are the most common ob
Am. J. Hum. Genet. 62:1198–1211, 1998
Table 4
Correlation Coefficients for IBD Allele Sharing in Relative Pairs with Multiple or Compound Relationships
RelationCorrelation in Proportion of Alleles Shared IBD E(p)Var(p)
Double–first cousin
Double–first cousin, once
removed
Double–second cousin (fig.
2a)
Double–second cousin (fig.
2b)
Double–second cousin (fig.
2c)
First cousin and second
cousin
Halfsib and first cousin
Halfsib and halfavuncular
Double–halffirst cousin
Double–halfavuncular
Halfsib and half–first
cousin
16
3
8
3
234
1 ?
v ? 10v ? 8v ? v
1
4
3
32
143
18
731
9
1226
9
5557
36
1058
9
152
9
20
9
23456789
1 ?
v ? 32v ?
v ?
v ?
v ?
v ? 58v ?
v ?
v
1
8
3
64
176
21
785
21
2320
21
674
3
2248
7
4553
14
4828
21
2284
21
656
21
88
21
2345678910
1 ?
v ?
v ?
v ?
v ?
v ?
v ?
v ?
v ?
v ?
v
1
16
7
256
32
5
88
5
80
3
344
15
32
3
32
15
23456
1 ?
v ?
v ?
v ?
v ?
v ?
v
1
16
15
512
46
7
130
7
208
7
104
7
24
7
23456
1 ?
v ?
v ?
v ? 28v ?
v ?
v
1
16
7
256
352
63
32
7
248
21
46
7
32
v ? v
7
2
v ? 8v ?
16
2
v ? v
3
112
9
3
v ? v
472
63
4
160
63
32
63
23456
1 ?
1 ?
1 ? 4v ?
32
1 ?
1 ? 4v ?
v ?
v ?
v ?
v ?
v ?
8
7
v ?
v ?
v
10
64
3
8
3
8
1
8
1
4
63
1024
7
64
7
64
7
128
3
32
24
7
2
8
7
48
7
8
3
23
16
7
34
v ?
v
7
3
96
23
120
23
48
23
16
23
234
1 ?
v ?
v ?
v ?
v
5
16
23
256
Table 5
Probabilities of Two-Locus Combinations of $\pi$ for Bilineal Relative Pairs

| Second locus | First locus: $\pi_1 = 0$, $i = 0$ | First locus: $\pi_1 = \frac{1}{2}$, $i = 1$ | First locus: $\pi_1 = 1$, $i = 2$ |
|---|---|---|---|
| $\pi_2 = 0$, $j = 0$ | $x_{00}y_{00}$ | $x_{10}y_{00} + x_{00}y_{10}$ | $x_{10}y_{10}$ |
| $\pi_2 = \frac{1}{2}$, $j = 1$ | $x_{01}y_{00} + x_{00}y_{01}$ | $x_{11}y_{00} + 2x_{10}y_{01} + x_{00}y_{11}$ | $x_{11}y_{10} + x_{10}y_{11}$ |
| $\pi_2 = 1$, $j = 2$ | $x_{01}y_{01}$ | $x_{11}y_{01} + x_{01}y_{11}$ | $x_{11}y_{11}$ |

NOTE.—x and y represent the two-locus sharing probabilities for the two independent lines of relationship.
served in human extended family studies. Table 4 provides the same information for a variety of relative pairs related by multiple lines of descent. Note that while some of these pairs have the same $E(\pi)$ as pairs in table 3, the variances may be different, since some pairs in table 4 can share both alleles IBD ($\pi = 1$). This leads to nine possible IBD allele sharing states, rather than four, and complicates the calculation of the probability of each sharing state. However, when a pair is related through two independent lines of descent, the elements of the $2 \times 2$ matrices of sharing probabilities for each independent relationship can be multiplied to form a $3 \times 3$ matrix of sharing-state probabilities for the compound relationship (table 5). The first formula shown for double–second cousins (table 4) applies only to pairs related through double–first cousins (fig. 2a). Double–second cousins also occur when two sets of first cousins marry (fig. 2b) or when one person’s parent is cousin to both of the other person’s parents (fig. 2c). Each of these double–second cousin pairs has a different correlation formula, since the possible $\pi$ values are not the same (for the pair in fig. 2b, $\pi$ may equal 1, while in figs. 2a and 2c it cannot) and the possible patterns of recombination also differ. The IBD sharing probability matrix for the double–second cousins in figure 2b can be calculated by multiplying the elements from the basic sharing probability matrix for second cousins as described above. However, the probabilities of the sharing states for the double–second cousins in figures 2a and 2c cannot make use of the formulas above, since the two lines of descent pass through the same individual(s) and are not independent. Thus, the two-locus sharing probabilities for these types of second cousins were derived by examining the possible patterns of recombination as described for the half–grandavuncular pair.
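The element-wise multiplication described for two independent lines of descent can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, and the first-cousin conditional sharing probability $(1-\theta)^2[\theta^2 + (1-\theta)^2] + \theta^2/2$ is an assumption derived from recombination-pattern reasoning of the kind used for the half–grandavuncular pair. Double–first cousins serve as the example because both lines are first-cousin relationships.

```python
from fractions import Fraction as F

def line_matrix(s, c):
    """2x2 two-locus sharing matrix for one line of descent.

    s : P(the line contributes one shared allele at a single locus)
    c : P(shared at locus 2 | shared at locus 1)
    Entry [a][b] = P(share a alleles at locus 1 and b alleles at locus 2).
    """
    p11 = s * c
    p10 = s - p11
    return [[1 - 2 * s + p11, p10], [p10, p11]]

def compound_matrix(x, y):
    """3x3 sharing matrix for two independent lines x and y."""
    m = [[F(0)] * 3 for _ in range(3)]
    for a in range(2):
        for b in range(2):
            for c in range(2):
                for d in range(2):
                    # Shared alleles add across lines at each locus.
                    m[a + c][b + d] += x[a][b] * y[c][d]
    return m

def pi_moments(m):
    """Return E(pi), Var(pi), and the two-locus correlation from a 3x3 matrix."""
    vals = [F(0), F(1, 2), F(1)]
    cells = [(i, j) for i in range(3) for j in range(3)]
    mean = sum(vals[i] * m[i][j] for i, j in cells)
    var = sum(vals[i] ** 2 * m[i][j] for i, j in cells) - mean ** 2
    cov = sum(vals[i] * vals[j] * m[i][j] for i, j in cells) - mean ** 2
    return mean, var, cov / var

def first_cousin_conditional(theta):
    q = 1 - theta
    return q ** 2 * (theta ** 2 + q ** 2) + theta ** 2 / 2

theta = F(1, 10)
fc = line_matrix(F(1, 4), first_cousin_conditional(theta))
double_first_cousin = compound_matrix(fc, fc)
mean, var, rho = pi_moments(double_first_cousin)
```

Because the two lines are independent and identical, the resulting correlation equals the first-cousin correlation, while the mean and variance double relative to a single first-cousin line.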
$\kappa_2$ Correlation Functions for Incorporating Dominance

Extension of the above results to allow for dominance effects via the location-specific $\kappa_2$ probabilities requires that we formulate the possible $\rho(\kappa_{2i},\kappa_{2j} \mid r,\theta)$ functions (i.e., the $\kappa_2$ autocorrelation functions). These can be expressed as

$$\rho(\kappa_{2i},\kappa_{2j} \mid r,\theta) = \frac{\sum_{s=0}^{1}\sum_{t=0}^{1} f_{(2s)(2t)}\,(s - \delta_{7r})(t - \delta_{7r})}{\mathrm{Var}(\delta_{7r})} , \tag{15}$$

where the summations over $s$ and $t$ are performed over the possible values (i.e., 0 and 1) of $\kappa_2$ so that the necessary probabilities are limited to $f_{22}$, $f_{02}$, $f_{20}$, and $f_{00}$. The probabilities designated by $f$ can be obtained from those derived above for the $\pi$ autocorrelations for bilineal relatives. Specifically, $f_{22} = p_{22}$, $f_{20} = f_{02} = \delta_{7r} - f_{22}$, and $f_{00} = 1 - 2\delta_{7r} + f_{22}$.

Figure 2 Three types of double–second cousins (blackened symbols)

Table 6
Correlation Coefficients for $\kappa_2$ as a Function of $\theta$ and Relationship

| Relation | $\rho(\kappa_{2i},\kappa_{2j} \mid r,\theta)$ | $E(\kappa_2)$ | $\mathrm{Var}(\kappa_2)$ |
|---|---|---|---|
| Siblings | $1 - \frac{16}{3}\theta + \frac{32}{3}\theta^2 - \frac{32}{3}\theta^3 + \frac{16}{3}\theta^4$ | 1/4 | 3/16 |
| Double–first cousin | $1 - \frac{128}{15}\theta + \frac{496}{15}\theta^2 - \frac{384}{5}\theta^3 + \frac{1732}{15}\theta^4 - \frac{1696}{15}\theta^5 + \frac{352}{5}\theta^6 - \frac{128}{5}\theta^7 + \frac{64}{15}\theta^8$ | 1/16 | 15/256 |
| Double–second cousin (fig. 2b) | $1 - \frac{1024}{85}\theta + \frac{5888}{85}\theta^2 - \frac{63488}{255}\theta^3 + \frac{157504}{255}\theta^4 - \frac{282368}{255}\theta^5 + \frac{373376}{255}\theta^6 - \frac{365824}{255}\theta^7 + \frac{87744}{85}\theta^8 - \frac{27136}{51}\theta^9 + \frac{15872}{85}\theta^{10} - \frac{2048}{51}\theta^{11} + \frac{1024}{255}\theta^{12}$ | 1/256 | 255/65536 |
| First cousin and second cousin | $1 - \frac{640}{63}\theta + \frac{1024}{21}\theta^2 - \frac{9088}{63}\theta^3 + \frac{18128}{63}\theta^4 - \frac{8416}{21}\theta^5 + \frac{8240}{21}\theta^6 - \frac{16768}{63}\theta^7 + \frac{7552}{63}\theta^8 - \frac{2048}{63}\theta^9 + \frac{256}{63}\theta^{10}$ | 1/64 | 63/4096 |
| Half-sib and first cousin | $1 - \frac{48}{7}\theta + 20\theta^2 - \frac{232}{7}\theta^3 + \frac{232}{7}\theta^4 - \frac{128}{7}\theta^5 + \frac{32}{7}\theta^6$ | 1/8 | 7/64 |
| Half-sib and half-avuncular | $1 - \frac{40}{7}\theta + \frac{96}{7}\theta^2 - \frac{128}{7}\theta^3 + \frac{96}{7}\theta^4 - \frac{32}{7}\theta^5$ | 1/8 | 7/64 |
| Double–half-first cousin | $1 - \frac{512}{63}\theta + \frac{640}{21}\theta^2 - \frac{4352}{63}\theta^3 + \frac{6464}{63}\theta^4 - \frac{6400}{63}\theta^5 + \frac{4096}{63}\theta^6 - \frac{512}{21}\theta^7 + \frac{256}{63}\theta^8$ | 1/64 | 63/4096 |
| Double–half-avuncular | $1 - \frac{32}{5}\theta + \frac{272}{15}\theta^2 - \frac{448}{15}\theta^3 + \frac{448}{15}\theta^4 - \frac{256}{15}\theta^5 + \frac{64}{15}\theta^6$ | 1/16 | 15/256 |
| Half-sib and half–first cousin | $1 - \frac{32}{5}\theta + \frac{272}{15}\theta^2 - \frac{448}{15}\theta^3 + \frac{448}{15}\theta^4 - \frac{256}{15}\theta^5 + \frac{64}{15}\theta^6$ | 1/16 | 15/256 |

Table 6 provides most of the required $\kappa_2$ autocorrelation functions that are encountered in studies of extended human families. In general, for a given relationship class, we find that $\rho(\pi_i,\pi_j \mid r,\theta) > \rho(\kappa_{2i},\kappa_{2j} \mid r,\theta)$. In other words, the correlation between $\kappa_2$ values decays more rapidly with genetic distance than does that for the $\pi$ values. For example, comparing the appropriate correlation functions for sib pairs, we find that $\rho(\pi_i,\pi_j \mid \text{sibling},\theta) - \rho(\kappa_{2i},\kappa_{2j} \mid \text{sibling},\theta) = \frac{1}{3}(4\theta - 20\theta^2 + 32\theta^3 - 16\theta^4)$, which is $> 0$ for all $0 < \theta < \frac{1}{2}$. Therefore, the incorporation of dominance effects into a variance-component model will be most useful when the QTL is comparatively close to a genetic marker.
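The sib-pair comparison above can be reproduced directly from equation (15). The sketch below is illustrative: it assumes that, for siblings, the probability of sharing both alleles IBD at both loci is $\frac{1}{4}[\theta^2 + (1-\theta)^2]^2$ (the paternal and maternal transmissions being independent), and the function names are hypothetical.

```python
from fractions import Fraction as F

def rho_pi_sib(theta):
    """pi autocorrelation for sib pairs."""
    return 1 - 4 * theta + 4 * theta ** 2

def rho_kappa2_sib(theta):
    """kappa_2 autocorrelation for sib pairs, built from equation (15)."""
    delta7 = F(1, 4)                          # P(both alleles shared IBD) for sibs
    psi = theta ** 2 + (1 - theta) ** 2
    f22 = delta7 * psi ** 2                   # assumed two-locus probability
    f20 = f02 = delta7 - f22
    f00 = 1 - 2 * delta7 + f22
    f = {(0, 0): f00, (0, 1): f02, (1, 0): f20, (1, 1): f22}
    num = sum(f[s, t] * (s - delta7) * (t - delta7)
              for s in (0, 1) for t in (0, 1))
    return num / (delta7 * (1 - delta7))      # variance of a 0/1 indicator

theta = F(1, 10)
difference = rho_pi_sib(theta) - rho_kappa2_sib(theta)
```

The difference is positive for recombination fractions strictly between 0 and 1/2 and vanishes at both ends, where the two correlations coincide.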
Estimation of $\Pi$ and $K_2$ Matrices by Use of Multipoint Information

Given the $\pi$ and $\kappa_2$ correlation functions provided in tables 3, 4, and 6, it is possible to estimate the $\Pi$ matrix at any chromosomal location conditional on all of the available genetic marker information and the map locations of the markers. A Haldane mapping function is
Table 7
Phenotyped Relative Pairs Informative for Linkage in the Simulated Pedigrees

| Degree (Coefficient) of Relationship and Relationship Type | No. of Pairs |
|---|---|
| First ($\frac{1}{2}$): | |
| Sibs | 771 |
| Parent-offspring | 801 |
| Second ($\frac{1}{4}$): | |
| Avuncular | 1,485 |
| Grandparent-grandchild | 151 |
| Half-sibs | 26 |
| Third ($\frac{1}{8}$): | |
| Cousins | 2,761 |
| Grandavuncular | 497 |
| Half-avuncular | 64 |
| Fourth ($\frac{1}{16}$): | |
| Cousins once removed | 3,051 |
| Half-cousins | 27 |
| Great-grandavuncular | 19 |
| Half–grandavuncular | 13 |
| Fifth ($\frac{1}{32}$): | |
| Cousins twice removed | 423 |
| Second cousins | 169 |
| Total | 10,258 |
employed to relate genetic distances to $\theta$. To estimate IBD probabilities at any chromosomal location, we have chosen to generalize the regression-based averaging method of Fulker et al. (1995) to arbitrary relationships. Basically, for any pair of individuals of relationship $r$, we find the vector of regression coefficients $(\beta_{r\lambda})$ on the available estimated marker-specific $\hat{\pi}_z$ vector that predict $\pi_\lambda$, where the subscripts now refer to chromosomal locations in centimorgans. This is done by the standard regression method in which

$$\beta_{r\lambda} = V(\hat{\pi}_z)^{-1}\,\mathrm{Cov}(\pi_\lambda,\hat{\pi}_z) , \tag{16}$$

where $\beta_{r\lambda}$ is a vector of $n$ regression coefficients (assuming that we have typed $n$ markers on the chromosome), $V(\hat{\pi}_z)$ is the $n \times n$ covariance matrix of the marker IBD probabilities, and $\mathrm{Cov}(\pi_\lambda,\hat{\pi}_z)$ is a vector of the expected covariances between the marker IBD probabilities and those at the chromosomal location $\lambda$. As shown by Fulker et al. (1995), the elements of $V(\hat{\pi})$ are determined by the genetic distances between the markers, the $\rho(\hat{\pi}_i,\hat{\pi}_j \mid r,\theta)$ functions derived above, and the empirical variances of the $\hat{\pi}_i$. Likewise, the elements of the vector $\mathrm{Cov}(\pi_\lambda,\hat{\pi}_z)$ are given by the product of $\rho(\pi_\lambda,\hat{\pi}_i \mid r,\theta)$ values and the empirical variances of the marker $\hat{\pi}_i$.

Once obtained, the $\beta_{r\lambda}$ vector is used to estimate $\pi_\lambda$ for the $ij$th pair of relatives by

$$\hat{\pi}_{\lambda ij} = 2\phi_r + \beta_{r\lambda}'(\hat{\pi} - \bar{\hat{\pi}}) , \tag{17}$$

where the symbol $\hat{\pi}$ without a subscript indicates the vector of marker IBD probabilities, and $\bar{\hat{\pi}}$ is its empirical mean vector. Subject to constraints on the acceptable parameter space that are $r$ dependent, equation (16) can be used to estimate each pairwise element of the $\hat{\Pi}_\lambda$ matrix, which is then used to structure the expected phenotypic covariances between relatives as shown in equation (6). The similarity of equation (7) and equation (17) is also apparent, since equation (7) is the matrix prediction equation when there is only a single marker.

A similar approach can be employed to obtain multipoint estimates of $\kappa_{2\lambda}$ by substituting the appropriate expectations, $\kappa_2$ autocorrelations, empirical variances, and means in equations (16) and (17).
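The regression in equations (16) and (17) can be sketched for a single relative pair as follows. This is an illustration under stated assumptions, not the SOLAR implementation: all function names are hypothetical, the off-diagonal elements of the marker covariance matrix are taken to be the correlation times the product of the two empirical standard deviations, and the sib-pair correlation $1 - 4\theta + 4\theta^2$ is used as the example $\rho$ function. A small Gaussian elimination routine keeps the example free of external libraries.

```python
import math

def haldane_theta(d_cm):
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def rho_sib(theta):
    """pi autocorrelation for sib pairs."""
    return 1.0 - 4.0 * theta + 4.0 * theta ** 2

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multipoint_pi(loc, marker_pos, pi_hat, pi_mean, pi_var, rho, two_phi):
    """Estimate pi at position loc (cM) from marker-specific pi estimates."""
    n = len(marker_pos)
    sd = [math.sqrt(v) for v in pi_var]
    # Marker covariance matrix V (assumed form: rho * sd_i * sd_j).
    v = [[rho(haldane_theta(abs(marker_pos[i] - marker_pos[j]))) * sd[i] * sd[j]
          for j in range(n)] for i in range(n)]
    # Expected covariances between the target location and each marker.
    cov = [rho(haldane_theta(abs(loc - marker_pos[i]))) * pi_var[i]
           for i in range(n)]
    beta = solve(v, cov)                                   # equation (16)
    return two_phi + sum(b * (p - m)                       # equation (17)
                         for b, p, m in zip(beta, pi_hat, pi_mean))
```

With equal marker variances, an estimate requested exactly at a marker position returns that marker's own deviation from the mean added to $2\phi_r$, and an estimate far from all markers falls back to $2\phi_r$.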
Simulations

To evaluate the utility of this multipoint variance-component method for detecting QTLs, we performed a series of computer simulations to assess its properties and accuracy. In the first set of simulations, six quantitative traits and genotype data were simulated for 200 replicates of a data set containing 1,497 total individuals, 1,000 phenotyped, based approximately on the pedigree structure of the San Antonio Family Heart Study. These are extended pedigrees, including all available first-, second-, and third-degree relatives of a proband and the proband’s spouse as well as the married-in parents of any descendants. Pedigree size ranges from 37 to 128 individuals ($\bar{x} = 65$); thus, multipoint quantitative-trait linkage analysis of these pedigrees would not be possible with any previously published method. The number of relative pairs with both members phenotyped is shown in table 7 for each type of relative pair present in these pedigrees. Although the SOLAR general pedigree variance-component linkage analysis uses IBD allele sharing between these relative pairs, it should be noted that it is not a relative-pair method, as likelihoods are maximized over entire families considered jointly. The number of relative pairs of various types is shown in order to illustrate the depth and complexity of these pedigrees.

A fully informative marker was simulated at a position of 33 cM on a 100-cM chromosome. The alleles of this fully informative marker were grouped together into “high” and “low” bins in various ways to obtain biallelic QTLs whose most common allele had one of three possible generating frequencies, 0.5, 0.7, or 0.9. Two generating values of the additive effect parameter $a = \frac{1}{2}(\mu_{QQ} - \mu_{qq})$ were considered that produced either a 2- or 2.5-SD difference between the contrasting QTL genotypes. For these simulations, dominance effects were not included. Using the six sets of generating parameters, we simulated six quantitative traits in which the relative variance due to the QTL (i.e., the heritability due to the QTL) ranged from .15 (where $p_Q = .9$ and $a = 1$) to .44 (where $p_Q = .5$ and $a = 1.25$). With CHRSIM (Speer et al.
Table 8
Percentage of Simulation Replicates with a Maximum LOD Score ≥3.0 and Mean Maximum LOD Score

Entries in the last seven columns are the percentage of replicates with maximum LOD ≥3.0 (mean maximum LOD).

| $h^2$ due to QTL | Allele frequency | Displacement | Fully informative marker at QTL | 5-cM map, two-point | 5-cM map, multipoint | 10-cM map, two-point | 10-cM map, multipoint | 20-cM map, two-point | 20-cM map, multipoint |
|---|---|---|---|---|---|---|---|---|---|
| .44 | .5 | 2.5 | 99.5 (13.14) | 98.5 (6.45) | 98.9 (7.63) | 96.0 (6.05) | 97.2 (6.86) | 72.0 (4.24) | 84.5 (5.05) |
| .40 | .5 | 2.0 | 97.5 (10.44) | 81.5 (5.11) | 87.5 (5.99) | 78.5 (4.81) | 82.5 (5.32) | 52.0 (3.31) | 63.0 (3.88) |
| .33 | .7 | 2.5 | 95.5 (7.30) | 68.0 (3.89) | 75.0 (4.39) | 57.0 (3.59) | 67.0 (4.01) | 35.0 (2.62) | 47.5 (2.99) |
| .30 | .7 | 2.0 | 86.0 (6.00) | 49.5 (3.27) | 58.5 (3.61) | 40.0 (2.95) | 51.0 (3.31) | 21.5 (2.14) | 35.5 (2.49) |
| .22 | .9 | 2.5 | 54.0 (3.71) | 32.0 (2.45) | 34.5 (2.52) | 24.5 (2.17) | 28.0 (2.26) | 10.5 (1.58) | 16.5 (1.72) |
| .15 | .9 | 2.0 | 27.0 (2.10) | 8.0 (1.60) | 11.5 (1.59) | 6.0 (1.39) | 10.0 (1.45) | 2.5 (1.05) | 3.5 (1.12) |

NOTE.—Simulated QTLs were biallelic and accounted for 15%–44% of the trait variance. The second and third columns provide the frequency of the more common QTL allele and the displacement between homozygote means in standard deviation units, respectively.
1992; Terwilliger et al. 1993), marker loci were simulated every 5 cM, based on allele number and frequency patterns drawn from a commercially available screening set. For each of the six independent traits/generating models, two-point LOD scores were assessed at each of the marker loci and at the fully informative marker underlying the trait. Multipoint analysis was performed with 5-, 10-, and 20-cM maps drawn from the 21 simulated markers, with IBD sharing estimated every 2 cM for every relative pair. Both two-point and multipoint linkage analyses were performed by use of the variance-component linkage methods described above and implemented in SOLAR.
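The two extreme QTL heritabilities quoted for the generating models can be reproduced with a short calculation. The sketch below is an assumption-laden illustration: it treats the QTL as purely additive with variance $2p_Q(1-p_Q)a^2$ and takes the remaining (non-QTL) trait variance to be 1, so that $a$ is expressed in residual standard deviation units; neither convention is stated explicitly in the text, and the function name is hypothetical.

```python
def qtl_heritability(p, a, residual_var=1.0):
    """Fraction of trait variance due to an additive biallelic QTL.

    p            : frequency of one QTL allele
    a            : additive effect (half the difference between homozygote means)
    residual_var : variance not due to the QTL (assumed to be 1)
    """
    q = 1.0 - p
    var_qtl = 2.0 * p * q * a * a
    return var_qtl / (var_qtl + residual_var)

low = qtl_heritability(0.9, 1.0)     # p_Q = .9, a = 1
high = qtl_heritability(0.5, 1.25)   # p_Q = .5, a = 1.25
```

Under these assumptions the two values round to .15 and .44, matching the range reported for the simulated traits.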
Table 8 provides the mean maximum LOD scores and the percentage of LOD scores >3.0 obtained for each generating model. The fourth column of table 8 shows the mean LOD obtained when the fully informative marker directly on the QTL location was used. This value reflects the maximum LOD scores obtainable in these pedigrees under ideal conditions of marker placement and heterozygosity and serves as a gold standard against which to compare the other linkage analyses. For all three densities of marker maps, multipoint variance-component analysis, as compared to the best two-point variance-component result, improved both the mean maximum LOD score and the percentage of maximum LOD scores >3.0. For example, with a 5-cM map, the mean LOD for multipoint analysis was, on average across all generating values, 0.5 LOD units higher than the best two-point LOD. In addition, the percentages of maximum LOD scores >3.0, which have standard errors ranging from 0.1 to 1.8, are improved under all six generating models. Table 8 also shows that a substantial amount of linkage information is unavailable even at the 5-cM density, which can be seen by the difference in the mean LOD scores when the fully informative marker at the QTL is compared to the mean multipoint LOD (13.14 vs. 7.63). Because we have arbitrarily placed the QTL at the midpoint of the 5-cM interval, simply adding another marker within the interval would substantially improve the LOD.
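The 0.5-LOD average gain quoted for the 5-cM map can be checked against the mean maximum LOD scores in table 8. The short calculation below uses only the tabulated values; the variable names are illustrative.

```python
# Mean maximum LOD scores for the 5-cM map, one entry per generating model,
# in the row order of table 8.
two_point_5cm = [6.45, 5.11, 3.89, 3.27, 2.45, 1.60]
multipoint_5cm = [7.63, 5.99, 4.39, 3.61, 2.52, 1.59]
fully_informative = [13.14, 10.44, 7.30, 6.00, 3.71, 2.10]

# Per-model gain of multipoint over the best two-point result.
gains = [m - t for m, t in zip(multipoint_5cm, two_point_5cm)]
mean_gain = sum(gains) / len(gains)

# Linkage information still unavailable at 5-cM density, first model.
shortfall = fully_informative[0] - multipoint_5cm[0]
```

The average gain is close to 0.5 LOD units, while the shortfall relative to the fully informative marker for the first model is about 5.5 LOD units.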
The increase in power with both multipoint variance-component analysis and denser marker maps as well as the accuracy of multipoint localization of the QTL are illustrated in figure 3, which compares the LOD profiles, averaged over the 200 simulations, for one of the simulated traits. Even with a sparse map with an intermarker distance of 20 cM, multipoint analysis provided a noticeable improvement in LOD score over the two-point analyses, as well as an unbiased estimate of QTL location.

For all of the generating models, the multipoint analysis produced excellent estimates of QTL location. For example, for the model in which the QTL heritability was 0.44, the estimated locations were 33.21 ± 0.33, 33.03 ± 0.53, and 34.31 ± 0.59 for the 5-, 10-, and 20-cM scans, respectively. Similarly, for the model in which the QTL heritability was 0.30, the estimated locations were 32.67 ± 0.58, 32.15 ± 0.65, and 34.82 ± 0.98. In all cases, the estimated chromosomal location was not significantly different from the generating value. Additional evidence that our multipoint variance-component procedure yields unbiased estimates of QTL location is provided elsewhere (Almasy et al. 1997c; Duggirala et al. 1997; Towne et al. 1997; Williams et al. 1997; J. T. Williams and J. Blangero, unpublished data).

The six sets of generating parameters used in these simulations are effectively single major gene models in which there are two QTL alleles acting in a simple codominant manner. This straightforward model does not take advantage of the strengths of the variance-component linkage method. The existence of a single major gene inherently violates the assumption of multivariate normality on which the variance-component linkage method is based. However, it has been demonstrated that the method is robust to violations of this assumption (Beaty et al. 1985). In addition, the use of a biallelic
Figure 3 Two-point and multipoint LOD score profiles for 5-, 10-, and 20-cM marker maps averaged over 200 simulations for a QTL at 33 cM and with an additive genetic heritability of .33.
QTL is somewhat limiting, since the variance-component methodology is capable of exploiting the greater information content in a multiallelic QTL system.

In order to test the accuracy of our estimates of genetic effect size, we performed a second set of simulations in which, given a QTL allele frequency of $p_Q = .5$, we chose $a$ to produce a series of generating models in which the additive genetic heritability due to the QTL $(h_q^2)$ varied from 0.05 to 0.50 in increments of 0.05 units. In this simulation, we also allowed for a residual genetic heritability of 0.20. For each generating model, 100 replicates were assessed and quantitative-trait linkage analysis was performed on each. Figure 4 shows a plot of the expected $h_q^2$ and the mean of the maximum-likelihood estimates of $h_q^2$ at the expected QTL location. Figure 4 clearly shows that the variance-component procedure yields outstanding estimates of genetic effect size. These simulations were also performed with a QTL allele frequency of $p_Q = .9$ with similar results (not shown).
Discussion

This powerful variance-components method makes it possible to perform multipoint linkage analysis with quantitative-trait data in pedigrees of arbitrary size and complexity. Such an analysis would previously have required either fragmentation of any large pedigrees into smaller subsets, resulting in a reduction in power to detect linkage, or the application of one of the new computer-intensive Monte Carlo–based parametric linkage methods (e.g., the method of Heath 1997). The multipoint IBD estimation method presented in this article has already been utilized successfully in variance-component linkage analyses of simulated data from Genetic Analysis Workshop 10 (Almasy et al. 1997c) as well as such quantitative traits as serum leptin (Comuzzie et al. 1997) and HDL-cholesterol levels (Almasy et al. 1997b) in the extended pedigrees of the San Antonio Family Heart Study and event-related brain potentials in the Collaborative Study on the Genetics of Alcoholism (Almasy et al. 1997a; Porjesz et al. 1997; Begleiter et al., in press).

The IBD estimation procedure is quite efficient and compares favorably to other multipoint methods suitable for use in pedigrees. Computation increases exponentially with the number of markers in the Elston-Stewart algorithm (1971) and exponentially with the number of nonfounders in a pedigree in the Lander-Green hidden Markov model (Lander and Green 1987; Kruglyak et al. 1996). In contrast, because the suggested multipoint algorithms are linear functions of previously computed IBDs, processing time increases only linearly for additional individuals or additional loci. For an input file containing IBD information on 16 genotyped marker loci for 20,854 relative pairs, SOLAR, running on a Sun workstation, required only 1 min 10 s to estimate the
Figure 4 Plot of expected vs. estimated additive genetic heritability due to the QTL. Bars indicate ±1 standard error.
IBD matrix at an arbitrary chromosomal location. Such computational speed makes it feasible to estimate multipoint IBD matrices every 1 cM along an entire chromosome, even for very large data sets with many genotyped markers. SOLAR was recently used to analyze complex pedigree data from Genetic Analysis Workshop 10 (Almasy et al. 1997c), with >1,000 genotyped individuals and as many as 50 markers on a chromosome. Similarly, we are employing this method on a large complex baboon pedigree (with a pedigree size of 750 animals) and an extremely large pedigree of individuals from an isolated human population (with a pedigree size of 1,200 individuals). An additional benefit of this approach is that, once a marker data set is deemed final, IBD calculations need be performed only once and the resulting matrices stored for all future analyses. This feature is particularly useful in large studies of complex disease where many different phenotypes have been measured and each needs to be processed through genome-wide linkage analysis.

IBD correlation formulas have previously been derived by a number of authors for limited classes of relative pairs. Amos (1988) derived the IBD correlations for half-sibling, grandparent-grandchild, avuncular, and first-cousin pairs by methods similar to those described above. Feingold (1993) and colleagues (Feingold et al. 1993) employed a different strategy, using a Markov approximation to derive these same four formulas for use in affected relative pair–based linkage analysis using IBD status. These authors and Lander and Kruglyak (1995) were primarily interested in the $\pi$ autocorrelations in order to assess the importance of correlated test statistics in genome scanning. In this regard, it is useful to point out that the Lander and Kruglyak crossover rate parameter, which is central to their method for evaluating genome-wide significance levels, is given by $-\frac{1}{2}\lim_{\theta \to 0}\left[d\rho(\pi_i,\pi_j \mid r,\theta)/d\theta\right]$. Thus, the crossover rate parameter is an approximate measure of how rapidly the $\pi$ autocorrelations decay with genetic distance and can be obtained for any pairwise relationship simply as half the absolute value of the coefficient associated with $\theta^1$ in the appropriate $\rho$ function. For example, from table 3, we can immediately determine that the crossover rate parameter for third cousins is $\frac{1}{2}(512/63) = 256/63$. Our results in tables 3 and 4 can be used to extend observations on the behavior of correlated test statistics for linkage methods based on extended pairwise relationships.

The present study extends the IBD correlation formulas to many other classes of relative pairs and provides a simple framework for deriving similar formulas for any relative pair related by a single line of descent or by multiple independent lines of descent. Simulations suggest that multipoint variance-component linkage analyses with IBDs calculated based on these correlations recover an unbiased estimate of the location of a gene and provide increased power to detect linkage even with intermarker distances as wide as 20 cM. These multipoint IBD estimates remove an impediment to making full use of the recent expansions of variance-component linkage methodology, improving the power to examine a wide variety of complex genetic models for both quantitative and discrete traits in general pedigrees.
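The crossover rate relation noted above, half the absolute value of the first-order coefficient of the $\pi$ autocorrelation, is easy to apply to any polynomial correlation function. The sketch below is illustrative only: the function names are hypothetical, polynomials are represented as coefficient lists in increasing powers of $\theta$, and only the first-order coefficients quoted in the text (third cousins, the half–grandavuncular pair of equation 14) and the well-known sib-pair function are used.

```python
from fractions import Fraction as F

def drho_dtheta(coeffs, theta):
    """Derivative of a polynomial rho(theta) = sum(coeffs[k] * theta**k)."""
    return sum(k * c * theta ** (k - 1) for k, c in enumerate(coeffs) if k > 0)

def crossover_rate(coeffs):
    """Minus one-half of the slope of rho at theta = 0."""
    return -F(1, 2) * drho_dtheta(coeffs, F(0))

# Coefficients in increasing powers of theta.
sibs = [F(1), F(-4), F(4)]
half_grandavuncular = [F(1), F(-32, 7), F(8), F(-48, 7), F(16, 7)]
# Only the first two terms matter for the slope at zero; higher terms omitted.
third_cousin_leading = [F(1), F(-512, 63)]

rates = {
    "sibs": crossover_rate(sibs),
    "half-grandavuncular": crossover_rate(half_grandavuncular),
    "third cousin": crossover_rate(third_cousin_leading),
}
```

For third cousins this gives 256/63, the value worked out in the text; more distant relationships have steeper initial decay and therefore larger crossover rates.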
Applications of quantitative-trait linkage analysis are increasing rapidly. Because of the superior information content of quantitative traits, genetic analysis of quantitative risk factors serves as a powerful tool for elucidating the genetic mechanisms influencing common diseases. Numerous strategies and sampling designs are being formulated, and each has its own strengths and weaknesses. It is well known that, in many situations, extended pedigrees will dramatically outperform smaller family units such as sib pairs, sibships, or nuclear families with regard to the power to detect and accurately localize QTLs (Wijsman and Amos 1997). Unfortunately, although quantitative data have often been collected in more extended kindreds, the lack of adequate linkage tools has generally led to such rich data sets being leached of their potential linkage information by truncation to smaller familial units. Recently, direct comparisons of pedigree-based and nuclear family–based samples consisting of the same number of phenotyped individuals in the same distribution of sibship sizes have underscored the loss of power resulting from fragmentation of a large pedigree-based sample (Duggirala et al. 1997; Towne et al. 1997; Williams et al. 1997). However, with the advent of the multipoint variance-component linkage method, the superior power of extended pedigrees can now be routinely and fully exploited for the localization of QTLs.

The SOLAR software, which incorporates the pedigree-based variance-component and multipoint IBD methods described here, is freely available to interested investigators in a compiled version. SOLAR can be obtained through the Southwest Foundation for Biomedical Research (http://www.sfbr.org).
Acknowledgments

This research was supported by NIH grants GM18897, HL45522, GM31575, and DK44297. Pedigree drawings were produced with the program Pedigree/Draw (Mamelka et al. 1988). The authors gratefully acknowledge the expert assistance of T. Dyer in simulation of data analyzed in this article.
References

Almasy L, Blangero J, Porjesz B, Begleiter H, and COGA Collaborators (1997a) Genetic analysis of the N100 event-related brain potential. Am J Med Genet 74:595–596
Almasy L, Blangero J, Rainwater DL, VandeBerg JL, Mahaney MC, Stern MP, MacCluer JW, et al (1997b) Two major genes influence levels of unesterified cholesterol in an HDL subfraction of HDL2a. Atherosclerosis 134:76
Almasy L, Dyer TD, Blangero J (1997c) Bivariate quantitative trait linkage analysis: pleiotropy versus coincident linkages. Genet Epidemiol 14:953–958
Amos CI (1988) Robust methods for the detection of genetic linkage for data from extended families and pedigrees. PhD thesis, Louisiana State University, New Orleans
——— (1994) Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 54:535–543
Amos CI, Dawson DV, Elston RC (1990) The probabilistic determination of identity-by-descent sharing for pairs of relatives from pedigrees. Am J Hum Genet 47:842–853
Amos CI, Zhu DK, Boerwinkle E (1996) Assessing genetic linkage and association with robust components of variance approaches. Ann Hum Genet 60:143–160
Beaty TH, Self SG, Liang KY, Connolly MA, Chase GA, Kwiterovich PO (1985) Use of robust variance components models to analyse triglyceride data in families. Ann Hum Genet 49:315–328
Begleiter H, Porjesz B, Reich T, Edenberg H, Goate A, Blangero J, Almasy L, et al. Quantitative trait linkage analysis of human event-related brain potentials: P3 voltage. Electroenceph Clin Neurophysiol (in press)
Blangero J (1993) Statistical genetic approaches to human adaptability. Hum Biol 65:941–966
Blangero J, Almasy L (1997) Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol 14:959–964
Comuzzie AG, Hixson JE, Almasy L, Mitchell BD, Mahaney MC, Dyer TD, Stern MP, et al (1997) A major quantitative trait locus determining serum leptin levels and fat mass is located on human chromosome 2. Nat Genet 15:273–275
Cotterman CW (1940) A Calculus for Statistico-Genetics. PhD thesis, Ohio State University, Columbus
Curtis D, Sham PC (1994) Using risk calculation to implement an extended relative pair analysis. Ann Hum Genet 58:151–162
Davis S, Schroeder M, Goldin LR, Weeks DE (1996) Nonparametric simulation-based statistics for detecting linkage in general pedigrees. Am J Hum Genet 58:867–880
Duggirala R, Williams JT, Williams-Blangero S, Blangero J (1997) A variance component approach to dichotomous trait linkage analysis using a threshold model. Genet Epidemiol 14:987–992
Elston RC, Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523–542
Feingold E (1993) Markov processes for modeling and analyzing a new genetic mapping method. J Appl Prob 30:766–779
Feingold E, Brown PO, Siegmund D (1993) Gaussian models for genetic linkage analysis using complete high-resolution maps of identity by descent. Am J Hum Genet 53:234–251
Fulker DW, Cherny SS (1996) An improved multipoint sib-pair analysis of quantitative traits. Behav Genet 26:527–532
Fulker DW, Cherny SS, Cardon LR (1995) Multipoint interval mapping of quantitative trait loci, using sib pairs. Am J Hum Genet 56:1224–1233
Goldgar DE (1990) Multipoint analysis of human quantitative genetic variation. Am J Hum Genet 47:957–967
Haseman JK, Elston RC (1972) The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2:3–19
Heath SC (1997) Markov chain Monte Carlo segregation and
linkage analysis for oligogenic models. Am J Hum Genet
61:748–760
Heath SC, Snow GL, Thompson EA, Tseng C, Wijsman EM
(1997) MCMC segregation and linkage analysis. Genet Ep
idemiol 14:1011–1016
Hopper JL, Mathews JD (1982) Extensions to multivariate
normal models for pedigree analysis. Ann Hum Genet 46:
373–383
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996)
Parametric and nonparametric linkage analysis: a unified
multipoint approach. Am J Hum Genet 58:1347–1363
Kruglyak L, Lander ES (1995) Complete multipoint sib-pair
analysis of qualitative and quantitative traits. Am J Hum
Genet 57:439–454
Lander ES, Green P (1987) Construction of multilocus genetic
maps in humans. Proc Natl Acad Sci USA 84:2363–2367
Lander ES, Kruglyak L (1995) Genetic dissection of complex
traits: guidelines for interpreting and reporting linkage
results. Nat Genet 11:241–247
Lange K, Weeks D, Boehnke M (1988) Programs for pedigree
analysis: MENDEL, FISHER, and dGENE. Genet Epidemiol
5:471–472
Lange K, Westlake J, Spence MA (1976) Extensions to pedigree
analysis. III. Variance components by the scoring method.
Ann Hum Genet 39:485–491
Mamelka PM, Dyke B, MacCluer JW (1988) Pedigree/Draw
for the Apple Macintosh. Southwest Foundation for
Biomedical Research, San Antonio
Mitchell BD, Ghosh S, Schneider JL, Birznieks G, Blangero J
(1997) Power of variance component linkage analysis to
detect epistasis. Genet Epidemiol 14:1017–1022
Porjesz B, Begleiter H, Blangero J, Almasy L, Reich T,
COGA Collaborators (1997) QTL analysis of visual P3
component of the event related brain potential in humans. Am
J Med Genet 74:573
Schork NJ (1993) Extended multipoint identity-by-descent
analysis of human quantitative traits: efficiency, power, and
modeling considerations. Am J Hum Genet 53:1306–1319
Self SG, Liang KY (1987) Asymptotic properties of maximum
likelihood estimators and likelihood ratio tests under
nonstandard conditions. J Am Stat Assoc 82:605–610
Sobel E, Lange K (1996) Descent graphs in pedigree analysis:
applications to haplotyping, location scores, and marker
sharing statistics. Am J Hum Genet 58:1323–1337
Speer M, Terwilliger JD, Ott J (1992) A chromosome-based
method for rapid computer simulation. Am J Hum Genet
Suppl 51:A202
Stern M, Duggirala R, Mitchell B, Reinhart JL, Shivakumar
S, Shipman PA, Uresandi OC, et al (1996) Evidence for
linkage of regions on chromosomes 6 and 11 to plasma
glucose concentrations in Mexican Americans. Genome Res
6:724–734
Terwilliger JD, Speer M, Ott J (1993) Chromosome-based
method for rapid computer simulation in human genetic
linkage analysis. Genet Epidemiol 10:217–224
Todorov AA, Siegmund KD, Gu C, Borecki IB, Elston RC
(1997) Probabilities of identity-by-descent patterns in
sibships when the parents are not genotyped. Genet Epidemiol
14:909–913
Towne B, Siervogel RM, Blangero J (1997) Effects of
genotype-by-sex interaction on quantitative trait linkage analysis.
Genet Epidemiol 14:1053–1058
Whittemore AS, Halpern J (1994) Probability of gene identity
by descent: computation and applications. Biometrics 50:
109–117
Wijsman EM, Amos CI (1997) Genetic analysis of simulated
oligogenic traits in nuclear families and extended pedigrees:
summary of GAW10 contributions. Genet Epidemiol 14:
719–735
Williams JT, Duggirala R, Blangero J (1997) Statistical
properties of a variance-components method for quantitative
trait linkage analysis in nuclear families and extended
pedigrees. Genet Epidemiol 14:1065–1070