Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.106.070011
Fractioned DNA Pooling: A New Cost-Effective Strategy for Fine Mapping of Quantitative Trait Loci
A. Korol,*,1 Z. Frenkel,* L. Cohen,* E. Lipkin† and M. Soller†
*Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel and †Department of Genetics, Hebrew University of Jerusalem, Jerusalem 91904, Israel
Manuscript received December 20, 2006
Accepted for publication June 11, 2007
ABSTRACT
Selective DNA pooling (SDP) is a cost-effective means for an initial scan for linkage between marker and quantitative trait loci (QTL) in suitable populations. The method is based on scoring marker allele frequencies in DNA pools from the tails of the population trait distribution. Various analytical approaches have been proposed for QTL detection using data on multiple families with SDP analysis. This article presents a new experimental procedure, fractioned-pool design (FPD), aimed to increase the reliability of SDP mapping results, by ‘‘fractioning’’ the tails of the population distribution into independent subpools. FPD is a conceptual and structural modification of SDP that allows for the first time the use of permutation tests for QTL detection rather than relying on presumed asymptotic distributions of the test statistics. For situations of family and cross mapping design we propose a spectrum of new tools for QTL mapping in FPD that were previously possible only with individual genotyping. These include: joint analysis of multiple families and multiple markers across a chromosome, even when the marker loci are only partly shared among families; detection of families segregating (heterozygous) for the QTL; estimation of confidence intervals for the QTL position; and analysis of multiple-linked QTL. These new advantages are of special importance for pooling analysis with SNP chips. Combining SNP microarray analysis with DNA pooling can dramatically reduce the cost of screening large numbers of SNPs on large samples, making chip technology readily applicable for genome-wide association mapping in humans and farm animals. This extension, however, will require additional, nontrivial, development of FPD analytical tools.
Achieving reasonable statistical power of designs for detecting marker–quantitative trait loci (QTL) linkage for QTL of small effect is difficult and requires large mapping populations, with consequent high cost of marker genotyping. Similar situations also arise in association studies based on linkage disequilibrium (LD). A cost-effective solution to reduce costs associated with genotyping large mapping populations is to replace individual genotyping by DNA analysis in pools of individuals coming from the high and the low tails of the mapping population distribution. This concept, referred to as ‘‘tail analysis’’ (Hillel et al. 1990; Dunnington et al. 1992; Plotsky et al. 1993), ‘‘bulked segregant analysis’’ (Giovannoni et al. 1991; Michelmore et al. 1991), or ‘‘selective DNA pooling (SDP)’’ (Darvasi and Soller 1994), was proposed for QTL analysis and for testing of linkage between markers and a major gene.

Darvasi and Soller (1994) provided a detailed quantitative analysis of this procedure, based on comparing marker allele frequency (which can be obtained by densitometry) in the pooled DNA samples; a number of authors have proposed useful corrections to obtain reliable estimates of SNP allele frequencies in pools (Visscher and Le Hellard 2003; Zou and Zhao 2004, 2005; Craig et al. 2005). The SDP procedure can readily be extended to situations, such as half-sib or full-sib designs, where the mapping population consists of several families. It was applied for genome scanning for QTL affecting milk production traits using microsatellite markers (Lipkin et al. 1998; Mosig et al. 2001).
Various approaches have been proposed for obtaining QTL position and its confidence interval with SDP (Dekkers 2000; Carleos et al. 2003; Brohede et al. 2005; Johnson 2005). Among the problems with such analyses are varying proportion of family founders heterozygous at both the QTL and the markers; heterogeneity of the families with respect to QTL effects; different information content of different marker loci; allele sharing between the founder sires and dams of the families; varying proportion of shared marker loci among families, laboratories, and populations; effects of population admixture; variation of PCR efficiency for marker alleles; and the use of asymptotic, difficult-to-justify approximations of test-statistic distributions. Wang et al. (2007) provide least-squares and maximum-likelihood generalizations of Dekkers (2000) and address a number of the shortcomings of existing methodology. Recently, DNA-pooling analyses using SNP markers have also been employed in some human mapping studies based on population-wide association tests or involving comparison of pools of healthy and affected individuals (Sham et al. 2002; Butcher et al. 2004; Schnack et al. 2004; Brohede et al. 2005; Tamiya et al. 2005). These SNP-based association tests are also subject to many of the statistical limitations listed above. When analyses are based on individual selective genotyping, analytical solutions are available for most of these problems (Lander and Botstein 1989; Darvasi and Soller 1992; Ronin et al. 1998). This is not the case when the analyses are based on SDP. Thus, despite many publications supporting pooling analysis, concerns remain about the reliability of the marker–QTL associations obtained in this way.

1Corresponding author: Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel. E-mail: korol@research.haifa.ac.il

Genetics 176: 2611–2623 (August 2007)
A ‘‘fractioned-pool’’ approach, in which the tails of the population distribution are randomly allocated among a number of independent subpools, has been considered by a few authors. Its objectives were to obtain an empirical standard error for estimates of marker allele frequencies in pools (e.g., Sham et al. 2002) or to optimize pool number and pool size from the viewpoint of amplification fidelity (Brohede et al. 2005). In the present article, the fractioned-pool concept is extended to provide a complete analytical system for QTL linkage mapping analysis by selective DNA pooling, termed fractioned-pool design (FPD) (Figure 1). The FPD removes many of the statistical limitations listed above. The FPD analysis is not limited by an assumption of normal distribution of the trait. However, the tails of the trait distribution (corresponding to high and low trait values) must contain a sufficient number of individuals to achieve a reasonably high detection power.

For the first time in selective DNA pooling, the FPD allows QTL detection based on permutation tests rather than on assumed asymptotic distributions of test statistics. It also allows estimation of confidence intervals for QTL position and effect based on jackknife or bootstrap resampling techniques. In addition, the test statistic can be estimated more accurately than with a single pool per tail. The proposed method is illustrated using Monte Carlo simulations. Successful validation of the FPD for genome-wide studies of quantitative variation opens a new perspective for highly reliable and cost-efficient large-scale QTL analysis, unattainable by standard SDP analytical procedures.
STANDARD SELECTIVE DNA POOLING APPROACH TO QTL MAPPING
The experimental material for QTL mapping based on SDP consists of individuals selected from the tails of the mapping population trait distributions. The procedures considered here are suitable for mapping populations composed of full- or half-sib families or multiple families within F2 or BC populations. The simulated examples employed to illustrate the proposed methodology correspond to multiple half-sib daughter families (e.g., a population based on artificial insemination as found in dairy cattle). Each family consists of the progeny of a different sire and is represented by some given number of daughters per tail selected out of all phenotyped daughters of that family.
Assume that a sire is heterozygous at a QTL affecting trait value. Designate the sire QTL allele that increases trait value as the positive sire QTL allele, and the sire QTL allele that decreases trait value as the negative sire QTL allele. Then the frequency of the positive sire QTL allele will be higher in the group of daughters having high trait value and lower in the group of daughters having low trait value; the opposite will be true for the negative sire QTL allele. Through hitchhiking effects, this difference in the frequency of the positive and negative sire QTL alleles in groups with high and low trait values produces a parallel difference in the frequency of sire marker alleles at marker loci that are heterozygous in the sires and in coupling linkage to these heterozygous QTL. Analyzing sire marker-allele frequency differences at several marker loci enables the position of the QTL on the chromosome to be estimated.
Figure 1. Constructing multiple subpools. Trait distribution in each family is divided into three parts: individuals with high or low trait values that make up the high and low tails and individuals with intermediate trait values. At each tail, individuals are grouped randomly into subpools. $N_T$ characterizes the number of individuals with corresponding trait values in a family. L1, L2, L3, and L4 are low-tail subpools; H1, H2, H3, and H4 are high-tail subpools.

It is convenient to denote the two pools as high (H) and low (L), respectively, and the two sire alleles at the linked marker locus m (m = 1, ..., M) as alleles $A_m$ and $B_m$, respectively. Using this notation, we define the statistic $D_m$ as a characteristic of sire allele divergence in the two tails,

$$D_m = \frac{\left(F_{HA_m} - F_{LA_m}\right) - \left(F_{HB_m} - F_{LB_m}\right)}{2} \qquad (1)$$

(Lipkin et al. 1998), where $F_{HA_m}$ is the frequency of allele $A_m$ in the high pool, and $F_{HB_m}$, $F_{LA_m}$, and $F_{LB_m}$ are defined accordingly. When there are only two alleles at the marker locus, as in the case of SNP markers, $F_{A_m}$ and $F_{B_m}$ are in perfect negative correlation, and hence only one of the alleles need be included in estimating $D_m$. However, when there are multiple alleles at the marker locus, as in the case of microsatellite markers, $F_{HA_m}$ and $F_{HB_m}$ are not perfectly correlated, and hence both contain independent information on $D_m$. In this case, the accuracy of estimation of D is improved by averaging estimates from both alleles as shown in (1). The estimate from allele $B_m$ is given a minus sign in (1) because changes due to a linked QTL in allele $B_m$ are in the opposite direction to those in allele $A_m$, as noted above (see Lipkin et al. 1998 for details).
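The arithmetic of Equation 1 can be sketched in a few lines of Python. The function name and the pool frequencies below are hypothetical and chosen only for illustration. The sketch assumes that the four frequencies have already been estimated from the pools (for example, with the corrections cited earlier).

```python
def d_statistic(f_high_a: float, f_low_a: float,
                f_high_b: float, f_low_b: float) -> float:
    """Sire allele divergence D_m between high (H) and low (L) pools, Equation 1.

    f_high_a, f_low_a: frequencies of sire allele A_m in the H and L pools.
    f_high_b, f_low_b: frequencies of sire allele B_m in the H and L pools.
    The B_m difference enters with a minus sign because a linked QTL shifts
    B_m in the direction opposite to A_m.
    """
    return ((f_high_a - f_low_a) - (f_high_b - f_low_b)) / 2.0


# Biallelic (SNP) marker: F_B = 1 - F_A, so D_m reduces to F_HA - F_LA.
print(round(d_statistic(0.60, 0.40, 0.40, 0.60), 4))  # 0.2
# Multiallelic (microsatellite) marker: both differences carry information.
print(round(d_statistic(0.35, 0.20, 0.15, 0.25), 4))  # 0.125
```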
To illustrate how the QTL substitution effect influences the expected value of D-statistics, consider a single-QTL case for the half-sib design. Let QTL q be diallelic with sire QTL genotype $A^{(q)}B^{(q)}$ and equal frequencies of alleles $A^{(q)}$ and $B^{(q)}$ in the dam population. In this situation, the proportions of QTL genotypes in the progeny are 25% $A^{(q)}A^{(q)}$, 50% $A^{(q)}B^{(q)}$, and 25% $B^{(q)}B^{(q)}$. Let the targeted quantitative trait be normally distributed with residual variance $\sigma^2$ and mean value dependent on QTL genotype: $\mu - d$ for $B^{(q)}B^{(q)}$, $\mu$ for $A^{(q)}B^{(q)}$, and $\mu + d$ for $A^{(q)}A^{(q)}$. For 10% cutoff tails of the trait distribution and allele substitution effect of QTL $d/\sigma$ = 0.3, 0.2, and 0.15, the expected value of $D^{(q)}$ (defined analogously to $D_m$) will be 0.26, 0.17, and 0.14, respectively. Assume further the following: marker locus m is triallelic with alleles $A_m$, $B_m$, and $C_m$; the sire's haplotypes are $A_mA^{(q)}$ and $B_mB^{(q)}$; allele frequencies in the dam population are 0.25 for $A_m$, 0.25 for $B_m$, and 0.5 for $C_m$; and marker and QTL alleles in the dam population are in linkage equilibrium. Then, if marker m is coincident with QTL q [i.e., marker allele $A_m$ is inherited from the sire only with $A^{(q)}$ and $B_m$ only with $B^{(q)}$], the expectation of $D_m$ should be half of $D^{(q)}$ (i.e., 0.13, 0.085, and 0.07 for $d/\sigma$ = 0.3, 0.2, and 0.15, respectively).
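The expected values of $D^{(q)}$ quoted above can be approximated numerically. The following sketch is our own illustration and not the authors' computation. It rests on three assumptions:

- σ = 1, so the effect is expressed as d/σ.
- The tails are defined by the 10% cutoffs of the whole half-sib progeny distribution.
- The pools are large enough that sampling error can be ignored.

Under these idealized assumptions, it returns approximately 0.26, 0.17, and 0.13 for d/σ = 0.3, 0.2, and 0.15.

```python
import math


def norm_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_d_qtl(effect: float, tail: float = 0.10) -> float:
    """Expected D^(q) at a diallelic QTL in a half-sib family (sketch).

    Assumes sire A(q)B(q), dam allele frequencies 0.5/0.5, residual sd = 1
    (so `effect` is d/sigma), tails cut at the `tail` quantiles of the whole
    progeny distribution, and infinitely large pools.
    """
    # (genotype probability, genotype mean, frequency of A(q) in the genotype)
    genotypes = [(0.25, effect, 1.0), (0.50, 0.0, 0.5), (0.25, -effect, 0.0)]

    def mass_above(t: float) -> float:
        return sum(p * norm_sf(t - mu) for p, mu, _ in genotypes)

    # Bisection for the upper cutoff t with P(trait > t) = tail.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass_above(mid) > tail:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    # The genotype mixture is symmetric about 0, so the lower cutoff is -t.
    high = [p * norm_sf(t - mu) for p, mu, _ in genotypes]  # P(g and trait > t)
    low = [p * norm_sf(t + mu) for p, mu, _ in genotypes]   # P(g and trait < -t)
    f_high_a = sum(w * fa for w, (_, _, fa) in zip(high, genotypes)) / sum(high)
    f_low_a = sum(w * fa for w, (_, _, fa) in zip(low, genotypes)) / sum(low)

    # Equation 1 with F_B = 1 - F_A for the two QTL alleles.
    return ((f_high_a - f_low_a) - ((1 - f_high_a) - (1 - f_low_a))) / 2.0


for ratio in (0.3, 0.2, 0.15):
    print(ratio, round(expected_d_qtl(ratio), 3))
```

Finite pools would add sampling noise around these expectations. This is one reason the later sections work with sampling variances.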
For detecting the chromosomes with QTL effects, one can consider for every marker m the statistic $\chi^2_m$ taken over all F families heterozygous for the marker m,

$$\chi^2_m = \sum_f \frac{D_{f,m}^2}{\operatorname{Var} D_{f,m}}, \qquad (2)$$

where $\operatorname{Var} D_{f,m}$ is the sampling variance of $D_{f,m}$ for family f at marker m. When the selected trait is not affected by the tested chromosome ($H_0$ hypothesis), $\chi^2_m$ is presumed to follow a $\chi^2$ distribution with d.f. = F (number of families), enabling a $\chi^2$ test for the presence of a QTL linked to the marker (Weller et al. 1990).
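The asymptotic test based on Equation 2, which the FPD permutation tests are intended to replace, can be sketched as follows. The helper names and the example values are hypothetical. The $\chi^2$ tail probability is evaluated with the standard closed forms for integer degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Regularized upper gamma Q(k, y) = exp(-y) * sum_{j<k} y^j / j!, k = df / 2.
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Half-integer case: Q(k + 1/2, y) = erfc(sqrt y) + exp(-y) * sum_{j=1..k} y^(j - 1/2) / Gamma(j + 1/2).
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def marker_chi2(d_values, variances):
    """Equation 2 for one marker: sum over families of D_{f,m}^2 / Var D_{f,m}.

    Returns the statistic and its nominal P-value with d.f. = number of families.
    """
    stat = sum(d * d / v for d, v in zip(d_values, variances))
    return stat, chi2_sf(stat, len(d_values))


# Hypothetical D_{f,m} values and sampling variances for three families at one marker.
stat, p = marker_chi2([0.12, -0.05, 0.09], [0.0025, 0.0025, 0.0030])
print(round(stat, 2), p)
```

Squaring removes the arbitrary sign of $D_{f,m}$ that depends on which sire allele was labeled $A_m$. That is why families can be summed without knowing the linkage phase.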
THE ANALYTICAL SYSTEM OF FPD
By joint analysis of these sire marker-allele frequency differences, $D_m$, at several marker loci, one can estimate the chromosomal position of the detected QTL. For one or several families heterozygous for the same QTL, fitting a function of chromosomal positions for observed $D_m$ values at the polymorphic marker loci can be used for estimation of the QTL position (similar to the procedures described by Kearsey 1998 and Ronin et al. 1999).
Single-QTL model: For a single-QTL situation, the expectation of statistic $D_m$ is proportional to $(1 - 2r_m)$, where $r_m$ is the recombination rate between the marker m and the QTL q. In (1) the sign of statistic $D_m$ depends on which of the two sire marker alleles was designated $A_m$ and which was designated $B_m$. In what follows we assume that the marker haplotypes of the sire are known, and that marker alleles from one haplotype are designated by $A_m$ and those from the other by $B_m$, m = 1, ..., M, where M is the number of marker loci included in the haplotype. (FPD methods also apply in the case of unknown phases; see Unknown marker linkage phase in the sire below.) The value $r_m$ depends on the location of marker m and on the unknown location ($x^{(q)}$) of the putative QTL on the chromosome. Hence the expectation of $D_m$ can be represented as

$$E D_m = \lambda\left[1 - 2r_m\left(x^{(q)}\right)\right], \qquad (3)$$

where λ is the (expected) value (henceforth ‘‘λ-value’’) of D for a marker that coincides with the QTL, and $r_m(x^{(q)})$ is the recombination rate between the marker and the QTL and will be zero for a marker located at $x^{(q)}$. Assuming absence of interference, $r_m$ can be calculated using the Haldane model, $r_m(y) = 0.5\left(1 - e^{-0.02y}\right)$, where y is the map distance in centimorgans between $x_m$ and the unknown coordinate $x^{(q)}$ of the QTL (Figure 2).
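Under the Haldane model, $1 - 2r_m = e^{-0.02|y|}$. Equation 3 therefore predicts that $E D_m$ decays exponentially with distance from the QTL. The sketch below uses a hypothetical QTL position and λ-value; the function names are ours.

```python
import math


def haldane_r(distance_cm: float) -> float:
    """Haldane map function: recombination rate for a map distance in cM (no interference)."""
    return 0.5 * (1.0 - math.exp(-0.02 * abs(distance_cm)))


def expected_d(marker_pos_cm: float, qtl_pos_cm: float, lam: float) -> float:
    """Equation 3: E D_m = lambda * [1 - 2 r_m(x^(q))]."""
    return lam * (1.0 - 2.0 * haldane_r(marker_pos_cm - qtl_pos_cm))


# Hypothetical QTL at 40 cM with lambda = 0.13; markers every 20 cM.
for pos in (0.0, 20.0, 40.0, 60.0, 80.0):
    print(pos, round(expected_d(pos, 40.0, 0.13), 4))
```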
Figure 2. One QTL on the chromosome. Expectation of the statistic D for markers situated at various locations on the chromosome. Value ED is calculated by formula (3) (using the Haldane model of recombination). Height of the graph at the QTL position $x_0 = x^{(q)}$ is a characteristic of the QTL effect on markers in this family (family λ-value).

The information on all markers scored for the same chromosome can be combined to derive the unknown coefficients λ and $x^{(q)}$. These parameters can be estimated (analogously to Wang et al. 2007) using a standard least-squares approach, by minimizing the following criterion:

$$\sum_m \frac{\left\{D_m - \lambda\left[1 - 2r_m\left(x^{(q)}\right)\right]\right\}^2}{\operatorname{Var} D_m} \;\longrightarrow\; \min_{x^{(q)},\,\lambda}. \qquad (4)$$

The sampling variance of $D_m$ ($\operatorname{Var} D_m$) can be calculated by the methods reviewed in Sham et al. (2002) and Brohede et al. (2005). Employment of expression (3) by using criterion (4) can be represented in terms of a standard linear model,

$$D_m = \lambda\left[1 - 2r_m\left(x^{(q)}\right)\right] + e_m$$

(Wang et al. 2007), or in matrix notation, $\mathbf{D} = \mathbf{X}\lambda + \mathbf{e}$.
Here values $e_m$ are residuals, including both sampling and technical errors, with variance equal to $\operatorname{Var} D_m$. $\mathbf{D}$, $\mathbf{X}$, and $\mathbf{e}$ are the vectors of $D_m$, $[1 - 2r_m(x^{(q)})]$, and $e_m$, correspondingly, m = 1, ..., M, where M is the number of markers. The test statistic, calculated at a given putative QTL position, can then be written as $\chi^2 = \sum_m \{D_m - \lambda[1 - 2r_m(x^{(q)})]\}^2/\operatorname{Var} D_m$. However, because the correlations between values of $D_m$ for linked markers are not taken into account in (4), the statistical quality (sampling variance) of the estimates obtained by this criterion is not optimal. We therefore use a more general optimization criterion that does take correlations into account.
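Because λ enters the linear model linearly, criterion (4) has a closed-form minimizer at any fixed putative position. With $x_m = 1 - 2r_m(x^{(q)})$, the estimate is $\hat\lambda = \sum_m x_m D_m/\operatorname{Var} D_m \,/\, \sum_m x_m^2/\operatorname{Var} D_m$. The sketch below evaluates this estimate together with the value of criterion (4). The function name is hypothetical, the Haldane map function is assumed, and the illustrative data are noise-free.

```python
import math


def fit_lambda_wls(marker_pos, d_values, variances, qtl_pos):
    """Minimize criterion (4) in lambda at a fixed putative QTL position (sketch).

    Assumes the Haldane model, so x_m = 1 - 2 r_m = exp(-0.02 |x_m - x^(q)|).
    Returns (lambda_hat, value of criterion (4) at lambda_hat).
    """
    x = [math.exp(-0.02 * abs(p - qtl_pos)) for p in marker_pos]
    num = sum(xm * dm / vm for xm, dm, vm in zip(x, d_values, variances))
    den = sum(xm * xm / vm for xm, vm in zip(x, variances))
    lam = num / den
    crit = sum((dm - lam * xm) ** 2 / vm for xm, dm, vm in zip(x, d_values, variances))
    return lam, crit


# Hypothetical noise-free data: QTL at 35 cM, lambda = 0.13, markers every 20 cM.
markers = [0.0, 20.0, 40.0, 60.0]
d_obs = [0.13 * math.exp(-0.02 * abs(p - 35.0)) for p in markers]
var_d = [0.002] * len(markers)
print(fit_lambda_wls(markers, d_obs, var_d, 35.0))  # lambda_hat ~0.13, criterion ~0
print(fit_lambda_wls(markers, d_obs, var_d, 10.0))  # poorer fit at a wrong position
```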
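Let $e_m$ in the linear model be correlated, with correlations defined by matrix $\mathbf{G}$. Then, using a generalized least-squares approach, parameters can be estimated by minimizing the following criterion (for simplicity of designation, we write it in matrix form):

$$\left(\mathbf{C}(\mathbf{D} - \mathbf{X}\lambda)\right)'\,\mathbf{G}^{-1}\,\mathbf{C}(\mathbf{D} - \mathbf{X}\lambda) \;\longrightarrow\; \min_{x^{(q)},\,\lambda}. \qquad (5)$$

Here $\mathbf{C}$ is the diagonal matrix of $(\operatorname{Var} D_m)^{-0.5}$. For a given putative position $x^{(q)}$ of the QTL q, the parameter λ minimizing criterion (5) is equal to $(\mathbf{X}'\mathbf{C}'\mathbf{G}^{-1}\mathbf{C}\mathbf{X})^{-1}\mathbf{X}'\mathbf{C}'\mathbf{G}^{-1}\mathbf{C}\mathbf{D}$. Coefficients of matrix $\mathbf{G}$ can be calculated using correlation coefficients defined under the hypothesis of no QTL in the chromosome.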
For example, suppose that sire alleles at markers $m_1$ and $m_2$ are not present in the dam population and there are no technical errors. Then the correlation coefficient is $\rho = \operatorname{Corr}(D_{m_1}, D_{m_2}) = 1 - 2r$, where r is the recombination rate between markers $m_1$ and $m_2$. The estimated λ-value can serve as a test statistic combining the information from multiple markers along the chromosome. In our simulations, correlations were obtained analytically using only the recombination distance between markers and the frequencies of the two sire alleles in the dam population: $\rho = \operatorname{Corr}(D_{m_1}, D_{m_2}) = (1 - 2r)\operatorname{Var} D_0/\operatorname{Var} D$, where $\operatorname{Var} D_0$ and $\operatorname{Var} D$ are analytical estimates of the variances of the D-value in the cases of zero and nonzero frequencies of sire alleles in the dam population. Alternatively, correlations among $D_m$ values can be estimated using the maximum-likelihood method (Wang et al. 2007).
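The generalized least-squares estimate under criterion (5) can be sketched in the same way. This is an illustration under stated assumptions, not the authors' code:

- $\mathbf{G}$ is filled with the simplest correlation given above, $\rho = 1 - 2r$ (sire alleles absent from the dam population, no technical errors).
- r follows the Haldane function.
- A small Gaussian-elimination helper (hypothetical, written for this sketch) stands in for a linear-algebra library.

Because $\mathbf{C}$ is diagonal, $\mathbf{C}' = \mathbf{C}$, and $\hat\lambda$ reduces to a ratio of two quadratic forms.

```python
import math


def solve(a, b):
    """Solve a x = b for a small dense matrix by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lambda_gls(marker_pos, d_values, variances, qtl_pos):
    """Minimize criterion (5) in lambda at a fixed putative QTL position (sketch).

    Assumes Corr(D_m1, D_m2) = 1 - 2r with Haldane r, i.e. the simplest case in the text.
    lambda_hat = (CX)' G^{-1} (CD) / (CX)' G^{-1} (CX).
    """
    def one_minus_2r(a, b):
        # Haldane: 1 - 2r = exp(-0.02 * distance in cM)
        return math.exp(-0.02 * abs(a - b))

    g = [[one_minus_2r(p, q) for q in marker_pos] for p in marker_pos]
    cx = [one_minus_2r(p, qtl_pos) / math.sqrt(v) for p, v in zip(marker_pos, variances)]
    cd = [d / math.sqrt(v) for d, v in zip(d_values, variances)]
    g_inv_cd = solve(g, cd)  # G^{-1} C D
    g_inv_cx = solve(g, cx)  # G^{-1} C X
    num = sum(a * b for a, b in zip(cx, g_inv_cd))
    den = sum(a * b for a, b in zip(cx, g_inv_cx))
    return num / den


# Hypothetical noise-free data: QTL at 30 cM, lambda = 0.2.
markers = [0.0, 15.0, 30.0, 45.0, 60.0]
d_obs = [0.2 * math.exp(-0.02 * abs(p - 30.0)) for p in markers]
print(fit_lambda_gls(markers, d_obs, [0.003] * 5, 30.0))
```

With noise-free data, any valid $\mathbf{G}$ recovers the true λ. The weighting by $\mathbf{G}^{-1}$ matters only once the $D_m$ carry correlated sampling error.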
In the same manner, it is possible to combine the information from several families with respect to a given chromosome. This assumes that all sires heterozygous at a QTL on that chromosome are heterozygous at one and the same QTL with respect to location ($x^{(q)}$), although the size of the QTL effect may vary among sires. Thus, for the one-QTL assumption and multiple families, and letting $\lambda_f$ represent the λ-value for sire f, Equation 3 will be modified as

$$E D_{f,m} = \lambda_f\left[1 - 2r_{f,m}\left(x^{(q)}\right)\right]. \qquad (3a)$$
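Correspondingly, the estimation criterion will be

$$\sum_f \sum_m \frac{\left\{D_{f,m} - \lambda_f\left[1 - 2r_{f,m}\left(x^{(q)}\right)\right]\right\}^2}{\operatorname{Var} D_{f,m}} \;\longrightarrow\; \min_{x^{(q)},\,\lambda_f,\; f = 1,\ldots,F} \qquad (4a)$$

or, taking into account the correlation between values of D for linked markers,

$$\sum_f \left(\mathbf{C}_f(\mathbf{D}_f - \mathbf{X}_f\lambda_f)\right)'\,\mathbf{G}_f^{-1}\,\mathbf{C}_f(\mathbf{D}_f - \mathbf{X}_f\lambda_f) \;\longrightarrow\; \min_{x^{(q)},\,\lambda_f,\; f = 1,\ldots,F}. \qquad (5a)$$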
Using this expression, the unknown parameters can be obtained as follows. At each of the chromosomal positions $x = x^{(i)}$, taken consecutively with some step (e.g., 1 cM), the values $\lambda_f$, f = 1, ..., F, can be found analytically. For every family, the r value in (3a) is calculated using the recombination distance between the location of marker m and the current location $x^{(i)}$. Then, the position minimizing the criterion can be taken as the best position $x^{(q)}$.
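The position scan just described can be sketched for criterion (4a), which ignores correlations among markers. The function and helper names, the 100-cM chromosome length, and the two hypothetical families below are assumptions made for illustration. The families deliberately carry different marker sets: each marker enters only through its position, so no marker needs to be shared.

```python
import math


def scan_multi_family(families, grid_step=1.0, chrom_len=100.0):
    """Grid search for criterion (4a): one shared QTL position, family-specific lambda_f.

    `families` is a list of (marker_positions_cM, D_values, variances). Assumes the
    Haldane model (1 - 2r = exp(-0.02 * distance)). Returns (best_position,
    lambdas, criterion) at the grid point with the smallest pooled criterion.
    """
    best = None
    for i in range(int(round(chrom_len / grid_step)) + 1):
        x_q = i * grid_step
        total, lambdas = 0.0, []
        for pos, d, var in families:
            xs = [math.exp(-0.02 * abs(p - x_q)) for p in pos]
            # Closed-form weighted least-squares lambda_f at this position.
            lam = (sum(x * di / v for x, di, v in zip(xs, d, var))
                   / sum(x * x / v for x, v in zip(xs, var)))
            total += sum((di - lam * x) ** 2 / v for x, di, v in zip(xs, d, var))
            lambdas.append(lam)
        if best is None or total < best[2]:
            best = (x_q, lambdas, total)
    return best


def profile(positions, lam, qtl):
    """Hypothetical noise-free D profile from Equation 3a (illustration only)."""
    return [lam * math.exp(-0.02 * abs(p - qtl)) for p in positions]


pos1, pos2 = [0.0, 20.0, 40.0, 60.0, 80.0], [10.0, 35.0, 55.0, 90.0]
families = [(pos1, profile(pos1, 0.12, 42.0), [0.002] * 5),
            (pos2, profile(pos2, 0.06, 42.0), [0.003] * 4)]
x_best, lams, crit = scan_multi_family(families)
print(x_best, [round(v, 3) for v in lams], sum(v * v for v in lams))
```

The last value printed is the sum of squared family λ-estimates, $\sum_f \lambda_f^2$.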
After fitting the model (3a) by using criterion (4a) or (5a), the statistic $\sum_f \lambda_f^2$ can serve to conduct an overall permutation test (see below), instead of using the asymptotic $\chi^2$ properties of statistic (2). If we assume one QTL in the chromosome common to all QTL-heterozygous sires, then $\lambda_f$ will represent the expected value of the test statistic at the marker locus coinciding with (or closest to) the QTL. For all other segregating markers of sire f, the expected statistic will decrease with the distance between the marker and the QTL. Hence, an inherent property of our approach (similar to the model of Kearsey 1998 or Ronin et al. 1999) is that, for a single QTL, $\lambda_f$ approximates D at the presumed position $x_0$ coinciding with the QTL. Thus, $\lambda_f$ ‘‘absorbs’’ the information of all markers of the sire. The statistic $\sum_f \lambda_f^2$ does this cumulatively across sires, fitting one and only one QTL position because of the assumption of one shared QTL.
QTL detection based on FPD permutation tests: Employment of the FPD allows new types of tests for QTL detection, based on permutation of subpools, as an analog of permutations of individual trait or genotype scores in selective genotyping analysis. These tests do not depend on assumptions about the asymptotic distribution of the test statistics, and they provide a spectrum of useful analytical options. In particular, these tests can be employed for the following:

- detecting chromosomes with QTL effects;
- discriminating between sires homozygous and heterozygous for the detected QTL;
- comparing and contrasting hypotheses about one, two, or more QTL per chromosome.

The simplest of the proposed permutation tests is based on random reshuffling of the individual subpools between the tails of the trait distribution. This process is repeated many times, and each time the test statistics are recalculated. In general terms, the proportion of permuted test statistics that are equal to or greater than the observed test statistic estimates the P-value of the test (Doerge and Churchill 1996), i.e., the type I error incurred by rejecting $H_0$ at the observed value. If $H_0$ {no QTL effect} is correct for a particular marker, such a permutation will not have an appreciable effect on the level of the test statistics. Thus, in most cases the observed test statistic will lie well within the range of permuted statistics. If the $H_1$ alternative is correct, reshuffling will destroy the marker–trait (i.e., marker–tail) connection. This will be manifested as a strong reduction of the test statistics in the majority of permutation runs. Thus, the observed test statistic in this case will exceed all but a small fraction of the permuted statistics. The test can be applied to any of the possible test statistics: $\chi^2_m$ from Equation 2, estimated λ from Equations 4 or 5, or $\sum_f \lambda_f^2$ from (4a) or (5a).
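For a single family and a single SNP marker, the subpool reshuffling test can be sketched by exact enumeration, which is feasible for the small numbers of subpools considered here. The subpool frequencies are hypothetical. Using |D| as the statistic is our choice for the illustration; any of the statistics listed above could be substituted. With four subpools per tail and complete separation of the tails, the smallest attainable P-value is 2/70 = 1/35. This is because the observed labeling and its high/low mirror image give the same |D|.

```python
from itertools import combinations
from statistics import mean


def subpool_permutation_test(high, low):
    """Exact permutation test by reshuffling subpools between tails (sketch).

    `high` and `low` hold the sire-allele-A frequency measured in each subpool of one
    family at one SNP marker, so D = mean(high) - mean(low) (Equation 1 with
    F_B = 1 - F_A). Every relabeling of the pooled subpools into tails of the
    original sizes is enumerated; the P-value is the share of relabelings with
    |D| >= observed |D|.
    """
    pools = list(high) + list(low)
    n_high = len(high)
    observed = abs(mean(high) - mean(low))
    n_total = n_extreme = 0
    for idx in combinations(range(len(pools)), n_high):
        chosen = set(idx)
        h = [pools[i] for i in idx]
        rest = [pools[i] for i in range(len(pools)) if i not in chosen]
        n_total += 1
        if abs(mean(h) - mean(rest)) >= observed - 1e-12:  # tolerance for float ties
            n_extreme += 1
    return observed, n_extreme / n_total


# Hypothetical sire-allele frequencies in four high-tail and four low-tail subpools.
high = [0.62, 0.58, 0.65, 0.60]
low = [0.41, 0.45, 0.39, 0.44]
d_obs, p_value = subpool_permutation_test(high, low)
print(round(d_obs, 4), p_value)
```

For several families, the number of configurations grows multiplicatively. In practice, one would then sample random configurations rather than enumerate them all.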
The total number of different reshuffling configurations per family, $R_f$, is a function of the number of subpools per tail. In the case of the same number of subpools, S, for the high and low tails,

$$R_f = 0.5\binom{2S}{S} \approx 4^{S-1.5}.$$

In the case of an unequal number of pools per tail,

$$R_f = \binom{S_L + S_H}{S_L} = \binom{S_L + S_H}{S_H},$$

where $S_L$ is the number of low-trait subpools and $S_H$ is the number of high-trait subpools. Thus, for S in the range of 4–8 pools per tail, $R_f$ varies from 35 to 6435. Clearly, the total number of configurations with multiple families is the product of the corresponding numbers for the families, $R = \prod_f R_f$. Even for a minimal S = 4, a design with five families will give $R = 35^5 \approx 5.2 \times 10^7$ combinations. The number of combinations is important, because the lowest possible P-value in permutation is equal to 1/R.

Detecting chromosomes with QTL effects: Tests based on $\chi^2_m$: The significance of the QTL effect for marker m in several families can be estimated as the proportion of random permutation runs of pool configurations having test statistic value $\chi^2_m$ (Equation 2) $\geq \chi^2_m$ obtained on the initial nonreshuffled data. To set significance levels when a number of markers are considered on the same chromosome, it is necessary to correct for multiple comparisons. This can be done, e.g., by controlling the false discovery rate (FDR) (Benjamini and Hochberg 1995) or the proportion of false positives (PFP) (Fernando et al. 2004).

Alternatively, a chromosome-wise test can be proposed, analogous to the approaches applied in standard interval mapping under individual genotyping. In that case, for each set of k marker intervals, interval analysis is conducted and the maximum (across intervals) LOD value ($\max \mathrm{LOD}_k$) or the maximum F-test ($\max F_k$) for regression-based models is calculated. Then, the significance of the putative QTL effect of the tested chromosome is estimated as the proportion of permutation runs where $\max \mathrm{LOD}_k/F_k$ was equal to or higher than the $\max \mathrm{LOD}/F_k$ value calculated for the nonreshuffled data (Doerge and Churchill 1996). Here, permutation runs are samples corresponding to $H_0$, obtained by random reshuffling of the trait scores relative to the multilocus marker genotypes.
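The configuration counts quoted above can be checked directly. In this sketch, we interpret the factor 0.5 for equal tails as counting a relabeling and its high/low mirror image once; this is our reading of the formula. The function names are hypothetical.

```python
from math import comb, prod


def configurations_per_family(s_low: int, s_high: int) -> int:
    """Number of distinct subpool reshufflings R_f for one family (sketch)."""
    total = comb(s_low + s_high, s_low)
    # Equal tails: a relabeling and its high/low mirror image are counted once.
    return total // 2 if s_low == s_high else total


def total_configurations(designs) -> int:
    """R = product of R_f over families; `designs` is a list of (S_L, S_H) pairs."""
    return prod(configurations_per_family(sl, sh) for sl, sh in designs)


for s in range(4, 9):
    # Exact R_f next to the rough approximation 4**(S - 1.5).
    print(s, configurations_per_family(s, s), round(4 ** (s - 1.5)))

r = total_configurations([(4, 4)] * 5)
print(r, 1 / r)  # total configurations and the smallest attainable permutation P-value
```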
Applying this approach to the FPD analysis, instead of
max LOD we can employ max x2¼ maxmx2
for the nonreshuffled and reshuffled configurations of
subpools, where maxmx2
which x2is at a maximum. Note that in the case of max
x2statistics, the fitted model does not include any pa
rameterscharacterizingQTLeffectandposition,sinceit
is based on singlemarker analysis. In contrast, the max
LOD/Fktest is preceded by building a genetic model
that depends on unknown parameters and obtaining
maximumlikelihood (least squares, in the case of the
regression model) estimates of the parameters.
Significance of the putative QTL effect of the tested
chromosome can also be estimated by the Pvalue of the
highest significant marker on the chromosome (taking
into account the problem of multiple comparisons).
Individual Pvalues for marker m can be calculated by a
permutation test (using test statistic x2
(Welleretal.1990).UsingtheFDRapproach(Benjamini
and Hochberg 1995) to control for multiple compar
isons,wedenotecorrespondingsignificancethresholdsby
TFDRðIÞforthepermutationtestandTFDRðIIÞforthex2test,
respectively.
Permutation test based on λ_f: The permutation test based on χ²_m takes into account all markers on a chromosome, but the information contained in the relative locations of the markers is ignored. In standard individual genotyping schemes, single-marker analysis and interval analysis are close with respect to QTL detection power at moderate to high marker density. However, at low marker density, interval analysis is more powerful. This is because the loss of power caused by QTL–marker recombination can be estimated as approximately r/2 and r²/4 for single-marker analysis and interval analysis, respectively.
It was found that in FPD, as in standard QTL mapping analysis based on individual genotyping, hypothesis testing is more efficient and flexible if conducted on the basis of fitting a mapping model aimed at QTL detection or at discriminating between more complex situations (such as single or multiple QTL on a chromosome, mode of QTL action and interaction, and linkage vs. pleiotropy as sources of genetic correlation). In this context, by including marker positions, models (3a), (4a), and (5a) presented above allow extracting the information about QTL presence and location on the tested chromosome through joint analysis of linked markers. As shown by simulation (Table 1), the power of the max χ² test is lower than that of the Σ(λ_f)² test. Presumably, this is because the max χ² test does not utilize all of the information potentially contributing to QTL detection power. Thus, for a single-family analysis, the estimated λ-value (from Equation 4 or 5) would be the preferred statistic for the permutation test.
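As an illustration of fitting a mapping model rather than scanning single markers, the sketch below fits one family. Models (3a), (4a), and (5a) are not reproduced in this section, so we assume the single-QTL form that Equation 6 takes in its end intervals, E D_m = λ(1 − 2r), with the Haldane map and weights 1/Var D_m; the correlation structure of (5a) is omitted. All names are hypothetical.

```python
import numpy as np


def haldane_r(d_cm):
    """Haldane map function: recombination rate for a distance in cM."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cm) / 100.0))


def fit_single_qtl(marker_pos, d_values, var_d=None, grid=None):
    """Weighted least-squares fit of E[D_m] = lambda * (1 - 2 r(x_m, x_q)).

    For each putative position x_q the optimal lambda has a closed form;
    x_q itself is found by a grid search. Returns (x_q, lambda, criterion).
    """
    x = np.asarray(marker_pos, float)
    d = np.asarray(d_values, float)
    w = np.ones_like(d) if var_d is None else 1.0 / np.asarray(var_d, float)
    grid = np.arange(x.min(), x.max() + 1.0) if grid is None else grid
    best = None
    for xq in grid:
        c = 1.0 - 2.0 * haldane_r(x - xq)           # multiplier of lambda
        lam = np.sum(w * c * d) / np.sum(w * c * c)
        crit = np.sum(w * (d - lam * c) ** 2)
        if best is None or crit < best[2]:
            best = (float(xq), float(lam), float(crit))
    return best
```

The fitted λ̂ (not the individual χ²_m values) is then the statistic recomputed in each permutation run.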
Fractioned DNA Pooling 2615
For multiple-family analysis, the statistics Σ(λ_f)² and the maximum |λ_f| across all families (max_f |λ_f|), with family-specific least-squares estimates of λ-values derived from (3a) and (4a) (or 5a), can serve to conduct the overall experimentwise permutation test across families and markers of the analyzed chromosome. In the FPD methodology, each marker is represented in (4a) [or in (5a)] by its position relative to the unknown location of the putative QTL, rather than by its name. Consequently, there is no need for full coincidence of polymorphic marker loci among the families. In principle, the system will work even with zero overlap of polymorphic marker loci among families. This is an important advantage of the proposed methodology over the standard SDP methodology (Darvasi and Soller 1997), in which the test statistic is calculated for each marker locus across families polymorphic for the marker, and it is not possible to compensate for markers at which the sire is homozygous by including information from neighboring heterozygous markers.
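The point that markers enter only through their distance to the putative QTL can be sketched as a joint fit with a common QTL position and family-specific λ_f. This assumes the same single-QTL expectation as in the single-family sketch, with unweighted least squares and the Haldane map; the names are hypothetical.

```python
import numpy as np


def haldane_r(d_cm):
    """Haldane map function; distance in cM."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cm) / 100.0))


def joint_single_qtl_fit(families, grid):
    """Common QTL position, family-specific lambda_f (unweighted least squares).

    families: list of (marker_positions, d_values); marker sets may differ
    between families, because each marker enters only through its distance
    to the putative QTL position.
    """
    best = None
    for xq in grid:
        lams, crit = [], 0.0
        for pos, d in families:
            c = 1.0 - 2.0 * haldane_r(np.asarray(pos, float) - xq)
            d = np.asarray(d, float)
            lam = (c @ d) / (c @ c)
            lams.append(lam)
            crit += float(np.sum((d - lam * c) ** 2))
        if best is None or crit < best[2]:
            best = (float(xq), np.array(lams), crit)
    return best


def multi_family_statistics(lams):
    """Sum of squared lambda_f and max |lambda_f| across families."""
    lams = np.asarray(lams, float)
    return float(np.sum(lams ** 2)), float(np.max(np.abs(lams)))
```

For example, three families genotyped at the nonoverlapping marker sets [0, 20, 40, 60], [10, 30, 50, 70, 90], and [5, 35, 45, 100] cM can be fitted together; the two summary statistics are then recomputed in every permutation run.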
Detecting sires heterozygous at the QTL: For analysis of a single family, f, within a multiple-family analysis, the estimated value of |λ_f| or max_m χ²_{f,m} can be used as a test statistic for the permutation test. The significance of sire f is then determined as the proportion of permutation runs, made over all families, in which the statistic of QTL effect |λ_f| was greater than that for the nonreshuffled data. Sires of families in which the test statistic (|λ_f| or max_m χ²_{f,m}) is not significant can be taken to be homozygous at the QTL. On this basis, sires can be subdivided into two groups, QTL homozygous and QTL heterozygous.
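Given |λ_f| for the nonreshuffled data and for each permutation run over all families, the per-family P-values and the homozygous/heterozygous split reduce to a comparison. The sketch assumes those permuted values are already computed and uses P < 0.05 as the cutoff, as in the example analyzed below; both are assumptions, and the names are hypothetical.

```python
import numpy as np


def sire_pvalues(obs_abs_lambda, perm_abs_lambda):
    """Per-family P-values: share of permutation runs (made over all families)
    in which |lambda_f| exceeded its value for the nonreshuffled data.

    obs_abs_lambda: shape (families,); perm_abs_lambda: shape (runs, families).
    """
    obs = np.asarray(obs_abs_lambda, float)
    perm = np.asarray(perm_abs_lambda, float)
    return (perm > obs).mean(axis=0)


def classify_sires(pvalues, alpha=0.05):
    """Label each sire as QTL heterozygous (significant) or homozygous."""
    return ["heterozygous" if p < alpha else "homozygous" for p in pvalues]
```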
Estimating the confidence interval of QTL position: bootstrap/jackknife analysis: One of the major parameters characterizing the detected QTL is the accuracy of the estimated parameters, especially of QTL position, as given by its standard error or confidence interval. The most common way to evaluate the confidence interval of QTL position within the framework of individual or selective genotyping is by using resampling procedures such as bootstrap or jackknife (Ronin et al. 1998). The 95% confidence interval of QTL location can then be taken as the narrowest interval that includes 95% of the resampling-based estimates of QTL position. Alternatively, the confidence interval of QTL location can be characterized by the mean value x̄(q), standard error (SE), and standard deviation (SD) of the resampling-based estimates. The proposed FPD methodology, for the first time, allows resampling procedures to be applied to DNA pooling analysis. As in the individual genotyping application of these procedures, multiple samples are generated from the initial data set by sampling subpools within tails with replacement (bootstrap analysis) or without replacement (jackknife analysis). Each such sample is treated using the same model that was applied to the total sample, and the variation of the derived parameters among the samples is employed to obtain an SD for each estimated parameter and (if needed) an SE for its mean value. The only difference in applying these procedures in FPD is that pools are resampled instead of individuals.
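A sketch of the subpool jackknife follows, in the variant in which exactly one subpool per tail is dropped in each run (the variant used for Table 2), together with the narrowest interval containing 95% of the estimates. The estimator is a hypothetical callable; for several families the runs would combine exclusions across families, which is not shown here.

```python
from itertools import product

import numpy as np


def jackknife_subpools(low, high, estimator):
    """Re-estimate after dropping exactly one subpool per tail (S_L * S_H runs).

    low, high: arrays (subpools, markers) of pool allele frequencies.
    estimator: hypothetical callable (low, high) -> scalar, e.g. a QTL position.
    """
    estimates = [
        estimator(np.delete(low, i, axis=0), np.delete(high, j, axis=0))
        for i, j in product(range(len(low)), range(len(high)))
    ]
    return np.array(estimates, dtype=float)


def narrowest_interval(samples, coverage=0.95):
    """Shortest interval containing at least `coverage` of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = int(np.ceil(coverage * len(s)))        # samples the interval must hold
    widths = s[k - 1:] - s[: len(s) - k + 1]
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + k - 1])
```

With `est = jackknife_subpools(low, high, f)`, `est.std(ddof=1)` gives the SD of the estimates and `narrowest_interval(est)` the 95% interval; with 4 + 4 subpools there are only 16 such runs per family.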
With new chip-based technologies of SNP analysis, a high number of densely spaced polymorphic markers may become available for FPD or interval mapping analysis. In this case, the resampling procedure may be modified to include simultaneous resampling of markers within chromosomes and subpools within tails, so that different jackknife or bootstrap runs may include not fully coinciding sets of markers for a given family.
Simulation data: To illustrate the proposed methodology we simulated situations corresponding to multiple half-sib daughter families (a population based on artificial insemination, e.g., dairy cattle). Each family consists of the progeny of a different sire, with each sire family being represented by a certain number (10% of
TABLE 1
Effect of number of markers (M) under the FPD on the confidence interval (C.I.) of QTL location, comparisonwise error rate (P-value), and statistical power, according to the test for significance and standardized allele substitution effect at the QTL (d/σ), using simulated data

                  C.I.          P-value                                            Power (%)
d/σ     M     D      SD      Σ(λ_f)²   max|λ|   max χ²   T_FDR(I)   T_FDR(II)    Σ(λ_f)²   max χ²
0.2     25    4.1    3.1     0.003     0.007    0.015    0.053      0.074        99        56
        13    5.1    3.3     0.002     0.008    0.018    0.030      0.076        99        59
        7     6.4    3.6     0.004     0.006    0.016    0.049      0.061        98        64
0.15    25    4.9    5.2     0.008     0.104    0.056    0.067      0.250        92        27
        13    6.3    5.2     0.021     0.071    0.126    0.112      0.260        90        24
        7     7.8    5.9     0.021     0.098    0.130    0.101      0.299        89        28
Tests of significance: Σ(λ_f)², max|λ|, max χ², T_FDR(I), and T_FDR(II). See text for details. Power was calculated at P-value = 5%. Values D and SD characterize the center and size of the confidence interval obtained in jackknife iterations (see text). Parameters of the simulations: chromosome length 120 cM. A single QTL was situated at position 40 cM. Number of families, F = 10 (5 families with sire heterozygous at the QTL; 5 families with sire homozygous at the QTL); number of daughters per family, N = 2000; proportion of the population selected to each tail, 0.10; number of subpools per tail, S = 4. Values are the means based on 10 simulation data sets; for every data set, 500 permutations and 100 jackknife iterations were made.
2616 A. Korol et al.
the total) of daughters per tail selected out of all phenotyped daughters of that family. In our simulations we used a normally distributed trait with constant variance σ² and mean value depending on QTL genotype. Each QTL q was assumed to be additive and diallelic, with alleles A(q) and B(q). Frequencies of alleles A(q) and B(q) in dams were set to 0.50. Frequencies of marker alleles in the dams were 0.25 A_m, 0.25 B_m, and 0.25 C_m, where A_m and B_m are sire alleles and C_m represents all other alleles. A_m and A(q) are alleles of one of the haplotypes of the sire for all m = 1, ..., M and q = 1, ..., Q; B_m and B(q) are alleles of the other haplotype of the sire; all loci are on one chromosome. Positions of loci (markers and simulated QTL) on the chromosome are defined by recombination distance from the most proximal locus. In the same way we define position(s) for putative QTL. Recombination events in the sire gamete were simulated as independent for different parts of the chromosome (the recombination rate between loci was calculated from the distance on the linkage map using the Haldane model). Linkage equilibrium among all alleles (markers and QTL) was assumed in the dams.
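The Haldane model used to convert map distance to recombination rate, and the no-interference property that justifies simulating recombination independently along the chromosome, can be written as follows (function names are ours):

```python
import math


def haldane_r(d_cm):
    """Recombination rate for map distance d (cM) under the Haldane model."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def haldane_d(r):
    """Inverse: map distance (cM) for recombination rate r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


# No interference: 1 - 2 r_AC = (1 - 2 r_AB)(1 - 2 r_BC) for loci A-B-C.
r_ab, r_bc, r_ac = haldane_r(10), haldane_r(30), haldane_r(40)
print(round(r_ab, 4), round(r_ac, 4))                               # 0.0906 0.2753
print(math.isclose(1 - 2 * r_ac, (1 - 2 * r_ab) * (1 - 2 * r_bc)))  # True
```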
Each progeny genotype was simulated by independently generating a haplotype inherited from the sire and a haplotype inherited from a dam. The haplotype inherited from the dam was simulated by randomly choosing alleles for each locus in proportion to their frequencies in the dams. The haplotype inherited from the sire was simulated as follows: the allele at the most proximal locus was chosen randomly from one of the two sire alleles (with probability 0.5). This allele determined the starting sire haplotype. The allele at every subsequent locus on the chromosome was chosen with probability 1 − r from the same haplotype as at the previous locus and with probability r from the alternative haplotype, where r is the recombination rate between these two consecutive loci. The trait value for each simulated individual in the progeny was set equal to the mean trait value for the inherited QTL genotype plus a normally distributed random value with mean zero and variance σ². In the single-QTL case, the mean trait value was defined as μ − d(q), μ, and μ + d(q) for genotypes B(q)B(q), A(q)B(q), and A(q)A(q), respectively. The value d(q) was not necessarily the same for all families. In the case of two QTL (q = 1, 2), the trait mean value was μ − d(1) − d(2), μ − d(2), μ + d(1) − d(2), μ − d(1), μ, μ + d(1), μ − d(1) + d(2), μ + d(2), and μ + d(1) + d(2) for genotypes B(1)B(1)B(2)B(2), A(1)B(1)B(2)B(2), A(1)A(1)B(2)B(2), B(1)B(1)A(2)B(2), A(1)B(1)A(2)B(2), A(1)A(1)A(2)B(2), B(1)B(1)A(2)A(2), A(1)B(1)A(2)A(2), and A(1)A(1)A(2)A(2), respectively. In the simulations, QTL genotype frequencies in the tails of the trait distribution for a given tail cutoff depend on the ratio d/σ = d(q)/√σ², rather than on the values of μ and σ² themselves. In our simulations we used μ = 0 and σ² = 1.

Subdivision of the individuals in the tails of the trait distribution into subpools was random. The number of individuals in each subpool was equal if the number of individuals in the tail was divisible by the number of subpools; otherwise it could differ by one individual. The simulated technical error standard deviation associated with estimation of marker allele frequencies in a pool was set at 0.02 (absolute value). For analysis of the simulated data, the marker haplotypes of the sires were assumed known.
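The simulation protocol can be sketched for one half-sib family. This is our reading of the description, not the authors' program: the sire's A-haplotype is coded 0, only the frequency of the sire allele A_m is tracked (the dam allele A_m is drawn with frequency 0.25), and pool frequencies are averages over daughters plus Gaussian technical error with SD 0.02. All parameter names are hypothetical.

```python
import numpy as np


def haldane_r(d_cm):
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cm) / 100.0))


def simulate_fpd_family(marker_pos, qtl_pos, d, n=2000, tail=0.10, subpools=4,
                        dam_freq_a=0.25, tech_sd=0.02, mu=0.0, sigma=1.0, rng=None):
    """Simulate one half-sib family and return (low, high) subpool frequencies
    of the sire allele A_m, each of shape (subpools, markers)."""
    rng = np.random.default_rng() if rng is None else rng
    pos = np.append(np.asarray(marker_pos, float), qtl_pos)   # QTL is last column
    order = np.argsort(pos)
    r = haldane_r(np.diff(pos[order]))                        # consecutive loci
    # Sire haplotype (0 = A-haplotype, 1 = B-haplotype): random start, then a
    # switch to the other haplotype with probability r between neighbours.
    start = rng.integers(0, 2, size=(n, 1))
    switches = rng.random((n, pos.size - 1)) < r
    hap_sorted = np.concatenate([start, start + np.cumsum(switches, axis=1)], axis=1) % 2
    hap = hap_sorted[:, np.argsort(order)]                    # back to input order
    sire_a = hap == 0
    # Dam alleles: QTL allele A with prob 0.5, marker allele A_m with dam_freq_a.
    dam_qtl_a = rng.random(n) < 0.5
    dam_marker_a = rng.random((n, len(marker_pos))) < dam_freq_a
    n_a = sire_a[:, -1].astype(int) + dam_qtl_a              # 0, 1 or 2 copies of A
    trait = mu + d * (n_a - 1) + rng.normal(0.0, sigma, n)   # mu - d, mu, mu + d
    # Per-daughter frequency of A_m among her two marker alleles.
    allele_a = (sire_a[:, :-1].astype(float) + dam_marker_a) / 2.0
    ranked = np.argsort(trait)
    k = int(round(tail * n))
    pools = []
    for members in (ranked[:k], ranked[-k:]):                 # low tail, high tail
        groups = np.array_split(rng.permutation(members), subpools)
        freqs = np.array([allele_a[g].mean(axis=0) for g in groups])
        pools.append(freqs + rng.normal(0.0, tech_sd, freqs.shape))
    return pools[0], pools[1]
```

`np.array_split` produces subpools whose sizes differ by at most one individual, matching the subdivision rule described above.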
Example of QTL analysis by FPD: The scheme of QTL analysis by FPD for the case of a single QTL per chromosome is illustrated using a simulated example with six half-sib families, three segregating for sire alleles at the simulated QTL (i.e., the sires of the families are heterozygous at the simulated QTL) and three not segregating for the sire alleles at the simulated QTL. Results are shown in Figure 3.
Various numbers of markers were employed in the different families (with some regions being represented by neighboring but not coinciding marker loci), illustrating the ability of the FPD analytical system to deal with cases in which markers are not shared among families. To simulate such a situation, we initially generated for each family a large excess of markers with identical chromosome positions. Then, the majority of markers for each family were declared "homozygous," and only a small proportion of markers were randomly selected to be "heterozygous." A QTL with standardized allele substitution effect d/σ = 0.3 was simulated at location 40 cM on a chromosome of 120 cM length. There were 2000 daughters per family; a proportion 0.10 of total daughters (i.e., 200 daughters) was selected for each tail, and there were four subpools per tail. The overall permutation test conducted after fitting the estimation model (5a) gave significance P = 0.009 (in 1000 permutations). P-values per family were 0.029, 0.029, 0.029, 0.94, 0.69, and 0.74, respectively (based on permutation tests within families, where only 35 different permutations are possible for the 4 + 4 subpool configuration). Corresponding P-values for the families obtained in an experimentwise permutation test were 0.018, 0.012, 0.023, 0.483, 0.344, and 0.428 (1000 random permutations). QTL positions estimated using all six families or only the three families with significant effect (P-value < 0.05) were 43.9 cM (standard deviation of estimated position among runs, SD = 2.8) and 43.6 cM (SD = 2.6), respectively (based on 500 jackknifes). On the basis of the jackknife procedure, QTL detection power for the entire set of families was estimated as follows. Threshold values of the test statistic Σ(λ_f)² were obtained from the permutation test for significance levels 5 and 1%. QTL "detection power" was then estimated as the proportion of jackknife runs in which the test statistic exceeded the threshold value at the chosen significance level. Calculated in this way, estimated powers for P-values = 0.05 and 0.01 were 99 and 82%, respectively.
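The jackknife-based power estimate amounts to comparing jackknife statistics with a permutation quantile. The sketch below assumes the statistics (e.g., Σ(λ_f)²) are already computed; using `np.quantile` with its default linear interpolation for the threshold is our choice, and the function names are hypothetical.

```python
import numpy as np


def permutation_threshold(perm_stats, alpha):
    """(1 - alpha) quantile of the permutation distribution of the statistic."""
    return float(np.quantile(np.asarray(perm_stats, float), 1.0 - alpha))


def jackknife_power(jack_stats, perm_stats, alpha=0.05):
    """Share of jackknife runs whose statistic exceeds the permutation threshold."""
    threshold = permutation_threshold(perm_stats, alpha)
    return float(np.mean(np.asarray(jack_stats, float) > threshold))
```

Calling `jackknife_power` once with `alpha=0.05` and once with `alpha=0.01` gives the two power values reported above.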
Comparing the quality of mapping for different numbers of markers: A few more examples with single-QTL
chromosomes were simulated with 10 sire families (5 with sire heterozygous and 5 with sire homozygous at the QTL), with two standardized allele substitution effects at the QTL (d/σ = 0.2 and 0.15) situated at position x(q) = 40 cM, and with three marker densities (9, 13, and 25 evenly spaced markers per 120-cM chromosome) (Table 1). Population size, proportion selected to the tails, and number of subpools per tail were as in Figure 1. Table 1 presents the results for the six parameter combinations, with 10 independent Monte Carlo data sets simulated for each combination; for every simulated data set, 500 permutations of subpools and 100 jackknife iterations were made. For each of the 10 simulated data sets we calculated the standard deviation of the difference between the estimated QTL position x̂(q) and the simulated one, x(q) = 40 cM, among the 100 jackknife iterations. The mean of these standard deviations across all 10 data sets, denoted SD, characterizes the size of the confidence interval of the estimated QTL position. In addition, for each data set we calculated the difference between the mean of the estimated QTL positions based on the 100 iterations and the simulated position. The mean square of these differences, denoted D, characterizes the shift of the center of the confidence interval relative to the true value. Table 1 shows that increasing the number of markers reduces D more efficiently than SD. As one would expect, SD (and hence the size of the confidence interval) is higher in the case of d/σ = 0.15 than for d/σ = 0.2 (5.4 vs. 3.3).
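The two summary measures can be computed directly from the jackknife estimates. The sketch follows the definitions in the text literally (D as a mean square); the array layout and function name are assumptions.

```python
import numpy as np


def ci_summary(jack_positions, true_pos):
    """D and SD as defined in the text.

    jack_positions: array (data sets, jackknife iterations) of estimated QTL
    positions. SD = mean over data sets of the SD of (x_hat - x) across
    iterations; D = mean over data sets of the squared difference between the
    mean jackknife estimate and the simulated position.
    """
    est = np.asarray(jack_positions, float)
    sd = float(np.mean(np.std(est - true_pos, axis=1, ddof=1)))
    shift = est.mean(axis=1) - true_pos
    d = float(np.mean(shift ** 2))
    return d, sd
```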
Table 1 also allows a comparison of different methods of testing the significance of QTL effect. Among the model-free tests based on χ²_m (max χ², T_FDR(I), and T_FDR(II)), the best results seem to be provided by the permutation test for the max χ² statistic (for d/σ = 0.2) and by the T_FDR(I) test, also based on permutations (for d/σ = 0.15). According to the presented results, the T_FDR(I) test based on permutations gave a much higher level of significance than the T_FDR(II) test based on the χ² asymptotic approximation (P-values were lower by an order of magnitude).
The model-based test using the Σ(λ_f)² statistic instead of max χ² resulted in a further severalfold decrease in P-values (see Table 1). In accordance with the ranking of the test statistics for P-values, Σ(λ_f)² also proved to be superior with respect to detection power (i.e., resulting in the lowest proportion of false-negative declarations for a given fixed P-value = 0.05). Estimated power of the test based on Σ(λ_f)² was very high (≈0.9 for d/σ = 0.15 and ≥0.98 for d/σ = 0.20). When d/σ = 0.15, estimated power of this test increased slightly with increasing number of markers M. Estimated power of the test based on max χ² was also higher for d/σ = 0.20 than for d/σ = 0.15. Nevertheless, unlike Σ(λ_f)², power for this test did not increase with increasing M; indeed, what may even be an opposite tendency was observed for d/σ = 0.20. This observation can be explained as follows: with increasing M, the probability that in permutation runs the χ²_m value for one of the markers will be higher than max_m χ²_m in the initial pool configuration also increases. Conversely, increasing M can also increase the power of this test if the additional markers lie in the vicinity of the QTL (not shown).
Multiple linked QTL analysis—two or more QTL on
the chromosome: In the case of two or more QTL per chromosome, the expected D at a marker locus is determined by the expected frequencies of sire alleles in the high and low pools at the closest QTL and by the recombination rates between the marker and these QTL. Let K be the number of QTL on the chromosome, and number the QTL according to their locations [i.e., x(1) < x(2) < ... < x(K)]. The expectation of D for a marker at location x can then be written in the form

$$E D_f(x) = \begin{cases} \lambda_{f,1}\bigl(1 - 2r_x(x(1))\bigr), & x \le x(1),\\ \lambda_{f,K}\bigl(1 - 2r_x(x(K))\bigr), & x \ge x(K),\\ D_{f,x(q),x(q+1)}(x), & x \in [x(q), x(q+1)],\ q = 1,\dots,K-1, \end{cases} \qquad (6)$$

where
Figure 3.—QTL analysis of multiple families
with some nonshared markers. Six families with 2000 daughters each were simulated (three families with sire heterozygous for a single QTL situated at position 40 cM with allele substitution effect d/σ = 0.3 and three families with sire homozygous at the QTL). Chromosome length was 120 cM with 6–10 markers per family; a proportion 0.10 of all daughters was selected to each tail in each family. Individuals in both tails were randomly subdivided into four subpools. (a) D-values across the markers for each family (solid and open squares, triangles, and diamonds represent D in families with QTL-heterozygous and QTL-homozygous sires, respectively); (b) the results of jackknife resampling analysis (90% confidence intervals of λ-values for each family are shown by vertical lines, estimated in 500 jackknifes). The experimentwise P-value in a permutation test based on Σ(λ_f)² was 0.012 (in 1000 permutations). The corresponding experimentwise permutation test P-values per family were 0.018, 0.012, 0.023, 0.483, 0.344, and 0.428. Estimated QTL position based on all six families or on the three families with a significant (P-value < 0.05) λ-value was 43.9 cM (SD = 2.8) and 43.6 cM (SD = 2.6), respectively. Estimated power for P-value = 0.05 was 99%.
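$$D_{f,x(q),x(q+1)}(x) = \frac{\lambda_{f,q} + \lambda_{f,q+1}}{2\bigl(1 - r_{x(q)}(x(q+1))\bigr)}\bigl(1 - r_x(x(q)) - r_x(x(q+1))\bigr) + \frac{\lambda_{f,q+1} - \lambda_{f,q}}{2\,r_{x(q)}(x(q+1))}\bigl(r_x(x(q)) - r_x(x(q+1))\bigr).$$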
Here λ_{f,q} is the characteristic of the qth QTL in family f, and x(q) is the location of this QTL. The value r_x(x(q)) is the recombination rate between the loci situated at positions x and x(q). The origin of Equation 6 is similar to that of Equation 3 (for details see also Wang et al. 2007): let λ_{f,1}, ..., λ_{f,K} be the expectations of D-values for markers coinciding with the corresponding QTL. Assuming absence of interference, we can consider the expectation of D-values separately for each interval between QTL. For the two end intervals, x < x(1) and x > x(K), Equation 6 has the same form as Equation 3. For the other intervals, the absolute value of the expectation of D is reduced by the corresponding double recombination (double recombination is not a factor for the end intervals). The estimation criterion for the regression method takes the following form:
$$\min_{\substack{x(q),\ \lambda_{f,q}\\ f=1,\dots,F;\ q=1,\dots,K}}\ \sum_f \sum_m \frac{\bigl(D_{f,m} - E D_{f,m}\bigr)^2}{\operatorname{Var} D_{f,m}} \qquad (7)$$
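Equation 6 can be evaluated directly. The sketch below assumes the Haldane map for r and sorted QTL positions; `expected_d` is a hypothetical name. At x = x(q) it returns λ_{f,q}, and in the end intervals it reduces to the single-QTL form.

```python
import numpy as np


def haldane_r(d_cm):
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cm) / 100.0))


def expected_d(x, qtl_pos, lambdas):
    """E D_f(x) of Equation 6 for a marker at x (cM).

    qtl_pos: sorted QTL positions x(1) < ... < x(K); lambdas: lambda_{f,q}.
    """
    q = np.asarray(qtl_pos, float)
    lam = np.asarray(lambdas, float)
    if x <= q[0]:                                   # left end interval
        return lam[0] * (1.0 - 2.0 * haldane_r(x - q[0]))
    if x >= q[-1]:                                  # right end interval
        return lam[-1] * (1.0 - 2.0 * haldane_r(x - q[-1]))
    k = int(np.searchsorted(q, x, side="right")) - 1   # x in [x(k), x(k+1)]
    r1, r2 = haldane_r(x - q[k]), haldane_r(q[k + 1] - x)
    rq = haldane_r(q[k + 1] - q[k])
    a, b = lam[k], lam[k + 1]
    return ((a + b) / (2.0 * (1.0 - rq)) * (1.0 - r1 - r2)
            + (b - a) / (2.0 * rq) * (r1 - r2))
```

For QTL in repulsion with equal effects (λ = 0.1 and −0.1 at 30 and 80 cM), the expectation crosses zero at the midpoint, 55 cM.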
Fitting the model by using criterion (7) can be expressed in terms of the linear model

$$D_f = X_f \lambda_f + e_f,$$

where λ_f is the vector of λ_{f,1}, ..., λ_{f,K} and the coefficients of matrix X_f are equal to the corresponding multipliers in Equation 6. Taking into account the correlation between values of D for linked markers and using the generalized least-squares approach, the estimation criterion takes the form

$$\min_{\substack{x(q),\ \lambda_{f,q}\\ f=1,\dots,F;\ q=1,\dots,K}}\ \sum_f \bigl(C_f(D_f - X_f\lambda_f)\bigr)' G_f^{-1} C_f\,(D_f - X_f\lambda_f) \qquad (8)$$

Here matrices G and C are as in Equation 5a. For given putative QTL positions, the vector λ_f of parameters λ_{f,1}, ..., λ_{f,K} minimizing criterion (8) can be calculated as

$$\hat{\lambda}_f = \bigl(X_f' C_f' G_f^{-1} C_f X_f\bigr)^{-1} X_f' C_f' G_f^{-1} C_f D_f.$$
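For K = 2, the design matrix X_f and the generalized least-squares estimate can be sketched as follows. The Haldane map and the default identity matrices for C_f and G_f (which reduce the estimate to ordinary least squares) are our assumptions, since Equation 5a is not reproduced here; the function names are hypothetical.

```python
import numpy as np


def haldane_r(d_cm):
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cm) / 100.0))


def design_matrix_two_qtl(markers, q1, q2):
    """X_f for K = 2: multipliers of lambda_{f,1} and lambda_{f,2} in Equation 6."""
    big_r = haldane_r(q2 - q1)
    rows = []
    for x in markers:
        if x <= q1:
            rows.append([1.0 - 2.0 * haldane_r(x - q1), 0.0])
        elif x >= q2:
            rows.append([0.0, 1.0 - 2.0 * haldane_r(x - q2)])
        else:
            r1, r2 = haldane_r(x - q1), haldane_r(q2 - x)
            s = (1.0 - r1 - r2) / (2.0 * (1.0 - big_r))
            t = (r1 - r2) / (2.0 * big_r)
            rows.append([s - t, s + t])          # (a + b) s + (b - a) t
    return np.array(rows)


def gls_lambda(X, D, C=None, G=None):
    """lambda_hat = (X' C' G^-1 C X)^-1 X' C' G^-1 C D, solving instead of inverting."""
    X, D = np.asarray(X, float), np.asarray(D, float)
    C = np.eye(D.size) if C is None else np.asarray(C, float)
    G = np.eye(C.shape[0]) if G is None else np.asarray(G, float)
    CX, CD = C @ X, C @ D
    lhs = CX.T @ np.linalg.solve(G, CX)
    rhs = CX.T @ np.linalg.solve(G, CD)
    return np.linalg.solve(lhs, rhs)
```

Criterion (8) is then minimized over (x(1), x(2)) by evaluating `gls_lambda` on a grid of putative QTL positions.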
Even in the case of only two QTL on the chromosome, various situations can exist. These include heterozygosity of different sires for one, two, or none of the QTL and the linkage phase between the QTL (coupling vs. repulsion) in the sires that are heterozygous for both QTL. Thus, in addition to the foregoing tests of significance, the situation with linked QTL calls for comparisons of H2 vs. H1 (two-QTL vs. single-QTL hypotheses) for the entire data set as well as for each family. However, in this article we demonstrate only the potential of the FPD system to analyze linked QTL, leaving the detailed analysis of various scenarios for a future publication.
The example presented in Figure 4 is based on one simulated data set of 10 families. Each sire was simulated heterozygous for two linked QTL (half of the sires in coupling phase and half in repulsion phase) with allele substitution effects d/σ = 0.3 at locations 30 and 80 cM on a chromosome of length 120 cM with 13 evenly spaced markers (at positions 0, 10, 20, ..., 120 cM). Population size, proportion selected to the tails, and number of subpools per tail were as in Figure 1. After fitting a two-QTL model and using FPD analysis to
Figure 4.—Analysis with multiple-linked QTL. Ten families heterozygous for two linked QTL were simulated, 5 in coupling and 5 in repulsion phase. Thirteen markers were evenly spaced on a chromosome of length 120 cM. QTL 1 and QTL 2 were simulated at positions 30 and 80 cM, respectively. The allele substitution effect at both QTL in all 10 families was d/σ = 0.3. Alleles at QTL 1 and QTL 2 that came from dams were simulated as independent. The number of daughters per family was 2000; the proportion of the total population selected to each tail was 0.10. (a) D-values for all families and markers. Points corresponding to a given family are connected by a line. (b) λ-values and their standard errors in 500 jackknifes for every family. Clear separation is observed between the first five sires (QTL in coupling phase) and the last five sires (QTL in repulsion phase). (c) Simulated (solid circle) and estimated (open circle) positions of QTL. The curve encloses the area in which the position of QTL was estimated in ≥90% of 500 jackknifes [included are points with integer coordinates (x, y) such that in ≥5 jackknifes the estimated QTL positions belonged to the interval (x ± 0.5, y ± 0.5 cM)].
detect the two QTL, the estimated QTL positions were within 2 cM of the simulated positions. Standard errors in 500 jackknifes were 1.7 and 0.8 for QTL 1 and QTL 2, respectively. The high quality of the analysis is due to the large allele substitution effects at the two QTL and the relatively large map distance between them. More diverse sires with respect to their QTL structure (heterozygous at one, two, or none of the QTL) can also be treated with relative ease within the framework of the two-QTL FPD model.
General scheme of FPD QTL analysis: To conclude the analytical section, we present here a general scheme of the proposed system of FPD QTL analysis (Figure 5). The suggested integrative algorithm includes: (A) fitting the mapping model, (B) an overall test of significance (using λ_f-value-based models for conducting permutation tests), (C) detecting nonsignificant (QTL-homozygous) sires, (D) removing the homozygous sires and repeating the tests, (E) estimating QTL detection power, and (F) conducting jackknife analysis to evaluate the confidence interval for the estimated position of detected QTL. This scheme can be further extended to take into account the possibilities of multiple-linked-QTL analysis, including: fitting multiple-linked-QTL models; comparing multiple-linked and single-QTL models (testing H0 vs. H1 and H2, and H1 vs. H2); detecting sires heterozygous for zero, one, or multiple linked QTL; and estimating the confidence intervals of the chromosomal positions of the detected QTL.
Unknown marker linkage phase in the sire: In the case of unknown marker–QTL linkage phase (sire marker haplotypes), the algebraic sign of the statistic D_m is not uniquely defined. For markers with unknown phase, these signs (plus or minus) can be found through optimization of criteria (4), (5), (4a), (5a), (7), or (8) (with the minimum now taken over all possible combinations of signs). To make optimization in this case more effective, some heuristics can be used. For a single-QTL model in which marker phase in the sire is not known, it is reasonable to allocate the same sign (say, plus) to the D-values of all markers. For the model with two QTL on the chromosome, it is reasonable to consider D-values changing sign no more than once, e.g., positive for the first m markers and negative for the others (if the two QTL in the sire are in repulsion phase). Optimization of the signs of D-values can result in an increase in the false-positive declaration rate. Indeed, it can convert some families with noisy D-values fluctuating around zero into families with D-values of one sign. This can greatly increase |λ| and, hence, falsely cause a QTL-homozygous family to be declared heterozygous. Therefore, external information about the linkage phases of the marker loci reduces the proportion of false-positive families.
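The two-QTL sign heuristic can be sketched as an enumeration of the M + 1 patterns with at most one sign change, keeping the pattern with the smallest least-squares criterion. X is any design matrix of multipliers (such as those of Equation 6); using ordinary least squares instead of the full criteria (4) to (8) is a simplifying assumption, and the names are hypothetical. A global sign flip is absorbed by the signs of the λ-values, so patterns running from minus to plus need not be tried.

```python
import numpy as np


def sign_patterns(n_markers):
    """Signs with at most one change along the chromosome: + for the first k
    markers and - for the rest, k = 0..M (k = M gives the all-plus pattern)."""
    for k in range(n_markers + 1):
        yield np.concatenate([np.ones(k), -np.ones(n_markers - k)])


def best_signs(abs_d, X):
    """Pick the sign pattern minimizing the least-squares criterion for
    D = X lambda + e; returns (criterion, signs, lambda_hat)."""
    abs_d = np.asarray(abs_d, float)
    best = None
    for signs in sign_patterns(abs_d.size):
        d = signs * abs_d
        lam, *_ = np.linalg.lstsq(X, d, rcond=None)
        crit = float(np.sum((d - X @ lam) ** 2))
        if best is None or crit < best[0]:
            best = (crit, signs, lam)
    return best
```

As noted above, this search can also force one-signed patterns onto noisy D-values of a QTL-homozygous family, so its output should be interpreted together with the permutation test.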
Choosing the number of subpools: The multiple-pool approach was previously proposed as a means of improving the quality of allele frequency estimates (Sham et al. 2002; Brohede et al. 2005). Within this framework, the problem of "optimal size" of pools was primarily considered from the aspect of amplification fidelity (Brohede et al. 2005) and as a way to obtain an adequate estimate of the variation of marker allele frequencies, Var D_{f,m} (e.g., Sham et al. 2002). In the present study, the number of pools affects the number of possible different permutations and jackknifes and hence affects the P-values and power of the analysis.
To demonstrate the dependence of analysis quality on the number of subpools per tail, a series of simulation experiments was conducted. Situations with one, three, and five families were simulated. The proportion of individuals taken to the tails was 0.10, as in the previous simulations. The individuals in the tails were then randomly subdivided into four, six, or eight subpools of equal size. The family sizes were 960 and 1920. As above, a chromosome of 120 cM length with 13 evenly spaced markers was assumed, and the QTL was simulated at position 40 cM with allele substitution effects d/σ = 0.3, 0.2, and 0.15. For each parameter combination, 10 Monte Carlo data sets were simulated; for every set, 1000 permutations and 100 jackknife iterations were made (with exactly one pool per tail per family being excluded in each jackknife run). The results are summarized in Table 2.
It was found that a higher number of subpools does not reduce the standard error of estimated QTL location if the percentage of excluded pools is the same in each jackknife iteration (not shown). However, if in each jackknife iteration exactly one pool per tail is excluded, SD and confidence intervals become smaller with a higher number of subpools (Table 2) but less robust (i.e., the sampling variances of the confidence interval center and of its size are higher), because different runs
Figure 5.—The general scheme of QTL analysis by the FPD
method.
are more dependent. This can explain why the value D does not always decrease with an increasing number of subpools S. In contrast, P-values decreased asymptotically with the number of subpools until some limit determined by the QTL allele substitution effect, number and proportion of QTL-polymorphic families, number of daughters per family, proportion of daughters taken to each tail, number and positions of markers on the chromosome, and technical error of densitometric estimation of pool frequencies. Results summarized in Table 2 demonstrate the variation of P-value and power of the analysis that can be achieved in different situations. As expected, better results were obtained in situations with a greater number of families, a greater number of progeny per family, and a greater allele substitution effect d/σ of the QTL. The unexpectedly smaller D and SD for the one-family situation in the case of d/σ = 0.15 (compared to d/σ = 0.2) can be explained by a shortcoming of criterion (5a): in the absence of a QTL effect, or when it is very small, the difference in the criterion values for different x(q) is very small, and the smallest value tends to be observed for x(q) close to the average marker position (60 cM in our situation). In other words, under H0 the estimated position is not uniformly distributed along the chromosome (not shown). Note that the lowest possible P-value in a permutation test is equal to 1/R, where R is the number of different permutations. If we are "satisfied" with P-values ≥ a, then no more than 5/a different permutations are needed. Hence, in the case of only one family we need

$$S \approx \log_4 R + 1.5 = \log_4(5/a) + 1.5$$

subpools. For the experimentwise permutation test in F similar families we need

$$S = \log_4\bigl(R^{1/F}\bigr) + 1.5 = \frac{1}{F}\log_4(5/a) + 1.5$$

subpools per tail, per family. Thus, from the point of view of maximizing the number of different permutations, it is more effective to analyze more families than to make more subpools per family. The relative cost of additional families, subpools, and markers, together with the desired QTL detection power and mapping accuracy, defines a cost-effective strategy for the initial genome scan for QTL by FPD. Clearly, the above aspects of amplification fidelity and estimation of variation of marker allele frequencies considered by Brohede et al. (2005), Sham et al. (2002), and other authors should also be an important part of designing FPD experiments.
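The rule for the number of subpools can be evaluated as follows. Rounding up to a whole number of subpools is our choice, and the approximation 4^(S−1.5) behind the formula is itself rough (for S = 6, 0.5·C(12, 6) = 462 rather than 512); the function name is hypothetical.

```python
import math


def subpools_needed(a, families=1):
    """Subpools per tail per family so that about 5/a permutations exist:
    S = (1/F) * log4(5/a) + 1.5, rounded up to a whole number."""
    return math.ceil(math.log(5.0 / a, 4) / families + 1.5)


for f in (1, 3, 5):
    print(f, subpools_needed(0.01, f))
```

For a = 0.01 this gives six subpools per tail for a single family but only three per tail when five families are analyzed jointly, which illustrates why adding families is the more effective route.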
Correlations between D-values and quality of the analysis: Taking into account correlations between D-values for linked markers, i.e., using a generalized least-squares method (Equations 5, 5a, and 8), will probably not increase the QTL detection power and accuracy of the QTL position estimates in the majority of practical situations. When substitution effects, number of daughters per family, and number of families are small, the sampling variance of D_m is high relative to its expected value. Taking the correlations into account will increase the sampling variance and reduce the expected value for each marker (Montgomery and Peck 1992). This makes the analysis less robust. The least-squares optimization criterion, when H0 is true, follows a χ² distribution
TABLE 2
Effect of number of subpools per tail (S) under the FPD on characteristics D and SD of the confidence interval for QTL location, comparisonwise error rate (P-value), and statistical power, according to number of families (F), number of daughters per family (N), and standardized allele substitution effect at the QTL (d/σ), using simulated data

                     D                    SD                   P-value                   Power at P = 0.05 (%)
F   N      d/σ    S=4    S=6    S=8    S=4    S=6    S=8    S=4     S=6     S=8     S=4    S=6    S=8
1   1920   0.3    10.4   10.1   10.3   4.7    4.0    3.5    0.056   0.007   0.005   -      79     89
           0.2    14.1   14.0   13.7   14.6   12.0   10.7   0.083   0.043   0.028   -      32     56
           0.15   11.0   10.3   11.1   11.6   8.8    6.7    0.156   0.135   0.110   -      -      -
3   960    0.3    4.7    3.8    3.5    5.4    4.9    3.2    0.003   0.003   0.002   59     89     94
           0.2    10.3   12.1   12.0   11.4   7.1    6.9    0.030   0.021   0.023   30     52     52
           0.15   14.6   14.7   14.9   19.2   15.7   13.2   0.195   0.203   0.208   -      -      -
3   1920   0.3    2.9    3.1    3.2    1.9    1.5    1.2    0.001   0.001   0.001   94     99     99
           0.2    5.7    5.2    5.4    4.4    3.4    3.0    0.003   0.002   0.003   56     82     92
           0.15   10.7   10.1   10.1   6.9    5.6    5.1    0.028   0.024   0.011   46     72     76
5   960    0.3    2.7    2.9    2.9    2.6    1.8    1.6    0.001   0.001   0.001   87     99     99
           0.2    5.7    5.8    5.4    5.3    4.3    3.3    0.013   0.009   0.006   44     63     71
           0.15   14.3   15.1   14.9   12.0   9.6    8.4    0.081   0.070   0.067   -      -      -
P-values and power were calculated using the permutation test based on Σ(λ_f)² (see text). Power was calculated for the threshold of the statistic corresponding to P-value = 0.05 (shown only for situations in which the observed experimentwise P-value did not exceed 0.05; "-" otherwise). Characteristics D and SD of the confidence interval for QTL location were obtained from the jackknife iterations. Parameters of the simulations: chromosome length 120 cM. A single QTL was situated at position 40 cM. Number of markers M = 13. Proportion of population selected to each tail, 0.10. One subpool per tail was excluded in each jackknife. Values represent the mean of 10 simulation data sets; for every data set, 1000 permutations of subpools and 100 jackknife iterations were made to estimate P-value, power, D, and SD.
Fractioned DNA Pooling2621
with degrees of freedom equal to the number of terms in the sum. Parameters minimizing this criterion also maximize the likelihood function, but the difference between the criterion values for different putative QTL positions is small (not shown). Nevertheless, by taking the correlations into account, we narrow the confidence interval and reduce the discrepancy between the estimated and simulated QTL positions (data not shown).
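The criterion discussed above is a sum of squared standardized terms that is χ²-distributed under H0, and accounting for correlations among the terms changes its value. A minimal Python sketch, assuming the terms form a deviation vector r with covariance matrix V so that the criterion is the generalized least-squares form Q = rᵀV⁻¹r (which reduces to an ordinary sum of squared z-scores when V is diagonal), computes Q with the standard library and evaluates the upper-tail χ² probability for integer degrees of freedom. The names `gls_criterion`, `chi2_sf`, and `solve` are hypothetical, and the exact terms of the paper's criterion are defined in its methods section.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gls_criterion(resid, cov):
    """Q = r' V^{-1} r; equals sum(r_i**2 / V_ii) when V is diagonal."""
    w = solve(cov, resid)
    return sum(r * wi for r, wi in zip(resid, w))


def chi2_sf(q, k):
    """Upper-tail probability P(chi2_k > q) for integer k >= 1."""
    x = q / 2.0
    if k % 2 == 0:
        # Closed form of the regularized upper incomplete gamma Q(k/2, x).
        return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(k // 2))
    # Odd k: start from Q(1/2, x) = erfc(sqrt(x)) and apply the recurrence.
    tail = math.erfc(math.sqrt(x))
    tail += math.exp(-x) * sum(
        x ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (k + 1) // 2)
    )
    return tail
```

For example, with two positively correlated terms r = (1.5, 1.2) and correlation 0.6, ignoring the correlation gives Q = 3.69, whereas the correlation-aware form gives Q ≈ 2.39. Treating correlated terms as independent can therefore overstate the evidence.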
DISCUSSION AND PROSPECTS
Genome-wide scans for the detection of marker–QTL linkage or linkage disequilibrium for QTL of small effect require large mapping populations and hence involve a high cost of marker genotyping. The population-size requirements for accurate QTL mapping are even more challenging: in family-based analysis, the confidence intervals for the estimated chromosomal position of a QTL span tens of centimorgans even for QTL of moderate effect (Darvasi and Soller 1997; Ronin et al. 2003). A cost-effective solution is to replace individual genotyping with DNA analysis of pools formed from individuals in the tails of the trait distribution (Hillel et al. 1990; Darvasi and Soller 1994) or from alternative phenotypic groups in the case of discontinuous variation (Giovannoni et al. 1991; Michelmore et al. 1991). To increase the fidelity of pooling analysis, Dekkers (2000) proposed a method for the joint treatment of multiple markers that scans a chromosome with a sliding window (see also Johnson 2005 for further developments in LD-based QTL analysis).
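The basic single-marker comparison behind selective DNA pooling contrasts allele frequencies between the two tail pools. A minimal Python sketch follows. It assumes a half-sib daughter design (as in Table 2), in which each daughter carries one of the two sire alleles, so that the expected sire-allele frequency is 0.5 in both tails under no linkage. It uses a normal approximation and treats technical pooling error as an optional, user-supplied variance. The function name and the `tech_var` parameter are hypothetical and are not the estimators of the works cited above.

```python
import math


def tail_frequency_test(p_upper, p_lower, n_upper, n_lower, tech_var=0.0):
    """Two-sided z-test for the sire-allele frequency difference between tail pools.

    Under H0 each daughter receives either sire allele with probability 0.5,
    so the binomial sampling variance of a pool frequency is 0.25 / n.
    `tech_var` is an assumed per-pool technical (measurement) error variance.
    """
    d = p_upper - p_lower
    var = 0.25 / n_upper + 0.25 / n_lower + 2.0 * tech_var
    z = d / math.sqrt(var)
    # P(|Z| > |z|) for a standard normal Z.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return d, z, p_two_sided


if __name__ == "__main__":
    # Hypothetical: 960 daughters, 10% per tail -> 96 daughters in each pool.
    print(tail_frequency_test(0.60, 0.45, 96, 96))
```

Adding technical error variance lowers z for the same observed difference. This illustrates why pool-measurement error erodes the savings of pooling unless it is controlled.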
Although the idea of using a multiple-pool design has been discussed previously (Sham et al. 2002; Brohede et al. 2005), the objective of those studies was to improve the quality of the allele-frequency estimates and their variances. Beyond these uses, the proposed FPD system exploits the multiple-pool design to provide a wide spectrum of new analytical options that were previously possible only with individual genotyping. These new options are of special importance in light of accumulating evidence on the reliability of pooling analysis with SNP chips. Combining SNP microarray analysis with DNA pooling can dramatically reduce the cost of screening large numbers of SNPs on large samples, making chip technology applicable to genome-wide association mapping in humans and farm animals (Butcher et al. 2004; Brohede et al. 2005; Craig et al. 2005). The FPD analysis relaxes some of the previous limitations of pooling analysis by using the information provided by multiple subpools within tails. This allows a flexible analytical system for QTL detection based on resampling procedures (permutations, bootstraps, and jackknifes) rather than on asymptotic assumptions (Sham et al. 2002; Carleos et al. 2003), enabling evaluation of the confidence interval of QTL position and discrimination between different hypotheses of trait genetic architecture.
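Because FPD yields several independent subpools per tail, the null distribution of a test statistic can be obtained by permuting subpool labels between tails instead of relying on asymptotic theory. The following minimal Python sketch rests on several assumptions. Data are given as (upper, lower) lists of subpool frequencies per family. The statistic is a hypothetical stand-in: the sum over families of squared differences between mean tail frequencies. The paper's own statistic is defined in the main text. Labels are shuffled independently within each family.

```python
import random
from statistics import mean


def sum_sq_diff(families):
    """Hypothetical stand-in statistic: sum over families of the squared
    difference between mean upper- and lower-tail subpool frequencies."""
    return sum((mean(up) - mean(lo)) ** 2 for up, lo in families)


def permutation_pvalue(families, n_perm=1000, seed=1):
    """Permutation P-value obtained by shuffling subpools between tails.

    `families` is a list of (upper_subpools, lower_subpools) pairs. Labels
    are permuted within each family, so each family keeps its own set of
    subpool values under the null hypothesis of no marker-QTL linkage.
    """
    rng = random.Random(seed)
    observed = sum_sq_diff(families)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for up, lo in families:
            pooled = list(up) + list(lo)
            rng.shuffle(pooled)
            permuted.append((pooled[:len(up)], pooled[len(up):]))
        if sum_sq_diff(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

One property of this design is worth noting. With a single family and S = 4 subpools per tail there are only C(8, 4) = 70 distinct splits. Because the squared difference is unchanged when the two tails are swapped, no permutation P-value can fall much below 2/70 ≈ 0.029. Very small S therefore limits the attainable significance level.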
Allowing for resampling analysis via the FPD does come at the cost of requiring multiple subpools per tail. When multiple traits are analyzed, individuals must be separated into subpools in the tails of the distribution of every trait. In such situations the number of subpools may approach the number of individuals in the mapping population (if the traits are not strongly correlated), thereby reducing the advantage of the pooling method. A partial solution to this problem could be provided by using multivariate tails of the multidimensional trait distribution rather than trait-specific tails (Ronin et al. 1998). Another disadvantage is that this method only partially utilizes haplotype information compared to individual selective genotyping.
The proposed methodology allows joint analysis of multiple families and multiple markers across a chromosome, even if the markers are only partly shared (or not shared at all) among families. Resampling procedures permit confidence intervals to be constructed for family-specific λ values. These intervals allow identification of families in which the founder sire was homozygous at the QTL. The FPD analysis also permits extension to cases of two or more QTL on the same chromosome. Together, these features provide cost-effective options for sequential family- and region-specific increases in marker density, improving QTL mapping resolution and accuracy and reducing type I (false-positive) and type II (false-negative) errors. Of special interest is the extension of pooling methodology to genome expression analysis (Alba et al. 2004; Kendziorski et al. 2005). The cautious optimism these authors expressed about pooling RNA can be considered as justifying the extension of the FPD to RNA analysis.
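One way to obtain the family-specific confidence intervals mentioned above is a percentile bootstrap over subpools. The following minimal Python sketch uses a simple family-specific quantity, the upper-minus-lower difference in mean sire-allele frequency, as a hypothetical stand-in for the paper's λ values. It resamples subpools with replacement within each tail. A family is flagged when its interval contains zero. Such a flag is consistent with a sire homozygous at the QTL, though an effect too small to detect would produce the same result. The function names, the 95% level, and the number of bootstrap replicates are assumptions.

```python
import random
from statistics import mean


def bootstrap_ci(upper, lower, n_boot=2000, level=0.95, seed=7):
    """Percentile bootstrap CI for mean(upper) - mean(lower).

    Subpools are resampled with replacement separately within each tail.
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        up = rng.choices(upper, k=len(upper))
        lo = rng.choices(lower, k=len(lower))
        reps.append(mean(up) - mean(lo))
    reps.sort()
    alpha = (1.0 - level) / 2.0
    lo_idx = round(alpha * (n_boot - 1))
    hi_idx = round((1.0 - alpha) * (n_boot - 1))
    return reps[lo_idx], reps[hi_idx]


def flag_families(families, **kwargs):
    """Map family id -> (CI, covers_zero).

    covers_zero=True means the family shows no detectable QTL segregation,
    as expected if the sire is homozygous at the QTL.
    """
    result = {}
    for fid, (upper, lower) in families.items():
        lo, hi = bootstrap_ci(upper, lower, **kwargs)
        result[fid] = ((lo, hi), lo <= 0.0 <= hi)
    return result
```

With only a handful of subpools per tail, percentile bootstrap intervals can be too narrow. The jackknife or permutation schemes discussed above are alternatives when S is small.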
The major advantage of population-based rather than family-based mapping is its potential for fine and ultrafine mapping, owing to the accumulation of historical recombination events. Recent findings on the existence of linkage disequilibrium blocks, and estimates of the sizes of these blocks, establish a basis for LD (association) mapping. Still, for loci with small to moderate effects on the target traits, one of the major limiting factors is the size of the effect and not the degree of recombination (diversity of haplotypes). Consequently, very large sample sizes are required, which makes pooling analysis extremely attractive. We therefore plan to extend the fractioned pooling design to LD-based QTL analysis.
We thank J. Dekkers for very constructive criticism and helpful suggestions. This research was supported in part by grant QLK5-CT-2001-02379 (BovMAS project) under the European Union FP5 program and by a Ph.D. fellowship from the University of Haifa to Z. F.
LITERATURE CITED
Alba, R., Z. J. Fei, P. Payton, Y. Liu, S. L. Moore et al., 2004 ESTs, cDNA microarrays, and gene expression profiling: tools for dissecting plant physiology and development. Plant J. 39: 697–714.
2622A. Korol et al.
Benjamini, Y., and Y. Hochberg, 1995 Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57: 289–300.
Brohede, J., R. Dunne, J. D. Mckay and G. N. Hannan, 2005 PPC: an algorithm for accurate estimation of SNP allele frequencies in small equimolar pools of DNA using data from high density microarrays. Nucleic Acids Res. 33: e142.
Butcher, L. M., E. Meaburn, L. Liu, C. Fernandes, L. Hill et al., 2004 Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav. Genet. 34: 549–555.
Carleos, C., J. A. Baro, J. Canon and N. Corral, 2003 Asymptotic variances of QTL estimators with selective DNA pooling. J. Hered. 94: 175–179.
Craig, D. W., M. J. Huentelman, D. Hu-Lince, V. L. Zismann, M. C. Kruer et al., 2005 Identification of disease causing loci using an array-based genotyping approach on pooled DNA. BMC Genomics 6: 138.
Darvasi, A., and M. Soller, 1992 Selective genotyping for determination of linkage between a marker locus and a quantitative trait locus. Theor. Appl. Genet. 85: 353–359.
Darvasi, A., and M. Soller, 1994 Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics 138: 1365–1373.
Darvasi, A., and M. Soller, 1997 A simple method to calculate resolving power and confidence interval of QTL map location. Behav. Genet. 27: 125–132.
Dekkers, J. C. M., 2000 Quantitative trait locus mapping based on selective DNA pooling. Anim. Breed. Genet. 117: 1–16.
Doerge, R. W., and G. A. Churchill, 1996 Permutation tests for multiple loci affecting a quantitative character. Genetics 142: 285–294.
Dunnington, E. A., A. Haberfeld, L. C. Stallard, P. B. Siegel and J. Hillel, 1992 Deoxyribonucleic acid fingerprint bands linked to loci coding for quantitative traits in chickens. Poult. Sci. 71: 1251–1258.
Fernando, R. L., D. Nettleton, B. R. Southey, J. C. Dekkers, M. F. Rothschild et al., 2004 Controlling the proportion of false positives in multiple dependent tests. Genetics 166: 611–619.
Giovannoni, J. J., R. A. Wing, M. W. Ganal and S. D. Tanksley, 1991 Isolation of molecular markers from specific chromosomal intervals using DNA pools from existing mapping populations. Nucleic Acids Res. 19: 6553–6558.
Hillel, J., R. Avner, C. Baxter-Jones, E. A. Dunnington, A. Cahaner et al., 1990 DNA fingerprints from blood mixes in chickens and turkeys. Anim. Biotechnol. 2: 201–204.
Johnson, T., 2005 Multipoint linkage disequilibrium mapping using multilocus allele frequency data. Ann. Hum. Genet. 69: 474–497.
Kearsey, M. J., 1998 The principles of QTL analysis (a minimal mathematics approach). J. Exp. Bot. 49: 1619–1623.
Kendziorski, C., R. A. Irizarry, K. S. Chen, J. D. Haag and M. N. Gould, 2005 On the utility of pooling biological samples in microarray experiments. Proc. Natl. Acad. Sci. USA 102: 4252–4257.
Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–194.
Lipkin, E., M. O. Mosig, A. Darvasi, E. Ezra, A. Shalom et al., 1998 Quantitative trait locus mapping in dairy cattle by means of selective milk DNA pooling using dinucleotide microsatellite markers: analysis of milk protein percentage. Genetics 149: 1557–1567.
Michelmore, R. W., I. Paran and R. V. Kesseli, 1991 Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc. Natl. Acad. Sci. USA 88: 9828–9832.
Montgomery, D. C., and E. A. Peck, 1992 Introduction to Linear Regression Analysis, Ed. 2. John Wiley & Sons, New York.
Mosig, M. O., E. Lipkin, G. Khutoreskaya, E. Tchourzyna, M. Soller et al., 2001 A whole genome scan for quantitative trait loci affecting milk protein percentage in Israeli-Holstein cattle, by means of selective milk DNA pooling in a daughter design, using an adjusted false discovery rate criterion. Genetics 157: 1683–1698.
Plotsky, Y., A. Cahaner, A. Haberfeld, U. Lavi, S. J. Lamont et al., 1993 DNA fingerprint bands applied to linkage analysis with quantitative trait loci in chickens. Anim. Genet. 24: 105–110.
Ronin, Y., A. Korol, M. Shtemberg, E. Nevo and M. Soller, 2003 High-resolution mapping of quantitative trait loci by selective recombinant genotyping. Genetics 164: 1657–1666.
Ronin, Y. I., A. B. Korol and J. I. Weller, 1998 Selective genotyping to detect quantitative trait loci affecting multiple traits: interval mapping analysis. Theor. Appl. Genet. 97: 1169–1178.
Ronin, Y. I., A. B. Korol and E. Nevo, 1999 Single- and multiple-trait mapping analysis of linked quantitative trait loci: some asymptotic analytical approximations. Genetics 151: 387–396.
Schnack, H. G., S. C. Bakker, R. Van't Slot, B. M. Groot, R. J. Sinke et al., 2004 Accurate determination of microsatellite allele frequencies in pooled DNA samples. Eur. J. Hum. Genet. 12: 925–934.
Sham, P., J. S. Bader, I. Craig, M. O'Donovan and M. Owen, 2002 DNA pooling: a tool for large-scale association studies. Nat. Rev. Genet. 3: 862–871.
Tamiya, G., M. Shinya, T. Imanishi, T. Ikuta, S. Makino et al., 2005 Whole genome association study of rheumatoid arthritis using 27 039 microsatellites. Hum. Mol. Genet. 14: 2305–2321.
Visscher, P. M., and S. Le Hellard, 2003 Simple method to analyze SNP-based association studies using DNA pools. Genet. Epidemiol. 24: 291–296.
Wang, J., K. J. Koehler and J. C. M. Dekkers, 2007 Interval mapping of quantitative trait loci with selective DNA pooling data. Genet. Sel. Evol. (in press).
Weller, J. I., Y. Kashi and M. Soller, 1990 Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J. Dairy Sci. 73: 2525–2537.
Zou, G. H., and H. Y. Zhao, 2004 The impacts of errors in individual genotyping and DNA pooling on association studies. Genet. Epidemiol. 26: 1–10.
Zou, G. H., and H. Y. Zhao, 2005 Family-based association tests for different family structures using pooled DNA. Ann. Hum. Genet. 69: 429–442.
Communicating editor: M. W. Feldman