Recombination rates in admixed individuals identified by ancestry-based inference

Article · July 2011
DOI: 10.1038/ng.894 · Source: PubMed
Abstract
Studies of recombination and how it varies depend crucially on accurate recombination maps. We propose a new approach for constructing high-resolution maps of relative recombination rates based on the observation of ancestry switch points among admixed individuals. We show the utility of this approach using simulations and by applying it to SNP genotype data from a sample of 2,565 African Americans and 299 African Caribbeans and detecting several hundred thousand recombination events. Comparison of the inferred map with high-resolution maps from non-admixed populations provides evidence of fine-scale differentiation in recombination rates between populations. Overall, the admixed map is well predicted by the average proportion of admixture and the recombination rate estimates from the source populations. The exceptions to this are in areas surrounding known large chromosomal structural variants, specifically inversions. These results suggest that outside of structurally variable regions, admixture does not substantially disrupt the factors controlling recombination rates in humans.
© 2011 Nature America, Inc. All rights reserved.
Nature Genetics ADVANCE ONLINE PUBLICATION
ARTICLES
The extent to which patterns of recombination vary across human
populations remains uncertain. Increasing evidence has suggested a
high concordance between populations in large-scale recombination
rates and more variation between populations in small-scale recom-
bination rates1–5. The lack of high-resolution genome-wide recom-
bination maps for admixed individuals, such as African Americans,
has limited the possibility of incorporating admixed populations in
comparative analyses of recombination rates. The development of new
genome-wide recombination maps is therefore an essential step for
understanding recombination in admixed populations and enabling
broader comparative analyses.
Generating new recombination maps has traditionally depended
on observations of recombination events in pedigrees6. Large-scale
applications of this approach have been limited to a few samples
of European descent with unusually detailed genealogic data, such
as samples from Iceland7,8, Mormons from Utah9 and Hutterites10.
For example, a recombination map based on inferences from about
15,000 meioses in the Icelandic pedigree genotyped with nearly
300,000 SNPs achieved a resolution of recombination rate varia-
tion down to the 10-kb scale8. In contrast, for non-European and
admixed populations, such as African Americans, the best available
pedigree-based maps use many fewer meioses and ~1,000 or fewer
microsatellites11,12.
Assessment of linkage disequilibrium (LD), or the non-random
association of alleles on chromosomes, in unrelated individuals pro-
vides a second, more indirect means for inferring recombination
rates in a population. The advent of high-density, genome-wide SNP
data has enabled LD-based maps to achieve a resolution of about
1 kb13,14 and has shown that recombination rates at such fine scales
are dominated by recombination hotspots. Using LD-based maps in
analyses of short target regions1–4 and genome-wide SNP data5, com-
parisons between populations have documented some variation in
small-scale recombination rates but very little variation in large-scale
recombination rates. LD-based maps, however, conflate the effec-
tive population size and recombination rates, which complicates the
interpretation of inter-population variation in recombination6. This
conflation of the effective population size and recombination rate is
particularly problematic in regions where recent natural selection
has reduced the effective population size6,15. In addition, care must
be taken when applying LD-based approaches to recently admixed
Daniel Wegmann1, Darren E Kessner2, Krishna R Veeramah1, Rasika A Mathias3–5, Dan L Nicolae6–9,
Lisa R Yanek3,4, Yan V Sun10–12, Dara G Torgerson8,9,13, Nicholas Rafaels5,14, Thomas Mosley11,15,
Lewis C Becker3,4, Ingo Ruczinski5,14, Terri H Beaty5,16, Sharon L R Kardia10,11, Deborah A Meyers13,17,
Kathleen C Barnes3,5, Diane M Becker3,4, Nelson B Freimer18 & John Novembre1,2
1Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, USA. 2Interdepartmental Program in Bioinformatics, University of California,
Los Angeles, California, USA. 3Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA. 4For the Genetic Study of Atherosclerosis Risk (GeneSTAR)
consortium. 5For the Genetic Research on Asthma in the African Diaspora (GRAAD) consortium. 6Department of Medicine, University of Chicago, Chicago, Illinois, USA.
7Department of Statistics, University of Chicago, Chicago, Illinois, USA. 8Department of Human Genetics, University of Chicago, Chicago, Illinois, USA. 9For the Chicago
Asthma Genetics (CAG) and Collaborative Study on the Genetics of Asthma (CSGA) consortium. 10Department of Epidemiology, University of Michigan, Ann Arbor, Michigan,
USA. 11For the Genetic Epidemiology Network of Arteriopathy (GENOA) consortium. 12Department of Epidemiology, Emory University, Atlanta, Georgia, USA. 13For the Severe
Asthma Research Program (SARP) consortium. 14Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA. 15Department of
Medicine, University of Mississippi, Jackson, Mississippi, USA. 16Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.
17Center for Human Genomics, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA. 18Center for Neurobehavioral Genetics, Semel Institute for
Neuroscience and Human Behavior, University of California, Los Angeles, California, USA. Correspondence should be addressed to J.N. (jnovembre@ucla.edu).
Received 4 February; accepted 1 July; published online 20 July 2011; doi:10.1038/ng.894
populations because these methods are based on population genetic
models of populations at demographic equilibrium1,16.
To address the need for genome-wide recombination maps in
admixed samples, we report here an ancestry-switch–based method
for constructing high-resolution genome-wide recombination maps.
We used this method to infer a recombination map from genotypes
at >570,000 SNPs in 2,864 admixed African-American and African
Caribbean individuals. Because of the levels of admixture in this
sample, we observed approximately 90 ancestry switch points per
individual, each of which indicates the location of a recombination
event in the history of the sample; thus, our map is based on roughly
250,000 unique recombination events. With the inferred map, we
investigated whether there is evidence for population differentiation
in recombination rates and to what extent admixture has a global and
local effect on recombination patterns.
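As a rough consistency check on the two figures quoted above, the total follows from the sample size and the per-individual count. The product below is our own arithmetic on the rounded values in the text, not a number reported separately by the study:

```latex
\[
2{,}864 \ \text{individuals} \times {\sim}90 \ \text{switch points per individual}
\approx 2.6 \times 10^{5} \ \text{switch points},
\]
```

which is in line with the stated total of roughly 250,000 events.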
RESULTS
Recently admixed individuals derive their ancestors from two or more
diverged populations, and thus, their chromosomes are mosaics of
segments with different origins (Fig. 1). Switch points in ancestry
along a chromosome mark locations where a recombination event
occurred between ancestral chromosomes of different origins. In
principle, ancestry switch-point events will be a random sample of
all recombination events, and by tallying the location of such events
across a large number of individuals, we can infer relative rates of
recombination across the genome.
Our approach for identifying the locations of ancestry switch points
is based on a previously developed Hidden Markov model (HMM) for
admixture that matches chromosomal segments of admixed individuals
to reference haplotypes from the ancestral populations17. To account
for uncertainty in the locations of ancestry switch points, we imple-
mented an algorithm to compute the probability of an ancestry switch
between two markers conditional on an individual’s genotype data,
and we based our inferences on these probabilities (Online Methods).
Moreover, to pool evidence for recombination across individuals, we
developed an empirical Bayes approach. Our method produced two
estimators: (i) the individual-based estimator, $c_{jk}^{(i)}$, of the
number of switches between positions j and k in individual i; and (ii) the
sample-wide estimator, $r_{jk}$, of the number of ancestry switch events in
the history of the admixed sample between positions j and k.
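The following is a minimal sketch of how per-interval switch probabilities could be tallied into an individual-level count and a pooled relative map. It assumes that each adjacent-marker interval carries a posterior switch probability and that these can simply be summed; the function names and the toy array are hypothetical, and the empirical Bayes estimator described in the Online Methods is not reproduced here.

```python
import numpy as np

def expected_switches(switch_prob, j, k):
    """Estimated number of switches between markers j and k (j < k) in one individual.

    switch_prob[m] is the (assumed) posterior probability of an ancestry
    switch between marker m and marker m + 1.
    """
    return float(np.sum(switch_prob[j:k]))

def naive_relative_map(switch_probs):
    """Sum per-interval probabilities over individuals and scale them to sum to 1.

    This is a plain tally, not the paper's empirical Bayes estimator.
    """
    totals = np.sum(switch_probs, axis=0)
    return totals / totals.sum()

# Toy data (hypothetical): 2 individuals, 5 intervals between 6 markers.
probs = np.array([
    [0.0, 0.1, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.0],
])
c_ind0 = expected_switches(probs[0], 1, 4)   # sums intervals 1, 2 and 3
relative_map = naive_relative_map(probs)
```

In this toy case the first individual's evidence is spread over three intervals, so only the wider interval from marker 1 to marker 4 recovers a count close to one.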
Validation of the approach using simulations
To investigate the resolution of this ancestry-switch approach, we
tested our methods using a series of simulations of a simple model of
African-American admixture with identical sample sizes and marker
density to those found in our study sample. An example of the inferred
number of ancestry switches from a random simulated segment from
one individual is shown in Figure 2a.
To assess specificity, we investigated 50,000 randomly chosen locations
more than 1 Mb away from the nearest switch point (Fig. 2b, red line).
For more than 95% of those 1-Mb windows, the value of $c_{jk}^{(i)}$ fell
below 0.025, suggesting that the method produces little false evidence for
ancestry switches where there are none. To assess sensitivity, we computed
$c_{jk}^{(i)}$ for symmetric intervals around isolated ancestry switch
points (Fig. 2b, black line). If well calibrated, the method should find
values of $c_{jk}^{(i)}$ equal to 1 for these intervals. For intervals of
1 Mb around true switch points, we found the median $c_{jk}^{(i)}$ to be
approximately 1, and when we investigated at what scale the median
$c_{jk}^{(i)}$ = 0.85, we found it to be at roughly the 200-kb scale
(Fig. 2b). These results suggest that our method resolves single switch
points fairly well at the 200-kb scale and above.
[Figure 1 graphic. Labels: ancestors of variable ancestry; sampled admixed
individual; haplotypes from population 1; haplotypes from population 2;
example genotypes A/G, G/G, A/T, C/C, A/C, T/G. Panels a and b.]
Figure 1 Sketch of the haplotype-copying Hidden Markov model used
to detect ancestry switch points. (a) Yellow and blue represent the
chromosomal segments of different ancestry and the shades of each color
represent different haplotypes from each ancestry. Recombination creates
a mosaic of haplotypes regardless of origin but recombination events
between haplotypes of different ancestries leave signatures that can be
detected in descendant, admixed individuals. (b) The genotypes observed
for such an individual form observed states of a Hidden Markov model in
which underlying states are based on which haplotypes from a reference
population each allele of the genotype is copied.
Figure 2 Sensitivity and specificity of inference. (a) Estimated number of
switches ($c_{jk}^{(i)}$) between neighboring SNPs obtained for a simulated
individual with two ancestry switches (vertical dashed lines). Below, the
comparison at the 50-kb scale of the estimated rates ($r_{jk}$) and the
underlying recombination map used to perform the simulations for this
segment. Both maps are normalized to the same total rate. (b) The inferred
number of switch points ($c_{jk}^{(i)}$) as a function of the size of the
interval between locations j and k. The black line represents the median
for symmetric intervals around a single, isolated switch point. The red
line represents the median for intervals with zero simulated switch points
and which are located at least 1 Mb away from the closest switch point.
Dashed lines mark the 2.5% and 97.5% quantiles. (c) Comparison of the
inferred rates ($r_{jk}$) with the true rates across all segments at 10-kb
(blue), 50-kb (orange) and 1-Mb (red) scales. The 2.5% and 97.5% quantiles
are shown with dashed lines. All maps have been normalized to the same
total rate for comparison.
[Figure 2 graphic. (a) Inferred number of ancestry switches and, below,
true and inferred recombination rate (cM/Mb) against position on the
simulated segment (Mb). (b) Inferred number of ancestry switches against
interval size (1 kb to 1 Mb). (c) Inferred against true recombination
rates at the 10-kb, 50-kb and 1-Mb scales.]
For switch points close to the ends of our simulated segments, we found a
consistent downward bias in the values of $c_{jk}^{(i)}$ (Supplementary
Fig. 1). This bias was to be expected, as it is only through analysis of
several consecutive markers that evidence for a switch point can be
derived. We thus did not attempt to infer recombination rates within 5 Mb
of chromosome ends or centromeres (Supplementary Note). Finally, as with
other methods for inferring recombinations, we observed a ‘multiple hits’
problem, such that if more than one switch point occurred within a 1-Mb
interval, $c_{jk}^{(i)}$ would typically be underestimated. For example,
$c_{jk}^{(i)}$ often takes values close to zero when the actual value is
two or takes values close to one when the actual value is three
(Supplementary Fig. 2). This problem is not evident if two switch points
are spaced more than 1 Mb apart (Supplementary Fig. 1), and thus should not
be a major problem for analysis of African-American samples, as
simulations indicate the fraction of switch points with spacing <1 Mb
is small when admixture has been recent (Supplementary Fig. 2).
Nonetheless, we developed a refined estimator of recombination,
$r_{jk}$, that corrects for the multiple hits problem and, more
importantly, pools information across individuals in an empirical Bayes
framework (Online Methods and Supplementary Fig. 3).
To assess how well the estimator rjk performs at inferring recombi-
nation, we estimated maps of relative recombination rates from our
simulated datasets and compared them to the ‘true’ maps we used to
simulate the data. The correlation between the true and inferred rates
was 0.99 at a 1-Mb scale, 0.90 at the 100-kb scale, 0.86 at the 50-kb
scale and 0.71 at the 10-kb scale (Supplementary Fig. 3). Plots of
the inferred versus true recombination rates (Fig. 2c) revealed that
the map produces unbiased estimates of the rates at the 1-Mb and
50-kb scales, whereas at the smaller 10-kb scale there is evidence of
a downward bias in the map. Based on these results, we focused the
presentation of our results on the 1-Mb–scale map to represent large-
scale recombination patterns and on the 50-kb–scale map to represent
finer scales. Visual inspection of randomly chosen examples at the
50-kb scale (such as that shown in Fig. 2a) shows that the inferred
map captures most of the major recombinational features that are
found in the simulated map.
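The scale dependence of these correlations can be illustrated with a small simulation. The sketch below is not the study's simulation framework: the gamma and lognormal distributions, the bin sizes and the function names are all assumptions chosen only to show how the same pair of maps can agree well at 1 Mb and less well at 10 kb.

```python
import numpy as np

def rebin(rates, factor):
    """Average consecutive groups of `factor` equal-width bins (drops any incomplete tail)."""
    rates = np.asarray(rates, dtype=float)
    n = (rates.size // factor) * factor
    return rates[:n].reshape(-1, factor).mean(axis=1)

def correlation_at_scale(true_rates, inferred_rates, factor):
    """Pearson correlation between two maps after coarsening both by `factor`."""
    t = rebin(true_rates, factor)
    e = rebin(inferred_rates, factor)
    return float(np.corrcoef(t, e)[0, 1])

rng = np.random.default_rng(0)
# Hypothetical "true" map at 10-kb resolution: a slowly varying regional
# level (one value per 1 Mb) multiplied by fine-scale hotspot variation.
regional = np.repeat(rng.gamma(shape=2.0, scale=0.5, size=100), 100)
true_10kb = regional * rng.gamma(shape=0.5, scale=2.0, size=regional.size)
# Hypothetical "inferred" map: the true map with independent noise per bin.
inferred_10kb = true_10kb * rng.lognormal(mean=0.0, sigma=1.0, size=true_10kb.size)

r_10kb = correlation_at_scale(true_10kb, inferred_10kb, 1)
r_1mb = correlation_at_scale(true_10kb, inferred_10kb, 100)
```

Because the per-bin noise averages out over 100 bins while the regional signal does not, `r_1mb` comes out higher than `r_10kb` in this toy setting.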
One potential drawback of either approach we took is a possible
overestimation of recombination if a large number of switch points
across individuals descended from the same ancestral event (that is, if
switch points are inherited in an identical-by-descent manner in the
sample). Using simulations, we found that under reasonable assump-
tions about the population size of African Americans and African
Caribbeans, it would be rare for a given ancestry switch to be observed
twice in our study sample (Supplementary Fig. 4).
Application to an African-American and African-Caribbean sample
We applied our approach to a study sample consisting of 2,565
African-American and 299 African-Caribbean individuals gathered
from four studies (GeneSTAR18, GENOA19,20, GRAAD21–23 and
SARP and CAG-CSGA24; Supplementary Table 1). This sample has
a mean African-ancestry coefficient of ~0.81 with a 95% quantile
range of 0.54–0.96 (Supplementary Fig. 5), a broad range that is
consistent with previous studies of African-American and African-
Caribbean samples25–28. We used as reference panels for the ancestral
populations the HapMap YRI and HapMap CEU panels. Although
neither of these panels is an exact representation of the ancestral
populations of the admixed individuals in the sample used here,
previous studies17,25 and our own principal component analyses
(Supplementary Fig. 6) suggest these two panels are reasonable
proxies for the source populations.
We denote the map we generated as the ‘AfAdm’ map, and we
compared this map to the recently published deCODE map based
on Icelandic pedigrees as well as published LD-based maps for the
HapMap CEU and YRI samples (labeled deCODE, HapMapYRI and
HapMapCEU, respectively). When comparing the AfAdm map to
the HapMap-based maps, there is the potential to overestimate the
similarity between the maps because the HapMap samples served as
the reference panels for our method. We investigated the potential
magnitude of this effect through simulations and determined that
by using a trimmed Pearson correlation coefficient, any possible
bias as a result of shared data was minimized (Online Methods,
Supplementary Note and Supplementary Fig. 7). Unless otherwise
noted, all the correlations reported for scales <1 Mb are trimmed
Pearson correlations.
At the 1-Mb scale, we found a strong visual concordance and cor-
relations greater than 0.9 among all the maps (Fig. 3a, Table 1 and
see Supplementary Table 2 for additional scales). This degree of cor-
relation suggests broad-scale similarity of the recombination maps
across human populations, and that all three methods have the power
to infer recombination maps well at this scale.
At scales finer than 1 Mb, there was a coarser correspondence
between recombination maps (Fig. 3b, Table 1 and Supplementary
Table 2). For example, at the 50-kb scale, the correlation of the
AfAdm map with the HapMapCEU map is 0.611 and is 0.697 with
the HapMapYRI map. The observed decay of correlation at smaller
observation scales more likely reflects the impact of sampling error
[Figure 3 graphic. (a, b) Recombination rate (cM/Mb) against position on
chromosome 1 (Mb) for the deCODE, HapMapCEU, HapMapYRI and AfAdm maps.
(c) Proportion of total recombination against proportion of sequence.]
Figure 3 Comparison of the African admixture-
based map to existing maps. (a) Example of
1-Mb–scale map from 50 Mb of chromosome 1.
(b) Example of 50-kb–scale map from the
2.5-Mb section of chromosome 1 indicated
by the gray box in a. (c) Proportion of the
total recombination in various proportions of
sequence intervals at the 50-kb scale.
Table 1 Correlations between recombination maps

                  HapMap CEU   HapMap YRI   HapMap 80%:20%   deCODE   AfAdm
HapMap CEU        1.000        0.922        0.951            0.939    0.900
HapMap YRI        0.738        1.000        0.997            0.934    0.922
HapMap 80%:20%    0.844        0.985        1.000            0.948    0.929
deCODE            0.789        0.734        0.788            1.000    0.924
AfAdm             0.611        0.697        0.712            0.666    1.000

We report Pearson correlations at the 1-Mb (above diagonal) and 50-kb (below
diagonal) scales. See Supplementary Table 2 for additional scales.
than drastic underlying recombination rate differences across sam-
ples. As evidence, we note the correlation between the deCODE and
HapMapCEU maps is 0.789 at a 50-kb scale (Table 1) even though
both maps are based on populations of northern European descent.
Investigation of what proportion of the genome contains the high-
est recombination rates provided further evidence for the general
similarities between the maps. In the AfAdm map, we found that
recombinations concentrate in a fraction of the sequence (recom-
bination hotspots); for instance, at the 50-kb scale, 10% of the total
recombinations accumulate in about 1.2% of the genomic sequence
(Fig. 3c). This level of enrichment in the AfAdm map is similar to
the level found in the HapMapCEU and the deCODE maps and is
only slightly higher than in the HapMapYRI map (Fig. 3c). We note
that because the inferred hottest fraction of the genome likely con-
tains regions whose recombination rates have been overestimated by
chance, the observed level of enrichment may be upwardly biased
for each map in ways that depend on the sampling error specific to
each map’s estimates.
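The enrichment statistic quoted above (a given share of recombination falling in a small share of sequence) can be computed from any binned map. The sketch below assumes equal-width intervals and uses a hypothetical function name and toy rates; it is meant only to make the definition concrete.

```python
import numpy as np

def sequence_fraction_for(rates, recomb_fraction):
    """Smallest fraction of equal-width intervals holding `recomb_fraction` of all recombination."""
    r = np.sort(np.asarray(rates, dtype=float))[::-1]      # hottest intervals first
    cumulative = np.cumsum(r) / r.sum()
    n_needed = int(np.searchsorted(cumulative, recomb_fraction)) + 1
    return min(n_needed, r.size) / r.size

# Toy map (hypothetical): one hot interval among five.
toy_rates = [6, 1, 1, 1, 1]
frac_seq = sequence_fraction_for(toy_rates, 0.5)   # the hot interval alone holds 60%
```

As the text notes, intervals ranked hottest are partly selected on sampling noise, so values computed this way from an estimated map may overstate the true concentration.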
Despite the general similarity of all maps, there is evidence of sub-
tle increases in similarity between recombination maps from more
closely related populations. For example, the deCODE pedigree
map correlates more strongly with the HapMapCEU map than the
HapMapYRI map, whereas the AfAdm map correlates more strongly
with the HapMapYRI map (Fig. 4a,b). We also observed this pattern
when investigating recombination hotspot sharing (Fig. 4c,d). The
overlap between AfAdm and HapMapYRI hotspots is significantly
higher than the AfAdm overlap with HapMapCEU hotspots (0.32
compared to 0.23, P = 2 × 10−5 for hotspots defined as the 50-kb
intervals with the top 1% largest rates). In contrast, deCODE hotspots
overlap better with HapMapCEU hotspots
(0.35 compared to 0.32, P = 0.0297 on the same
scale as used in the previous comparison).
Further, the genome-wide European
ancestry proportion of an individual in
our sample is positively correlated with the
fraction of switch points in that individual
inferred to be in HapMapCEU hotspots
(r = 0.102, P < 10−8) and negatively correlated
with the fraction inferred to be in
HapMapYRI hotspots (r = −0.122, P < 10−10).
These results corroborate arguments that
fine-scale recombination rate modifiers
differ across populations and suggest that,
because the ancestry in AfAdm individuals
is predominantly African, our sample has
recombination patterns that are more like
the HapMapYRI population. Given these
results, we attempted an admixture mapping
approach to identify loci that would explain
the usage of HapMapYRI as opposed to
HapMapCEU hotspots. We did not identify
any significant associations between hotspot
usage and local ancestry (Supplementary
Note), but this is likely due to a lack of power
because of limited sample size and because
of the limitation that the ancestry switches
we observed took place across several gen-
erations on varied genotypic backgrounds.
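The hotspot-sharing comparison earlier in this paragraph rests on a simple definition: hotspots are the intervals with the top 1% largest rates. A minimal sketch of an overlap statistic under that definition follows. The exact overlap measure and significance test used in the study are not given in this section, so the fraction computed here (hotspots of one map that are also hotspots of the other) and all names are assumptions.

```python
import numpy as np

def hotspot_mask(rates, top_fraction=0.01):
    """Boolean mask flagging the intervals with the largest rates."""
    rates = np.asarray(rates, dtype=float)
    n_hot = max(1, int(round(top_fraction * rates.size)))
    hottest = np.argsort(rates)[::-1][:n_hot]
    mask = np.zeros(rates.size, dtype=bool)
    mask[hottest] = True
    return mask

def hotspot_overlap(rates_a, rates_b, top_fraction=0.01):
    """Fraction of map A's hotspots that are also hotspots in map B."""
    a = hotspot_mask(rates_a, top_fraction)
    b = hotspot_mask(rates_b, top_fraction)
    return float(np.sum(a & b) / np.sum(a))

# Toy maps (hypothetical) over the same 10 intervals; top 20% = 2 hotspots each.
map_a = [9, 1, 8, 1, 1, 1, 1, 1, 1, 1]
map_b = [9, 1, 1, 8, 1, 1, 1, 1, 1, 1]
overlap_ab = hotspot_overlap(map_a, map_b, top_fraction=0.2)
```

Both maps flag the same number of intervals, so the statistic is symmetric in this form.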
Using a regression-based approach, we
estimated what proportional weight would
lead to the observed AfAdm rates if the rates
are a weighted average of HapMapYRI and HapMapCEU rates. We
estimated proportional weights of 0.79 at 50-kb, 0.75 at 100-kb and
0.68 at 1-Mb scales (Supplementary Fig. 8). For completely iden-
tical maps, the estimated proportional weight would be an equal
weighting of each map, so the trending toward 0.5 observed here
may be caused by the global similarity of the maps at larger scales
(Supplementary Fig. 8). We note that this regression-based approach
may be biased toward the map with the smaller sampling error. Given
that the two HapMap maps were inferred with the same approach
from samples of similar size, we did not expect large differences in
sampling error between the maps. The results thus suggest that the
AfAdm map can be coarsely approximated as an 80%:20% weighted
average of the HapMapYRI and HapMapCEU maps. This weighting
would be expected from the average ancestry coefficient in the sample
(~80%:20% African:European ancestry).
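One way to picture the proportional-weight estimate is as a one-parameter least-squares fit of AfAdm = w × HapMapYRI + (1 − w) × HapMapCEU. The sketch below assumes this no-intercept form, which may differ from the regression actually used (Supplementary Fig. 8); the function name and simulated rates are hypothetical.

```python
import numpy as np

def mixture_weight(afadm, yri, ceu):
    """Least-squares w in afadm ~ w * yri + (1 - w) * ceu (assumed model, no intercept)."""
    afadm, yri, ceu = (np.asarray(x, dtype=float) for x in (afadm, yri, ceu))
    d = yri - ceu          # how the two source maps differ per interval
    y = afadm - ceu        # how the admixed map departs from the CEU map
    return float(np.dot(d, y) / np.dot(d, d))

rng = np.random.default_rng(1)
yri = rng.gamma(shape=0.5, scale=2.0, size=500)   # hypothetical rates
ceu = rng.gamma(shape=0.5, scale=2.0, size=500)
afadm = 0.8 * yri + 0.2 * ceu                     # noise-free 80%:20% mixture
w = mixture_weight(afadm, yri, ceu)
```

The estimate depends entirely on intervals where the two source maps differ; where they are nearly identical the denominator is close to zero, so this simple form is poorly determined at scales where the maps are globally similar.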
We next sought to identify intervals where the recombination rate
differs from an 80%:20% average of the HapMapYRI and HapMapCEU
maps. The region where the AfAdm map showed the strongest
deficit in recombination when compared to the other maps lies at
the centromeric end of a common inversion in 8p23.1 (at ~12 Mb
on chromosome 8; Fig. 5a)29,30. The same segment has been found,
using coarser-scale microsatellite-based maps11,12, to be the site of the
largest map differences in the genome between Europeans and both
Asians and African Americans. This inversion region is also charac-
terized by several duplications and deletions29–31, which may contrib-
ute to the complexity of the region, and we note that all three methods
(pedigree, LD-based and admixture-based methods) gave differing
estimates of recombination rates at the telomeric end of the inversion
(at ~8 Mb on chromosome 8). This is not the only region with structural
[Figure 4 graphic. (a, b) Correlation between maps against interval size
(10 kb to 1 Mb) for the AfAdm (a) and deCODE (b) maps, each compared with
the HapMapCEU and HapMapYRI maps. (c, d) Hotspot overlap between maps
against proportion of sequence for the AfAdm (c) and deCODE (d) maps.]
Figure 4 Population differences in recombination patterns. (a–d) Independent of scale, the AfAdm
map correlates better (a) and shares more hotspots (c) with the HapMapYRI than the HapMapCEU
map. In contrast, the deCODE map correlates better (b) and shares more hotspots (d) with the
HapMapCEU than the HapMapYRI map. Hotspots are defined as the 50-kb intervals with the
top 1% largest rates.
[Figure 5 graphic. Recombination rate (cM/Mb) for the deCODE, HapMapCEU,
HapMapYRI and AfAdm maps against position (Mb) on (a) chromosome 8,
(b) chromosome 9, (c) chromosome 17 and (d) chromosome 14.]
Figure 5 Recombination rates in notable genomic locations. (a) The region with the largest deficit
of the AfAdm map just outside the known inversion on chromosome 8p23.1–8p22 (gray). (b) The
region with a large deficit of the AfAdm map on chromosome 9 near the boundary of multiple
known polymorphic inversions. (c) The inversion on chromosome 17q21.31 (gray). (d) A region on
chromosome 14 with an elevated average European-ancestry proportion (gray) framed by local peaks
of recombination.
variation that appears to differ among the maps. Indeed, four out of
the top five regions where the AfAdm map showed strong deficits in
recombination contained large inversions (Table 2). An example of this
is the region just outside the centromere on chromosome 9 (Fig. 5b),
which harbors both a small inversion32 and large copy number varia-
tions (CNVs)33. Large inversions do not, however, always affect rate
estimates in the AfAdm map. For example, the 17q21.31 region har-
bors a large 900-kb inversion with a 20% frequency in Europeans that
is rarely found in African samples34, but the rate estimates in this
region do not differ between the maps (Fig. 5c).
Among the regions with the greatest elevation in recombina-
tion rates relative to an 80%:20% average of the HapMapYRI and
HapMapCEU maps, the pattern observed here is more ambiguous;
only 11 of the 27 such regions that we investigated harbor structural
variations (Table 2 and Supplementary Table 3). We found the most
strikingly elevated recombination rates in the major histocompatibil-
ity complex region (Supplementary Fig. 9), which is known to have
high levels of genetic diversity and population differentiation35. Using
quartet families in a subset of the data, we found that this elevation
in inferred ancestry switches is not concordant with family-based
recombination rates (see the Discussion section, Supplementary Note
and Supplementary Fig. 9). We also note two regions with large CNVs
on chromosome 2 (Supplementary Fig. 10) and 14 (Fig. 5d), each
consisting of two closely spaced peaks of elevated recombination rates
flanking regions with an elevated level of European ancestry across
individuals. In seven regions, the excess in recombination is caused
by a particularly low rate in the HapMapYRI map (Supplementary
Table 3). A possible explanation for such regions is selection specific
to the Yoruban population, which can bias LD-based estimates of
recombination downward6,15.
DISCUSSION
We have introduced a method for inferring recombination rates based
on ancestry switch points. Simulations suggest that this method per-
forms well for the sample size and SNP density of the data that we
analyzed here. We obtained further support for the method by using
it to infer a recombination map for African-American and African-
Caribbean individuals (the AfAdm map); this map corresponds well
to published maps from other populations while also permitting
the investigation of fine-scale recombination patterns in admixed
populations. This ancestry-switch approach should be much less
sensitive than LD-based methods to local distortions of LD caused
by natural selection (for example, in selective sweep regions). In an
ancestry-switch approach, such distortions would arise only when unu-
sually strong selection has occurred in the typically brief period since
admixture between ancestral populations. The approach also has an
inherent efficiency in that the number of switch points observed per
genotyped individual is relatively large. For example, in the African
Americans and African Caribbeans sampled here, we observed roughly
90 switch points (recombination events) per genotyped individual
(Supplementary Note) as opposed to the ~30 such events that are
expected from genotyping multiple individuals in a pedigree to observe
an informative meiosis.
A disadvantage of the ancestry-switch approach is that, like LD-
based methods, it does not readily allow one to infer absolute recom-
bination rates or to identify recombination events unique to individual
parents. Hence, it is not an optimal approach for investigation of vari-
ation in recombination between individuals or sexes. Additionally,
with the SNP markers considered here, the ancestry-switch method
resolves events within individuals less precisely (roughly a 200-kb
scale) than does direct investigation of dense SNP markers in pedi-
grees. The resolution of the ancestry-switch approach will improve
by using variants that differ in frequency between the populations
ancestral to admixed groups (Supplementary Fig. 11), and large-scale
sequencing efforts are expected to identify more of such loci36. With
the current level of resolution, sampling error is clearly contributing
to the observed differences and similarities between the maps we
investigated. For example, we showed that the AfAdm map is more
like the HapMapYRI map than the HapMapCEU map (Fig. 4), but
we also found that the HapMapYRI map (and HapMapCEU map)
correlated better with the deCODE map than the AfAdm map at the
1-Mb and 50-kb scales (Table 1). This pattern would be expected if
recombination rates are fairly similar across populations and if the
AfAdm map has a higher sampling error than the deCODE map,
both of which are likely true. The AfAdm map is based on ~250,000
events resolved at a scale of roughly 200 kb each, whereas the deCODE
map is based on ~600,000 events resolved to a scale of ~10 kb each.
To circumvent this issue, we used comparisons of the HapMapYRI
and HapMapCEU maps to the AfAdm map alone (Fig. 4a) and the
deCODE map alone (Fig. 4b) to investigate population differences
in recombination rates.
Table 2 Regions for which the AfAdm map differs most from an
80%:20% average of the HapMapYRI and HapMapCEU maps.
Regions where the AfAdm map has lower rate estimates are shown
at top, followed by regions where the AfAdm map has higher rates.
Chr. PositionaDifferencebStructural variations
8 11.4–13.3 −1.93 4.7-Mb inversion29,30
9 37.6–39.5 −1.04 36-kb inversion32/8-Mb CNV33
10 124.9–126.8 −1.00
16 21.7–23.5 −0.89 1.1-Mb inversion39
7 5.1–6.4 −0.87 1-Mb inversion39,41
22 24.5–26.9 1.42 500-kb CNV33,42,43
8 133.8–135.6 1.30
16 81.1–83.0 1.23 1-kb inversion41
22 34.8–36.5 1.23
14 45.9–47.6 1.20 1-Mb CNV42
8 7.2–9.1 1.10 4.7-Mb inversion29,30
18 22.1–23.7 1.04
3 51.4–54.3 1.03 45-kb inversion32,44
14 93.3–95.1 1.03 1-kb inversion45
14 43.2–44.9 1.03 1-Mb CNV46
2 59.8–63.3 1.03 2.9-Mb CNV33
15 67.6–68.9 0.99
5 30.6–32.2 0.99
14 50.9–52.1 0.97
16 64.0–65.4 0.97
6 16.1–18.4 0.96
10 72.5–74.0 0.94 36-kb inversion32/1.2-Mb CNV47
9 6.8–8.1 0.91
8 23.1–24.4 0.88 2.5-Mb CNV33
7 102.7–104.1 0.84
To identify the regions, we identified the top 1% of the intervals with the greatest
difference between the AfAdm map and the HapMapYRI and HapMapCEU maps
computed on 1-Mb intervals spaced every 50 kb. We joined intervals whose endpoints
were not more than 1-Mb apart from each other, and we examined and present here
only regions supported by at least five intervals. We omitted seven regions where visual
inspection revealed that the difference was not caused by the AfAdm rate but rather by
the HapMapYRI rate (Supplementary Table 3). The reported structural variations were
observed in surveys of structural variations in random samples of European or African
individuals and were not further than 1 Mb away from the focus regions. In addition,
CNVs had to be at least 500 kb in length to be included, and we only report here the
largest CNV in the region. The intervals before collapsing the data are shown in
Supplementary Figure 10.
aPosition in Mb. bLargest difference per region, given in cM. Negative values imply lower rates
in the AfAdm map. Chr., chromosome; CNV, copy number variation.
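The region-identification procedure described in the Table 2 footnote can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the analysis: the function name `find_outlier_regions`, the input layout (sorted window start positions plus one difference per window), the use of absolute differences for the top-1% cut-off and the reading of "endpoints not more than 1 Mb apart" as a gap between a window start and the open region's end are all hypothetical choices.

```python
import numpy as np

def find_outlier_regions(starts, diffs, width=1_000_000, top_frac=0.01,
                         max_gap=1_000_000, min_support=5):
    """Merge extreme windows into regions (illustrative sketch).

    starts: window start positions in bp, sorted ascending (hypothetical input)
    diffs:  observed minus expected rate for each window, in cM
    """
    starts = np.asarray(starts)
    diffs = np.asarray(diffs, dtype=float)
    # Assumption: "greatest difference" is judged on the absolute value.
    cutoff = np.quantile(np.abs(diffs), 1.0 - top_frac)
    keep = np.flatnonzero(np.abs(diffs) >= cutoff)

    regions = []
    current = None
    for idx in keep:
        start = int(starts[idx])
        end = start + width
        if current is not None and start - current["end"] <= max_gap:
            # Window lies within max_gap of the open region: extend it.
            current["end"] = max(current["end"], end)
            current["n"] += 1
            if abs(diffs[idx]) > abs(current["peak"]):
                current["peak"] = float(diffs[idx])
        else:
            if current is not None:
                regions.append(current)
            current = {"start": start, "end": end, "n": 1,
                       "peak": float(diffs[idx])}
    if current is not None:
        regions.append(current)
    # Keep only regions supported by at least min_support windows.
    return [r for r in regions if r["n"] >= min_support]
```

Each returned region carries its span, the number of supporting windows and the signed peak difference, which correspond to the Position and Difference columns of the table.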
By comparing the AfAdm map to existing maps, we were able to
make several observations: (i) there is evidence for subtle population
differences in recombination rates between African and European
populations, (ii) African-European admixed individuals appear to
have recombination rates that are, on average, intermediate between
the African and European rates and (iii) the degree to which the
rates are intermediate is predictable from the average ancestry coef-
ficient (~80% African and ~20% European) in our sample. Further,
in admixed individuals, recombinations appear to be concentrated
at hotspots in a manner correlated with ancestry: individuals with
more African ancestry have recombinations at hotspots found in the
HapMapYRI map, and individuals with more European ancestry have
recombinations at hotspots found in the HapMapCEU map. These
observations are consistent with the differentiation between popula-
tions for fine-scale recombination rates1–5 and with the European-
African differentiation at PRDM9, the only known major locus
affecting fine-scale recombination rates37.
Because admixed individuals will often be heterozygous at recom-
bination modifier loci for alleles from different ancestral populations,
the mode of genetic action of modifier alleles that are differentiated
between populations should mediate observed recombination pat-
terns. For example, among known modifier loci, inversions suppress
recombination in an underdominant fashion, and PRDM9 alleles may
act additively37. It is still unknown whether hotspot motifs that interact
with PRDM9 are recessive or dominant, although it is clear that there are
epistatic interactions between hotspot motif loci and PRDM9 (refs.
37,38). In our analysis, the AfAdm map appears as one would expect
if the recombination phenotype were determined predominantly
by additive factors: the AfAdm map has rates that, on average, are
intermediate between the HapMapCEU and HapMapYRI rates and
which are biased toward HapMapYRI rates in a proportion consistent
with the average proportion of African ancestry in our sample.
We speculate that the approximately additive behavior of small-scale
recombination rates observed here is largely caused by the influence
of PRDM9 acting additively37 on hotspot motifs that may themselves
have largely additive effects.
Many of the departures from additive expectations that we found
fell near other regions known to be exceptional in the genome for
containing large structural variations. In particular, most regions
that showed strong deficits in recombination contain inversions. This
observation suggests the capacity of polymorphic structural variation
to disrupt local recombination rates may be enhanced in admixed
individuals, perhaps by elevated heterozygosity. A caveat to these
results is that SNP genotypes in regions of structural variation are less
reliable and may confound rates estimated by recombination inference
methods. In addition, rates may be biased in regions with long-range
LD and/or high levels of diversity because HMMs are overly simpli-
fied models of such regions39,40. We suspect rates in high-diversity
regions will more likely be overestimates, as we confirmed in the
major histocompatibility complex region (Supplementary Note and
Supplementary Fig. 9).
For future applications, we note that the ancestry-switch method is
extendible to three-way admixtures and thus can be applied to infer
recombination maps in other settings, such as for admixed Latino indi-
viduals, who in some cases combine descent from Native-American,
European and African ancestral populations. Admixture maps might
be compared to LD-based maps to detect selective sweeps, much like
how pedigree-based maps have recently been used15. Finally, given
that the power of the ancestry-switch method is improved by sampling
additional admixed individuals and that the density of available SNP
markers is increasing, we speculate that an ancestry-switch approach
will become an increasingly powerful, scalable tool for fine-scaled
recombination analysis.
URLs. Novembre group webpage, http://www.eeb.ucla.edu/Faculty/
Novembre/; IMPUTE, https://mathgen.stats.ox.ac.uk/impute/
impute_v2.html; deCODE recombination maps, http://www.decode.
com/addendum/.
METHODS
Methods and any associated references are available in the online
version of the paper at http://www.nature.com/naturegenetics/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
J.N., D.W. and K.R.V. were funded by a Searle Scholar Program award to J.N. N.B.F.
was supported by US National Institutes of Health (NIH) grants R01HL087679
and RL1MH083268. The sample assembled is compiled from the larger efforts and
the generous sharing of data from four major consortiums. For the GeneSTAR
consortium (L.C.B., D.R.B., L.R.Y. and R.A.M.), support came from NIH grants
HL072518 and M01-RR00052. For the CAG-CSGA consortium (D.A.M., D.G.T.
and D.L.N.), support came from NIH grants U01 HL49596, R01 HL072414, R01
HL087665 and RC2 HL101651, and special thanks is given to C. Ober. For the
GENOA samples (Y.V.S. and S.L.R.K.), support came from NIH grants HL087660
and HL100245, and special thanks is given to E. Boerwinkle. For the GRAAD
consortium (K.C.B., N.R., I.R., T.H.B. and R.A.M.), support came from NIH grants
HL087699, HL49612, AI50024, AI44840, HL075417, HL072433, AI41040, ES09606,
HL072433 and RR03048, US Environmental Protection Agency grant 83213901,
and National Institute of General Medical Sciences (NIGMS) grant S06GM08015,
and special thanks are given to A.V. Grant, L. Gao, C. Vergara, Y.J. Tsai, P. Gao,
M.C. Liu, P. Breysse, M.B. Bracken, J. Hoh, E.W. Pugh, A.F. Scott, G. Abecasis,
T. Murray, T. Hand, M. Yang, M. Campbell, C. Foster, J.B. Hetmanski, R. Ashworth,
C.M. Ongaco, K.N. Hetrick and K.F. Doheny. K.C.B. was supported in part by
the Mary Beryl Patch Turnbull Scholar Program. R.A.M. was supported in part by
the Mosaic Initiative Award from Johns Hopkins University. We thank C. Jaquish
and the NHLBI STAMPEED program for their support of this collaboration. We
also acknowledge G. Coop, A. di Rienzo, K. Lohmueller, and M. Przeworski for
helpful discussions and comments on a draft of the manuscript.
AUTHOR CONTRIBUTIONS
J.N. and N.B.F. conceived of the project, and D.W., J.N., N.B.F. and D.L.N. designed
the analyses. D.G.T. and D.L.N. worked as part of the Chicago Asthma Genetics
(CAG) and Collaborative Study on the Genetics of Asthma (CSGA) consortium to
gather and prepare primary data for subsequent analysis. R.A.M., L.R.Y., L.C.B. and
D.M.B. worked as part of the Genetic Study of Atherosclerosis Risk (GeneSTAR)
Consortium to gather and prepare primary data for subsequent analysis. I.R., N.R.,
R.A.M., T.H.B. and K.C.B. worked as part of the Genetic Research on Asthma in
the African Diaspora (GRAAD) consortium to gather and prepare primary data
for subsequent analysis. Y.V.S., T.M. and S.L.R.K. worked as part of the Genetic
Epidemiology Network of Arteriopathy (GENOA) consortium to gather and
prepare primary data for subsequent analysis. D.G.T. and D.A.M. worked as part
of the Severe Asthma Research Program (SARP) to gather and prepare primary
data for subsequent analysis. D.W., D.E.K., K.R.V. and J.N. developed tools for the
analysis and performed the analysis. D.W., N.B.F. and J.N. drafted the manuscript
and revised it with D.E.K., K.R.V., R.A.M., D.L.N., L.R.Y., Y.V.S., L.C.B., N.R., I.R.,
T.H.B., S.L.R.K., D.A.M., K.C.B. and D.M.B.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturegenetics/.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. Crawford, D.C. et al. Evidence for substantial fine-scale variation in recombination
rates across the human genome. Nat. Genet. 36, 700–706 (2004).
2. Evans, D.M. & Cardon, L. A comparison of linkage disequilibrium patterns and
estimated population recombination rates across multiple populations. Am. J. Hum.
Genet. 76, 681–687 (2005).
3. Graffelman, J., Balding, D., Gonzalez-Neira, A. & Bertranpetit, J. Variation in estimated
recombination rates across human populations. Hum. Genet. 122, 301–310
(2007).
4. Serre, D., Nadon, R. & Hudson, T.J. Large-scale recombination rate patterns are
conserved among human populations. Genome Res. 15, 1547–1552 (2005).
5. Laayouni, H. et al. Similarity in recombination rate estimates highly correlates with
genetic differentiation in humans. PLoS ONE 6, e17913 (2011).
6. Clark, A.G., Wang, X. & Matise, T. Contrasting methods of quantifying fine structure
of human recombination. Annu. Rev. Genomics Hum. Genet. 11, 45–64 (2010).
7. Kong, A. et al. A high-resolution recombination map of the human genome. Nat.
Genet. 31, 241–247 (2002).
8. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations
and individuals. Nature 467, 1099–1103 (2010).
9. Broman, K.W., Murray, J.C., Sheffield, V.C., White, R.L. & Weber, J.L. Comprehensive
human genetic maps: individual and sex-specific variation in recombination. Am.
J. Hum. Genet. 63, 861–869 (1998).
10. Coop, G., Wen, X., Ober, C., Pritchard, J.K. & Przeworski, M. High-resolution
mapping of crossovers reveals extensive variation in fine-scale recombination
patterns among humans. Science 319, 1395–1398 (2008).
11. Jorgenson, E. et al. Ethnicity and human genetic linkage maps. Am. J. Hum. Genet.
76, 276–290 (2005).
12. Ju, Y.S. et al. A genome-wide Asian genetic map and ethnic comparison: the
GENDISCAN study. BMC Genomics 9, 554 (2008).
13. International HapMap Consortium. et al. A second generation human haplotype
map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
14. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map
of recombination rates and hotspots across the human genome. Science 310,
321–324 (2005).
15. O’Reilly, P.F., Birney, E. & Balding, D.J. Confounding between recombination and
selection, and the Ped/Pop method for detecting selection. Genome Res. 18,
1304–1313 (2008).
16. McVean, G.A.T. et al. The fine-scale structure of recombination rate variation in the
human genome. Science 304, 581–584 (2004).
17. Price, A.L. et al. Sensitive detection of chromosomal segments of distinct ancestry
in admixed populations. PLoS Genet. 5, e1000519 (2009).
18. Johnson, A.D. et al. Genome-wide meta-analyses identifies seven loci associated with
platelet aggregation in response to agonists. Nat. Genet. 42, 608–613 (2010).
19. Daniels, P.R. et al. Familial aggregation of hypertension treatment and control in
the Genetic Epidemiology Network of Arteriopathy (GENOA) study. Am. J. Med. 116,
676–681 (2004).
20. FBPP Investigators. Multi-center genetic study of hypertension: The Family Blood
Pressure Program (FBPP). Hypertension 39, 3–9 (2002).
21. Barnes, K.C. et al. Linkage of asthma and total serum IgE concentration to markers
on chromosome 12q: evidence from Afro-Caribbean and Caucasian populations.
Genomics 37, 41–50 (1996).
22. Mathias, R.A. et al. A genome-wide association study on African-ancestry populations
for asthma. J. Allergy Clin. Immunol. 125, 336–346.e4 (2010).
23. Zambelli-Weiner, A. et al. Evaluation of the CD14/-260 polymorphism and house
dust endotoxin exposure in the Barbados Asthma Genetics Study. J. Allergy Clin.
Immunol. 115, 1203–1209 (2005).
24. Moore, W.C. et al. Characterization of the severe asthma phenotype by the National
Heart, Lung, and Blood Institute’s Severe Asthma Research Program. J. Allergy
Clin. Immunol. 119, 405–413 (2007).
25. Bryc, K. et al. Genome-wide patterns of population structure and admixture in West
Africans and African Americans. Proc. Natl. Acad. Sci. USA 107, 786–791
(2010).
26. Murray, T. et al. African and non-African admixture components in African Americans
and an African Caribbean population. Genet. Epidemiol. 34, 561–568 (2010).
27. Parra, E.J. et al. Estimating African American admixture proportions by use of
population-specific alleles. Am. J. Hum. Genet. 63, 1839–1851 (1998).
28. Tang, H. et al. Recent genetic selection in the ancestral admixture of Puerto Ricans.
Am. J. Hum. Genet. 81, 626–633 (2007).
29. Antonacci, F. et al. Characterization of six human disease-associated inversion
polymorphisms. Hum. Mol. Genet. 18, 2555–2566 (2009).
30. Deng, L. et al. An unusual haplotype structure on human chromosome 8p23 derived
from the inversion polymorphism. Hum. Mutat. 29, 1209–1216 (2008).
31. Giglio, S. et al. Olfactory receptor-gene clusters, genomic-inversion polymorphisms, and
common chromosome rearrangements. Am. J. Hum. Genet. 68, 874–883 (2001).
32. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human
genomes. Nature 453, 56–64 (2008).
33. Redon, R. et al. Global variation in copy number in the human genome. Nature 444,
444–454 (2006).
34. Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet.
37, 129–137 (2005).
35. Bergström, T.F., Josefsson, A., Erlich, H.A. & Gyllensten, U. Recent origin of HLA-
DRB1 alleles and implications for human evolution. Nat. Genet. 18, 237–242
(1998).
36. The 1,000 Genomes Project Consortium. A map of human genome variation from
population-scale sequencing. Nature 467, 1061–1073 (2010).
37. Berg, I.L. et al. PRDM9 variation strongly influences recombination hot-spot activity
and meiotic instability in humans. Nat. Genet. 42, 859–863 (2010).
38. Baudat, F. et al. PRDM9 is a major determinant of meiotic recombination hotspots
in humans and mice. Science 327, 836–840 (2010).
39. Bansal, V., Bashir, A. & Bafna, V. Evidence for large inversion polymorphisms in
the human genome from HapMap data. Genome Res. 17, 219–230 (2007).
40. Price, A.L. et al. Long-range LD can confound genome scans in admixed populations.
Am. J. Hum. Genet. 83, 132–135, author reply 135–139 (2008).
41. Feuk, L. et al. Discovery of human inversion polymorphisms by comparative analysis
of human and chimpanzee DNA sequence assemblies. PLoS Genet. 1, e56 (2005).
42. Conrad, D.F. et al. Origins and functional impact of copy number variation in the
human genome. Nature 464, 704–712 (2010).
43. Itsara, A. et al. Population analysis of large copy number variants and hotspots of
human genetic disease. Am. J. Hum. Genet. 84, 148–161 (2009).
44. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37,
727–732 (2005).
45. McKernan, K.J. et al. Sequence and structural variation in a human genome
uncovered by short-read, massively parallel ligation sequencing using two-base
encoding. Genome Res. 19, 1527–1541 (2009).
46. Zogopoulos, G. et al. Germ-line DNA copy number variation frequencies in a large
North American population. Hum. Genet. 122, 345–353 (2007).
47. de Smith, A.J. et al. Array CGH analysis of copy number variation identifies 1,284
new genes variant in healthy white males: implications for association studies of
complex diseases. Hum. Mol. Genet. 16, 2783–2794 (2007).
ONLINE METHODS
Samples and genotyping. We inferred relative recombination rates from
African-descendant admixed samples (predominantly African Americans)
gathered from four independent projects: GeneSTAR18, GENOA19,20,
GRAAD21–23, and SARP and CAG-CSGA24. A detailed description of each
sample is provided in the Supplementary Note. For the recombination rate
inference, we excluded pedigree-related individuals and obtained a total of
2,864 unrelated African-American samples. GRAAD is unique in having
938 individuals sampled from the United States (from Baltimore, Maryland
and Washington, DC), which we refer to as the GRAADi sample, and 299
individuals sampled from Barbados, which we refer to as the GRAADii sample.
When we repeated this inference after excluding all GRAADii samples, the rate
estimates were largely unchanged (the correlations between estimates with
and without GRAADii samples were well above 99%, independent of scale;
Supplementary Note and Supplementary Fig. 12).
The samples were typed on the Illumina Human1M-Duo (SARP and
CAG-CSGA), Illumina Human 1Mv1C (GeneSTAR), Illumina Human650Y
(GRAAD) and Affymetrix 6.0 (GENOA) platforms. Because they differ in
the set of available SNPs and there are concerns about merging data, we took
several steps to make sure to conservatively merge the data, in particular
attempting to avoid allele strand flip issues (Supplementary Note and
Supplementary Fig. 13).
Reference panels. In line with previous reports25, we found in exploratory
principal component analysis plots that our admixed sample stratifies
between the African (YRI) and European (CEU) populations from HapMap3
(Supplementary Fig. 6). In our analysis, we thus used 234 and 230 phased
haplotypes from the CEU and YRI samples, respectively, available from the
HapMap project (Supplementary Note).
Simulations for validation. We generated a total of 120 Mb of data, consist-
ing of 6-Mb segments randomly chosen from each of the chromosomes 1
through 20. For each of those segments, we simulated a model widely used
in the population genetic literature for African Americans (for example, see
refs. 48–50): a diploid, randomly mating population of 20,000 individuals
followed forward in time for seven non-overlapping generations, where the first
generation was 80% African and 20% European individuals. Recombination
events were placed along the segments following a 50%:50% average of the
HapMapCEU and HapMapYRI maps. Founder haplotypes were generated
using MACS51 and assumed a demographic model previously proposed52,
with recombination following the same map as used above. The resulting SNPs
were sub-sampled to match the corresponding SNP densities among our sam-
ples and the frequency spectra of the CEU and YRI HapMap samples. We
also selected 230 and 234 phased haplotypes randomly from the African and
European samples, respectively, to serve as reference panels. For investigating
reference panel bias, we inferred recombination maps from the pattern of LD
present in the reference panels using LDhat16. See the Supplementary Note
for more information.
Inference of ancestry switch points and relative recombination rates.
Our initial approach was based on summing the posterior mean number of
ancestry switch events across individuals. For an interval on the chromosome
between SNP markers j and k, define $c_{jk}^{(i)}$ as a variable that takes on the
values 0, 1 or 2 depending on whether there is an ancestry switch on neither,
one or both chromosomes between markers j and k. Given genotype data
for individual i ($D^{(i)}$), a set of reference haplotypes from two source populations
($H$) and admixture parameters ($\theta$, for example, time since admixture),
the posterior mean of $c_{jk}^{(i)}$ (that is, $E[c_{jk}^{(i)} \mid D^{(i)}, H, \theta]$, which we denote
$\bar{c}_{jk}^{(i)}$) can be computed under probabilistic models of admixture. Here we developed
algorithms for computing $\bar{c}_{jk}^{(i)}$ using the HMM-based models for admixture
introduced in a previous study17 (Supplementary Note). Our first estimator
of a relative recombination rate between markers j and k then is:
$$\bar{c}_{jk} = \sum_{i=1}^{N} \bar{c}_{jk}^{(i)} \qquad (1)$$
where N is the number of sampled individuals.
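As a minimal sketch of this first estimator, the per-individual posterior means can be held in a matrix and summed over individuals. The array layout, the function names and the final normalisation to proportions are illustrative assumptions; the posterior means themselves would come from the HMM, which is not reproduced here.

```python
import numpy as np

def naive_switch_estimator(c_bar):
    """Sum posterior mean switch counts over individuals.

    c_bar: array of shape (N, M); entry [i, m] is the posterior mean number
           of ancestry switches (between 0 and 2) for individual i in marker
           interval m. This layout is a hypothetical choice.
    """
    c_bar = np.asarray(c_bar, dtype=float)
    if np.any((c_bar < 0) | (c_bar > 2)):
        raise ValueError("posterior means must lie in [0, 2]")
    return c_bar.sum(axis=0)

def to_relative_rates(counts):
    """Normalise summed counts so that they add up to 1 across intervals.

    Assumption: "relative" is expressed as a proportion of all switches.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()
```

For two individuals and three intervals, `naive_switch_estimator([[0.0, 0.5, 0.1], [0.2, 1.5, 0.1]])` gives summed counts of 0.2, 2.0 and 0.2, so the middle interval carries most of the inferred switches.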
Although straightforward, this approach computes $\bar{c}_{jk}^{(i)}$ based only on information
on that single individual, and as in many statistical inference problems,
power can be gained by pooling information across individuals. In addition,
this approach does not account for 'multiple hits'. For example, if an even
number of ancestry switch events takes place between markers j and k on both
chromosomes, $c_{jk}^{(i)}$ will be 0, despite the unobserved ancestry switch events.
By simulation, we found that both of these factors hinder this method from
accurate inference in regions of high recombination.
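The multiple-hits problem can be illustrated with a small sketch. With two source populations, an even number of true switches on a chromosome within an interval returns it to the ancestry it started with, so only an odd count is detectable from the flanking markers. The function name and its inputs are hypothetical.

```python
def observed_switch_count(switches_hap1, switches_hap2):
    """Number of chromosomes (0, 1 or 2) on which a switch is detectable.

    An odd number of true switches changes the ancestry seen at the two ends
    of the interval; an even number leaves it unchanged and goes unobserved.
    """
    if switches_hap1 < 0 or switches_hap2 < 0:
        raise ValueError("switch counts must be non-negative")
    return (switches_hap1 % 2) + (switches_hap2 % 2)
```

For instance, two true switches on one chromosome and none on the other yield an observed count of 0, which is why a naive count underestimates recombination where rates are high.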
To improve upon this method, we developed a post-processing step that
reframes the inference in an empirical Bayes framework and corrects for the
multiple hits problem. Define $s_{jk}^{(i)}$ as the number of switch events between
markers j and k (which takes values in {0,1,2,3,…}). Because $\bar{c}_{jk}^{(i)}$ is a highly
informative summary statistic of an individual's genotype data, we can perform
inference on $s_{jk}^{(i)}$ based on $\bar{c}_{jk}^{(i)}$ rather than the original data D. Specifically, we
use Bayes' theorem to compute $E[s_{jk}^{(i)} \mid \bar{c}_{jk}^{(i)}]$ as
$$E[s_{jk}^{(i)} \mid \bar{c}_{jk}^{(i)}] = \frac{\sum_{d=0}^{\infty} d \cdot p(\bar{c}_{jk}^{(i)} \mid s_{jk}^{(i)} = d)\, p(s_{jk}^{(i)} = d)}{\sum_{d=0}^{\infty} p(\bar{c}_{jk}^{(i)} \mid s_{jk}^{(i)} = d)\, p(s_{jk}^{(i)} = d)}. \qquad (2)$$
The likelihood $p(\bar{c}_{jk}^{(i)} \mid s_{jk}^{(i)} = d)$ is difficult to obtain analytically, and so we
approximated its value using simulations (Supplementary Note). Pooling of
information across individuals enters by an empirical Bayes approach in which
we set the prior $p(s_{jk}^{(i)} = d)$ according to an initial estimator based on the same
data. In this case, we set $p(s_{jk}^{(i)} = d) \propto \bar{c}_{jk}$ (Supplementary Note). The posterior
expectation on the total number of switch points across all N individuals ($S_{jk}$)
is then given by
$$r_{jk} = E[S_{jk} \mid \bar{c}_{jk}^{(1)}, \ldots, \bar{c}_{jk}^{(N)}] = \sum_{i=1}^{N} E[s_{jk}^{(i)} \mid \bar{c}_{jk}^{(i)}]. \qquad (3)$$
Although this approach does not detect recombination events between chromosomal
segments of similar ancestry, the number of ancestry switch events
is expected to be proportional to the recombination rate in the region, and so
we use $r_{jk}$ as a relative rate estimator of recombination.
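The posterior expectation can be sketched numerically once a likelihood table and a prior are available. In the sketch below both are plain inputs: the paper approximates the likelihood by simulation and sets the prior from an initial estimator, neither of which is reproduced. Truncating the support of the switch count at a finite maximum, and the function names, are assumptions made for illustration.

```python
import numpy as np

def posterior_mean_switches(likelihood, prior):
    """Posterior mean switch count for one individual and one interval.

    likelihood: p(c_bar | s = d) for d = 0..D (support truncated at D)
    prior:      p(s = d) for d = 0..D; it need not be normalised because
                the normalising constant cancels in the ratio
    """
    likelihood = np.asarray(likelihood, dtype=float)
    prior = np.asarray(prior, dtype=float)
    if likelihood.shape != prior.shape:
        raise ValueError("likelihood and prior must have the same length")
    d = np.arange(likelihood.size)
    weights = likelihood * prior
    return float((d * weights).sum() / weights.sum())

def total_expected_switches(likelihoods, prior):
    """Sum of per-individual posterior means for one interval."""
    return sum(posterior_mean_switches(lik, prior) for lik in likelihoods)
```

Because the same prior is shared by all individuals in an interval, information is pooled across the sample even though each posterior mean is computed separately.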
Simulations show our empirical Bayes method results in substantially
improved estimates of relative recombination rates (Supplementary Fig. 3),
and hence, we present results only for the empirical Bayes approach.
The computations giving rise to the inferred rjk require assumptions about
several parameters of the HMM, such as the time since admixture and the
population miscopying rate (Supplementary Note). The results shown are
for a set of parameters previously suggested17 for African-American samples
(Supplementary Table 4). We also investigated whether alternative parameters
would result in improved performance but found that the suggested param-
eters worked as well as or better than reasonable alternatives (Supplementary
Note and Supplementary Fig. 14).
Accommodating disparate marker intervals and construction of recombination
maps. In the above presentation, we ignored that not all individuals
have markers genotyped on the same intervals. To address this, if we are estimating
recombination in an interval between markers at physical coordinates
e and f, we take the convention of replacing $\bar{c}_{jk}^{(i)}$ with
$$\bar{c}_{ef}^{(i)} = \sum_{j=1}^{L-1} \bar{c}_{j,j+1}^{(i)} \cdot \alpha_{ef,j}$$
where the sum runs over all L markers typed in individual i, and $\alpha_{ef,j}$ is the
proportional overlap between the interval [e, f] and the interval defined by
markers j and j + 1 (that is, $\alpha_{ef,j} \in [0, 1]$). This adjustment to $\bar{c}_{jk}^{(i)}$ is a form
of linear interpolation.
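The interpolation step can be sketched as below. The sketch assumes that the proportional overlap is the fraction of each marker interval that falls inside [e, f], which is one natural reading of the definition; the function names and argument layout are hypothetical.

```python
def overlap_fraction(e, f, left, right):
    """Fraction of the marker interval [left, right] that lies inside [e, f]."""
    if right <= left:
        raise ValueError("marker positions must be strictly increasing")
    overlap = min(f, right) - max(e, left)
    return max(0.0, overlap) / (right - left)

def interpolate_switches(e, f, positions, c_bar):
    """Posterior mean switch count assigned to [e, f] for one individual.

    positions: sorted physical coordinates of the L typed markers
    c_bar:     L - 1 posterior mean switch counts, one per adjacent marker pair
    """
    if len(c_bar) != len(positions) - 1:
        raise ValueError("need one value per adjacent marker pair")
    return sum(
        c * overlap_fraction(e, f, positions[j], positions[j + 1])
        for j, c in enumerate(c_bar)
    )
```

With markers at 0, 100 and 300 and posterior means of 0.4 and 0.6, the interval [50, 200] covers half of each marker interval and is assigned 0.5 switches.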
We generated maps with constant interval sizes of 10 kb, 15 kb, 20 kb,
33 kb, 50 kb, 75 kb, 100 kb, 150 kb, 200 kb, 250 kb, 333 kb, 500 kb, 750 kb,
1 Mb and 3 Mb. Whereas we used non-overlapping intervals to compute all
reported metrics (such as correlations), we used maps where the midpoints
of the intervals were always shifted by 5% of the interval size to find intervals
with largest differences between maps and for plotting. In addition, for plot-
ting, we scaled the map so that the total length of our map corresponded to a
rate of 1.04 cM/Mb (in line with the total length of the sex-averaged map from
ref. 8). The inferred recombination maps are available on the Novembre group
webpage (see URLs).
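Rescaling a relative map to a fixed genome-wide average can be sketched in a few lines. The function name and inputs are illustrative assumptions; only the target of 1.04 cM/Mb is taken from the text.

```python
import numpy as np

def scale_map(relative_rates, interval_bp, target_cm_per_mb=1.04):
    """Rescale relative rates to cM per interval with a fixed average rate.

    relative_rates: non-negative relative rate per interval (arbitrary units)
    interval_bp:    constant interval size in base pairs
    """
    rel = np.asarray(relative_rates, dtype=float)
    total_mb = rel.size * interval_bp / 1e6
    target_total_cm = target_cm_per_mb * total_mb
    # Multiply by a single constant so that relative differences are preserved.
    return rel * (target_total_cm / rel.sum())
```

Because every interval is multiplied by the same constant, correlations between maps are unaffected by this step; it only makes plotted rates comparable in units.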
Comparison to existing recombination maps. We compared the recombination
map inferred from the African-American and African-Caribbean dataset to four
existing, fine-scaled recombination maps. The HapMapCEU and HapMapYRI
maps, two widely used maps based on patterns of LD in HapMap populations,
were obtained from the IMPUTE website53 (see URLs). We also downloaded
the pedigree-based deCODE map8 (see URLs). For all these maps, we recom-
puted maps of various interval sizes matching those maps generated from our
African-American samples by interpolation. Further, we discarded the first
5 Mb on each telomeric end of every chromosome and all centromeric loca-
tions (Supplementary Note). Intervals overlapping unsequenced regions of the
human reference genome were discarded following previous studies8. Note that
the intervals of our non-overlapping 10-kb map precisely match those of the
deCODE map. Correlation figures between maps are based on Pearson's correlation
coefficients. To avoid bias when comparing published maps at scales below
1 Mb, we trimmed the 20% of intervals with the lowest estimated rates because we found
the estimation errors of the LD maps and the switch-point–based map to be
correlated at small scales (Supplementary Note and Supplementary Fig. 7).
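A trimmed correlation of this kind can be sketched as follows. Which map's rates define the lowest 20% is not stated in this section, so the sketch assumes that trimming is based on the second map passed in; that choice and the function name are hypothetical.

```python
import numpy as np

def trimmed_pearson(map_a, map_b, trim_frac=0.2):
    """Pearson correlation after dropping the lowest-rate intervals.

    Assumption: intervals are ranked by their rate in map_b, and the lowest
    trim_frac of them are removed from both maps before correlating.
    """
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be defined on the same intervals")
    n_drop = int(np.floor(trim_frac * a.size))
    order = np.argsort(b, kind="stable")
    keep = np.sort(order[n_drop:])
    return float(np.corrcoef(a[keep], b[keep])[0, 1])
```

Setting `trim_frac=0.0` recovers the untrimmed coefficient, which makes it easy to check how much the low-rate intervals influence a comparison.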
Analysis of AfAdm maps as a weighted average of the HapMapYRI and
HapMapCEU maps. Let A, Y and C represent the AfAdm, HapMapYRI and
HapMapCEU rates within an interval. We fit a model in which A is a convex
linear combination of the Y and C maps: A = aY + (1 − a)C. To estimate a, note that
we can subtract C from both sides to obtain (A − C) = a(Y − C) and, hence, use a
linear regression of (A − C) on (Y − C) to estimate a. For the regression approach
to compare the AfAdm map with the HapMapCEU and HapMapYRI maps, we
computed robust regressions with the rlm function in the MASS package in R.
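The weighted-average model can be sketched with an ordinary least-squares fit. This is a simplification under stated assumptions: plain least squares through the origin stands in for the robust regression described above, and the function name is hypothetical.

```python
import numpy as np

def estimate_mixing_weight(afadm, yri, ceu):
    """Least-squares estimate of a in A = a*Y + (1 - a)*C.

    Assumption: ordinary least squares with no intercept replaces the robust
    regression used in the paper, so outlying intervals are not down-weighted.
    """
    a_minus_c = np.asarray(afadm, dtype=float) - np.asarray(ceu, dtype=float)
    y_minus_c = np.asarray(yri, dtype=float) - np.asarray(ceu, dtype=float)
    # Slope of (A - C) regressed on (Y - C) through the origin.
    return float(np.dot(a_minus_c, y_minus_c) / np.dot(y_minus_c, y_minus_c))
```

If the AfAdm rates were an exact 80%:20% mixture of the two source maps, this estimate would return 0.8; intervals that depart from the mixture pull the plain least-squares estimate more strongly than they would a robust fit.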
48. Long, J.C. The genetic structure of admixed populations. Genetics 127, 417–428
(1991).
49. Pfaff, C.L. et al. Population structure in admixed populations: effect of admixture
dynamics on the pattern of linkage disequilibrium. Am. J. Hum. Genet. 68, 198–207
(2001).
50. Pool, J.E. & Nielsen, R. Inference of historical changes in migration rate from the
lengths of migrant tracts. Genetics 181, 711–719 (2009).
51. Chen, G.K., Marjoram, P. & Wall, J.D. Fast and flexible simulation of DNA sequence
data. Genome Res. 19, 136–142 (2009).
52. Schaffner, S.F. et al. Calibrating a coalescent simulation of human genome sequence
variation. Genome Res. 15, 1576–1583 (2005).
53. Howie, B.N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation
method for the next generation of genome-wide association studies. PLoS Genet.
5, e1000529 (2009).
    • We used the program RASPberry (Wegmann et al. 2011). For the inference of local ancestry, we assumed a constant default recombination rate of 5 cM/Mb, following estimates for P. trichocarpa by Tuskan et al. (2006), but identified suitable values for the remaining model parameters by choosing the combination resulting in the highest likelihood.
    [Show abstract] [Hide abstract] ABSTRACT: Natural hybrid zones have proven to be precious tools for understanding the origin and maintenance of reproductive isolation (RI) and therefore species. Most available genomic studies of hybrid zones using whole or partial genome resequencing approaches have focused on comparisons of the parental source populations involved in genome admixture, rather than exploring fine-scale patterns of chromosomal ancestry across the full admixture gradient present between hybridizing species. We have studied three well-known European 'replicate' hybrid zones of Populus alba and P. tremula, two wide-spread, ecologically divergent forest trees, using up to 432 505 Single Nucleotide Polymorphisms (SNPs) from Restriction site Associated DNA (RAD) sequencing. Estimates of fine-scale chromosomal ancestry, genomic divergence, and differentiation across all 19 poplar chromosomes revealed strikingly contrasting results, including an unexpected preponderance of F1 hybrids in the centre of genomic clines on the one hand, and genomically localized, spatially variable shared variants consistent with ancient introgression between the parental species on the other. Genetic ancestry had a significant effect on survivorship of hybrid seedlings in a common garden trial, pointing to selection against early-generation recombinants. Our results indicate a role for selection against recombinant genotypes in maintaining RI in the face of apparent F1 fertility, consistent with the intra-genomic 'coadaptation' model of barriers to introgression upon secondary contact. Whole genome resequencing of hybridizing populations will clarify the roles of specific genetic pathways in RI between these model forest trees and may reveal which loci are affected most strongly by its cyclic breakdown. This article is protected by copyright. All rights reserved.
    Full-text · Article · Feb 2016
    • For P. balsamifera, we used accessions from 46 provenances throughout the species range obtained from the Agriculture and Agri-Food Canada AgCanBaP collection (Soolanayakanahally et al. 2009). For local ancestry analysis in RASPberry (Wegmann et al. 2011), 50 reference individuals and 68 admixed individuals (36 P. trichocarpa individuals with P. balsamifera admixture, and 32 P. balsamifera individuals with P. trichocarpa admixture) were selected from the sympatric zone between P. trichocarpa and P. balsamifera as well as from allopatric populations (Figure 1). These 118 individuals were selected from a collection of 435 P. trichocarpa and 448 P. balsamifera genotypes, using a previous genome wide admixture analysis (see Supporting Information: Materials and Methods; Geraldes et.
    [Show abstract] [Hide abstract] ABSTRACT: Natural hybrid zones in forest trees provide systems to study the transfer of adaptive genetic variation by introgression. Previous landscape genomic studies in Populus trichocarpa, a keystone tree species, indicated genomic footprints of admixture with its sister species P. balsamifera and identified candidate genes for local adaptation. Here, we explored patterns of introgression and signals of local adaptation in P. trichocarpa and P. balsamifera, employing genome resequencing data from three chromosomes in pure species and admixed individuals from wild populations. Local ancestry analysis in admixed P. trichocarpa revealed a telomeric region in chromosome 15 with P. balsamifera ancestry, containing several candidate genes for local adaptation. Genomic analyses revealed signals of selection in certain genes in this region (e.g. PRR5, COMT1), and functional analyses based on gene expression variation and correlations with adaptive phenotypes suggest distinct functions of the introgressed alleles. In contrast, a block of genes in chromosome 12 paralogous to the introgressed region showed no signs of introgression or signatures of selection. We hypothesize that the introgressed region in chromosome 15 has introduced modular, or cassette-like variation into P. trichocarpa. These linked adaptive mutations are associated with a block of genes in chromosome 15 that appear to have undergone neo- or sub-functionalization relative to paralogs in a duplicated region on chromosome 12 that show no signatures of adaptive variation. The association between P. balsamifera introgressed alleles with the expression of adaptive traits in P. trichocarpa supports the hypothesis that this is a case of adaptive introgression in an ecologically important foundation species. This article is protected by copyright. All rights reserved.
    Full-text · Article · Jan 2016
    • corresponds to the per-base $L_2$ error. We also show the correlation of $\rho$ and $\hat{\rho}$ at different scales, as in Wegmann et al. (2011), so that $\mathrm{Cor}_B(\rho, \hat{\rho})$ is the correlation of the true and estimated recombination rates over a physical distance of $B$ bases, evaluated at the positions $0, B, 2B, \ldots, 1$ Mb.
    Abstract: Two-locus sampling probabilities have played a central role in devising an efficient composite likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here two distinct methods to compute the sampling probability for variable population size functions that are piecewise constant. The first is a novel formula that can be evaluated by numerically exponentiating a large but sparse matrix. The second method is importance sampling on genealogies, based on a characterization of the optimal proposal distribution that extends previous results to the variable-size setting. The resulting proposal distribution is highly efficient, with an average effective sample size (ESS) of nearly 98% per sample. Through a simulation study, we show that accounting for population size changes improves inference of recombination rates.
    Article · Oct 2015 · BioEssays
    • The protein encoded by this gene, the PR domain zinc finger 9 (PRDM9) [17], is a meiotic-specific histone (H3) methyltransferase with a C-terminal tandem repeat zinc finger (ZnF) domain that accumulates at recombination sites through its recognition of a species-specific and highly mutagenic repetitive DNA motif [17–19]. In humans, the high nucleotide variation detected at this locus suggests that the protein (together with the repetitive DNA motif that it recognizes) is highly mutable, enabling binding to new motifs as soon as they are formed [20,21]. Studies in laboratory strains of mice, moreover, have revealed that the PRDM9 protein could be directly involved in the recruitment of the recombination initiation machinery during meiosis [22].
    File · Data · Apr 2015 · BioEssays
    • This confirms that hotspots are highly transient [23,24] with a turnover time of their location shorter than the total divergence time of humans and chimpanzees. In contrast, the recombination rate at the scale of megabases is remarkably similar among human populations [14,15] and comparison to chimpanzee has revealed a strong correlation of recombination rate at this scale [22]. The discrepancy between divergence on fine scale and conservation on large scale indicates different forms of control of recombination rate on different scales.
    Abstract: Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it.
    Full-text · Article · Sep 2014
    • Unlinked 1 kb genome segments can yield genealogical information by applying a coalescent-based scheme (Burgess and Yang, 2008) to estimate ancient population sizes, divergence times and gene flow from single samples representative of entire populations (Gronau et al., 2011). Recent admixture can be inferred without estimating LD by exploiting recombination and haplotype switch points (Wegmann et al., 2011) and can be extended to the origin, dispersal and spread of infectious microbes over space and time (Lemey et al., 2010). An extensive variety of tools for scrutinizing bacterial population structure and recombination have been developed (http://pubmlst.org/software/).
    Abstract: The transformation of DNA sequencing technologies has enabled more powerful and comprehensive genetic profiling of microbes. The sheer number of informative loci provided by genome-sequencing allows the investigation of structural variation and horizontal gene transfer as well as delivering novel insights into genetic origins, evolution and epidemiological history. Microbial genomes can be sequenced en masse at high coverage but have associated challenges of high mutation rates and low conservation of genome structure. Consequently, detecting changes in DNA sequences requires a nuanced approach specific to the organism, availability of similar genomes, and types of variation. Here, we outline the high power of genome-sequencing to detect a wide scope of polymorphism classes. Samples without related species on which to scaffold a genome sequence require specific assembly methods that can be enhanced by progressive procedures for improvement. Polymorphism identification depends on genome structure, and error rates in closely related specimens can be reduced by incorporating population-level information. The development of genome analysis platforms is hastening the optimization of variant discovery and has direct applications for pathogen surveillance. Robust variant screening facilitates more sensitive scrutiny of population history, including the origin and emergence of infectious agents, and a deeper understanding of the selective processes that shape microbial phenotypes.
    Full-text · Chapter · Jul 2014 · BioEssays
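One of the excerpts above defines a scale-dependent correlation, Cor_B, between true and estimated recombination rates. A minimal sketch of that idea follows, assuming the rates are given as per-base lists and that "over a physical distance of B bases" means summing the rate over consecutive, non-overlapping windows starting at positions 0, B, 2B, and so on. The function names (`window_sums`, `pearson`, `cor_at_scale`) are hypothetical and are not taken from any of the cited papers or their software.

```python
from math import sqrt


def window_sums(rates, B):
    """Sum per-base rates over consecutive non-overlapping windows of B bases.

    Windows start at positions 0, B, 2B, ...; a trailing partial window is dropped.
    """
    return [sum(rates[i:i + B]) for i in range(0, len(rates) - B + 1, B)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cor_at_scale(rho_true, rho_est, B):
    """Correlation of true and estimated rates aggregated at scale B (assumed reading of Cor_B)."""
    return pearson(window_sums(rho_true, B), window_sums(rho_est, B))


if __name__ == "__main__":
    # Toy per-base rates, purely illustrative (not data from any study).
    rho_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    rho_est = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
    for B in (1, 2, 4):
        print(B, cor_at_scale(rho_true, rho_est, B))
```

In this toy input the estimate is wrong at the single-base level but has the right total in every 2-base window, so the correlation rises as B grows. This is the kind of pattern such a multi-scale comparison is meant to expose.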
Article
August 2011 · Nature Reviews Genetics · Impact Factor: 36.98
    Studies of genome-wide patterns of human recombination have so far been limited to a small number of populations. Two new studies — one in African Americans and the other in African Americans and African Caribbeans — highlight recombination differences between human populations and hint at their molecular genetic basis.
    Article
    August 2011 · Nature Genetics · Impact Factor: 29.35
      Advances in both pedigree-based and population-based genetic maps in recent years have helped unravel some of the mysteries of human meiotic recombination. The publication of the first admixture-derived human genetic maps offers a new approach for inferring recombination events and provides insight into variation in recombination rate patterns across populations.
      Article
      December 2013 · Bioinformatics · Impact Factor: 4.98
        Summary: forqs is a forward-in-time simulation of recombination, quantitative traits and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. Availability and implementation: forqs is implemented as a...
        Article
        January 2012
          Background: A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an...