Copyright 2001 by the Genetics Society of America
Recombination, Balancing Selection and Phylogenies in MHC
and Self-Incompatibility Genes
Mikkel H. Schierup, Anders M. Mikkelsen and Jotun Hein
Bioinformatics Research Center (BiRC), Department of Ecology and Genetics, University of Aarhus, 8000 Aarhus C., Denmark
Manuscript received May 23, 2001
Accepted for publication October 1, 2001
as a function of recombinational distance from a site under selection is investigated. We find that the
shape of the phylogenetic tree is independent of the distance to the site under selection. Only the timescale
changes from the value predicted by Takahata’s allelic genealogy at the site under selection, converging
with increasing recombination to the timescale of the neutral coalescent. However, if nucleotide sequences
are simulated over a recombining region containing a site under balancing selection, a phylogenetic tree
constructed while ignoring such recombination is strongly affected. This is true even for small rates of
recombination. Published studies of multiallelic balancing selection, i.e., the major histocompatibility complex (MHC), self-incompatibility in plants, and incompatibility of fungi, all report allelic genealogies with unexpected shapes. We conclude that small absolute levels of recombination are compatible with these observed distortions of the shape of the allelic genealogy,
suggesting a possible cause of these observations. Furthermore, we illustrate that the variance in the
coalescent with recombination process makes it difficult to locate sites under selection and to estimate
the selection coefficient from levels of variability.
Loci under multiallelic balancing selection are the most polymorphic genes known in eukaryotes. These systems include gametophytic (Emerson 1939) and sporophytic (Kusaba et al. 1997) self-incompatibility systems in plants, incompatibility systems in fungi (May and Matzke 1995), and some of the MHC genes in vertebrates (Anderson et al. 1986; Hughes and Nei 1988). In these systems, many alleles are maintained at intermediate frequencies, and nucleotide sequence variation among alleles often exceeds 30%.
In the MHC, overdominant selection with (close to) equal selection coefficients is sufficient (and appears necessary) to explain the data (Takahata and Nei 1990). In incompatibility systems, the polymorphism can be explained by the selection inherent in the incompatibility mechanism itself (Vekemans and Slatkin 1994; Schierup et al. 1998).
Population genetics theory has thus successfully explained the maintenance of this polymorphism, but an important aspect of the pattern of polymorphism in these systems is not yet well understood: the shape of the phylogenetic tree of the alleles. Takahata (1990) showed for symmetrical overdominance that the allelic genealogy (i.e., the phylogeny of functionally different alleles) can be approximated well by a Moran
process with a constant number of allelic classes with
equal death rates. Such a Moran process satisfies the
assumptions of the neutral coalescent (Kingman 1982)
with time scaled appropriately through a scaling factor
fS. This implies that the phylogenetic tree of alleles has
the same expected shape as the neutral coalescent, dif-
fering only in the timescale. Extension to gametophytic
self-incompatibility has shown that fS is very large (>1000) for realistic population sizes and mutation rates to new specificities (Vekemans and Slatkin 1994). Uyenoyama (1997) characterized the shape of the phylogenetic tree through four ratios calculated from the branch lengths of the trees and scaled to have approximate means of one under the neutral coalescent with no recombination. She found by simulation that the values of these ratios for allelic genealogies of gametophytic self-incompatibility are close to their neutral expectations and independent of the overall sequence variability (i.e., the mutation rate). Allelic genealogies in sporophytic self-incompatibility (Schierup et al. 1998) and fungal incompatibility (May et al. 1999) are also expected to have a shape close to the neutral coalescent, when measured through these ratios.
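The scaling argument can be illustrated with a short simulation. The Python sketch below is only an illustration under simplifying assumptions and is not the simulation used in this study. It draws neutral Kingman coalescent trees for n alleles and multiplies every waiting time by a scaling factor. The parameter name f_s is ours and stands in for fS. It then reports the fraction of total tree length on terminal branches as a simple shape statistic. That statistic is a stand-in chosen for illustration and is not one of Uyenoyama's four ratios.

```python
import random


def coalescent_branch_lengths(n, f_s=1.0, rng=random):
    """Simulate one Kingman coalescent tree for n >= 2 sampled alleles.

    Times are in coalescent units (each pair of lineages coalesces at rate 1),
    multiplied by the time-scaling factor f_s (a stand-in for fS).
    Returns (external, total): summed terminal branch length and total length.
    """
    is_leaf = [True] * n  # one flag per current lineage: still a terminal branch?
    external = total = 0.0
    while len(is_leaf) > 1:
        k = len(is_leaf)
        t = f_s * rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        total += k * t
        external += t * sum(is_leaf)
        i, j = rng.sample(range(k), 2)  # a random pair of lineages merges
        for idx in sorted((i, j), reverse=True):
            is_leaf.pop(idx)
        is_leaf.append(False)  # the ancestor starts an internal branch
    return external, total


def external_fraction(n, f_s=1.0, seed=0):
    """Fraction of tree length on terminal branches for one simulated tree."""
    ext, tot = coalescent_branch_lengths(n, f_s, random.Random(seed))
    return ext / tot


if __name__ == "__main__":
    # Same random draws, different timescale.
    print(external_fraction(10, f_s=1.0, seed=42))
    print(external_fraction(10, f_s=1000.0, seed=42))
```

Because f_s multiplies every branch by the same constant, the two calls print the same fraction, up to floating-point rounding. Only the timescale of the tree changes, not its shape.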
However, when these ratios are applied to real sequence data of functionally different alleles, they show significant deviations from coalescent expectations (… 1999; Table 1; for definitions of the ratios, see Statistics). The main deviation is that the terminal branches are much longer than expected (RSD > 1). This pattern of deviation is remarkably consistent over the four different kinds of systems, even though these are based on completely different molecular mechanisms. Two hypotheses have been put forward to explain this observation. Uyenoyama (1997) suggested that the enforced heterozygosity in gametophytic self-incompatibility (SI) leads to accumulation of recessive deleterious variants through sheltering. The probability of invasion and the retention time of a new specificity would then decrease over time, because the new specificity would be selected against when forming heterozygotes with the specificity it arose from. Richman and Kohn (1999) suggested, on the basis of a statistical analysis of phylogenetic trees of alleles, that divergent alleles are preferentially maintained in gametophytic SI. However, neither hypothesis has yet been investigated quantitatively, and neither is likely to play an equally strong role in all four systems. For example, homozygotes can be formed in the MHC and sporophytic SI but not in gametophytic SI, and Uyenoyama’s hypothesis quantitatively depends on the frequency of homozygotes.
Corresponding author: Mikkel H. Schierup, Bioinformatics Research Center (BiRC), Department of Ecology and Genetics, University of Aarhus, Ny Munkegade, Bldg. 540, 8000 Aarhus C., Denmark.
Genetics 159: 1833–1844 (December 2001)
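The neutral-coalescent expectations behind the phrase "terminal branches are much longer than expected" can be written out for reference. The block below gives standard results for Kingman's coalescent with n alleles. Time is in units where each pair of lineages coalesces at rate 1 and is multiplied by the scaling factor fS, following Takahata's approximation. T_k is the waiting time while k lineages remain, and L_total and L_ext are the total and terminal (external) branch lengths. These are generic quantities and are not Uyenoyama's ratios themselves.

```latex
\[
E[T_k] = \frac{2 f_S}{k(k-1)}, \qquad
E[L_{\mathrm{total}}] = \sum_{k=2}^{n} k\,E[T_k] = 2 f_S \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
E[L_{\mathrm{ext}}] = 2 f_S,
\]
\[
\frac{E[L_{\mathrm{ext}}]}{E[L_{\mathrm{total}}]} = \Bigl(\sum_{i=1}^{n-1} \frac{1}{i}\Bigr)^{-1}.
\]
```

The ratio does not involve fS. For n = 10, it is about 0.35. An allelic tree whose terminal branches carry a much larger share of the total length than this shows the kind of deviation described above.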
Takahata’s (1990) allelic genealogy is an infinite alleles model that treats alleles as entities that cannot be broken up by recombination. Applying this theory to sequence data therefore rests on the important assumption that recombination does not occur. However, intragenic recombination/gene conversion has been reported within genes of the MHC (Bergstrom et al. 1998), gametophytic self-incompatibility (Wang et al. 2001), sporophytic self-incompatibility (Awadalla and Charlesworth 1999; Schierup et al. 2001), and fungal incompatibility (May and Matzke 1995). In light of this, it is important to investigate how violation of the no-recombination assumption affects genealogical inference.
Here, the expected effects of recombination on the
shape of the genealogy are investigated. We simulate a
simple model of multiallelic selection with recombina-
tion using an extension of Hudson’s (1983) algorithm
for the coalescent with recombination. The main assumption is that variation in a single nonrecombining spot on the sequence is subject to selection. This spot could be either a single nucleotide site or a collection of adjacent nucleotides forming a specificity-determining region.
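The recombination machinery referred to above can be sketched for the simplest neutral case. The Python code below is an illustration and not the authors' program. It implements the coalescent with recombination in the style of Hudson (1983) for two loci, A and B, with no site under selection. The extension for the selected spot is not reproduced. Each lineage records how many sampled sequences it is ancestral to at each locus. Each pair of lineages coalesces at rate 1, and each lineage carrying material at both loci recombines at rate rho/2. We assume rho is the population-scaled rate 4Nr between the loci.

```python
import random


def two_locus_tmrca(n, rho, rng):
    """Return (T_A, T_B): times to the MRCA at two neutral loci, sample n >= 2.

    Time is in coalescent units (each pair of lineages coalesces at rate 1).
    A lineage is a tuple (a, b): how many sampled sequences it is ancestral to
    at locus A and at locus B (0 = no ancestral material at that locus).
    """
    lineages = [(1, 1)] * n
    t = 0.0
    t_a = t_b = None
    while t_a is None or t_b is None:
        k = len(lineages)
        linked = [i for i, (a, b) in enumerate(lineages) if a and b]
        coal_rate = k * (k - 1) / 2
        rec_rate = len(linked) * rho / 2  # only lineages carrying both loci can split
        t += rng.expovariate(coal_rate + rec_rate)
        if rng.random() * (coal_rate + rec_rate) < coal_rate:
            i, j = sorted(rng.sample(range(k), 2), reverse=True)
            (a1, b1), (a2, b2) = lineages.pop(i), lineages.pop(j)
            a, b = a1 + a2, b1 + b2
            lineages.append((a, b))
            if a == n and t_a is None:
                t_a = t
            if b == n and t_b is None:
                t_b = t
        else:
            a, b = lineages.pop(rng.choice(linked))
            lineages += [(a, 0), (0, b)]  # recombination separates the two loci
        # Stop tracking a locus once it has reached its MRCA; drop empty lineages.
        lineages = [(0 if t_a is not None else a, 0 if t_b is not None else b)
                    for a, b in lineages]
        lineages = [(a, b) for a, b in lineages if a or b]
    return t_a, t_b


def tmrca_correlation(rho, reps=20000, n=2, seed=1):
    """Monte Carlo correlation between T_A and T_B."""
    rng = random.Random(seed)
    pairs = [two_locus_tmrca(n, rho, rng) for _ in range(reps)]
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / reps, sum(ys) / reps
    cov = sum((x - mx) * (y - my) for x, y in pairs) / reps
    vx = sum((x - mx) ** 2 for x in xs) / reps
    vy = sum((y - my) ** 2 for y in ys) / reps
    return cov / (vx * vy) ** 0.5
```

With rho = 0, the two loci always share one genealogy. Calling tmrca_correlation for increasing rho shows the coalescence times at the two loci becoming nearly independent.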
First, we investigate the genealogy as a function of the recombinational distance from the spot under selection. At the spot under selection, we expect a neutral-shaped genealogy of alleles with an extended timescale, which depends on the number of allelic classes and the mutation rate to new specificities (Takahata 1990). Sufficiently far from the spot under selection, we expect genealogical trees of sequences to be determined by the neutral coalescent. The question is how the allelic genealogy transforms into the neutral coalescent between these extremes: does the shape of the genealogy change, or only its timescale?
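For orientation, the neutral two-locus coalescent has a closed-form correlation between the coalescence times T_A and T_B of two sampled sequences at two sites. This is a standard neutral result, stated under these assumptions: each pair of lineages coalesces at rate 1, and each lineage carrying both sites recombines at rate ρ/2, with ρ the population-scaled recombination rate between the sites. It does not include balancing selection at either site.

```latex
\[
\operatorname{Var}(T_A) = \operatorname{Var}(T_B) = 1, \qquad
\operatorname{Corr}(T_A, T_B) = \frac{\rho + 18}{\rho^{2} + 13\rho + 18}.
\]
```

At ρ = 0, the correlation is 1 because both sites share one genealogy. At ρ = 5, it is 23/108 ≈ 0.21. For large ρ, it decays roughly as 1/ρ, so distant sites have nearly independent genealogies in this neutral analogue.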
TABLE 1
Analysis of population data sets of self-recognition systems
[Table body not recoverable from the source. Data sets: Bergstrom et al. (1998); Seddon and Baverstock (1999), Rattus fuscipes greyii; May et al. (1999); Richman and Kohn (1999); Richman et al. (1996); Richman et al. (1995); Wang et al. (2001); Schierup et al. (2001); Kusaba et al. (1997). Columns include the four statistics (footnotes a and b) and two recombination tests (footnotes c and d). Reported significance levels range from P < 0.05 to P < 0.001.]
a Sequences were downloaded from GenBank and aligned with ClustalX (Thompson et al. 1997). Trees were reconstructed using DNAdist with the F81 model and Kitsch (Felsenstein 1995), and the four statistics were calculated from the branch lengths. Other reconstruction methods gave very similar results.
b Values of the four statistics were taken from the literature.
c This test is the informative sites test of Worobey (2001). The test was performed using the PIST 1.0 software, following closely the recommendations of Worobey (2001), including using PAUP* (Swofford 2000). NS, nonsignificant.
d This test followed Awadalla et al. (1999) closely, except that only sites with ≥30% frequency were included. Significance was assessed by 1000 permutations of the variable sites. NS, nonsignificant.
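The permutation test described in footnote d can be sketched as follows. This is an illustrative reimplementation under stated assumptions and not the code used for Table 1. The sketch makes these choices:
- Sequences are coded as 0/1 at biallelic sites, and linkage disequilibrium is measured by r².
- A site is kept only if its less frequent allele reaches min_freq. The value 0.3 is assumed to match the 30% cutoff.
- The statistic is the Pearson correlation between r² and the distance between sites.

Site positions are permuted among sites. The one-tailed P-value is the proportion of permutations giving a correlation at least as negative as the observed one, because recombination is expected to make LD decay with distance.

```python
import random


def r_squared(x, y):
    """r^2 between two biallelic sites coded as 0/1 lists over the same sequences."""
    n = len(x)
    p, q = sum(x) / n, sum(y) / n
    p_xy = sum(1 for u, v in zip(x, y) if u and v) / n
    denom = p * (1 - p) * q * (1 - q)
    return (p_xy - p * q) ** 2 / denom if denom > 0 else 0.0


def pearson(xs, ys):
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def ld_distance_test(sites, positions, min_freq=0.3, n_perm=1000, seed=1):
    """Return (observed r^2-distance correlation, one-tailed permutation P)."""
    keep = [i for i, s in enumerate(sites)
            if min(sum(s), len(s) - sum(s)) / len(s) >= min_freq]
    sites = [sites[i] for i in keep]
    positions = [positions[i] for i in keep]
    pairs = [(i, j) for i in range(len(sites)) for j in range(i + 1, len(sites))]
    if len(pairs) < 2:
        raise ValueError("need at least three retained sites")
    r2 = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def corr(pos):
        return pearson([abs(pos[i] - pos[j]) for i, j in pairs], r2)

    observed = corr(positions)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = positions[:]
        rng.shuffle(shuffled)  # reassign positions among sites
        if corr(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting positions leaves every pairwise r² value unchanged and breaks only its association with physical distance. That is what makes this a test for decay of LD with distance.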
… 0.01 in some self-recognition systems, then some of the conclusions based on allelic phylogenies of self-incompatibility systems and MHC systems should be carefully reconsidered. The level of trans-specific evolution (TSE; i.e., polymorphism shared between species) is very high in both MHC (Ayala 1995) and self-incompatibility (Richman and Kohn 1999; Nishio and Kusaba 2000). There is no doubt that some allelic lineages diverged prior to speciation, but, since ignoring recombination leads to longer terminal branches in the inferred tree, one may greatly overestimate the number of such trans-specific lineages. If, e.g., a molecular clock is applied, this implies that more lineages appear to coalesce in the common ancestor of the species. In MHC, sequence data from introns (Bergstrom et al. 1998) have shown that trans-specific polymorphism of DRB1 in humans and chimpanzees is significantly smaller than estimated from the exon data (Ayala 1995), in good agreement with evidence for gene conversion in the exons (Bergstrom et al. 1998). Self-incompatibility systems may be similarly affected. This has implications for methods of paleogenetics (Takahata et al. 1992), in which the number of trans-specific lineages is used to estimate long-term evolutionary parameters.
A final consequence of recombination is that the intensity of selection is very difficult to estimate. Figure 2 showed that stronger selection affects only a very small part of the sequence, because the rest of the sequence “escapes” the balancing selection through recombination. Thus, the action of selection can be inferred from an elevated level of polymorphism, but its strength and location cannot be determined accurately when recombination occurs.
M.H.S. thanks D. Charlesworth and P. Awadalla for continued discussions. X. Vekemans, F. B. Christiansen, Marcy K. Uyenoyama, and two anonymous referees made useful suggestions about the manuscript. T. Christensen is thanked for computer programming. The study was supported by grants nos. 9701412 and 1262 from the Danish Natural Sciences Research Council and by the Basic Research in Computer Science Centre of the Danish National Research Foundation.
LITERATURE CITED
Anderson, M. A., E. C. Cornish, S.-L. Mau, E. G. Williams, R. Hoggart et al., 1986 Cloning of cDNA for a stylar glycoprotein associated with expression of self-incompatibility in Nicotiana alata. Nature 321: 38–44.
Awadalla, P., and D. Charlesworth, 1999 … selection at Brassica self-incompatibility loci. Genetics 152: 413–…
Awadalla, P., A. Eyre-Walker and J. M. Smith, 1999 … equilibrium and recombination in hominid mitochondrial DNA. Science 286: 2524–2525.
Ayala, F. J., 1995 The myth of Eve: molecular biology and human origins. Science 270: 1930–1936.
Bergstrom, T. F., A. Josefsson, H. A. Erlich and U. Gyllensten, 1998 Recent origin of HLA-DRB1 alleles and implications for human evolution. Nat. Genet. 18: 237–242.
… et al., 2000 Determining the physical limits of the Brassica S locus by recombinational analysis. Plant Cell 12: 23–33.
Dwyer, K. G., M. A. Balent, J. B. Nasrallah and M. E. Nasrallah, 1991 DNA sequences of self-incompatibility genes from Brassica campestris and B. oleracea: polymorphism predating speciation. Plant Mol. Biol. 16: 481–486.
Emerson, S., 1939 A preliminary survey of the Oenothera organensis population. Genetics 24: 524–537.
Felsenstein, J., 1995 PHYLIP (Phylogeny Inference Package) Version 3.572. Distributed over the Worldwide Web, Seattle.
Griffiths, R. C., and P. Marjoram, 1996 Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. …
Hudson, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23: 183–201.
Hudson, R. R., and N. L. Kaplan, 1988 The coalescent process in …
Hughes, A. L., and M. Nei, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167–170.
Jukes, T. H., and C. R. Cantor, 1969 Evolution of protein molecules, … Munro. Academic Press, New York.
Kingman, J. F. C., 1982 The coalescent. Stoch. Proc. Appl. 13: 235–…
Kusaba, M., T. Nishio, Y. Satta, K. Hinata and D. Ockendon, 1997 Striking sequence similarity in inter- and intra-specific comparisons of class I SLG alleles from Brassica oleracea and Brassica campestris: implications for the evolution and recognition mechanism. Proc. Natl. Acad. Sci. USA 94: 7673–7678.
May, G., and E. Matzke, 1995 Recombination and variation at the a mating-type of Coprinus cinereus. Mol. Biol. Evol. 12: 794–802.
May, G., F. Shaw, H. Badrane and X. Vekemans, 1999 … of balancing selection: fungal mating compatibility gene evolution. Proc. Natl. Acad. Sci. USA 96: 9172–9177.
Nishio, T., and M. Kusaba, 2000 Sequence diversity of SLG and SRK in Brassica oleracea L. Ann. Bot. 85 (Suppl. A): 141–146.
Richman, A. D., and J. R. Kohn, 1999 … genetic polymorphisms. Proc. Natl. Acad. Sci. USA 96: 168–172.
Richman, A. D., T.-H. Kao, S. W. Schaeffer and M. K. Uyenoyama, 1995 S-allele sequence diversity in natural populations of Solanum carolinense (Horsenettle). Heredity 75: 405–415.
Richman, A. D., M. K. Uyenoyama and J. R. Kohn, 1996 … (ground cherry) assessed by RT-PCR. Heredity 76: 497–505.
… on traditional phylogenetic analysis. Genetics 156: 879–891.
… evolutionary dynamics of sporophytic self-incompatibility alleles in plants. Genetics 147: 835–846.
… genealogies in sporophytic self-incompatibility systems in plants. Genetics 150: 1187–1198.
Schierup, M. H., B. K. Mable, P. Awadalla and D. Charlesworth, 2001 Identification and characterization of a polymorphic receptor kinase gene linked to the self-incompatibility locus of Arabidopsis lyrata. Genetics 158: 387–399.
Seddon, J. M., and P. R. Baverstock, 1999 Variation on islands: major histocompatibility complex (MHC) polymorphism in populations of the Australian bush rat. Mol. Ecol. 8: 2071–2079.
Sims, T. L., 1993 Genetic regulation of self-incompatibility. Crit. Rev. Plant Sci. 12: 129–167.
Swofford, D. L., 2000 PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4. Sinauer Associates, Sunderland, MA.
Takahata, N., 1990 A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc. Natl. Acad. Sci. USA 87: 2419–2423.
Takahata, N., and M. Nei, 1990 Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124: 967–978.
… at HLA loci. Immunogenetics 47: 430–441.
Takahata, N., Y. Satta and J. Klein, 1992 Polymorphism and balancing selection at major histocompatibility complex loci. Genetics 130: 925–938.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin and D. G. Higgins, 1997 The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24: 4876–4882.
… self-incompatibility in natural populations of flowering plants. Genetics 147: 1389–1400.
Vekemans, X., and M. Slatkin, 1994 Gene and allelic genealogies at a gametophytic self-incompatibility locus. Genetics 137: 1157–…
Wall, J. D., 2000 A comparison of estimators of the population recombination rate. Mol. Biol. Evol. 17: 156–163.
Wang, X., A. L. Hughes, T. Tsukamoto, T. Ando and T.-h. Kao, 2001 Evidence that intragenic recombination contributes to allelic diversity of the S-RNase gene at the self-incompatibility (S) locus in Petunia inflata. Plant Physiol. 125: 1012–1022.
Wiuf, C., and J. Hein, 1997 On the number of ancestors to a DNA sequence. Genetics 147: 1459–1468.
Worobey, M., 2001 A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol. Biol. Evol. 18: 1425–1434.
Communicating editor: N. Takahata
Communicating editor: N. Takahata