Positive selection and gene conversion drive the evolution of a brain-expressed snoRNAs cluster.
ABSTRACT HBII-52 small nucleolar RNAs (snoRNAs) are brain-expressed posttranscriptional modifiers of serotonin receptor 2C RNA. They are organized in a cluster of 47 highly homologous gene copies spanning 100 kb at chromosome 15q11.2. Nucleotide diversity at the HBII-52 snoRNA gene cluster in populations of African and European descent was analyzed via resequencing of 25 functional snoRNA gene copies. Ninety-four variants were detected, of which 74 are novel. Only 16 variants are shared between Africans and Europeans. We also report a novel Yoruba-specific copy-number variant representing a 5.2-kb polymorphic deletion and resulting in a chimerical functional snoRNA copy. In both populations, the snoRNA genes are characterized by a high density of single nucleotide polymorphisms and an excess of low-frequency variants. However, the variability patterns are strictly population specific and there is an extreme divergence in allele frequencies in both resequencing and HapMap data. Several tests of neutrality strongly suggest that the observed extreme population divergence at the HBII-52 region results from positive selection in Europeans. Our analysis of the HBII-52 nucleotide variability spectrum shows that gene conversion is the main factor introducing variability at the cluster. Sixty-five substitutions (69%) correspond to a paralogous sequence variant (PSV) in another copy and occur at potential gene conversion tracts of >5 bp. We detected several interparalogue gene-conversion events that involve more than one PSV, with individual frequency patterns suggestive of recurrent gene conversion. Analysis based on derived and ancestral allele distribution shows that gene conversion is at least twice as frequent as point mutation. Gene conversion is an important factor in disrupting patterns of linkage disequilibrium (LD) at short scales. Consistent with this, we detect localized breaks of LD at gene conversion sites while the overall LD at the HBII-52 cluster is high in both study populations.
Miroslava Ogorelkova,*1 Arcadi Navarro,†‡§‖ Francesca Vivarelli,* Anna Ramirez-Soriano,‡ and
*Genetic Causes of Disease Group, Genes and Disease Program, Centre for Genomic Regulation (CRG), Barcelona, Spain; †CeGen-Barcelona, Spanish National Genotyping Center, Barcelona, Spain; ‡Universitat Pompeu Fabra (UPF), Barcelona, Spain; §Population Genomics Node (GNV8), National Institute for Bioinformatics (INB), Spain; and ‖Institució Catalana de Recerca i Estudis Avançats, and Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
Small nucleolar RNAs (snoRNAs) generally range from 60 to 300 nucleotides (nt) in length and guide the site-specific modification of target RNAs via short regions of complementary recognition. There are two major classes of snoRNAs: the H/ACA box snoRNAs, which guide pseudouridylation of target RNAs, and the C/D box snoRNAs, which guide 2′-O-ribose methylation (reviewed in Kiss 2002). Until recently, snoRNAs were thought to target exclusively various classes of small noncoding RNAs (ncRNAs). This view has changed with the identification of many novel “orphan” box C/D and H/ACA snoRNAs that lack complementarity to ncRNAs (reviewed in Kiss 2002; Rogelj 2006). Among those, a cluster of C/D box snoRNAs, named HBII-52, was discovered on chromosome 15q11.2 (Cavaillé et al. 2000; De los Santos et al. 2000). The cluster spans nearly 100 kb and consists of 47 HBII-52 snoRNA copies, with a size of 81–84 bp and high sequence homology (average identity to the consensus sequence >94.9%), which are embedded in repetitive units of 1.9 kb with an overall sequence similarity of less than 75% (Cavaillé et al. 2000). HBII-52 snoRNAs are encoded within the downstream introns of alternatively spliced transcripts of SNURF–SNRPN, a large complex transcription unit of nearly 460 kb (Runte et al. 2001). Within its upstream part, the SNURF–SNRPN gene encodes the SNURF protein and the SmN spliceosomal protein (Gray et al. 1999); downstream, it also hosts several other snoRNA genes (HBII-13, HBII-85 cluster, HBII-436, HBII-437, HBII-438A, and HBII-438) (Runte et al. 2001). SNURF–SNRPN is under the coordinated control of a proximally located imprinting center and is transcriptionally active only on the paternal chromosome (Runte et al. 2001).
HBII-52 snoRNAs are expressed exclusively in the brain, and most copies have 18 nt of phylogenetically conserved complementarity to the serotonin receptor 2C (5-HT2CR) RNA, within a region containing editing sites and an exonic splicing silencer (Cavaillé et al. 2000; Kishore and Stamm 2006). The complementary sequence is located next to the D-box, in a region typically involved in target RNA recognition (Kiss 2002). It has recently been shown that rat orthologues of HBII-52 (RBII-52) promote the formation of the functional splice isoform of 5-HT2CR via interaction with the splicing silencer and inclusion of exon Vb (Kishore and Stamm 2006). Mouse orthologues (MBII-52) have been shown in vitro to affect the editing of 5-HT2CR mRNA (Vitali et al. 2006) and lack of HBII-52 expression due to paternal proximal 15q deletions in pa-
icant increase of the edited 5-HT2CR mRNA isoforms
(Kishore and Stamm 2006). Editing changes the amino acid
et al. 2001). Even subtle changes in the fine proportional
balance of alternatively spliced and differentially edited
5-HT2CR isoforms are expected to affect the normal seroto-
portant regulatory role. Consistent with this hypothesis,
MBII-52 is reported to be upregulated in mouse hippocampus during learning (Rogelj et al. 2003). Twenty-four of 47 human BII-52 copies are predicted to be functional modulators of 5-HT2CR RNA posttranscriptional modifications, having the 18 nt of complementary sequence and intact elements important for snoRNA processing and function, such as C/D boxes and the 5′-3′ terminal stem (Watkins et al. 1996; Xia et al. 1997).
1Present address: Gendiag.exe, S.L., c/Juan de Sada 32, Barcelona,
Key words: HBII-52, snoRNAs, nucleotide diversity, gene conversion, positive selection, CNV.
Mol. Biol. Evol. 26(11):2563–2571. 2009
Advance Access publication August 3, 2009
© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
Orthologues of HBII-52 have been found in all mam-
key, BII-52 snoRNAs clusters contain a lower number of copies (15 MBII-52, 20 RBII-52, and 16 RhBII-52 copies, respectively) with less divergence (average identity to the consensus sequence >99%) (Cavaillé et al. 2000; Rhesus macaque Genome Project; http://www.hgsc.bcm.tmc.edu/projects/rmacaque/). In primates, the cluster has undergone considerable expansion, with higher divergence between copies. In humans, the average number of nucleotide differences from the consensus is 4.76% (Cavaillé et al. 2000).
We hypothesized that gene conversion between homo-
inating between changes with different functional impact
may be the main forces shaping the nucleotide diversity
of the cluster. To test this hypothesis, we studied the nucle-
otide variability of 25 HBII-52 gene copies in two human
populations—Europeans (from Spain) and Yorubas. We
analyzed the evolutionary dynamics of the cluster, looking
for signatures of gene conversion and of the adaptive evo-
lution processes that might have been associated with the
HBII-52 copy-number expansion and increased divergence
between homologues in humans.
Materials and Methods
DNA Samples and Screening for Sequence Variants at
Sequence variability at individual HBII-52 copies was analyzed in 70 unrelated individuals of Spanish origin and in 30 African trios (Yoruba of Ibadan, Nigeria). In addition, 30 Centre d’Étude du Polymorphisme Humain (CEPH) (Utah residents with ancestry from Northern and Western Europe) trios were analyzed for the presence of a polymorphic deletion at HBII-52-31 to HBII-52-34. Total DNA was prepared from immortalized lymphocyte cell lines following standard procedures. Spanish samples were obtained from anonymous blood donors. The Yoruba and CEU samples were collected within the HapMap project (The International HapMap Consortium 2003) and cell lines were purchased from the Coriell Institute for Medical Research (Yoruba Samples HAPMAPPT03; CEPH samples HAP-
Primers for polymerase chain reaction (PCR) amplification of individual HBII-52 copies were designed using FastPCR software, which discriminates repetitive sequences (Kalendar 2007). Primers and amplification conditions for the analyzed individual copies are listed in supplementary table 5, Supplementary Material online. Sequence variability in the Spanish sample was analyzed by denaturing high performance liquid chromatography (dHPLC) or direct sequencing as indicated in supplementary table 5, Supplementary Material online. Prior to dHPLC analysis of the samples, PCR amplicons for each copy from five randomly selected individuals were sequenced in order to confirm that they were specific to a single copy.
Sequence variability in the Yoruba population sample was assessed by direct sequencing only. PCR amplification was carried out with Taq polymerase (Roche) following the manufacturer’s instructions. Direct sequencing of PCR amplicons was performed with the corresponding forward and reverse primers using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) according to the manufacturer’s instructions. Sequencing was carried out using Genetic Analyzer 3730XL (Applied Biosystems); dHPLC was carried out using WAVE System
Analysis of Deletion of HBII-52-31 to 34
The region flanking rs8025461 (chr15:23023309) and
ples from Yoruba families presenting missegregation of variants HBII-52-33(+113A>G) (families 004, 012, 018, 051,
F: 5′-agcaccttgtggcttctgcagg (chr15:23022354–23022375)
temperature of 61 °C with the Expand High Fidelity PCR
served. To define the deletion breakpoints, the band was sequenced with primers seqF: 5′-gtcccaggagtttgtcgctc and seqR: 5′-ctgccctacattggggctgaagg using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer’s instructions.
The frequency of the deletion in Yoruba and CEU was assessed by a PCR assay in which a product (471 bp) is obtained only if the deletion is present. The PCR was carried out with Taq polymerase (Roche) with primers delF: 5′-ggagggccaacagtgtagtgc and delR: 5′-ttcgggacccagtgtcctc, annealing at 60 °C, according to the manufacturer’s
Analysis of Nucleotide Diversity and Tests of Neutrality
Nucleotide diversity and tests of neutrality were carried out with DnaSP software version 4.10.9 (Rozas J and Rozas R 1999). The permutation test for Fst was performed as follows: We took pseudosamples (with replacement) of genic single nucleotide polymorphisms (SNPs) from the whole genome, each containing as many SNPs as our sample (94), and we computed their average Fst. The P value is given by the number of times out of 1,000 permutations that the average Fst of the pseudosample was larger than our
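The resampling procedure described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name, the fixed random seed, and the toy list of genomewide Fst values are assumptions made for illustration; only the logic (draw pseudosamples with replacement, average their Fst, count how often the average exceeds the observed one) follows the text.

```python
import random


def permutation_p_value(genome_fst, observed_mean, sample_size, n_perm=1000, seed=0):
    """Fraction of pseudosamples whose mean Fst exceeds the observed mean.

    genome_fst    -- per-SNP Fst values of genic SNPs genomewide (assumed input)
    observed_mean -- average Fst observed in the region under study
    sample_size   -- number of SNPs per pseudosample
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    exceed = 0
    for _ in range(n_perm):
        # Draw a pseudosample with replacement, as described in the text.
        pseudo = rng.choices(genome_fst, k=sample_size)
        if sum(pseudo) / sample_size > observed_mean:
            exceed += 1
    return exceed / n_perm


if __name__ == "__main__":
    # Toy genomewide distribution (hypothetical values, not HapMap data).
    toy_rng = random.Random(1)
    toy_genome = [toy_rng.uniform(0.0, 0.25) for _ in range(5000)]
    print(permutation_p_value(toy_genome, observed_mean=0.2429, sample_size=94))
```

With real data, `genome_fst` would hold the Fst of every genic SNP in the genomewide set, and the returned fraction plays the role of the P value.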
In order to obtain realistic distributions for Tajima’s D and Fay and Wu’s H, simulations were performed using COSI version 1.1 (Schaffner et al. 2005). The parameters were as specified in the calibrated demography provided with the COSI source code, with modifications: 1) We have assumed an infinite-sites model, 2) a length of 6,500 bp (total size of the resequenced amplicons), 3) three different values of recombination, and 4) a fixed number of segregating sites (S). We have assumed no recombination (R = 0) and two further values of R: R = 10⁻⁸ and R = 10⁻⁷, which correspond to the rounded average and maximum recombination found in the human genome (Kong et al. 2002). The number of segregating sites (S) has been fixed to 92, as this corresponds to the total number of segregating sites considered for analysis in the two resequenced populations. In the analysis of the simulations, two different strategies have been followed: 1) We have simulated 10,000 samples with S = 92 and we have analyzed them all; 2) 400,000 samples were simulated with S = 92 and from these, we have selected for the analysis 1,000 samples with S = 61 ± 1 in Africans and S = 47 ± 1 in Europeans, in order to match the number of segregating sites obtained in our resequenced
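For readers unfamiliar with the two statistics whose null distributions are simulated here, the following Python sketch computes Tajima's D (Tajima 1989) and the unnormalized Fay and Wu's H (Fay and Wu 2000) from derived-allele counts. It is an illustrative implementation of the published formulas, not the code used in DnaSP or COSI; the function names and the input format (one derived-allele count per segregating site) are assumptions.

```python
from math import sqrt


def pairwise_diversity(counts, n):
    """Average number of pairwise differences (pi) from derived-allele counts."""
    return sum(2 * k * (n - k) for k in counts) / (n * (n - 1))


def tajimas_d(counts, n):
    """Tajima's D for n chromosomes; counts holds one derived count per site."""
    s = len(counts)  # number of segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pairwise_diversity(counts, n) - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(counts, n):
    """Unnormalized Fay and Wu's H = pi - theta_H."""
    theta_h = sum(2 * k * k for k in counts) / (n * (n - 1))
    return pairwise_diversity(counts, n) - theta_h


if __name__ == "__main__":
    # Hypothetical sample of 10 chromosomes with five singleton sites.
    print(tajimas_d([1, 1, 1, 1, 1], 10), fay_wu_h([1, 1, 1, 1, 1], 10))
```

An excess of low-frequency variants drives D below zero, whereas an excess of high-frequency derived alleles drives H below zero, which is why both statistics are compared against simulated neutral distributions.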
SWEEP analyses were performed using SWEEP software version 1.0 (Sabeti et al. 2002) with the HapMap SNPs (HapMap data release 21/phaseII, July06; http://www.hapmap.org/) contained in the HBII-52 cluster. Significance was tested according to the methods of Sabeti et al. (2002), which are based on determining whether the length of the haplotype containing the putatively favored variant of a SNP is extreme compared with a reference set of SNPs of similar frequencies. If haplotypes around a given core SNP are significantly longer (at 5%) than the reference distribution,
under the test is that a selective sweep driven by this form of selection increases the frequency of SNPs so quickly that recombination cannot break down the chromosome that contains the selected variant and, thus, the variant will have a long haplotype around it.
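The notion of a "long haplotype" around a core variant is usually quantified by extended haplotype homozygosity (EHH), the probability that two randomly chosen chromosomes carrying the same core allele are identical over the interval from the core to a given marker. The sketch below is a simplified illustration of that definition, not the SWEEP implementation: haplotypes are assumed to be phased strings of equal length, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from math import comb


def ehh_by_core_allele(haplotypes, core, end):
    """EHH for each allele at index `core`, over the interval core..end (inclusive)."""
    lo, hi = min(core, end), max(core, end)
    carriers = defaultdict(list)
    for hap in haplotypes:
        # Group the extended segment by the allele carried at the core SNP.
        carriers[hap[core]].append(hap[lo:hi + 1])
    result = {}
    for allele, segments in carriers.items():
        n = len(segments)
        if n < 2:
            continue  # EHH is undefined for a single chromosome
        identical_pairs = sum(comb(c, 2) for c in Counter(segments).values())
        result[allele] = identical_pairs / comb(n, 2)
    return result


if __name__ == "__main__":
    # Toy phased haplotypes (hypothetical data); the core SNP is at index 0.
    haps = ["AAAA", "AAAA", "AAAT", "GCCC", "GTCC"]
    print(ehh_by_core_allele(haps, core=0, end=3))
```

A core allele whose EHH stays high far from the core, relative to alleles of similar frequency, is the pattern the test treats as evidence of a recent sweep.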
Results and Discussion
To study nucleotide diversity at the HBII-52 cluster, we selected 25 copies, of which 24 are presumably functional (table 1). Nucleotide variability in these copies and in the surrounding sequence was analyzed on genomic PCR amplicons by dHPLC or direct sequencing in 30 African trios (Yoruba of Ibadan, Nigeria) sampled within the HapMap project (The International HapMap Consortium 2003) and in 70 unrelated Spanish individuals (hereafter referred to as Europeans). We sequenced and analyzed a total of 6,316 bp in these individuals. Ninety-four variants were detected: 92 SNPs, an insertion of 1 bp, and a combination of a 1-bp deletion/single nucleotide substitution, which we consid-
registered at NCBI databases (dbSNP build 129, http://
known frequency and/or validation. Of these 94 variants,
only 16 SNPs were common for both Africans and
In Europeans, 47 SNPs were found, 31 of which are exclusive to this population (supplementary table 2, Supplementary Material online). There is a high prevalence
of rare variants, with 36 SNPs (76.6%) having minor allele
frequency (MAF) under 0.05, and 14 variants (29.8%)
being singletons. These values are roughly similar to the
average values reported for three autosomal noncoding
10-kb regions in Europeans (Zhao et al. 2006). The nucle-
otide diversity of the analyzed sequence, as estimated from
the average number of pairwise differences, is lower than
reported for other autosomal genic or noncoding regions in
Europeans (Halushka et al. 1999; Frisse et al. 2001; Patil
et al. 2001; Yu et al. 2002; Zhao et al. 2006). However,
the region is characterized by high average nucleotide di-
versity estimated from the number of segregating sites (S)
density of SNPs with an excess of low-frequency variants
characterize the HBII-52 cluster in Europeans.
In Yoruba, of 63 identified SNPs, 47 are exclusive to Africans (supplementary table 2, Supplementary Material online). Comparison with data from other resequenced
regions in Africans shows a similar proportion of HBII-52
rare SNPs; however, the average nucleotide diversity is
higher than reported (Yu et al. 2002; Zhao et al. 2006).
The nucleotide diversity of HBII-52 copies in Europeans
and Africans is graphically represented in figure 1. Al-
though nucleotide diversity in short DNA fragments is
prone to strong stochastic effect, variability of individual
copies tends to correlate between both populations. Several
copies in the proximal and distal parts of the cluster (HBII-
52-4, HBII-52-9, HBII-52-14, HBII-52-39, HBII-52-42,
and HBII-52-44) are characterized by very low or virtually
no diversity in both populations.
The allele frequencies of shared SNPs are very distinct between Africans and Europeans. All of them have a MAF ≤ 0.1 in Europeans, whereas only one-fourth of the corresponding alleles have such low MAF in Africans. Indeed, frequencies have reversed for these alleles: One-third of the frequent alleles in Africans actually correspond to minor alleles in Europeans. A similar allele frequency pattern can be observed for 151 HapMap SNPs at the HBII-52 cluster (fig. 2). In this data set, one-third of the alleles with frequency ≤ 0.1 in the CEU population represent the common alleles in Yoruba. Differences in SNP allele frequencies between Africans and populations of European descent (ED) have been commonly observed and are mainly due to demographic effects and population differentiation (International
in the HBII-52 snoRNA cluster, there is an extreme divergence in allele frequencies between Africans and the ED population compared with genomewide HapMap data and ENCODE regions (International HapMap Consortium 2005; ENCODE Project Consortium 2007). Considering the frequencies of HapMap SNPs in regions that are shared between the two populations, the average Fst value between the Yoruba and ED populations is 0.2429, significantly higher than the genomewide average of 0.1181 obtained from HapMap’s intragenic SNPs (P < 0.01 in a permutation test, see Materials and Methods). Consistent with the SNP frequencies, shared haplotypes are found at different frequencies in Yoruba and CEU (supplementary fig. 1, Supplementary Material online).
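To make the Fst comparison concrete, the sketch below computes a per-SNP Fst for two populations from allele frequencies and averages it over SNPs. The text does not state which Fst estimator was used, so this is only an assumed choice: Wright's definition, Fst = (H_T − H_S)/H_T, with the two populations weighted equally and no sample-size correction. The function names and example frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic SNP given its allele frequency in two populations.

    Assumes equal weights for both populations and no sample-size correction.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


def mean_fst(freq_pairs):
    """Average per-SNP Fst over a list of (p1, p2) frequency pairs."""
    return sum(fst_two_populations(p1, p2) for p1, p2 in freq_pairs) / len(freq_pairs)


if __name__ == "__main__":
    # Hypothetical frequencies: an allele that is rare in one population
    # and common in the other gives a high Fst.
    print(fst_two_populations(0.1, 0.9))
    print(mean_fst([(0.1, 0.9), (0.3, 0.3)]))
```

Alleles whose frequencies are reversed between populations, as described for this region, contribute the largest per-SNP values under this definition.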
We identified 11 SNPs that occur in the functional
consensus elements of the snoRNAs (terminal stem, C/
D boxes and antisense box, see supplementary table 2,
Supplementary Material online). These variants are
expected to impair the processing and/or maturation of
the snoRNA, or its functional interaction with the 5-
HT2CR RNA. For 10 of these SNPs, the derived alleles
are rare, as would be expected if they were under purify-
ing selection due to deleterious effect (supplementary
table 2, Supplementary Material online). Three presumably functional SNPs are shared between Africans and Europeans. In both populations, the derived alleles are at very low frequencies.
Table 1. Analysis of Nucleotide Diversity and Tests of Neutrality (Tajima’s D and Fay and Wu’s H) at HBII-52 Locus in European (A) and Yoruba (B) Populations
[Table body not recovered. Column headings: Human Copy; Size (bp); Polymorphism; Frequency-Based Tests, including Fay and Wu’s H.]
Size: size of the analyzed fragment; S: segregating sites in the human population; π: average number of nucleotide differences; θπ: nucleotide diversity (per site) from π; θW: nucleotide diversity (per site) from S; Taj’s D: Tajima’s D. All analyzed snoRNAs are presumably functional except for HBII-52-30. Copies HBII-52-33 and HBII-52-34, being part of a polymorphic deletion in Yoruba, were excluded from analysis in this population. The comprehensive data on human–chimpanzee divergence and all performed tests are presented in supplementary table 1, Supplementary Material online.
Polymorphic Deletion at the Distal Part of HBII-52
Cluster in Yoruba
In Yoruba, we detected Mendelian errors in the allele segregation of two SNPs (HBII-52-33[+113A>G] and HBII-52-34[21A>T]) (see supplementary table 2, Supplementary Material online) located in adjacent snoRNA copies at the distal part of the cluster. The Mendelian errors were observed in five families for HBII-52-33(+113A>G) and in one family for HBII-52-34(21A>T). The type of segregation was suggestive of a deletion. For hemizygous individuals from these families, we estimated the location of the deletion boundaries by identifying the closest nearby SNPs genotyped by HapMap for which the individuals were heterozygous. With this approach, we defined the deletion breakpoints to be downstream of rs8025461 (chr15:23,023,309) and upstream of rs2714757 (chr15:23,029,326), with a deletion size of 6,017 bp or less. We designed a PCR assay with primers flanking these SNPs in order to amplify the deletion allele. In hemizygous individuals, the genomic PCR amplification produced a fragment of ~2
Sequencing of the PCR products from two unrelated hemizygous individuals showed a loss of 5,279 bp of genomic sequence (chr15:23,023,386-23,028,665), with deletion breakpoints within HBII-52-31 and HBII-52-34 (fig. 3). The deletion is a precise rearrangement that leads to a novel functional chimerical HBII-52 copy formed by the proximal part of HBII-52-31 (a nonfunctional copy) and the distal part of HBII-52-34 (a functional copy), and to the complete loss of two HBII-52 copies (HBII-52-32, a nonfunctional one, and HBII-52-33, a functional one, see fig. 3). The frequency of the deletion in Yoruba is 12.1%, as determined by a PCR assay with primers flanking the breakpoints, and it was not present in the tested 30 CEU trios. The polymorphic deletion is associated with Yoruba-specific haplotypes (supplementary fig. 1, Supplementary Material online), suggesting its origin after the split of African and non-African
origin of the deletion in the common ancestral population in Africa and a subsequent loss in the studied non-African
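The bounding step used in this section (a hemizygous carrier cannot be heterozygous inside the deletion, so the nearest heterozygous SNPs on either side delimit it) can be written as a short Python sketch. This is an illustration under assumptions, not the authors' procedure in code: the function name, the input format, and every position other than the two flanking SNPs quoted in the text are hypothetical.

```python
def bound_deletion(genotypes, focal_pos):
    """Bound a hemizygous deletion by the closest flanking heterozygous SNPs.

    genotypes -- iterable of (position, is_heterozygous) for one individual
    focal_pos -- position of a SNP showing a Mendelian error (inside the deletion)
    Returns (left, right, max_size) or None if a flank has no heterozygous SNP.
    """
    left = max((p for p, het in genotypes if het and p < focal_pos), default=None)
    right = min((p for p, het in genotypes if het and p > focal_pos), default=None)
    if left is None or right is None:
        return None
    # The deletion must lie strictly between the two heterozygous SNPs.
    return left, right, right - left


if __name__ == "__main__":
    # rs8025461 and rs2714757 positions are taken from the text; all other
    # positions and the focal position are made up for the example.
    individual = [
        (23020000, True),
        (23023309, True),   # rs8025461, heterozygous
        (23024500, False),
        (23026000, False),
        (23029326, True),   # rs2714757, heterozygous
        (23031000, False),
    ]
    print(bound_deletion(individual, focal_pos=23025000))
```

The interval returned is only an upper bound on the deletion; the sequenced breakpoints reported above fall inside it.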
The high frequency of the deletion allele in Yoruba suggests that it might be favored by selection. We tested this hypothesis by frequency-based neutrality tests and by the SWEEP haplotype structure test as discussed below. We did not detect signatures of positive selection in Yoruba, suggesting that the deletion is a rather neutral population-specific copy-number variant (CNV) whose frequency pattern is likely shaped by drift. The genetic redundancy of HBII-52 snoRNAs might compensate for a limited copy-number variability. Alternatively, the novel chimerical functional HBII-52-31/34 copy may compensate for the loss of HBII-52-33. Finally, it is also possible that the functional importance of the lost functional HBII-52-33 copy is rather small because different snoRNA copies might be unequally represented within the expressed mature pool.
FIG. 1. Nucleotide diversity at HBII-52 snoRNA copies. Graphical representation of the nucleotide diversity of the analyzed HBII-52 copies in Africans (marked by a circle, blue line) and Europeans (marked by a triangle, red line). Copies HBII-52-33 and HBII-52-34, being part of the polymorphic deletion in Yoruba, are excluded for this population. [Plot not recovered. x-axis: HBII-52 copy (4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22, 29, 30, 33, 34, 36, 38, 39, 40, 41, 42, 43, 44); y-axis: nucleotide diversity (%).]
FIG. 2. SNP allele frequency distribution in ED populations and Africans for the HBII-52 snoRNA region (chr15:22,966,120-23,085,251). [Plot not recovered. Axis scale: 0 to 1 in steps of 0.1.] The data set consists of 243 SNPs (151 HapMap phase II genotyped SNPs polymorphic in at least one of the populations CEU and YRI, and 92 SNPs identified in the present study and genotyped in YRI and
34(21A>T) were excluded as being part of CNV. The minor allele is defined as the allele opposite to the reference allele according to NCBI Build 36 (http://www.ncbi.nlm.nih.gov/mapview/
The nucleotide divergence between HBII-52 copies and the consensus sequence is 4.8%, and gene conversion
intracopy variability. Of 94 identified nucleotide variants, 65 correspond to a paralogous sequence variant (PSV) in another copy, with potential gene conversion tracts of 5 nt or more (supplementary table 2, Supplementary Material online). In addition, three more variants might have originated from PSVs, with potential gene conversion tracts between 2 and 5 nt (supplementary table 2, Supplementary Material online). Considering all polymorphisms, the longest detectable potential gene conversion tract between paralogues is 92 bp, between HBII-52-39 (donor)
29(18C>T) variant (data not shown). We detected six interparalogue gene conversion events involving more than one PSV (supplementary fig. 2, Supplementary Material online), with potential gene conversion tracts between 5 and 90 nt. For three of the “converted” copies, the haplotypes originating from the interparalogue gene conversion tract are present in both Africans and Europeans. Surprisingly, a linkage disequilibrium (LD) coefficient (r²) of less than 1 has been observed for SNPs potentially originating from a single gene conversion event, except for copy HBII-52-43 in Europeans. This is likely explained by recurrent gene conversion events, either allelic (within the same copy) or between paralogous copies.
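The argument about r² can be made concrete with a small Python sketch. Two SNPs introduced by a single conversion event are expected to travel together (r² = 1); any haplotype carrying one derived allele without the other lowers r². The function below uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) on phased two-SNP haplotypes; the function name and the toy haplotypes are assumptions for illustration.

```python
def r_squared(haplotypes, allele_a, allele_b):
    """LD coefficient r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes -- list of (allele at SNP 1, allele at SNP 2) tuples
    allele_a   -- allele tracked at SNP 1
    allele_b   -- allele tracked at SNP 2
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b  # classical disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Both derived alleles always co-occur: complete LD.
    single_event = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
    # One haplotype carries only one of the two alleles, as a second
    # (recurrent) conversion event would produce.
    recurrent = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "C")]
    print(r_squared(single_event, "A", "C"), r_squared(recurrent, "A", "C"))
```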
Given the short length of the individual resequenced
fragments (average lengthof253bp),theusual ‘‘long align-
ment’’–based methods to estimate gene conversion rates
and tract lengths (Betra ´n et al. 1997; Weiller 1998) cannot
be applied. Toestimate the relative importance of gene con-
version versus mutation in the generation of nucleotide var-
iability at the region under study, we considered the
proportions of the different kinds of transitions and trans-
versions at the resequencing SNPs in the HBII-52 cluster
and compared them with genomewide data. The distribu-
tion of ancestral and derived alleles in that region does
not fit the genomewide HapMap (release 21) distribution
but, in contrast, there is a clear tendency for derived alleles
to be nucleotides that are present in other copies (P value ,
0.001 in Fisher’s exact test, see supplementary table 3,Sup-
plementary Material online). This is clearly suggestive of
gene conversion, because variants that are not present in
other copies (and, thus, cannot have originated from them)
do not deviate from HapMap expectations in the propor-
tions of ancestral and derived alleles. Gene conversion
can explain such patterns more parsimoniously than the
multiple parallel mutations in different copies that would
be required otherwise. If all variants that might have orig-
inated from other copies are considered, extant variability
would have arisen by gene conversion between three and
four times more frequently than by mutation. This can
be considered as an upper threshold of observed gene-
conversion rates and, considering the average human mu-
tation rate of 2.5 ? 10?8(Nachman and Crowell 2000;
Kondrashov 2003), it translates to almost 10?7per nucle-
otide per generation. A lower threshold for gene conversion
rates can be obtained by considering as gene conversion
events only those variants that might have originated from
other copies and subtracting genomewide expectations.
This produces a figure of around two times more variants
appearing by gene conversion than by mutation. Finally,
one has to consider that these would be rates of observed
gene conversion. Most gene conversion events between
paralogous copies are not detectable because in our case
sequences are highly homologous and fragments are usu-
ally too short to unequivocally detect gene conversion
tracts. Assuming a fixed tract length of 500 bp (Frisse et al.
2001), gene conversion rates per nucleotide per generation
would be around 5 × 10^-5. These rates are roughly comparable
with the rates of gene conversion calculated by Rozen
et al. (2003) for human chromosome Y palindromes.
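The arithmetic behind these bounds can be retraced in a few lines. The sketch below is only one reading of the text, not the authors' procedure: it assumes that the rate of observed gene conversion is the conversion-to-mutation ratio multiplied by the mutation rate, and that the final figure follows from multiplying by the assumed tract length. The function and variable names are illustrative.

```python
# Values quoted in the text.
MUTATION_RATE = 2.5e-8   # average human mutation rate per nucleotide per generation
TRACT_LENGTH_BP = 500    # fixed tract length assumed in the text


def conversion_rate(ratio_to_mutation, mutation_rate=MUTATION_RATE):
    """Assumed relation: observed conversion rate = ratio * mutation rate."""
    return ratio_to_mutation * mutation_rate


# Upper threshold: conversion three to four times more frequent than mutation.
upper_bounds = [conversion_rate(r) for r in (3, 4)]
# Lower threshold: conversion around two times more frequent than mutation.
lower_bound = conversion_rate(2)
# Assumed scaling of the upper threshold by the fixed tract length.
tract_scaled = conversion_rate(4) * TRACT_LENGTH_BP

print(upper_bounds, lower_bound, tract_scaled)
```

Under these assumptions the upper threshold spans 7.5 × 10^-8 to 10^-7, the lower threshold is 5 × 10^-8, and the tract-scaled figure is 5 × 10^-5, in line with the values given in the text.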
Effects of Gene Conversion on LD
There are two recombination hotspots at the two ex-
tremes of the HBII-52 cluster, and the region itself is char-
acterized by low recombination rates and high LD in both
CEU and Yoruba populations (HapMap data release 21/phase II,
July 2006; see supplementary figure 3, Supplementary
Material online). In CEU, there are fewer and longer haplotype
blocks compared with Yoruba, consistent with the genomewide
average LD distribution in Africans and non-Africans
(International HapMap Consortium 2005). High
LD over the cluster is also observed for the resequencing
SNPs described here. The overall LD pattern in Yoruba does
not change when the SNPs identified in this study are
analyzed together with the HapMap-genotyped SNPs.
FIG. 3. Schematic representation of HBII-52 deletion with breakpoints in HBII-52-31 and HBII-52-34. The sequence of the hybrid HBII-52 copy
as determined by sequencing of the deletion allele is shown. Functional HBII-52 elements are shown in bold. PSVs between HBII-52-31 and HBII-52-34 are
shown in red for HBII-52-31 and in blue for HBII-52-34. The region of potential deletion breakpoint is underlined. The deletion relative to the
chromosome position, HBII-52 copies, and potential breakpoint sequence is schematically represented.
Gene conversion at an SNP site is expected to lead to
the break of LD between this marker and the rest within
a haplotype block. This does not necessarily affect the
LD of the region, because the rest of the markers continue
to be in disequilibrium (Pritchard and Przeworski 2001;
Wall and Pritchard 2003). This is consistent with our observation
of “overrepresentation” of SNPs at gene conversion
hotspots in “illegitimate” recombination events within
haplotype blocks (see supplementary table 4 and supplementary
fig. 2, Supplementary Material online). There
are 63 events of historical recombination (D′ < 1) between
markers within HBII-52 haplotype blocks in Yoruba, and
SNPs at gene conversion hotspots are involved in 52 of
those events (82.5%), with a distance as short as 1 bp (supplementary
table 4, Supplementary Material online).
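The D′ < 1 criterion used above can be illustrated with a minimal sketch. It assumes phased two-site haplotypes written as two-character strings and uses the standard definition of D′ (D divided by its maximum possible value given the allele frequencies). The function name and the toy haplotype counts are hypothetical and are not taken from the study data.

```python
from collections import Counter
from fractions import Fraction


def d_prime(haplotypes):
    """Return |D'| for two biallelic sites; each haplotype is a 2-character string."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    a = haplotypes[0][0]  # allele tracked at site 1
    b = haplotypes[0][1]  # allele tracked at site 2
    p_a = Fraction(sum(c for h, c in counts.items() if h[0] == a), n)
    p_b = Fraction(sum(c for h, c in counts.items() if h[1] == b), n)
    p_ab = Fraction(counts[a + b], n)
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("at least one site is monomorphic")
    return abs(d) / d_max


# Hypothetical samples: three gametic types (no evidence of historical
# recombination) versus all four gametic types.
three_gametes = ["AB"] * 5 + ["ab"] * 4 + ["Ab"] * 1
four_gametes = ["AB"] * 4 + ["ab"] * 4 + ["Ab"] * 1 + ["aB"] * 1

print(d_prime(three_gametes), d_prime(four_gametes))

# Share of within-block events involving SNPs at gene conversion hotspots,
# using the counts reported in the text (52 of 63).
hotspot_share = round(100 * 52 / 63, 1)
```

With only three of the four possible gametes present, D′ equals 1. Once the fourth gamete appears, whether through crossover, gene conversion, or recurrent mutation, D′ drops below 1, which is the signal counted as historical recombination.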
Natural Selection at HBII-52 Cluster
The footprint of natural selection in shaping the variability
of a given genomic region can be studied in several
ways by means of either frequency spectrum or haplotype
analysis. We performed an analysis of the frequency spectrum
of mutations in our concatenated fragments using
DnaSP (Rozas J and Rozas R 1999). In the Spanish population,
variability is characterized by many low-frequency
variants around a region of very low variability (table 1).
This is strongly suggestive of a recent selective sweep
driven by positive selection. This impression is confirmed
by the negative and very extreme values of the standard frequency-based
statistics Tajima’s D (Tajima 1989) and Fay and Wu’s H (Fay
and Wu 2000), which are, respectively, −2.01 and
−13.24 for the Spanish population (table 1). The value
of Tajima’s D is significant when compared with the available
genomewide averages (empirical P value < 0.05 in relation
to both the data sets by Stephens et al. 2001 and
SeattleSNPs, Kelley et al. 2006; SeattleSNPs; http://pga.gs.
genome averages because it is not normalized. To study Fay
and Wu’s H—and the rest of the frequency statistics—we
have performed coalescent simulations considering the same
distribution of variability in the Spanish and Yoruba popu-
lations observed in our data. We used COSI (Schaffner et al.
2005), a program integrating a calibrated demographic
model. In these simulations, under the most conservative
model of no recombination with matching number of segre-
extremely significant for Europeans (table 2). These results
are particularly striking when one considers theoretical results
(Innan 2003) showing that gene conversion between
paralogues makes frequency spectrum neutrality tests highly
conservative, because it acts like recombination and reduces
the variance of these statistics. Analyzing individual
HBII-52 copies provides less significant results
due to lack of variability and, thus, statistical power, in such
short fragments. Still, some extreme values can be detected
in certain copies.
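For readers unfamiliar with the two statistics discussed above, the following sketch computes Tajima's D (Tajima 1989) and the unnormalized Fay and Wu's H (Fay and Wu 2000) from an unfolded site frequency spectrum. It is a generic textbook implementation, not the DnaSP or COSI procedure used in the study, and the example spectra are hypothetical.

```python
import math


def sfs_statistics(n, sfs):
    """Tajima's D and Fay and Wu's H from an unfolded site frequency spectrum.

    n   -- number of sampled chromosomes
    sfs -- dict mapping derived-allele count i (1..n-1) to the number of sites
    """
    pairs = n * (n - 1)
    s = sum(sfs.values())  # number of segregating sites
    # Average number of pairwise differences (theta_pi).
    theta_pi = sum(2 * i * (n - i) * k for i, k in sfs.items()) / pairs
    # Estimator weighted toward high-frequency derived alleles (theta_H).
    theta_h = sum(2 * i * i * k for i, k in sfs.items()) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    tajima_d = (theta_pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    fay_wu_h = theta_pi - theta_h
    return tajima_d, fay_wu_h


# Hypothetical spectra for n = 10 chromosomes.
d_rare, h_rare = sfs_statistics(10, {1: 5})   # only singletons
d_high, h_high = sfs_statistics(10, {9: 5})   # only high-frequency derived alleles
# Spectrum proportional to 1/i, the neutral expectation (n = 5).
d_neutral, h_neutral = sfs_statistics(5, {1: 12, 2: 6, 3: 4, 4: 3})
```

An excess of rare variants pushes Tajima's D below zero, and an excess of high-frequency derived alleles pushes H below zero. Both patterns are expected after a selective sweep, whereas a spectrum proportional to 1/i gives values of zero for both statistics.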
Yoruba population, suggesting either that the sweep is exclusive
to Europeans or that gene conversion reduces the power
the most frequent haplotypes in the Spanish populations,
which putatively would have had their frequency increased
(supplementary fig. 1, Supplementary Material online).
Recent positive selection (<10,000 years ago) may
create unusually extended haplotypes around the selected
variant (Sabeti et al. 2002). To test the hypothesis
of a very recent selective sweep, we analyzed our data using
the haplotype structure test as described in the text accompanying
supplementary figure 4, Supplementary Material online. Interestingly,
in both populations, the results of the analysis
were suggestive of a selective sweep (see supplementary
fig. 4, Supplementary Material online). To analyze the sig-
nificance of these results, we performed an extended anal-
ysis that considered the entire 1-Mb region containing the
snoRNA cluster in the center. The data were compared with
SWEEP analysis results (see Materials and Methods) using
as a reference a set of 168 genes related to immunity (Walsh
et al. 2006). This approach did not provide significant cores
in our region (between rs2739834 and rs8032628) for any
of the populations.
Coalescent Simulations of Nucleotide Diversity and Frequency-Based Tests of Neutrality
[Table body not recoverable from this copy. Surviving column headings: Taj’s D; Fu & Li D*; Fu & Li F*; Fu & Li D; Fu & Li F; Fay & Wu H. Surviving row labels: S 61 + 47; Africa.]
*P < 0.05; **P < 0.0001.
S92 corresponds to simulations with the set of 92 segregating sites. “S 61 + 47” corresponds to simulations with 61 ± 1 segregating sites in Africans and 47 ± 1
segregating sites in Europeans, respectively. R: recombination rate; S: segregating sites; π: average number of nucleotide differences; Taj’s D: Tajima’s D.
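The idea behind haplotype-structure tests of this kind can be sketched with extended haplotype homozygosity (EHH), the quantity introduced by Sabeti et al. (2002): the probability that two randomly chosen chromosomes carrying the core allele are identical over the whole interval from the core to a given marker. The code below is a generic illustration under that definition, not the exact procedure described in supplementary figure 4 or implemented in SWEEP. The function name and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH from the core site to end_index among carriers of core_allele.

    haplotypes -- list of equal-length strings, one character per marker
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their haplotype over the whole interval.
    classes = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(c, 2) for c in classes.values()) / comb(len(carriers), 2)


# Hypothetical phased haplotypes; the core marker is at position 0.
toy = ["AAAA", "AAAA", "AAAT", "AATT", "TAAA"]
decay = [ehh(toy, 0, "A", x) for x in (1, 2, 3)]
print(decay)
```

EHH decays with distance from the core as recombination and gene conversion break up the ancestral haplotype. A slow decay around a common allele is the signature sought by the test, which is why frequent gene conversion would be expected to reduce its power.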
Fay and Wu’s H test detects positive selection that occurred
within roughly the last 80,000 years, which is compatible with
the split of African and Eurasian modern humans approximately
100,000 years ago (Templeton 2002; Sabeti et al.
2006). Therefore, selection at the HBII-52 cluster in Europeans
might have occurred shortly after the split. Consistent
with this hypothesis, the selective sweep could not be detected
with the haplotype structure test, whose sensitivity is
limited to very recent positive selection of approximately
be totally ruled out, because the effect of high gene conver-
sion on a test based on the length of haplotypes is probably
reducing the power of the test.
The variant or variants possibly favored by positive
selection in Europeans could not be predicted among
the known SNPs. As yet undiscovered functional SNPs in inter-snoRNA
regions and in copies not analyzed here might
be implicated, as well as additional CNVs with functional
impact. Considering the genetic redundancy of the
snoRNAs, haplotypes combining multiple variants at both
nucleotide and copy-number level are more likely to have
an adaptive advantage, rather than ‘‘outstanding’’ single
Selection-driven evolution of brain-expressed ncRNAs
might have contributed to primate brain complexity (reviewed
in Satterlee et al. 2007). Consistent with this hypothesis,
accelerated evolution of a novel ncRNA specifically
expressed in the developing human neocortex was reported
(Pollard et al. 2006). Our work provides another example of
adaptive evolution in human ncRNA genes.
Supplementary tables 1–5 and supplementary figures
1–4 are available at Molecular Biology and Evolution online.
This work was supported by “Genoma España” and
“Genome Canada,” Ministerio de Educación y Ciencia
(the Spanish Government), and the Sixth Framework Programme
(FP6) of the European Union (Marie Curie Intra-European
Fellowship granted to M.O.). We are grateful to
Cecilia García for the excellent technical assistance. Data
submission: The sequence data from this study have been
submitted to dbSNP under submission nos. ss105111477–
Barrett JC, Fry B, Maller J, Daly MJ. 2005. Haploview: analysis
and visualization of LD and haplotype maps. Bioinformatics.
Betrán E, Rozas J, Navarro A, Barbadilla A. 1997. The
estimation of the number and the length distribution of gene
conversion tracts from population DNA sequence data.
Cavaillé J, Buiting K, Kiefmann M, Lalande M, Brannan CI,
Horsthemke B, Bachellerie JP, Brosius J, Hüttenhofer A.
2000. Identification of brain-specific and imprinted small
nucleolar RNA genes exhibiting an unusual genomic
organization. Proc Natl Acad Sci. 97:14311–14316.
De los Santos T, Schweizer J, Rees CA, Francke U. 2000. Small
evolutionarily conserved RNA, resembling C/D box small
nucleolar RNA, is transcribed from PWCR1, a novel
imprinted gene in the Prader–Willi deletion region, which is
highly expressed in brain. Am J Hum Genet. 67:1067–1082.
ENCODE Project Consortium. 2007. Identification and analysis
of functional elements in 1% of the human genome by the
ENCODE pilot project. Nature. 447:799–816.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian
selection. Genetics. 155:1405–1413.
Frisse L, Hudson RR, Bartoszewicz A, Wall JD, Donfack J, Di
Rienzo A. 2001. Gene conversion and different population
histories may explain the contrast between polymorphism
and linkage disequilibrium levels. Am J Hum Genet. 69:
Gray TA, Saitoh S, Nicholls RD. 1999. An imprinted,
mammalian bicistronic transcript encodes two independent
proteins. Proc Natl Acad Sci. 96:5616–5621.
Halushka MK, Fan JB, Bentley K, Hsie L, Shen N, Weder A,
Cooper R, Lipshutz R, Chakravarti A. 1999. Patterns of
single-nucleotide polymorphisms in candidate genes for
blood-pressure homeostasis. Nat Genet. 22:239–247.
Innan H. 2003. A two-locus gene conversion model with
selection and its application to the human RHCE and RHD
genes. Proc Natl Acad Sci USA. 100:8793–8798.
International HapMap Consortium. 2003. The International
HapMap Project. Nature. 426:789–796.
International HapMap Consortium. 2005. A haplotype map of the
human genome. Nature. 437:1299–1320.
Kalendar R. 2007. FastPCR: a PCR primer and probe design and
repeat sequence searching software with additional tools for
the manipulation and analysis of DNA and protein.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006.
Genomic signatures of positive selection in humans and the
limits of outlier approaches. Genome Res. 16:980–989.
Kishore S, Stamm S. 2006. The snoRNA HBII-52 regulates
alternative splicing of the serotonin receptor 2C. Science.
Kiss T. 2002. Small nucleolar RNAs: an abundant group of
noncoding RNAs with diverse cellular functions. Cell.
Kondrashov AS. 2003. Direct estimates of human per nucleotide
mutation rates at 20 loci causing Mendelian diseases. Hum
Kong A, Gudbjartsson DF, Sainz J, et al. (16 co-authors). 2002.
A high-resolution recombination map of the human genome.
Nat Genet. 31:241–247.
Nachman MW, Crowell SL. 2000. Estimate of the mutation rate
per nucleotide in humans. Genetics. 156:297–304.
Patil N, Berno AJ, Hinds DA, et al. (22 co-authors). 2001. Blocks
of limited haplotype diversity revealed by high-resolution
scanning of human chromosome 21. Science. 294:1719–1723.
Pollard KS, Salama SR, Lambert N, et al. (16 co-authors). 2006.
An RNA gene expressed during cortical development evolved
rapidly in humans. Nature. 443:167–172.
Price RD, Weiner DM, Chang MS, Sanders-Bush E. 2001. RNA
editing of the human serotonin 5-HT2C receptor alters
receptor-mediated activation of G13 protein. J Biol Chem.
Pritchard JK, Przeworski M. 2001. Linkage disequilibrium in
humans: models and data. Am J Hum Genet. 69:1–14.
Rogelj B. 2006. Brain-specific small nucleolar RNAs. J Mol
Rogelj B, Hartmann CE, Yeo CH, Hunt SP, Giese KP. 2003.
Contextual fear conditioning regulates the expression of
brain-specific small nucleolar RNAs in hippocampus. Eur J
Rozas J, Rozas R. 1999. DnaSP version 3: an integrated program
for molecular population genetics and molecular evolution
analysis. Bioinformatics. 15:174–175.
Rozen S, Skaletsky H, Marszalek JD, Minx PJ, Cordum HS,
Waterston RH, Wilson RK, Page DC. 2003. Abundant gene
conversion between arms of palindromes in human and ape Y
chromosomes. Nature. 423:873–876.
Runte M, Hüttenhofer A, Gross S, Kiefmann M, Horsthemke B,
Buiting K. 2001. The IC-SNURF-SNRPN transcript serves as
a host for multiple small nucleolar RNA species and as an
antisense RNA for UBE3A. Hum Mol Genet. 10:2687–2700.
Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). 2002.
Detecting recent positive selection in the human genome from
haplotype structure. Nature. 419:832–837.
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P,
Shamovsky O, Palma A, Mikkelsen TS, Altshuler D,
Lander ES. 2006. Positive natural selection in the human
lineage. Science. 312:1614–1620.
Satterlee JS, Barbee S, Jin P, Krichevsky A, Salama S, Schratt G,
Wu DY. 2007. Noncoding RNAs in the brain. J Neurosci.
Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D.
2005. Calibrating a coalescent simulation of human genome
sequence variation. Genome Res. 15:1576–1583.
Stephens M, Donnelly P. 2003. A comparison of Bayesian
methods for haplotype reconstruction from population
genotype data. Am J Hum Genet. 73:1162–1169.
Stephens M, Smith N, Donnelly P. 2001. A new statistical
method for haplotype reconstruction from population data.
Am J Hum Genet. 68:978–989.
Tajima F. 1989. Statistical method for testing the neutral mutation
hypothesis by DNA polymorphism. Genetics. 123:585–595.
Templeton A. 2002. Out of Africa again and again. Nature.
Vitali P, Basyuk E, Le Meur E, Bertrand E, Muscatelli F,
Cavaillé J, Hüttenhofer A. 2006. ADAR2-mediated editing of
RNA substrates in the nucleolus is inhibited by C/D small
nucleolar RNAs. J Cell Biol. 169:745–753.
Wall JD, Pritchard JK. 2003. Assessing the performance of the
haplotype block model of linkage disequilibrium. Am J Hum
Walsh EC, Sabeti P, Hutcheson HB, et al. (16 co-authors). 2006.
Searching for signals of evolutionary selection in 168 genes
related to immune function. Hum Genet. 119:92–102.
Elements essential for processing intronic U14 snoRNA are
located at the termini of the mature snoRNA sequence and
include conserved nucleotide boxes C and D. RNA. 2:118–133.
Weiller GF. 1998. Phylogenetic profiles: a graphical method for
detecting genetic recombinations in homologous sequences.
Mol Biol Evol. 15:326–335.
Xia L, Watkins NJ, Maxwell ES. 1997. Identification of specific
nucleotide sequences and structural elements required for
intronic U14 snoRNA processing. RNA. 3:17–26.
Yu N, Chen FC, Ota S, Jorde LB, Pamilo P, Patthy L,
Ramsay M, Jenkins T, Shyue SK, Li WH. 2002. Larger
genetic differences within Africans than between Africans and
Eurasians. Genetics. 161:269–274.
Zhao Z, Yu N, Fu YX, Li WH. 2006. Nucleotide variation and
haplotype diversity in a 10-kb noncoding region in three
continental human populations. Genetics. 174:399–409.
Stephanie Santorico, Associate Editor
Accepted July 24, 2009