
A Population Genetic Model for the Maintenance of R2 Retrotransposons in rRNA Gene Loci

Jun Zhou.¤a, Michael T. Eickbush.¤b, Thomas H. Eickbush*

Department of Biology, University of Rochester, Rochester, New York, United States of America

Abstract

R2 retrotransposable elements exclusively insert into the tandemly repeated rRNA genes, the rDNA loci, of their animal hosts. R2 elements form stable long-term associations with their host, in which all individuals in a population contain many potentially active copies, but only a fraction of these individuals show active R2 retrotransposition. Previous studies have found that R2 RNA transcripts are processed from a 28S co-transcript and that the likelihood of R2-inserted units being transcribed is dependent upon their distribution within the rDNA locus. Here we analyze the rDNA locus and R2 elements from nearly 100 R2-active and R2-inactive individuals from natural populations of Drosophila simulans. Along with previous findings concerning the structure and expression of the rDNA loci, these data were incorporated into computer simulations to model the crossover events that give rise to the concerted evolution of the rRNA genes. The simulations that best reproduce the population data assume that only about 40 rDNA units out of the over 200 total units are actively transcribed and that these transcribed units are clustered in a single region of the locus. In the model, the host establishes this transcription domain at each generation in the region with the fewest R2 insertions. Only if the host cannot avoid R2 insertions within this 40-unit domain are R2 elements active in that generation. The simulations also require that most crossover events in the locus occur in the transcription domain in order to explain the empirical observation that R2 elements are seldom duplicated by crossover events. Thus the key to the long-term stability of R2 elements is the stochastic nature of the crossover events within the rDNA locus, and the inevitable expansions and contractions that introduce and remove R2-inserted units from the transcriptionally active domain.

Citation: Zhou J, Eickbush MT, Eickbush TH (2013) A Population Genetic Model for the Maintenance of R2 Retrotransposons in rRNA Gene Loci. PLoS Genet 9(1): e1003179. doi:10.1371/journal.pgen.1003179

Editor: David J. Begun, University of California Davis, United States of America

Received June 15, 2012; Accepted November 2, 2012; Published January 10, 2013

Copyright: © 2013 Zhou et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by NIH grant R01GM042790. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: thomas.eickbush@rochester.edu

¤a Current address: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America

¤b Current address: Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

. These authors contributed equally to this work.

Introduction

Abundant ribosomal RNA (rRNA) is essential for cellular metabolism during all periods of development. The genes encoding these RNAs reside as nearly identical tandemly repeated units, with each unit composed of an 18S, 5.8S and 28S rRNA gene (Figure 1A). Surprisingly, these tandem genes, referred to as rDNA loci, serve as a genomic niche for the insertion of various mobile elements [1]. These elements block the production of functional rRNA from inserted units; however, the effect of this potential disruption of rRNA production is minimized because organisms typically contain many more rDNA units than are needed for transcription [2–4].

The retrotransposon R2 is the best understood of the rDNA-specific elements. R2 elements are present in many animal phyla [5–7] but have been most intensively studied in Drosophila [8,9]. The same lineage of R2 elements is present in most Drosophila groups, and no evidence has been found for horizontal transfer between species [8]. While difficult to establish definitively, this coevolution of R2 with its host may extend back to the origin of the major animal phyla [6,7,10,11]. Clearly, a balance must be maintained between the level of retrotransposition required to preserve the elements and the number of rDNA units needed to maintain host fitness.

While permitting long-term maintenance, the equilibrium between the rDNA loci and R2 elements appears highly dynamic: the size of the rDNA locus varies greatly between individuals, and individual copies of R2 are rapidly gained and lost from each locus [12,13]. A critical contributor to this dynamic equilibrium is the frequent unequal crossovers between the tandem repeats of the rDNA loci, which preserve the high levels of sequence identity between rRNA genes (Figure S1). Attempts have been made over the years to model this concerted evolution of the rRNA genes [14–16]. Recently we incorporated the presence of transposable elements into standard crossover models of rDNA locus evolution [17]. By varying the rates of crossover and R2 retrotransposition, as well as the number of rDNA units required for host fitness, we could simulate stable populations with rDNA loci of various sizes and levels of R2 insertion. Unfortunately, because little was known about the forces that controlled R2 activity, these simulations simply assumed low rates of retrotransposition in all individuals with R2.

PLOS Genetics | www.plosgenetics.org1January 2013 | Volume 9 | Issue 1 | e1003179


Recent studies have now provided a better understanding of the regulation of R2 activity in Drosophila simulans. First, regulation of R2 activity appears to be at the level of transcription, with control over transcription mapping to the rDNA locus itself [18]. Second, R2 elements do not encode their own promoter but are co-transcribed with the rDNA unit, with the mature R2 transcript processed from the co-transcript by a ribozyme encoded at the 5′ end of R2 [19]. Finally, R2 transcription correlates best with the distribution of R2 elements across the rDNA locus rather than with the size of the rDNA locus or the number of R2 insertions [20]. Animals with no R2 transcripts contain at least one large region of rDNA units free of R2, while animals with R2 transcripts contain a more uniform distribution of R2 across the rDNA locus and thus no large region free of R2 insertions [18,20]. Based on these findings, we proposed a “transcription domain” model of R2 regulation in which the host selects for transcription the region of the rDNA locus with the lowest level of R2 insertions. In this model, individual copies of R2 are transcribed only when the largest contiguous region of the rDNA locus free of R2 insertions is smaller than the transcription domain (Figure 1B).
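The decision rule of the transcription domain model can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' simulation code: the locus is encoded as a hypothetical list of booleans (True marks an R2-inserted unit), it is treated as a simple linear array, and the 40-unit default domain size is the value that best reproduced the population data.

```python
from typing import List


def largest_r2_free_block(locus: List[bool]) -> int:
    """Return the length of the longest run of uninserted rDNA units.

    Hypothetical encoding: True marks an R2-inserted unit, False an
    uninserted one.
    """
    best = run = 0
    for inserted in locus:
        run = 0 if inserted else run + 1
        best = max(best, run)
    return best


def r2_transcribed(locus: List[bool], domain_size: int = 40) -> bool:
    """R2 copies are transcribed only if no R2-free block can hold the whole domain."""
    return largest_r2_free_block(locus) < domain_size


# One 60-unit R2-free block: the domain fits inside it, so R2 stays silent.
inactive = [False] * 60 + [True] + [False] * 10
# An R2 copy every 31st unit: the largest R2-free block is only 30 units.
active = ([False] * 30 + [True]) * 5
print(r2_transcribed(inactive), r2_transcribed(active))  # False True
```

Note that the rule depends only on how R2 copies are spaced, not on how many there are, which is why total R2 number and locus size need not predict transcription.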

In this report we have expanded our study of natural populations of Drosophila simulans to obtain better estimates of the range of rDNA locus size and R2 number in R2-active and R2-inactive individuals. New computer simulations incorporating the transcription domain model for R2 regulation are able to generate stable populations containing rDNA loci with the dynamic properties found in natural populations. Crossover frequency and location, rates of retrotransposition, transcription domain size, and reduction in host fitness are each evaluated for their effects on the final equilibrium between mobile element and host.

Results

Range of rDNA locus size and R2 number in natural populations

Correlation of R2 activity with the various properties of an rDNA locus is simplified in D. simulans because all rDNA units in this species are located in one locus on the X chromosome [21]. In a previous report [20], R2 transcript levels were determined for 180 lines, each containing one rDNA locus from a natural population in San Diego, CA or Atlanta, GA (iso-rDNA lines). Eighteen lines representing the range of R2 transcript levels were then selected to determine the sizes of their rDNA loci and the number of R2 copies. No correlation was detected between R2 transcript levels and either rDNA locus size or number of R2 elements. To better define the range of locus size and R2 number in the two populations, these values were determined again for the original 18 lines as well as for an additional 77 randomly chosen lines from the two populations (see Materials and Methods).

Mean rDNA locus size was 230 units (range 132–373) for the 44 iso-rDNA lines from San Diego and 219 units (range 115–386) for the 51 iso-rDNA lines from Atlanta. The mean R2 number was 52 copies (range 23–70) for the San Diego population and 50 (range 31–79) for the Atlanta population. Because neither distribution differed significantly between the two populations (R2 number, P = 0.75; rDNA locus size, P = 0.42; Kolmogorov-Smirnov test), and because similar numbers of individuals in the two populations showed detectable levels of R2 transcription [20], all subsequent analyses use the combined data sets. The distributions of locus size and R2 copy number determined for the 95 iso-rDNA lines are shown in Figure 2A.
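The text reports P values from a two-sample Kolmogorov-Smirnov test without saying how they were computed. As an illustration only, the statistic and its standard large-sample approximation (the Kolmogorov series with the effective-sample-size correction popularized by Numerical Recipes) can be written in plain Python. The function names are hypothetical; in practice a library routine such as scipy.stats.ks_2samp would normally be used, and exact small-sample P values can differ from this approximation.

```python
import math
from typing import Sequence


def ks_2samp_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum vertical distance between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted, the remaining steps can only shrink the gap.
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int, terms: int = 100) -> float:
    """Approximate two-sided P value from the asymptotic Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges slowly here, and its value is ~1
        return 1.0
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Toy samples only; these are not the population data.
print(ks_2samp_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```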

The number of rDNA units per locus varied over a 3-fold range (115 to 386 units), as did the R2 number (23 to 79 copies). A significant correlation was found between rDNA locus size and R2 number (Spearman rank correlation r = 0.47, P = 10⁻⁸).
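Spearman's rank correlation, used here for locus size versus R2 number, is the Pearson correlation computed on ranks. A minimal pure-Python sketch follows, in which tied values receive the average of their ranks; the function names are illustrative, and scipy.stats.spearmanr returns the same coefficient together with a P value.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Working on ranks makes the coefficient insensitive to the skewed ranges of locus size and copy number; it only asks whether larger loci tend to carry more R2 copies.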

The physical properties of the rDNA locus in the 95 lines were then compared to the level of R2 transcription. Higher levels of R2 transcripts tended to be associated with smaller rDNA loci (Figure 2B), with loci containing more R2 elements (Figure 2C), and with loci containing higher fractions of R2-inserted units (Figure 2D). However, transcript levels varied considerably across all ranges of locus size, R2 number and insertion density. These properties of the rDNA locus are thus not adequate predictors of R2 transcription.

Frequency of R2 element duplications by recombination

Crossovers between sister chromatids have been suggested to be the major recombinational force at work in the concerted evolution of rDNA loci [1]. In the absence of retrotransposition, repeated crossovers in combination with negative selection against inserted units will eventually eliminate R2-inserted units from the rDNA locus [17]. However, in the short term, crossovers can duplicate those R2-inserted units that are located within the offset between the two sister chromatids (Figure S1). It is possible to determine whether individual R2-inserted units have been duplicated by crossovers because many R2s have distinctive 5′ truncations generated during their retrotransposition [12,13]. Such 5′ truncations are a characteristic property of the target-primed reverse transcription mechanism used by non-LTR retrotransposons [5,22,23].
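How an unequal sister chromatid exchange duplicates only the units lying within the offset (Figure S1) can be made concrete with a toy list model. This sketch assumes a single crossover with a fixed offset, and units are labeled with hypothetical strings; it is not the simulation code of ref. [17].

```python
from typing import List, Tuple


def unequal_sister_exchange(locus: List[str], point: int,
                            offset: int) -> Tuple[List[str], List[str]]:
    """Unequal crossover between two identical sister chromatids.

    The chromatids pair out of register by `offset` units: one breaks after
    unit `point`, the other after unit `point - offset`. One product gains,
    and the other loses, the units in locus[point - offset:point].
    """
    if not 0 < offset <= point <= len(locus):
        raise ValueError("need 0 < offset <= point <= len(locus)")
    longer = locus[:point] + locus[point - offset:]
    shorter = locus[:point - offset] + locus[point:]
    return longer, shorter


# "u" = uninserted unit; "R2a" = an R2 copy with a distinctive 5' truncation.
locus = ["u", "u", "R2a", "u", "u", "u"]
longer, shorter = unequal_sister_exchange(locus, point=3, offset=2)
print(longer)   # ['u', 'u', 'R2a', 'u', 'R2a', 'u', 'u', 'u']
print(shorter)  # ['u', 'u', 'u', 'u']
```

Because R2a sits inside the offset window, it appears twice in the expanded locus and is lost from the contracted one; an R2 copy outside that window would be unaffected.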

Author Summary

Selfish transposable elements survive in eukaryotic genomes despite the elaborate mechanisms developed by their hosts to limit their activity. One accessible system that simplifies the complex interactions between element and host involves the R2 elements, which exclusively insert in the tandemly arranged rRNA genes. R2 exhibits remarkable stability in animal lineages even though each insertion inactivates one rRNA gene. Here we determine the size of the rDNA locus and R2 number in natural isolates of Drosophila simulans. Combining these data with previous data concerning the expression and regulation of R2, we develop a detailed population genetic model for rRNA gene and R2 evolution that reproduces the properties of the rRNA loci observed in natural populations. Critical components of the model are that only a contiguous array of 40 rRNA gene units is needed for transcription, that R2 elements are active only when present in this transcription domain, and that most of the crossovers in the rDNA loci occur in this domain. These results suggest that the key to the long-term survival of R2 is the redistribution of rDNA units in the locus brought about by the crossovers that maintain sequence identity among all rDNA units.

Model of R2 Maintenance

Sensitive PCR assays using one primer upstream of the 28S rDNA insertion site in combination with multiple primers throughout the R2 element have been developed to score all 5′-truncated R2s within individual rDNA loci [12,13,20]. By quantifying the signal associated with each PCR band, these assays can score whether each 5′-truncated element exists as one, two, three, or more copies in the rDNA locus [20,24]. Of the 386 distinct R2 5′ truncations present in the 18 original lines representing the range of R2 transcript levels in the D. simulans populations [20], 335 (86.8%) were single-copy, 41 were present in two copies, 9 in three copies, and 1 in four copies. This infrequent duplication of R2-inserted units has also been found in stocks of D. melanogaster and D. simulans undergoing long-term propagation in the laboratory [12,13,20]. It is also consistent with the low number of trace sequencing reads (equal to the coverage frequency of that genome) for each 5′ truncation found in the genome sequencing projects of D. simulans and other Drosophila species [9]. As described in the next section, the infrequent duplication of R2 copies by crossovers was a critical property that helped to differentiate between models of recombination within the rDNA locus of D. simulans.
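The copy-number classes reported above can be tallied from a locus in which each distinct 5′ truncation carries its own label. In this minimal sketch, units are hypothetical strings ("u" for an uninserted unit), and the final lines only re-derive the 86.8% single-copy fraction from the counts given in the text.

```python
from collections import Counter
from typing import Dict, Iterable


def duplication_states(locus: Iterable[str], uninserted: str = "u") -> Counter:
    """Map copy number -> number of distinct R2 truncations present at that copy number."""
    copies_per_truncation = Counter(unit for unit in locus if unit != uninserted)
    return Counter(copies_per_truncation.values())


# Toy locus: truncation "A" occurs twice, "B" once.
print(duplication_states(["u", "A", "A", "B", "u"]))

# Copy-number tallies reported in the text for the 18 original lines.
reported: Dict[int, int] = {1: 335, 2: 41, 3: 9, 4: 1}
total = sum(reported.values())
print(total, f"{reported[1] / total:.1%}")  # 386 86.8%
```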

Previous simulation models cannot reproduce the population data

Computer simulations as well as theoretical models have shown that intrachromosomal (between sister chromatids) and interchromosomal (between homologues) crossovers can account for the concerted evolution of tandemly repeated DNA sequences within a locus and between loci in a population [14–16]. To aid our studies of the forces that influence the number and stability of R2, we incorporated into these unequal crossover models the presence of active retrotransposons specific to the rDNA loci [17]. In those simulations, crossovers were located at random, uniformly throughout the locus. Because each R2 blocks the function of the inserted rDNA unit, our model assumed that individuals with fewer than a minimum number of uninserted units had reduced fitness (i.e., produced fewer than the maximum number of offspring), thereby preventing R2 from inserting into all rDNA units. Because little was known of the factors that control R2 activity, our model also assumed that retrotranspositions occurred at a constant low rate in individuals with R2s. Varying the crossover rate, the retrotransposition rate, and the number of uninserted units needed for peak fitness resulted in stable simulated populations with rDNA loci of various mean sizes and levels of inserted units [17].
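The two assumptions of the earlier model, a fitness threshold on uninserted units and a constant retrotransposition rate, can be sketched as two small functions. The defaults (100 units for peak fitness, a maximum fecundity of 6, a rate of 0.009) are the Figure 3B parameters; the linear decline of fecundity below the threshold is purely an assumption of this sketch, since the text says only that fitness was reduced.

```python
import random


def offspring_count(n_uninserted: int, required: int = 100,
                    max_fecundity: int = 6) -> int:
    """Offspring produced by an individual with n_uninserted functional rDNA units.

    ASSUMPTION: fecundity falls linearly below the threshold; the text only
    states that fitness is reduced, not by how much.
    """
    if n_uninserted >= required:
        return max_fecundity
    return int(max_fecundity * n_uninserted / required)


def retrotransposes(has_r2: bool, rng: random.Random, rate: float = 0.009) -> bool:
    """Earlier model: a constant per-generation rate in any individual carrying R2."""
    return has_r2 and rng.random() < rate


print(offspring_count(150), offspring_count(50))  # 6 3
```

Because the rate is the same for every R2-carrying individual, this model cannot produce a population in which all individuals carry R2 but only some show activity, which is the gap the transcription domain model addresses.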

Simulations using these simple models were extended to allow an analysis of additional properties of the loci at equilibrium, in particular the duplication frequency of the R2 elements. Crossover frequencies and retrotransposition rates were readily identified that generated stable populations with a mean rDNA locus size (225 units) and R2 number (50) similar to those observed in natural populations of D. simulans. However, as shown in Figure 3, while the distribution of locus sizes (number of rDNA units) in these simulations was similar to the empirically derived sizes for the natural populations (Figure 3A and 3B, left panels), two other properties of the simulated loci did not agree with the population data. As shown in the middle panels of Figure 3A and 3B, the simulations generated loci containing fewer than 20 and more than 80 R2-inserted units, values outside the range seen in natural populations. As shown in the right panels of Figure 3A and 3B, the simulated data also did not fit the observed frequencies of R2 duplications in the populations. In the simulations, over 30% of the R2 elements were duplicated by crossovers to four or more copies, while only one example of such a high level of duplication was seen in the natural populations. The narrow range in the number of R2 copies per locus and the infrequent duplication of R2 elements suggested that R2-inserted units are largely excluded from the crossovers within the rDNA loci.

Figure 1. Diagram of the rDNA locus and how the distribution of R2 gives rise to R2-active and R2-inactive individuals. (A) rDNA loci are composed of tandemly repeated rRNA genes, with some 28S rRNA genes containing an R2 insertion. Each repeat contains one transcription unit with 18S, 5.8S and 28S rRNA genes (black bars) separated by spacer regions (open bars). R2 elements encode a large open reading frame (ORF; orange bar) with short 5′ and 3′ untranslated regions (UTRs). The largest block of uninserted rDNA units is identified and determines which contiguous block of rDNA units is transcribed, the transcription domain. (B) The transcription domain model for the regulation of R2 activity is based on data suggesting that the host activates for transcription a contiguous block of rDNA units containing the fewest R2-inserted units [18,20]. The transcription domain is centered on the largest contiguous area of uninserted rDNA units. The remaining rDNA units are packaged into a transcriptionally inactive chromatin form. If the largest area free of R2 insertions is larger than the transcription domain, then no transcription of R2-inserted units occurs. If the largest area free of R2 insertions is smaller than the transcription domain, then transcription of R2-inserted units does occur, giving rise to retrotransposition events.

doi:10.1371/journal.pgen.1003179.g001

A better fit to the empirical population data could be generated by limiting the locations of the crossovers in the simulations to positions near the center of each rDNA locus (Figure S2). However, the simulations still needed a means to regulate R2 activity such that, while all individuals in the population contained R2 insertions, only a fraction of them contained transcriptionally active R2 elements. Most important, this R2 activity had to be essentially independent of R2 number and locus size (Figure 2).
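Restricting crossovers to part of the locus amounts to drawing breakpoints from a distribution concentrated around a chosen position (the locus center here, the transcription domain in the later simulations) instead of uniformly. The text does not give the functional form, so this sketch assumes a normal distribution whose standard deviation is a hypothetical `spread` parameter expressed as a fraction of locus length. The 0.05 default merely echoes the clustering value listed in the Figure 3C caption, whose exact definition is given in the Materials and Methods.

```python
import random
from typing import Optional


def sample_crossover_point(locus_len: int, center: float, spread: float = 0.05,
                           rng: Optional[random.Random] = None) -> int:
    """Draw a crossover breakpoint clustered around `center`.

    ASSUMPTION: breakpoints are normally distributed with standard deviation
    spread * locus_len (at least one unit) and are redrawn until they fall
    strictly inside the locus.
    """
    if locus_len < 2:
        raise ValueError("locus must contain at least two units")
    rng = rng or random.Random()
    sd = max(1.0, spread * locus_len)
    while True:
        point = round(rng.gauss(center, sd))
        if 1 <= point <= locus_len - 1:
            return point


# A uniform model would instead use rng.randint(1, locus_len - 1).
rng = random.Random(1)
points = [sample_crossover_point(200, 100, rng=rng) for _ in range(1000)]
```

With a 200-unit locus and the assumed spread, nearly all breakpoints fall within about 30 units of the center, so R2 copies far from that region are rarely duplicated.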

Simulation of an rDNA locus with an active transcription domain

Simulations based on the transcription domain model, in which R2 transcription does not occur when there is a large region of the rDNA locus free of R2 insertions [18,20, Figure 1], could reproduce the population data. The incorporation of this model into a simulation program is described in the Materials and Methods and diagrammed in Figure S3. Based on previous estimates of the number of rDNA units needed for transcription in Drosophila [4,25,26], the size of the transcription domain was varied from 30 to 70 units. At each generation, the transcription domain was centered on the largest contiguous block of rDNA units with no R2 insertions. Because R2 transcripts are processed from a 28S rRNA co-transcript [19], in cases where the largest R2-free block was larger than the defined transcription domain, the transcription domain would be “R2-free” and no R2 transcription would occur. However, in cases where the defined transcription domain was larger than the largest contiguous R2-free block, the transcription domain would not be R2-free, and R2 transcription would occur from those R2 elements within the domain (see Figure 1B). The probability of retrotransposition increased as the number of R2 elements within the domain increased. Because all rDNA units in D. simulans are on the X chromosome, the size of each transcription domain on the two X chromosomes of females was set at one half the size of the transcription domain on the single X chromosome of males. Host fitness was determined by the number of uninserted rDNA units available for transcription, as in our previous simulations [17].

In the domain model, however, the number of rDNA units activated for transcription was set somewhat higher than the number needed by the host for maximum fitness. Therefore, independent of the total number of inserted units, the presence of only a few inserted units within the activated domain could reduce host fitness. To reproduce the known properties of R2 as closely as possible, half of the retrotransposition events generated 5′-truncated (dead-on-arrival) R2 copies [12,13]. These truncated copies inactivated rDNA units, played a role in the identification of the transcription domain, and influenced host fitness, but could not contribute to the generation of new R2 copies. Finally, because the empirical data suggested that R2 insertions are seldom duplicated by recombination (Figure 3A, right panel), crossovers were localized to various degrees within or near the transcription domain.

Figure 2. Properties of the rDNA loci derived from natural populations of D. simulans and their correlation with the level of R2 transcription. (A) Range of rDNA locus size (diamonds) and R2 number (squares) for 95 iso-rDNA locus lines. The standard errors are shown for the six replicates conducted for each determination (see Materials and Methods). A positive correlation was found between the number of rDNA units (locus size) and the number of R2 (Spearman rank correlation r = 0.47, P = 10⁻⁸). (B) Using R2 transcript levels previously determined for these same lines [20], no correlation was found between locus size and R2 transcript levels (r = −0.16, P = 0.142). (C) No correlation was found between the number of R2 and the R2 transcript level (r = 0.14, P = 0.187). (D) A small but significant correlation was found between the fraction of rDNA units inserted with R2 elements and R2 transcript levels (r = 0.32, P = 0.003).

doi:10.1371/journal.pgen.1003179.g002
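The per-generation logic of the domain model can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' program (which is described in the Materials and Methods and Figure S3). Units are encoded with hypothetical integer codes. The “square root function” of the Figure 3C caption is assumed to be a plain square root, new insertions are assumed to land in a uniformly chosen uninserted unit, and crossovers, deletions, and fitness are omitted. Truncated copies count as insertions when the domain is placed but never generate new copies, matching the description in the text.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical unit encoding for this sketch.
UNINSERTED, FULL, TRUNC = 0, 1, 2


@dataclass
class DomainParams:
    domain_size: int = 40            # male transcription domain (Figure 3C)
    base_rate: float = 0.18          # retrotransposition scale factor (Figure 3C)
    truncated_fraction: float = 0.5  # share of events that are 5'-truncated


def domain_size_for(params: DomainParams, female: bool) -> int:
    """Each of a female's two X chromosomes carries half the male domain."""
    return params.domain_size // 2 if female else params.domain_size


def largest_free_block(locus: List[int]) -> Tuple[int, int]:
    """(start, length) of the longest run of uninserted units."""
    best_start = best_len = run_start = run_len = 0
    for i, unit in enumerate(locus):
        if unit == UNINSERTED:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def place_domain(locus: List[int], size: int) -> Tuple[int, int]:
    """Half-open window of `size` units centered on the largest R2-free block.

    If that block is at least `size` units long, the window lies entirely
    inside it, so the domain is R2-free.
    """
    start, length = largest_free_block(locus)
    lo = start + length // 2 - size // 2
    lo = max(0, min(lo, len(locus) - size))  # keep the window inside the locus
    return lo, lo + size


def full_length_in_domain(locus: List[int], lo: int, hi: int) -> int:
    """Only full-length copies in the domain can generate new insertions."""
    return sum(1 for unit in locus[lo:hi] if unit == FULL)


def retrotransposition_prob(n_full: int, base_rate: float = 0.18) -> float:
    # ASSUMPTION: the "square root function" is taken to be sqrt(n_full).
    return min(1.0, base_rate * math.sqrt(n_full))


def retrotranspose(locus: List[int], lo: int, hi: int, params: DomainParams,
                   rng: random.Random) -> Optional[int]:
    """Possibly add one new R2 copy in place; return its index, or None."""
    n_full = full_length_in_domain(locus, lo, hi)
    if n_full == 0 or rng.random() >= retrotransposition_prob(n_full, params.base_rate):
        return None
    # ASSUMPTION: the new copy lands in a uniformly chosen uninserted unit.
    targets = [i for i, unit in enumerate(locus) if unit == UNINSERTED]
    if not targets:
        return None
    i = rng.choice(targets)
    locus[i] = TRUNC if rng.random() < params.truncated_fraction else FULL
    return i


crowded = ([UNINSERTED] * 9 + [FULL]) * 20  # an R2 copy every 10th unit
lo, hi = place_domain(crowded, domain_size_for(DomainParams(), female=False))
print(lo, hi, full_length_in_domain(crowded, lo, hi))  # 0 40 4
```

In the usage example no R2-free block reaches 40 units, so the domain unavoidably contains four full-length copies and retrotransposition becomes possible; a locus with one long R2-free block would yield a count of zero.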

Figure 3. Comparison of the rDNA loci from natural populations with computer-simulated loci generated by simple crossover models of concerted evolution. (A) The empirical data determined for rDNA loci from the natural populations in Figure 2 are re-plotted to show the distributions of rDNA locus size (left panel), total R2 number per locus (middle panel), and the number of R2 copies duplicated by crossovers (right panel). The R2 duplication frequency was derived from the approach used in ref. 20 to count the total number of R2 copies in 18 rDNA loci. (B) Simulation data based on the modeling approach described in ref. 17, in which crossover events are uniformly distributed throughout the rDNA locus. The following parameters were used: population size = 4000; generations = 10000; replicates = 60; number of uninserted rDNA units required for peak fitness = 100; maximum fecundity = 6; SCE rate = 0.3; ICE rate = 0.0001; crossover offset = 1–8 rDNA units; R2 retrotransposition rate = 0.009 for all loci containing R2 elements; loop deletion rate = 0.00005; deletion size = 1–15 rDNA units. See Materials and Methods for a description of these parameters. How these parameters influence the size of the rDNA locus and number of inserted units can be found in ref. 17. The three panels showing the distributions of locus size, number of R2, and R2 duplication state are shown below the corresponding data from the natural populations. (C) Simulation data based on the transcription domain model for the regulation of R2 elements in a population. The following parameters were used (also described in the Materials and Methods): population size = 5000; generations = 50000; replicates = 60; transcription domain size = 40; number of uninserted rDNA units in the domain required for peak fitness = 34; maximum fecundity = 6; SCE rate = 0.2, clustered near the transcription domain with σ = 0.05; ICE rate = 0.0001, σ = 0.05; crossover offset = 1–11 rDNA units; R2 retrotransposition rate = 0.18 times a square root function of the number of full-length R2 copies in the domain, σ = 0.4; loop deletion rate = 0.00007 times the size of the rDNA locus; element-induced deletion rate = 0.0065 times the number of full-length R2 copies in the domain; deletion size = 1–30 rDNA units, σ = 0.2. The panels containing the distributions of locus size, number of R2, and R2 duplication state are again shown below the corresponding data from the natural populations.

doi:10.1371/journal.pgen.1003179.g003

Shown in Figure 3C are simulated populations in which the transcription domain was set at 40 units, and the crossover rate and R2 retrotransposition frequency were adjusted such that the rDNA loci generated had a mean size of 225 units and a mean of 50 R2-inserted units. As shown in Figure S4, the final equilibrium was independent of the starting properties of the
