Mol. Biol. Evol. 15(3):293–302. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
High Rate of DNA Loss in the Drosophila melanogaster and
Drosophila virilis Species Groups
Dmitri A. Petrov¹ and Daniel L. Hartl
Department of Organismic and Evolutionary Biology, Harvard University
We recently proposed that patterns of evolution of non-LTR retrotransposable elements can be used to study patterns
of spontaneous mutation. Transposition of non-LTR retrotransposable elements commonly results in creation of 5′
truncated, ‘dead-on-arrival’ copies. These inactive copies are effectively pseudogenes and, according to the neutral
theory, their molecular evolution ought to reflect rates and patterns of spontaneous mutation. Maximum parsimony
can be used to separate the evolution of active lineages of a non-LTR element from the fate of the ‘dead-on-
arrival’ insertions and to directly assess the relative frequencies of different types of spontaneous mutations. We
applied this approach using a non-LTR element, Helena, in the Drosophila virilis group and have demonstrated a
surprisingly high incidence of large deletions and the virtual absence of insertions. Based on these results, we
suggested that Drosophila in general may exhibit a high rate of spontaneous large deletions and have hypothesized
that such a high rate of DNA loss may help to explain the puzzling dearth of bona fide pseudogenes in Drosophila.
We also speculated that variation in the rate of spontaneous deletion may contribute to the divergence of genome
size in different taxa by affecting the amount of superfluous ‘junk’ DNA such as, for example, pseudogenes or
long introns. In this paper, we extend our analysis to the D. melanogaster subgroup, which last shared a common
ancestor with the D. virilis group approximately 40 MYA. In a different region of the same transposable element,
Helena, we demonstrate that inactive copies accumulate deletions in species of the D. melanogaster subgroup at a
rate very similar to that of the D. virilis group. These results strongly suggest that the high rate of DNA loss is a
general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species
group.
Introduction
One of the most striking differences in the genomic
organization of Drosophila and that of mammals is in
the relative abundance of pseudogenes. Pseudogenes are
very common in mammals, with many functional genes
having more than one pseudogene counterpart and some
genes having as many as 200 pseudogene copies (Wei-
ner, Deininger, and Efstratiadis 1986). In contrast, in
Drosophila, very few pseudogenes have been identified.
Two Drosophila genes originally identified as pseudo-
genes (Jeffs and Ashburner 1991; Sullivan et al. 1994)
were later shown to be novel functional genes (Long
and Langley 1993; Begun 1997).
Beyond the challenge of trying to account for such
a striking asymmetry between Drosophila and mam-
mals, the dearth of bona fide pseudogenes in Drosophila
also impedes the study of molecular evolution in this
organism. The problem is that in the absence of pseu-
dogenes, it is very hard, or even impossible, to reliably
estimate the rates and patterns of spontaneous mutation.
This is because mutation is too infrequent to observe
directly and, unless one studies neutrally evolving se-
quences such as pseudogenes, the observed pattern of
DNA variation within or between species reflects not
only the intrinsic pattern and rate of mutation but also
the selective differences between different mutational
classes. The problem is particularly acute for the study
of indels (insertions and deletions), which are often
highly deleterious, and therefore the observed frequen-
cies of indels are undoubtedly profoundly affected by
selection.

¹ Present address: Harvard University Society of Fellows.
Key words: Helena, non-LTR retrotransposable elements, biased
spontaneous mutation, deletions and insertions, pseudogenes, C-value
paradox.
Address for correspondence and reprints: Dmitri Petrov, Harvard
University Society of Fellows, 78 Mt. Auburn Street, Cambridge,
Massachusetts 02138. E-mail: dpetrov@oeb.harvard.edu.

Although Drosophila lacks bona fide pseudogenes,
it harbors a class of transposable elements, non-LTR re-
trotransposons, that commonly generate nonfunctional,
pseudogenelike copies as a by-product of the transpo-
sitional cycle. We have recently proposed that it is pos-
sible to distinguish the evolution of the pseudogenelike
copies from the evolution of the transpositionally active
lineages, thereby allowing the analysis of the neutral
mutational processes without the need for a large num-
ber of bona fide pseudogenes.
When an active lineage of a non-LTR retrotrans-
posable element undergoes several independent trans-
positions, it is expected to produce pseudogenelike cop-
ies with essentially identical sequences. After transpo-
sition, each one of these copies undergoes independent
neutral evolution, which should result in the accumula-
tion of unique point substitutions, deletions, and inser-
tions. If we sample enough independently transposed el-
ements to represent all of the active lineages, we can
expect that substitutions that are shared among two or
more elements will be those that have occurred in the
active lineages themselves. The element-specific substi-
tutions, on the other hand, are those that have accumu-
lated in neutral fashion since the time of transposition,
and these substitutions can be used to directly assess the
relative frequencies of different types of spontaneous
mutations. (For a more detailed account, see Petrov and
Hartl [1998].)
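
The distinction between shared and element-specific substitutions can be illustrated with a small sketch. It is not a maximum-parsimony reconstruction: it uses a majority-rule consensus as a stand-in for the ancestral state, and the function name and the toy sequences are assumptions made only for illustration.

```python
from collections import Counter


def classify_substitutions(sequences):
    """Split variant sites into shared and element-specific ones.

    sequences: dict mapping element name -> aligned sequence (equal
    lengths, '-' marks a gap). A non-consensus base carried by a single
    element is treated as element-specific; one carried by two or more
    elements is treated as shared (i.e., inherited from an active lineage).
    """
    names = list(sequences)
    length = len(sequences[names[0]])
    shared = []
    unique = {name: [] for name in names}
    for pos in range(length):
        column = {name: sequences[name][pos] for name in names}
        bases = Counter(b for b in column.values() if b != "-")
        if len(bases) < 2:
            continue  # invariant site, or only gaps
        consensus = bases.most_common(1)[0][0]
        for base, count in bases.items():
            if base == consensus:
                continue
            carriers = [n for n in names if column[n] == base]
            if count == 1:
                unique[carriers[0]].append(pos)
            else:
                shared.append((pos, base, carriers))
    return shared, unique


# Hypothetical toy alignment of five elements.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ATGTACGT",
    "D": "ATGTCCGT",
    "E": "ACGTACGT",
}
shared, unique = classify_substitutions(toy)
print(shared)
print(unique)
```

In this toy case the T at position 1 is carried by elements C and D and so counts as shared, whereas the changes private to B and D count as element-specific. A real analysis would place the changes on a tree rather than compare against a consensus.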
We have successfully applied this approach to a
non-LTR element, Helena, in the D. virilis group and,
in particular, have demonstrated a surprisingly high rate
of accumulation of relatively large deletions and the vir-
tual absence of insertions (Petrov, Lozovskaya, and
Hartl 1996). If these results are general for most of the
DNA sequences in Drosophila, the fixation of deletions
would result in the accelerated loss of superfluous DNA
and would also help to explain the observed lack of
bona fide pseudogenes in Drosophila. We also hypoth-
esized that variations in the rate and the size of indels
might contribute to the divergence of genome size in
different lineages (the so-called ‘C-value’ paradox de-
fined by Thomas 1971) by affecting the amount of
‘junk’ DNA in the form of pseudogenes, long introns,
intergenic sequences, and so forth.
The validity of these predictions hinges in part on
whether the observed high rate of DNA loss in non-
functional copies of Helena is specific to a particular
363 bp of sequence in a particular species group or
whether the high rate of deletion is a more general phe-
nomenon. To address these concerns, we have studied
the pattern of indel formation and accumulation using a
different part of Helena in a phylogenetically distant
group of Drosophila, the D. melanogaster species sub-
group, which last shared a common ancestor with D.
virilis 40 MYA (Russo, Takezaki, and Nei 1995). The
results presented in this paper demonstrate that the pat-
tern of indel formation is very similar in the D. mela-
nogaster and D. virilis groups, suggesting that a high
rate of DNA loss is common in Drosophila.
Materials and Methods
DNA Source
Wild-type stocks of eight species in the D. mela-
nogaster species subgroup and a strain of D. pseudoob-
scura were obtained from the Drosophila species stock
center at Bowling Green, Ohio, or from the species col-
lection in our laboratory. We have utilized the following
stocks: D. erecta (14021-0224.0), D. sechellia (14021-
0248.1), D. orena (14021-0245.0), D. pseudoobscura
(14011-0121.0), D. mauritiana (w^pch strain), D. simulans
(Congo14), D. yakuba (Tai26), D. teissieri (Brazza-
ville8), and D. melanogaster (Brazzaville).
Primers, PCR, and DNA Sequencing
Primer sequences were designed using Oligo 4.0
(National Biosciences) and supplied by Gibco. PCR re-
actions were carried out in PTC-100 thermal cyclers (MJ
Research) or on a Perkin-Elmer DNA thermal cycler.
DNA sequencing was carried out on an ABI373A au-
tomated DNA sequencer (Perkin Elmer) with the Taq
Cycle Sequencing Dye-Primer or DyeDeoxy Terminator
kits from Perkin Elmer. Every nucleotide position was
sequenced at least once in both directions.
All clones were obtained by cloning products of
PCR reactions carried out with the primers Helena96
(5′-CTAAATAACTGCCGAAACAT-3′) and Helena1429
(5′-CCTTGCCGTTTGAGTCGTCT-3′). These
primers amplify an internal 1,357-bp region of the pu-
tative reverse transcriptase gene in Helena. The PCR
reactions were carried out under the following conditions:
96°C for 30 s, 42°C or 52°C for 1 min, and 72°C
for 4 min. In all cases, total genomic DNA was used as
template for PCR. Cloning was carried out using the
TA-cloning kit (Invitrogen) without prior size fraction-
ation of the DNA. We picked 48 different clones from
each PCR reaction for further analysis.
All clones were tested for the presence and the
sizes of inserts using PCR with the M13 Universal and
Reverse primers. The majority of inserts were of the
predicted size of approximately 1,400 bp, with a few
clones being somewhat smaller (1,300–700 bp). After
sequencing multiple clones (approximately 12 per spe-
cies), we eliminated all identical sequences, because
those most likely correspond to the same genomic in-
sertion of Helena. We also eliminated all sequences that
did not align with the Helena consensus and are likely
a result of spurious PCR.
One important concern with using PCR to collect
multiple clones for the analysis of length mutations is
that the PCR is more efficient in amplifying markedly
smaller templates and could possibly bias our sampling
procedure. We have implemented a number of steps to
minimize any potential bias. First of all, we used long
extension times in the initial PCR reaction, which
should reduce any preferential amplification of shorter
templates. Cloning and sequencing a large number of
clones and using one representative of each unique se-
quence for further analysis should also reduce the effect
of the bias for shorter sequences, because even if a
shorter sequence is overrepresented in the initial pool of
clones, as long as the longer sequence is present at least
once, it will also appear in the final data set. The size
distribution of clones after PCR did not reveal any sub-
stantial bias for shorter clones. In the actual experi-
ments, most clones were similar in size, ranging from
1,317 to 1,102 bp, making the presence of significant
PCR bias unlikely. In addition, we can directly show
that shorter clones have not been vastly overrepresented
in our sample. For instance, the clone H-sechellia455
(1,254 bp) was present twice in our sample, whereas H-
sechellia468 (752 bp, the shortest sequence in our data
set) was cloned only once. Furthermore, the rate of de-
letions relative to the point substitutions is identical for
the 11 shortest sequences in the data set and for the 11
longest ones.
Sequence Analysis
Alignment of all sequences was done with the aid
of the MacVector, Genejockey, and Sequencher 2.0
(GeneCodes) software packages. The sequence portions
corresponding to the primers Helena96 and Helena1496
were removed prior to analysis. The phylogenetic anal-
ysis used maximum parsimony carried out with the
PAUP software package (Swofford 1991). We used all
the characters in the nucleotide alignment at equal
weight. Deletions were treated as missing data. We also
used the MacClade software package (Maddison and
Maddison 1992) to aid in tree manipulations. All se-
quences were deposited in GenBank under the accession
numbers AF012030–AF012052. The alignment used in
this study can be downloaded from
http://www.oeb.harvard.edu/hartl/lab/dmitri.html.
Statistical Methods
Relative rates of deletions versus nucleotide sub-
stitutions, and of deletions versus insertions, were esti-
mated using maximum likelihood under the assumptions
that (1) each element has no deletions or unique substi-
tutions at the time of transposition, (2) rates of deletions
and substitutions are constant in time, and (3) for any
given time, the number of deletions and the number of
substitutions follow a Poisson distribution. The confidence
limits were found using the $\chi^2$ approximation of
the log-likelihood ratio. The positive correlation be-
tween the number of terminal branch substitutions and
the number of deletions was ascertained using Fried-
man’s test for randomized blocks (Sokal and Rohlf
1995).
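
A minimal sketch of such a likelihood calculation is given here, under simplifying assumptions that are ours and not necessarily those of the original analysis. If deletions and substitutions on each element are Poisson with a common unknown duration, profiling out the durations leaves a log-likelihood for the ratio rho (deletions per substitution) of the form D ln(rho) - (D + S) ln(1 + rho), where D and S are the total counts. The counts used below are hypothetical, and the correction of substitutions for sequence length and multiple hits is ignored.

```python
import math

# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DF = 3.841458820694124


def profile_loglik(rho, deletions, substitutions):
    """Profile log-likelihood (up to a constant) of rho, the ratio of the
    deletion rate to the substitution rate, with durations profiled out."""
    total = deletions + substitutions
    return deletions * math.log(rho) - total * math.log1p(rho)


def estimate_ratio(deletions, substitutions):
    """Return (MLE, lower, upper): deletions per substitution with an
    approximate 95% likelihood-ratio confidence interval."""
    if deletions <= 0 or substitutions <= 0:
        raise ValueError("both counts must be positive")
    rho_hat = deletions / substitutions
    top = profile_loglik(rho_hat, deletions, substitutions)

    def excess(rho):
        # Negative inside the 95% interval, positive outside it.
        drop = top - profile_loglik(rho, deletions, substitutions)
        return 2.0 * drop - CHI2_95_1DF

    def bisect(inside, outside):
        # Invariant: excess(inside) < 0 < excess(outside).
        for _ in range(200):
            mid = 0.5 * (inside + outside)
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    low_start = rho_hat * 1e-9
    high_start = rho_hat * 2.0
    while excess(high_start) < 0:
        high_start *= 2.0
    lower = bisect(rho_hat, low_start)
    upper = bisect(rho_hat, high_start)
    return rho_hat, lower, upper


# Hypothetical totals, chosen only to illustrate the calculation.
print(estimate_ratio(deletions=60, substitutions=500))
```

The interval endpoints are the values of rho at which twice the drop in log-likelihood from its maximum equals the 95% point of the chi-square distribution with one degree of freedom.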
When considering the rate of deletions per base
pair of DNA, one needs to take into account the order
in which the deletions have taken place. For instance,
suppose that two deletions of 1 and 500 bp occur in a
region originally of 1 kb in length. Depending on the
order in which the two deletions take place, the estimate
of the number of deletions per kilobase can be either 2
(when the 1-bp deletion happens first) or 3 (when the
500-bp deletion is first). This discrepancy is present, be-
cause if the 500-bp deletion happens first, then the 1-bp
deletion will have taken place in only 500 bp of the
remaining DNA, and thus we should count it as 2 events
per kilobase. Because we do not know the order of the
deletion events in the Helena sequence, we have esti-
mated the relative rate of indels using two procedures,
one which should underestimate and one which should
overestimate the rate. Together, these procedures give us
the minimum and maximum values of the rate of dele-
tion. The minimum estimate of deletion rate assumes
that all deletions take place simultaneously in a se-
quence of the original length. To arrive at the maximum
estimate, we assume that all deletions take place in a
clone of the final length, that is, after the total length of
all deletions has been subtracted from the original length
of the sequence. The difference between the minimum
and maximum estimates in our sample is around 15%
of the mean and is well within the 95% confidence in-
terval of either estimate. To be conservative, and also
for the sake of brevity, we report only minimum esti-
mates of the deletion rate.
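
The order dependence and the two bounding procedures can be sketched as follows. The function names are hypothetical, and the only data used are the 1-bp and 500-bp deletions in a 1-kb region from the example above.

```python
def sequential_rate_per_kb(ordered_sizes, original_length):
    """Deletions per kb when deletions occur in the given order and each
    event is scored against the DNA remaining at that moment."""
    remaining = original_length
    rate = 0.0
    for size in ordered_sizes:
        rate += 1000.0 / remaining
        remaining -= size
    return rate


def rate_bounds_per_kb(sizes, original_length):
    """Minimum estimate: all deletions fall in a sequence of the original
    length. Maximum estimate: all fall in a sequence of the final length."""
    final_length = original_length - sum(sizes)
    if final_length <= 0:
        raise ValueError("deletions remove the whole sequence")
    count = len(sizes)
    return 1000.0 * count / original_length, 1000.0 * count / final_length


print(sequential_rate_per_kb([1, 500], 1000))  # 1-bp deletion first
print(sequential_rate_per_kb([500, 1], 1000))  # 500-bp deletion first
print(rate_bounds_per_kb([1, 500], 1000))      # (minimum, maximum)
```

Either ordering gives a value that lies between the minimum and maximum estimates, which is why the two procedures bracket the true rate without knowledge of the order of events.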
The calculation of a half-life of a pseudogene was
done using a continuous decay formula: $L = L_0 \exp(-rt)$,
where $L$ is the length of a pseudogene at time $t$, $L_0$ is
the length at time 0, and $r$ is the deletion rate (product
of the average size of a deletion by the rate of deletions
per substitution or per year). With the rate of DNA loss
given in Myr, we used the following estimates of the
neutral substitution rates: $5 \times 10^{-9}$ substitutions/year for
mammals and $15 \times 10^{-9}$ substitutions/year for
Drosophila (Sharp and Li 1989).
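
As a rough illustration of the decay formula, the half-life follows from setting L equal to half of L_0, which gives t = ln(2) / r. The sketch below plugs in values quoted elsewhere in this paper (0.13 deletions per substitution, a mean deletion size of about 25 bp, and the Drosophila substitution rate above). Which combination of inputs is appropriate is our assumption, as is the reading of the substitution rate as per site per year, so the number it prints should be read as an illustration and not as a reported result.

```python
import math


def half_life_years(deletions_per_substitution, mean_deletion_bp,
                    substitutions_per_site_per_year):
    """Half-life of a pseudogene under L = L0 * exp(-r * t).

    r is the fraction of DNA lost per year: deletions per substitution
    times substitutions per site per year times bp lost per deletion.
    """
    r = (deletions_per_substitution
         * substitutions_per_site_per_year
         * mean_deletion_bp)
    return math.log(2.0) / r


# Inputs taken from the text; their combination here is an assumption.
t_half = half_life_years(0.13, 25.0, 15e-9)
print(f"{t_half / 1e6:.1f} Myr")
```

Because the half-life is inversely proportional to r, a smaller deletion rate, a smaller mean deletion size, or a slower substitution rate each lengthens the half-life in direct proportion.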
Results and Discussion
Sampling and Phylogenetic Analysis of Helena in the
D. melanogaster Subgroup
Helena, a non-LTR retrotransposable element that
was originally identified in D. virilis (Petrov et al. 1995),
is widely distributed in the genus Drosophila (unpub-
lished data) and, in particular, is present in all species
of the D. melanogaster subgroup as well as in D. pseu-
doobscura. We have cloned and sequenced multiple in-
sertions of Helena in all eight species of the D. mela-
nogaster subgroup (D. orena, D. erecta, D. teissieri, D.
yakuba, D. melanogaster, D. sechellia, D. simulans, and
D. mauritiana) and have obtained a single clone from
D. pseudoobscura. The cloning procedure involved car-
rying out PCR reactions with two primers designed to
amplify the internal 1,357-bp region in the putative re-
verse transcriptase gene of Helena and then cloning the
products of the reaction and sequencing individual
clones. In an effort to sample independently transposed
elements, we used a single strain per species as template
DNA for PCR, which should reduce the probability of
vertical transmission and of resampling of the elements
that are present at the same site in the genome (Petrov
and Hartl 1998). On the other hand, because each spe-
cies carries multiple insertions of Helena, this procedure
is likely to result in sampling of the same insertion in
more than one clone. To minimize this problem, we ex-
cluded all identical sequences from our analysis. In this
way, we obtained 23 different sequences from eight spe-
cies in the D. melanogaster subgroup and 1 sequence
from D. pseudoobscura.
The alignment of the sampled sequences revealed
a large number of indels, including 64 apparent dele-
tions and 8 apparent insertions (fig. 1). However, outside
of the indels, the alignment was unambiguous. To fur-
ther investigate the evolution of Helena in the D. mel-
anogaster subgroup, we performed phylogenetic analy-
sis with all sampled sequences using maximum parsi-
mony. The resulting tree, in figure 1, is the strict con-
sensus of 12 equally parsimonious trees.
Evolutionary History of Helena in the D.
melanogaster Subgroup
The evolutionary history of each independently
transposed non-LTR element can be separated into two
distinct phases: (1) the evolution of an active lineage,
which is reflected in the sequence of each element at
the time of transposition, and (2) the pseudogenelike
neutral evolution of each element after transposition.
Because multiple independently transposed elements
generated from the same active lineage are expected to
have the same sequence at the time of transposition, it
follows that substitutions that occur in active lineages
must be shared among several elements. On the other
hand, the pseudogenelike evolution of each element af-
ter transposition should result in unique substitutions.
Based on this reasoning, we have argued in our previous
reports (Petrov, Lozovskaya, and Hartl 1996; Petrov and
Hartl 1998) that maximum parsimony should distinguish
between these two phases, such that the evolution of the
active lineages will be represented by substitutions that
are shared among several elements (these are changes
that map to the internal branches), whereas neutral drift
will be reflected in the element-specific terminal branches.
This separation is valid as long as all of
the active lineages are represented in our sample more
than once, as long as all the terminal sequences correspond
to independently transposed elements, and as long
as inactive elements cannot transpose in trans. (For a
more detailed account, see Petrov and Hartl [1998].)

FIG. 1. Phylogenetic analysis and the locations and sizes of deletions in copies of Helena from the D. melanogaster species subgroup.
A schematic diagram of the locations of deletions in the 22 aligned sequences of Helena is shown on the right. A sequence of Helena from D.
pseudoobscura was used to root the tree. Maximum-parsimony analysis was carried out using all positions in the nucleotide alignment at equal
weight. We have ignored the insertions and have treated deletions as ‘missing data.’ The number of unambiguous substitutions is shown above
each branch. Deletions are shown as filled-in black bars, with the length of each bar corresponding to the length of each deletion. Insertions
are represented by triangles. The bold lines identify the ‘pseudogene’ branches (see text for explanation).

Thus we expect to find evidence of purifying se-
lection acting along the internal branches of the tree and
evidence of a lack of constraint on the terminal branch-
es. The distribution of point substitutions along the
branches of the tree fits this expectation (fig. 2a): in the
internal branches there is a sharp excess of third-position
substitutions, indicating the action of purifying selection
against amino acid substitutions ($\chi^2 = 66.6$, $P = 3.5 \times 10^{-15}$),
whereas in the terminal branches, point substitutions
are distributed evenly among all codon positions
($\chi^2 = 0.54$, $P = 0.76$). In one respect, however, the
distribution of indels and stop codons is not entirely
consistent with our expectations. Because most indels
and stop codons in the coding regions are likely to abol-
ish the activity of reverse transcriptase, we expect that
indels and stop codons should appear exclusively in the
terminal branches. The vast majority of indels and stop
codons (60 of 64 deletions, 7 of 8 insertions, and 17 of
18 stop codons) do map to the terminal branches. How-
ever, seven deletions, one insertion, and one stop codon
are shared among two to five different sequences.
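
The codon-position comparison can be sketched as a goodness-of-fit test. We assume here, as our reading and not a statement of the exact procedure used, that substitution counts at the three codon positions are compared with an equal expectation, giving two degrees of freedom. The P values reported above are consistent with two degrees of freedom. The counts in the example are hypothetical.

```python
import math


def p_value_2df(chi2):
    """Upper-tail probability of a chi-square statistic with 2 d.f.
    (for 2 d.f. the survival function is exactly exp(-chi2 / 2))."""
    return math.exp(-chi2 / 2.0)


def codon_position_test(counts):
    """Chi-square test of equal substitution counts at codon positions
    1, 2, and 3. Returns (chi2, P)."""
    if len(counts) != 3:
        raise ValueError("expected counts for three codon positions")
    expected = sum(counts) / 3.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2, p_value_2df(chi2)


# Hypothetical counts showing an excess of third-position changes.
print(codon_position_test([10, 10, 40]))
# Converting the statistics quoted in the text to P values.
print(p_value_2df(0.54), p_value_2df(66.6))
```

A strong excess at the third position, as in the hypothetical counts, yields a large statistic and a very small P value, whereas roughly equal counts yield a P value near 1.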
The shared indels and stop codons can be mapped
onto the internal branches in a way that is completely
consistent with the tree without invoking parallel
changes or reversals. A key observation is that all of
them map to the branches connecting five elements,
namely mauritiana52, mauritiana58, simulans355, se-
chellia455, and sechellia469 (fig. 3). The presence of
indels and stop codons in the internal branches may im-
ply that these particular changes did not significantly
interfere with the transpositional competence of the ac-
tive lineage. This possibility implies that in the branches
shared between the elements sechellia455 and sechel-
lia469, three deletions (of 1, 4, and 24 bp) and one in-
sertion (of 1 bp), which together remove 28 bp of DNA
and result in a frameshift, nevertheless allowed two in-
dependent transpositions that produced sechellia455 and
sechellia469. Alternatively, it is possible that five of the
inconsistent elements correspond to a single transposi-
tional event, with subsequent vertical transmission and
resampling of the same element in five different cases.
There are a number of reasons why we favor the
resampling hypothesis. First, all of the indels and stop
codons are concentrated in one part of the tree. If some
indels and stop codons were consistent with transposi-
tion in general, we would expect to see them in other
parts of the tree as well. Second, all five of the elements
were sampled from a cluster of the very closely related
species D. mauritiana, D. simulans, and D. sechellia
(Lemeunier and Ashburner 1976), which makes sam-
pling of a vertically transmitted inactive element a more
likely scenario. Finally, vertical transmission of a single
insertion should lead to a general release from purifying
selection along the part of the tree that connects these
five elements. This prediction is consistent with the
presence of deletions and insertions, and it is also supported
by the distribution of point substitutions along
the internal branches connecting these elements (fig. 2c):
the point substitutions are distributed evenly among the
three codon positions ($\chi^2 = 0.32$, $P = 0.85$).

FIG. 2. Distribution of substitutions by codon position along
branches of the Helena gene tree. a, Distribution of substitutions along
the terminal and internal branches of the Helena gene tree. The internal
branches show clear signs of purifying selection ($P = 3.5 \times 10^{-15}$),
whereas the terminal branches do not ($P = 0.76$). b, Distribution of
substitutions along the ‘pseudogene’ and ‘active’ branches of the
tree. The ‘pseudogene’ branches comprise all terminal branches combined
with the branches of the mauritiana52–sechellia469 clade (see
text and fig. 3). The ‘active’ branches comprise all of the internal
branches with the exclusion of the mauritiana52–sechellia469 clade.
Purifying selection is even more pronounced in the ‘active’ branches
($P = 5.4 \times 10^{-21}$) than in the internal branches and is absent in the
‘pseudogene’ branches ($P = 0.86$). c, Lack of purifying selection
along the internal branches ($P = 0.85$) and the terminal branches
($P = 0.63$) of the mauritiana52–sechellia469 clade.

FIG. 3. The mauritiana52–sechellia469 clade, which has all of
the shared deletions and insertions. Deletions are represented by filled-in
bars, insertions by open bars. Numbers of point substitutions mapping
to each branch are indicated above each branch.

We therefore conclude that the internal branches
leading to the five inconsistent elements appear to cor-
respond to the vertical evolution of a single insertion in
the common ancestor of D. mauritiana, D. sechellia, and
D. simulans. These branches are therefore combined
with all of the terminal branches to arrive at the set of
‘pseudogene’ branches that correspond to the pseudogenelike
part of evolution in our sample of Helena insertions.
(These ‘pseudogene’ branches are represented
by bold lines in fig. 1.)
The Numbers of Deletions and Point Substitutions Are
Positively Correlated Along the ‘Pseudogene’
Branches
The number of deletions, insertions, and point sub-
stitutions along each ‘pseudogene’ branch should be
proportional to the amount of time that has elapsed for
each element after its transposition. We therefore expect
to find a positive correlation between the numbers of
any two types of substitutions for any ‘pseudogene’
branch. We do find such a correlation for the numbers
of deletions and point substitutions (fig. 4; $P = 0.008$,
Friedman’s method for randomized blocks). We cannot,
however, demonstrate a correlation between the number
of insertions and the number of point substitutions ($P = 0.38$,
Friedman’s method for randomized blocks), which
is probably due to the small number of insertions (8) in
the sample and, consequently, insufficient power to de-
tect a correlation. Visual inspection of figure 1 shows
that the distribution of insertions is at least consistent
with the model. Most insertions are present on the lon-
gest branches that also have a large number of substi-
tutions and deletions (yakuba387, simulans34); and the
shortest branches are free of insertions (mauritiana52,
yakuba383). In fact, the branches with insertions have,
on average, 2.8 times more point substitutions and 2.3
times more deletions than branches without insertions.
FIG. 4. The solid line shows the maximum-likelihood regression between the numbers of deletions and point substitutions along the
‘pseudogene’ branches. The dashed lines represent the 95% confidence interval of the rate of deletions relative to the number of substitutions.
The presence of the positive correlation also allows
us to estimate the relative mutation rates of deletions
and point substitutions. After correcting the number of
substitutions for the length of each copy of Helena se-
quences and for multiple hits using the one-parameter
Jukes-Cantor method, we arrive at the maximum-like-
lihood estimate of 0.12 deletions per substitution (the
95% confidence interval is 0.09–0.16). Note that this is
a conservative minimum estimate of the rate of deletions
(see Materials and Methods). This estimate of 0.12 de-
letions per substitution is marginally smaller than the
one we reported for the D. virilis group (0.16 deletions
per substitution), but it is not significantly different (G-test,
$P = 0.47$). The combined rate of deletion in these
two groups is 0.13 deletions per substitution, with a 95%
confidence interval of 0.12–0.14 deletions per substitu-
tion.
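
The multiple-hit correction mentioned above can be sketched with the standard one-parameter Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The helper that rescales the corrected distance back to a count for one copy is a hypothetical convenience, and the example numbers are not taken from the data.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected substitutions per site, given the observed
    proportion p of sites that differ (defined only for 0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def corrected_substitutions(observed, length_bp):
    """Observed substitution count on one copy, corrected for multiple
    hits and expressed again as a count over the copy's length."""
    return jukes_cantor_distance(observed / length_bp) * length_bp


# Hypothetical example: 30 observed substitutions on a 1,000-bp copy.
print(corrected_substitutions(30, 1000))
```

The correction is small when few sites differ and grows as the observed proportion of differences increases, so it matters most for the longest branches.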
Pattern of Indels in the D. melanogaster and D. virilis
Species Groups
Deletions in the D. melanogaster subgroup Helena
sample range from 1 to 432 bp, with a mean of 34 bp
and a standard deviation of 65 bp. The distribution of
deletion sizes is highly asymmetrical: 62% of all dele-
tions range from 1 to 20 bp, 19% range from 21 to 50
bp, and 19% range from 51 to 432 bp. Deletions of 1
bp are the most frequent; they account for 26% (17 of
64) of all deletions.
The lengths of the sequenced portions of Helena in
the D. melanogaster and D. virilis data sets are different
(1,317 bp compared to 363 bp), which prevents us from
comparing average lengths of deletions directly. The
problem is that we would expect to miss most of the
large deletions in the shorter sequences in the D. virilis
data set, because it is impossible to observe a deletion
larger than 363 bp in a sequence only 363 bp in length.
In order to make the distribution of deletion sizes in the
D. virilis and D. melanogaster data sets commensurable,
we used a sliding window of 363 bp and a step length
of 50 bp to extract all the deletions in the D. melano-
gaster data set that have both of their breakpoints inside
this window. The resulting simulated distribution of de-
letions is that expected from a sequenced region of Hel-
ena of 363 bp instead of one of 1,350 bp in species of
the D. melanogaster subgroup. The average size of de-
letions in the simulated D. melanogaster data set is 25.0
bp, which is very close to that observed in species of
the D. virilis group (24.3 bp). We can also compare the
shapes of the distribution of deletion size in the D. virilis
group and in the simulated D. melanogaster data set,
which appear very similar (fig. 5).
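
The sliding-window extraction can be sketched as follows. The text does not specify how window positions at the ends of the region are handled or whether a deletion seen in several overlapping windows is counted once per window. The sketch assumes full windows only and one count per window, and the deletion coordinates are hypothetical.

```python
def windowed_deletion_sizes(deletions, region_length, window=363, step=50):
    """Collect the sizes of deletions whose two breakpoints both fall
    inside a sliding window.

    deletions: iterable of (start, length) pairs, with start a 0-based
    coordinate in the undeleted reference region. A deletion is recorded
    once for every window that fully contains it.
    """
    sizes = []
    for w_start in range(0, region_length - window + 1, step):
        w_end = w_start + window
        for start, length in deletions:
            if start >= w_start and start + length <= w_end:
                sizes.append(length)
    return sizes


# Hypothetical deletions in a 1,317-bp region.
example = [(100, 10), (0, 432), (1300, 5)]
sizes = windowed_deletion_sizes(example, region_length=1317)
print(sorted(sizes), sum(sizes) / len(sizes))
```

The 432-bp deletion never fits inside a 363-bp window and therefore drops out, which is the truncation effect that the procedure is meant to reproduce in the longer sequences.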
In contrast to the large number and the average size
of deletions, we observed only eight small insertions.
The insertions range from 1 to 7 bp and average $2.8 \pm 2.3$ bp.
The only insertion in the D. virilis data set is a
tandem duplication of 4 bp that falls in the middle of
the distribution of insertion sizes in the D. melanogaster
data set. However, given the extremely small number of
insertion events, no meaningful comparison of the in-
sertion sizes is possible.
We can also compare the relative frequencies of
deletions and insertions in the D. melanogaster sub-
group (64 deletions vs. 9 insertions) and in the D. virilis
group (23 deletions vs. 1 insertion). A G-test with the
Yates correction for the small number of insertions fails
to reveal significant heterogeneity ($P = 0.57$).
We have previously reported that about half of the
deletions in the D. virilis data set can be inferred to have
been flanked by short 2–7-bp direct repeats, one of
which is deleted along with the intervening sequence
(Petrov, Lozovskaya, and Hartl 1996; Petrov and Hartl
1998); the only insertion in the D. virilis Helena data is
a tandem duplication of 4 bp. We observe an essentially
identical pattern in the D. melanogaster data set (data
not shown). Approximately half of all deletions are
flanked by direct duplications of 1–7 bp, and 6 of 9
insertions are tandem duplications of 1–7 bp. The similarity
of these patterns suggests that indels in the D.
melanogaster and D. virilis species groups are generated
by similar mechanisms. Together with the similar size
distributions and rates of formation of indels in the two
groups, this indicates that the patterns of indel evolution in
the D. virilis and D. melanogaster groups are indistinguishable.

FIG. 5.—Distribution of deletion sizes in the D. virilis (Petrov, Lozovskaya, and Hartl 1996) and in the simulated D. melanogaster data
sets (see text for explanation). Each bar represents the proportion of deletions in the data set that fall within a particular size range. The D.
melanogaster deletions are represented by black bars, and the D. virilis deletions are represented by white bars.
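The direct-repeat signature discussed above can be checked mechanically once the undeleted sequence and the deletion breakpoints are known. The sketch below is one way to do it, with a hypothetical function name, an assumed 0-based, end-exclusive coordinate convention, and invented sequences. It counts how many bases at the junction are shared between the start of the deleted segment and the sequence that follows it, allowing for the fact that the breakpoint can often be placed at several equivalent positions.

```python
def flanking_repeat_length(seq: str, start: int, end: int) -> int:
    """Length of the direct repeat associated with deletion seq[start:end].

    One copy of the repeat is assumed to be removed together with the
    intervening sequence, so the repeat appears both at the edge of the
    deleted segment and in the retained flank."""
    del_len = end - start
    # Extend to the right: start of the deleted segment vs. the retained 3' flank.
    right = 0
    while (right < del_len and end + right < len(seq)
           and seq[start + right] == seq[end + right]):
        right += 1
    # Extend to the left: retained 5' flank vs. the end of the deleted segment.
    left = 0
    while (left < del_len and start - 1 - left >= 0
           and seq[start - 1 - left] == seq[end - 1 - left]):
        left += 1
    return min(left + right, del_len)


# Invented example: GCTT occurs twice; one copy plus the intervening ACGT is deleted.
repeat_len = flanking_repeat_length("AAGCTTACGTGCTTCC", 2, 10)
# Invented example with no repeat at the junction.
no_repeat = flanking_repeat_length("AACCGGTT", 2, 4)
```

A deletion would then be scored as flanked by a direct repeat when the returned length falls in the range reported in the text (1–7 bp).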
Is the Apparent Size Distribution of Indels Biased by
Selection for Smaller Genome Size?
The central assumption of our study is that indi-
vidual insertions of non-LTR elements evolve neutrally
and accumulate point substitutions, deletions, and inser-
tions in proportion to the likelihood of their spontaneous
formation. We have supported this claim by demonstrating
that point substitutions along ‘pseudogene’ branches
map to the first, second, and third positions of codons
with equal probability, signifying a lack of purifying
selection on this sequence for the ability to produce the
functional reverse transcriptase. Because we sequenced
a part of the coding region and not a regulatory se-
quence, such as an enhancer, a silencer, or a binding site
for a chromatin-related protein, it is also unlikely that
the mere presence of this sequence would have any di-
rect biological activity and thus be affected by selection.
These considerations make it seem likely that the pattern
of point substitutions along ‘pseudogene’ branches in
our data set does indeed reflect the neutral pattern and
rate of mutation.
The situation is potentially more complicated in the
case of indels. Indels do not merely affect the function
of the gene in which they occur; they may also have
more global effects by, for example, changing the total
length of the genome. Differences in the total amount
of DNA can lead to variation in time of replication,
energy required for proper chromatin packaging, and so
forth, all of which can be nonneutral. If selection favors
smaller genome size, that in itself might bias our data
set toward larger deletions and against insertions of any
size. The greater efficacy of selection in Drosophila ow-
ing to much larger population size might then account
for the discrepancy in average deletion size between
Drosophila and mammals (Charlesworth 1996; Petrov,
Lozovskaya, and Hartl 1996; Petrov and Hartl 1998).
One prediction of this ‘selectionist’ model would
be that selection would tend to eliminate insertions of
individual elements in proportion to the total amount of
DNA that they add to the genome. Elements that ac-
cumulate more and, importantly, longer deletions will
be more likely to persist in populations for longer pe-
riods. We would therefore expect to observe a positive
correlation between the ages of individual elements and
the total number and the lengths of deletions. The age
of a pseudogene-like element is proportional to the number
of point substitutions accumulated since transposition.
The ‘neutralist’ model, on the other hand, predicts
a positive correlation only between the number of point
substitutions and the number of deletions, not between
the lengths of the deletions and the number of point
substitutions. Both deletions and point substitutions
should accumulate with time, but long and short dele-
tions should be observed in young and old elements
with equal likelihood.
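The two predictions can be contrasted with a simple rank correlation. The paper itself uses Friedman's method for randomized blocks. The sketch below swaps in a Spearman rank correlation as a simpler stand-in, and every number in it is invented purely to show the shape of the comparison.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


# Invented per-element values (not data from the paper).
substitutions = [2, 5, 9, 14]        # proxy for element age
deletion_counts = [0, 1, 2, 4]       # both models predict a positive correlation
mean_deletion_len = [30, 4, 55, 12]  # only the selectionist model predicts one

rho_counts = spearman_rho(substitutions, deletion_counts)
rho_lengths = spearman_rho(substitutions, mean_deletion_len)
```

In this toy data set the number of deletions rises with age while deletion length does not, which is the pattern the neutralist model predicts.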
As predicted by both models, numbers of deletions
and point substitutions do correlate in the Helena data
sets in both the D. virilis group (Petrov, Lozovskaya, and Hartl 1996;
Petrov and Hartl 1998) and the D. melanogaster subgroup
(fig. 4; P = 0.008, Friedman’s method for randomized
blocks). Neither data set, however, shows signs
of a positive correlation between the lengths of deletions
and the number of point substitutions (fig. 6). The absence
of a detectable bias toward longer deletions in
older sequences argues that the observed pattern and
rate of indels in Drosophila are primarily the product of
spontaneous mutation biased toward frequent long deletions
and rare short insertions. Selection for smaller
genome size may indeed be operating in Drosophila, but
it is apparently not very efficacious when applied to indels
of one to a few hundred base pairs.

FIG. 6.—a, Lack of correlation between the number of point substitutions and the sizes of deletions in individual elements in the D. virilis
data set (Petrov, Lozovskaya, and Hartl 1996) (Friedman’s method for randomized blocks, P = 0.3). b, Lack of correlation between the number
of point substitutions and the sizes of deletions in the D. melanogaster data set (Friedman’s method for randomized blocks, P = 0.9).
High Rate of DNA Loss in Both the D. melanogaster
and D. virilis Species Groups
We have previously suggested that Drosophila ex-
hibit a high rate of DNA loss through the biased accu-
mulation of large deletions (Petrov, Lozovskaya, and
Hartl 1996; Petrov and Hartl 1998). We have based this
suggestion on the demonstration that a 363-bp region of
the reverse transcriptase gene in a non-LTR element
Helena in the D. virilis group preferentially accumulates
large deletions when the gene is relieved of functional
constraints. The validity of this claim depends on the
assumptions that (1) the analyzed 363-bp region of Helena
is an unbiased representative of most of the DNA
sequences in Drosophila, and (2) the mutational pattern
of indels in the D. virilis group is not significantly different
from that in Drosophila in general.

Table 1
Indel Evolution in Mammals and Drosophila

Measure                                                          Drosophila                Mammals [a]                   Significance of Difference
Ratio of insertions to point substitutions                       0.015 (0.012–0.026) [b]   0.010 (0.006–0.013)           NS [c]
Ratio of deletions to insertions                                 8.7 (6.2–17.8)            4.7 (3.1–7.3)                 NS
Ratio of deletions to point substitutions                        0.13 (0.12–0.14)          0.049 (0.041–0.058)           P ≪ 0.05
Mean deletion size (bp)                                          24.9 ± 37.0               3.2 ± 4.6                     P ≪ 0.05
Mean insertion size (bp)                                         2.9 ± 2.3                 2.4 ± 2.1 (8.5 ± 24.1) [d]    NS
Half-life of a pseudogene (point substitutions per nucleotide)   0.21                      4.42                          P ≪ 0.05
Half-life of a pseudogene (Myr)                                  14.3                      884                           P ≪ 0.05

[a] Data are from Graur, Shuali, and Li (1989).
[b] 95% confidence interval of the estimate.
[c] Not statistically significant.
[d] Taking into account a single 125-bp insertion in the rat α-tubulin.
To test these assumptions, we analyzed the pattern
of indels using a different part of Helena in the D. mel-
anogaster subgroup, which is distantly related to D. vi-
rilis. The main conclusion is that, in all respects, the
patterns of spontaneous formation of indels in these two
groups are indistinguishable. We did not detect any sig-
nificant differences in the relative frequencies of dele-
tions and insertions, the relative rates of deletions and
point substitutions, or the size distributions of indels. In
addition, indels in both groups are likely to be formed
by similar mechanisms, as indicated by the presence of
short direct repeats flanking many of the deletions in
both data sets. The fact that we have observed such
similar patterns of indel formation in two unrelated se-
quences boosts our confidence that this pattern is general
for a large proportion of sequences in the Drosophila
genome.
Drosophila melanogaster and D. virilis belong to
different subgenera of Drosophila, Sophophora and
Drosophila, respectively. They last shared a common
ancestor approximately 40 MYA and represent one of
the deepest splits in the drosophilid phylogeny (Russo,
Takezaki, and Nei 1995). Hence the similarity of the
patterns of indel formation argues strongly that the high
rate of DNA loss is prevalent and probably ancestral for
all drosophilids.
Because the D. melanogaster and D. virilis data
sets are so similar, we can combine them to more ac-
curately compare indel evolution in Drosophila and
mammals (Graur, Shuali, and Li 1989). Table 1 sum-
marizes these comparisons. For insertions, there is no
profound difference between Drosophila and mammals.
Both the relative rates of insertions compared to point
substitutions, and the average sizes of insertions are very
similar. Deletions, on the other hand, are both more
prevalent and much larger in Drosophila than in mam-
mals. Because insertions are so infrequent and short
compared to deletions, we ignored them in our estima-
tion of the rate of DNA loss.
The relative rate of deletions per point substitution
is 2.6 times higher in Drosophila than in mammals.
Even more pronounced is the difference in the average
sizes of deletions, which are almost eight times larger
in Drosophila. The higher rate of formation and the larg-
er average size of deletions combine to eliminate DNA
approximately 20-fold faster in Drosophila than in
mammals. Taking into account that the rate of point sub-
stitutions is about threefold higher in Drosophila than
in mammals (Sharp and Li 1989), we estimate that Dro-
sophila loses nonessential DNA at a rate that is approx-
imately 60 times higher than that in mammals. Thus, a
pseudogene fixed in a mammalian lineage is expected
on average to lose half of its DNA in 884 Myr—an
extremely long period, even on an evolutionary time-
scale—and it will become unrecognizable owing to
point substitutions long before then. In contrast, a Dro-
sophila pseudogene is expected to lose half of its DNA
in only 14.3 Myr. To put this in the context of Dro-
sophila evolution, the evolutionary distance between D.
melanogaster and D. yakuba is approximately 12 Myr,
so homologous pseudogenes in D. melanogaster and D.
yakuba will share only 56% of their DNA and would
be unlikely to either cross hybridize or even be alignable
should they be sequenced. If the rates of pseudogene
formation in mammals and Drosophila are similar, the
higher rate of DNA elimination will significantly reduce
the probability of observing a pseudogene in Drosophila
at any given time.
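The arithmetic behind these figures can be retraced with a short calculation. The sketch below assumes that DNA loss follows simple exponential decay, with the expected number of base pairs lost per site per point substitution equal to the deletion-to-substitution ratio times the mean deletion size. That model is an assumption made here for illustration, although it reproduces the half-lives given in table 1. The function names are hypothetical.

```python
import math


def half_life_in_substitutions(deletions_per_substitution: float,
                               mean_deletion_bp: float) -> float:
    """Point substitutions per nucleotide needed to lose half of the DNA,
    assuming exponential decay at rate (ratio x mean deletion size)."""
    loss_rate = deletions_per_substitution * mean_deletion_bp
    return math.log(2) / loss_rate


def fraction_retained(elapsed_myr: float, half_life_myr: float) -> float:
    """Fraction of the original DNA still present after elapsed_myr."""
    return 0.5 ** (elapsed_myr / half_life_myr)


# Values taken from table 1.
drosophila = half_life_in_substitutions(0.13, 24.9)   # about 0.21
mammals = half_life_in_substitutions(0.049, 3.2)      # about 4.42
fold_per_substitution = mammals / drosophila          # about 20-fold

# D. melanogaster vs. D. yakuba: 12 Myr against a 14.3-Myr half-life.
shared = fraction_retained(12.0, 14.3)                # about 0.56
```

Multiplying the roughly 20-fold difference per substitution by the threefold higher substitution rate in Drosophila gives the approximately 60-fold difference in the rate of DNA loss per unit time.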
Variation in the rate of DNA loss among different
lineages may also contribute to the differences in ge-
nome size by affecting the amount of superfluous DNA
in the form of pseudogenes, long introns, intergenic
regions, and so forth. If this is true, then lineages with
high rates of DNA loss should have small, ‘tidy’ ge-
nomes with few pseudogenes and short introns, whereas
lineages with low rates of DNA loss should have large,
‘messy’ genomes with large proportions of ‘junk’
DNA of all kinds. The absence of research that com-
bines measurements of the amount of ‘junk’ DNA, ge-
nome size, and estimates of the rate of spontaneous
DNA loss due to biased mutation in different lineages
precludes immediate evaluation of this hypothesis. The
simplicity of estimating relative rates and size distribu-
tions of deletions and insertions using non-LTR retro-
transposable elements (Petrov, Lozovskaya, and Hartl
1996; Petrov and Hartl 1998), combined with their extremely
wide phylogenetic distribution (Kimmel, Ole-Moiyoi,
and Young 1987; Schwarz-Sommer et al. 1987;
Finnegan 1989a, 1989b; Hutchison et al. 1989; Cam-
bareri, Helber, and Kinsey 1994), should prove useful in
resolving these kinds of issues.
Acknowledgments
We thank M. Siegal, D. Weinrech, R. Lewontin, P.
Goss, and members of our laboratory for helpful dis-
cussions. Comments by C. Aquadro and two anonymous
reviewers substantially improved the manuscript. This
work was supported by NIH grants GM33741 and
HG01250.
LITERATURE CITED
Begun, D. 1997. Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145:375–382.
Cambareri, E. B., J. Helber, and J. A. Kinsey. 1994. Tad1-1, an active LINE-like element of Neurospora crassa. Mol. Gen. Genet. 242:658–665.
Charlesworth, B. 1996. The changing sizes of genes. Nature 384:315–316.
Finnegan, D. J. 1989a. F and related elements in Drosophila melanogaster. Pp. 519–522 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
Finnegan, D. J. 1989b. The I factor and I-R hybrid dysgenesis in Drosophila melanogaster. Pp. 503–518 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
Graur, D., Y. Shuali, and W.-H. Li. 1989. Deletions in processed pseudogenes accumulate faster in rodents than in humans. J. Mol. Evol. 28:279–285.
Hutchison, C. A. III, S. C. Hardies, D. D. Loeb, W. R. Shehee, and M. H. Edgell. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eukaryotic genome. Pp. 593–618 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
Jeffs, P., and M. Ashburner. 1991. Processed pseudogenes in Drosophila. Proc. R. Soc. Lond. B 244:151–159.
Kimmel, B. E., O. K. Ole-Moiyoi, and J. R. Young. 1987. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol. Cell. Biol. 7:1465–1475.
Lemeunier, F., and M. Ashburner. 1976. Relationships within the melanogaster species subgroup of the genus Drosophila (Sophophora). II. Phylogenetic relationships between six species based upon polytene chromosome banding sequences. Proc. R. Soc. Lond. B 193:275–294.
Long, M. Y., and C. H. Langley. 1993. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260:91–95.
Maddison, W. P., and D. R. Maddison. 1992. MacClade. Version 3. Sinauer, Sunderland, Mass.
Petrov, D. A., and D. L. Hartl. 1998. Trash DNA is what gets thrown away: high rate of DNA loss in Drosophila. Gene (in press).
Petrov, D. A., E. R. Lozovskaya, and D. L. Hartl. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.
Petrov, D. A., J. L. Schutzman, D. L. Hartl, and E. R. Lozovskaya. 1995. Diverse transposable elements are mobilized in hybrid dysgenesis in Drosophila virilis. Proc. Natl. Acad. Sci. USA 92:8050–8054.
Russo, C. A. M., N. Takezaki, and M. Nei. 1995. Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391–404.
Schwarz-Sommer, Z., L. Leclercq, E. Gobel, and H. Saedler. 1987. Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J. 6:3873–3880.
Sharp, P. M., and W.-H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398–402.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry: the principles and practice of statistics in biological research. W. H. Freeman, New York.
Sullivan, D. T., W. T. Starmer, S. W. Curtiss, M. Menotti-Raymond, and J. Yum. 1994. Unusual molecular evolution of an Adh pseudogene in Drosophila. Mol. Biol. Evol. 11:443–458.
Swofford, D. L. 1991. PAUP: phylogenetic analysis using parsimony. Version 3.0s. Illinois Natural History Survey, Champaign.
Thomas, C. A. 1971. The genetic organization of chromosomes. Annu. Rev. Genet. 5:237–256.
Weiner, A. M., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661.
Charles F. Aquadro, reviewing editor
Accepted November 10, 1997
... Duplication, reshuffling, transposition, retrotransposition, chimeric phenomena account for most new genes (Andersson et al. 2015;Schlotterer 2015;VanKuren and Long 2018;Zhao et al. 2021), but small noncoding loci like miRNAs may represent the most common source of de novo genes (Lu et al. 2008b;Lyu et al. 2014;Zhao et al. 2021). Most miRNAs arising de novo are probably functionless (Lu et al. 2008b;Berezikov et al. 2010) or even dead-on-arrival (Petrov et al. 1996;Petrov and Hartl 1998), but many may become adaptive miRNAs (Lu et al. 2008a;Mohammed et al. 2014, Lyu et al. 2014Zhao et al. 2021). ...
Article
Several functional classes of short noncoding RNAs are involved in manifold regulatory processes in eukaryotes, including, among the best characterized, miRNAs. One of the most intriguing regulatory networks in the eukaryotic cell is the mito-nuclear crosstalk: recently, miRNA-like elements of mitochondrial origin, called smithRNAs, were detected in a bivalve species, Ruditapes philippinarum. These RNA molecules originate in the organelle but were shown in vivo to regulate nuclear genes. Since miRNA genes evolve easily de novo with respect to protein-coding genes, in the present work we estimate the probability with which a newly arisen smithRNA finds a suitable target in the nuclear transcriptome. Simulations with transcriptomes of 12 bivalve species suggest that this probability is high and not species specific: one in a hundred million (1 × 10-8) if five mismatches between the smithRNA and the 3' mRNA are allowed, yet many more are allowed in animals. We propose that novel smithRNAs may easily evolve as exaptation of the pre-existing mitochondrial RNAs. In turn, the ability of evolving novel smithRNAs may have played a pivotal role in mito-nuclear interactions during animal evolution, including the intriguing possibility of acting as speciation trigger.
... papatasi genome also could be due to recently active TEs. Alternatively, genomic differences in TE content might be the result of intrinsic genomic deletion patterns in Lu. longipalpis, due to the effective recognition and elimination machinery removing these foreign sequences from the genome, as has been shown to occur in Drosophila species [99]. ...
Article
Full-text available
Phlebotomine sand flies are of global significance as important vectors of human disease, transmitting bacterial, viral, and protozoan pathogens, including the kinetoplastid parasites of the genus Leishmania, the causative agents of devastating diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries with hundreds of millions of people at risk around the world. No approved efficacious vaccine exists for leishmaniasis and available therapeutic drugs are either toxic and/or expensive, or the parasites are becoming resistant to the more recently developed drugs. Therefore, sand fly and/or reservoir control are currently the most effective strategies to break transmission. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structures we sequenced the genomes of two geographically widespread and important sand fly vector species: Phlebotomus papatasi, a vector of Leishmania parasites that cause cutaneous leishmaniasis, (distributed in Europe, the Middle East and North Africa) and Lutzomyia longipalpis, a vector of Leishmania parasites that cause visceral leishmaniasis (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will be a foundation on which to base future efforts to prevent vector-borne transmission of Leishmania parasites.
... However, the multispecies analyses conducted in this study found that codon usage was discrepant in species with either the same or different GC content. In addition to GC content, several other factors may also affect codon usage, such as species self-selection [44], mutation bias, insertion bias [45], strand-specific nucleotide bias [46], CpG bias [47], GC/AT bias [34,48] and so on. Based on the usage profiles of over 7270 species, we identified two correlated clusters for codons and amino acids separately ( Figure 4). ...
Article
Full-text available
The mechanisms shaping the amino acids recruitment pattern into the proteins in the early life history presently remains a huge mystery. In this study, we conducted genome-wide analyses of amino acids usage and genetic codons structure in 7270 species across three domains of life. The carried-out analyses evidenced ubiquitous usage bias of amino acids that were likely independent from codon usage bias. Taking advantage of codon usage bias, we performed pseudotime analysis to re-determine the chronological order of the species emergence, which inspired a new species relationship by tracing the imprint of codon usage evolution. Furthermore, the multidimensional data integration showed that the amino acids A, D, E, G, L, P, R, S, T and V might be the first recruited into the last universal common ancestry (LUCA) proteins. The data analysis also indicated that the remaining amino acids most probably were gradually incorporated into proteogenesis process in the course of two long-timescale parallel evolutionary routes: I→F→Y→C→M→W and K→N→Q→H. This study provides new insight into the origin of life, particularly in terms of the basic protein composition of early life. Our work provides crucial information that will help in a further understanding of protein structure and function in relation to their evolutionary history.
... These rearrangements are unlikely to be an artifact of the plasmid assay, since they were detected neither in plasmids isolated from embryos co-injected with Rad18 WT , nor from UV-irradiated embryos. In this respect, genomic deletions have been observed in the S subgenome of adult X. laevis (56) as well as in D. melanogaster (57), suggesting that such genomic rearrangements might be generated naturally during evolution in these organisms. Although Y-family TLS Pols can generate deletions, their extent is rather small (1-3 bp), implying other mechanisms such as replication fork instability, which is a common feature of DNA damage checkpoint inefficiency (5,58,59). ...
Article
Full-text available
In early embryogenesis of fast cleaving embryos, DNA synthesis is short and surveillance mechanisms preserving genome integrity are inefficient, implying the possible generation of mutations. We have analyzed mutagenesis in Xenopus laevis and Drosophila melanogaster early embryos. We report the occurrence of a high mutation rate in Xenopus and show that it is dependent upon the translesion DNA synthesis (TLS) master regulator Rad18. Unexpectedly , we observed a homology-directed repair contribution of Rad18 in reducing the mutation load. Genetic invalidation of TLS in the pre-blastoderm Drosophila embryo resulted in reduction of both the hatching rate and single-nucleotide variations on pericentromeric heterochromatin in adult flies. Altogether , these findings indicate that during very early Xenopus and Drosophila embryos TLS strongly contributes to the high mutation rate. This may constitute a previously unforeseen source of genetic diversity contributing to the polymorphisms of each individual with implications for genome evolution and species adaptation.
... [40]). While S. oryzae's TE density and distribution evokes the architecture of mammalian genomes, this relatively younger TE landscape suggests higher deletion rates, and possibly a higher TE turnover rate, as observed in Drosophila [255,256]. LINEs and DNA transposons have the wider spectrum of divergence levels, suggesting an aggregation of distinct dynamics for the TE families present in S. oryzae. By contrast, the rare LTR copies identified appear to be the most homogeneous within families, with only a few substitutions between copies and their consensuses, suggesting a very recent amplification in this subclass. ...
Article
Full-text available
Background The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. Results We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate, when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30 kyear), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. Conclusions Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.
... D. melanogaster has a particularly small genome, even among insects (Hanrahan and Johnston 2011). Deletions predominate over insertions in sequences not under selective pressure (Petrov and Hartl 1998;Petrov 2002a). There are very few transposon insertions except in the heterochromatic regions near centromeres and telomeres, and on the fourth and the Y Chromosomes (https://www.flybase.org). ...
Article
Full-text available
Eukaryotic genomes typically show a uniform G + C content among chromosomes, but on smaller scales, many species have a G + C density that fluctuates with a characteristic wavelength. This oscillation is evident in many insect species, with wavelengths ranging between 700 bp and 4 kb. Measures of evolutionary conservation oscillate in phase with G + C content, with conserved regions having higher G + C. Loci with large regulatory regions show more regular oscillations; coding sequences and heterochromatic regions show little or no oscillation. There is little oscillation in vertebrate genomes in regions with densely distributed mobile repetitive elements. However, species with few repeats show oscillation in both G + C density and sequence conservation. These oscillations may reflect optimal spacing of cis -regulatory elements.
Article
Full-text available
A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this void, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei , a noneusocial hymenopteran and exemplar of a pine‐specialized lineage evolved from angiosperm‐feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions before characterizing their genomic distributions and molecular evolution. We find evidence of recent expansions of bitter gustatory receptor, clan 3 cytochrome P450, olfactory receptor, and antimicrobial peptide subfamilies, with strong evidence of positive selection among paralogs in a clade of gustatory receptors possibly involved in the detection of bitter compounds. In contrast, these gene families had little evidence of recent contraction via pseudogenization. Overall, our results are consistent with the hypothesis that in response to novel selection pressures, gene families that mediate ecological interactions may expand and contract predictably. Testing this hypothesis will require the comparative analysis of high‐quality annotation data from phylogenetically and ecologically diverse insect species and functionally diverse gene families. To this end, increasing sampling in under‐sampled hymenopteran lineages and environmentally responsive gene families and standardizing manual annotation methods should be prioritized.
Preprint
Full-text available
Phlebotomine sand flies are of global significance as important vectors of human disease, transmitting bacterial, viral, and protozoan pathogens, including the devastating kinetoplastid parasites of the genus Leishmania , the causative agents of diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries with hundreds of millions of people at risk around the world. As no approved efficacious vaccine exists, available drugs are expensive and/or toxic, and resistance is emerging, management of sand fly populations to break transmission is currently the most effective disease control strategy. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structures we sequenced the genomes of two of the most important sand fly species: Phlebotomus papatasi , a cutaneous leishmaniasis vector, (distributed in the Middle East and North Africa) and Lutzomyia longipalpis, a visceral leishmaniasis vector (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will be a foundation on which to base future efforts to prevent vector-borne transmission of Leishmania parasites. Author Summary The leishmaniases are a group of neglected tropical diseases caused by protist parasites from the Genus Leishmania . 
Different Leishmania species present a wide clinical profile, ranging from mild, often self-resolving cutaneous lesions that can lead to protective immunity, to severe metastatic mucosal disease, to visceral disease that is ultimately fatal. Leishmania parasites are transmitted by the bites of sand flies, and as no approved vaccine exists, available drugs are toxic and/or expensive and resistance is emerging, new dual control strategies to combat these diseases must be developed, combining interventions on human infections and integrated sand fly population management. Effective vector control requires a good understanding of the biology of sand flies. To this end, we sequenced and annotated the genomes of two sand fly species that are important leishmaniasis vectors from the Old and New Worlds. These genomes allow us to better understand, at the genetic level, processes important in the vector biology of these species, such as finding hosts, blood-feeding, immunity, and detoxification. These genomic resources highlight the driving forces of evolution of two major Leishmania vectors and provide foundations for future research on how to better prevent leishmaniasis by control of the sand fly vectors.
Article
Salt taste is one of the most ancient of all sensory modalities. However, the molecular basis of salt taste remains unclear in invertebrates. Here, we show that the response to low, appetitive salt concentrations in Drosophila depends on Ir56b, an atypical member of the ionotropic receptor (Ir) family. Ir56b acts in concert with two coreceptors, Ir25a and Ir76b. Mutation of Ir56b virtually eliminates an appetitive behavioral response to salt. Ir56b is expressed in neurons that also sense sugars via members of the Gr (gustatory receptor) family. Misexpression of Ir56b in bitter-sensing neurons confers physiological responses to appetitive doses of salt. Ir56b is unique among tuning Irs in containing virtually no N-terminal region, a feature that is evolutionarily conserved. Moreover, Ir56b is a “pseudo-pseudogene”: its coding sequence contains a premature stop codon that can be replaced with a sense codon without loss of function. This stop codon is conserved among many Drosophila species but is absent in a number of species associated with cactus in arid regions. Thus, Ir56b serves the evolutionarily ancient function of salt detection in neurons that underlie both salt and sweet taste modalities.
Article
Fifteen lines each of Drosophila melanogaster, D. simulans, and D. sechellia were scored for 19 microsatellite loci. One to four alleles of each locus in each species were sequenced, and microsatellite variability was compared with sequence structure. Only 7 loci had their size variation among species consistent with the occurrence of strictly stepwise mutations in the repeat array, the others showing extensive variability in the flanking region compared to that within the microsatellite itself. Polymorphisms apparently resulting from complex nonstepwise mutations involving the microsatellite were also observed, both within and between species. Maximum number of perfect repeats and variance of repeat count were found to be strongly correlated in microsatellites showing an apparently stepwise mutation pattern. These data indicate that many microsatellite mutation events are more complex than represented even by generalized stepwise mutation models. Care should therefore be taken in inferring population or phylogenetic relationships from microsatellite size data alone. The analysis also indicates, however, that evaluation of sequence structure may allow selection of microsatellites that more closely match the assumptions of stepwise models.
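One quantity mentioned in this abstract, the maximum number of perfect repeats, can be illustrated with a short sketch. This is not the authors' procedure: the function name `max_perfect_repeats`, the example sequence, and the choice of a `CA` motif are all hypothetical, and the sketch only counts uninterrupted tandem copies of a given motif.

```python
import re


def max_perfect_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of uninterrupted tandem copies of `motif`.

    Hypothetical helper for illustration: a "perfect" repeat array here
    means back-to-back exact copies of the motif with no substitutions
    or indels inside the run.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # Match one or more consecutive exact copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group()) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Invented example: two CA arrays separated by a single T.
example_locus = "TTCACACACATCACAGG"
longest = max_perfect_repeats(example_locus, "CA")
```

In this invented locus the first array has four perfect copies and the second has two, so the interruption matters: a size-only comparison would treat the locus as a six-repeat array.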
Article
A dispersed repetitive element named ingi, which is present in the genome of the protozoan parasite Trypanosoma brucei, is described. One complete 5.2-kilobase element and the ends of two others were sequenced. There were no direct or inverted terminal repeats. Rather, the ends consisted of two halves of a previously described 512-base-pair transposable element (G. Hasan, M.J. Turner, and J.S. Cordingley, Cell 37:333-341, 1984). Oligo(dA) tails and possible insertion site duplications suggested that ingi is a retroposon. The sequenced element appears to be a pseudogene copy of an original retroposon with one or more open reading frames occupying most of its length. Significant homologies of the encoded amino acid sequences with reverse transcriptases and mammalian long interspersed nuclear element sequences suggest a remote evolutionary origin for this kind of retroposon.
Article
The melanogaster species subgroup of Drosophila comprises six sibling species. The interrelationship between these species has been studied by analysis of the banding patterns of their polytene chromosomes. The species fall into two groups: (1) melanogaster, simulans and mauritiana and (2) erecta, teissieri and yakuba. The former group are chromosomally closely related; indeed, simulans and mauritiana are homosequential. The latter group (all African endemic species) are less closely related, although they all share eight autosomal inversions of the standard (i.e. melanogaster) sequence. From this shared sequence the chromosomes of the three African endemic species have diverged considerably by many paracentric inversions. Both D. teissieri and D. yakuba are polymorphic; we describe nine and four inversion sequences in them respectively. D. erecta is monomorphic, although our sample size is very small (only two populations). We discuss both the origin of interspecific inversions, especially the problem of inversion breakpoint coincidence, and the light this study throws upon evolutionary relationships within this group of species.
Article
Two species of Drosophila, D. yakuba and D. teissieri, possess pseudogenes of Adh. These pseudogenes lack introns and map to chromosome arm 3R, rather than to chromosome arm 2L, wherein are located the functional Adh genes. Their structure suggests that the pseudogenes arose from reverse transcripts. Because the pseudogenes map to homologous sites in both species, they presumably arose before these species diverged. Remarkably, the pattern of base substitution in the pseudogenes differs between sites that correspond to degenerate and non-degenerate codon positions in their functional paralogs.
Article
The relative rates of point nucleotide substitution and accumulation of gap events (deletions and insertions) were calculated for 22 human and 30 rodent processed pseudogenes. Deletion events not only outnumbered insertions (the ratio being 7:1 and 3:1 for human and rodent pseudogenes, respectively), but also the total length of deletions was greater than that of insertions. Compared with their functional homologs, human processed pseudogenes were found to be shorter by about 1.2%, and rodent pseudogenes by about 2.3%. DNA loss from processed pseudogenes through deletion is estimated to be at least seven times faster in rodents than in humans. In comparison with the rate of point substitutions, the abridgment of pseudogenes during evolutionary times is a slow process that probably does not retard the rate of growth of the genome due to the proliferation of processed pseudogenes.
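The gap-event bookkeeping described in this abstract (number of deletions versus insertions, and total length of each) can be sketched as follows. This is a minimal illustration, not the method used in the cited study: it assumes a pairwise alignment is already available as two equal-length gapped strings, and the function name `count_gap_events` and the toy sequences are hypothetical.

```python
from itertools import groupby


def count_gap_events(functional: str, pseudogene: str, gap: str = "-") -> dict:
    """Count indel events in a gapped pairwise alignment.

    Assumption for this sketch: a contiguous run of gap characters in
    the pseudogene row is one deletion event, and a contiguous run of
    gap characters in the functional-gene row is one insertion event.
    """
    if len(functional) != len(pseudogene):
        raise ValueError("aligned sequences must have equal length")

    # Label each alignment column: D = deletion, I = insertion, M = other.
    labels = []
    for f, p in zip(functional, pseudogene):
        if p == gap and f != gap:
            labels.append("D")
        elif f == gap and p != gap:
            labels.append("I")
        else:
            labels.append("M")

    events = {"D": 0, "I": 0}
    lengths = {"D": 0, "I": 0}
    # Each maximal run of identical labels is a single event.
    for label, run in groupby(labels):
        if label in events:
            events[label] += 1
            lengths[label] += len(list(run))

    return {
        "deletions": events["D"],
        "deleted_bp": lengths["D"],
        "insertions": events["I"],
        "inserted_bp": lengths["I"],
    }


# Toy alignment (invented): one 3-bp deletion and one 1-bp insertion.
toy_functional = "ATGCCGTTAGC-TA"
toy_pseudogene = "ATG---TTAGCATA"
toy_result = count_gap_events(toy_functional, toy_pseudogene)
```

Summing such counts over many pseudogenes gives a deletion-to-insertion ratio and a net length change, which is the kind of summary the abstract reports (for example, 7:1 in human processed pseudogenes).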
Article
Analysis of the rate of nucleotide substitution at silent sites in Drosophila genes reveals three main points. First, the silent rate varies (by a factor of two) among nuclear genes; it is inversely related to the degree of codon usage bias, and so selection among synonymous codons appears to constrain the rate of silent substitution in some genes. Second, mitochondrial genes may have evolved only as fast as nuclear genes with weak codon usage bias (and two times faster than nuclear genes with high codon usage bias); this is quite different from the situation in mammals where mitochondrial genes evolve approximately 5-10 times faster than nuclear genes. Third, the absolute rate of substitution at silent sites in nuclear genes in Drosophila is about three times higher than the average silent rate in mammals.