Microsatellite is an important component of complete Hepatitis C virus genomes
Ming Chen a,b, Zhongyang Tan c,*, Guangming Zeng a,b,*
a College of Environmental Science and Engineering, Hunan University, Changsha 410082, China
b Key Laboratory of Environmental Biology and Pollution Control, Hunan University, Ministry of Education, Changsha 410082, China
c College of Biology, State Key Laboratory for Chemo/Biosensing and Chemometrics, Hunan University, Changsha 410082, China
Article info
Article history:
Received 11 January 2011
Received in revised form 2 June 2011
Accepted 16 June 2011
Available online 23 June 2011
Keywords:
Microsatellite
Hepatitis C virus
Simple sequence repeat
Comparative genomics
Abstract
Microsatellites are common and play diverse roles in eukaryotic and prokaryotic genomes. However, to our knowledge, the distribution of microsatellites in viruses remains poorly understood, even though it is crucial for understanding the instability of viral genomes. We therefore examined the distribution of microsatellites in 54 complete genomes of Hepatitis C virus (HCV) from six genotypes and show that microsatellites are an important component of HCV genomes. In all analyzed HCV genomes, genome size and GC content had only a weak influence on the number, relative abundance and relative density of microsatellites. In each HCV genome, mono-, di- and trinucleotide repeats predominated, whereas other repeat types rarely occurred. The occurrence of microsatellites was significantly lower than in higher prokaryotes and eukaryotes, and all identified microsatellites were very short. The microsatellites identified in HCV genomes may prove useful for population genetics, evolutionary analysis and strain (isolate) identification.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Microsatellites, or simple sequence repeats (SSRs), consist of mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats (Chen et al., 2011a, 2010) and are highly polymorphic in eukaryotic (Ellegren, 2004; Li et al., 2004; Rajendrakumar et al., 2007) and prokaryotic genomes (Gur-Arie et al., 2000). The abundance of these six types of microsatellites varies between species (Karaoglu et al., 2005). Microsatellites are found in diverse genomic regions, including 3′-UTRs, 5′-UTRs, exons and introns (Li et al., 2004; Rajendrakumar et al., 2007; Toth et al., 2000). In eukaryotic coding regions, triplet repeats are more common than non-triplet repeats, because length changes in non-triplet repeats can cause frameshift mutations (Ellegren, 2004; Li et al., 2004). The most common microsatellite motifs may also differ between species: for example, AT/TA repeats are the most common motifs in Aspergillus nidulans, whereas AG/GA repeats are the most abundant in Fusarium graminearum (Karaoglu et al., 2005). Genome size and GC content have been shown to influence the occurrence of microsatellites to some extent in several species (Coenye and Vandamme, 2005; Dieringer and Schlötterer, 2003). Strand slippage and unequal recombination have been proposed to explain microsatellite instability (Toth et al., 2000). Intrinsic features of microsatellites (repeat number, length and motif size) have the strongest influence on microsatellite mutability, whereas regional genomic factors have only minor effects (Kelkar et al., 2008). Mutability increases with the number of repeats, most likely because the probability of slippage increases (Pearson et al., 2005). Imperfections within microsatellites are thought to influence replication slippage by limiting the expansion of microsatellites (Mudunuri and Nagarajaram, 2007). Because of their high instability, microsatellites are believed to play a functional role in genome evolution (Tautz et al., 1986). Microsatellites have been associated with various genetic diseases (Usdin, 2008), including Huntington’s disease and spinobulbar muscular atrophy (Li et al., 2004). Some microsatellites are related to bacterial pathogenesis and virulence and can increase antigenic variation, allowing escape from the host immune response (Li et al., 2004; Mrazek et al., 2007).
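The frameshift argument above reduces to simple arithmetic: gaining or losing whole repeat units changes the sequence length by (motif length × number of units), and the downstream reading frame is preserved only when that change is a multiple of 3. The short Python sketch below illustrates this; the function name is hypothetical and not part of the original study.

```python
def indel_preserves_frame(motif: str, units_changed: int) -> bool:
    """True if gaining (positive) or losing (negative) whole copies of `motif`
    shifts the sequence length by a multiple of 3, i.e. keeps the reading frame."""
    return (len(motif) * units_changed) % 3 == 0


# One-unit expansion of mono- to hexanucleotide motifs:
for motif in ["A", "AG", "ATG", "TTCT", "GCTCT", "ATGCAT"]:
    print(f"{motif:>6}: frame preserved = {indel_preserves_frame(motif, 1)}")
```

Only the tri- and hexanucleotide motifs keep the frame after a one-unit change; a non-triplet motif keeps it only when the number of units gained or lost is itself a multiple of 3.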
However, despite their widespread distribution and functional significance, little is known about the distribution of microsatellites in viral genomes. Numerous polymorphic microsatellites have been detected in the genomes of human cytomegalovirus (HCMV), herpes simplex virus type 1 (HSV-1) and Ostreid herpesvirus 1 (OsHV-1) (Davis et al., 1999; Deback et al., 2009; Segarra et al., 2010). Microsatellites have also been observed in the genomes of epidemic human respiratory adenovirus, influenza virus and Sin Nombre virus (Houng et al., 2009; Mudunuri et al., 2009). Our recent report comprehensively analyzed microsatellite distribution in viral pre-microRNAs and found that microsatellites were extensively present in these small non-coding RNA sequences (Chen et al., 2010). In a previous study, we proposed Human Immunodeficiency Virus Type 1 (HIV-1) as an excellent system for studying the evolution and roles of viral microsatellites; that analysis indicated that HIV-1 microsatellites were very short and of low abundance (Chen et al., 2009). However, it remains to be confirmed whether these features of HIV-1 genomes also hold for other viruses. Moreover, little is known about mononucleotide repeats, or about the correlation between genome features and microsatellite distribution, in viral genomes. HCV has a positive-sense RNA genome containing a single open reading frame and is mainly classified into six genotypes (genotypes 1, 2, 3, 4, 5 and 6) (Kuiken et al., 2005). The genomic diversity of HCV provides a good opportunity to address these questions.

1567-1348/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.meegid.2011.06.012
Abbreviations: HCV, Hepatitis C virus; SSRs, simple sequence repeats; HIV-1, Human Immunodeficiency Virus Type 1.
* Corresponding authors.
E-mail addresses: zhongyang@hnu.cn (Z. Tan), zgming@hnu.cn (G. Zeng).
Infection, Genetics and Evolution 11 (2011) 1646–1654
journal homepage: www.elsevier.com/locate/meegid
In the present study, we comprehensively analyze the distribution of microsatellites of 6 nt or longer in 54 complete HCV genomes belonging to six genotypes. For the first time, we analyze the distribution of mononucleotide repeats and use linear regression to explore the correlation between genome features and microsatellite distribution in viral genomes. We also compare our results with those for other organisms and discuss their similarities and differences.
2. Materials and methods
2.1. HCV genome sequences
We downloaded 54 complete HCV genomes from GenBank (http://www.ncbi.nlm.nih.gov). The analyzed sequences fall into six genotypes. The number of available complete genomes differs between genotypes: far more complete genomes are available for genotypes 1, 2, 3 and 6 than for genotypes 4 and 5, for each of which only one complete genome was available at the time of writing. The number of genomes selected per genotype therefore differs in this study. Detailed information on these genomes is given in Table 1.
2.2. Identification of microsatellites
Imperfect mononucleotide repeats of 6 nt or longer were extracted from each surveyed HCV genome with the IMEx program (Mudunuri and Nagarajaram, 2007). Perfect di-, tri-, tetra-, penta- and hexanucleotide repeats with at least three repeat units were detected with SSRIT (Temnykh et al., 2001). These parameters were chosen because (i) Rajendrakumar et al. used the same threshold values to analyze microsatellite distribution in the organellar genomes of rice (Rajendrakumar et al., 2007), (ii) we previously surveyed microsatellites with repeat number ≥3 in 81 complete HIV-1 genomes (Chen et al., 2009), and (iii) most viral microsatellites are very short. Each sequence was analyzed separately. To distinguish coding regions, 3′-UTRs and 5′-UTRs, the existing annotation (the ‘‘CDS’’ features) was extracted from the corresponding GenBank files. The CDS start position in U89019 (S16) differs markedly from those of the other 53 sequences, possibly because of an incorrectly assigned start codon; the locations of microsatellites in the S16 genome are therefore not given. Note that in several HCV genomes a small number of microsatellites lie in the overlap between coding and non-coding regions; these microsatellites are reported as non-coding in the present study.
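The detection thresholds above can be illustrated with a simple scanner for perfect tandem repeats. This is a minimal sketch under stated assumptions, not a reimplementation of IMEx or SSRIT: it reports only perfect runs (IMEx additionally tolerates imperfections in mononucleotide repeats), uses 0-based coordinates, and assigns a motif that is itself a repeat (for example ATAT or GGG) to its shorter unit. The names `MIN_COPIES`, `Repeat` and `find_perfect_repeats` are hypothetical.

```python
from typing import List, NamedTuple

# Minimum number of consecutive copies per motif length, following Section 2.2:
# mononucleotide runs of >= 6 nt; di- to hexanucleotide motifs repeated >= 3 times.
MIN_COPIES = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


class Repeat(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str
    copies: int

    @property
    def length(self) -> int:
        return len(self.motif) * self.copies


def is_primitive(motif: str) -> bool:
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT', 'GGG')."""
    m = len(motif)
    return not any(m % d == 0 and motif == motif[:d] * (m // d) for d in range(1, m))


def find_perfect_repeats(seq: str) -> List[Repeat]:
    """Report perfect tandem repeats meeting MIN_COPIES, scanning each motif length separately."""
    seq = seq.upper()
    found: List[Repeat] = []
    for m, min_copies in MIN_COPIES.items():
        i = 0
        while i + m <= len(seq):
            motif = seq[i:i + m]
            copies = 1
            # Extend while the next m bases repeat the motif exactly.
            while seq[i + copies * m:i + (copies + 1) * m] == motif:
                copies += 1
            if copies >= min_copies and is_primitive(motif) and set(motif) <= set("ACGT"):
                found.append(Repeat(i, motif, copies))
                i += copies * m  # resume scanning after this run
            else:
                i += 1
    return sorted(found)


for r in find_perfect_repeats("ccGCTCTGCTCTGCTCTaaGGGGGGTT"):
    print(r.start, f"({r.motif}){r.copies}", r.length, "nt")
# 2 (GCTCT)3 15 nt
# 19 (G)6 6 nt
```

Scanning each motif length independently keeps the logic simple; the primitivity check prevents, for example, (AT)6 from also being reported as (ATAT)3.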
Table 1
List of analyzed HCV genomes.
No.   Acc. No.    Genotype  Size (nt)  GC (%)
S1    AB520610    1         9598       58.5
S2    AF271632    1         9618       58.3
S3    EF621489    1         9599       58.6
S4    EU155241    1         9275       58.9
S5    EU256080    1         9329       58.5
S6    FJ024275    1         9269       59
S7    FJ390395    1         9264       58.9
S8    FN435993    1         9613       57.9
S9    L02836      1         9400       57.9
S10   M62321      1         9401       58.9
S11   M67463      1         9416       58.8
S12   M84754      1         9425       58.4
S13   S62220      1         9440       58.9
S14   U01214      1         9446       58.2
S15   U16362      1         9415       58.5
S16   U89019      1         9400       58.3
S17   AB030907    2         9654       56.1
S18   AB031663    2         9488       55.4
S19   AB047639    2         9678       58.3
S20   AF169002    2         9661       57.4
S21   AF169003    2         9693       57.2
S22   AF169004    2         9653       57.5
S23   AF169005    2         9700       57.3
S24   AF177036    2         9711       56.9
S25   AF238481    2         9416       57.8
S26   AF238482    2         9416       57.8
S27   AF238483    2         9416       57.4
S28   AF238484    2         9416       57.6
S29   AF238485    2         9416       57.7
S30   AF238486    2         9416       56
S31   D50409      2         9513       57.7
S32   NC_009823   2         9711       56.9
S33   D17763      3         9456       55.6
S34   D28917      3         9454       55.8
S35   D49374      3         9444       56
S36   D63821      3         9450       54.9
S37   NC_009825   4         9355       56.2
S38   NC_009826   5         9343       57.1
S39   AY878650    6         9388       56
S40   AY878651    6         9373       55.7
S41   DQ278891    6         9440       55.9
S42   DQ278893    6         9430       55.7
S43   DQ278894    6         9441       55.4
S44   DQ314805    6         9468       55.5
S45   DQ480519    6         9358       56.4
S46   DQ480520    6         9358       56.5
S47   DQ480522    6         9358       56.4
S48   DQ480523    6         9358       56.2
S49   DQ480524    6         9361       56.9
S50   DQ835770    6         9447       55.9
S51   EF424627    6         9450       57.2
S52   EF424628    6         9453       56
S53   EF424629    6         9459       56.3
S54   NC_009827   6         9628       55.4

2.3. Calculation of the expected number of microsatellites
We compared the observed number of microsatellites (O) with the expected number (E) as the ratio O/E to evaluate whether microsatellites are over- or underrepresented in HCV genome sequences. To assess the statistical significance of the microsatellite representation (O/E), we used Z-scores defined as (O − E)/√E (Mrazek, 2006). The expected number of microsatellites M_t (where M is the repeat motif, of length L, and t is the repeat number) in a genome of length G was calculated following de Wachter (1981):
$$\mathrm{Exp}(M_t) = f(M)^{t}\,\bigl[1 - f(M)\bigr]\,\bigl[G'\bigl(1 - f(M)\bigr) + 2L\bigr] \tag{1}$$
$$G' = G - tL - 2L + 1 \tag{2}$$
where Exp(M_t) is the expected number of M_t and f(M) is the probability of M.
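Eqs. (1) and (2), the O/E ratio and the Z-score can be sketched in Python as follows. The text does not state how f(M) was estimated; this sketch assumes an independent-nucleotide model in which f(M) is the product of the genome's base frequencies. The function names and the base composition used in the usage lines are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import Dict, Tuple


def motif_probability(motif: str, base_freq: Dict[str, float]) -> float:
    """f(M), assumed here to be the product of the genome's base frequencies."""
    p = 1.0
    for base in motif:
        p *= base_freq[base]
    return p


def expected_count(motif: str, t: int, genome_len: int, base_freq: Dict[str, float]) -> float:
    """Exp(M_t) for motif M of length L repeated t times in a genome of length G."""
    L = len(motif)
    f = motif_probability(motif, base_freq)
    g_prime = genome_len - t * L - 2 * L + 1                   # Eq. (2)
    return f ** t * (1 - f) * (g_prime * (1 - f) + 2 * L)     # Eq. (1)


def representation(observed: int, expected: float) -> Tuple[float, float]:
    """Return (O/E, Z) with Z = (O - E) / sqrt(E)."""
    return observed / expected, (observed - expected) / math.sqrt(expected)


# Hypothetical composition: 58.5% GC split evenly between G and C.
freq = {"G": 0.2925, "C": 0.2925, "A": 0.2075, "T": 0.2075}
# Expected number of perfect (G)t runs, summed over t = 6..30, in a 9598-nt genome.
exp_g = sum(expected_count("G", t, 9598, freq) for t in range(6, 31))
print(f"E = {exp_g:.3f}")
```

Summing Exp(M_t) over all motifs of interest and all repeat numbers at or above the detection threshold gives the total expected count E that enters the O/E ratio and the Z-score.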
2.4. Statistical analysis
All statistical analyses were performed with SPSS 18.0 and Excel 2007. Linear regression was used to assess the correlation between the number, relative abundance and relative density of microsatellites and two genome features (genome size and GC content).
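The regressions were run in SPSS; for readers who want to reproduce this kind of analysis without it, a minimal ordinary-least-squares sketch returning R² and the t statistic of the slope is shown below (pure Python; the function name is hypothetical). Converting t to a p-value requires the t distribution with n − 2 degrees of freedom; if SciPy is available, `scipy.stats.linregress` returns the p-value directly.

```python
import math
from typing import Sequence, Tuple


def simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (intercept, slope, R^2, t), where t is the t statistic of the
    slope on n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        t = math.copysign(math.inf, slope)  # perfect fit
    else:
        t = math.copysign(math.sqrt(r2 * (n - 2) / (1 - r2)), slope)
    return intercept, slope, r2, t


# Illustration with five genomes only (GC% from Table 1, genome-wide
# mononucleotide counts from Table 2 for S1-S5); not the 54-genome fit.
gc = [58.5, 58.3, 58.6, 58.9, 58.5]
mono = [25, 20, 15, 19, 21]
b0, b1, r2, t = simple_linear_regression(gc, mono)
print(f"slope = {b1:.2f}, R^2 = {r2:.4f}, t = {t:.2f}")
```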
3. Results and discussion
Different studies have used different parameters to search genomes for microsatellites. Power et al. showed that the number of microsatellites changes markedly when the threshold number of repeat units is raised or lowered (Power et al., 2009), so it is important to select an appropriate threshold. A threshold repeat length of 6 nt was previously used for HIV-1 genomes, whose sizes are very similar to those of HCV genomes (Chen et al., 2009); likewise, in the present study we analyzed microsatellites of 6 nt or longer in 54 completely sequenced HCV genomes. To date, no study has systematically addressed and compared the distribution of mononucleotide repeats in viral genomes. Mononucleotide repeats have been found to strongly affect the local mutation rate (Levinson and Gutman, 1987), and their presence is thought to be an important determinant of stability (Ackermann and Chao, 2006). The analysis of imperfect mononucleotide repeats can therefore give insight into the evolution and potential roles of microsatellites. In this study, we identified imperfect mononucleotide repeats of 6 nt or longer. To allow comparison with our previous results for HIV-1 genomes, in which only perfect di-, tri-, tetra-, penta- and hexanucleotide repeats were identified (Chen et al., 2009), we divided microsatellites into two categories: mononucleotide repeats and microsatellites2–6 (perfect di-, tri-, tetra-, penta- and hexanucleotide repeats).
In Escherichia coli (E. coli), microsatellites are richer in coding regions than in non-coding regions, because the bulk of the genome consists of open reading frames (Chen et al., 2011b; Gur-Arie et al., 2000). The coding density of the HCV genome is also very high (Kuiken et al., 2005). To assess whether microsatellites are likewise richer in the coding region of HCV, we detected microsatellites in coding regions, 3′-UTRs and 5′-UTRs. Microsatellites were significantly more abundant in coding regions than in non-coding regions of HCV genomes (Table 2 and Supplementary Tables 1 and 2), as expected and similar to E. coli (Gur-Arie et al., 2000).

Table 2
Occurrence of microsatellites among coding and non-coding regions for HCV genomes.
No.   Mononucleotide repeats               Microsatellites2–6
      Coding  5′-UTR  3′-UTR  Genome-wide  Coding  5′-UTR  3′-UTR  Genome-wide
S1    20      1       4       25           19      0       2       21
S2    18      1       1       20           19      0       2       21
S3    13      1       1       15           24      0       2       26
S4    18      1       0       19           19      0       0       19
S5    20      1       0       21           17      0       0       17
S6    15      1       0       16           19      0       0       19
S7    19      1       0       20           25      0       0       25
S8    18      1       2       21           22      0       3       25
S9    19      1       0       20           26      0       0       26
S10   20      1       0       21           18      0       0       18
S11   14      1       0       15           23      0       0       23
S12   19      1       1       21           22      0       0       22
S13   18      1       1       20           20      0       0       20
S14   17      1       1       19           17      0       0       17
S15   25      1       0       26           24      0       0       24
S16   N/A     N/A     N/A     18           N/A     N/A     N/A     23
S17   18      1       2       21           20      0       2       22
S18   12      1       1       14           27      0       0       27
S19   16      1       1       18           26      0       3       29
S20   12      1       1       14           24      0       3       27
S21   17      1       1       19           26      0       4       30
S22   13      1       1       15           27      0       3       30
S23   14      1       1       16           18      0       3       21
S24   15      1       1       17           23      0       3       26
S25   16      1       0       17           22      0       0       22
S26   16      1       0       17           24      0       0       24
S27   14      1       0       15           27      0       0       27
S28   17      1       0       18           26      0       0       26
S29   18      1       0       19           26      0       0       26
S30   16      1       0       17           28      0       0       28
S31   17      1       1       19           25      0       0       25
S32   15      1       1       17           23      0       3       26
S33   13      1       2       16           24      0       0       24
S34   12      1       2       15           29      0       0       29
S35   11      1       0       12           23      0       0       23
S36   15      1       1       17           23      0       0       23
S37   9       1       1       11           20      0       0       20
S38   17      1       0       18           25      0       0       25
S39   8       1       0       9            21      0       0       21
S40   9       1       0       10           26      0       0       26
S41   8       1       1       10           21      0       0       21
S42   9       1       1       11           26      0       0       26
S43   8       1       1       10           26      0       0       26
S44   16      1       1       18           16      0       0       16
S45   13      1       0       14           24      0       0       24
S46   11      1       0       12           23      0       0       23
S47   8       1       0       9            24      0       0       24
S48   11      1       0       12           33      0       0       33
S49   12      1       0       13           21      0       0       21
S50   13      1       1       15           25      0       0       25
S51   11      1       0       12           28      0       0       28
S52   19      1       0       20           25      0       0       25
S53   10      1       1       12           22      0       0       22
S54   10      1       1       12           26      0       2       28
N/A, not available (see Section 2).
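The coding/non-coding assignment described in Section 2.2 can be sketched as follows, using 1-based inclusive coordinates as in GenBank CDS features. Only repeats lying entirely inside the CDS count as coding; the text reports boundary-straddling repeats as non-coding, and this sketch additionally assumes they are assigned to the UTR on the side they overlap. The function names and the CDS coordinates in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_region(rep_start: int, rep_end: int, cds_start: int, cds_end: int) -> str:
    """Return 'coding', "5'-UTR" or "3'-UTR" for a repeat (1-based, inclusive)."""
    if cds_start <= rep_start and rep_end <= cds_end:
        return "coding"
    # Not fully inside the CDS: non-coding; straddlers go to the overlapped side.
    return "5'-UTR" if rep_start < cds_start else "3'-UTR"


def tally_regions(repeats: Iterable[Tuple[int, int]], cds_start: int, cds_end: int) -> Counter:
    """Count repeats per region for one genome."""
    return Counter(classify_region(s, e, cds_start, cds_end) for s, e in repeats)


# Hypothetical CDS spanning 342..9377 and five repeat intervals.
repeats = [(100, 105), (338, 345), (400, 408), (9375, 9381), (9400, 9406)]
print(tally_regions(repeats, 342, 9377))
```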
3.1. Mononucleotide repeats
Mononucleotide repeats were observed in every HCV genome, with relatively high occurrence. Poly(G/C) repeats were significantly more frequent than poly(A/T) repeats in each complete HCV genome (Table 3). A higher frequency of poly(G/C) tracts is generally attributed to high genomic GC content (Karaoglu et al., 2005). However, in each analyzed sequence the GC content is only slightly higher than the AT content, yet the occurrences of poly(G/C) and poly(A/T) repeats differ significantly (t test, p < 0.001). GC content therefore appears to have only a weak influence on the occurrence of poly(G/C) repeats in HCV genomes. Each HCV genome contained 0–6 poly(A/T) repeats and 8–23 poly(G/C) repeats. Interestingly, poly(A/T) tracts are generally more abundant than poly(G/C) tracts in eukaryotic and prokaryotic genomes (Gur-Arie et al., 2000; Karaoglu et al., 2005; Toth et al., 2000). For example, the genome of Saccharomyces cerevisiae shows a strong preference for poly(A/T) over poly(G/C) (99.5% vs. 0.5%) (Karaoglu et al., 2005).
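The text reports a t test (p < 0.001) on poly(G/C) versus poly(A/T) occurrences without specifying the variant. Assuming a paired design (one pair of counts per genome), the test statistic can be sketched as below; `paired_t_statistic` is a hypothetical helper, and the counts in the usage lines are invented for illustration rather than taken from Table 3.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired samples (df = n - 1), e.g. per-genome
    poly(G/C) counts versus poly(A/T) counts."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-genome counts (not from the study):
gc_counts = [20, 18, 15, 19, 21]
at_counts = [2, 1, 1, 0, 1]
print(round(paired_t_statistic(gc_counts, at_counts), 2))
```

The p-value follows from the t distribution with n − 1 degrees of freedom.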
Mononucleotide repeats were consistently overrepresented in all surveyed HCV genomes (O/E ranged from 1.21 to 3.19) (Table 3 and Supplementary Table 3), with the strongest overrepresentation in sequence D28917. The relative abundance of mononucleotide repeats was broadly similar across the analyzed HCV genomes, ranging from 0.96 to 2.76 repeats/kb. The highest relative density of mononucleotide repeats was found in AF177036 and NC_009823 (23.89 nt/kb), followed by FN435993 (23.82 nt/kb), and the lowest in AY878650 (6.07 nt/kb). Several authors have assessed the relationship between microsatellite content and genome size (Chen et al., 2009; Coenye and Vandamme, 2005; Karaoglu et al., 2005) and have shown that total microsatellite content is not directly proportional to genome size (Chen et al., 2009; Karaoglu et al., 2005), although larger genomes are generally inferred to contain more microsatellites than smaller ones (Hancock, 2002). The correlation between GC content and the distribution of mononucleotide repeats has also been analyzed in prokaryotes (Coenye and Vandamme, 2005). Similarly, we examined the correlation between the distribution of mononucleotide repeats (number, relative abundance and relative density) and two genome features (genome size and GC content). Genome size did not correlate significantly with the number (R² = 0.0257, p > 0.05)
Table 3
Occurrence, relative abundance and relative density of mononucleotide repeats in analyzed HCV genomes.
No.Repeat type Total MRAa
MRDb
O/Ec
No.Repeat type TotalMRAa
MRDb
O/Ec
ATGCATGC
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15
S16
S17
S18
S19
S20
S21
S22
S23
S24
S25
S26
S27
2
0
0
0
1
0
0
1
1
1
0
1
0
0
2
1
1
1
0
0
0
0
0
0
0
1
0
4
1
1
0
0
0
0
2
0
0
0
1
1
1
1
0
3
1
1
2
1
2
2
1
1
0
0
9
8
7
8
9
8
7
10
11
25
20
15
19
21
16
20
21
20
21
15
21
20
19
26
18
21
14
18
14
19
15
16
17
17
17
15
2.60
2.08
1.56
2.05
2.25
1.73
2.16
2.18
2.13
2.23
1.59
2.23
2.12
2.01
2.76
1.91
2.18
1.48
1.86
1.45
1.96
1.55
1.65
1.75
1.81
1.81
1.59
21.05
23.50
15.52
13.05
14.47
11.22
13.60
23.82
13.51
15.00
10.51
15.49
14.62
17.15
17.63
12.77
18.65
10.43
19.22
15.22
23.32
15.02
21.55
23.89
12.53
12.96
11.05
2.99
2.25
1.87
2.14
2.44
1.90
2.34
2.55
2.58
2.85
1.75
2.35
2.40
2.18
2.83
2.18
2.67
2.18
2.27
1.92
2.56
2.15
2.07
2.65
2.17
2.31
1.88
S28
S29
S30
S31
S32
S33
S34
S35
S36
S37
S38
S39
S40
S41
S42
S43
S44
S45
S46
S47
S48
S49
S50
S51
S52
S53
S54
0
0
0
0
0
2
0
0
2
1
3
0
0
0
0
0
0
1
1
1
1
1
0
0
1
0
0
1
2
0
1
1
3
4
1
2
1
0
0
0
1
1
1
2
1
1
0
1
1
2
2
1
2
1
710
7
10
8
9
4
6
8
7
4
10
6
7
6
7
3
7
8
5
6
6
8
8
3
11
18
19
17
19
17
16
15
12
17
11
18
1.91
2.02
1.81
2.00
1.75
1.69
1.59
1.27
1.80
1.18
1.93
0.96
1.07
1.06
1.17
1.06
1.90
1.50
1.28
0.96
1.28
1.39
1.59
1.27
2.12
1.27
1.25
13.17
12.96
12.00
15.24
23.89
13.01
10.79
9.00
12.70
8.55
12.31
6.07
6.93
8.90
8.59
9.53
15.00
9.94
8.66
6.73
8.55
9.40
12.70
11.11
16.29
11.63
15.68
2.26
2.20
2.58
2.35
2.65
2.08
3.19
2.71
2.53
1.95
2.51
1.48
1.58
1.63
1.73
1.58
2.78
1.74
1.49
1.21
1.49
1.61
1.84
1.71
2.88
2.16
2.07
10
7
10
7
7
5
3
6
5
5
3
3
3
3
6
9
4
5
2
4
3
5
7
7
7
3
7
11
11
8
13
10
12
8
8
10
11
10
12
8
8
4
5
3
8
4
5
7
8
7
7
8
7
12
7
9
8
8
9
10
10
11
10
18
14
12
11
9
9
8
12
99
10
9
9
9
8
9
8
12
13
15
12
20
12
12
3
8
aRelative abundance is the total mononucleotide repeats per kb of sequence analyzed.
bRelative density is defined as the total length (nt) contributed by each mononucleotide repeat per kb of sequence analyzed.
cObserved number of mononucleotide repeats/expected number of mononucleotide repeats.
or relative abundance (R² = 0.0114, p > 0.05) of mononucleotide repeats, but was significantly correlated with their relative density (R² = 0.5633, p < 0.05) (Fig. 1). In contrast to genome size, GC content was weakly but significantly correlated with the number (R² = 0.3712, p < 0.05) and relative abundance (R² = 0.3818, p < 0.05) of mononucleotide repeats, but not with their relative density (R² = 0.1375, p > 0.05) (Fig. 2).
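The relative abundance and relative density used throughout are per-kilobase normalizations, as defined in the footnotes of Tables 3 and 4. A minimal sketch with hypothetical function names; the usage line checks the S1 (AB520610) value from Tables 1 to 3 (25 mononucleotide repeats in 9598 nt).

```python
from typing import Iterable


def relative_abundance(n_repeats: int, genome_len_nt: int) -> float:
    """Number of repeats per kb of analysed sequence."""
    return n_repeats / (genome_len_nt / 1000)


def relative_density(repeat_lengths_nt: Iterable[int], genome_len_nt: int) -> float:
    """Total repeat length (nt) per kb of analysed sequence."""
    return sum(repeat_lengths_nt) / (genome_len_nt / 1000)


# S1 (AB520610): 25 mononucleotide repeats in a 9598-nt genome.
print(round(relative_abundance(25, 9598), 2))  # 2.6
```

Because both measures divide by genome length, they factor out the small size differences between HCV genomes, which is why they can behave differently from raw counts in the regressions.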
Fig. 1. Relationship between genome size and the number, relative abundance and relative density of mononucleotide repeats in the analyzed HCV genomes. Relative abundance is the total number of mononucleotide repeats per kb of sequence analyzed. Relative density is the total length (nt) contributed by mononucleotide repeats per kb of sequence analyzed.
Fig. 2. Relationship between GC content and the number, relative abundance and relative density of mononucleotide repeats in the analyzed HCV genomes. See Fig. 1 legend.
3.2. Microsatellites2–6
To compare with our previous results for HIV-1 genomes, in which only microsatellites2–6 were surveyed (Chen et al., 2009), and to investigate whether the features of HIV-1 microsatellites2–6 are shared by other viruses, we examined microsatellites2–6 in the 54 complete HCV genomes from six genotypes. Our results showed that (i) microsatellites2–6 were prevalent in all surveyed sequences, (ii) the number of repeats decreased as the repeat unit length increased, and (iii) the total number, relative abundance and relative density of microsatellites2–6 differed only slightly between HCV genomes. These features are very consistent with our previous results for HIV-1 genomes (Chen et al., 2009).
An overview of the occurrence, relative abundance and relative density of microsatellites2–6 in HCV genomes is shown in Table 4. In the surveyed HCV genomes, the number of microsatellites2–6 ranged from 16 to 33. Comparing the observed number of microsatellites2–6 with the number expected from the formula of de Wachter (1981) showed that the O/E ratio ranged from 0.88 to 2.48. DQ480523 had the highest relative abundance of microsatellites2–6 (3.53 repeats/kb), whereas DQ314805 had the lowest (1.69 repeats/kb). The relative density of microsatellites2–6 was nearly equal across the 54 complete HCV genomes, regardless of genotype. The highest relative density, 24.26 nt/kb, was found in sequence DQ480523 (genotype 6), and the lowest, 11.72 nt/kb, in sequence DQ314805, which likewise belongs to genotype 6. In most surveyed genomes the relative density of microsatellites2–6 was below 20 nt/kb; in sharp contrast, it was 20 nt/kb or more in most analyzed HIV-1 genomes (Chen et al., 2009). The longest microsatellite2–6 stretch, a (GCTCT)3 motif of 15 nt, occurred in AF238483 and AF238485. Eleven of the complete HCV genomes investigated contained microsatellites of length ≥12 nt (Supplementary Table 4). Only three repeat types (tri-, tetra- and pentanucleotide) were found among the longest microsatellite2–6 motifs across all analyzed HCV genomes; most of the longest microsatellite2–6 motifs were 9 nt long and of the trinucleotide type. This differs sharply from HIV-1 and fungal genomes, in which the longest microsatellite motifs are of diverse repeat types (Chen et al., 2009; Karaoglu et al., 2005). The correlations between microsatellite2–6 distribution (number, relative abundance and relative density) and genome features (genome size and GC content) are plotted in Figs. 3 and 4. The number of microsatellites2–6 was weakly but significantly correlated with genome size (R² = 0.0939, p < 0.05), whereas their relative abundance (R² = 0.0506, p > 0.05) and relative density (R² = 0.0385, p > 0.05) were not significantly related to genome size. GC content showed no significant correlation with the number (R² = 0.0706, p > 0.05) or relative density (R² = 0.0582, p > 0.05) of microsatellites2–6, but was significantly related to their relative abundance (R² = 0.0724, p < 0.05).
In the present study, we divided dinucleotide repeats into six
types: AG/GA, GT/TG, AC/CA, CT/TC, AT/TA and CG/GC. Our results
clearly indicated six types of dinucleotide repeats were variable in
number in different HCV genomes, respectively (Supplementary
Table 5). Our previous observation in all 81 complete HIV-1 gen-
omes showed AG/GA repeats were most predominant among dinu-
cleotide repeats (Chen et al., 2009). However, GT/TG repeats were
the most abundant dinucleotide repeat types in more than half of
surveyed HCV genomes, followed by AC/CA repeats (Supplemen-
tary Table 5). Moreover, our results also revealed an additional dif-
ferenceinthetwo species:
microsatellites2–6in complete HCV genomes was lower than that
in HIV-1 genomes (11.72–24.26 nt/kb vs. 16–35 nt/kb; Chen
et al., 2009). An interesting result was that CG/GC repeats were
very predominant in HCV genomes and even could be the most
common dinucleotide repeats in some sequences such as in
overallrelative density of
Table 4
Occurrence, relative abundance, relative density and representation of microsatellites2–6.
No.Di/tri/tetra/penta/hexaa
TotalRAb
RDc
O/Ed
No.Di/tri/tetra/penta/hexaa
TotalRAb
RDc
O/Ed
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15
S16
S17
S18
S19
S20
S21
S22
S23
S24
S25
S26
S27
17/4/0/0/0
16/5/0/0/0
21/5/0/0/0
15/4/0/0/0
13/4/0/0/0
15/4/0/0/0
19/6/0/0/0
18/6/1/0/0
19/7/0/0/0
13/5/0/0/0
18/5/0/0/0
16/6/0/0/0
17/3/0/0/0
15/2/0/0/0
18/6/0/0/0
16/6/1/0/0
17/4/1/0/0
23/4/0/0/0
24/5/0/0/0
22/5/0/0/0
23/7/0/0/0
25/5/0/0/0
19/2/0/0/0
22/4/0/0/0
17/4/1/0/0
20/4/0/0/0
22/4/0/1/0
21
21
26
19
17
19
25
25
26
18
23
22
20
17
24
23
22
27
29
27
30
30
21
26
22
24
27
2.19
2.18
2.71
2.05
1.82
2.05
2.70
2.60
2.77
1.91
2.44
2.33
2.12
1.80
2.55
2.45
2.28
2.85
3.00
2.79
3.10
3.11
2.16
2.68
2.34
2.55
2.87
14.38
15.08
18.02
14.23
12.43
13.59
18.35
18.31
19.26
13.51
16.46
16.13
13.88
11.86
17.63
17.45
15.95
18.76
20.15
18.94
21.15
20.41
14.23
17.92
16.36
16.99
19.86
0.88
1.23
1.25
1.13
1.10
0.89
1.32
1.28
1.45
1.25
1.22
1.15
1.03
1.18
1.41
1.48
1.27
1.31
1.23
1.32
1.23
1.49
0.99
1.24
1.06
1.19
1.40
S28
S29
S30
S31
S32
S33
S34
S35
S36
S37
S38
S39
S40
S41
S42
S43
S44
S45
S46
S47
S48
S49
S50
S51
S52
S53
S54
22/4/0/0/0
20/5/0/1/0
23/4/1/0/0
19/6/0/0/0
22/4/0/0/0
17/7/0/0/0
19/9/1/0/0
18/5/0/0/0
20/3/0/0/0
15/5/0/0/0
20/5/0/0/0
17/4/0/0/0
21/5/0/0/0
17/4/0/0/0
21/5/0/0/0
23/3/0/0/0
11/5/0/0/0
17/7/0/0/0
18/4/1/0/0
18/6/0/0/0
24/9/0/0/0
16/5/0/0/0
20/5/0/0/0
21/7/0/0/0
21/4/0/0/0
14/8/0/0/0
21/7/0/0/0
26
26
28
25
26
24
29
23
23
20
25
21
26
21
26
26
16
24
23
24
33
21
25
28
25
22
28
2.76
2.76
2.97
2.63
2.68
2.54
3.07
2.44
2.43
2.14
2.68
2.24
2.77
2.22
2.76
2.75
1.69
2.56
2.46
2.56
3.53
2.24
2.65
2.96
2.64
2.33
2.91
18.48
19.54
20.18
18.29
17.92
17.45
22.11
17.15
15.77
14.86
17.87
14.91
18.24
14.83
18.13
17.69
11.72
17.63
16.67
17.74
24.26
15.28
17.47
20.21
17.14
17.23
19.84
1.32
1.14
1.25
1.23
1.24
1.39
1.97
1.14
1.14
1.36
1.43
1.00
1.07
1.00
1.06
1.24
0.89
1.43
1.44
2.26
2.48
1.47
1.19
1.36
1.25
1.23
1.34
Microsatellites2–6indicates the di-, tri-, tetra-, penta- and hexanucleotide repeats.
aDi/tri/tetra/penta/hexa: number of dinucleotide repeats/trinucleotide repeats/tetranucleotide repeats/pentanucleotide repeats/hexanucleotide repeats, respectively.
bRelative abundance is the total microsatellites2–6per kb of sequence analyzed.
cRelative density is defined as the total length (nt) contributed by each microsatellite2–6per kb of sequence analyzed.
dObserved number of microsatellites2–6/expected number of microsatellites2–6.
M. Chen et al./Infection, Genetics and Evolution 11 (2011) 1646–1654
1651
Page 7
L02836 and D28917. However, the CG/GC repeats are very low in
genomes of human, Drosophila, Caenorhabditis elegans, yeast, fungi,
and HIV-1 (Chen et al., 2009; Karaoglu et al., 2005; Katti et al.,
2001). Trinucleotide repeats were the second abundant repeats
in surveyed HCV genomes. Consistent with dinucleotide repeats,
our analysis also show trinucleotide repeat types were variable
within and between different HCV genotypes. The (ATG)3repeat
was the most prevalent trinucleotide repeat in HCV genomes
except for AB030907, AF238486 and EF424628 (Supplementary
Table 6). Tetranucleotide repeats (TTCT)3and (AGGG)3consisted
of a string of the same character and another different character.
Hence, only a single substitution would be required to transform a mononucleotide repeat motif into such a tetranucleotide motif (for example, TTTT into TTCT). It is therefore plausible that the tetranucleotide repeats (TTCT)3 and (AGGG)3 originated from mutations of (T)n and (G)n, respectively. Seven sequences (FN435993, U89019, AB030907, AF238481, AF238486, D28917 and DQ480520) contained tetranucleotide repeats (Supplementary Table 7), and only two sequences (AF238483 and AF238485) had pentanucleotide repeats (Supplementary Table 8). No hexanucleotide repeats were found in any of the surveyed genomes. There is evidence that different taxa show different preferences for microsatellite types (Karaoglu et al., 2005; Toth et al., 2000). For example, (GT)n is the most abundant repeat motif in animals and invertebrates (Stallings et al., 1991). However, little work has examined whether such preferences differ among complete genomes of the same species. To date, only one related study has been completed, showing that the most common microsatellite motifs may differ among complete HIV-1 genomes (Chen et al., 2009). Our study likewise showed that the most common microsatellite types varied among the analyzed HCV genomes.
Fig. 3. Relationship between genome size and the number, relative abundance and relative density of microsatellites2–6 in the analyzed HCV genomes. Relative abundance is defined as the number of microsatellites2–6 per kb of sequence analyzed. Relative density is defined as the total length (nt) contributed by microsatellites2–6 per kb of sequence analyzed.
Fig. 4. Relationship between GC content and the number, relative abundance and relative density of microsatellites2–6 in the analyzed HCV genomes. See Fig. 3 legend.
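To make the per-kb measures in the Fig. 3 legend concrete, the following minimal Python sketch scans a sequence for perfect (uninterrupted) tandem repeats with 1–6 nt motifs and then reports, for motifs of 2–6 nt, the count, the relative abundance (repeats per kb) and the relative density (repeat-covered nt per kb). It assumes a simple greedy, non-overlapping scan; the minimum copy numbers in `MIN_COPIES` and the names `find_perfect_repeats` and `relative_stats` are hypothetical placeholders, not the settings or code of the detection tool used in this study.

```python
from dataclasses import dataclass

# Illustrative minimum copy numbers per motif length (1-6 nt); placeholders, not the
# thresholds used in this study.
MIN_COPIES = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


@dataclass
class Repeat:
    start: int   # 0-based start position in the sequence
    motif: str   # repeat unit, e.g. "ATG"
    copies: int  # number of complete tandem copies

    @property
    def length(self) -> int:
        return len(self.motif) * self.copies


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. "ATAT" or "AA")."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_repeats(seq: str) -> list[Repeat]:
    """Greedy left-to-right scan for perfect tandem repeats with 1-6 nt motifs.

    At each position the motif giving the longest qualifying repeat is kept and the
    scan resumes after it, so reported repeats do not overlap."""
    seq = seq.upper()
    repeats, i = [], 0
    while i < len(seq):
        best = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= MIN_COPIES[k] and (best is None or copies * k > best.length):
                best = Repeat(i, motif, copies)
        if best is None:
            i += 1
        else:
            repeats.append(best)
            i += best.length
    return repeats


def relative_stats(seq: str, motif_lengths=range(2, 7)) -> tuple[int, float, float]:
    """Return (count, relative abundance in repeats/kb, relative density in nt/kb)."""
    if not seq:
        raise ValueError("empty sequence")
    hits = [r for r in find_perfect_repeats(seq) if len(r.motif) in motif_lengths]
    kb = len(seq) / 1000
    return len(hits), len(hits) / kb, sum(r.length for r in hits) / kb
```

Calling `relative_stats` on each complete genome would give one (count, abundance, density) triple per genome, the kind of values plotted against genome size and GC content in Figs. 3 and 4.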
3.3. Identification of microsatellite polymorphisms
To assess triplet repeat length polymorphism in the human transcriptome, Molla et al. first identified triplet repeat blocks, comprising triplet repeats and their flanking sequences, in the human reference genome (assembly NCBI36), and then detected these repeat blocks in the genome of James Watson (Molla et al., 2009). A similar method was used to detect tandem repeat variation in protein-coding regions of human genes (O’Dushlaine et al., 2005). More recently, a database of polymorphic microsatellites in prokaryotes was constructed by comparing the genome sequences of different strains of the same species (Kumar et al., 2011). All of these studies identified microsatellite polymorphisms by genome alignment. Following the same approach, we used sequence S1 (AB520610) as the template for microsatellite detection in the present study, and the other 53 sequences (S2–S54) were used to construct the database ‘53sequences.fna’. For each microsatellite, 10 bp of flanking sequence on both sides was also extracted; microsatellites lacking 10 bp of flanking sequence on either side were omitted. The microsatellites detected in S1 were defined as the reference microsatellites. We then searched the database for regions corresponding to each reference microsatellite, using its 10-bp flanking sequences. A reference microsatellite was defined as polymorphic if the length of its repeat block differed from that of the corresponding block in another sequence in the database, provided that the length difference was a multiple of the repeat-unit length.
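The flank-anchored criterion just described can be sketched in a few lines of Python. This is a simplified illustration, not the authors' pipeline: it assumes exact, ungapped matches of the 10-bp flanks, takes the first occurrence of each flank, and does not check that the intervening block is itself a perfect repeat. The names `flanks` and `compare_locus` are hypothetical.

```python
FLANK = 10  # bp of flanking sequence taken on each side, as described in the text


def flanks(seq: str, start: int, end: int, size: int = FLANK):
    """Return (left, right) flanks of the block seq[start:end], or None if either is short."""
    if start < size or end + size > len(seq):
        return None  # microsatellites lacking full flanks are omitted
    return seq[start - size:start], seq[end:end + size]


def compare_locus(ref_seq: str, start: int, motif: str, copies: int, other: str):
    """Locate a reference microsatellite in `other` via exact flank matches.

    Returns None if the locus cannot be located, otherwise
    (length of the block between the flanks in `other`, is_polymorphic)."""
    end = start + len(motif) * copies
    fl = flanks(ref_seq, start, end)
    if fl is None:
        return None
    left, right = fl
    i = other.find(left)
    if i == -1:
        return None
    j = other.find(right, i + len(left))
    if j == -1:
        return None
    other_len = j - (i + len(left))
    diff = abs(other_len - (end - start))
    # Polymorphic: lengths differ and the difference is a whole number of repeat units.
    return other_len, diff > 0 and diff % len(motif) == 0
```

The exact-match lookup with `str.find` is what makes this approach depend on conserved flanks: a single substitution or indel in either 10-bp flank means the locus is simply not found.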
A schematic representation of our method is shown in Supplementary Fig. 1. The prerequisite for detecting microsatellite variation by genome alignment is that the flanking sequences are conserved. However, comparison of the flanking sequences at each microsatellite site in HCV genomes revealed one or more insertions, deletions and substitutions between these sequences. This low conservation of flanking sequences at HCV microsatellite sites prevented us from using the above method in the present study. Manual inspection might instead help to identify microsatellite polymorphisms in HCV genomes, and mononucleotide repeats were used as representatives for manual polymorphism detection. Differences in base composition, coding density and other genome features are expected to be greater between sequences from different genotypes than between sequences from the same genotype. To avoid errors arising from these differences, we restricted this analysis to sequences from genotype 1. However, the start position of the coding sequence (CDS) of U89019 differs markedly from that of the other sequences (Supplementary Table 1), possibly because of incorrect annotation in GenBank. This sequence was therefore excluded, and a total of 15 genotype 1 sequences were used. Each mononucleotide repeat in a genome was evaluated manually with regard to its flanking sequences, its position relative to the CDS start, whether a counterpart was present in the other complete genomes, and whether the repeat varied between these genomes. In this way, we observed several microsatellite polymorphisms in HCV genomes (Supplementary Table 9), showing that microsatellite polymorphisms are present in HCV genomes.
However, it must be noted that manual methods are rough rather than rigorous and cannot be expected to identify all microsatellite polymorphisms correctly.
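The comparison above was done by hand. Purely to illustrate its criteria (same base, position relative to the CDS start, presence of a counterpart, length variation), here is a hypothetical Python sketch of how the positional matching could be automated. The minimum run length of 6 nt, the ±5 nt tolerance and the rule of taking the longest nearby run as the counterpart are all assumptions, and the sketch shares the limitations of manual inspection rather than resolving them.

```python
from itertools import groupby


def mono_runs(seq: str, min_len: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every single-base run of at least min_len nt."""
    runs, pos = [], 0
    for base, group in groupby(seq.upper()):
        n = sum(1 for _ in group)
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


def counterparts(genomes: dict[str, tuple[str, int]], ref: str,
                 min_len: int = 6, tolerance: int = 5):
    """Compare each mononucleotide run of genome `ref` with the other genomes.

    `genomes` maps an accession to (sequence, 0-based CDS start). A counterpart is the
    longest run (>= 2 nt) of the same base whose offset from the CDS start lies within
    `tolerance` nt of the reference run's offset; None means no counterpart was found.
    Returns (offset, base, {accession: length or None}, length_varies) per reference run.
    """
    ref_seq, ref_cds = genomes[ref]
    # Shorter runs (>= 2 nt) are kept in the other genomes so contracted alleles are seen.
    other_runs = {acc: (mono_runs(seq, 2), cds)
                  for acc, (seq, cds) in genomes.items() if acc != ref}
    report = []
    for start, base, n in mono_runs(ref_seq, min_len):
        offset = start - ref_cds
        lengths = {ref: n}
        for acc, (runs, cds) in other_runs.items():
            hits = [m for s, b, m in runs
                    if b == base and abs((s - cds) - offset) <= tolerance]
            lengths[acc] = max(hits) if hits else None
        observed = {m for m in lengths.values() if m is not None}
        report.append((offset, base, lengths, len(observed) > 1))
    return report
```

Anchoring positions to the annotated CDS start is why a misannotated CDS, as suspected for U89019, would corrupt every comparison involving that genome.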
In conclusion, this study of microsatellites in 54 complete HCV genomes is a first step towards a better understanding of the nature, evolution and function of viral microsatellites. A similar study of all sequenced complete viral genomes is in progress to investigate whether microsatellites make up an important proportion of all viral genomes and to explore their possible functional significance and evolutionary dynamics. Our study showed that microsatellites are an important component of complete HCV genomes and that some microsatellites were significantly overrepresented, suggesting that they may play important roles in genome organization.
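Over-representation implies a comparison with some null expectation. As an illustration only, the Python sketch below uses one simple baseline: a random sequence whose bases are independent and drawn at the genome's own composition. This is an assumed model, not necessarily the one behind the result above; the function names are hypothetical, and the window count is an approximation because every overlapping window inside a longer run is counted.

```python
from math import prod


def expected_windows(motif: str, copies: int, seq_len: int, freqs: dict[str, float]) -> float:
    """Expected number of windows matching motif * copies in a random sequence of length
    seq_len with independent bases (approximation: overlapping windows are all counted)."""
    unit_p = prod(freqs[b] for b in motif)
    n_windows = seq_len - len(motif) * copies + 1
    return max(n_windows, 0) * unit_p ** copies


def base_freqs(seq: str) -> dict[str, float]:
    """Empirical base composition of a sequence."""
    seq = seq.upper()
    return {b: seq.count(b) / len(seq) for b in "ACGT"}


def obs_exp_ratio(observed: int, motif: str, copies: int, seq: str) -> float:
    """Observed/expected ratio; values well above 1 suggest over-representation."""
    expected = expected_windows(motif, copies, len(seq), base_freqs(seq))
    if expected == 0:
        raise ValueError("expected count is zero under this model")
    return observed / expected
```

For example, in a 1,000-nt sequence with equal base frequencies, a perfect (ATG)3 is expected in only about 992 / 4^9 ≈ 0.004 windows, so even a single observed occurrence would give a large observed/expected ratio; a formal test would also need a significance threshold.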
Genome features are only weakly correlated with the number, relative abundance and relative density of microsatellites in the surveyed genomes. Consistent with findings in HIV-1, microsatellites2–6 showed a similar distribution pattern across the analyzed HCV genomes in terms of relative abundance and relative density. However, the repeat motifs varied between HCV genomes. All microsatellites identified in the present study are very short. This may be because (i) longer microsatellites may be more unstable than shorter ones, since they have more opportunities to undergo slipped-strand mispairing (Wierdl et al., 1997), and (ii) longer microsatellites exhibit a downward mutation bias and short persistence times (Harr and Schlotterer, 2000; Karaoglu et al., 2005). Because of their high mutability, microsatellites may help generate genomic diversity and contribute to genome evolution in eukaryotes as well as in prokaryotes (Ellegren, 2004; Li et al., 2004). Our analysis showed that microsatellites occur extensively in complete HCV genomes and that their distribution varies among these sequences, suggesting that microsatellites have the potential to generate HCV genomic diversity and phenotypic changes (Li et al., 2004). Other mechanisms, including neutral and adaptive evolution, have also been shown to play important roles in the diversification of HCV, providing genetic variants for rapid adaptation to new selection pressures (Simmonds, 2004). Microsatellite variation may be a useful source of variation for HCV genome evolution, possibly helping HCV genomes adapt quickly to environmental changes and counteract the human immune response (Li et al., 2004; Mrazek et al., 2007).
Acknowledgements
The authors thank the editors and reviewers for very helpful comments and suggestions. The study was financially supported by the Production, Education and Research Guiding Project, Guangdong Province (2010B090400439), the Great Program for GMO, Ministry of Agriculture of the People’s Republic of China (2009ZX08015-003A), the National Natural Science Foundation of China (Nos. 50608029, 50978088, 50808073, 51039001), the Hunan Provincial Innovation Foundation for Postgraduate, the National Basic Research Program (973 Program) (No. 2005CB724203), the Program for Changjiang Scholars and Innovative Research Team in University (IRT0719), the Hunan Provincial Natural Science Foundation of China (10JJ7005), the Hunan Key Scientific Research Project (2009FJ1010), and the Hunan Provincial Innovation Foundation for Postgraduate (CX2010B157).
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at doi:10.1016/j.meegid.2011.06.012.
References
Ackermann, M., Chao, L., 2006. DNA sequences shaped by selection for stability.
PLoS Genet. 2, e22.
Chen, M., Tan, Z., Jiang, J., Li, M., Chen, H., Shen, G., Yu, R., 2009. Similar distribution
of simple sequence repeats in diverse completed Human Immunodeficiency
Virus Type 1 genomes. FEBS Lett. 583, 2959–2963.
Chen, M., Tan, Z., Zeng, G., 2011a. MfSAT: detect simple sequence repeats in viral
genomes. Bioinformation 6, 171–172.
Chen, M., Tan, Z., Zeng, G., Peng, J., 2010. Comprehensive analysis of simple
sequence repeats in pre-miRNAs. Mol. Biol. Evol. 27, 2227–2232.
Chen, M., Zeng, G., Tan, Z., Jiang, M., Zhang, J., Zhang, C., Lu, L., Lin, Y., Peng, J., 2011b.
Compound microsatellites in complete Escherichia coli genomes. FEBS Lett.
doi:10.1016/j.febslet.2011.03.005.
Coenye, T., Vandamme, P., 2005. Characterization of mononucleotide repeats in
sequenced prokaryotic genomes. DNA Res. 12, 221–233.
Davis, C.L., Field, D., Metzgar, D., Saiz, R., Morin, P.A., Smith, I.L., Spector, S.A., Wills,
C., 1999. Numerous length polymorphisms at short tandem repeats in human
cytomegalovirus. J. Virol. 73, 6265–6270.
De Wachter, R., 1981. The number of repeats expected in random nucleic acid
sequences and found in genes. J. Theor. Biol. 91, 71–98.
Deback, C., Boutolleau, D., Depienne, C., Luyt, C.E., Bonnafous, P., Gautheret-Dejean,
A., Garrigue, I., Agut, H., 2009. Utilization of microsatellite polymorphism for
differentiating herpes simplex virus type 1 strains. J. Clin. Microbiol. 47, 533–
540.
Dieringer, D., Schlotterer, C., 2003. Two distinct modes of microsatellite mutation
processes: evidence from the complete genomic sequences of nine species.
Genome Res. 13, 2242–2251.
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat.
Rev. Genet. 5, 435–445.
Gur-Arie, R., Cohen, C.J., Eitan, Y., Shelef, L., Hallerman, E.M., Kashi, Y., 2000. Simple
sequence repeats in Escherichia coli: abundance, distribution, composition, and
polymorphism. Genome Res. 10, 62–71.
Hancock, J.M., 2002. Genome size and the accumulation of simple sequence repeats:
implications of new data from genome sequencing projects. Genetica 115, 93–
103.
Harr, B., Schlotterer, C., 2000. Long microsatellite alleles in Drosophila melanogaster
have a downward mutation bias and short persistence times, which cause their
genome-wide underrepresentation. Genetics 155, 1213–1220.
Houng, H.S., Lott, L., Gong, H., Kuschner, R.A., Lynch, J.A., Metzgar, D., 2009.
Adenovirus microsatellite reveals dynamics of transmission during a recent
epidemic of human adenovirus serotype 14 infection. J. Clin. Microbiol. 47,
2243–2248.
Karaoglu, H., Lee, C.M., Meyer, W., 2005. Survey of simple sequence repeats in
completed fungal genomes. Mol. Biol. Evol. 22, 639–649.
Katti, M.V., Ranjekar, P.K., Gupta, V.S., 2001. Differential distribution of simple
sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. 18,
1161–1167.
Kelkar, Y.D., Tyekucheva, S., Chiaromonte, F., Makova, K.D., 2008. The genome-wide
determinants of human and chimpanzee microsatellite evolution. Genome Res.
18, 30–38.
Kuiken, C., Yusim, K., Boykin, L., Richardson, R., 2005. The Los Alamos hepatitis C
sequence database. Bioinformatics 21, 379–384.
Kumar, P., Chaitanya, P.S., Nagarajaram, H.A., 2011. PSSRdb: a relational database of
polymorphic simple sequence repeats extracted from prokaryotic genomes.
Nucleic Acids Res. 39, D601–D605.
Levinson, G., Gutman, G.A., 1987. Slipped-strand mispairing: a major mechanism for
DNA sequence evolution. Mol. Biol. Evol. 4, 203–221.
Li, Y.C., Korol, A.B., Fahima, T., Nevo, E., 2004. Microsatellites within genes:
structure, function, and evolution. Mol. Biol. Evol. 21, 991–1007.
Molla, M., Delcher, A., Sunyaev, S., Cantor, C., Kasif, S., 2009. Triplet repeat length
bias and variation in the human transcriptome. Proc. Natl. Acad. Sci. USA 106,
17095–17100.
Mrazek, J., 2006. Analysis of distribution indicates diverse functions of simple
sequence repeats in Mycoplasma genomes. Mol. Biol. Evol. 23, 1370–1385.
Mrazek, J., Guo, X., Shah, A., 2007. Simple sequence repeats in prokaryotic genomes.
Proc. Natl. Acad. Sci. USA 104, 8472–8477.
Mudunuri, S.B., Nagarajaram, H.A., 2007. IMEx: imperfect microsatellite extractor.
Bioinformatics 23, 1181–1187.
Mudunuri, S.B., Rao, A.A., Pallamsetty, S., Mishra, P., Nagarajaram, H.A., 2009. VMD:
viral microsatellite database – a comprehensive resource for all viral
microsatellites. J. Comput. Sci. Syst. Biol. 2, 283–286.
O’Dushlaine, C.T., Edwards, R.J., Park, S.D., Shields, D.C., 2005. Tandem repeat copy-
number variation in protein-coding regions of human genes. Genome Biol. 6,
R69.
Pearson, C.E., Nichol Edamura, K., Cleary, J.D., 2005. Repeat instability: mechanisms
of dynamic mutations. Nat. Rev. Genet. 6, 729–742.
Power, P.M., Sweetman, W.A., Gallacher, N.J., Woodhall, M.R., Kumar, G.A., Moxon,
E.R., Hood, D.W., 2009. Simple sequence repeats in Haemophilus influenzae.
Infect. Genet. Evol. 9, 216–228.
Rajendrakumar, P., Biswal, A.K., Balachandran, S.M., Srinivasarao, K., Sundaram,
R.M., 2007. Simple sequence repeats in organellar genomes of rice: frequency
and distribution in genic and intergenic regions. Bioinformatics 23, 1–4.
Segarra, A., Pepin, J.F., Arzul, I., Morga, B., Faury, N., Renault, T., 2010. Detection and
description of a particular Ostreid herpesvirus 1 genotype associated with
massive mortality outbreaks of Pacific oysters, Crassostrea gigas, in France in
2008. Virus Res. 153, 92–99.
Simmonds, P., 2004. Genetic diversity and evolution of hepatitis C virus – 15 years
on. J. Gen. Virol. 85, 3173–3188.
Stallings, R.L., Ford, A.F., Nelson, D., Torney, D.C., Hildebrand, C.E., Moyzis, R.K., 1991.
Evolution and distribution of (GT)n repetitive sequences in mammalian
genomes. Genomics 10, 807–815.
Tautz, D., Trick, M., Dover, G.A., 1986. Cryptic simplicity in DNA is a major source of
genetic variation. Nature 322, 652–656.
Temnykh, S., DeClerck, G., Lukashova, A., Lipovich, L., Cartinhour, S., McCouch, S.,
2001. Computational and experimental analysis of microsatellites in rice (Oryza
sativa L.): frequency, length variation, transposon associations, and genetic
marker potential. Genome Res. 11, 1441–1452.
Toth, G., Gaspari, Z., Jurka, J., 2000. Microsatellites in different eukaryotic genomes:
survey and analysis. Genome Res. 10, 967–981.
Usdin, K., 2008. The biological effects of simple tandem repeats: lessons from the
repeat expansion diseases. Genome Res. 18, 1011–1019.
Wierdl, M., Dominska, M., Petes, T.D., 1997. Microsatellite instability in yeast:
dependence on the length of the microsatellite. Genetics 146, 769–779.