Proc. Natl. Acad. Sci. USA
Vol. 76, No. 10, pp. 5269-5273, October 1979
Mathematical model for studying genetic variation in terms of restriction endonucleases
(molecular evolution/mitochondrial DNA/nucleotide diversity)
MASATOSHI NEI AND WEN-HSIUNG LI
Center for Demographic and Population Genetics, University of Texas Health Science Center, Houston, Texas 77025
Communicated by Motoo Kimura, August 1, 1979
A mathematical model for the evolutionary
change of restriction sites in mitochondrial DNA is developed.
Formulas based on this model are presented for estimating the
number of nucleotide substitutions between two populations
or species. To express the degree of polymorphism in a population
at the nucleotide level, a measure called "nucleotide diversity"
is proposed.
In recent years a number of authors have studied the genetic
variation in mitochondrial DNA (mtDNA) within and between
species by using restriction endonucleases (1-6). An important
finding from these studies is that mtDNA has a high rate of
nucleotide substitution compared with nuclear DNA, and thus
it is suited for studying the genetic divergence of closely related
species (5-7). However, the mathematical theory for analyzing
data from restriction enzyme studies is not well developed. To
our knowledge, the only study is that of Upholt (8).
A restriction endonuclease recognizes a specific sequence of
nucleotide pairs, generally four or six pairs in length, and cleaves
it. Therefore, if a circular DNA such as mtDNA has m such
recognition (restriction) sites, it is fragmented into m segments
after digestion by this enzyme. The number and locations of
restriction sites vary with nucleotide sequence. The higher the
similarity of the two DNA sequences compared, the closer the
cleavage patterns. Therefore, it is possible to estimate the
number of nucleotide substitutions between two homologous
DNAs by comparing the locations of restriction sites. Similarly,
the number of nucleotide substitutions may be estimated from
the proportion of DNA fragments that are common to two or-
ganisms. Upholt (8) studied these two problems, but his for-
mulation is not general and seems to involve some errors. Fur-
thermore, Upholt paid no attention to the apparently high
degree of heterogeneity of DNA sequences within populations
(5). When the genetic divergence between closely related
species is to be studied, it is necessary to eliminate the effect of
this heterogeneity.
The purpose of this paper is to develop a more rigorous
mathematical model of genetic divergence of DNA and present
a statistical method for analyzing data from restriction enzyme
studies. In the first four sections we shall either assume that there
is no polymorphism within populations or consider the genetic
divergence between a pair of organisms (individuals) only. The
assumption of no polymorphism will be removed in the fifth section.
Evolutionary change of restriction sites
Under certain circumstances it is possible to map restriction sites
in DNA. Once these restriction sites are determined for two
different organisms, the proportion of sites shared by them can
be computed. This proportion is expected to decline as the
organisms' DNA sequences diverge. Before studying this problem,
however, we consider the evolutionary change of restriction
sites in a single population.
Consider a mtDNA of mT nucleotide pairs with a G+C
content of g. We note that in many vertebrate species mT is
about 16,000. If all nucleotides are randomly distributed in the
DNA sequence, the expected frequency of restriction sites with
r nucleotide pairs is
$$a = (g/2)^{r_1}\,[(1-g)/2]^{r_2}, \tag{1}$$
in which r1 and r2 are the number of guanines (G) plus cytosines
(C) and the number of adenines (A) plus thymines (T) in the
restriction site, respectively, and r1 + r2=r. (We consider only
those restriction enzymes that recognize a unique sequence.)
For example, if $g = 0.44$ and $m_T = 16{,}000$, the expected
frequency of restriction site G-A-A-T-T-C (EcoRI) is 0.0003 and
the expected total number ($n$) of restriction sites is $m_T a = 4.8$.
Because $a$ is generally small and $m_T$ is large, $n$ follows the
Poisson distribution with mean $m_T a$.
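The EcoRI example can be checked numerically. The following is a minimal sketch of Eq. 1 under the stated assumption of randomly distributed nucleotides; the function names `site_frequency` and `expected_site_count` are illustrative and do not come from the paper.

```python
def site_frequency(g, recognition_seq):
    """Expected frequency a of a recognition sequence (Eq. 1),
    assuming random nucleotide order with G+C content g."""
    r1 = sum(base in "GC" for base in recognition_seq)  # number of G or C
    r2 = len(recognition_seq) - r1                      # number of A or T
    return (g / 2) ** r1 * ((1 - g) / 2) ** r2


def expected_site_count(m_t, g, recognition_seq):
    """Expected number of restriction sites, m_T * a."""
    return m_t * site_frequency(g, recognition_seq)


a = site_frequency(0.44, "GAATTC")                # EcoRI
n = expected_site_count(16_000, 0.44, "GAATTC")
print(f"a = {a:.4f}, m_T * a = {n:.2f}")
```

Note that the unrounded product is slightly below the 4.8 quoted in the text, which follows from using the rounded value a = 0.0003.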
We now study the evolutionary change of the number of
restriction sites in mtDNA. Let $n(t)$ be the number of restriction
sites at time $t$ and $n(0) = n_0$. We make two assumptions: (i) The
expected G+C content stays constant and (ii) nucleotide substitution
occurs randomly and follows the Poisson process with
a rate of substitution of $\lambda$ per unit time (year or generation). We
note that as time goes on the original sites will gradually disappear
while new sites will be formed. Thus, $n(t)$ can be written
as $n_1(t) + n_2(t)$, in which $n_1(t)$ denotes the number of original
sites that remain unchanged and $n_2(t)$ that of new sites. Occasionally
new sites may be formed at a position where the restriction
site sequence once existed but disappeared by mutation.
These new sites are included in $n_2(t)$ rather than in $n_1(t)$.
Under our assumptions the probability that an original restriction
site remains unchanged by time $t$ is $P = e^{-r\lambda t}$.
Therefore, the expectation of $n_1(t)$ is $n_0 e^{-r\lambda t}$. The expectation
of $n_2(t)$ can be obtained in the following way. Consider a randomly
chosen sequence of $r$ nucleotide pairs. The probability
that this sequence has undergone one or more nucleotide substitutions
by time $t$ is $1 - P$. We assume that nucleotide substitution
produces a new random sequence of nucleotides. Then,
the probability that a new restriction site is formed at this position
is $a(1 - P)$. Because there are $m_T$ possible sequences in
the entire DNA, the expected value of $n_2(t)$ is $m_T a(1 - P)$. This
formula can also be derived by a more rigorous but tedious
method. At any rate, the expectation [$E(n)$] of $n(t)$ becomes
$$E(n) = n_0 P + m_T a(1 - P). \tag{2}$$
As expected, $E(n)$ stays constant if $n_0 = m_T a$.
The variance [$V(n)$] of $n(t)$ is obtained by noting that $n_1$ is
binomially distributed, whereas $n_2$ follows the Poisson distribution.
Because $n_1$ and $n_2$ are independent, we have
$$V(n) = n_0 P(1 - P) + m_T a(1 - P). \tag{3}$$
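The mean and variance of the site count can be evaluated directly. The sketch below assumes P = exp(-r * lam * t) as in the text; the function name `site_count_moments` and the numerical values of `lam` and `t` are illustrative assumptions, not values from the paper.

```python
import math


def site_count_moments(n0, m_t, a, lam, t, r):
    """Mean (Eq. 2) and variance (Eq. 3) of the number of restriction
    sites at time t, with back mutation to the original site ignored."""
    P = math.exp(-r * lam * t)           # original site still intact
    mean = n0 * P + m_t * a * (1 - P)    # surviving sites + new sites
    var = n0 * P * (1 - P) + m_t * a * (1 - P)
    return mean, var


m_t, a = 16_000, 0.0003
n0 = m_t * a                             # start at the equilibrium number
mean, var = site_count_moments(n0, m_t, a, lam=0.01, t=5.0, r=6)
print(mean, var)
```

When n0 equals m_T * a the mean stays at n0, and the variance reduces to E(n)(1 - P^2), which is the form used later for the bias of the estimator of S.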
In the above formulation we have regarded the original restriction
sites restored by backward mutations as new sites. For
our purpose, however, it is better to regard them as identical
with the original sites. In this case we need a slightly different
formulation. We first consider the probability ($p_t$) that the
nucleotide at a particular site at time $t$ is the same as that of
$t = 0$. If we assume that the mutation rate is the same for all
directions among the four nucleotides, the recurrence formula
for $p_t$ is given by
$$p_{t+1} = (1 - \lambda)p_t + (\lambda/3)(1 - p_t). \tag{4}$$
The continuous time solution of this equation with the initial
condition $p_0 = 1$ gives
$$p_t = (1 + 3e^{-4\lambda t/3})/4. \tag{5}$$
For a restriction site to exist at the original position, all of the
$r$ nucleotides must be identical with the original ones. Thus, the
probability that a restriction site exists at the original position
at time $t$ is $P = p_t^r$. The mean and variance of $n_1$ are then given
by $n_0 P$ and $n_0 P(1 - P)$, respectively, with the newly defined
$P$. In practice, however, $P = p_t^r$ is close to $e^{-r\lambda t}$ unless $\lambda t$ is
larger than about 0.15. On the other hand, $n_2$ again follows the
Poisson distribution with the mean and variance of
$(m_T - n_0)a(1 - P)/(1 - a)$.
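To see how close the two definitions of P are, both can be tabulated for a few values of the product of rate and time. This is a minimal sketch; the names `p_same` and `site_retention` and the chosen grid of values are illustrative assumptions.

```python
import math


def p_same(lam_t):
    """Eq. 5: probability that a nucleotide is the same as at t = 0,
    allowing back mutation with equal rates in all directions."""
    return (1 + 3 * math.exp(-4 * lam_t / 3)) / 4


def site_retention(lam_t, r, back_mutation=True):
    """P for an r-base restriction site under either formulation."""
    if back_mutation:
        return p_same(lam_t) ** r        # P = p_t^r
    return math.exp(-r * lam_t)          # P = exp(-r * lambda * t)


for lam_t in (0.01, 0.05, 0.15, 0.5):
    with_back = site_retention(lam_t, 6)
    without = site_retention(lam_t, 6, back_mutation=False)
    print(f"{lam_t:5.2f}  {with_back:.4f}  {without:.4f}")
```

The two columns agree closely for small values and drift apart as the product grows, with the back-mutation form always the larger of the two.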
DNA divergence between two populations
Let us now consider DNA divergence between two evolutionary
lineages or populations X and Y. We assume that the mtDNAs
in the two populations were derived from a common ancestral
DNA sequence at time 0. Let $n_{X1}$ and $n_{X2}$ be the number of
ancestral restriction sites and the number of new sites in population
X, respectively, with $n_X = n_{X1} + n_{X2}$, and let $n_{Y1}$, $n_{Y2}$,
and $n_Y$ be the corresponding values in population Y. We denote
the number of identical sites shared by the two populations by
$n_{XY}$. We assume that all identical sites are those that remain
unchanged from the common ancestor. Theoretically, new
mutations may produce identical sites, but the contribution of
new mutations is not so important unless $\lambda t$ is large, as will be
discussed elsewhere. At any rate, under the present assumption
$n_{XY}$ follows a binomial distribution, and the mean and variance
of $n_{XY}$ are given by $n_0 P^2$ and $n_0 P^2(1 - P^2)$, respectively, in
which $P$ is either $e^{-r\lambda t}$ or the $r$th power of $p_t$ in Eq. 5.
On the other hand, the proportion of ancestral restriction sites
that remain unchanged in both lines is $S = n_{XY}/n_0$. The mean
and variance of $S$ are given by
$$E(S) = P^2, \tag{6}$$
$$V(S) = P^2(1 - P^2)/n_0. \tag{7}$$
Therefore, if we use $P = e^{-r\lambda t}$, the mean number of nucleotide
substitutions per nucleotide site ($\delta = 2\lambda t$) is given by
$$\delta = -(\ln S)/r. \tag{8}$$
This relationship is identical with Upholt's (8). On the other
hand, if we use the $p_t$ given by Eq. 5, we have
$$\delta = -(3/2)\ln[(4S^{1/2r} - 1)/3]. \tag{9}$$
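Both conversions from the shared-site proportion to the number of substitutions per site are one-line computations. The sketch below is illustrative; the function names are assumptions, and the round trip at the end simply builds S from a known value and recovers it.

```python
import math


def delta_poisson(S, r):
    """Eq. 8: substitutions per site, ignoring back mutation."""
    return -math.log(S) / r


def delta_back_mutation(S, r):
    """Eq. 9: substitutions per site, allowing back mutation.
    Only defined when 4 * S**(1/(2r)) > 1."""
    return -1.5 * math.log((4 * S ** (1 / (2 * r)) - 1) / 3)


# Round trip for an assumed lambda * t = 0.05 and a 6-base site.
lam_t, r = 0.05, 6
S_no_back = math.exp(-2 * r * lam_t)                      # S = P^2
p_t = (1 + 3 * math.exp(-4 * lam_t / 3)) / 4
S_back = p_t ** (2 * r)                                   # S = p_t^(2r)
print(delta_poisson(S_no_back, r), delta_back_mutation(S_back, r))
```

Each function returns 2 * lam_t when fed the S generated by its own model, which is a quick consistency check on the two formulas.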
To apply Eq. 8 or Eq. 9 to real data, $S$ must be estimated.
Brown et al. (6) used $n_{XY}/(n_X + n_Y - n_{XY})$ as an estimate of
$S$, but this gives an underestimate of $S$. If $n_0$ is known, $S$ may
be estimated by $n_{XY}/n_0$. In practice, of course, it is not known.
However, if we note $E(n_0) = E(n_X) = E(n_Y) = E(n)$, in which
$E(n_0)$ refers to the mean of replicate values of $n_0$, $(n_X + n_Y)/2$
may be used as an estimator of $n_0$. Therefore, $S$ may be estimated by
$$\hat{S} = 2n_{XY}/(n_X + n_Y). \tag{10}$$
Although it is not clear from their description, Upholt and
Dawid (2) seem to have used this formula.
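The difference between the two estimators of S is easy to see on a small example. The counts below are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
def s_hat(n_x, n_y, n_xy):
    """Estimator proposed in the text: shared sites over the mean site count."""
    return 2 * n_xy / (n_x + n_y)


def s_brown(n_x, n_y, n_xy):
    """Estimator attributed to Brown et al.: shared sites over all distinct sites."""
    return n_xy / (n_x + n_y - n_xy)


# Hypothetical counts: 22 and 20 sites, 15 of them shared.
print(s_hat(22, 20, 15), s_brown(22, 20, 15))
```

Whenever some sites are not shared, the second value is smaller than the first, in line with the statement that it underestimates S.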
We now investigate the statistical properties of this estimator.
Using the Taylor expansion and neglecting the third- and
higher-order terms, we obtain
$$E(\hat{S}) = \frac{2E(n_{XY})}{E(n_X) + E(n_Y)} - \frac{2\,\mathrm{Cov}(n_{XY},\, n_X + n_Y)}{[E(n_X) + E(n_Y)]^2} + \frac{2E(n_{XY})V(n_X + n_Y)}{[E(n_X) + E(n_Y)]^3}$$
approximately. Because $n_X$ and $n_Y$ change independently,
$V(n_X + n_Y) = 2V(n)$. We also note that $\mathrm{Cov}(n_{XY}, n_X + n_Y) =
2\,\mathrm{Cov}(n_{XY}, n_X)$. Furthermore, $V(n) = E(n)(1 - P^2)$ if we note
$E(n_0) = m_T a$ in Eq. 3. It can also be shown that $\mathrm{Cov}(n_{XY}, n_X)
= n_0 P^2(1 - P)$, which is $E(n)P^2(1 - P)$ when $n_0 = E(n)$. Therefore,
$$E(\hat{S}) = P^2 - P^2(1 - P)^2/[2E(n)].$$
This indicates that $\hat{S}$ is an underestimate of $P^2$ but the bias is
generally small when $E(n)$ is fairly large.
The approximate variance of $\hat{S}$ can be obtained in the same
way. If we replace $P^2$ by $\hat{S}$ and $E(n)$ by $\bar{n} = (n_X + n_Y)/2$ in the
variance obtained, it becomes
This formula may be used for estimating the variance of $\hat{S}$ from
data. In practice, the second term in the brackets of Eq. 12 is
generally small compared with the first term.
Because $S$ can be estimated by $\hat{S}$, the estimate ($\hat{\delta}$) of $\delta$ may
be obtained by replacing $S$ in Eq. 8 or Eq. 9 by $\hat{S}$. The large-sample
variance of $\hat{\delta}$ obtained by Eq. 8 is given by
$$V(\hat{\delta}) = V(\hat{S})/(r\hat{S})^2$$
approximately, in which $V(\hat{S})$ is given by Eq. 12. On the other
hand, the variance of $\hat{\delta}$ obtained by Eq. 9 is
$$V(\hat{\delta}) = [9\hat{S}^{1/r}V(\hat{S})]/[(4\hat{S}^{1/2r} - 1)r\hat{S}]^2.$$
The above two formulas indicate that the variance of $\hat{\delta}$ is
large when $\bar{n}$ is small. Therefore, it is important to increase the
reliability of $\hat{\delta}$ by using many different restriction enzymes.
When enzymes with the same $r$ value are used, we can add each
of $n_X$, $n_Y$, and $n_{XY}$ for all enzymes and then compute $\hat{\delta}$ and
$V(\hat{\delta})$. However, when enzymes with different $r$ values are used,
$\hat{\delta}$ should be estimated for each $r$ group and then the average
weighted with the reciprocals of variances should be computed.
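Combining estimates from enzyme groups with different r values by reciprocal-variance weighting can be sketched as follows. The function name and the input numbers are hypothetical; the variance returned for the combined estimate is the standard result for independent estimates and is an addition not stated in the text.

```python
def combine_estimates(estimates, variances):
    """Average of per-group estimates weighted by 1 / variance.

    Returns the weighted mean and, assuming the groups are independent,
    the variance of that mean (1 / sum of weights).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, estimates)) / total
    return mean, 1.0 / total


# Hypothetical estimates for a 4-base group and a 6-base group.
mean, var = combine_estimates([0.10, 0.20], [0.01, 0.01])
print(mean, var)
```

With equal variances the result is the plain average; a group with a smaller variance pulls the combined estimate toward its own value.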
In the derivation of Eqs. 8 and 9 we have assumed that the
rate of nucleotide substitution ($\lambda$) is constant over time. However,
these formulas hold regardless of this assumption, provided
nucleotide substitution occurs at random. Nevertheless, if
the rate is constant, $\delta$ is linearly related with the time ($t$) after
divergence between the populations, i.e., $\delta = 2\lambda t$, and thus can
be used for estimating $t$ when $\lambda$ is known.
The above formulation depends on the assumption that the
probability of nucleotide substitution is the same for all nucleotide
sites. In the case of mtDNA this assumption does not
seem to be satisfied. Indeed, data from DNA hybridization
experiments suggest that the rate of nucleotide substitution
greatly varies among sites (6, 7). Uzzell and Corbin (9) have
shown that in the cytochrome c gene the number of nucleotide
substitutions per nucleotide site follows the negative binomial
distribution when synonymous codons are disregarded. This
suggests that the rate of nucleotide substitution per site follows
the gamma distribution. If we assume that the same distribution
applies to mtDNA, we can evaluate the effect of variation of
substitution rate ($\lambda$) on the estimate of nucleotide substitutions.
In the following we assume that $\lambda$ is constant over evolutionary
time but varies among restriction sites following the gamma
distribution
$$f(\lambda) = [\beta^{\alpha}/\Gamma(\alpha)]\,e^{-\beta\lambda}\lambda^{\alpha - 1},$$
in which $\alpha = \bar{\lambda}^2/V_\lambda$ and $\beta = \bar{\lambda}/V_\lambda$, in which $\bar{\lambda}$ and $V_\lambda$ are the
mean and variance of $\lambda$, respectively. If we use $P = e^{-r\lambda t}$, the
mean of $S$ in Eq. 6 becomes
$$E(S) = \int_0^\infty e^{-2r\lambda t} f(\lambda)\,d\lambda = [\alpha/(\alpha + 2r\bar{\lambda}t)]^{\alpha}. \tag{15}$$
At the present time the value of $\alpha$ is not known, but probably
$\alpha > 1$ in most cases. In the cytochrome c gene $\alpha$ has been estimated
to be about 2. It is noted that when $\alpha > 1$ the difference
between Eqs. 8 and 15 is small as long as $S$ is larger than 0.7 but
increases as $S$ declines further (10). If $\alpha$ is known, the average
number of nucleotide substitutions ($\delta = 2\bar{\lambda}t$) should be estimated
by using Eq. 15. For example, if $\alpha = 2$,
$$\delta = (2/r)(1/\sqrt{S} - 1).$$
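For a general shape parameter, the gamma-rate relationship can be inverted in closed form. The sketch below assumes the mean of S has the form [alpha / (alpha + r * delta)]^alpha, which is what integrating exp(-2 r lambda t) over a gamma density with shape alpha gives; the function name `delta_gamma` is illustrative.

```python
import math


def delta_gamma(S, r, alpha):
    """Substitutions per site when the rate varies among sites following
    a gamma distribution with shape alpha (inverse of the mean of S)."""
    return (alpha / r) * (S ** (-1.0 / alpha) - 1.0)


S, r = 0.49, 4
print(delta_gamma(S, r, alpha=2))        # the alpha = 2 case from the text
print(delta_gamma(S, r, alpha=1e6))      # nearly uniform rates
print(-math.log(S) / r)                  # Eq. 8 for comparison
```

As alpha becomes large the rate variation vanishes and the value approaches Eq. 8; for small alpha the same S implies more substitutions.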
Evolutionary change of DNA fragments
The current experimental method of comparing restriction-site
maps is laborious and may not be suited for a large-scale population
survey. A simpler method is to compare the electrophoretic
patterns of DNA digested by a restriction endonuclease
between the two species or populations in question. The degree
of genetic divergence of DNA between the two populations is
expected to be correlated with the proportion of DNA fragments
shared by them. Let us now study the relationship between
these two quantities.
For a given DNA fragment to be conserved in the evolutionary
process, two conditions must be met, as noted by Upholt
(8). (i) Two external restriction sites remain unchanged, and
(ii) no new restriction sites occur within the fragment. The
probability of the first event is obviously $P^2$. The probability
of the second event can be obtained in the following way. Let
$m$ be the number of nucleotides in this fragment. Then there
are $m - r + 1$ possible sequences of $r$ nucleotides between the
two external restriction sites. As shown before, the probability
for a randomly chosen $r$-base sequence to become a new restriction
site by time $t$ is $b = a(1 - P)$. Thus, the probability that
no new sites are formed in this fragment by time $t$ is
$(1 - b)^{m-r+1}$, and the probability that this fragment remains
unchanged in both populations is $P^4(1 - b)^{2(m-r+1)}$. Because there
are $n_0$ fragments originally, the proportion of fragments shared
by the two populations is
$$F = (1/n_0)\sum_{i=1}^{n_0} P^4(1 - b)^{2(m_i - r + 1)},$$
in which $m_i$ is the number of nucleotide sites in the $i$th fragment.
In practice, the above formula is not applicable, because $n_0$
and $m_i$ are not known. However, it is possible to compute the
probability of formation of a fragment of $m$ nucleotides under
the assumption of random nucleotide distribution. It is given
by $a(1 - a)^{m-r}/T$, in which $T$ is the normalizing factor and
$T = 1 - (1 - a)^{m_T - r + 1}$.
The expected proportion of fragments that remain unchanged
in both populations at time $t$ is then given by
$$F = \sum_{m=r}^{m_T} P^4(1 - b)^{2(m-r+1)}\,a(1 - a)^{m-r}/T.$$
Assuming that $(m_T - r + 1)a$ is so large that $T$ is close to 1, we have
$$F = a(1 - b)^2 P^4/[a(1 - b)^2 + b(2 - b)].$$
This formula is different from Upholt's. Because $a$ is usually
much smaller than 1 and $b = a(1 - P)$, the above formula can
be approximated by
$$F \approx P^4/(3 - 2P). \tag{20}$$
Using $P = e^{-r\lambda t}$ and $\delta = 2\lambda t$, $F$ can be related to $\delta$. The
relationship between $\delta$ and $F$ is shown in Fig. 1 for $r = 4$ and 6. This
relationship may be used for estimating $\delta$ from $F$.
To estimate $F$, we propose the following estimator.
$$\hat{F} = 2n_{XY}/(n_X + n_Y),$$
in which $n_X$ and $n_Y$ are the numbers of fragments in populations
X and Y, respectively, whereas $n_{XY}$ is the number of
fragments shared by the two populations.
In the above formulation, we have not considered back
mutation. This is justified because the "fragment" method can
be used only when $\delta$ is relatively small.
FIG. 1. Relationship between the proportion of shared DNA
fragments ($F$) and the number of nucleotide substitutions per site
($\delta$). Abscissa: number of nucleotide substitutions per site.
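The curve in Fig. 1 cannot be inverted in closed form, but it is monotonic, so the number of substitutions per site can be recovered from the shared-fragment proportion numerically. The sketch below assumes the approximation F = P^4 / (3 - 2P) with P = exp(-r * delta / 2); the function names and the use of bisection are illustrative choices, not the authors' procedure.

```python
import math


def shared_fragment_fraction(delta, r):
    """Approximate expected proportion of shared fragments (Eq. 20)."""
    P = math.exp(-r * delta / 2)         # P = exp(-r * lambda * t)
    return P ** 4 / (3 - 2 * P)


def delta_from_fragments(F, r, tol=1e-12):
    """Invert Eq. 20 by bisection; F decreases as delta increases."""
    if not 0 < F <= 1:
        raise ValueError("F must lie in (0, 1]")
    lo, hi = 0.0, 1.0
    while shared_fragment_fraction(hi, r) > F:   # widen until bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if shared_fragment_fraction(mid, r) > F:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


F = shared_fragment_fraction(0.05, 6)
print(F, delta_from_fragments(F, 6))
```

The round trip returns the starting value, and F falls off quickly with divergence, which is consistent with the remark that the fragment method is useful only when the divergence is relatively small.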
FIG. 2. Evolutionary tree used in the computer simulation.
Numbers 1, 2, ..., 8 represent descendant DNA sequences. M is the
expected number of nucleotide substitutions for the shortest branch.
In the present simulation M was 8 per 300 nucleotide sites or 100 codons.
Genetics: Nei and Li
Table 1. Number of shared restriction sites and shared DNA fragments between
DNA sequences in a computer simulation
The eight DNA sequences represent those given in Fig. 2. Figures above the diagonal are the numbers
of shared restriction sites, whereas those below the diagonal are the number of shared DNA fragments.
Figures on the diagonal refer to numbers of restriction sites for each descendant sequence.
In order to see the accuracy of the theory developed we have
done a computer simulation. In practice, we used artificial
nucleotide sequences generated in the work of Y. Tateno and
M. Nei (unpublished) on molecular taxonomy. In this study a
hypothetical sequence of 6000 nucleotide pairs in a circular
form was used. An ancestral sequence of random nucleotides
was generated by using pseudorandom numbers with an ex-
pected G+C content of 0.5, and from this sequence eight de-
scendant sequences were produced following the evolutionary
tree given in Fig. 2. The number of nucleotide substitutions for
each branch in this figure followed the Poisson distribution with
the mean given along the branch (per 300 nucleotide sites or
100 codons). After generating the eight descendant sequences,
we determined the locations of restriction sites for five different
hypothetical endonucleases in all of them. Each restriction
enzyme was assumed to recognize a particular sequence of four
nucleotide pairs.
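The site-counting step of such a simulation can be sketched as follows. This is not the authors' program: it only generates one random circular sequence and counts recognition sites, without any evolutionary tree. The five 4-base recognition sequences and the seed are arbitrary choices made for illustration.

```python
import random


def random_circular_dna(length, gc_content, rng):
    """Random sequence with the given expected G+C content."""
    at, gc = (1 - gc_content) / 2, gc_content / 2
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def count_sites(seq, recognition_seq):
    """Count occurrences of a recognition sequence on a circular molecule."""
    r = len(recognition_seq)
    wrapped = seq + seq[: r - 1]         # let sites span the origin
    return sum(wrapped[i:i + r] == recognition_seq for i in range(len(seq)))


rng = random.Random(1979)                # fixed seed for reproducibility
seq = random_circular_dna(6000, 0.5, rng)
enzymes = ["AGCT", "GATC", "CCGG", "TCGA", "GGCC"]   # illustrative only
total = sum(count_sites(seq, e) for e in enzymes)
expected = len(enzymes) * 6000 * 0.25 ** 4
print(total, expected, expected ** 0.5)
```

The expected total and its Poisson standard deviation, about 117 and 10.8, are the values against which the simulated site counts are compared in the text.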
Identity of Restriction Sites. The total number of restriction
sites for the five "enzymes" in each descendant sequence is
given in Table 1 together with the number of sites shared by
each pair of sequences. Using these data, we estimated $S$ and
$\delta$. The results obtained are presented in Table 2. When two or
more sequence comparisons have the same $\delta$ value (e.g., 1-3
vs. 4), the average of the $\hat{\delta}$'s for all comparisons is presented. The
$\delta$ value was estimated by Eqs. 8 and 9; the estimate of $\delta$ obtained
by Eq. 8 is designated by $\hat{\delta}_1$ and that obtained by Eq. 9
by $\hat{\delta}_2$.
Table 1 shows that $n$ is 100 to 115. These values are somewhat
smaller than the expected value of 5 × 23.4 = 117, but the
differences are not statistically significant because the expected
standard deviation is 10.8. The values of $\hat{\delta}_1$ and $\hat{\delta}_2$ are also not
far from the expected value of $\delta$ if we consider the large stochastic
error to which they are subject. Theoretically, $\hat{\delta}_2$ is a
better estimate than $\hat{\delta}_1$ as mentioned earlier, but in practice
there is not much difference between the two estimates. In the
comparison of 1-7 vs. 8, $\hat{\delta}_2$ (and also $\hat{\delta}_1$) is somewhat smaller
than the expected value. This smaller value occurred largely
because the proportion of identical sites was affected appreciably
by new mutation in this case. Indeed, when we disregarded
the identical sites due to new mutation, the $\hat{\delta}_2$ value was
0.378, which is close to the expected value of 0.373. The effect
of mutation was observed also in the case of smaller $\delta$ values,
but it was not so serious as in the case of $\delta = 0.373$.
One important finding in the present simulation is that the
estimate of $\delta$ is subject to a large stochastic error when $n_X$, $n_Y$,
and $n_{XY}$ are small. For example, when only one type of "restriction
enzyme" is used, $E(n)$ is 23.4. In this case the $\hat{\delta}_2$ value
for 6 vs. 8 took the values of 0.452, 0.348, 0.253, 0.423, and 0.358
for the five different types of "restriction enzymes" used.
Therefore, it is important to use a large number of restriction
enzymes. Of course, the accuracy of $\hat{\delta}_2$ depends on the number
of base pairs in the restriction site. The sampling error of $\hat{\delta}_2$ is
expected to be smaller for $r = 6$ than for $r = 4$ when $S$ is the same.
Identity of DNA Fragments. Using data on restriction-site
maps in the eight descendant DNA sequences, we computed
the number of identical DNA fragments that were shared by
each pair of sequences (Table 1). We then estimated $F$ and $\delta$;
the results are presented in Table 2. The estimate of $\delta$ obtained
by this method is designated by $\hat{\delta}_3$. It is clear that $\hat{\delta}_3$ again
roughly agrees with the expected value. In this case the effect
of mutation on the estimate of $\delta$ is not so large as in the case of
the "identical sites" method, because the probability of formation
of identical fragments by mutation is smaller than that of formation
of identical restriction sites. However, the sampling
error of $\hat{\delta}_3$ is generally larger than that of $\hat{\delta}_1$ or $\hat{\delta}_2$.
Table 2. Estimates ($\hat{\delta}_1$, $\hat{\delta}_2$, $\hat{\delta}_3$) of the number of nucleotide substitutions
in comparison with the expected numbers ($\delta$)
1 vs. 2
1-2 vs. 3
1-4 vs. 5
1-5 vs. 6
1-6 vs. 7
1-7 vs. 8
$\hat{\delta}_1$, $\hat{\delta}_2$, and $\hat{\delta}_3$ were obtained through Eqs. 8, 9, and 20, respectively. When two or more sequence
comparisons have the same $\delta$ value, the averages of the estimates are presented. Similarly, $\hat{S}$ and $\hat{F}$ are the
averages for all comparisons having the same $\delta$ value. Therefore, $\hat{\delta}_1$, $\hat{\delta}_2$, and $\hat{\delta}_3$ are not directly obtainable
from the $\hat{S}$ and $\hat{F}$ values presented except in the comparison of 1 vs. 2. These results were obtained by
In population genetics it is customary to measure the genic
variation of a population in terms of heterozygosity or gene
diversity (11). In the case of mtDNA, however, this measure is
not appropriate, because mtDNA contains many genes and thus
the gene diversity would be close to 1 in many populations. In
this case genic variation may be measured more appropriately
by the average number of nucleotide differences per site between
two randomly chosen DNA sequences. We call this the
index of nucleotide diversity or simply nucleotide diversity,
and denote it by $\pi$. It is defined as
$$\pi = \sum_{ij} x_i x_j \pi_{ij},$$
in which $x_i$ is the frequency of the $i$th sequence in the population
and $\pi_{ij}$ is the number of nucleotide differences per
nucleotide site between the $i$th and $j$th sequences.
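The definition is a frequency-weighted double sum over sequence types and can be computed directly. In the sketch below the frequencies and pairwise differences are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def nucleotide_diversity(freqs, pi):
    """Sum over all ordered pairs (i, j) of x_i * x_j * pi_ij."""
    k = len(freqs)
    return sum(freqs[i] * freqs[j] * pi[i][j]
               for i in range(k) for j in range(k))


# Three hypothetical sequence types with their frequencies and
# pairwise differences per nucleotide site (zero on the diagonal).
freqs = [0.5, 0.3, 0.2]
pi = [
    [0.00, 0.01, 0.02],
    [0.01, 0.00, 0.03],
    [0.02, 0.03, 0.00],
]
print(nucleotide_diversity(freqs, pi))
```

Because the sum runs over ordered pairs, each unordered pair of distinct types contributes twice, and identical pairs contribute nothing.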
The nucleotide diversity may be estimated from restriction
enzyme data if we know $x_i$ and $\pi_{ij}$. The value of $\pi_{ij}$ can be
estimated either from $S$ or from $F$ as mentioned above. When
data on restriction-site maps are available, it is also possible to
compute the average proportion of shared sites between two
randomly chosen DNA sequences. It is given by
$$\bar{S} = \sum_{ij} x_i x_j S_{ij}.$$
This will give another estimate of $\pi$. That is,
$$\pi = -(\ln \bar{S})/r.$$
In the preceding sections we presented formulas for esti-
mating the number of nucleotide substitutions between two
populations under the assumption that the effect of intra-
populational variation is negligible. When the populations to
be compared are closely related, this assumption will not gen-
erally be satisfied. In this case the intrapopulational variation
should be subtracted from the total interpopulational difference.
Let $x_i$ and $y_i$ be the frequencies of the $i$th restriction-site
sequence in populations X and Y, respectively. Then, the $\pi$
values for populations X and Y may be estimated by $\hat{\pi}_X =
\sum_{ij} x_i x_j \pi_{ij}$ and $\hat{\pi}_Y = \sum_{ij} y_i y_j \pi_{ij}$, respectively, whereas the
average number of nucleotide differences between two randomly
chosen DNA sequences, one from each of X and Y, may be
estimated by $\hat{\pi}_{XY} = \sum_{ij} x_i y_j \pi_{ij}$. Therefore, the estimate of net
nucleotide differences between the two populations is given by
$$\hat{\delta} = \hat{\pi}_{XY} - (\hat{\pi}_X + \hat{\pi}_Y)/2.$$
As mentioned earlier, $\pi_{ij}$ may be obtained either from $S$ or $F$.
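The correction for within-population variation can be sketched numerically. The code assumes that the net difference is the between-population average minus the mean of the two within-population diversities; the function names and the two-type example are hypothetical.

```python
def mean_pairwise(freqs_a, freqs_b, pi):
    """Average difference between one sequence drawn with frequencies
    freqs_a and one drawn with frequencies freqs_b."""
    return sum(a * b * pi[i][j]
               for i, a in enumerate(freqs_a)
               for j, b in enumerate(freqs_b))


def net_divergence(x, y, pi):
    """Between-population difference minus the average within-population one."""
    pi_x = mean_pairwise(x, x, pi)
    pi_y = mean_pairwise(y, y, pi)
    pi_xy = mean_pairwise(x, y, pi)
    return pi_xy - (pi_x + pi_y) / 2


pi = [[0.00, 0.02],
      [0.02, 0.00]]                      # two hypothetical sequence types
print(net_divergence([1.0, 0.0], [0.0, 1.0], pi))   # fixed for different types
print(net_divergence([0.5, 0.5], [0.5, 0.5], pi))   # identical populations
```

Two populations with the same type frequencies give a net value of zero even though sequences drawn from them differ on average, which is the effect the subtraction is meant to remove.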
Another way of estimating $\delta$ is to use the normalized proportion
of shared sites between X and Y. It is defined as
$$S = S_{XY}/\sqrt{S_X S_Y},$$
in which $S_X = \sum_{ij} x_i x_j S_{ij}$, $S_Y = \sum_{ij} y_i y_j S_{ij}$, and $S_{XY} = \sum_{ij} x_i y_j S_{ij}$.
The $\delta$ value is then given by Eq. 8. This method is analogous
to that of estimating genetic distance from gene frequency data.
The theory developed in this paper is dependent on the as-
sumption that all nucleotides are distributed at random over
the DNA sequences with a given G+C content. Available data
suggest that this assumption is not always satisfied. Brown (12)
has shown that the contents of thymines and guanines in the
heavy strand of mtDNA are considerably different from those
of the light strand in man, green monkey, and mouse. However,
because we are concerned with the evolutionary change of
mtDNA, the nonrandom distribution would not affect our estimate
of nucleotide substitutions seriously unless it is extreme.
At the present time the magnitude of nucleotide diversity
($\pi$) in natural populations is not well known. The Peromyscus
polionotus data of Avise et al. (5) suggest that $\pi$ is of the order
of 0.01, whereas in man it seems to be of the order of 0.002 (6).
This quantity is expected to vary from population to population
even in the same species. Therefore, it is important to make
correction for this factor in the estimation of the degree of
nucleotide divergence between closely related species.
Theoretically, it is possible to express nucleotide diversity $\pi$
in terms of the mutation rate per nucleotide site per host generation
($\mu$) and the effective population size (13-15). In the case
of mitochondria, which are maternally inherited, $\pi$ is approximately
given by $2N_m\mu$, in which $N_m$ is the number of
female adult individuals. We note that there is little genetic
heterogeneity among mtDNAs of one host individual in
mammals. On the other hand, the average heterozygosity for
nuclear genes may be expressed as $H = 4N_n v/(4N_n v + 1)$, in
which $N_n$ is the effective population size for nuclear genes and
equal to the number of both male and female individuals, and
$v$ is the mutation rate per gene. In P. polionotus, $H$ has been
estimated to be 0.08 for isozyme data (16). If we assume that
an average structural gene consists of 1000 nucleotide pairs and
only 1/10th of nucleotide variation in structural genes is de-
tectable by electrophoresis, the average nucleotide difference
per site between two randomly chosen nuclear genes becomes
0.0008. Therefore, it seems that mtDNA is much more variable
than structural genes in nuclear DNA. This conclusion is dif-
ferent from Langley and Shah's (17) that they are almost
equally variable in Drosophila.
We thank W. M. Brown and A. C. Wilson for their valuable com-
ments on the manuscript. This study was supported by research grants
from the National Science Foundation and the National Institutes of
Health.
1. Potter, S. T., Newbold, J. E., Hutchison, C. A. & Edgell, M. H. (1975) Proc. Natl. Acad. Sci. USA 72, 4496-4500.
2. Upholt, W. B. & Dawid, I. B. (1977) Cell 11, 571-583.
3. Levings, C. S. & Pring, D. R. (1977) J. Hered. 68, 350-354.
4. Parker, R. C. & Watson, R. M. (1977) Nucleic Acids Res. 4,
5. Avise, J. C., Lansman, R. A. & Shade, R. O. (1979) Genetics 92,
6. Brown, W. M., George, M. & Wilson, A. C. (1979) Proc. Natl. Acad. Sci. USA 76, 1967-1971.
7. Dawid, I. B. (1972) Dev. Biol. 29, 139-151.
8. Upholt, W. B. (1977) Nucleic Acids Res. 4, 1257-1265.
9. Uzzell, T. & Corbin, K. W. (1971) Science 172, 1089-1096.
10. Nei, M. (1980) Proceedings of the XIV International Congress of Genetics, Moscow, U.S.S.R., in press.
11. Nei, M. (1975) Molecular Population Genetics and Evolution
12. Brown, W. M. (1976) Dissertation (California Inst. Tech., Pasa-
13. Kimura, M. (1969) Genetics 61, 893-903.
14. Watterson, G. A. (1975) Theor. Pop. Biol. 7, 256-276.
15. Li, W.-H. (1977) Genetics 85, 331-337.
16. Selander, R. K., Smith, M. H., Yang, S. Y., Johnson, W. E. & Gentry, J. B. (1971) Stud. Genet. 6, 49-90.
17. Langley, C. H. & Shah, D. M. (1979) Nature (London), in press.