
Proc. Natl. Acad. Sci. USA

Vol. 76, No. 10, pp. 5269-5273, October 1979

Genetics

Mathematical model for studying genetic variation in terms of restriction endonucleases

(molecular evolution/mitochondrial DNA/nucleotide diversity)

MASATOSHI NEI AND WEN-HSIUNG LI

Center for Demographic and Population Genetics, University of Texas Health Science Center, Houston, Texas 77025

Communicated by Motoo Kimura, August 1, 1979

ABSTRACT

A mathematical model for the evolutionary change of restriction sites in mitochondrial DNA is developed. Formulas based on this model are presented for estimating the number of nucleotide substitutions between two populations or species. To express the degree of polymorphism in a population at the nucleotide level, a measure called "nucleotide diversity" is proposed.

In recent years a number of authors have studied the genetic variation in mitochondrial DNA (mtDNA) within and between species by using restriction endonucleases (1-6). An important finding from these studies is that mtDNA has a high rate of nucleotide substitution compared with nuclear DNA, and thus it is suited for studying the genetic divergence of closely related species (5-7). However, the mathematical theory for analyzing data from restriction enzyme studies is not well developed. To our knowledge, the only study is that of Upholt (8).

A restriction endonuclease recognizes a specific sequence of nucleotide pairs, generally four or six pairs in length, and cleaves it. Therefore, if a circular DNA such as mtDNA has m such recognition (restriction) sites, it is fragmented into m segments after digestion by this enzyme. The number and locations of restriction sites vary with nucleotide sequence: the more similar the two DNA sequences compared, the more similar their cleavage patterns. Therefore, it is possible to estimate the number of nucleotide substitutions between two homologous DNAs by comparing the locations of restriction sites. Similarly, the number of nucleotide substitutions may be estimated from the proportion of DNA fragments that are common to two organisms. Upholt (8) studied these two problems, but his formulation is not general and seems to involve some errors. Furthermore, Upholt paid no attention to the apparently high degree of heterogeneity of DNA sequences within populations (5). When the genetic divergence between closely related species is to be studied, it is necessary to eliminate the effect of this heterogeneity.

The purpose of this paper is to develop a more rigorous mathematical model of genetic divergence of DNA and present a statistical method for analyzing data from restriction enzyme studies. In the first four sections we shall either assume that there is no polymorphism within populations or consider the genetic divergence between a pair of organisms (individuals) only. The assumption of no polymorphism will be removed in the fifth section.

Evolutionary change of restriction sites

Under certain circumstances it is possible to map restriction sites in DNA. Once these restriction sites are determined for two different organisms, the proportion of sites shared by them can be computed. This proportion is expected to decline as the organisms' DNA sequences diverge. Before studying this problem, however, we consider the evolutionary change of restriction sites in a single population.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Consider a mtDNA of $m_T$ nucleotide pairs with a G+C content of $g$. We note that in many vertebrate species $m_T$ is about 16,000. If all nucleotides are randomly distributed in the DNA sequence, the expected frequency of restriction sites with $r$ nucleotide pairs is

$$a = (g/2)^{r_1}\,[(1 - g)/2]^{r_2}, \tag{1}$$

in which $r_1$ and $r_2$ are the number of guanines (G) plus cytosines (C) and the number of adenines (A) plus thymines (T) in the restriction site, respectively, and $r_1 + r_2 = r$. (We consider only those restriction enzymes that recognize a unique sequence.) For example, if $g = 0.44$ and $m_T = 16{,}000$, the expected frequency of the restriction site G-A-A-T-T-C (EcoRI) is 0.0003 and the expected total number ($n$) of restriction sites is $m_T a = 4.8$. Because $a$ is generally small and $m_T$ is large, $n$ follows the Poisson distribution with mean $m_T a$.
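As a quick check of Eq. 1, the short Python sketch below computes $a$ for a given recognition sequence and the expected number of sites $m_T a$. It assumes independent, randomly distributed nucleotides as in the text; the function name `site_frequency` is a hypothetical helper introduced for illustration.

```python
def site_frequency(site: str, g: float) -> float:
    """Expected frequency a of a restriction site (Eq. 1), assuming
    nucleotides are independent with G+C content g."""
    r1 = sum(base in "GC" for base in site)   # positions occupied by G or C
    r2 = sum(base in "AT" for base in site)   # positions occupied by A or T
    if r1 + r2 != len(site):
        raise ValueError("site must contain only A, C, G, T")
    return (g / 2) ** r1 * ((1 - g) / 2) ** r2

# EcoRI example from the text: g = 0.44, m_T = 16,000
a_ecori = site_frequency("GAATTC", 0.44)
n_ecori = 16_000 * a_ecori        # Poisson mean m_T * a

# Setting of the computer simulation: g = 0.5, a 4-bp site, m_T = 6000
a_four = site_frequency("GATC", 0.5)
n_four = 6000 * a_four

print(f"EcoRI: a = {a_ecori:.7f}, expected sites = {n_ecori:.2f}")
print(f"4-bp site, g = 0.5: expected sites = {n_four:.2f}")
```

The unrounded EcoRI value is about 4.76 sites; the text's 4.8 comes from rounding $a$ to 0.0003. The 4-bp case gives $6000/256 \approx 23.4$ sites per enzyme, the value of $E(n)$ used later in the computer simulation.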

We now study the evolutionary change of the number of restriction sites in mtDNA. Let $n(t)$ be the number of restriction sites at time $t$ and $n(0) = n_0$. We make two assumptions: (i) the expected G+C content stays constant and (ii) nucleotide substitution occurs randomly and follows the Poisson process with a rate of substitution of $\lambda$ per unit time (year or generation). We note that as time goes on the original sites will gradually disappear while new sites will be formed. Thus, $n(t)$ can be written as $n_1(t) + n_2(t)$, in which $n_1(t)$ denotes the number of original sites that remain unchanged and $n_2(t)$ that of new sites. Occasionally new sites may be formed at a position where the restriction site sequence once existed but disappeared by mutation. These new sites are included in $n_2(t)$ rather than in $n_1(t)$.

Under our assumptions the probability that an original restriction site remains unchanged by time $t$ is $P = e^{-r\lambda t}$. Therefore, the expectation of $n_1(t)$ is $n_0 e^{-r\lambda t}$. The expectation of $n_2(t)$ can be obtained in the following way. Consider a randomly chosen sequence of $r$ nucleotide pairs. The probability that this sequence has undergone one or more nucleotide substitutions by time $t$ is $1 - P$. We assume that nucleotide substitution produces a new random sequence of nucleotides. Then, the probability that a new restriction site is formed at this position is $a(1 - P)$. Because there are $m_T$ possible sequences in the entire DNA, the expected value of $n_2(t)$ is $m_T a(1 - P)$. This formula can also be derived by a more rigorous but tedious method. At any rate, the expectation $[E(n)]$ of $n(t)$ becomes

$$E(n) = n_0 P + m_T a(1 - P). \tag{2}$$

As expected, $E(n)$ stays constant if $n_0 = m_T a$.

The variance $[V(n)]$ of $n(t)$ is obtained by noting that $n_1$ is binomially distributed, whereas $n_2$ follows the Poisson distribution. Because $n_1$ and $n_2$ are independent, we have

$$V(n) = n_0 P(1 - P) + m_T a(1 - P). \tag{3}$$
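A minimal sketch of Eqs. 2 and 3 follows, assuming $P = e^{-r\lambda t}$. The parameter values are hypothetical, and passing `lam=1.0` simply lets `t` stand for the product $\lambda t$. Starting from $n_0 = m_T a$, the mean stays at $m_T a$ while the variance equals $m_T a(1 - P^2)$, the relation used later in deriving Eq. 11.

```python
import math

def site_count_moments(n0, m_T, a, r, lam, t):
    """Mean and variance of the number of restriction sites n(t)
    (Eqs. 2 and 3), with P = exp(-r * lam * t)."""
    P = math.exp(-r * lam * t)
    mean = n0 * P + m_T * a * (1 - P)            # Eq. 2
    var = n0 * P * (1 - P) + m_T * a * (1 - P)   # Eq. 3: binomial n1 plus Poisson n2
    return mean, var

# Hypothetical setting: 4-bp sites, G+C = 0.5, m_T = 6000, start at equilibrium
m_T, a, r = 6000, 0.25 ** 4, 4
n0 = m_T * a
for lam_t in (0.0, 0.05, 0.2):
    mean, var = site_count_moments(n0, m_T, a, r, lam=1.0, t=lam_t)
    print(f"lambda*t = {lam_t:4.2f}: E(n) = {mean:.2f}, V(n) = {var:.2f}")
```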


In the above formulation we have regarded the original restriction sites restored by backward mutations as new sites. For our purpose, however, it is better to regard them as identical with the original sites. In this case we need a slightly different formulation. We first consider the probability ($p_t$) that the nucleotide at a particular site at time $t$ is the same as that at $t = 0$. If we assume that the mutation rate is the same for all directions among the four nucleotides, the recurrence formula for $p_t$ is given by

$$p_{t+1} = (1 - \lambda)p_t + \tfrac{1}{3}\lambda(1 - p_t). \tag{4}$$

The continuous-time solution of this equation with the initial condition $p_0 = 1$ gives

$$p_t = \left(1 + 3e^{-4\lambda t/3}\right)/4. \tag{5}$$

For a restriction site to exist at the original position, all of the $r$ nucleotides must be identical with the original ones. Thus, the probability that a restriction site exists at the original position at time $t$ is $P = p_t^r$. The mean and variance of $n_1$ are then given by $n_0 P$ and $n_0 P(1 - P)$, respectively, with the newly defined $P$. In practice, however, $P = p_t^r$ is close to $e^{-r\lambda t}$ unless $\lambda t$ is larger than about 0.15. On the other hand, $n_2$ again follows the Poisson distribution with the mean and variance of $(m_T - n_0)a(1 - P)/(1 - a)$.
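The sketch below compares $P = p_t^r$ (Eq. 5) with the simpler $P = e^{-r\lambda t}$ for a 4-bp site, and checks that iterating the discrete recurrence of Eq. 4 with a small $\lambda$ reproduces the continuous solution. The $\lambda t$ values are illustrative choices and the function names are hypothetical.

```python
import math

def p_same(lam_t):
    """Eq. 5: probability that a nucleotide is unchanged, back mutations allowed."""
    return (1 + 3 * math.exp(-4 * lam_t / 3)) / 4

def p_recurrence(lam, steps):
    """Iterate the discrete recurrence of Eq. 4 starting from p_0 = 1."""
    p = 1.0
    for _ in range(steps):
        p = (1 - lam) * p + lam * (1 - p) / 3
    return p

r = 4
for lam_t in (0.05, 0.15, 0.3):
    P_back = p_same(lam_t) ** r        # P = p_t^r
    P_simple = math.exp(-r * lam_t)    # P = exp(-r * lambda * t)
    print(f"lambda*t = {lam_t:4.2f}: p_t^r = {P_back:.4f}, "
          f"exp(-r*lambda*t) = {P_simple:.4f}")
```

The two forms of $P$ differ by less than 0.01 at $\lambda t = 0.15$ and drift further apart beyond that, in line with the rule of thumb in the text.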

DNA divergence between two populations

Let us now consider DNA divergence between two evolutionary lineages or populations X and Y. We assume that the mtDNAs in the two populations were derived from a common ancestral DNA sequence at time 0. Let $n_{X1}$ and $n_{X2}$ be the number of ancestral restriction sites and the number of new sites in population X, respectively, with $n_X = n_{X1} + n_{X2}$, and let $n_{Y1}$, $n_{Y2}$, and $n_Y$ be the corresponding values in population Y. We denote the number of identical sites shared by the two populations by $n_{XY}$. We assume that all identical sites are those that remain unchanged from the common ancestor. Theoretically, new mutations may produce identical sites, but the contribution of new mutations is not so important unless $\lambda t$ is large, as will be discussed elsewhere. At any rate, under the present assumption $n_{XY}$ follows a binomial distribution, and the mean and variance of $n_{XY}$ are given by $n_0 P^2$ and $n_0 P^2(1 - P^2)$, respectively, in which $P$ is either $e^{-r\lambda t}$ or the $r$th power of $p_t$ in Eq. 5.

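On the other hand, the proportion of ancestral restriction sites that remain unchanged in both lines is $S = n_{XY}/n_0$. The mean and variance of $S$ are given by

$$E(S) = P^2, \tag{6}$$

$$V(S) = P^2(1 - P^2)/n_0. \tag{7}$$

Therefore, if we use $P = e^{-r\lambda t}$, so that $S = P^2 = e^{-2r\lambda t}$, the mean number of nucleotide substitutions per nucleotide site ($\delta = 2\lambda t$) is given by

$$\delta = -(\ln S)/r. \tag{8}$$

This relationship is identical with Upholt's (8). On the other hand, if we use the $p_t$ given by Eq. 5, setting $p_t = S^{1/(2r)}$ and solving for $\delta = 2\lambda t$, we have

$$\delta = -\tfrac{3}{2}\ln\left[\left(4S^{1/(2r)} - 1\right)/3\right]. \tag{9}$$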
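Eqs. 8 and 9 translate directly into code. The sketch below (hypothetical function names) evaluates both for the 1 vs. 2 comparison, the one case in Table 2 where the estimates follow directly from the tabulated $\hat{S} = 0.835$ with $r = 4$; both should round to 0.045, matching $\delta_1$ and $\delta_2$ there. Eq. 9 has no solution once $4S^{1/(2r)} \le 1$, so the sketch raises an error in that saturated case.

```python
import math

def delta_eq8(S, r):
    """Eq. 8: delta = -ln(S)/r, from P = exp(-r * lambda * t)."""
    return -math.log(S) / r

def delta_eq9(S, r):
    """Eq. 9: estimate allowing back mutation, from p_t in Eq. 5."""
    inner = (4 * S ** (1 / (2 * r)) - 1) / 3
    if inner <= 0:
        raise ValueError("S too small for Eq. 9 (saturation)")
    return -1.5 * math.log(inner)

# Comparison 1 vs. 2 in Table 2 (r = 4): S-hat = 0.835
for name, estimator in (("Eq. 8", delta_eq8), ("Eq. 9", delta_eq9)):
    print(name, round(estimator(0.835, 4), 3))
```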

To apply Eq. 8 or Eq. 9 to real data, $S$ must be estimated. Brown et al. (6) used $n_{XY}/(n_X + n_Y - n_{XY})$ as an estimate of $S$, but this gives an underestimate of $S$. If $n_0$ is known, $S$ may be estimated by $n_{XY}/n_0$. In practice, of course, it is not known. However, if we note $E(n_0) = E(n_X) = E(n_Y) = E(n)$, in which $E(n_0)$ refers to the mean of replicate values of $n_0$, $(n_X + n_Y)/2$ may be used as an estimator of $n_0$. Therefore, $S$ may be estimated by

$$\hat{S} = 2n_{XY}/(n_X + n_Y). \tag{10}$$

Although it is not clear from their description, Upholt and Dawid (2) seem to have used this formula.
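Eq. 10 and the estimator attributed to Brown et al. (6) can be compared on the counts for sequences 1 and 2 in Table 1 (105 and 101 sites, 86 shared). This is a minimal sketch with hypothetical function names.

```python
def s_hat(n_x, n_y, n_xy):
    """Eq. 10: proportion of shared restriction sites."""
    return 2 * n_xy / (n_x + n_y)

def s_brown(n_x, n_y, n_xy):
    """Estimator attributed in the text to Brown et al. (6)."""
    return n_xy / (n_x + n_y - n_xy)

# Sequences 1 and 2 of Table 1: 105 and 101 sites, 86 shared
print(round(s_hat(105, 101, 86), 3))
print(round(s_brown(105, 101, 86), 3))
```

$\hat{S}$ comes out at 0.835, the Table 2 value, whereas $n_{XY}/(n_X + n_Y - n_{XY})$ gives about 0.717. Because $n_{XY} \le (n_X + n_Y)/2$, the denominator of the latter is never smaller than that of Eq. 10, so it can never exceed $\hat{S}$.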

We now investigate the statistical properties of this estimator. Using the Taylor expansion and neglecting the third- and higher-order terms, we obtain

$$E(\hat{S}) = \frac{2E(n_{XY})}{E(n_X) + E(n_Y)} - \frac{2\,\mathrm{Cov}(n_{XY},\, n_X + n_Y)}{[E(n_X) + E(n_Y)]^2} + \frac{2E(n_{XY})\,V(n_X + n_Y)}{[E(n_X) + E(n_Y)]^3},$$

approximately. Because $n_X$ and $n_Y$ change independently, $V(n_X + n_Y) = 2V(n)$. We also note that $\mathrm{Cov}(n_{XY}, n_X + n_Y) = 2\,\mathrm{Cov}(n_{XY}, n_X)$. Furthermore, $V(n) = E(n)(1 - P^2)$ if we note $E(n_0) = m_T a$ in Eq. 3. It can also be shown that $\mathrm{Cov}(n_{XY}, n_X) = n_0 P^2(1 - P)$, which is $E(n)P^2(1 - P)$ when $n_0 = E(n)$. Therefore,

$$E(\hat{S}) = P^2 - P^2(1 - P)^2/[2E(n)]. \tag{11}$$

This indicates that $\hat{S}$ is an underestimate of $P^2$, but the bias is generally small when $E(n)$ is fairly large.

The approximate variance of $\hat{S}$ can be obtained in the same way. If we replace $P^2$ by $\hat{S}$ and $E(n)$ by $\tilde{n} = (n_X + n_Y)/2$ in the variance obtained, it becomes

$$V(\hat{S}) = \left[\hat{S}(1 - \hat{S}) - \hat{S}^2\left(1 - \hat{S}^{1/2}\right)^2\right]/\tilde{n}. \tag{12}$$

This formula may be used for estimating the variance of $\hat{S}$ from data. In practice, the second term in the brackets of Eq. 12 is generally small compared with the first term.
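The following sketch evaluates the bias term of Eq. 11 and the variance of Eq. 12, taking Eq. 12 in the form given above. The counts are those of sequences 1 and 2 in Table 1; the bias example uses hypothetical values $P = 0.9$ and $E(n) = 23.4$. Function names are hypothetical.

```python
def expected_s_hat(P, mean_n):
    """Eq. 11: approximate expectation of S-hat, including the bias term."""
    return P ** 2 - P ** 2 * (1 - P) ** 2 / (2 * mean_n)

def var_s_hat(s, n_tilde):
    """Eq. 12 as given in the text, with n_tilde = (n_X + n_Y)/2."""
    return (s * (1 - s) - s ** 2 * (1 - s ** 0.5) ** 2) / n_tilde

# Sequences 1 and 2 of Table 1
n_x, n_y, n_xy = 105, 101, 86
s = 2 * n_xy / (n_x + n_y)
n_tilde = (n_x + n_y) / 2
se = var_s_hat(s, n_tilde) ** 0.5

# Bias of S-hat for hypothetical P = 0.9 and E(n) = 23.4 (one 4-bp enzyme)
bias = 0.9 ** 2 - expected_s_hat(0.9, 23.4)
print(f"S-hat = {s:.3f}, SE = {se:.3f}, bias = {bias:.5f}")
```

For these counts the standard error of $\hat{S}$ is roughly 0.036, and the bias in the hypothetical case is below 0.001, consistent with the remark that the bias is small when $E(n)$ is fairly large.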

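Because $S$ can be estimated by $\hat{S}$, the estimate ($\hat{\delta}$) of $\delta$ may be obtained by replacing $S$ in Eq. 8 or Eq. 9 by $\hat{S}$. The large-sample variance of $\hat{\delta}$ obtained by Eq. 8 is given by

$$V(\hat{\delta}) = [d\delta/dS]^2\,V(\hat{S}) = V(\hat{S})/(r\hat{S})^2, \tag{13}$$

approximately, in which $V(\hat{S})$ is given by Eq. 12. On the other hand, the variance of $\hat{\delta}$ obtained by Eq. 9 is

$$V(\hat{\delta}) = \frac{9\hat{S}^{1/r}\,V(\hat{S})}{\left[\left(4\hat{S}^{1/(2r)} - 1\right)r\hat{S}\right]^2}. \tag{14}$$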

The above two formulas indicate that the variance of $\hat{\delta}$ is large when $\tilde{n}$ is small. Therefore, it is important to increase the reliability of $\hat{\delta}$ by using many different restriction enzymes. When enzymes with the same $r$ value are used, we can add up each of $n_X$, $n_Y$, and $n_{XY}$ over all enzymes and then compute $\hat{\delta}$ and $V(\hat{\delta})$. However, when enzymes with different $r$ values are used, $\delta$ should be estimated for each $r$ group, and then the average weighted by the reciprocals of the variances should be computed.
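Eqs. 13 and 14 are delta-method variances, and the pooling rule just described is an inverse-variance weighted mean. The sketch below implements both. The variance of the pooled mean, $1/\sum_k (1/V_k)$, is the usual result for independent estimates and is an addition not stated in the text; the per-group numbers in the example are hypothetical, as are the function names.

```python
import math

def var_delta_eq8(s, var_s, r):
    """Eq. 13: variance of delta-hat obtained from Eq. 8."""
    return var_s / (r * s) ** 2

def var_delta_eq9(s, var_s, r):
    """Eq. 14: variance of delta-hat obtained from Eq. 9."""
    return 9 * s ** (1 / r) * var_s / ((4 * s ** (1 / (2 * r)) - 1) * r * s) ** 2

def pool_r_groups(estimates, variances):
    """Average of per-r-group estimates weighted by reciprocal variances.
    Assumes independent group estimates; also returns 1/sum(1/v)."""
    weights = [1 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, estimates)) / sum(weights)
    return mean, 1 / sum(weights)

# Hypothetical results for an r = 4 group and an r = 6 group
pooled, pooled_var = pool_r_groups([0.05, 0.07], [0.0001, 0.0004])
print(f"pooled delta = {pooled:.3f} +/- {math.sqrt(pooled_var):.3f}")
```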

In the derivation of Eqs. 8 and 9 we have assumed that the rate of nucleotide substitution ($\lambda$) is constant over time. However, these formulas hold regardless of this assumption, provided nucleotide substitution occurs at random. If the rate is constant, $\delta$ is linearly related with the time ($t$) after divergence between the populations, i.e., $\delta = 2\lambda t$, and thus can be used for estimating $t$ when $\lambda$ is known.

The above formulation depends on the assumption that the probability of nucleotide substitution is the same for all nucleotide sites. In the case of mtDNA this assumption does not seem to be satisfied. Indeed, data from DNA hybridization experiments suggest that the rate of nucleotide substitution greatly varies among sites (6, 7). Uzzell and Corbin (9) have shown that in the cytochrome c gene the number of nucleotide substitutions per nucleotide site follows the negative binomial distribution when synonymous codons are disregarded. This suggests that the rate of nucleotide substitution per site follows the gamma distribution. If we assume that the same distribution applies to mtDNA, we can evaluate the effect of variation of substitution rate ($\lambda$) on the estimate of nucleotide substitutions. In the following we assume that $\lambda$ is constant over evolutionary time but varies among restriction sites following the gamma distribution

$$f(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,e^{-\beta\lambda}\lambda^{\alpha - 1},$$

in which $\alpha = \bar{\lambda}^2/V_\lambda$ and $\beta = \bar{\lambda}/V_\lambda$, where $\bar{\lambda}$ and $V_\lambda$ are the mean and variance of $\lambda$, respectively. If we use $P = e^{-r\lambda t}$, the mean of $S$ in Eq. 6 becomes

$$E(S) = \int_0^{\infty} e^{-2r\lambda t} f(\lambda)\,d\lambda = \left[\frac{\alpha}{\alpha + 2r\bar{\lambda}t}\right]^{\alpha}. \tag{15}$$

At the present time the value of $\alpha$ is not known, but probably $\alpha > 1$ in most cases. In the cytochrome c gene $\alpha$ has been estimated to be about 2. When $\alpha > 1$, the difference between Eqs. 8 and 15 is small as long as $S$ is larger than 0.7 but increases as $S$ declines further (10). If $\alpha$ is known, the average number of nucleotide substitutions ($\delta = 2\bar{\lambda}t$) should be estimated by using Eq. 15. For example, if $\alpha = 2$,

$$\delta = (2/r)\left(1/\sqrt{S} - 1\right). \tag{16}$$
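Eq. 15 can be inverted for any $\alpha$: with $\delta = 2\bar{\lambda}t$, $\delta = (\alpha/r)(S^{-1/\alpha} - 1)$, which reduces to Eq. 16 at $\alpha = 2$ and to Eq. 8 as $\alpha \to \infty$. The sketch below uses that inversion (our algebra, not a formula printed in the text) to compare the gamma-corrected estimate with Eq. 8 for a few illustrative values of $S$ with $r = 4$. Function names are hypothetical.

```python
import math

def expected_s_gamma(delta, r, alpha):
    """Eq. 15 with delta = 2 * mean_lambda * t: E(S) = [alpha/(alpha + r*delta)]^alpha."""
    return (alpha / (alpha + r * delta)) ** alpha

def delta_gamma(S, r, alpha):
    """Invert Eq. 15 for delta; alpha = 2 gives Eq. 16."""
    return (alpha / r) * (S ** (-1 / alpha) - 1)

r = 4
for S in (0.9, 0.7, 0.4):
    print(f"S = {S}: Eq. 8 -> {-math.log(S) / r:.3f}, "
          f"gamma (alpha = 2) -> {delta_gamma(S, r, 2):.3f}")
```

The two estimates agree closely for $S = 0.9$ and separate steadily as $S$ falls (about 0.229 versus 0.291 at $S = 0.4$), as the text describes.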

Evolutionary change of DNA fragments

The current experimental method of comparing restriction-site maps is laborious and may not be suited for a large-scale population survey. A simpler method is to compare the electrophoretic patterns of DNA digested by a restriction endonuclease between the two species or populations in question. The degree of genetic divergence of DNA between the two populations is expected to be correlated with the proportion of DNA fragments shared by them. Let us now study the relationship between these two quantities.

For a given DNA fragment to be conserved in the evolutionary process, two conditions must be met, as noted by Upholt (8): (i) the two external restriction sites remain unchanged, and (ii) no new restriction sites occur within the fragment. The probability of the first event is obviously $P^2$. The probability of the second event can be obtained in the following way. Let $m$ be the number of nucleotides in this fragment. Then there are $m - r + 1$ possible sequences of $r$ nucleotides between the two external restriction sites. As shown before, the probability for a randomly chosen $r$-base sequence to become a new restriction site by time $t$ is $b = a(1 - P)$. Thus, the probability that no new sites are formed in this fragment by time $t$ is $(1 - b)^{m-r+1}$, and the probability that this fragment remains unchanged in both populations is $P^4(1 - b)^{2(m-r+1)}$. Because there are $n_0$ fragments originally, the proportion of fragments shared by the two populations is

$$F = \frac{1}{n_0}\sum_{i=1}^{n_0} P^4(1 - b)^{2(m_i - r + 1)}, \tag{17}$$

in which $m_i$ is the number of nucleotide sites in the $i$th fragment.

In practice, the above formula is not applicable, because $n_0$ and $m_i$ are not known. However, it is possible to compute the probability of formation of a fragment of $m$ nucleotides under the assumption of random nucleotide distribution. It is given by $a(1 - a)^{m-r}/T$, in which $T$ is the normalizing factor given by

$$T = \sum_{m=r}^{m_T} a(1 - a)^{m-r} = 1 - (1 - a)^{m_T - r + 1}.$$

The expected proportion of fragments that remain unchanged in both populations at time $t$ is then given by

$$F = \sum_{m=r}^{m_T} P^4(1 - b)^{2(m-r+1)}\,a(1 - a)^{m-r}/T. \tag{18}$$

Assuming that $(m_T - r + 1)a$ is so large that $T$ is close to 1, we obtain

$$F = a(1 - b)^2 P^4/\left[a(1 - b)^2 + b(2 - b)\right]. \tag{19}$$

This formula is different from Upholt's. Because $a$ is usually much smaller than 1 and $b = a(1 - P)$, the above formula can be approximated by

$$F \approx P^4/(3 - 2P). \tag{20}$$

Using $P = e^{-r\lambda t}$ and $\delta = 2\lambda t$, $F$ can be related to $\delta$. The relationship between $\delta$ and $F$ is shown in Fig. 1 for $r = 4$ and 6. This relationship may be used for estimating $\delta$ from $F$.

To estimate $F$, we propose the following estimator:

$$\hat{F} = 2n_{XY}/(n_X + n_Y), \tag{21}$$

in which $n_X$ and $n_Y$ are the numbers of fragments in populations X and Y, respectively, whereas $n_{XY}$ is the number of fragments shared by the two populations.

In the above formulation, we have not considered back mutation. This is justified because the "fragment" method can be used only when $\delta$ is relatively small.
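Eq. 20 has no closed-form inverse, but because $F$ decreases monotonically from 1 as $\delta$ grows, $\delta$ can be recovered numerically. The sketch below uses bisection (one of several possible root-finding choices, not a method given in the text), includes Eq. 19 for comparison, and applies Eq. 21 to the fragment counts for sequences 1 and 2 in Table 1 (68 shared fragments; on a circular molecule the number of fragments equals the number of sites). Function names are hypothetical.

```python
import math

def fragment_f_exact(P, a):
    """Eq. 19: F with b = a(1 - P), assuming T is close to 1."""
    b = a * (1 - P)
    return a * (1 - b) ** 2 * P ** 4 / (a * (1 - b) ** 2 + b * (2 - b))

def fragment_f(delta, r):
    """Eq. 20: F ~ P^4/(3 - 2P), with P = exp(-r*delta/2) since delta = 2*lambda*t."""
    P = math.exp(-r * delta / 2)
    return P ** 4 / (3 - 2 * P)

def delta_from_f(F, r, tol=1e-10):
    """Solve Eq. 20 for delta by bisection; F falls from 1 as delta grows."""
    if not 0 < F <= 1:
        raise ValueError("F must lie in (0, 1]")
    lo, hi = 0.0, 1.0
    while fragment_f(hi, r) > F:   # widen the bracket for very small F
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fragment_f(mid, r) > F:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Eq. 21 with Table 1 counts for sequences 1 and 2
F_hat = 2 * 68 / (105 + 101)
print(f"F-hat = {F_hat:.3f}, delta_3 = {delta_from_f(F_hat, r=4):.3f}")
```

This should give $\hat{F} \approx 0.660$ and $\delta \approx 0.036$, the $\delta_3$ entry for 1 vs. 2 in Table 2. For a small $a$, as with 4-bp sites, Eqs. 19 and 20 agree to about $10^{-4}$ at $P = 0.9$.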

FIG. 1. Relationship between the proportion of shared DNA fragments ($F$) and the number of nucleotide substitutions per site ($\delta$). [Plot not reproduced: $F$ on the vertical axis against the number of nucleotide substitutions per site (0 to 0.4) on the horizontal axis, with separate curves for $r = 4$ and $r = 6$.]

FIG. 2. Evolutionary tree used in the computer simulation. Numbers 1, 2, ..., 8 represent descendant DNA sequences. $M$ is the expected number of nucleotide substitutions for the shortest branch. In the present simulation $M$ was 8 per 300 nucleotide sites or 100 codons. [Tree diagram not reproduced.]


Table 1. Number of shared restriction sites and shared DNA fragments between DNA sequences in a computer simulation

Sequence       1      2      3      4      5      6      7      8
1          (105)     86     66     46     46     39     30     29
2             68  (101)     70     51     45     40     28     27
3             34     33  (107)     51     47     39     23     30
4             18     18     20  (100)     44     41     30     28
5             12     14     11      8  (107)     38     30     31
6             13     13     10     10      8  (114)     33     28
7              3      4      3      3      5      7  (115)     24
8              4      3      2      2      2      3      2  (108)

The eight DNA sequences represent those given in Fig. 2. Figures above the diagonal are the numbers of shared restriction sites, whereas those below the diagonal are the numbers of shared DNA fragments. Figures on the diagonal (in parentheses) refer to the numbers of restriction sites for each descendant sequence.

Computer simulation

In order to examine the accuracy of the theory developed, we have done a computer simulation. In practice, we used artificial nucleotide sequences generated in the work of Y. Tateno and M. Nei (unpublished) on molecular taxonomy. In this study a hypothetical sequence of 6000 nucleotide pairs in a circular form was used. An ancestral sequence of random nucleotides was generated by using pseudorandom numbers with an expected G+C content of 0.5, and from this sequence eight descendant sequences were produced following the evolutionary tree given in Fig. 2. The number of nucleotide substitutions for each branch in this figure followed the Poisson distribution with the mean given along the branch (per 300 nucleotide sites or 100 codons). After generating the eight descendant sequences, we determined the locations of restriction sites for five different hypothetical endonucleases in all of them. Each restriction enzyme was assumed to recognize a particular sequence of four base pairs.
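A much reduced version of this experiment can be sketched in Python: one random circular ancestor, two descendants each receiving half of a target $\delta$, and five 4-bp recognition sequences. The motifs, seed, and branch length are hypothetical choices; substitutions are counted per site as a binomial approximation to the Poisson process; and only a single pair of lineages is simulated rather than the tree of Fig. 2. This is not the Tateno and Nei program, only an illustration of the same idea.

```python
import math
import random

BASES = "ACGT"
MOTIFS = ("GATC", "CCGG", "AGCT", "GGCC", "ACGT")  # hypothetical 4-bp "enzymes"

def find_sites(seq, motif):
    """Start positions of motif in a circular sequence (wrapping at the end)."""
    r = len(motif)
    ext = seq + seq[: r - 1]
    return {i for i in range(len(seq)) if ext[i : i + r] == motif}

def evolve(seq, subs_per_site, rng):
    """Draw a binomial number of substitutions (approximately Poisson), place them
    at random positions, and change each base to one of the other three with
    equal probability, as in the equal-rates assumption of Eq. 4."""
    s = list(seq)
    n_subs = sum(rng.random() < subs_per_site for _ in range(len(s)))
    for _ in range(n_subs):
        i = rng.randrange(len(s))
        s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)

def simulate_pair(length=6000, delta=0.053, seed=1):
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))  # expected G+C = 0.5
    x = evolve(ancestor, delta / 2, rng)   # each lineage carries half of delta
    y = evolve(ancestor, delta / 2, rng)
    n_x = n_y = n_xy = 0
    for motif in MOTIFS:
        sites_x, sites_y = find_sites(x, motif), find_sites(y, motif)
        n_x += len(sites_x)
        n_y += len(sites_y)
        n_xy += len(sites_x & sites_y)     # same enzyme, same position
    s_hat = 2 * n_xy / (n_x + n_y)         # Eq. 10
    delta_hat = -math.log(s_hat) / 4       # Eq. 8 with r = 4
    return n_x, n_y, n_xy, s_hat, delta_hat

n_x, n_y, n_xy, s_hat, delta_hat = simulate_pair()
print(f"n_X = {n_x}, n_Y = {n_y}, n_XY = {n_xy}, "
      f"S-hat = {s_hat:.3f}, delta-hat = {delta_hat:.3f}")
```

With $\delta = 0.053$ (the 1 vs. 2 distance) a run should give on the order of 117 sites per sequence and a $\hat{\delta}$ near the true value, although single runs scatter considerably, as the text goes on to emphasize.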

Identity of Restriction Sites. The total number of restriction sites for the five "enzymes" in each descendant sequence is given in Table 1, together with the number of sites shared by each pair of sequences. Using these data, we estimated $S$ and $\delta$. The results obtained are presented in Table 2. When two or more sequence comparisons have the same $\delta$ value (e.g., 1-3 vs. 4), the average of the $\hat{\delta}$ values for all comparisons is presented. The $\delta$ value was estimated by Eqs. 8 and 9; the estimate of $\delta$ obtained by Eq. 8 is designated by $\delta_1$ and that obtained by Eq. 9 by $\delta_2$.

Table 1 shows that $n$ is 100 to 115. These values are somewhat smaller than the expected value of $5 \times 23.4 = 117$, but the differences are not statistically significant because the expected standard deviation is 10.8. The values of $\delta_1$ and $\delta_2$ are also not far from the expected value of $\delta$ if we consider the large stochastic error to which they are subject. Theoretically, $\delta_2$ is a better estimate than $\delta_1$, as mentioned earlier, but in practice there is not much difference between the two estimates. In the comparison of 1-7 vs. 8, $\delta_2$ (and also $\delta_1$) is somewhat smaller than the expected value. This smaller value occurred largely because the proportion of identical sites was affected appreciably by new mutation in this case. Indeed, when we disregarded the identical sites due to new mutation, the $\delta_2$ value was 0.378, which is close to the expected value of 0.373. The effect of mutation was observed also in the case of smaller $\delta$ values, but it was not so serious as in the case of $\delta = 0.373$.
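The expected number of sites and its standard deviation quoted above follow from Eq. 1 and the Poisson assumption. The sketch below reproduces them and expresses each diagonal count of Table 1 in standard-deviation units.

```python
import math

# Expected sites per 4-bp "enzyme" on a 6000-bp circle with G+C = 0.5 (Eq. 1)
per_enzyme = 6000 * 0.25 ** 4       # 6000/256, about 23.4
total = 5 * per_enzyme              # five enzymes
sd = math.sqrt(total)               # Poisson: variance equals the mean

observed = [105, 101, 107, 100, 107, 114, 115, 108]   # diagonal of Table 1
z = [(n - total) / sd for n in observed]
print(f"E(n) = {total:.1f}, SD = {sd:.1f}")
print("deviations in SD units:", [round(v, 2) for v in z])
```

All eight counts fall within two standard deviations of 117.2 (the lowest, 100, is about 1.6 below), consistent with the statement that the shortfall is not significant.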

One important finding in the present simulation is that the estimate of $\delta$ is subject to a large stochastic error when $n_X$, $n_Y$, and $n_{XY}$ are small. For example, when only one type of "restriction enzyme" is used, $E(n)$ is 23.4. In this case the $\delta_2$ value for 6 vs. 8 took the values of 0.452, 0.348, 0.253, 0.423, and 0.358 for the five different types of "restriction enzymes" used. Therefore, it is important to use a large number of restriction enzymes. Of course, the accuracy of $\delta_2$ depends on the number of base pairs in the restriction site. The sampling error of $\delta_2$ is expected to be smaller for $r = 6$ than for $r = 4$ when $S$ is the same.

Identity of DNA Fragments. Using data on restriction-site maps in the eight descendant DNA sequences, we computed the number of identical DNA fragments that were shared by each pair of sequences (Table 1). We then estimated $F$ and $\delta$; the results are presented in Table 2. The estimate of $\delta$ obtained by this method is designated by $\delta_3$. It is clear that $\delta_3$ again roughly agrees with the expected value. In this case the effect of mutation on the estimate of $\delta$ is not so large as in the case of the "identical sites" method, because the probability of formation of identical fragments by mutation is smaller than that of formation of identical restriction sites. However, the sampling error of $\delta_3$ is generally larger than that of $\delta_1$ or $\delta_2$.

Table 2. Estimates ($\delta_1$, $\delta_2$, $\delta_3$) of the number of nucleotide substitutions in comparison with the expected numbers ($\delta$)

Sequence           Restriction sites                DNA fragments
comparison     $\hat{S}$  $\delta_1$  $\delta_2$    $\hat{F}$  $\delta_3$    $\delta$
1 vs. 2          0.835     0.045     0.045        0.660     0.036       0.053
1-2 vs. 3        0.648     0.109     0.109        0.319     0.103       0.107
1-3 vs. 4        0.483     0.182     0.185        0.183     0.158       0.160
1-4 vs. 5        0.433     0.209     0.213        0.107     0.213       0.213
1-5 vs. 6        0.362     0.254     0.260        0.099     0.222       0.267
1-6 vs. 7        0.263     0.334     0.344        0.038     0.324       0.320
1-7 vs. 8        0.262     0.335     0.345        0.024     0.376       0.373

$\delta_1$, $\delta_2$, and $\delta_3$ were obtained through Eqs. 8, 9, and 20, respectively. When two or more sequence comparisons have the same $\delta$ value, the averages of the estimates are presented. Similarly, $\hat{S}$ and $\hat{F}$ are the averages for all comparisons having the same $\delta$ value. Therefore, $\delta_1$, $\delta_2$, and $\delta_3$ are not directly obtainable from the $\hat{S}$ and $\hat{F}$ values presented except in the comparison of 1 vs. 2. These results were obtained by computer simulation.


Intrapopulational variation

In population genetics it is customary to measure the genic variation of a population in terms of heterozygosity or gene diversity (11). In the case of mtDNA, however, this measure is not appropriate, because mtDNA contains many genes and thus the gene diversity would be close to 1 in many populations. In this case genic variation may be measured more appropriately by the average number of nucleotide differences per site between two randomly chosen DNA sequences. We call this the index of nucleotide diversity or simply nucleotide diversity, and denote it by $\pi$. It is defined as

$$\pi = \sum_{ij} x_i x_j \pi_{ij}, \tag{22}$$

in which $x_i$ is the frequency of the $i$th sequence in the population and $\pi_{ij}$ is the number of nucleotide differences per nucleotide site between the $i$th and $j$th sequences.

The nucleotide diversity may be estimated from restriction enzyme data if we know $x_i$ and $\pi_{ij}$. The value of $\pi_{ij}$ can be estimated either from $S$ or from $F$ as mentioned above. When data on restriction-site maps are available, it is also possible to compute the average proportion of shared sites between two randomly chosen DNA sequences. It is given by

$$\bar{S} = \sum_{ij} x_i x_j S_{ij}. \tag{23}$$

This will give another estimate of $\pi$. That is,

$$\hat{\pi} = -(\ln \bar{S})/r. \tag{24}$$
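Eqs. 22 to 24 are frequency-weighted sums over pairs of sequence types. A minimal sketch with a hypothetical two-type population follows; pairs are summed over all ordered $(i, j)$, including $i = j$, as in the definitions. The function names and numbers are illustrative only.

```python
import math

def nucleotide_diversity(freqs, pi):
    """Eq. 22: pi = sum_ij x_i x_j pi_ij over all ordered pairs (i, j)."""
    k = len(freqs)
    return sum(freqs[i] * freqs[j] * pi[i][j] for i in range(k) for j in range(k))

def pi_from_shared_sites(freqs, S, r):
    """Eqs. 23-24: average proportion of shared sites, then -ln(S-bar)/r."""
    k = len(freqs)
    s_bar = sum(freqs[i] * freqs[j] * S[i][j] for i in range(k) for j in range(k))
    return -math.log(s_bar) / r

# Hypothetical population: two restriction-site types at equal frequency
x = [0.5, 0.5]
pi_pairs = [[0.0, 0.02], [0.02, 0.0]]
S_pairs = [[1.0, 0.9], [0.9, 1.0]]
print(nucleotide_diversity(x, pi_pairs), pi_from_shared_sites(x, S_pairs, r=4))
```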

In the preceding sections we presented formulas for estimating the number of nucleotide substitutions between two populations under the assumption that the effect of intrapopulational variation is negligible. When the populations to be compared are closely related, this assumption will not generally be satisfied. In this case the intrapopulational variation should be subtracted from the total interpopulational difference.

Let $x_i$ and $y_i$ be the frequencies of the $i$th restriction-site sequence in populations X and Y, respectively. Then, the $\pi$ values for populations X and Y may be estimated by $\hat{\pi}_X = \sum_{ij} x_i x_j \hat{\pi}_{ij}$ and $\hat{\pi}_Y = \sum_{ij} y_i y_j \hat{\pi}_{ij}$, respectively, whereas the average number of nucleotide differences between two randomly chosen DNA sequences, one from each of X and Y, may be estimated by $\hat{\pi}_{XY} = \sum_{ij} x_i y_j \hat{\pi}_{ij}$. Therefore, the estimate of net nucleotide differences between the two populations is given by

$$\hat{\delta} = \hat{\pi}_{XY} - (\hat{\pi}_X + \hat{\pi}_Y)/2. \tag{25}$$

As mentioned earlier, $\hat{\pi}_{ij}$ may be obtained either from $S$ or $F$. Another way of estimating $\delta$ is to use the normalized proportion of shared sites between X and Y. It is defined as

$$S = S_{XY}/\sqrt{S_X S_Y}, \tag{26}$$

in which $S_X = \sum_{ij} x_i x_j S_{ij}$, $S_Y = \sum_{ij} y_i y_j S_{ij}$, and $S_{XY} = \sum_{ij} x_i y_j S_{ij}$. The $\delta$ value is then given by Eq. 8. This method is analogous to that of estimating genetic distance from gene frequency data (11).
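Eqs. 25 and 26 can be sketched the same way. The frequencies and pairwise $S_{ij}$ values below are hypothetical, and $\hat{\pi}_{ij}$ is obtained from $S_{ij}$ with Eq. 8, one of the two options mentioned above. Function names are illustrative.

```python
import math

def weighted_sum(u, v, M):
    """sum_ij u_i v_j M_ij"""
    return sum(u[i] * v[j] * M[i][j] for i in range(len(u)) for j in range(len(v)))

def net_divergence(x, y, pi):
    """Eq. 25: delta-hat = pi_XY - (pi_X + pi_Y)/2."""
    return weighted_sum(x, y, pi) - (weighted_sum(x, x, pi) + weighted_sum(y, y, pi)) / 2

def delta_normalized_s(x, y, S, r):
    """Eq. 26 followed by Eq. 8: S = S_XY/sqrt(S_X * S_Y), delta = -ln(S)/r."""
    s = weighted_sum(x, y, S) / math.sqrt(weighted_sum(x, x, S) * weighted_sum(y, y, S))
    return -math.log(s) / r

# Hypothetical data: three site types; X carries types 0 and 1, Y carries 1 and 2
x = [0.6, 0.4, 0.0]
y = [0.0, 0.3, 0.7]
S = [[1.0, 0.92, 0.70],
     [0.92, 1.0, 0.75],
     [0.70, 0.75, 1.0]]
r = 4
pi = [[-math.log(s) / r for s in row] for row in S]   # pi_ij from Eq. 8
print(net_divergence(x, y, pi), delta_normalized_s(x, y, S, r))
```

When each population is fixed for a single sequence type, both estimates reduce to the pairwise value $-(\ln S_{12})/r$, which is a convenient check.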

Discussion

The theory developed in this paper is dependent on the assumption that all nucleotides are distributed at random over the DNA sequences with a given G+C content. Available data suggest that this assumption is not always satisfied. Brown (12) has shown that the contents of thymines and guanines in the heavy strand of mtDNA are considerably different from those of the light strand in man, green monkey, and mouse. However, because we are concerned with the evolutionary change of mtDNA, the nonrandom distribution would not affect our estimate of nucleotide substitutions seriously unless it is extreme.

At the present time the magnitude of nucleotide diversity ($\pi$) in natural populations is not well known. The Peromyscus polionotus data of Avise et al. (5) suggest that $\pi$ is of the order of 0.01, whereas in man it seems to be of the order of 0.002 (6). This quantity is expected to vary from population to population even in the same species. Therefore, it is important to correct for this factor when estimating the degree of nucleotide divergence between closely related species.

Theoretically, it is possible to express nucleotide diversity $\pi$ in terms of the mutation rate per nucleotide site per host generation ($\mu$) and the effective population size (13-15). In the case of mitochondria, which are maternally inherited, $\pi$ is approximately given by $2N_m\mu$, in which $N_m$ is the number of female adult individuals. We note that there is little genetic heterogeneity among mtDNAs of one host individual in mammals. On the other hand, the average heterozygosity for nuclear genes may be expressed as $H = 4N_n v/(4N_n v + 1)$, in which $N_n$ is the effective population size for nuclear genes and equal to the number of both male and female individuals, and $v$ is the mutation rate per gene. In P. polionotus, $H$ has been estimated to be 0.08 for isozyme data (16). If we assume that an average structural gene consists of 1000 nucleotide pairs and only 1/10th of nucleotide variation in structural genes is detectable by electrophoresis, the average nucleotide difference per site between two randomly chosen nuclear genes becomes $0.08 \times 10/1000 = 0.0008$. Therefore, it seems that mtDNA is much more variable than structural genes in nuclear DNA. This conclusion is different from Langley and Shah's (17) that they are almost equally variable in Drosophila.
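The nuclear-DNA figure of 0.0008 is a short chain of arithmetic. The sketch below spells it out under the same assumptions as the text (1000-bp genes, one tenth of variation detectable), treating $H$ as the expected number of electrophoretically detectable differences per gene, which is reasonable only because $H$ is small.

```python
# Back-of-envelope comparison from the Discussion
H_nuclear = 0.08            # isozyme heterozygosity in P. polionotus (16)
detectable_fraction = 0.1   # assumption in the text: 1/10 of variation is detectable
gene_length = 1000          # assumption in the text: nucleotide pairs per structural gene

# All differences per gene ~ H / detectable_fraction, spread over gene_length sites
pi_nuclear = H_nuclear / detectable_fraction / gene_length
pi_mtDNA = 0.01             # order of magnitude suggested by the mtDNA data of (5)
ratio = pi_mtDNA / pi_nuclear
print(f"nuclear pi ~ {pi_nuclear:.4f}; mtDNA/nuclear ratio ~ {ratio:.1f}")
```

The ratio of roughly 12.5 between the mtDNA value of order 0.01 and the nuclear estimate underlies the conclusion that mtDNA is much more variable.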

We thank W. M. Brown and A. C. Wilson for their valuable comments on the manuscript. This study was supported by research grants from the National Science Foundation and the National Institutes of Health.

1. Potter, S. T., Newbold, J. E., Hutchison, C. A. & Edgell, M. H. (1975) Proc. Natl. Acad. Sci. USA 72, 4496-4500.

2. Upholt, W. B. & Dawid, I. B. (1977) Cell 11, 571-583.

3. Levings, C. S. & Pring, D. R. (1977) J. Hered. 68, 350-354.

4. Parker, R. C. & Watson, R. M. (1977) Nucleic Acids Res. 4, 1291-1300.

5. Avise, J. C., Lansman, R. A. & Shade, R. O. (1979) Genetics 92, 279-295.

6. Brown, W. M., George, M. & Wilson, A. C. (1979) Proc. Natl. Acad. Sci. USA 76, 1967-1971.

7. Dawid, I. B. (1972) Dev. Biol. 29, 139-151.

8. Upholt, W. B. (1977) Nucleic Acids Res. 4, 1257-1265.

9. Uzzell, T. & Corbin, K. W. (1971) Science 172, 1089-1096.

10. Nei, M. (1980) Proceedings of the XIV International Congress of Genetics, Moscow, U.S.S.R., in press.

11. Nei, M. (1975) Molecular Population Genetics and Evolution (North-Holland, Amsterdam).

12. Brown, W. M. (1976) Dissertation (California Inst. Tech., Pasadena, CA).

13. Kimura, M. (1969) Genetics 61, 893-903.

14. Watterson, G. A. (1975) Theor. Pop. Biol. 7, 256-276.

15. Li, W.-H. (1977) Genetics 85, 331-337.

16. Selander, R. K., Smith, M. H., Yang, S. Y., Johnson, W. E. & Gentry, J. B. (1971) Stud. Genet. 6, 49-90.

17. Langley, C. H. & Shah, D. M. (1979) Nature (London), in press.
