States dogs, except that we also verified that they were mixtures of several
different breeds by using the Wisdom MX breed test (Mars Inc.).
Microsatellite Genotyping. Two hundred twenty-seven village dogs were typed
on a 96-microsatellite panel described in (15, 16). Microsatellites were amplified
individually in the presence of a fluorescently labeled universal primer and were
combined post-PCR into sets of 1 to 4 markers for capillary electrophoresis on an
ABI3730xl (ABI). Standard PCR conditions have been described in ref. 15, while
adjustments made to individual markers are listed in Table S5. Each 96-well plate
of samples included a previously genotyped control sample for size verification,
and alleles were binned using GeneMapper 4.0. All genotype calls were checked
manually, and markers were scanned individually for the appearance of new alleles
outside the existing bins. After genotyping, 7 markers were excluded because they
showed high missing rates (>20%) or heterozygote deficits (P < 0.01) in a majority
of the 8 regional populations, which suggests the presence of null alleles at these
loci. These data were combined with dogs from 126 breeds previously genotyped
for breed structure studies (15, 16).
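The marker-exclusion rule above can be sketched as follows. This is a minimal illustration, assuming that per-population missing rates and heterozygote-deficit P-values have already been computed, and assuming that a population counts against a marker if either criterion is met there. The `PopStats` record and the function names are hypothetical, not the authors' code.

```python
from dataclasses import dataclass

# Thresholds from the text: >20% missing genotypes, or a heterozygote
# deficit with P < 0.01.
MAX_MISSING = 0.20
DEFICIT_ALPHA = 0.01


@dataclass
class PopStats:
    """Hypothetical per-population summary for one microsatellite marker."""
    missing_rate: float  # fraction of dogs without a genotype call
    deficit_p: float     # P-value of a test for heterozygote deficit


def fails_in_population(stats: PopStats) -> bool:
    # Assumption: either criterion alone counts against the marker.
    return stats.missing_rate > MAX_MISSING or stats.deficit_p < DEFICIT_ALPHA


def exclude_marker(per_population: list[PopStats]) -> bool:
    """Exclude a marker that fails in a strict majority of populations."""
    n_fail = sum(fails_in_population(s) for s in per_population)
    return n_fail > len(per_population) / 2


# Example with 8 regional populations: failing in 5 of 8 is a majority.
stats = [PopStats(0.30, 0.50)] * 5 + [PopStats(0.05, 0.40)] * 3
print(exclude_marker(stats))  # True
```

With 8 populations, a marker failing in exactly 4 is kept under this reading of "a majority".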
SNP Genotyping. One hundred sixty-eight village dogs, 102 mixed-breed dogs,
and dogs from 126 breeds were genotyped using the Sequenom iPLEX platform
on a 321-SNP panel described in ref. 20. For each sample, 2 μL of dog genomic
DNA was aliquoted into 13 separate microtiter wells for PCR amplification. Each
genomic aliquot was amplified in a total volume of 10 μL (>45 cycles) with up to
28 primer pairs. Each reaction was treated with shrimp alkaline phosphatase for
40 min before heat inactivation. Primer extension reactions were carried out in a
standard thermocycler according to the Sequenom iPLEX Gold protocol. Each
reaction was desalted before spotting onto a SpectroChip and shooting on the
Compact MassARRAY system (Sequenom). Results were interpreted automatically
using cluster plots with the Histogram tabular view active in
SpectroTyper-TyperAnalyzer (Sequenom). SNP genotypes were loaded into PLINK
version 1.0.4 (21), and 15 SNPs with high missingness (>20%) and 1 SNP with an
extreme heterozygote deficiency relative to Hardy-Weinberg equilibrium
(P < 10^…) were removed from further analysis.
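A minimal sketch of the two SNP filters, assuming genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call. The deficit test here is a one-sided exact test built on the conditional (Levene/Haldane) distribution of heterozygote counts given the allele counts; it is a stand-in for illustration and not necessarily the statistic PLINK reports. Because the P-value cutoff is not recoverable from the text, `alpha` is a required, hypothetical parameter, and the `1e-6` in the usage lines is arbitrary.

```python
import math
from typing import Optional, Sequence


def het_deficit_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """One-sided exact test for too few heterozygotes under HWE.

    Conditions on the observed allele counts and sums the null
    probabilities of every heterozygote count <= the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a, n_b = 2 * n_aa + n_ab, 2 * n_bb + n_ab

    def log_prob(h: int) -> float:
        # P(h) = n! / (AA! AB! BB!) * 2^h * n_a! n_b! / (2n)!
        hom_a, hom_b = (n_a - h) // 2, (n_b - h) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(h + 1) - math.lgamma(hom_b + 1)
                + h * math.log(2) + math.lgamma(n_a + 1)
                + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of n_a.
    support = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(log_prob(h)) for h in support}
    total = sum(probs.values())  # ~1; renormalize against rounding error
    return sum(p for h, p in probs.items() if h <= n_ab) / total


def snp_passes_qc(genotypes: Sequence[Optional[int]], alpha: float,
                  max_missing: float = 0.20) -> bool:
    """genotypes: 0/1/2 copies of allele A per dog, None if uncalled.

    alpha is a hypothetical cutoff; the paper's exponent is not recoverable.
    """
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) > max_missing:
        return False
    return het_deficit_p(called.count(2), called.count(1),
                         called.count(0)) >= alpha


print(snp_passes_qc([2] * 50 + [0] * 50, alpha=1e-6))            # False
print(snp_passes_qc([2] * 25 + [1] * 50 + [0] * 25, alpha=1e-6))  # True
```

Working in log space with `math.lgamma` avoids overflowing the factorials for sample sizes in the hundreds.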
Mitochondrial Sequencing. A 680-bp fragment of the mitochondrial D-loop was
amplified in two overlapping reactions. Region-1 was amplified using forward
primer H15422: 5′-CTCTTGCTCCACCATCAGC-3′, and reverse primer L15781:
5′-GTAAGAACCAGATGCCAGG-3′. Region-2 was amplified using forward primer
H15693: 5′-AATAAGGGCTTAATCACCATGC-3′ and reverse primer L16106:
5′-AAACTATATGTCCTGAAACC-3′ (primer names correspond to the 3′-most position of
each primer, relative to the published dog mitochondrial genome as in ref. 6). PCR was
carried out with 10 ng genomic DNA under the following protocol: denaturation at
94 °C (40 s), annealing at 54 °C (1 min), and extension at 72 °C (1 min), for 35 total
cycles, followed by a 5-min final extension step at 72 °C. Sequencing reactions
were carried out on an ABI 3730 sequencer using BigDye Terminator chemistry
with the Region-1 reverse primer and the Region-2 forward primer. Any reads with
ambiguous bases were rerun in the opposite direction. Sequences were edited,
assembled, and aligned with Sequencher 4.8 (Gene Codes Corporation) and
submitted to GenBank with Sequin (http://www.ncbi.nlm.nih.gov/Sequin/).
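The read-screening rule can be illustrated with a toy check. This sketch assumes reads are available as plain strings keyed by hypothetical sample names; IUPAC codes other than A, C, G, and T are treated as ambiguous calls. The reverse-complement helper shows how a read from the opposite primer would be reoriented before comparison; in practice this editing was done in Sequencher, not with code like this.

```python
# A, C, G and T are unambiguous; any other IUPAC code (N, R, Y, ...) is not.
UNAMBIGUOUS = frozenset("ACGT")
# IUPAC complements, including the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNSW", "TGCAYRMKVHDBNSW")


def has_ambiguous_bases(read: str) -> bool:
    return any(base not in UNAMBIGUOUS for base in read.upper())


def reverse_complement(seq: str) -> str:
    """Reorient a sequence read from the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reads_to_rerun(reads: dict[str, str]) -> list[str]:
    """Names of reads (hypothetical sample IDs) that need re-sequencing."""
    return [name for name, seq in reads.items() if has_ambiguous_bases(seq)]


reads = {"dog_01": "ACGTTGCA", "dog_02": "ACGNTGCA"}  # made-up reads
print(reads_to_rerun(reads))  # ['dog_02']
# Reverse complement of the Region-1 reverse primer L15781:
print(reverse_complement("GTAAGAACCAGATGCCAGG"))  # CCTGGCATCTGGTTCTTAC
```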
Statistical Analyses. We used two approaches, principal component analysis
(PCA) with EIGENSOFT v2.0 (22) and clustering analysis with STRUCTURE v2.2 (14),
to classify individuals as indigenous or non-native, to describe the genetic
structure of indigenous African village dogs, and to describe their relationship to
dogs from putatively African breeds. We relied primarily on STRUCTURE to determine
the proportion of non-African admixture present in each village dog because
STRUCTURE allows probabilistic assignment of individuals to classes and explicit
modeling of admixture (22). In contrast, PCA makes no assumptions regarding
discrete versus clinal population structure and is well suited for describing the
principal axes of genetic variation between populations. In practice, STRUCTURE
and PCA usually reveal very similar patterns of genetic variation (22).
Before running these clustering methods, we removed markers in high LD
with other markers [r² > 0.5; see (23)] using Arlequin v3.11 (24) and removed 9
village dogs that showed high relatedness (> 0.3) to another dog in the genotyped
sample. All STRUCTURE runs used the admixture model with correlated allele
frequencies, no prior population information, and default parameter settings,
with a burn-in period of 100,000 iterations followed by 500,000 MCMC repetitions;
10 runs were performed per K and averaged using CLUMPP v1.1.2 (25). PCA, in
contrast, was carried out separately for the SNP and microsatellite markers.
Microsatellite loci with n > 2 alleles were recoded as n − 1 biallelic loci before
running PCA in EIGENSOFT.
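The recoding step, followed by a PCA on the recoded matrix, might look like the sketch below. Assumptions: genotypes are (allele, allele) pairs with no missing data, the last allele in sorted order is the one dropped, and each column is centered and scaled by sqrt(p(1 − p)) using the sample allele frequency, a simplified version of the normalization described by Patterson et al. (22). A plain power iteration stands in for a full eigendecomposition, so only the first principal component is returned; this is not EIGENSOFT's code, and all function names are hypothetical.

```python
import math
import random


def recode_locus(genotypes):
    """Recode one n-allele microsatellite locus as n - 1 biallelic columns.

    genotypes: one (allele, allele) pair per dog, e.g. (152, 160).
    Each column counts copies (0, 1, 2) of one allele; the last allele in
    sorted order is dropped because its count is implied by the others.
    """
    alleles = sorted({a for pair in genotypes for a in pair})
    return [[pair.count(a) for a in alleles[:-1]] for pair in genotypes]


def build_matrix(loci):
    """Concatenate recoded loci into one row per dog (same dog order)."""
    blocks = [recode_locus(g) for g in loci]
    return [sum((block[i] for block in blocks), []) for i in range(len(loci[0]))]


def normalize(matrix):
    """Center each column and scale by sqrt(p(1 - p)), with p = mean / 2."""
    n = len(matrix)
    columns = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        p = mean / 2
        if 0 < p < 1:  # skip columns with no variation
            sd = math.sqrt(p * (1 - p))
            columns.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*columns)]


def first_pc(x, iters=200, seed=0):
    """Leading eigenvector of the dog-by-dog covariance, by power iteration."""
    n, m = len(x), len(x[0])
    cov = [[sum(x[i][k] * x[j][k] for k in range(m)) / m for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Two made-up loci for four dogs: dogs 0-1 and dogs 2-3 differ sharply.
loci = [
    [(150, 150), (150, 150), (160, 170), (160, 170)],
    [(200, 202), (200, 202), (204, 204), (204, 204)],
]
pc1 = first_pc(normalize(build_matrix(loci)))
print([round(c, 3) for c in pc1])  # dogs 0-1 and 2-3 get opposite signs
```

The overall sign of a principal component is arbitrary, so only the relative signs of the dogs are meaningful.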
Expected heterozygosity (h) was calculated in Arlequin after removing 10 dogs
that appeared to be related at r ≈ 0.5. F_ST based on SNP loci was computed
with a custom C++ implementation of Eq. 6 from (26); microsatellite F_ST was
computed using Arlequin. Unless otherwise noted, statistical tests were performed
in R v2.6.2 (27). STRUCTURE results were plotted using Distruct v1.1 (28).
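For readers who want to reproduce these summary statistics outside Arlequin, a sketch of two standard estimators follows: the unbiased gene diversity n/(n − 1)(1 − Σ p_k²) for expected heterozygosity, and the Weir and Cockerham (26) variance-component estimator of F_ST for biallelic loci, combined across loci as a ratio of sums. This sketch has not been checked term for term against Eq. 6 of (26), against Arlequin's output, or against the authors' C++ code; the input tuples (dogs sampled, allele frequency, observed heterozygosity) are an assumed format.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Unbiased gene diversity: n/(n - 1) * (1 - sum of squared allele freqs).

    alleles: every gene copy sampled at one locus (two per diploid dog).
    """
    n = len(alleles)
    sum_sq = sum((c / n) ** 2 for c in Counter(alleles).values())
    return n / (n - 1) * (1 - sum_sq)


def wc_components(pops):
    """Weir-Cockerham variance components (a, b, c) for one biallelic locus.

    pops: one (n_i, p_i, h_i) tuple per population, with n_i the number of
    dogs sampled, p_i the frequency of one allele, and h_i the observed
    proportion of heterozygotes. Requires at least two populations.
    """
    r = len(pops)
    n_total = sum(n for n, _, _ in pops)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n, _, _ in pops) / n_total) / (r - 1)
    p_bar = sum(n * p for n, p, _ in pops) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p, _ in pops) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, _, h in pops) / n_total
    pq = p_bar * (1 - p_bar)
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2
                               - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def wc_fst(loci):
    """Multilocus estimate: sum of a over sum of (a + b + c) across loci."""
    comps = [wc_components(pops) for pops in loci]
    return sum(a for a, _, _ in comps) / sum(a + b + c for a, b, c in comps)


print(round(expected_heterozygosity(list("AABB")), 4))  # 0.6667
print(wc_fst([[(10, 1.0, 0.0), (10, 0.0, 0.0)]]))       # 1.0
```

Two populations fixed for different alleles give F_ST = 1; identical populations give a value near zero that can be slightly negative, which is expected for this unbiased estimator.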
ACKNOWLEDGMENTS. We thank numerous volunteers and animal shelters for
their assistance in gathering samples, including Leonard Kuwale, Ahmed
Samaha, Kazhila Chinsembu, Animal Care in Egypt (Luxor), Animal Friends Shelter
(Giza), Albergue de Animales Villa Michelle (Mayaguez), and Albergue La
Gabriella (Ponce); Jason Mezey, Fengfei Wang, Katarzyna Bryc, and Andy Reynolds
for their assistance with lab and computational resources; Bob Wayne, Niels
Pedersen, Ben Sacks, Sarah Brown, and Peter Savolainen for helpful comments
and discussion; and the intramural program of the National Human Genome
Research Institute. This work was supported by the Center for Vertebrate Genomics,
Department of Clinical Sciences and Baker Institute of Animal Health, Cornell
University; the National Institutes of Health Center for Scientific Review and R24
research grant program; National Science Foundation Grant 0516310; and a Sloan
Foundation research fellowship.
1. Wayne R (2001) Consequences of domestication: Morphological diversity of the dog. In
The Genetics of the Dog, eds Ruvinsky A, Sampson J (CABI Publishing, Oxon, UK), pp 43–60.
2. Clutton-Brock J (1995) Origins of the dog: Domestication and early history. In The
Domestic Dog, Its Evolution, Behavior and Interactions with People, ed Serpell J (CUP,
Cambridge), pp 7–20.
3. Vilà C, Maldonado J, Wayne R (1999) Phylogenetic relationships, evolution, and
genetic diversity of the domestic dog. J Hered 90:71–77.
4. Germonpré M, et al. (2009) Fossil dogs and wolves from Palaeolithic sites in Belgium,
the Ukraine and Russia: Osteometry, ancient DNA and stable isotopes. J Arch Sci
5. Vilà C, et al. (1997) Multiple and ancient origins of the domestic dog. Science 276:1687–
6. Savolainen P, Zhang Y, Luo J, Lundeberg J, Leitner T (2002) Genetic evidence for an East
Asian origin of domestic dogs. Science 298:1610–1613.
7. Coppinger R, Coppinger L (2001) in Dogs: A Startling New Understanding of Canine
Origin, Behavior and Evolution (Scribner, New York).
8. Dobney K, Larson G (2006) Genetics and animal domestication: New windows on an
elusive process. J Zool 269:261–271.
9. Miklosi A (2008) in Dog Behaviour, Evolution, and Cognition (Oxford Univ Press,
Oxford), p 304.
10. Pires A, et al. (2006) Mitochondrial DNA sequence variation in Portuguese native breed
dogs: Diversity and phylogenetic affinities. J Hered 97:318–330.
11. Irion D, Schaffer A, Grant S, Wilton A, Pedersen N (2005) Genetic variation analysis of
the Bali street dog using microsatellites. BMC Genet 6:6.
12. Runstadler J, Angles J, Pedersen N (2006) Dog leucocyte antigen class II diversity and
relationships among indigenous dogs of the island nations of Indonesia (Bali),
Australia and New Guinea. Tissue Antigens 68:418–426.
13. (2008) Police Zone. Encyclopædia Britannica. Online Ed.
14. Pritchard J, Stephens M, Donnelly P (2000) Inference of population structure using
multilocus genotype data. Genetics 155:945–959.
15. Parker HG, et al. (2004) Genetic structure of the purebred domestic dog. Science
16. Parker H, et al. (2007) Breed relationships facilitate fine-mapping studies: A 7.8-kb
deletion cosegregates with Collie eye anomaly across multiple dog breeds. Genome
17. Gundry R, et al. (2007) Mitochondrial DNA analysis of the domestic dog: Control region
variation within and among breeds. J Forensic Sci 52:562–572.
18. Savolainen P, Leitner T, Wilton A, Matisoo-Smith E, Lundeberg J (2004) A detailed
picture of the origin of the Australian dingo, obtained from the study of mitochondrial
DNA. Proc Natl Acad Sci USA 101:12387–12390.
19. Björnerfeldt S, Webster M, Vilà C (2006) Relaxation of selective constraint on dog
mitochondrial DNA following domestication. Genome Res 16:990–994.
20. Jones P, et al. (2008) Single-nucleotide-polymorphism-based association mapping of
dog stereotypes. Genetics 179:1033–1044.
21. Purcell S, et al. (2007) PLINK: A tool set for whole-genome association and population-
based linkage analyses. Am J Hum Genet 81:559–575.
22. Patterson N, Price A, Reich D (2006) Population structure and eigenanalysis. PLoS Genet
23. Kaeuffer R, Réale D, Coltman D, Pontier D (2007) Detecting population structure using
STRUCTURE software: Effect of background linkage disequilibrium. J Hered 99:374–380.
24. Excoffier L, Schneider S (2005) Arlequin ver. 3.0: An integrated software package for
population genetics data analysis. Evol Bioinform Online 1:47–50.
25. Jakobsson M, Rosenberg N (2007) CLUMPP: A cluster matching and permutation
program for dealing with label switching and multimodality in analysis of population
structure. Bioinformatics 23:1801–1806.
26. Weir B, Cockerham C (1984) Estimating F-statistics for the analysis of population
structure. Evolution 38:1358–1370.
27. R Development Core Team (2008) in R: A language and environment for statistical
computing (R Foundation for Statistical Computing, Vienna, Austria).
28. Rosenberg N (2004) DISTRUCT: A program for the graphical display of population
structure. Mol Ecol Notes 4:137–138.
29. Ewens W (1972) The sampling theory of selectively neutral alleles. Theor Pop Biol
www.pnas.org/cgi/doi/10.1073/pnas.0902129106 Boyko et al.