
Light whole genome sequence for SNP discovery across domestic cat breeds

Mullikin et al. BMC Genomics 2010, 11:406
http://www.biomedcentral.com/1471-2164/11/406
Open Access
DATABASE
© 2010 Mullikin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
James C Mullikin*1, Nancy F Hansen1, Lei Shen2, Heather Ebling2, William F Donahue2, Wei Tao2, David J Saranga2, Adrianne Brand2, Marc J Rubenfield2, Alice C Young1, Pedro Cruz1 for NISC Comparative Sequencing Program1, Carlos Driscoll3, Victor David3, Samer WK Al-Murrani4, Mary F Locniskar4, Mitchell S Abrahamsen4, Stephen J O'Brien3, Douglas R Smith2 and Jeffrey A Brockman4
Abstract
Background: The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues of human scourges (cancer, SARS, and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery.
Description: To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly, allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of seven breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%.
Conclusions: These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
Background
Along with dogs, the domestic cat enjoys extensive veterinary surveillance, more than any other animal. A rich literature of feline veterinary models reveals a unique opportunity to explore genetic determinants responsible for genetic diseases, infectious disease susceptibility, behavioral and neurological phenotypes, reproduction and physiology (see [1] and [2] for citations). As a highly venerated pet, this extraordinarily successful domestic species comprises as many as one billion individuals worldwide. House cats have been a familiar companion to people since their original domestication from the Near Eastern wildcat (Felis silvestris lybica), recently estimated at approximately 10,000 years ago in the Middle East's Fertile Crescent[3]. In spite of our affection for cats, advances in clinical resolution of genetic maladies and complex diseases have been slower than for other species, largely due to a delay in achieving a useful whole genome sequence of the cat. This has changed recently with the completion of a draft 1.9X genome sequence of a female Abyssinian cat named Cinnamon, who gave us our first glimpse and hope of developing the species as an active player in the genomics era[1,4].
The availability of a sufficiently dense single-nucleotide polymorphism (SNP) map for a species provides a resource which enables the power of automated high-throughput genotyping to associate regions of the genome with hereditary diseases, quantitative traits, and other phenotypes. High-density SNP maps are available for many species including human, mouse, dog, chicken, and rice (for a complete list see [5]). The cat genome has a moderate collection of SNPs; however, the 327,037 available SNPs are clustered into alternating genomic segments of high SNP density and homozygous regions, and approximately 60% of Cinnamon's genome is homozygous[4]. This patchwork pattern arises because the current 1.9X genome sequence was derived from a single inbred cat.
To supplement the feline SNP map and the genome assembly, we created fosmid libraries and sequenced six additional cats of different breeds and one African wildcat. These sequences dramatically improve the SNP map by increasing the total number of useful SNPs and by filling in the long stretches of genomic homozygosity (~60% of the genome) reported in the 1.9X genome sequence of Cinnamon[4]. In order to make the best use of the additional sequence reads for SNP discovery, we generated a new assembly; the new reads increase the depth-of-coverage of the genome by 50%, which translates to 25% more genomic sequence. Thus this resource provides an improved cat genome assembly as well as a greatly improved SNP map (Assembly: NCBI Accession ACBE00000000, NCBI dbSNP handle CAT_POLY_V17E, and [6]).
* Correspondence: mullikin@mail.nih.gov
1 Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
Full list of author information is available at the end of the article
Construction and content
Samples and sequence generation
DNA from six domestic cats and one wildcat was collected and isolated using the PAXgene Blood DNA kit as per the manufacturer's instructions (Qiagen, Inc., Valencia, CA) and stored at -80°C until library construction. The domestic cat samples represented six different breeds from pet owners in the Topeka, KS area, and the wildcat sample was from a captive animal residing at the Audubon Nature Institute, LA. Fosmid libraries were prepared in vector pCC2fos from each DNA sample with an average insert size of approximately 37 kb. A total of 3,178,297 paired-end sequencing reads were generated from the seven libraries (Table 1) using standard methods with SPRI-based DNA purification and Big Dye terminator sequencing reagents on ABI3730xl instruments.
Reads, assembly and mapping
Assembly of the cat genome was carried out using a method similar to that published for the 1.9X assembly of Cinnamon[4], except that in this case the Phusion method[7] was used for contig and scaffold generation. All 11.4M reads (Table 1) were used, comprising approximately 2.8-fold read redundancy. Contig N50 size is 4.6 kb, an increase of nearly 100% from 2.4 kb in the 1.9X assembly, with total assembled bases at 2.0 Gb, a 22% increase from 1.64 Gb in the 1.9X assembly (see Additional file 1 Table S1 for a complete listing of the assembly statistics). Mapping of the assembly onto chromosomes used a method similar to that previously described[4], with the total sequence placed onto cat chromosomes at 1.71 Gb, an increase from 1.36 Gb. Comparison of this new assembly and the 1.9X assembly to the highly collated NIH Intramural Sequencing Center (NISC) generated sequence across feline ENCODE regions[8] shows substantial improvement in coverage from 72% to 84% (Additional file 2 Table S2), as well as excellent order and orientation (Additional file 3 and Additional file 4). The 84% coverage figure is a sampling estimate derived from assessing the exact coverage of 26.6 Mb of BAC clone sequence from ENCODE target regions by this assembly, resulting in a higher level of coverage across these regions than the estimated genome average. For example, the estimated euchromatic genome size is 2.5 Gb; thus the overall assembled base-pair coverage would be closer to 2.0/2.5 = 80%. The ~4% discrepancy could be due to ENCODE's selection of relatively gene-rich and more highly evolutionarily conserved regions, which are likely easier to assemble. In fact, the two regions with the lowest coverage, ENr112 and ENr113, at about 47% coverage, are ENCODE regions with no genes and very low multi-species conservation.
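The assembly statistics quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the `n50` function follows the usual definition of contig N50 (the source does not spell out its exact convention), and `toy_contigs` is a made-up list of contig lengths, not data from this assembly. The percentage calculations use only the figures given in the text.

```python
def n50(contig_lengths):
    """Contig N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical contig lengths, used only to exercise n50().
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)

# Figures quoted in the text.
n50_gain = percent_increase(2.4, 4.6)        # contig N50 in kb
assembled_gain = percent_increase(1.64, 2.0)  # assembled bases in Gb
genome_coverage = 100.0 * 2.0 / 2.5           # assembled / euchromatic size

print(toy_n50, round(n50_gain), round(assembled_gain), round(genome_coverage))
```

The N50 gain evaluates to about 92%, which is the "nearly 100%" quoted above, and the assembled-base gain to about 22%.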
SNPs and DIPs
SNPs are called using the ssahaSNP method[9] by comparing the sequence of each read to the consensus sequence assembly, with a breakdown of total SNPs discovered per cat provided in Table 1. Some fraction of these SNPs is discovered in more than one cat, thus the non-redundant total, 3,077,846, is less than the sum across all the cats, 3,254,739. The SNP rate is calculated relative to the reference sequence, and is determined as the number of neighborhood quality standard (NQS) bases divided by the number of SNPs detected from a given cat's sequence traces. The NQS method[10] uses a set of parameters to decrease the probability of a base being called incorrectly, which in turn lowers the false reporting of discrepant bases. The settings of the NQS parameters used here are as follows: the base has a PHRED[11,12] quality score ≥ 23, the five bases to either side have a PHRED quality score ≥ 15, and at least nine of these ten flanking bases are perfect matches.
Since the assembly is made up of all reads from all cats, the SNP rates should be viewed in a relative sense, from Cinnamon with the lowest SNP rate at one SNP per 1520 NQS bases to Nancy with the highest rate at one SNP per 360 NQS bases. The SNP rate is much lower for Cinnamon for two reasons. First, Cinnamon dominates the assembly with the most reads, thus comparing a sequence read from Cinnamon to the assembly has an increased chance of comparing to the same haplotype. Second, Cinnamon is inbred and her genome is 60% homozygous, further increasing the chance of comparing a read to the same haplotype. At the other extreme, the wildcat, Nancy, gives the highest SNP rate since the African wildcat, Felis silvestris cafra, is one of several continental subspecies of wildcat, the parent species for cat domestication. Domestic cat breeds descend from a founder event (domestication itself) which reduced genomic diversity appreciably relative to the wildcat species[3]. Domestic cat breeds displayed a SNP rate of one SNP per 500-600 bases. In the 1.9X cat assembly, the rate of polymorphisms is estimated at one SNP per 600 bases within the heterozygous segments, thus the estimates for the other breeds agree quite well with this previous measure. In addition to SNPs, 682,085 deletion and insertion polymorphisms (DIPs) are detected.
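The NQS criteria and the SNP-rate definition given in this section can be written out as a short filter. This is a minimal sketch, not the ssahaSNP implementation: it assumes a gapless alignment in which `read`, `ref` and `quals` are position-matched sequences of equal length, and the function names are hypothetical. The thresholds (23, 15, five flanking bases, at most one flanking mismatch) are the ones stated in the text.

```python
def passes_nqs(read, ref, quals, i,
               center_q=23, flank_q=15, flank=5, max_flank_mismatches=1):
    """True if position i of a gapless read/reference alignment meets NQS."""
    if i < flank or i + flank >= len(read):
        return False                      # not enough flanking sequence
    if quals[i] < center_q:
        return False                      # central base quality too low
    flank_idx = [j for j in range(i - flank, i + flank + 1) if j != i]
    if any(quals[j] < flank_q for j in flank_idx):
        return False                      # a flanking base quality too low
    mismatches = sum(read[j] != ref[j] for j in flank_idx)
    return mismatches <= max_flank_mismatches


def nqs_bases_and_snps(read, ref, quals):
    """Return (number of NQS bases, list of discrepant NQS positions)."""
    nqs = [i for i in range(len(read)) if passes_nqs(read, ref, quals, i)]
    snps = [i for i in nqs if read[i] != ref[i]]
    return len(nqs), snps


# Toy example: one discrepant base at position 10, uniform quality 30.
ref = "ACGTACGTACGTACGTACGT"
read = ref[:10] + "T" + ref[11:]
n_nqs, snps = nqs_bases_and_snps(read, ref, [30] * len(ref))
bases_per_snp = n_nqs / len(snps)  # the "SNP rate" as defined in the text
```

Summed over all reads from one cat, the ratio of NQS bases to discrepant NQS bases is the per-cat figure of the kind reported in Table 1.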
Sampling SNP variation empirically
To validate the accuracy of SNP detection we randomly chose 555 SNPs detected from the fosmid-end sequence of six domestic cats sequenced in this study. We selected SNPs located at least 750 bases away from a sequence contig gap (to make primer design feasible), reducing the number of testable SNPs to 393, of which 348 yielded primers using a primer design package[13]. Two sets of PCR primer plates (47 each) passed stringent primer QC and were sequenced across the six cats plus one additional unrelated domestic shorthair.
Sequence traces were analyzed using PolyPhred version 6.11[14-16] and the targeted variant bases were viewed using Consed[17]. Of 94 variants, 92 were confirmed, one had low-quality sequence traces for the cat carrying the detected variant (Nancy), and one variant detected from the Ragdoll, Scooter, was not observed in this cat. Removing the single low-quality amplimer gives an overall validation rate of 92/93 = 99%. Additional file 5 Table S3 gives a complete listing of all SNPs and genotypes from this validation experiment.
Forty-five DIPs detected from the sequence traces fell within the PCR amplicons. Of these, 43 were empirically validated, one was not tested and one did not validate (a single base insertion). Therefore, the DIP validation rate is 43/45 = 96%.
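The validation rates follow directly from the counts given above. The short calculation below is only a worked check of that arithmetic. The last line is an observation rather than a figure from the paper: if the one untested DIP were dropped from the denominator, in the same way the low-quality SNP amplimer was, the DIP rate would round to 98% instead of 96%.

```python
def validation_rate(confirmed, assessed):
    """Validation rate in percent."""
    return 100.0 * confirmed / assessed


snp_rate = validation_rate(92, 94 - 1)            # one low-quality amplimer removed
dip_rate = validation_rate(43, 45)                # as reported in the text
dip_rate_tested_only = validation_rate(43, 45 - 1)  # excluding the untested DIP

print(round(snp_rate), round(dip_rate), round(dip_rate_tested_only))
```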
PCR re-sequencing
gDNA QC
gDNA concentration was determined using a DyNA Quant 200 fluorometer (Hoefer) and the dsDNA-specific dye Hoechst 33258 according to the manufacturer's protocol. The gDNA sample was then tested for functionality in PCR reactions with positive and negative control primers.
Pos_For: TGTAAAACGACGGCCAGTATCCCACTGTTAGGAGAACTGC
Pos_Rev: CAGGAAACAGCTATGACCGGTCAGGAAAGGGACACAGATA
Negative control primers were the forward and reverse sequencing primers to lac-Z of M13.
M13_For: TGTAAAACGACGGCCAGT
M13_Rev: CAGGAAACAGCTATGACC
To each gDNA a trace amount of a plasmid with a unique non-feline insert was added. This plasmid was used as a biological barcode. The identifying inserts were amplified and checked using the universal sequencing primers above. The gDNAs were then diluted to a working concentration of 2.5 ng/µl.
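The positive-control primers listed above each begin with the corresponding M13 universal sequence, which is what allows amplicons to be sequenced with the universal primers. The following sketch simply checks this and splits each primer into its tail and its target-specific part. The helper name `split_m13_tail` is hypothetical, and the primer strings are the sequences given above written without line breaks.

```python
M13_FOR = "TGTAAAACGACGGCCAGT"
M13_REV = "CAGGAAACAGCTATGACC"
POS_FOR = "TGTAAAACGACGGCCAGTATCCCACTGTTAGGAGAACTGC"
POS_REV = "CAGGAAACAGCTATGACCGGTCAGGAAAGGGACACAGATA"


def split_m13_tail(primer, tail):
    """Return (tail, target_specific_part); raise if the tail is absent."""
    if not primer.startswith(tail):
        raise ValueError("primer does not start with the expected tail")
    return tail, primer[len(tail):]


_, for_specific = split_m13_tail(POS_FOR, M13_FOR)
_, rev_specific = split_m13_tail(POS_REV, M13_REV)
print(for_specific, len(for_specific))
print(rev_specific, len(rev_specific))
```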
Primer QC and Sequencing
Primers were obtained from Eurofins MWG Operon in individual tubes and reconstituted to 100 µM in 10 mM TRIS, pH 8.0, 0.1 mM EDTA. The primer pairs were tested at a concentration of 0.16 µM each in 10 µl PCR reactions containing iQ supermix (BioRad) and 5 ng of control feline DNA (the Tipper sample was used for the first round of QC and the Speedy sample for the second round). Cycling conditions were: enzyme activation at 95°C for 3 min, followed by 40 cycles of 95°C for 15 sec, 60°C for 15 sec and 72°C for 60 sec, then 72°C for 5 min and a hold at 10°C. A 5 µl aliquot of the PCR reaction was examined on an agarose gel to look for multiple or missing bands. The PCR products were then diluted to 0.4 ng/µl and sequenced in 6 µl reactions using M13 universal forward and reverse primers and BDT version 3.1 (Applied Biosystems) using standard ABI protocols. The reactions were sequenced on 3730 DNA Sequencers (Applied Biosystems). The sequence traces were then individually inspected for quality. Primer pairs that resulted in high-quality traces were passed. Primers not passing this round were retested using one additional control DNA. Primers failing both rounds were not used.
Table 1: Total sequencing reads generated, SNP counts, SNP rate, and gender
Cat Name   Cat                     Reads       SNPs  SNP rate (1 per x bases)  Gender
Pixel      Burmese               331,813    174,212                       524  Female
Zeelie     Persian               298,332    174,706                       510  Female
Tipper     Cornish Rex           272,607    164,054                       503  Male
Scooter    Ragdoll               298,409    168,455                       510  Male
Speedy     Domestic Shorthair    310,364    158,148                       569  Female
Cocoa      Siamese               293,712    152,984                       516  Female
Nancy      African wild cat    1,373,060    938,386                       360  Female
Cinnamon*  Abyssinian          8,186,934  1,323,794                      1520  Female
* Cinnamon is listed but those reads were not part of the sequence generated from this effort.
PCR
PCR amplification of amplimers was performed in 10 µl reactions in 384-well plates. The reaction conditions were as described above.
Utility and Discussion
Distribution
The existing map of 327 k SNPs from the 1.9X Cinnamon assembly includes large homozygous segments covering approximately 60% of the cat genome (see Figure 5 in [4]). Figure 1 shows the incidence of SNPs along cat chromosomes, where the number of SNPs is totaled within adjacent 1 Mb windows across each chromosome. In the new assembly, there are only 8 windows with fewer than 100 SNPs per Mb across the autosomes (out of 2,569 windows, not counting the unmapped regions). On chromosome X there are 34 windows with fewer than 100 SNPs out of 143 windows, which suggests that X has relatively lower heterozygosity due to at least three factors: 1) two of the eight cats are male, reducing the number of sampled X chromosomes to fourteen instead of the sixteen chromosomes sampled for an autosomal locus; 2) the effective population size for the X chromosome is ¾ that of the autosomes; and 3) male hemizygosity allows much stronger purifying selection to occur around X-linked functional loci.
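The window counts described here amount to binning SNP coordinates into adjacent 1 Mb windows and counting the windows that fall below 100 SNPs. The sketch below illustrates that tally under simple assumptions: positions are 0-based chromosome coordinates, the function names are hypothetical, and the example positions are invented. Unlike the analysis in the text, it does not exclude windows that lack mapped sequence, so such windows would need to be filtered out before counting.

```python
def snps_per_window(positions, chrom_length, window=1_000_000):
    """Count SNPs in adjacent fixed-size windows along one chromosome."""
    n_windows = -(-chrom_length // window)   # ceiling division
    counts = [0] * n_windows
    for pos in positions:                    # 0-based, pos < chrom_length
        counts[pos // window] += 1
    return counts


def sparse_windows(counts, threshold=100):
    """Indices of windows with fewer than `threshold` SNPs."""
    return [i for i, c in enumerate(counts) if c < threshold]


# Invented coordinates on a hypothetical 3 Mb chromosome.
toy_positions = [10, 999_999, 1_000_000, 2_500_000]
toy_counts = snps_per_window(toy_positions, chrom_length=3_000_000)
print(toy_counts, sparse_windows(toy_counts, threshold=2))
```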
Figure 2 presents the SNP distribution as the fraction of windows of a given size that have at least one SNP. Average SNP spacing in base pairs is shown in Table 2. For both SNP counts and the average base-pair distance between SNPs there are three SNP categories: A) SNPs excluding those discovered only from Cinnamon and/or Nancy, B) SNPs excluding those discovered only in Cinnamon, and C) all SNPs (Table 2). The reason to count SNPs in these categories is that, for some purposes, SNPs derived from the wildcat Nancy are not as useful as SNPs derived from the domesticated cat breeds. Likewise, inclusion of the highly variable densities of SNPs discovered from Cinnamon's much deeper sequencing would reflect this cat's particular pattern of homozygous and heterozygous regions. Thus, the SNPs most useful for domestic cat association screens would be those in category A, which totals 964,285 SNPs (Table 2). An additional restriction to just those category A SNPs that are mapped to a cat chromosome reduces this count to 844,313. However, relative to the number of mapped bases, we still have on average one SNP every 2000 bases, and even the SNPs not mapped to a chromosome will be useful once these segments are mapped in improved future assemblies. Looking again at Figure 2, over 80% of the 15 kb windows across the genome contain at least one category A SNP.
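The three SNP categories and the window statistic plotted in Figure 2 can be expressed compactly. This is a sketch under stated assumptions: each SNP is represented by the set of cat names in which it was discovered (a hypothetical data layout), the function names are illustrative, and the example positions are invented.

```python
def snp_categories(discovered_in):
    """Return the categories (A, B, C) that a SNP belongs to, given the
    set of cats in which it was discovered."""
    cats = set(discovered_in)
    labels = {"C"}                          # C: all SNPs
    if cats - {"Cinnamon"}:
        labels.add("B")                     # B: not found only in Cinnamon
    if cats - {"Cinnamon", "Nancy"}:
        labels.add("A")                     # A: not found only in Cinnamon and/or Nancy
    return labels


def fraction_of_windows_with_snp(positions, region_length, window):
    """Fraction of adjacent windows that contain at least one SNP."""
    n_windows = -(-region_length // window)  # ceiling division
    occupied = {pos // window for pos in positions}
    return len(occupied) / n_windows


# Invented positions in a hypothetical 60 kb region, 15 kb windows.
toy_fraction = fraction_of_windows_with_snp([0, 100, 20_000, 50_000], 60_000, 15_000)
print(snp_categories({"Pixel", "Cinnamon"}), toy_fraction)
```

Categories are nested by construction: every category A SNP is also in B and C, which matches the increasing counts from A to C in Table 2.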
With the genotype information, we can estimate how many informative SNPs are available for a genotyping study among cat breeds. There are 964,285 SNPs discovered from at least one of the six breeds sequenced (category A SNPs in Table 2). A SNP will not be informative unless it has a minor allele frequency (MAF) of at least 5%. However, the MAFs of these SNPs are not known until they have been genotyped. With the limited number of genotypes given in Additional file 5 Table S3, we observe that 24/92 (26%) of the validated SNPs are observed in only one cat, and therefore could be quite rare variants. However, the other 68/92 (74%) are seen in 2 or more cats, and are therefore quite likely to be polymorphic, and thus informative. This informative fraction increases to 50/57 (88%) for SNPs detected from domestic cat samples. Thus, about 849 k (88% of 964,285) SNPs from this resource are likely to have an informative MAF (≥ 5%) among cat breeds.
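The estimate of informative SNPs is a simple extrapolation from the validation genotypes, worked through below with the counts given in the text. Note that the quoted 849 k uses the rounded fraction of 88%; carrying the unrounded fraction 50/57 through gives roughly 846 k, a difference well within the uncertainty of an estimate based on 57 SNPs.

```python
CATEGORY_A_SNPS = 964_285

seen_in_one_cat = 24 / 92           # possibly rare variants
seen_in_two_or_more = 68 / 92       # likely polymorphic
domestic_informative = 50 / 57      # SNPs detected from domestic cat samples

estimate_rounded = 0.88 * CATEGORY_A_SNPS              # as computed in the text
estimate_unrounded = domestic_informative * CATEGORY_A_SNPS

print(round(100 * seen_in_one_cat), round(100 * seen_in_two_or_more),
      round(100 * domestic_informative))
print(round(estimate_rounded / 1000), round(estimate_unrounded / 1000))
```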
In anticipation of using these SNPs for a genotyping chip, one would like to select the SNPs at fairly even intervals across the genome. If a SNP is selected every 15 kb from the category A SNPs, 80% of the genome can be covered, requiring about 100,000 SNPs in a genotyping array. The remaining 20% can be filled in either with more widely spaced category A SNPs or with additional category B or C SNPs. Thus perhaps another 20 k genotype assays for the remaining regions would yield a 120 k SNP chip. This is more than double the estimated number proposed for genome-wide association mapping as reported previously[4].
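One way to picture the selection of evenly spaced SNPs is to keep, for each 15 kb window, the candidate closest to the window centre. The sketch below does this for a list of positions on one chromosome. It is an illustration only and not a procedure described in the paper: the centre-of-window rule and the function names are assumptions, and the example positions are invented. The assay estimate assumes a genome size of either the 2.0 Gb of assembled bases or the 1.71 Gb placed on chromosomes, since the text does not say which was used to arrive at "about 100,000".

```python
def pick_evenly_spaced(positions, window=15_000):
    """Keep at most one SNP per window: the one nearest the window centre."""
    best = {}
    for pos in sorted(positions):
        w = pos // window
        centre = w * window + window // 2
        if w not in best or abs(pos - centre) < abs(best[w] - centre):
            best[w] = pos
    return [best[w] for w in sorted(best)]


def expected_assays(genome_bp, window=15_000, covered_fraction=0.8):
    """Number of windows expected to contain a usable SNP."""
    return round(genome_bp / window * covered_fraction)


toy_selection = pick_evenly_spaced([100, 7_000, 14_000, 16_000, 44_000])
print(toy_selection)
print(expected_assays(2_000_000_000), expected_assays(1_710_000_000))
```

Both genome-size assumptions give a figure in the neighbourhood of the 100,000 SNPs quoted above.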
Wildcat geographical origin
Finally we investigated the geographical origin of the wildcat Nancy. With nearly 1.4 M reads available for this wildcat, these data provide a valuable resource for studying subspecies of Felis silvestris. Nancy was identified as wild-caught 15 years ago in the Arabian Peninsula, so geographically she should fall into the Near Eastern subspecies Felis silvestris lybica. A STRUCTURE analysis was completed using a genomic DNA sample from Nancy, genotyped at 18 of the 36 short tandem repeat (STR) loci previously used to resolve genomic distinctions among wildcat subspecies effectively[3]. Nancy showed no evidence of domestic cat introgression and clustered with cats from southern Africa rather than the Near East, so she likely descends from the subspecies Felis silvestris cafra rather than F. s. lybica. For additional details of this analysis, see Additional file 4 and Additional file 6 Figure S1.
Figure 1 SNP density in 1 Mb windows. Number of SNPs in windows of 1 Mb across each chromosome. Zero values are regions that do not have mapped sequence, totaling about 416 Mb, thus the densities are undetermined.
Conclusions
The primary goal of this effort is to expand feline SNP resources, empowering future linkage and association studies to map feline disease phenotypes to genomic loci. The usefulness of such a resource has been clearly demonstrated in many species, most notably for the human and canine genomes[18-20]. A recent review[1] highlights the need for additional cat SNPs to aid the development of a genotyping array chip of 100,000-150,000 SNPs. The SNPs discovered by this effort should allow the design of such a chip derived from the 964,285 available SNPs from the domestic breed cats.
The value of the sequence generated by this effort will become even greater as the publicly funded effort (see Felis catus entries in [21]) to generate a high quality draft of Cinnamon's genome is completed, hopefully within the next year. With a high quality draft covering over 90% of the cat genome, even more SNPs can be extracted from the 3M reads from these seven cats, probably 25% more if the assembled sequence increases to 2.5 Gb from the current 2.0 Gb. This resource is further enhanced by having all reads generated from paired ends of fosmid templates. The insert sizes are all about 40 kb, with fairly tight distributions of less than 10% coefficient of variation. Given an increasingly higher quality assembly of Cinnamon, these paired-end reads could pinpoint structural rearrangements among cat breeds using available methods[22].
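The idea behind using fosmid end pairs to find structural rearrangements is that a pair whose ends map much closer together or farther apart than the expected insert size suggests an insertion or deletion relative to the reference. The sketch below flags such pairs by mapped span alone. It is a simplified illustration and not the method of [22]: the three-standard-deviation cutoff and the function name are assumptions, the spans are invented, and a real analysis would also consider read orientation and require several independent pairs to support each call. The 40 kb mean and 10% coefficient of variation are the values quoted in the text.

```python
def discordant_pairs(mapped_spans, expected_mean=40_000, cv=0.10, n_sd=3):
    """Return (index, span) for end pairs whose mapped span lies outside
    expected_mean +/- n_sd standard deviations."""
    sd = expected_mean * cv
    low = expected_mean - n_sd * sd
    high = expected_mean + n_sd * sd
    return [(i, span) for i, span in enumerate(mapped_spans)
            if not low <= span <= high]


# Invented mapped spans (bp) for five hypothetical fosmid end pairs.
toy_spans = [39_000, 41_000, 20_000, 60_000, 51_000]
print(discordant_pairs(toy_spans))
```

A span shorter than expected points to sequence present in the sampled cat but absent from the reference, and a longer span to the reverse.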
Table 2: SNP counts and bases per SNP
                                     SNP Counts                  Bases per SNP
Chromosome    Non-N Bases          A          B          C       A       B       C
chrA1         164,170,763     77,824    151,421    240,266   2,110   1,084     683
chrA2         120,172,290     59,782    118,896    180,685   2,010   1,011     665
chrA3         109,094,838     55,010    110,129    192,266   1,983     991     567
chrB1         131,184,541     62,260    118,189    196,456   2,107   1,110     668
chrB2         101,553,943     49,898     95,152    161,014   2,035   1,067     631
chrB3          96,970,780     47,679     93,489    167,719   2,034   1,037     578
chrB4         108,425,265     53,709    104,123    170,033   2,019   1,041     638
chrC1         160,223,031     76,483    147,928    245,762   2,095   1,083     652
chrC2         107,198,630     53,479     98,226    163,338   2,004   1,091     656
chrD1          81,705,395     45,881     88,125    130,989   1,781     927     624
chrD2          67,243,459     37,877     71,535    134,493   1,775     940     500
chrD3          71,434,721     40,297     79,074    119,894   1,773     903     596
chrD4          67,338,148     34,295     65,034     88,513   1,963   1,035     761
chrE1          44,074,055     24,513     50,193     87,236   1,798     878     505
chrE2          50,431,338     27,836     56,317     94,520   1,812     895     534
chrE3          36,523,145     24,444     47,449     68,476   1,494     770     533
chrF1          45,373,584     24,292     45,507     83,877   1,868     997     541
chrF2          56,475,142     29,011     55,998     93,247   1,947   1,009     606
chrX           83,845,181     19,431     32,619     65,352   4,315   2,570   1,283
chrUnCf       217,904,849    101,325    198,166    323,505   2,151   1,100     674
chrUn          70,839,159     18,959     34,246     70,797   3,736   2,069   1,001
Total       1,992,182,257    964,285  1,861,816  3,078,438   2,066   1,070     647
A: SNPs excluding Cinnamon and Nancy
B: SNPs excluding Cinnamon
C: All SNPs
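The "Bases per SNP" columns of Table 2 are the non-N base count divided by the SNP count in each category, rounded to the nearest base. The snippet below recomputes three rows as a check. The dictionary layout and function name are illustrative, and the numbers are copied from the table.

```python
# chromosome: (non-N bases, category A, category B, category C SNP counts)
TABLE2_ROWS = {
    "chrA1": (164_170_763, 77_824, 151_421, 240_266),
    "chrX": (83_845_181, 19_431, 32_619, 65_352),
    "Total": (1_992_182_257, 964_285, 1_861_816, 3_078_438),
}


def bases_per_snp(non_n_bases, snp_count):
    """Average spacing between SNPs, rounded to the nearest base."""
    return round(non_n_bases / snp_count)


spacing = {
    name: tuple(bases_per_snp(row[0], count) for count in row[1:])
    for name, row in TABLE2_ROWS.items()
}
for name, values in spacing.items():
    print(name, values)
```

The chrX row shows roughly twice the genome-wide spacing in every category, consistent with the lower X-chromosome SNP density discussed in the text.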
Availability and requirements
All SNPs and DIPs described are freely available through
dbSNP [5] and through the web browser interface [6].
Additional material
Additional file 1 Whole genome assembly statistics. Table S1 comparing the assembly statistics for this assembly and the previously published 1.9X assembly.
Additional file 2 Assembly coverage of ENCODE regions. Table S2 listing whole genome shotgun assembly coverage statistics relative to high quality cat BAC clone assemblies of ENCODE regions for this assembly and the previously published 1.9X assembly.
Additional file 3 Order and orientation across ENCODE regions. Plots of assembly order and orientation across all 44 ENCODE regions.
Additional file 4 Descriptions of how Additional files 3 and 6 were generated. Methods for generating plots of order and orientation across ENCODE regions and the STRUCTURE analysis of Nancy.
Additional file 5 PCR validation results for 94 variants. Table S3 lists the variants by position on the genome assembly, which alleles are expected, and the alleles observed across 8 cats. Pink colored cells indicate the cat(s) from which the alternate allele was discovered in the light whole genome sequence.
Additional file 6 STRUCTURE analysis results. Figure S1 shows results of STRUCTURE analysis showing Nancy and representative known-origin wildcat individuals.
Authors' contributions
JCM drafted the manuscript. SWKA and MFL organized the sample collection. HE, WFD, WT, DJS, AB, and MJR performed the fosmid library construction and sequencing. PC, VD, ACY and NISC performed the PCR primer design and sequencing. VD and CD did the STR typing and analysis to identify the likely origin of the wildcat. LS, NFH and JCM performed the sequence alignment and SNP/DIP detection and validation review. JCM, JAB, SJO, DRS and MSA conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The DNA sample for the wildcat, Nancy, was kindly provided by Dr. Betsy L. Dresser at the Audubon Nature Institute Center for Research of Endangered Species, New Orleans, LA. We would like to thank Becky Stone (Pixel), Susy Tejayadi (Tipper), Josie Kirk-Pagel (Zeelie), Rick Wienckowski (Cocoa), and Patti Morelock (Scooter) for donating DNA samples to the project. This research was supported in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.
Author Details
1Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA, 2Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA, 3Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702, USA and 4Hill's Pet Nutrition Inc., PO Box 1658, Topeka, KS 66601, USA
Received: 8 September 2009 Accepted: 24 June 2010
Published: 24 June 2010
Figure 2 SNP distribution. The fraction of windows with one or more SNPs for a range of window sizes and three categories of SNPs: all SNPs, all
except Cinnamon and all except Cinnamon and Nancy.
References
1. O'Brien SJ, Johnson W, Driscoll C, Pontius J, Pecon-Slattery J, Menotti-
Raymond M: State of cat genomics. Trends Genet 2008, 24:268-279.
2. OMIA - Online Mendelian Inheritance in Animals [http://omia.angis.org.au/]
3. Driscoll CA, Menotti-Raymond M, Roca AL, Hupe K, Johnson WE, Geffen E,
Harley EH, Delibes M, Pontier D, Kitchener AC, Yamaguchi N, O'Brien SJ,
Macdonald DW: The Near Eastern origin of cat domestication. Science
2007:519-523.
4. Pontius JU, Mullikin JC, Smith DR, Agencourt Sequencing Team, Lindblad-
Toh K, Gnerre S, Clamp M, Chang J, Stephens R, Neelam B, Volfovsky N,
Schäffer AA, Agarwala R, Narfström K, Murphy WJ, Giger U, Roca AL,
Antunes A, Menotti-Raymond M, Yuhki N, Pecon-Slattery J, Johnson WE,
Bourque G, Tesler G, NISC Comparative Sequencing Program, O'Brien SJ:
Initial sequence and comparative analysis of the cat genome. Genome
Res 2007, 17:1675-1689.
5. dbSNP Summary [http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi]
6. GARField: Genome Annotation Resource Field Felis catus [http://lgd.abcc.ncifcrf.gov/cgi-bin/gbrowse/cat3x]
7. Mullikin JC, Ning Z: The phusion assembler. Genome Res 2003, 13:81-90.
8. ENCODE Project Consortium: Identification and analysis of functional
elements in 1% of the human genome by the ENCODE pilot project.
Nature 2007, 447:799-816.
9. Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA
databases. Genome Res 2001, 11:1725-1729.
10. Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L,
Lander ES: An SNP map of the human genome generated by reduced
representation shotgun sequencing. Nature 2000, 407:513-6.
11. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated
sequencer traces using PHRED. I. Accuracy assessment. Genome Res
1998, 8:175-185.
12. Ewing B, Green P: Base-calling of automated sequencer traces using
PHRED. II. Error probabilities. Genome Res 1998, 8:186-194.
13. Chines PS, Swift AJ, Bonnycastle LL, Erdos MR, Mullikin JC, Collins FS:
PrimerTile: designing overlapping PCR primers for resequencing. The
American Society of Human Genetics: 2005; Salt Lake City, Utah.
14. Bhangale TR, Rieder MJ, Livingston RJ, Nickerson DA: Comprehensive
identification and characterization of diallelic insertion-deletion
polymorphisms in 330 human candidate genes. Hum Mol Genet 2005,
14:59-69.
15. Bhangale TR, Stephens M, Nickerson DA: Automating resequencing-
based detection of insertion-deletion polymorphisms. Nat Genet 2006,
38:1457-1462.
16. Stephens M, Sloan JS, Robertson PD, Scheet P, Nickerson DA: Automating
sequence-based detection and genotyping of SNPs from diploid
samples. Nat Genet 2006, 38:375-381.
17. Gordon D: Viewing and Editing Assembled Sequences Using Consed.
Current Protocols in Bioinformatics 2003. Section 11.2.1-11.2.43
18. The SNP Consortium: A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature 2001,
409:928-933.
19. The International HapMap Consortium: A second generation human
haplotype map of over 3.1 million SNPs. Nature 2007, 449:851-861.
20. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M,
Clamp M, et al.: Genome sequence, comparative analysis and haplotype
structure of the domestic dog. Nature 2005, 438:803-819.
21. Approved Sequencing Targets [http://www.genome.gov/10002154]
22. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T,
Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA,
Tsang P, Newman TL, Tüzün E, Cheng Z, Ebling HM, Tusneem N, David R,
Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E,
McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler
DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D,
Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR,
Eichler EE: Mapping and sequencing of structural variation from eight
human genomes. Nature 2008, 453:56-64.
doi: 10.1186/1471-2164-11-406
Cite this article as: Mullikin et al., Light whole genome sequence for SNP
discovery across domestic cat breeds BMC Genomics 2010, 11:406
... Fifty-two genomic DNA samples (~600 ng each) were submitted to GeneSeek (Neogene, Lincoln, NE, USA) for SNP genotyping on the Illumina Infinium Feline 63K iSelect DNA Array (Illumina, San Diego, CA, USA) [22]. The original SNP positions were based on an early assembly of the cat genome [26], and have been since relocalized to the latest feline genome assembly, Felis_catus_9.0. The SNP positions based on the Felis_catus_9.0 assembly were used for the analyses and the required map file is available. ...
... p-values were presented with up to four decimal places. * SNP IDs are based on an early cat genome assembly [26] † Positions based on current cat genome assembly [27]. ...
Article
An inherited neurologic syndrome in a family of mixed-breed Oriental cats has been characterized as forebrain commissural malformation, concurrent with ventriculomegaly and interhemispheric cysts. However, the genetic basis for this autosomal recessive syndrome in cats is unknown. Forty-three cats were genotyped on the Illumina Infinium Feline 63K iSelect DNA Array and used for analyses. Genome-wide association studies, including a sib-transmission disequilibrium test and a case-control association analysis, and homozygosity mapping, identified a critical region on cat chromosome A3. Short-read whole genome sequencing was completed for a cat trio segregating with the syndrome. A homozygous 7 bp deletion in growth differentiation factor 7 (GDF7) (c.221_227delGCCGCGC [p.Arg74Profs]) was identified in affected cats, by comparison to the 99 Lives Cat variant dataset, validated using Sanger sequencing and genotyped by fragment analyses. This variant was not identified in 192 unaffected cats in the 99 Lives dataset. The variant segregated concordantly in an extended pedigree. In mice, GDF7 mRNA is expressed within the roof plate when commissural axons initiate ventrally-directed growth. This finding emphasized the importance of GDF7 in the neurodevelopmental process in the mammalian brain. A genetic test can be developed for use by cat breeders to eradicate this variant.
... A notable improvement of the feline genome was published in 2014, incorporating NGS data from a panel of cats of different breeds and from wild felids. Sequencing depth thereby rose to 20X, and polymorphism data were added (Mullikin et al. 2010 ...
... Sequencing of Cinnamon's genome, followed by low-depth (3X) sequencing of six cats of different breeds (American shorthair, Cornish rex, Burmese, Persian, Siamese, Ragdoll) and one wild felid (Felis silvestris cafra), led to the discovery of three million SNP markers (Mullikin et al. 2010). This resource was used in 2010 to develop the first feline genotyping array. ...
Article
The sequencing and annotation of the domestic cat (Felis catus) genome, first published in 2007, were of mediocre quality. They have since been revised and then supplemented by the development of genomic tools that, combined with constantly evolving genetic strategies, make it possible to search efficiently for genes and variants of interest. This progress benefits feline veterinary medicine, and it also makes the domestic cat a model for the study of domestication and species evolution, as well as a steadily rising biomedical model. Gradually, the cat, which has already won the hearts of French households and of the worldwide web, is winning the hearts of geneticists.
... Cats are the most popular pets worldwide and the most extensively used model to understand human diseases [1][2][3][4]. Cats make a unique contribution to biomedical science and serve as a significant study model in a variety of fields, including neurology related to locomotion and spinal damage, retrovirus and zoonotic disease studies, and the development of therapeutic techniques for inherited disorders. There are human homologs for almost 90% of cat genes, and fewer chromosomal rearrangements in cats than in mice. ...
Chapter
Cats are the most popular pets worldwide and the most extensively used model to understand human diseases [1–4]. Cats make a unique contribution to biomedical science and serve as a significant study model in a variety of fields, including neurology related to locomotion and spinal damage, retrovirus and zoonotic disease studies, and the development of therapeutic techniques for inherited disorders. There are human homologs for almost 90% of cat genes, and fewer chromosomal rearrangements in cats than in mice. The intermediate size, prolific breeding capacity, likeness of systems to humans, abundance, low cost, and neurobehavioral complexity make cats useful in research including neuroscience and a variety of genetic, ophthalmologic, and infectious disorders. Cloning a feline using somatic cell nuclear transfer (SCNT) is beneficial for generating disease models and averting the extinction of endangered felids [5–7]. The world’s first cat cloning experiment was done by Taeyoung Shin around 2 decades ago, and successfully produced CC (“Carbon Copy” or “Copycat”) via SCNT [8]. SCNT is a robust technology but a contentious procedure because of risks such as high embryonic losses, placental abnormalities, postnatal mortality, and poor survival (1–5%) for more than a short period [9–11]. Felid cloning has disadvantages as well, such as the low in vitro production of cat-cloned embryos [12, 13], complications in the surrogacy process [8], high rate of miscarriage [14], and very low birth rate of healthy clones [8, 15]. Postsurvival health issues have also been associated with cloned animals, such as defects in vital organs and immune system problems [16]. Although there have been few reports of developmental problems in cloned cats after birth, cat cloning research must nevertheless be handled with caution.
The information provided in this chapter covers the advances in cat cloning methods and available treatment options for the care of cloned cats proposed to the IETS Health and Safety Advisory Committee.
... A further limitation of this study is the low density of the feline Illumina Infinium iSelect DNA array for identification of loci associated with complex disease conditions [45,83,84]. SNPs incorporated onto this array have been remapped to the feline genome assembly 6.2 [46,49] and subsequently assembly 8.0 and 9.0 [45,85]. ...
Article
Hypertension (HTN) and chronic kidney disease (CKD) are common in ageing cats. In humans, blood pressure (BP) and renal function are complex heritable traits. We performed the first feline genome-wide association study (GWAS) of quantitative traits systolic BP and creatinine and binary outcomes HTN and CKD, testing 1022 domestic cats with a discovery, replication and meta-analysis design. No variants reached experimental significance level in the discovery stage for any phenotype. In follow-up of the top 9 variants for creatinine and 5 for systolic BP, one SNP reached experiment-wide significance for association with creatinine in the combined meta-analysis (chrD1.10258177; P = 1.34 × 10⁻⁶). Exploratory genetic risk score (GRS) analyses were performed. Within the discovery sample, GRS of top SNPs from the BP and creatinine GWAS showed strong association with HTN and CKD but did not validate in independent replication samples. A GRS including SNPs corresponding to human CKD genes was not significant in an independent subset of cats. Gene-set enrichment and pathway-based analysis (GSEA) was performed for both quantitative phenotypes, identifying 30 enriched pathways for creatinine. Our results support the utility of GWASs and GSEA for genetic discovery of complex traits in cats, with the caveat that our findings require validation.
... Our initial data set comprised all Leopardus individuals sampled by Li et al. (2016), which were genotyped with an Illumina array targeting genome-wide SNPs identified in the domestic cat (Mullikin et al. 2010). We complemented this data set by genotyping the same markers in five additional L. geoffroyi individuals, two of which had suggestive evidence of admixture with L. tigrinus from the Brazilian northeast (Trigo et al. 2013) and six additional L. guttulus individuals (a species that had not been included in Li et al.'s [2016] study) with known geographic origin. ...
Article
Phylogenetic reconstruction and species delimitation are often challenging in the case of recent evolutionary radiations, especially when post-speciation gene flow is present. Leopardus is a Neotropical cat genus that has a long history of recalcitrant taxonomic problems, along with both ancient and current episodes of interspecies admixture. Here we employ genome-wide SNP data from all presently recognized Leopardus species, including several individuals from the tigrina complex (representing L. guttulus and two distinct populations of L. tigrinus), to investigate the evolutionary history of this genus. Our results reveal that the tigrina complex is paraphyletic, containing at least three distinct species. While one can be assigned to L. guttulus, the other two remain uncertain regarding their taxonomic assignment. Our findings indicate that the ‘tigrina’ morphology may be plesiomorphic within this group, which has led to a longstanding taxonomic trend of lumping these poorly known felids into a single species.
Article
Many coat color, behavioral and morphological traits are specific and fixed across cat breeds, with several variants influencing these traits being common among different breeds. In the domestic cat, rexoid mutations have been documented in several breeds. In the Cornish Rex, a four bp deletion in the LPAR6 gene has been found to cause a frame shift and a premature stop codon. In addition to the rexoid coat, Cornish Rex cats also have a characteristic head, ear shape and body type. Analysis of the selection signatures in the Cornish Rex genome revealed several regions that are under selective pressure. One of these is located in CFA B4, in the region where the ALX1 gene is located. A deletion in the ALX1 gene in Burmese cats disrupts cranial morphogenesis and causes brachycephaly in the heterozygous state. In our study, we confirmed the presence of a deletion in LPAR6 in 20 Cornish Rex and in four F1 hybrids between Cornish Rex and domestic cat. However, we did not confirm the presence of the deletion in ALX1 in Cornish Rex cats. Genome-wide selection signature analysis was performed using ROH islands and integrated haplotype score (iHS) statistics based on publicly available SNP array data of 11 Cornish Rex cats. The selection signatures were detected on chromosomes A1, A3, C2, B1, B4 and D1.
Article
[Objectives] The evolutionary relationship of Felidae has been controversial. As a result, there are highly divergent views on classification of cats at the generic level. The emerging phylogeny using gene or genomic data provides a new viewpoint to understand the evolution of cats. [Methods] This paper reviews the molecular phylogeny of Felidae over recent years, and we deduce the evolutionary history of Felidae in combination with fossil records. The phylogenies by Johnson et al. (2006) and Li et al. (2016) are used as the core, corroborated by specific fossil records. [Results] Recent molecular phylogenies propose that living cats radiated in the Late Miocene and diverged into eight branches. Though the divergence age of these branches largely coincides with fossil evidence, the inferred area of origin of some branches is not supported by fossil records. Combining the evidence from these fossil records, we propose that most living cat lineages likely originated in Asia, except for the Caracal lineage and the Leopardus lineage, and that living cats experienced at least 30 intercontinental migrations in the process of evolution, far more than those inferred from molecular phylogeny alone. [Conclusion] Based on the study of evolutionary history and morphology, we suggest that all the living cats should be classified into Felinae, and subdivided into 15 genera and 40 species.
Chapter
Breed predispositions are documented for a variety of feline skin diseases, and there are diverse reports describing novel skin diseases where affected individuals within a litter had similar congenital skin changes. Both presentations increase the suspicion of a possible hereditary component to the identified skin disease. Recognized feline genodermatoses represent inherited skin disorders that follow a single-gene mode of inheritance (Leeb et al., Vet Dermatol 28:4–9. https://doi.org/10.1016/j.mcp.2012.04.004, 2017). These diseases are rare in their occurrence, but the number identified has the potential to increase, as available diagnostic tools for evaluating genetic diseases have advanced and single nucleotide polymorphism (SNP) mapping of the feline genome has improved (Lyons, Mol Cell Probes. 26:224–30. https://doi.org/10.1016/j.mcp.2012.04.004, 2012; Mullikin et al., BMC Genomics. 11:406. http://www.biomedcentral.com/1471-2164/11/406, 2010). This chapter will discuss some examples of feline genodermatoses that can affect the epidermis, the dermoepidermal junction, the hair follicles or hair shafts, the dermis, and pigmentation. Discussion about the genetics of feline coat color and coat length occurs in Chapter, Coat Color Genetics.
Chapter
The health burden associated with hypertension in the world is very large. Current estimates suggest that there are over 1 billion individuals with hypertension worldwide with the World Health Organisation estimating that this will increase to 1.5 billion by 2025. Between 35% and 45% of individuals over 25 years demonstrate hypertension with prevalence varying with geographical location and socioeconomic status. In humans, a continuous and incremental risk exists between blood pressure (BP) and cardiovascular disease, stroke and kidney disease. The significant health and economic impact of hypertension has, therefore, driven the requirement for a better understanding of factors, including genetic factors, that may contribute to the development of hypertension. An understanding of these genetic associations might not only improve individual and population risk prediction but, more importantly, may identify novel pathophysiological pathways that are involved in BP regulation and which may also act as pathway targets for new anti-hypertensive drug development assisting with personalisation of anti-hypertensive therapy.
Article
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Article
Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Article
The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.
Article
We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
Article
Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
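The per-base error probabilities described above are conventionally reported on the phred quality scale, Q = -10 log10(P). As a minimal sketch of that conversion only (the function names `phred_quality` and `error_probability` are illustrative assumptions, not part of phred, phrap, or consed):

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert a base-call error probability P to a phred-scaled quality.

    Uses the standard definition Q = -10 * log10(P).
    The function name is hypothetical, chosen for this illustration.
    """
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Invert the phred scale: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)
```

Under this definition, a quality of 20 corresponds to an estimated 1 error in 100 base-calls, and a quality of 30 to 1 in 1,000, which is why a lower limit on base quality at discrepant sites is a common filter in SNP discovery.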