
Light whole genome sequence for SNP discovery across domestic cat breeds

June 2010
BMC Genomics 11(1):406


DOI:10.1186/1471-2164-11-406

Source
PubMed

License
CC BY 2.0

Authors:

Lei Shen

Hangzhou Dianzi University

Heather Ebling



Mullikin et al. BMC Genomics 2010, 11:406

http://www.biomedcentral.com/1471-2164/11/406

Open Access

DATABASE


Database

Light whole genome sequence for SNP discovery across domestic cat breeds

James C Mullikin*, Nancy F Hansen, Lei Shen, Heather Ebling, William F Donahue, Wei Tao, David J Saranga, Adrianne Brand, Marc J Rubenfield, Alice C Young, Pedro Cruz, for NISC Comparative Sequencing Program, Carlos Driscoll, Victor David, Samer WK Al-Murrani, Mary F Locniskar, Mitchell S Abrahamsen, Stephen J O'Brien, Douglas R Smith and Jeffrey A Brockman

Abstract

Background: The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models, as well as in the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologous to human scourges (cancer, SARS and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required for disease and phenotype association discovery.

Description: To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly, allowing the discovery of over three million SNPs. To reduce potential false-positive SNPs due to the low-coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (a female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese and male Ragdoll) and a female African wildcat were lightly sequenced. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between the African wildcat and domestic cat breeds. An empirical sample of 94 discovered SNPs was tested in the sequenced cats, giving a SNP validation rate of 99%.

Conclusions: These data provide a large collection of mapped feline SNPs across the cat genome that will allow the development of SNP genotyping platforms for mapping feline diseases.

Background

Along with the dog, the domestic cat receives more extensive veterinary surveillance than any other animal. A rich literature of feline veterinary models offers a unique opportunity to explore the genetic determinants of genetic diseases, infectious disease susceptibility, behavioral and neurological phenotypes, reproduction and physiology (see [1] and [2] for citations). As a highly venerated pet, this extraordinarily successful domestic species comprises as many as one billion individuals worldwide. House cats have been familiar companions to people since their original domestication from the Near Eastern wildcat (Felis silvestris lybica), recently estimated at approximately 10,000 years ago in the Middle East's Fertile Crescent[3]. In spite of our affection for cats, advances in the clinical resolution of genetic maladies and complex diseases have been slower than for other species, largely because of the delay in achieving a useful whole genome sequence of the cat. This changed recently with the completion of a draft 1.9X genome sequence of a female Abyssinian cat named Cinnamon, which gave us our first glimpse of, and hope for, developing the species as an active player in the genomics era[1,4].

The availability of a sufficiently dense single-nucleotide polymorphism (SNP) map for a species provides a resource that enables automated high-throughput genotyping to associate regions of the genome with hereditary diseases, quantitative traits and other phenotypes. High-density SNP maps are available for many species, including human, mouse, dog, chicken and rice (for a complete list see [5]). The cat genome has a moderate collection of SNPs; however, the 327,037 available SNPs are clustered into alternating genomic segments of high SNP density and homozygous regions, and approximately 60% of Cinnamon's genome is homozygous[4]. This patchwork pattern arises because the current 1.9X genome sequence was derived from a single inbred cat. To supplement the feline SNP map and the genome assembly, we created fosmid libraries and sequenced six additional cats of different breeds and one African wildcat. These sequences dramatically improve the SNP map by increasing the total number of useful SNPs and by filling in the long stretches of genomic homozygosity (~60% of the genome) reported in the 1.9X genome sequence of Cinnamon[4]. To make the best use of the additional sequence reads for SNP discovery, we generated a new assembly; the new reads increase the depth of coverage of the genome by 50%, which translates to 25% more genomic sequence. This resource therefore provides an improved cat genome assembly as well as a greatly improved SNP map (Assembly: NCBI Accession ACBE00000000, NCBI dbSNP handle CAT_POLY_V17E, and [6]).

* Correspondence: mullikin@mail.nih.gov

1 Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

Full list of author information is available at the end of the article

Construction and content

Samples and sequence generation

DNA from six domestic cats and one wildcat was collected and isolated using the PAXgene Blood DNA kit according to the manufacturer's instructions (Qiagen, Inc., Valencia, CA) and stored at -80°C until library construction. The domestic cat samples represented six different breeds from pet owners in the Topeka, KS area, and the wildcat sample was from a captive animal residing at the Audubon Nature Institute, LA. Fosmid libraries were prepared in vector pCC2fos from each DNA sample, with an average insert size of approximately 37 kb. A total of 3,178,297 paired-end sequencing reads were generated from the seven libraries (Table 1) using standard methods, with SPRI-based DNA purification and BigDye terminator sequencing reagents on ABI 3730xl instruments.

Reads, assembly and mapping

Assembly of the cat genome was carried out using a method similar to that published for the 1.9X assembly of Cinnamon[4], except that in this case the Phusion method[7] was used for contig and scaffold generation. All 11.4M reads (Table 1) were used, comprising approximately 2.8-fold read redundancy. The contig N50 size is 4.6 kb, nearly double the 2.4 kb of the 1.9X assembly, and the total number of assembled bases is 2.0 Gb, a 22% increase over the 1.64 Gb of the 1.9X assembly (see Additional file 1 Table S1 for a complete listing of the assembly statistics). The assembly was mapped onto chromosomes using a method similar to that previously described[4]; the total sequence placed onto cat chromosomes is 1.71 Gb, up from 1.36 Gb. Comparison of this new assembly and the 1.9X assembly with the high-quality NIH Intramural Sequencing Center (NISC) sequence generated across feline ENCODE regions[8] shows a substantial improvement in coverage, from 72% to 84% (Additional file 2 Table S2), as well as excellent order and orientation (Additional file 3 and Additional file 4). The 84% coverage figure is a sampling estimate derived from assessing the exact coverage of 26.6 Mb of BAC clone sequence from ENCODE target regions by this assembly, and it is higher than the estimated genome-wide average. For example, the estimated euchromatic genome size is 2.5 Gb, so the overall assembled base-pair coverage would be closer to 2.0/2.5 = 80%. The ~4% discrepancy could be due to ENCODE's selection of relatively gene-rich and more highly evolutionarily conserved regions, which are likely easier to assemble. In fact, the two least-covered regions, ENr112 and ENr113, at about 47% coverage, are ENCODE regions with no genes and very low multi-species conservation.
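The percentage changes in this paragraph are simple ratios of the quoted figures. As a hedged cross-check (a sketch, not part of the original analysis), the Python below recomputes them. The function and key names are illustrative, and treating the 11.4M total as the Table 1 Cinnamon reads plus the reads from this study is an inference from Table 1 rather than a statement in the text.

```python
# Recompute the assembly percentages quoted in the text.
# Only the numbers come from the paper; all names are illustrative.

def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from `old` to `new`."""
    return 100.0 * (new - old) / old


def assembly_summary() -> dict:
    cinnamon_reads = 8_186_934   # Table 1: reads behind the 1.9X assembly
    new_reads = 3_178_297        # fosmid-end reads generated in this study
    n50_kb = (2.4, 4.6)          # contig N50, 1.9X vs. new assembly
    assembled_gb = (1.64, 2.0)   # total assembled bases
    mapped_gb = (1.36, 1.71)     # bases placed on chromosomes
    euchromatic_gb = 2.5         # estimated euchromatic genome size
    encode_coverage = 0.84       # coverage of the 26.6 Mb ENCODE BAC sample

    genome_fraction = assembled_gb[1] / euchromatic_gb
    return {
        "total_reads": cinnamon_reads + new_reads,
        "n50_increase_pct": pct_increase(*n50_kb),
        "assembled_increase_pct": pct_increase(*assembled_gb),
        "mapped_increase_pct": pct_increase(*mapped_gb),
        "genome_fraction": genome_fraction,
        "encode_excess": encode_coverage - genome_fraction,
    }


if __name__ == "__main__":
    for key, value in assembly_summary().items():
        print(f"{key}: {value:,.2f}")
```

The N50 change works out to roughly 92% (hence "nearly double"), assembled bases grow by about 22%, chromosome-placed sequence by about 26%, and the ENCODE sample exceeds the genome-wide 80% estimate by about 4 percentage points.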

SNPs and DIPs

SNPs are called using the ssahaSNP method[9] by comparing the sequence of each read with the consensus sequence assembly; a breakdown of the total SNPs discovered per cat is provided in Table 1. Some of these SNPs are discovered in more than one cat, so the non-redundant total, 3,077,846, is less than the sum across all cats, 3,254,739. The SNP rate is calculated relative to the reference sequence and is defined as the number of neighborhood quality standard (NQS) bases divided by the number of SNPs detected from a given cat's sequence traces. The NQS method[10] uses a set of parameters to decrease the probability of a base being called incorrectly, which in turn lowers the false reporting of discrepant bases. The NQS settings used here are as follows: the base has a PHRED[11,12] quality score ≥ 23, the five bases on either side have PHRED quality scores ≥ 15, and nine of these ten flanking bases are perfect matches. Because the assembly is made up of all reads from all cats, the SNP rates should be viewed in a relative sense, ranging from Cinnamon, with the lowest SNP rate at one SNP per 1520 NQS bases, to Nancy, with the highest rate at one SNP per 360 NQS bases. The SNP rate is much lower for Cinnamon for two reasons. First, Cinnamon dominates the assembly with the most reads, so a sequence read from Cinnamon has an increased chance of being compared with the same haplotype. Second, Cinnamon is inbred and her genome is 60% homozygous, further increasing the chance of comparing a read with the same haplotype. At the other extreme, the wildcat Nancy gives the highest SNP rate because the African wildcat, Felis silvestris cafra, is one of several continental subspecies of wildcat, the parent species for cat domestication. Domestic cat breeds descend from a founder event (domestication itself) that appreciably reduced genomic diversity relative to the wildcat species[3]. The domestic cat breeds displayed SNP rates of one SNP per 500-600 bp. In the 1.9X cat assembly, the polymorphism rate was estimated at one SNP per 600 bases within the heterozygous segments, so the estimates for the other breeds agree quite well with this previous measure. In addition to SNPs, 682,085 deletion and insertion polymorphisms (DIPs) were detected.
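The three NQS thresholds can be expressed as a small predicate. The sketch below is illustrative only and is not the ssahaSNP or NQS reference code: it assumes the read, reference and quality values are already aligned column for column with no indels in the 11-base window, and it reads "nine of these ten flanking bases are perfect matches" as "at least nine". The function name and parameters are hypothetical.

```python
from typing import Sequence


def passes_nqs(read: str, ref: str, quals: Sequence[int], i: int,
               min_base_q: int = 23, min_flank_q: int = 15,
               flank: int = 5, min_flank_matches: int = 9) -> bool:
    """Return True if the discrepant base at column i passes the NQS test.

    Assumes `read`, `ref` and `quals` are gaplessly aligned (an
    illustrative simplification) and interprets the match rule as
    "at least nine of the ten flanking bases match the reference".
    """
    lo, hi = i - flank, i + flank
    if lo < 0 or hi >= len(read):
        return False                      # not enough flanking sequence
    if quals[i] < min_base_q:
        return False                      # the base itself is too low quality
    flank_idx = [j for j in range(lo, hi + 1) if j != i]
    if any(quals[j] < min_flank_q for j in flank_idx):
        return False                      # a flanking base is too low quality
    matches = sum(read[j] == ref[j] for j in flank_idx)
    return matches >= min_flank_matches


if __name__ == "__main__":
    ref = "ACGTACGTACG"
    read = "ACGTATGTACG"                  # C->T candidate SNP at column 5
    print(passes_nqs(read, ref, [30] * 11, 5))
```

Checking the flanks as well as the candidate base is what lets the NQS rule discount discrepancies that sit inside locally poor or misaligned stretches of a trace.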

Sampling SNP variation empirically

To validate the accuracy of SNP detection, we randomly chose 555 SNPs detected from the fosmid-end sequence of the six domestic cats sequenced in this study. We selected SNPs located at least 750 bases away from a sequence contig gap (to make primer design feasible), reducing the number of testable SNPs to 393, of which 348 yielded primers using a primer design package[13]. Two PCR primer plates (47 primer pairs each) passed stringent primer QC and were sequenced across the six cats plus one additional unrelated domestic shorthair.

Sequence traces were analyzed using PolyPhred version 6.11[14-16], and the targeted variant bases were viewed using Consed[17]. Of the 94 variants, 92 were confirmed; one had low-quality sequence traces for the cat carrying the detected variant (Nancy), and one variant detected from the Ragdoll, Scooter, was not observed in this cat. Excluding the single low-quality amplimer gives an overall validation rate of 92/93 = 99%. Additional file 5 Table S3 gives a complete listing of all SNPs and genotypes from this validation experiment.

Forty-five DIPs detected from the sequence traces fell within the PCR amplicons. Of these, 43 were empirically validated, one was not tested and one did not validate (a single-base insertion). The DIP validation rate is therefore 43/45 = 96%.
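The selection funnel and the two validation rates can be recomputed directly from the counts given above. This is a minimal sketch with illustrative names; it adds no data beyond the text.

```python
def validation_summary() -> dict:
    """Recompute the SNP and DIP validation figures quoted in the text."""
    chosen = 555          # SNPs randomly chosen from the six domestic cats
    testable = 393        # at least 750 bp from a contig gap
    with_primers = 348    # primer design succeeded
    plates, pairs_per_plate = 2, 47
    assayed = plates * pairs_per_plate   # primer pairs that passed QC

    snps_confirmed = 92
    low_quality = 1                      # amplimer with unusable traces (Nancy)
    snp_rate = snps_confirmed / (assayed - low_quality)

    dips_validated, dips_in_amplicons = 43, 45
    dip_rate = dips_validated / dips_in_amplicons

    return {
        "funnel": (chosen, testable, with_primers, assayed),
        "snp_rate": snp_rate,
        "dip_rate": dip_rate,
    }


if __name__ == "__main__":
    s = validation_summary()
    print("funnel:", " -> ".join(map(str, s["funnel"])))
    print(f"SNP validation: {s['snp_rate']:.1%}")
    print(f"DIP validation: {s['dip_rate']:.1%}")
```

The unrounded rates are about 98.9% for SNPs and 95.6% for DIPs, which round to the 99% and 96% reported above.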

PCR re-sequencing

gDNA QC

gDNA concentration was determined using a DyNA Quant 200 fluorometer (Hoefer) and the dsDNA-specific dye Hoechst 33258 according to the manufacturer's protocol. Each gDNA sample was then tested for functionality in PCR reactions with positive and negative control primers.

Pos_For: TGTAAAACGACGGCCAGTATCCCACTGTTAGGAGAACTGC

Pos_Rev: CAGGAAACAGCTATGACCGGTCAGGAAAGGGACACAGATA

Negative control primers were the forward and reverse M13 universal sequencing primers, which anneal within the lacZ region of M13.

M13_For: TGTAAAACGACGGCCAGT

M13_Rev: CAGGAAACAGCTATGACC

A trace amount of a plasmid carrying a unique non-feline insert was added to each gDNA to serve as a biological barcode. The identifying inserts were amplified and checked using the universal sequencing primers above. The gDNAs were then diluted to a working concentration of 2.5 ng/µl.
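Once the extraction line breaks are removed, both positive-control primers begin with exactly the 18-nt sequences listed for M13_For and M13_Rev, which suggests they are M13-tailed primers; that would fit the later sequencing of PCR products with M13 universal primers. The text does not say this explicitly, so treat it as an inference. The sketch below (hypothetical helper name) strips the tail to expose the template-specific portion.

```python
M13_FOR = "TGTAAAACGACGGCCAGT"
M13_REV = "CAGGAAACAGCTATGACC"

POS_FOR = "TGTAAAACGACGGCCAGTATCCCACTGTTAGGAGAACTGC"
POS_REV = "CAGGAAACAGCTATGACCGGTCAGGAAAGGGACACAGATA"


def split_tail(primer: str, tail: str) -> str:
    """Return the part of `primer` after a leading `tail` sequence.

    Raises ValueError if the primer does not start with the tail.
    """
    if not primer.startswith(tail):
        raise ValueError("primer does not carry the expected tail")
    return primer[len(tail):]


if __name__ == "__main__":
    print(split_tail(POS_FOR, M13_FOR))
    print(split_tail(POS_REV, M13_REV))
```

Each 40-nt positive-control primer splits into the 18-nt M13 sequence plus a 22-nt template-specific sequence.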

Primer QC and Sequencing

Primers were obtained from Eurofins MWG Operon in individual tubes and reconstituted to 100 µM in 10 mM Tris, pH 8.0, 0.1 mM EDTA. The primer pairs were tested at a concentration of 0.16 µM each in 10 µl PCR reactions containing iQ Supermix (Bio-Rad) and 5 ng of control feline DNA (the Tipper sample was used for the first round of QC and Speedy for the second). Cycling conditions were: enzyme activation at 95°C for 3 min, followed by 40 cycles of 95°C for 15 s, 60°C for 15 s and 72°C for 60 s, then 72°C for 5 min and a hold at 10°C. A 5 µl aliquot of each PCR reaction was examined on an agarose gel for multiple or missing bands. The PCR products were then diluted to 0.4 ng/µl and sequenced in 6 µl reactions using the M13 Universal forward and reverse primers and BigDye Terminator (BDT) version 3.1 (Applied Biosystems) with standard ABI protocols. The reactions were run on 3730 DNA Sequencers (Applied Biosystems). The sequence traces were then individually inspected for quality, and primer pairs that produced high-quality traces were passed. Primers not passing this round were retested using one additional control DNA; primers failing both rounds were not used.

Table 1: Total sequencing reads generated, SNP counts, SNP rate, and gender

| Cat name | Breed | Reads | SNPs | SNP rate (one SNP per x NQS bases) | Gender |
|---|---|---|---|---|---|
| Pixel | Burmese | 331,813 | 174,212 | 524 | Female |
| Zeelie | Persian | 298,332 | 174,706 | 510 | Female |
| Tipper | Cornish Rex | 272,607 | 164,054 | 503 | Male |
| Scooter | Ragdoll | 298,409 | 168,455 | 510 | Male |
| Speedy | Domestic Shorthair | 310,364 | 158,148 | 569 | Female |
| Cocoa | Siamese | 293,712 | 152,984 | 516 | Female |
| Nancy | African wildcat | 1,373,060 | 938,386 | 360 | Female |
| Cinnamon* | Abyssinian | 8,186,934 | 1,323,794 | 1520 | Female |

* Cinnamon is listed, but her reads were not part of the sequence generated in this effort.
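The Table 1 columns tie back to totals quoted elsewhere in the text: the seven new libraries sum to the 3,178,297 reads generated, adding Cinnamon gives the 11.4M reads used for assembly, and the per-cat SNP counts sum to 3,254,739. The gap to the non-redundant 3,077,846 arises because a site called in several cats is counted once per cat in the sum but once overall. The sketch below checks the sums and illustrates the de-duplication with hypothetical SNP positions; the names are illustrative.

```python
# Per-cat figures from Table 1: (name, reads, SNPs).
TABLE1 = [
    ("Pixel", 331_813, 174_212),
    ("Zeelie", 298_332, 174_706),
    ("Tipper", 272_607, 164_054),
    ("Scooter", 298_409, 168_455),
    ("Speedy", 310_364, 158_148),
    ("Cocoa", 293_712, 152_984),
    ("Nancy", 1_373_060, 938_386),
    ("Cinnamon", 8_186_934, 1_323_794),
]


def table1_totals() -> tuple[int, int, int]:
    """(reads from this study, reads including Cinnamon, summed per-cat SNPs)."""
    new_reads = sum(reads for name, reads, _ in TABLE1 if name != "Cinnamon")
    all_reads = sum(reads for _, reads, _ in TABLE1)
    snp_calls = sum(snps for _, _, snps in TABLE1)
    return new_reads, all_reads, snp_calls


def non_redundant_count(per_cat_sites: dict[str, set[int]]) -> int:
    """Distinct SNP sites across cats; a site found in several cats counts once."""
    return len(set().union(*per_cat_sites.values()))


if __name__ == "__main__":
    new_reads, all_reads, snp_calls = table1_totals()
    print(new_reads, all_reads, snp_calls)
    print("redundant calls:", snp_calls - 3_077_846)
    toy = {"catA": {101, 205, 330}, "catB": {205, 412}}  # hypothetical positions
    print(non_redundant_count(toy))
```

In the toy example, five per-cat calls collapse to four distinct sites because position 205 is shared.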

PCR

PCR amplification of the amplimers was performed in 10 µl reactions in 384-well plates under the conditions described above.

Utility and Discussion

Distribution

The existing map of 327 k SNPs from the 1.9X Cinnamon assembly includes large homozygous segments covering approximately 60% of the cat genome (see Figure 5 in [4]). Figure 1 shows the incidence of SNPs along the cat chromosomes, where the number of SNPs is totaled within adjacent 1 Mb windows across each chromosome. In the new assembly, only 8 of the 2,569 autosomal windows (not counting unmapped regions) contain fewer than 100 SNPs. On chromosome X, 34 of 143 windows contain fewer than 100 SNPs, which suggests that X has relatively lower heterozygosity. At least three factors contribute: 1) two of the eight cats are male, which reduces the number of X chromosomes sampled to fourteen, compared with sixteen chromosomes for an autosomal locus; 2) the effective population size for the X chromosome is ¾ that of the autosomes; and 3) male hemizygosity allows much stronger purifying selection to occur around X-linked functional loci.
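The window tally behind Figure 1 amounts to integer-dividing each SNP coordinate by the window size. The sketch below is a hedged illustration, not the authors' pipeline: positions are assumed to be 0-based chromosome coordinates, and because the text does not say how windows without mapped sequence were identified, the sketch takes their indices from the caller. It also reproduces the chromosome-copy count from factor 1.

```python
from typing import Iterable


def window_counts(positions: Iterable[int], chrom_len: int,
                  window: int = 1_000_000) -> list[int]:
    """SNP counts in adjacent, non-overlapping windows (0-based positions)."""
    counts = [0] * ((chrom_len + window - 1) // window)
    for pos in positions:
        counts[pos // window] += 1
    return counts


def sparse_windows(counts: list[int], threshold: int = 100,
                   unmapped: Iterable[int] = ()) -> int:
    """Windows with fewer than `threshold` SNPs, skipping unmapped window indices."""
    skip = set(unmapped)
    return sum(1 for i, c in enumerate(counts) if i not in skip and c < threshold)


def chromosome_copies(n_cats: int, n_males: int) -> tuple[int, int]:
    """(autosomal copies, X copies) for diploid females and hemizygous males."""
    return 2 * n_cats, 2 * (n_cats - n_males) + n_males


if __name__ == "__main__":
    print(chromosome_copies(8, 2))
    print(f"autosomal sparse fraction: {8 / 2569:.2%}")
    print(f"X sparse fraction: {34 / 143:.1%}")
```

With the counts in the text, about 0.3% of autosomal windows are sparse versus about 24% on X, a contrast much larger than the 14/16 reduction in sampled copies alone would explain.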

Figure 2 presents the SNP distribution as the fraction of windows of a given size that contain at least one SNP, and Table 2 gives the average SNP spacing in base pairs. For both SNP counts and average base-pair distance between SNPs, there are three SNP categories (Table 2): A) SNPs excluding those discovered only from Cinnamon and/or Nancy, B) SNPs excluding those discovered only in Cinnamon, and C) all SNPs. The reason for counting SNPs in these categories is that, for some purposes, SNPs derived from the wildcat Nancy are not as useful as SNPs derived from the domesticated cat breeds. Likewise, including the SNPs discovered from Cinnamon's much deeper sequencing would introduce highly variable densities that reflect this cat's particular pattern of homozygous and heterozygous regions. Thus, the SNPs most useful for domestic cat association screens are those in category A, which total 964,285 SNPs (Table 2). Further restricting category A to SNPs mapped to a cat chromosome reduces this count to 844,313. However, relative to the number of mapped bases, this still gives on average one SNP every 2000 bases, and even the SNPs not mapped to chromosomes will become useful once these segments are mapped in improved future assemblies. Returning to Figure 2, over 80% of the 15 kb windows across the genome contain at least one category A SNP.
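The quantity plotted in Figure 2 can be sketched as follows. The text does not describe the exact windowing, so this sketch assumes non-overlapping windows laid over one contiguous sequence and ignores a trailing partial window; the function name and the toy positions are hypothetical.

```python
def fraction_with_snp(positions, seq_len: int, window: int) -> float:
    """Fraction of full, non-overlapping windows of `window` bp holding >= 1 SNP."""
    n_windows = seq_len // window            # a trailing partial window is ignored
    if n_windows == 0:
        return 0.0
    hit = {p // window for p in positions if p // window < n_windows}
    return len(hit) / n_windows


if __name__ == "__main__":
    toy_positions = [5, 25, 27]              # hypothetical SNP coordinates
    for w in (10, 20):
        print(w, fraction_with_snp(toy_positions, 40, w))
```

Larger windows can only raise this fraction, which is why a curve over increasing window sizes, as in Figure 2, shows how evenly a SNP set covers the genome at each marker spacing.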

With the genotype information, we can estimate how many informative SNPs are available for a genotyping study among cat breeds. There are 964,285 SNPs discovered from at least one of the six breeds sequenced (category A SNPs in Table 2). A SNP will not be informative unless it has a minor allele frequency (MAF) of at least 5%, but the MAFs of these SNPs are not known until they have been genotyped. Among the limited genotypes given in Additional file 5 Table S3, 24/92 (26%) of the variants are observed in only one cat and therefore could be quite rare. The other 68/92 (74%) are seen in two or more cats and are therefore quite likely to be polymorphic, and thus informative. This informative fraction increases to 50/57 (88%) for SNPs detected from the domestic cat samples. Thus, about 849 k (88% of 964,285) SNPs in this resource are likely to have an informative MAF (≥ 5%) among cat breeds.
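The projection above scales the category A count by the fraction of validated variants seen in two or more cats. The sketch below redoes that arithmetic with illustrative names, showing both the rounded 88% used in the text and the unrounded 50/57.

```python
def informative_estimate(category_a: int = 964_285) -> dict:
    """Project informative SNPs from the validation genotypes in the text."""
    shared_all = (68, 92)        # variants seen in >= 2 cats, all validated SNPs
    shared_domestic = (50, 57)   # same, restricted to domestic-cat discoveries
    frac = shared_domestic[0] / shared_domestic[1]
    return {
        "fraction_all": shared_all[0] / shared_all[1],
        "fraction_domestic": frac,
        "estimate_rounded": round(category_a * 0.88),   # as in the text (88%)
        "estimate_exact": round(category_a * frac),     # unrounded 50/57
    }


if __name__ == "__main__":
    for key, value in informative_estimate().items():
        print(key, value)
```

The text's "about 849 k" applies the rounded 88%; the unrounded 50/57 gives roughly 846 k. With only 57 observations behind the fraction, either figure is a rough projection.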

In anticipation of using these SNPs for a genotyping chip, one would like to select SNPs at fairly even intervals across the genome. If a SNP is selected every 15 kb from the category A SNPs, 80% of the genome can be covered, requiring about 100,000 SNPs on a genotyping array. The remaining 20% can be filled in either with more widely spaced category A SNPs or with additional category B or C SNPs. Perhaps another 20 k genotype assays for the remaining regions would thus yield a 120 k SNP chip. This is more than double the number previously proposed for genome-wide association mapping[4].
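One simple way to pick markers "at fairly even intervals" is a greedy minimum-spacing pass over sorted positions on one chromosome. This is a sketch of a generic technique, not the selection procedure the authors used (which the text does not specify); the names and toy positions are hypothetical. The marker-count helper assumes the 2.0 Gb assembled size, since the text does not say which genome size underlies "about 100,000".

```python
def select_spaced(positions, spacing: int = 15_000) -> list[int]:
    """Greedily keep SNPs so that consecutive picks are >= `spacing` bp apart."""
    chosen: list[int] = []
    for pos in sorted(positions):
        if not chosen or pos - chosen[-1] >= spacing:
            chosen.append(pos)
    return chosen


def snps_needed(genome_bp: float, fraction: float, spacing: int = 15_000) -> float:
    """Markers needed for one SNP per `spacing` bp over `fraction` of a genome."""
    return genome_bp * fraction / spacing


if __name__ == "__main__":
    toy = [40_000, 0, 5_000, 15_000, 16_000, 31_000]   # hypothetical positions
    print(select_spaced(toy))
    print(round(snps_needed(2.0e9, 0.8)))              # assumes 2.0 Gb assembled
```

Under that assumption, 0.8 × 2.0 Gb / 15 kb is roughly 107,000 markers, in line with "about 100,000". The greedy pass guarantees the minimum spacing but lets picks drift off a fixed grid; choosing the SNP nearest each 15 kb grid point is an alternative when a regular layout matters more.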

Wildcat geographical origin

Finally, we investigated the geographical origin of the wildcat Nancy. With nearly 1.4 M reads available for this wildcat, this sequence provides a valuable resource for studying subspecies of Felis silvestris. Nancy was recorded as wild-caught 15 years ago in the Arabian Peninsula, so she would be expected to fall geographically within the subspecies Felis silvestris lybica. A STRUCTURE analysis was performed on a genomic DNA sample from Nancy using genotypes at 18 of the 36 short tandem repeat (STR) loci previously used to resolve genomic distinctions among wildcat subspecies effectively[3]. Nancy showed no evidence of domestic cat introgression but instead clusters with cats from southern Africa rather than the Near East, so she likely descends from the subspecies Felis silvestris cafra rather than F. s. lybica. For additional details of this analysis, see Additional file 4 and Additional file 6 Figure S1.

Figure 1 SNP density in 1 Mb windows. Number of SNPs in windows of 1 Mb across each chromosome. Zero values are regions that do not have mapped sequence, totaling about 416 Mb; their densities are therefore undetermined.

Conclusions

The primary goal of this effort is to expand feline SNP resources, empowering future linkage and association studies to map feline disease phenotypes to genomic loci. The usefulness of such a resource has been clearly demonstrated in many species, most notably for the human and canine genomes[18-20]. A recent review[1] highlights the need for additional cat SNPs to aid the development of a genotyping array of 100,000-150,000 SNPs. The SNPs discovered by this effort should allow the design of such a chip from the 964,285 available SNPs from the domestic cat breeds.

The value of the sequence generated by this effort will become even greater once the publicly funded effort (see Felis catus entries in [21]) to generate a high-quality draft of Cinnamon's genome is completed, hopefully within the next year. With a high-quality draft covering over 90% of the cat genome, even more SNPs can be extracted from the 3M reads from these seven cats, probably 25% more if the assembled sequence increases to 2.5 Gb from the current 2.0 Gb. This resource is further enhanced by having all reads generated from paired ends of fosmid templates. The insert sizes are all about 40 kb, with fairly tight distributions (coefficient of variation below 10%). Given an increasingly high-quality assembly of Cinnamon, these paired-end reads could pinpoint structural rearrangements among cat breeds using available methods[22].
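The text only states that tight insert-size distributions make these fosmid end pairs suitable for structural-variant detection with published methods[22]. The sketch below shows the general idea of end-sequence pair classification under illustrative assumptions: ends from a concordant clone map to the same chromosome, point toward each other (+ strand on the left, - strand on the right), and span the mean ± 3 SD, with the SD derived from a 40 kb mean and a 10% coefficient of variation. The EndPair type, the orientation convention and the thresholds are hypothetical choices, not the parameters of the cited method.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Mapped positions and strands of the two ends of one fosmid (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p: EndPair, mean: float = 40_000, cv: float = 0.10,
                  n_sd: float = 3.0) -> str:
    """Label an end pair as concordant or as one kind of discordant pair."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if (left_strand, right_strand) != ("+", "-"):
        return "orientation"          # possible inversion or misassembly
    span = right_pos - left_pos
    sd = mean * cv
    if span > mean + n_sd * sd:
        return "deletion_candidate"   # ends map farther apart than the insert allows
    if span < mean - n_sd * sd:
        return "insertion_candidate"  # ends map closer together than expected
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(EndPair("A1", 100_000, "+", "A1", 140_000, "-")))
    print(classify_pair(EndPair("A1", 0, "+", "A1", 60_000, "-")))
```

A mapped span larger than any plausible insert implies that the sequenced cat lacks sequence present in the reference, which is why such pairs are flagged as deletion candidates.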

Table 2: SNP counts and bases per SNP

| Chromosome | Non-N bases | SNP count A | SNP count B | SNP count C | Bases per SNP A | Bases per SNP B | Bases per SNP C |
|---|---|---|---|---|---|---|---|
| chrA1 | 164,170,763 | 77,824 | 151,421 | 240,266 | 2,110 | 1,084 | 683 |
| chrA2 | 120,172,290 | 59,782 | 118,896 | 180,685 | 2,010 | 1,011 | 665 |
| chrA3 | 109,094,838 | 55,010 | 110,129 | 192,266 | 1,983 | 991 | 567 |
| chrB1 | 131,184,541 | 62,260 | 118,189 | 196,456 | 2,107 | 1,110 | 668 |
| chrB2 | 101,553,943 | 49,898 | 95,152 | 161,014 | 2,035 | 1,067 | 631 |
| chrB3 | 96,970,780 | 47,679 | 93,489 | 167,719 | 2,034 | 1,037 | 578 |
| chrB4 | 108,425,265 | 53,709 | 104,123 | 170,033 | 2,019 | 1,041 | 638 |
| chrC1 | 160,223,031 | 76,483 | 147,928 | 245,762 | 2,095 | 1,083 | 652 |
| chrC2 | 107,198,630 | 53,479 | 98,226 | 163,338 | 2,004 | 1,091 | 656 |
| chrD1 | 81,705,395 | 45,881 | 88,125 | 130,989 | 1,781 | 927 | 624 |
| chrD2 | 67,243,459 | 37,877 | 71,535 | 134,493 | 1,775 | 940 | 500 |
| chrD3 | 71,434,721 | 40,297 | 79,074 | 119,894 | 1,773 | 903 | 596 |
| chrD4 | 67,338,148 | 34,295 | 65,034 | 88,513 | 1,963 | 1,035 | 761 |
| chrE1 | 44,074,055 | 24,513 | 50,193 | 87,236 | 1,798 | 878 | 505 |
| chrE2 | 50,431,338 | 27,836 | 56,317 | 94,520 | 1,812 | 895 | 534 |
| chrE3 | 36,523,145 | 24,444 | 47,449 | 68,476 | 1,494 | 770 | 533 |
| chrF1 | 45,373,584 | 24,292 | 45,507 | 83,877 | 1,868 | 997 | 541 |
| chrF2 | 56,475,142 | 29,011 | 55,998 | 93,247 | 1,947 | 1,009 | 606 |
| chrX | 83,845,181 | 19,431 | 32,619 | 65,352 | 4,315 | 2,570 | 1,283 |
| chrUnCf | 217,904,849 | 101,325 | 198,166 | 323,505 | 2,151 | 1,100 | 674 |
| chrUn | 70,839,159 | 18,959 | 34,246 | 70,797 | 3,736 | 2,069 | 1,001 |
| Total | 1,992,182,257 | 964,285 | 1,861,816 | 3,078,438 | 2,066 | 1,070 | 647 |

A: SNPs excluding Cinnamon and Nancy
B: SNPs excluding Cinnamon
C: All SNPs
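Each "bases per SNP" entry in Table 2 is the row's non-N base count divided by the corresponding SNP count, rounded to an integer. The sketch below (illustrative names) checks three rows.

```python
# (non-N bases, SNPs in category A, B, C) for three rows of Table 2.
TABLE2 = {
    "chrA1": (164_170_763, 77_824, 151_421, 240_266),
    "chrX": (83_845_181, 19_431, 32_619, 65_352),
    "Total": (1_992_182_257, 964_285, 1_861_816, 3_078_438),
}


def bases_per_snp(row: tuple[int, ...]) -> tuple[int, ...]:
    """Average spacing (non-N bases / SNP count) for each SNP category."""
    bases, *snp_counts = row
    return tuple(round(bases / n) for n in snp_counts)


if __name__ == "__main__":
    for name, row in TABLE2.items():
        print(name, bases_per_snp(row))
```

The computed values, (2110, 1084, 683) for chrA1, (4315, 2570, 1283) for chrX and (2066, 1070, 647) for the total, match the table.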


Availability and requirements

All SNPs and DIPs described are freely available through

dbSNP [5] and through the web browser interface [6].

Additional material

Authors' contributions

JCM drafted the manuscript. SWKA and MFL organized the sample collection. HE, WFD, WT, DJS, AB, and MJR performed the fosmid library construction and sequencing. PC, VD, ACY and NISC performed the PCR primer design and sequencing. VD and CD did the STR typing and analysis to identify the likely origin of the wildcat. LS, NFH and JCM performed the sequence alignment and SNP/DIP detection and validation review. JCM, JAB, SJO, DRS and MSA conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The DNA sample for the wildcat, Nancy, was kindly provided by Dr. Betsy L. Dresser at the Audubon Nature Institute Center for Research of Endangered Species, New Orleans, LA. We would like to thank Becky Stone (Pixel), Susy Tejayadi (Tipper), Josie Kirk-Pagel (Zeelie), Rick Wienckowski (Cocoa), and Patti Morelock (Scooter) for donating DNA samples to the project. This research was supported in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.

Author Details

1 Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA; 2 Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA; 3 Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702, USA; and 4 Hill's Pet Nutrition Inc., PO Box 1658, Topeka, KS 66601, USA

Additional file 1 Whole genome assembly statistics. Table S1 comparing the assembly statistics for this assembly and the previously published 1.9X assembly.

Additional file 2 Assembly coverage of ENCODE regions. Table S2 listing whole genome shotgun assembly coverage statistics relative to high-quality cat BAC clone assemblies of ENCODE regions for this assembly and the previously published 1.9X assembly.

Additional file 3 Order and orientation across ENCODE regions. Plots of assembly order and orientation across all 44 ENCODE regions.

Additional file 4 Descriptions of how Additional files 3 and 6 were generated. Methods for generating plots of order and orientation across ENCODE regions and the STRUCTURE analysis of Nancy.

Additional file 5 PCR validation results for 94 variants. Table S3 lists the variants by position on the genome assembly, which alleles are expected, and the alleles observed across 8 cats. Pink colored cells indicate the cat(s) from which the alternate allele was discovered in the light whole genome sequence.

Additional file 6 STRUCTURE analysis results. Figure S1 shows results of STRUCTURE analysis showing Nancy and representative known-origin wildcat individuals.

Received: 8 September 2009 Accepted: 24 June 2010

Published: 24 June 2010

This article is available from: http://www.biomedcentral.com/1471-2164/11/406

© 2010 Mullikin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 2 SNP distribution. The fraction of windows with one or more SNPs for a range of window sizes and three categories of SNPs: all SNPs, all except Cinnamon, and all except Cinnamon and Nancy.


References

1. O'Brien SJ, Johnson W, Driscoll C, Pontius J, Pecon-Slattery J, Menotti-

Raymond M: State of cat genomics. Trends Genet 2008, 24:268-279.

2. OMIA - Online Mendelian Inheritance in Animals [http://

omia.angis.org.au/]

3. Driscoll CA, Menotti-Raymond M, Roca AL, Hupe K, Johnson WE, Geffen E,

Harley EH, Delibes M, Pontier D, Kitchener AC, Yamaguchi N, O'brien SJ,

Macdonald DW: The Near Eastern origin of cat domestication. Science

2007:519-523.

4. Pontius JU, Mullikin JC, Smith DR, Agencourt Sequencing Team, Lindblad-

Toh K, Gnerre S, Clamp M, Chang J, Stephens R, Neelam B, Volfovsky N,

Schäffer AA, Agarwala R, Narfström K, Murphy WJ, Giger U, Roca AL,

Antunes A, Menotti-Raymond M, Yuhki N, Pecon-Slattery J, Johnson WE,

Bourque G, Tesler G, NISC Comparative Sequencing Program, O'Brien SJ:

Initial sequence and comparative analysis of the cat genome. Genome

Res 2007, 17:1675-1689.

5. dbSNP Summary [http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi]

6. GARField: Genome Annotation Resource Field Felis catus [http://

lgd.abcc.ncifcrf.gov/cgi-bin/gbrowse/cat3x]

7. Mullikin JC, Ning Z: The phusion assembler. Genome Res 2003, 13:81-90.

8. ENCODE Project Consortium: Identification and analysis of functional

elements in 1% of the human genome by the ENCODE pilot project.

Nature 2007, 447:799-816.

9. Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA

databases. Genome Res 2001, 11:1725-1729.

10. Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L,

Lander ES: An SNP map of the human genome generated by reduced

representation shotgun sequencing. Nature 2000, 407:513-6.

11. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using PHRED. I. Accuracy assessment. Genome Res 1998, 8:175-185.

12. Ewing B, Green P: Base-calling of automated sequencer traces using PHRED. II. Error probabilities. Genome Res 1998, 8:186-194.

13. Chines PS, Swift AJ, Bonnycastle LL, Erdos MR, Mullikin JC, Collins FS: PrimerTile: designing overlapping PCR primers for resequencing. The American Society of Human Genetics: 2005; Salt Lake City, Utah.

14. Bhangale TR, Rieder MJ, Livingston RJ, Nickerson DA: Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes. Hum Mol Genet 2005, 14:59-69.

15. Bhangale TR, Stephens M, Nickerson DA: Automating resequencing-based detection of insertion-deletion polymorphisms. Nat Genet 2006, 38:1457-1462.

16. Stephens M, Sloan JS, Robertson PD, Scheet P, Nickerson DA: Automating sequence-based detection and genotyping of SNPs from diploid samples. Nat Genet 2006, 38:375-381.

17. Gordon D: Viewing and Editing Assembled Sequences Using Consed. Current Protocols in Bioinformatics 2003, Section 11.2.1-11.2.43.

18. The SNP Consortium: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001, 409:928-933.

19. The International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449:851-861.

20. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, et al.: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005, 438:803-819.

21. Approved Sequencing Targets [http://www.genome.gov/10002154]

22. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA, Tsang P, Newman TL, Tüzün E, Cheng Z, Ebling HM, Tusneem N, David R, Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E, McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D, Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR, Eichler EE: Mapping and sequencing of structural variation from eight human genomes. Nature 2008, 453:56-64.

doi: 10.1186/1471-2164-11-406

Cite this article as: Mullikin et al., Light whole genome sequence for SNP discovery across domestic cat breeds. BMC Genomics 2010, 11:406

Additional files 1-6

Data (June 2010)

James C Mullikin · Nancy F Hansen · Lei Shen · Heather Ebling · Jeffrey A Brockman

A Deletion in GDF7 is Associated with a Heritable Forebrain Commissural Malformation Concurrent with Ventriculomegaly and Interhemispheric Cysts in Cats

Article

Jun 2020

An inherited neurologic syndrome in a family of mixed-breed Oriental cats has been characterized as forebrain commissural malformation, concurrent with ventriculomegaly and interhemispheric cysts. However, the genetic basis for this autosomal recessive syndrome in cats is unknown. Forty-three cats were genotyped on the Illumina Infinium Feline 63K iSelect DNA Array and used for analyses. Genome-wide association studies, including a sib-transmission disequilibrium test and a case-control association analysis, and homozygosity mapping, identified a critical region on cat chromosome A3. Short-read whole genome sequencing was completed for a cat trio segregating with the syndrome. A homozygous 7 bp deletion in growth differentiation factor 7 (GDF7) (c.221_227delGCCGCGC [p.Arg74Profs]) was identified in affected cats, by comparison to the 99 Lives Cat variant dataset, validated using Sanger sequencing and genotyped by fragment analyses. This variant was not identified in 192 unaffected cats in the 99 Lives dataset. The variant segregated concordantly in an extended pedigree. In mice, GDF7 mRNA is expressed within the roof plate when commissural axons initiate ventrally-directed growth. This finding emphasized the importance of GDF7 in the neurodevelopmental process in the mammalian brain. A genetic test can be developed for use by cat breeders to eradicate this variant.
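The variant name above encodes enough information to check two of its claims. In HGVS coding-DNA numbering (c.1 is the A of the ATG start codon), position c.N lies in codon ceil(N/3), and a deletion whose length is not a multiple of three shifts the reading frame. The sketch below applies these two rules to c.221_227delGCCGCGC; it is an illustrative check, not part of the study, and the helper names are hypothetical. HGVS protein descriptions name the first amino acid actually changed, which is not always the codon of the first deleted base; here the two happen to agree (codon 74, p.Arg74Profs).

```python
def codon_number(c_position: int) -> int:
    """Codon index for a coding-DNA position c.N (1-based from the A of ATG)."""
    if c_position < 1:
        raise ValueError("coding positions start at c.1")
    return (c_position + 2) // 3  # integer ceiling of c_position / 3


def deletion_length(start: int, end: int) -> int:
    """Length of a deletion written as c.start_end (inclusive range)."""
    return end - start + 1


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    start, end, deleted = 221, 227, "GCCGCGC"
    length = deletion_length(start, end)
    assert length == len(deleted)  # the position range and the deleted bases agree
    print("codon of first deleted base:", codon_number(start))
    print("deleted bases:", length, "frameshift:", is_frameshift(length))
```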

Feline genomics: recent advances and relevance to research and veterinary medicine

Article

Jan 2021
B ACAD VET FRANCE

Marie Abitbol

The sequencing and annotation of the domestic cat (Felis catus) genome, first published in 2007, were of poor quality. They have since been corrected and then complemented by the development of genomic tools which, combined with constantly evolving genetic strategies, make it possible to search efficiently for genes and variants of interest. These advances benefit feline veterinary medicine, and they also make the domestic cat a model for studying domestication and the evolution of species, as well as a biomedical model of steadily growing importance. Little by little, the cat, which has already won the hearts of French households and of the worldwide web, is winning the hearts of geneticists.

Chapter 11d. Advances in cat cloning and well-being of cloned cats

Chapter

Aug 2022

Cats are the most popular pets worldwide and the most extensively used model to understand human diseases [1–4]. Cats make a unique contribution to biomedical science and serve as a significant study model in a variety of fields, including neurology related to locomotion and spinal damage, retrovirus and zoonotic disease studies, and the development of therapeutic techniques for inherited disorders. There are human homologs for almost 90% of cat genes, and fewer chromosomal rearrangements in cats than in mice. The intermediate size, prolific breeding capacity, likeness of systems to humans, abundance, low cost, and neurobehavioral complexity make cats useful in research including neuroscience and a variety of genetic, ophthalmologic, and infectious disorders. Cloning a feline using somatic cell nuclear transfer (SCNT) is beneficial for generating disease models and averting the extinction of endangered felids [5–7]. The world’s first cat cloning experiment was done by Taeyoung Shin around two decades ago, and successfully produced CC (“Carbon Copy” or “Copycat”) via SCNT [8]. SCNT is a robust technology but a contentious procedure because of risks such as high embryonic losses, placental abnormalities, postnatal mortality, and poor survival (1–5%) for more than a short period [9–11]. Felid cloning has disadvantages as well, such as the low in vitro production of cat-cloned embryos [12, 13], complications in the surrogacy process [8], high rate of miscarriage [14], and very low birth rate of healthy clones [8, 15]. Postsurvival health issues have also been associated with cloned animals, such as defects in vital organs and immune system problems [16]. Although there have been few reports of developmental problems in cloned cats after birth, cat cloning research must nevertheless be handled with caution.
The information provided in this chapter covers the advances in cat cloning methods and available treatment options for the care of cloned cats proposed to the IETS Health and Safety Advisory Committee.

First genome-wide association study investigating blood pressure and renal traits in domestic cats

Article

Feb 2022

Hypertension (HTN) and chronic kidney disease (CKD) are common in ageing cats. In humans, blood pressure (BP) and renal function are complex heritable traits. We performed the first feline genome-wide association study (GWAS) of the quantitative traits systolic BP and creatinine and the binary outcomes HTN and CKD, testing 1022 domestic cats with a discovery, replication and meta-analysis design. No variants reached the experiment-wide significance level in the discovery stage for any phenotype. On follow-up of the top 9 variants for creatinine and the top 5 for systolic BP, one SNP reached experiment-wide significance for association with creatinine in the combined meta-analysis (chrD1.10258177; P = 1.34 × 10^-6). Exploratory genetic risk score (GRS) analyses were performed. Within the discovery sample, GRSs of top SNPs from the BP and creatinine GWASs showed strong association with HTN and CKD but did not validate in independent replication samples. A GRS including SNPs corresponding to human CKD genes was not significant in an independent subset of cats. Gene-set enrichment and pathway-based analysis (GSEA) was performed for both quantitative phenotypes, identifying 30 enriched pathways for creatinine. Our results support the utility of GWASs and GSEA for genetic discovery of complex traits in cats, with the caveat that our findings require validation.
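The abstract does not say how its genetic risk scores were constructed. A common form, assumed here, is a sum over selected SNPs of the effect-allele dosage (0, 1 or 2 copies) multiplied by a per-SNP weight such as a GWAS effect size; with all weights equal to 1 it reduces to a plain allele count. The function name, SNP identifiers and weights below are hypothetical.

```python
from typing import Dict, Optional


def genetic_risk_score(dosages: Dict[str, int],
                       weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of effect-allele dosages over the SNPs in `weights`.

    dosages: SNP id -> number of effect alleles carried (0, 1 or 2).
    weights: SNP id -> per-SNP weight (e.g. an effect size); if None, every
             SNP in `dosages` gets weight 1 (an unweighted allele count).
    """
    if weights is None:
        weights = {snp: 1.0 for snp in dosages}
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"no genotype for: {missing}")
    score = 0.0
    for snp, w in weights.items():
        d = dosages[snp]
        if d not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2, got {d}")
        score += w * d
    return score


if __name__ == "__main__":
    # Hypothetical genotypes for one cat and hypothetical per-SNP weights.
    cat = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
    effects = {"snp_a": 0.5, "snp_b": 0.2, "snp_c": -0.1}
    print(genetic_risk_score(cat, effects))
    print(genetic_risk_score(cat))
```

In a discovery and replication design, the weights would typically be estimated in the discovery sample and the score then tested for association in the independent cats, which is the step at which the scores described above failed to validate.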

Genome-Wide SNPs Clarify a Complex Radiation and Support Recognition of an Additional Cat Species

Article

Jul 2021

Phylogenetic reconstruction and species delimitation are often challenging in the case of recent evolutionary radiations, especially when post-speciation gene flow is present. Leopardus is a Neotropical cat genus that has a long history of recalcitrant taxonomic problems, along with both ancient and current episodes of interspecies admixture. Here we employ genome-wide SNP data from all presently recognized Leopardus species, including several individuals from the tigrina complex (representing L. guttulus and two distinct populations of L. tigrinus), to investigate the evolutionary history of this genus. Our results reveal that the tigrina complex is paraphyletic, containing at least three distinct species. While one can be assigned to L. guttulus, the other two remain uncertain regarding their taxonomic assignment. Our findings indicate that the ‘tigrina’ morphology may be plesiomorphic within this group, which has led to a longstanding taxonomic trend of lumping these poorly known felids into a single species.

Selection Signatures Reveal Candidate Genes for the Cornish Rex Breed-Specific Phenotype

Article

Mar 2024

Many coat color, behavioral and morphological traits are specific to individual cat breeds and fixed within them, and several variants influencing these traits are common among different breeds. In the domestic cat, rexoid mutations have been documented in several breeds. In the Cornish Rex, a 4-bp deletion in the LPAR6 gene has been found to cause a frameshift and a premature stop codon. In addition to the rexoid coat, Cornish Rex cats also have a characteristic head, ear shape and body type. Analysis of selection signatures in the Cornish Rex genome revealed several regions under selective pressure. One of these is located on FCA B4, in the region containing the ALX1 gene. A deletion in ALX1 in Burmese cats disrupts cranial morphogenesis and causes brachycephaly in the heterozygous state. In our study, we confirmed the presence of the LPAR6 deletion in 20 Cornish Rex cats and in four F1 hybrids between Cornish Rex and domestic cats. However, we did not confirm the presence of the ALX1 deletion in Cornish Rex cats. Genome-wide selection signature analysis was performed using ROH islands and integrated haplotype score (iHS) statistics based on publicly available SNP array data from 11 Cornish Rex cats. Selection signatures were detected on chromosomes A1, A3, C2, B1, B4 and D1.

Genetic Diseases

Chapter

Jul 2023

Catherine A Outerbridge

The Modern Classification of Felidae --Combining Molecular Phylogeny Framework and Fossil Evidence

Article

Mar 2023

[Objectives] The evolutionary relationships within Felidae have been controversial. As a result, there are highly divergent views on the classification of cats at the generic level. The emerging phylogeny based on gene or genomic data provides a new viewpoint for understanding the evolution of cats. [Methods] This paper reviews the molecular phylogeny of Felidae over recent years, and we deduce the evolutionary history of Felidae in combination with fossil records. The phylogenies by Johnson et al. (2006) and Li et al. (2016) are used as the core, corroborated by specific fossil records. [Results] Recent molecular phylogenies propose that living cats radiated in the Late Miocene and diverged into eight branches. Though the divergence ages of these branches largely coincide with fossil evidence, the inferred area of origin of some branches is not supported by fossil records. Combining the evidence from these fossil records, we propose that most living cat lineages likely originated in Asia, except for the Caracal lineage and the Leopardus lineage, and that living cats experienced at least 30 intercontinental migrations in the course of their evolution, far more than inferred from molecular phylogeny alone. [Conclusion] Based on the study of evolutionary history and morphology, we suggest that all living cats should be classified into Felinae and subdivided into 15 genera and 40 species.

Genetic Diseases

Chapter

Jun 2020

Catherine A Outerbridge

Breed predispositions are documented for a variety of feline skin diseases, and there are diverse reports describing novel skin diseases where affected individuals within a litter had similar congenital skin changes. Both presentations increase the suspicion of a possible hereditary component to the identified skin disease. Recognized feline genodermatoses represent inherited skin disorders that follow a single-gene mode of inheritance (Leeb et al., Vet Dermatol 28:4–9. https://doi.org/10.1016/j.mcp.2012.04.004, 2017). These diseases are rare in their occurrence, but the number identified has the potential to increase, as available diagnostic tools for evaluating genetic diseases have advanced and single nucleotide polymorphism (SNP) mapping of the feline genome has improved (Lyons, Mol Cell Probes. 26:224–30. https://doi.org/10.1016/j.mcp.2012.04.004, 2012; Mullikin et al., BMC Genomics. 11:406. http://www.biomedcentral.com/1471-2164/11/406, 2010). This chapter will discuss some examples of feline genodermatoses that can affect the epidermis, the dermoepidermal junction, the hair follicles or hair shafts, the dermis, and pigmentation. The genetics of feline coat color and coat length are discussed in the chapter Coat Color Genetics.

Genetics of Hypertension: The Human and Veterinary Perspectives

Chapter

Jan 2020

Rosanne E Jepson

The health burden associated with hypertension in the world is very large. Current estimates suggest that there are over 1 billion individuals with hypertension worldwide with the World Health Organisation estimating that this will increase to 1.5 billion by 2025. Between 35% and 45% of individuals over 25 years demonstrate hypertension with prevalence varying with geographical location and socioeconomic status. In humans, a continuous and incremental risk exists between blood pressure (BP) and cardiovascular disease, stroke and kidney disease. The significant health and economic impact of hypertension has, therefore, driven the requirement for a better understanding of factors, including genetic factors, that may contribute to the development of hypertension. An understanding of these genetic associations might not only improve individual and population risk prediction but, more importantly, may identify novel pathophysiological pathways that are involved in BP regulation and which may also act as pathway targets for new anti-hypertensive drug development assisting with personalisation of anti-hypertensive therapy.

Genome sequence, comparative analysis and haplotype structure of the domestic dog

Article

Dec 2005

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

Fine-Scale Mapping and Sequencing of Structural Variation from Eight Human Genomes

Article

May 2008

Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.

Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment

Article

Mar 1998
GENOME RES

The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.

A second generation human haplotype map of over 3.1 million SNPs

Article

Jan 2007

Identification and Analysis of Functional Elements in 1% of The Human Genome by The ENCODE Pilot Project

Article

Jan 2007

A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

Article

Jan 2001
NATURE

The International SNP Map Working Group

We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
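The stated density can be turned around as a quick consistency check. If N SNPs are spread over L bases of available sequence at an average spacing of d̄, then L ≈ N d̄, which implies roughly 2.7 Gb of available sequence. This figure is derived here from the two numbers in the abstract, not stated by its authors.

```latex
\[
L \approx N\,\bar{d}
  = \left(1.42 \times 10^{6}\ \text{SNPs}\right)\left(1.9 \times 10^{3}\ \text{bp per SNP}\right)
  \approx 2.7 \times 10^{9}\ \text{bp}
\]
```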

Current Protocols in Bioinformatics

Article

Jan 2002

Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities

Article

Mar 1998
GENOME RES

Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
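phred reports each estimated error probability P as a quality value Q = -10 log10(P), so Q = 20 corresponds to a 1 in 100 chance that the base call is wrong and Q = 30 to a 1 in 1,000 chance. The conversion, together with the kind of minimum-quality check the cat study applied to discrepant bases at candidate SNP sites, can be sketched as below. The function names are hypothetical, and the threshold of 20 is an illustrative value, not the one used in the study.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a phred quality value Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality value Q for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def passes_min_quality(q: float, min_q: float = 20) -> bool:
    """Keep a base call only if its quality meets the (hypothetical) threshold."""
    return q >= min_q


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q), passes_min_quality(q))
```

Because Q is logarithmic, each 10-point increase in the threshold cuts the tolerated per-base error probability tenfold.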

Recommended publications

Parasite prevalence in free-ranging farm cats, Felis silvestris catus

Diseases of the European wildcat (Felis silvestris Schreber, 1777) in Great Britain

Viral infections in cats in Wroclaw city

Evaluation of the association of Bartonella species, feline herpesvirus 1, feline calicivirus, felin...