Automating sequence-based detection and genotyping of SNPs from diploid samples

Department of Statistics, University of Washington, Seattle, Washington 98195, USA.
Nature Genetics (Impact Factor: 29.35). 04/2006; 38(3):375-81. DOI: 10.1038/ng1746
Source: PubMed


The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47-90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.

  • Source
    • "Chromatograms were analyzed with Phred (version: 0.020425.c) (Ewing and Green, 1998; Ewing et al., 1998); Phrap and Cross_match (version 0.990319); PolyPhred (version 6.18, April 29, 2009) (Nickerson et al., 1997; Stephens et al., 2006), and Consed (version 20.0) (Gordon et al., 1998). For comparison of the patient sequences with an NF1 reference sequence, a pseudochromatogram was generated with Sudophred (version 6.18; April 29, 2009), using positions 17: 31,094,927 to 31,382,116 of the human genome (Ensembl release 76, GRCh38) ( "
    ABSTRACT: Neurofibromatosis type I is an autosomal dominant disease with complete penetrance and variable age-dependent expressivity. It is caused by heterozygous mutations in neurofibromin 1 (NF1). These occur throughout the length of the gene, with no apparent hotspots. Even though some mutations have been found repeatedly, most have been observed only once. This, along with the variable expressivity, has made it difficult to establish genotype-phenotype correlations. Here, we report the clinical and molecular characteristics of four pediatric patients with neurofibromatosis type I. Patients were clinically examined and DNA was extracted from peripheral blood. The whole coding sequence of NF1, plus flanking intronic regions, was examined by Sanger sequencing, and four frameshift mutations were identified. The mutation c.3810_3820delCATGCAGACTC was observed in a familial case. This mutation occurred within a sequence comprising two 8-bp direct repeats (GCAGACTC) separated by a CAT trinucleotide, with the deletion leading to the loss of the trinucleotide and the 8-bp repeat following it. The deletion might have occurred due to misalignment of the direct repeats during cell division. In the mutation c.5194delG, the deleted G is nested between two separate mononucleotide tracts (AAAGTTT), which could have played a role in creating the deletion. The other two mutations reported here are c.4076_4077insG, and c.3193_3194insA. All four mutations create premature stop codons. In three mutations, the consequence is predicted to be loss of the GAP-related, Sec14 homology, and pleckstrin homology-like domains; while in the fourth, only the latter two domains would be lost.
    Genetics and molecular research: GMR 09/2015; 14(3):8326-8337. DOI:10.4238/2015.July.27.21 · 0.78 Impact Factor
  • Source
    • "Cultivated barley is highly self-fertilizing, resulting in low levels of heterozygosity; PCR products can be sequenced directly, and generally yield a single haplotype. Polyphred (Nickerson et al. 1997; Stephens et al. 2006) was used to screen for potentially heterozygous sites. Allele-specific PCR was used to experimentally resolve haplotypes in individuals that were found to be heterozygous (Gordon et al. 1998; Morrell et al. 2006; Chen et al. 2010). "
    ABSTRACT: The levels of diversity and extent of linkage disequilibrium in cultivated species are largely determined by diversity in their wild progenitors. We report a comparison of nucleotide sequence diversity in wild and cultivated barley (Hordeum vulgare ssp. spontaneum and ssp. vulgare) at 7 nuclear loci totaling 9296bp, using sequence from Hordeum bulbosum to infer the ancestral state of mutations. The sample includes 36 accessions of cultivated barley, including 23 landraces (cultivated forms not subject to modern breeding) and 13 cultivated lines and genetic stocks compared to either 25 or 45 accessions of wild barley for the same loci. Estimates of nucleotide sequence diversity indicate that landraces retain >80% of the diversity in wild barley. The primary population structure in wild barley, which divides the species into eastern and western populations, is reflected in significant differentiation at all loci in wild accessions and at 3 of 7 loci in landraces. "Oriental" landraces have slightly higher diversity than "Occidental" landraces. Genetic assignment suggests more admixture from Occidental landraces into Oriental landraces than the converse, which may explain this difference. Based on θπ for silent sites, modern western cultivars have ~73% of the diversity found in landraces and ~71% of the diversity in wild barley.
    The Journal of heredity 12/2013; 105(2). DOI:10.1093/jhered/est083 · 2.09 Impact Factor
  • Source
    • "New primers (Forward Genome Walker [FGW]; Reverse Genome Walker [RGW]) were designed from the newly added microsatellite flanking sequence (Table 2). These primers were then used to amplify DNA of 'Pound 7', 'P 30', 'PA 7', and nine other cacao genotypes, each representing the major genetic groups of cacao as described by Motamayor et al. (2008). The amplified products were sequenced with an ABI 3730 genetic analyzer (Applied Biosystems, Foster City, CA, USA) and aligned with Phred, Phrap, Polyphred, and Consed software for sequence comparison and SNP detection (Ewing and Green 1998; Ewing et al. 1998; Gordon et al. 1998; Stephens et al. 2006). Each SNP site detected was named after the mTcCIR microsatellite marker (locus) from which it was identified, followed by the distance in nucleotides of the SNP from the 5' end of the sequence. "
    ABSTRACT: The majority of the world's cacao for chocolate manufacture is produced in West Africa. Cocoa breeding programs in West Africa need genetic markers to reduce the time needed for improving cocoa by screening seedlings for the presence of the markers rather than mature plants for the phenotypic traits (i.e., marker-assisted selection [MAS]). For MAS to be successful, the breeder must have both access to markers linked to desired traits and a convenient marker-assay system that can be performed locally. In this study, microsatellite markers that flanked disease resistance quantitative trait loci (QTL) but could not be assayed conveniently in West Africa were converted using a genome walking method into single nucleotide polymorphism (SNP) markers that could be assayed locally. The SNP and microsatellite markers were equally effective in identifying off-types in two different mapping populations of cacao. Also, SNPs cast doubt on whether all microsatellite markers are identical by descent.
    Journal of Crop Improvement 03/2013; 27(2). DOI:10.1080/15427528.2012.752773