Automating sequence-based detection and genotyping of SNPs from diploid samples

Department of Statistics, University of Washington, Seattle, Washington 98195, USA.
Nature Genetics (Impact Factor: 29.35). 04/2006; 38(3):375-81. DOI: 10.1038/ng1746
Source: PubMed


The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47-90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.
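The abstract describes detecting SNPs by identifying heterozygous individuals in diploid PCR products. At such a site the trace shows two overlapping peaks instead of one. The sketch below illustrates only that basic idea with a simple peak-height ratio. It is not PolyPhred's algorithm or its confidence score. The `BaseCall` structure, the `call_genotype` and `consensus` helpers, and the 0.35 threshold are all hypothetical. The sketch also assumes that reverse-strand peaks have already been complemented into forward-strand orientation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BaseCall:
    """Peak heights for A, C, G and T at one trace position (hypothetical structure)."""
    position: int
    peaks: dict[str, float]


def call_genotype(call: BaseCall, het_ratio: float = 0.35) -> Optional[Tuple[str, float]]:
    """Toy diploid call from the two tallest peaks; het_ratio is an illustrative cutoff."""
    ranked = sorted(call.peaks.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height <= 0:
        return None  # no usable signal at this position
    ratio = second_height / top_height
    if ratio >= het_ratio:
        # Two overlapping peaks of comparable height: call a heterozygote.
        return "".join(sorted(top_base + second_base)), ratio
    # One dominant peak: call a homozygote for the primary base.
    return top_base * 2, ratio


def consensus(forward: Optional[Tuple[str, float]],
              reverse: Optional[Tuple[str, float]]) -> Optional[str]:
    """Keep a genotype only when both strands produce the same call."""
    if forward is None or reverse is None or forward[0] != reverse[0]:
        return None
    return forward[0]


# Hypothetical peak heights; reverse-strand peaks assumed already complemented.
fwd = call_genotype(BaseCall(101, {"A": 1000.0, "G": 520.0, "C": 30.0, "T": 20.0}))
rev = call_genotype(BaseCall(101, {"G": 880.0, "A": 610.0, "C": 25.0, "T": 40.0}))
print(fwd, rev, consensus(fwd, rev))
```

Requiring forward and reverse reads to agree echoes the two-strand sequencing design mentioned in the abstract. A real caller would weigh the evidence from each read rather than apply a single fixed threshold.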

  • Source
    • "Chromatograms were analyzed with Phred (version: 0.020425.c) (Ewing and Green, 1998; Ewing et al., 1998); Phrap and Cross match (version 0.990319); PolyPhred (version 6.18, April 29, 2009) (Nickerson et al., 1997; Stephens et al., 2006), and Consed (version 20.0) (Gordon et al., 1998). For comparison of the patient sequences with an NF1 reference sequence, a pseudochromatogram was generated with Sudophred (version 6.18.; April 29, 2009), using positions 17: 31,094,927 to 31,382,116 of the human genome (Ensembl release 76, GRCh38) ( "
    ABSTRACT: Neurofibromatosis type I is an autosomal dominant disease with complete penetrance and variable age-dependent expressivity. It is caused by heterozygous mutations in neurofibromin 1 (NF1). These occur throughout the length of the gene, with no apparent hotspots. Even though some mutations have been found repeatedly, most have been observed only once. This, along with the variable expressivity, has made it difficult to establish genotype-phenotype correlations. Here, we report the clinical and molecular characteristics of four pediatric patients with neurofibromatosis type I. Patients were clinically examined and DNA was extracted from peripheral blood. The whole coding sequence of NF1, plus flanking intronic regions, was examined by Sanger sequencing, and four frameshift mutations were identified. The mutation c.3810_3820delCATGCAGACTC was observed in a familial case. This mutation occurred within a sequence comprising two 8-bp direct repeats (GCAGACTC) separated by a CAT trinucleotide, with the deletion leading to the loss of the trinucleotide and the 8-bp repeat following it. The deletion might have occurred due to misalignment of the direct repeats during cell division. In the mutation c.5194delG, the deleted G is nested between two separate mononucleotide tracts (AAAGTTT), which could have played a role in creating the deletion. The other two mutations reported here are c.4076_4077insG and c.3193_3194insA. All four mutations create premature stop codons. In three mutations, the consequence is predicted to be loss of the GAP-related, Sec14 homology, and pleckstrin homology-like domains, whereas in the fourth, only the latter two domains would be lost.
    Article · Sep 2015 · Genetics and Molecular Research: GMR
  • Source
    • "Sequences were performed through a partnership with La Plate-Forme Séquençage et Génomique (Institut Cochin, Paris). Electrophoregrams were analyzed using PolyPhred 5.04 software (Stephens et al. 2006). "
    ABSTRACT: The association of the E670G (rs505151) polymorphism in the PCSK9 gene with an increased risk of coronary artery disease (CAD) and ischemic stroke (IS) was reported in previous studies. We investigated the effect of E670G (rs505151) on the risk of CAD and IS in a Tunisian cohort. PCSK9 E670G was genotyped by polymerase chain reaction (PCR)-based restriction fragment length polymorphism (RFLP) analysis, and the results were confirmed by direct sequencing. The frequency of the 670G allele was significantly higher in the CAD than in the no-CAD subgroup (0.132 vs. 0.068, p = 0.030). As expected, the frequency of E670G was also significantly higher in the IS subgroup than in the control group (0.122 vs. 0.073, p = 0.032). Furthermore, among CAD patients, 670G carriers had significantly higher plasma total cholesterol and LDL-cholesterol levels than E670 carriers (6.78 [6.47-7.00] vs. 4.92 [4.02-5.46] mmol/l, p < 0.0001; and 4.60 [4.00-5.04] vs. 3.00 [2.22-3.70] mmol/l, p = 0.001, respectively). The risk and severity of CAD were also significantly increased in 670G carriers when the no-CAD subgroup was compared with CAD patients presenting ≥50% stenosis in two or three major coronary arteries (0.068 vs. 0.198, p = 0.001, OR = 3.39 [1.55-7.37]). The E670G polymorphism of the PCSK9 gene is associated with an increased risk and severity of CAD and IS in this Tunisian cohort.
    Article · Mar 2014 · Journal of Molecular Neuroscience
  • Source
    • "Cultivated barley is highly self-fertilizing, resulting in low levels of heterozygosity; PCR products can be sequenced directly, and generally yield a single haplotype. Polyphred (Nickerson et al. 1997; Stephens et al. 2006) was used to screen for potentially heterozygous sites. Allele-specific PCR was used to experimentally resolve haplotypes in individuals that were found to be heterozygous (Gordon et al. 1998; Morrell et al. 2006; Chen et al. 2010). "
    ABSTRACT: The levels of diversity and extent of linkage disequilibrium in cultivated species are largely determined by diversity in their wild progenitors. We report a comparison of nucleotide sequence diversity in wild and cultivated barley (Hordeum vulgare ssp. spontaneum and ssp. vulgare) at 7 nuclear loci totaling 9,296 bp, using sequence from Hordeum bulbosum to infer the ancestral state of mutations. The sample includes 36 accessions of cultivated barley, including 23 landraces (cultivated forms not subject to modern breeding) and 13 cultivated lines and genetic stocks, compared with either 25 or 45 accessions of wild barley for the same loci. Estimates of nucleotide sequence diversity indicate that landraces retain >80% of the diversity in wild barley. The primary population structure in wild barley, which divides the species into eastern and western populations, is reflected in significant differentiation at all loci in wild accessions and at 3 of 7 loci in landraces. "Oriental" landraces have slightly higher diversity than "Occidental" landraces. Genetic assignment suggests more admixture from Occidental landraces into Oriental landraces than the converse, which may explain this difference. Based on θπ for silent sites, modern western cultivars have ~73% of the diversity found in landraces and ~71% of the diversity in wild barley.
    Article · Dec 2013 · The Journal of Heredity
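The NF1 abstract calls all four variants frameshifts. It attributes the 11-bp deletion to misalignment of two 8-bp direct repeats. Both points can be checked mechanically from the sequences it quotes. The sketch below is a toy and not a full HGVS parser: it only reads indels whose bases are spelled out after `del` or `ins`. The helper names `indel_length`, `is_frameshift` and `delete` are hypothetical.

```python
import re

# The four NF1 variants named in the abstract.
MUTATIONS = [
    "c.3810_3820delCATGCAGACTC",
    "c.5194delG",
    "c.4076_4077insG",
    "c.3193_3194insA",
]


def indel_length(hgvs: str) -> int:
    """Length of an indel whose deleted/inserted bases are spelled out (toy parser)."""
    match = re.search(r"(?:del|ins)([ACGT]+)$", hgvs)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs}")
    return len(match.group(1))


def is_frameshift(length: int) -> bool:
    """A coding indel shifts the reading frame unless its length is a multiple of 3."""
    return length % 3 != 0


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


# Local context from the abstract: two 8-bp direct repeats separated by CAT.
REPEAT, SPACER = "GCAGACTC", "CAT"
context = REPEAT + SPACER + REPEAT
deleted = "CATGCAGACTC"  # spacer plus the second repeat, as reported

reported = delete(context, context.index(deleted), len(deleted))
alternative = delete(context, 0, len(deleted))  # first repeat plus spacer instead

for name in MUTATIONS:
    print(name, indel_length(name), is_frameshift(indel_length(name)))
print(reported, alternative, reported == alternative)
```

Both deletions leave the same single copy of GCAGACTC. The position of the lost 11 bases is therefore ambiguous within the repeat, which fits the slippage explanation offered in the abstract.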
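The odds ratio reported in the PCSK9 abstract for multi-vessel CAD can be roughly checked against the quoted 670G frequencies of 0.198 and 0.068. This check assumes that the OR is an allelic odds ratio from a 2x2 table of allele counts, which the abstract does not state. The `allelic_odds_ratio` name is hypothetical. Reproducing the confidence interval [1.55-7.37] would need the underlying counts, and the abstract does not give them.

```python
def allelic_odds_ratio(p_case: float, p_control: float) -> float:
    """Odds that a sampled allele is the risk allele in cases, divided by the same odds in controls.

    Equals the allele-count odds ratio (a*d)/(b*c) of a 2x2 table.
    """
    for p in (p_case, p_control):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    return (p_case / (1.0 - p_case)) / (p_control / (1.0 - p_control))


# 670G frequencies reported for multi-vessel CAD patients and the no-CAD subgroup.
odds_ratio = allelic_odds_ratio(0.198, 0.068)
print(round(odds_ratio, 2))  # 3.38
```

The result of about 3.38 agrees with the reported 3.39, within the rounding of the three-decimal frequencies.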
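The barley abstract compares diversity using θπ, the average number of nucleotide differences per site between pairs of sequences. The sketch below computes that quantity for equal-length aligned sequences. The function name and the toy alignments are illustrative assumptions, not data from the study. The sketch also does not restrict the calculation to silent sites, which would require annotating which positions are synonymous or noncoding.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """theta_pi: mean number of pairwise differences per aligned site."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Toy alignments (not data from the study).
wild = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "TCGTACGTAC"]
landrace = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC", "ACGTTCGTAC"]

pi_wild = nucleotide_diversity(wild)          # 9 differences / (6 pairs * 10 sites) = 0.15
pi_landrace = nucleotide_diversity(landrace)  # 4 / 60
print(pi_wild, pi_landrace, pi_landrace / pi_wild)
```

The abstract's percentages, such as modern cultivars holding ~73% of landrace diversity, are ratios of this kind. The study's own values, however, come from its silent-site estimates.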