Automating sequence-based detection and genotyping of SNPs from diploid samples

Department of Statistics, University of Washington, Seattle, Washington 98195, USA.
Nature Genetics (Impact Factor: 29.65). 04/2006; 38(3):375-81. DOI: 10.1038/ng1746
Source: PubMed

ABSTRACT The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47-90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.

  • ABSTRACT: Markov chains are very effective for prediction, particularly on long data sets. In DNA sequencing, it is important to determine the presence of certain nucleotides from the preceding history of the data set. We applied the Chapman-Kolmogorov equation to carry out this Markov chain analysis. The Chapman-Kolmogorov equation is key to addressing the proper positions in the DNA chain, and it is a powerful tool in mathematics as well as in other prediction-based research. It incorporates the scores of DNA sequences calculated by various techniques. Our research uses the fundamentals of the Warshall Algorithm (WA) and Dynamic Programming (DP) to measure the scores of DNA segments. The experiments show that the Warshall Algorithm is well suited to short DNA sequences, whereas Dynamic Programming is well suited to long DNA sequences. Beyond these findings, it is important to measure the risk factors of local sequencing when matching local sequence alignments, whatever their length.
    Interdisciplinary Sciences Computational Life Sciences 08/2014; DOI:10.1007/s12539-013-0042-7 · 0.67 Impact Factor
  • ABSTRACT: Objective: Polycystic ovary syndrome (PCOS) is a common endocrinologic disease in women. In the present study, we examined the relationship of the IRS-1 Gly972Arg and IRS-2 Gly1057Asp polymorphisms to PCOS and to phenotypic features of PCOS in a Chinese population from Taiwan. Materials and methods: A total of 340 genetically unrelated women aged 18 to 45 years, including 248 PCOS patients and 92 control subjects, were recruited. Hormone and biochemical measurements were evaluated for each woman. Genotyping of the IRS-1 gene Gly972Arg variant and the IRS-2 gene Gly1057Asp variant was performed by direct sequencing. Results: We found a significant difference in the genotypic distribution of IRS-2 gene Gly1057Asp between the PCOS group and the control group (p = 0.004). Carriers of homozygous IRS-2 Asp had an increased risk of PCOS compared with carriers of Gly/Gly (OR 4.08, 95% C.I. 1.60-10.41, p = 0.003). No significant difference in genotype frequencies of IRS-1 Gly972Arg was observed between the two groups. We further investigated the effect of the interaction of IRS-1 Gly972Arg and IRS-2 Gly1057Asp on the risk of PCOS and found that women who carried IRS-1 Gly/Arg or IRS-2 Asp/Asp, or who carried both IRS-1 Gly/Arg and IRS-2 Asp/Asp, had a much higher risk of PCOS compared with their counterparts, respectively (OR 2.49, 95% C.I. 1.16-5.37, p = 0.019; OR 11.87, 95% C.I. 1.21-116.84, p = 0.034). We further found that non-obese PCOS patients carried a significantly higher frequency of IRS-2 Asp/Asp than the control group (p = 0.004). A significant effect of the interaction of carrying both IRS-1 Gly/Arg and IRS-2 Asp/Asp was also observed in non-obese PCOS patients (p = 0.003), but not in obese PCOS patients. Conclusions: In this study, we found a significant association of the IRS-2 gene variant, as well as of the interaction of the IRS-1 and IRS-2 genes, with PCOS, especially in non-obese women. The IRS-2 homozygous Asp variant may be considered a risk factor for PCOS that warrants early detection to prevent further complications in the Chinese population from Taiwan.
    Journal of Ovarian Research 10/2014; 7(1):92. DOI:10.1186/s13048-014-0092-4 · 2.03 Impact Factor
  • ABSTRACT: Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III. Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
    Nature 12/2014; advance online publication. DOI:10.1038/nature13917 · 42.35 Impact Factor
