Article
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.
Genome Research (impact factor: 13.61). 10/2010; 21(6):952-60.
DOI:10.1101/gr.113084.110
Source: PubMed
Cited In (6)
Article: Efficiency and power as a function of sequence coverage, SNP array density, and imputation.
ABSTRACT: High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.
PLoS Computational Biology 07/2012; 8(7):e1002604. · 5.22 Impact Factor
Article: Sniper: improved SNP discovery by multiply mapping deep sequenced reads.
ABSTRACT: SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.
Genome Biology 06/2011; 12(6):R55. · 6.63 Impact Factor
Article: SNP calling, genotype calling, and sample allele frequency estimation from New-Generation Sequencing data.
ABSTRACT: We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
PLoS ONE 01/2012; 7(7):e37558. · 4.09 Impact Factor
Keywords
1000 Genomes Project
20 ancestral recombination graphs
efficient approach
flanking genotype sites
genotype call
genotype single-nucleotide polymorphism
haplotype data
independent sequence
low coverage
low-coverage sequencing data
population genetic prior distribution
possible mutations
real data
relative tradeoff
sequence variants segregating
sequencing depth
simulation data
SNP candidates
tree-branch length
whole-genome sequencing