Quantitative analysis of single nucleotide polymorphisms within copy number variation.

Bioinformatics Program, Boston University, Boston, MA, USA.
PLoS ONE (Impact Factor: 3.73). 02/2008; 3(12):e3906. DOI:10.1371/journal.pone.0003906
Source: PubMed

ABSTRACT Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE.
We performed a Bayesian analysis of the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0-1%.
Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.
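
As an illustration of the quantities named in the abstract, the following is a minimal sketch, not the authors' model. It computes a standard Pearson chi-square HWE test from genotype counts and then applies Bayes' rule to obtain the probability of each candidate cause given that a SNP failed the test. The function names, the cause labels, and every prior and rejection probability in the usage lines are hypothetical placeholders; the paper derives these quantities from allele frequency, sample size and significance level, which this sketch does not attempt.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 d.f.) for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi2_1df_p_value(stat):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))


def posterior_given_rejection(priors, reject_prob):
    """Bayes' rule: P(cause | SNP rejected by the HWE test)."""
    joint = {cause: priors[cause] * reject_prob[cause] for cause in priors}
    total = sum(joint.values())
    return {cause: value / total for cause, value in joint.items()}


# Hypothetical inputs, chosen only to show the call pattern.
stat = hwe_chi_square(n_aa=880, n_ab=100, n_bb=20)
p_value = chi2_1df_p_value(stat)

priors = {"cnv": 0.10, "segmental_duplication": 0.05, "neither": 0.85}
reject_prob = {"cnv": 0.50, "segmental_duplication": 0.60, "neither": 0.05}
posterior = posterior_given_rejection(priors, reject_prob)
```

Here `reject_prob["neither"]` plays the role of the significance level of the HWE test; the other two entries stand in for the power of the test to detect each kind of distortion, which in practice would depend on the minor allele frequency, the sample size and the genotyping error rate.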

  • Source
    ABSTRACT: Copy number variants (CNV) can be called from SNP-arrays; however, few studies have attempted to combine both CNV and SNP calls to test for association with complex diseases. Even when SNPs are located within CNVs, two separate association analyses are necessary, to compare the distribution of bi-allelic genotypes in cases and controls (referred to as SNP-only strategy) and the number of copies of a region (referred to as CNV-only strategy). However, when disease susceptibility is actually associated with allele-specific copy-number states, the two strategies may not yield comparable results, raising a series of questions about the optimal analytical approach. We performed simulations of the performance of association testing under different scenarios that varied genotype frequencies and inheritance models. We show that the SNP-only strategy lacks power under most scenarios when the SNP is located within a CNV; frequently it is excluded from analysis as it does not pass quality control metrics, either because of an increased rate of missing calls or a departure from Hardy-Weinberg proportions. The CNV-only strategy also lacks power because the association testing depends on the allele whose copy number varies. The combined strategy performs well in most of the scenarios. Hence, we advocate the use of this combined strategy when testing for association with SNPs located within CNVs.
    PLoS ONE 01/2013; 8(9):e75350. · 3.73 Impact Factor
  • Source
    ABSTRACT: Gene duplications are scattered widely throughout the human genome. A single-base difference located in nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) from individuals. This imperfection is indistinguishable by current genotyping methods. As next-generation sequencing technologies become more popular for sequence-based association studies, numerous ambiguous SNPs are rapidly accumulating. Thus, analyzing duplication variations in the reference genome to assist in preventing false positive SNPs is imperative. We have identified >10% of human genes associated with duplicated gene loci (DGL). Through meticulous sequence alignments of DGL, we systematically designated 1,236,956 variations as duplicated gene nucleotide variants (DNVs). The DNV database (dbDNV) has been established to promote more accurate variation annotation. Aside from the flat file download, users can explore the gene-related duplications and the associated DNVs by DGL and DNV searches, respectively. In addition, the dbDNV contains 304,110 DNV-coupled SNPs. From the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates. This is useful given that ∼58% of exonic SNPs in DGL are DNV-coupled. Because of the high accumulation of ambiguous SNPs, we suggest that annotating SNPs with DNV possibilities should improve association studies of these variants with human diseases.
    Nucleic Acids Research 01/2011; 39(Database issue):D920-5. · 8.28 Impact Factor
  • Source
    ABSTRACT: Although single nucleotide polymorphisms (SNPs) are increasingly being recognized as powerful molecular markers, their application to non-model organisms can bring significant challenges. Among these are imperfect conversion rates of assays designed from in silico resources and the enhanced potential for genotyping error relative to pre-validated, highly optimized human SNPs. To explore these issues, we used Illumina's GoldenGate assay to genotype 480 Antarctic fur seal (Arctocephalus gazella) individuals at 144 putative SNPs derived from a 454 transcriptome assembly. One hundred and thirty-five polymorphic SNPs (93.8%) were automatically validated by the program GenomeStudio, and the initial genotyping error rate, estimated from nine replicate samples, was 0.004 per reaction. However, an almost tenfold further reduction in the error rate was achieved by excluding 31 loci (21.5%) that exhibited unclear clustering patterns, manually editing clusters to allow rescoring of ambiguous or incorrect genotypes, and excluding 18 samples (3.8%) with unreliable genotypes. After stringent quality filtering, we also found a counter-intuitive negative relationship between in silico minor allele frequency and the conversion rate, suggesting that some of our assays may have been designed from paralogous loci. Nevertheless, we obtained over 45 000 individual SNP genotypes with a final error rate of 0.0005, indicating that the GoldenGate assay is eminently capable of generating large, high-quality data sets for non-model organisms. This has positive implications for future studies of the evolutionary, behavioural and conservation genetics of natural populations.
    Molecular Ecology Resources 06/2012; 12(5):861-72. · 7.43 Impact Factor
