Quantitative Analysis of Single Nucleotide Polymorphisms within Copy Number Variation

Bioinformatics Program, Boston University, Boston, MA, USA.
PLoS ONE (Impact Factor: 3.23). 02/2008; 3(12):e3906. DOI: 10.1371/journal.pone.0003906
Source: PubMed


Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE.
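For readers unfamiliar with the HWE test mentioned above, the following is a minimal sketch of the standard one-degree-of-freedom chi-square test on biallelic genotype counts. It is a textbook illustration, not the authors' code; the function name `hwe_chi_square` and the example counts are assumptions made for this sketch.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Hypothetical helper for illustration: takes observed genotype counts
    (AA, AB, BB) and returns (chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts with a strong heterozygote deficit, as might arise
    # when hemizygous deletions are called as homozygotes.
    chi2, p_value = hwe_chi_square(40, 20, 40)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A SNP whose p-value falls below the chosen significance level would traditionally be excluded from analysis; the abstract argues that such exclusions may partly reflect underlying copy number variation rather than genotyping error.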
We performed a Bayesian analysis of the potential effects of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency when the sample size is large and the genotyping error rate is 0–1%.
Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.
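The kind of posterior probability described above can be illustrated with a direct application of Bayes' rule. This is a sketch only: the abstract does not give the authors' model, so the function name `posterior_given_rejection`, the list of candidate causes, and every prior and rejection probability below are made-up assumptions rather than values from the paper.

```python
def posterior_given_rejection(priors, reject_prob):
    """Posterior probability of each candidate cause given an HWE rejection.

    priors      -- dict: cause -> prior probability that a SNP has that cause
    reject_prob -- dict: cause -> P(HWE test rejects | cause); in a real
                   analysis this would depend on allele frequency, sample
                   size and the significance level of the test
    Both dicts are hypothetical inputs for illustration.
    """
    # Joint probability of each cause together with a rejected test.
    joint = {cause: priors[cause] * reject_prob[cause] for cause in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("rejection has zero probability under all causes")
    # Normalise so the posteriors sum to one (Bayes' rule).
    return {cause: value / total for cause, value in joint.items()}


if __name__ == "__main__":
    # All numbers below are invented purely to show the calculation.
    priors = {"none": 0.85, "cnv": 0.10, "segmental_duplication": 0.05}
    reject_prob = {"none": 0.05, "cnv": 0.50, "segmental_duplication": 0.60}
    for cause, prob in posterior_given_rejection(priors, reject_prob).items():
        print(f"{cause}: {prob:.3f}")
```

Even with a small prior, a cause that makes rejection much more likely than the nominal significance level can dominate the posterior, which is the intuition behind conditioning on the HWE test outcome.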

Available from: PubMed Central
  • Source
    • "Additionally, and on the basis of real data, we observed that, for a SNP located in a CNV region, the missing rate for bi-allelic genotypes was often quite high, and there were significant departures from Hardy-Weinberg equilibrium. Thus, and as previously mentioned [19], SNPs located in CNVs were likely to be excluded from classical SNP analysis and accordingly the test for the effect of the allele was not considered nor tested. "
    ABSTRACT: Copy number variants (CNVs) can be called from SNP arrays; however, few studies have attempted to combine CNV and SNP calls to test for association with complex diseases. Even when SNPs are located within CNVs, two separate association analyses are necessary: one comparing the distribution of bi-allelic genotypes in cases and controls (the SNP-only strategy) and one comparing the number of copies of a region (the CNV-only strategy). However, when disease susceptibility is actually associated with allele-specific copy-number states, the two strategies may not yield comparable results, raising questions about the optimal analytical approach. We simulated association testing under scenarios that varied genotype frequencies and inheritance models. We show that the SNP-only strategy lacks power under most scenarios when the SNP is located within a CNV; the SNP is frequently excluded from analysis because it fails quality control metrics, owing either to an increased rate of missing calls or to a departure from Hardy-Weinberg proportions. The CNV-only strategy also lacks power because the association depends on which allele varies in copy number. The combined strategy performs well in most of the scenarios. Hence, we advocate the use of this combined strategy when testing for association with SNPs located within CNVs.
    Preview · Article · Sep 2013 · PLoS ONE
  • Source
    • "Otherwise, a duplicate with different copy numbers across the population is so called a copy-number variation (CNV). This type of genetic variation represents a DNA segment that exhibits copy-number differences in the population (21–23). Hence, if some identified DNVs locate in CNVs, they may not be presented in every individual and increase the difficulty for using these DNVs in CNVs for genotyping analysis. "
    ABSTRACT: Gene duplications are scattered widely throughout the human genome. A single-base difference located in nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) between individuals. Current genotyping methods cannot distinguish the two cases. As next-generation sequencing technologies become more popular for sequence-based association studies, ambiguous SNPs are accumulating rapidly. Thus, analyzing duplication variations in the reference genome to help prevent false-positive SNPs is imperative. We have identified >10% of human genes associated with duplicated gene loci (DGL). Through meticulous sequence alignments of DGL, we systematically designated 1 236 956 variations as duplicated gene nucleotide variants (DNVs). The DNV database (dbDNV) has been established to promote more accurate variation annotation. Aside from the flat file download, users can explore the gene-related duplications and the associated DNVs by DGL and DNV searches, respectively. In addition, the dbDNV contains 304 110 DNV-coupled SNPs. With the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates. This is useful because ∼58% of exonic SNPs in DGL are DNV-coupled. Because of the high accumulation of ambiguous SNPs, we suggest that annotating SNPs with their DNV possibilities should improve association studies of these variants with human diseases.
    Full-text · Article · Jan 2011 · Nucleic Acids Research
  • Source
    • "Human: Human, chimpanzee, orangutan, rhesus macaque, dog, mouse, rat, horse, cow, opossum, chicken, zebrafish, tetraodon, fugu, stickleback, medaka
      Chimpanzee: Chimpanzee, human, orangutan, rhesus macaque, mouse, rat, opossum, chicken, zebrafish
      Rhesus macaque: Rhesus macaque, human, chimpanzee, orangutan, mouse, rat
      Mouse: Mouse, human, chimpanzee, orangutan, rhesus macaque, rat, horse, dog, cow, opossum, chicken, tetraodon, fugu, stickleback, medaka, zebrafish
      Rat: Rat, human, chimpanzee, rhesus macaque, mouse, dog, horse, cow, opossum, chicken, zebrafish
      Dog: Dog, human, mouse, rat, horse, cow
      Chicken: Chicken, human, orangutan, mouse, rat, horse, opossum, zebrafish, fugu
      Stickleback: human, mouse, chicken, zebrafish, tetraodon, fugu, medaka
      … in >1% of a population (Lee et al., 2008). Further experimental evidence is needed to support the polymorphism status of the inferred CNVs in this database. "
    ABSTRACT: CNVVdb is a web interface for identifying putative copy number variations (CNVs) among 16 vertebrate species using same-species self-alignments and cross-species pairwise alignments. When genomic coordinates in the target species are queried, all potential paralogous/orthologous regions that overlap ≥80–100% (adjustable) of the query sequence at a user-specified sequence identity (≥60% to ≥90%) are returned. Additional information is given for the genes in the returned regions, including gene descriptions, alternatively spliced transcripts, gene ontology descriptions and other biologically important information. CNVVdb also provides pseudogene and single nucleotide polymorphism (SNP) information for the CNV-related genomic regions. Moreover, multiple sequence alignments of CNVs shared across species are provided. By combining CNV, SNP, pseudogene and functional information, CNVVdb can be very useful for comparative and functional studies in vertebrates. Availability: CNVVdb is freely accessible.
    Full-text · Article · May 2009 · Bioinformatics