CNV Workshop: An integrated platform for high-throughput copy number variation discovery and clinical diagnostics

Center for Biomedical Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
BMC Bioinformatics (Impact Factor: 2.58). 02/2010; 11(1):74. DOI: 10.1186/1471-2105-11-74
Source: PubMed


Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides an opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist.
We developed a suite of software tools and resources (CNV Workshop) for automated, genome-wide CNV detection from a variety of SNP array platforms. CNV Workshop includes three major components: detection, annotation, and presentation of structural variants from genome array data. CNV detection utilizes a robust and genotype-specific extension of the Circular Binary Segmentation algorithm, and the use of additional detection algorithms is supported. Predicted CNVs are captured in a MySQL database that supports cohort-based projects and incorporates a secure user authentication layer and user/admin roles. To assist with determination of pathogenicity, detected CNVs are also annotated automatically for gene content, known disease loci, and gene-based literature references. Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data, integration with the UCSC Genome Browser, and tabular displays of genomic attributes for each CNV.
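For readers unfamiliar with Circular Binary Segmentation, the core recursion of plain CBS can be sketched as below. This is not CNV Workshop's genotype-specific extension. The input is assumed to be per-probe log R ratio values in genomic order. Significance is decided by a fixed z threshold (an assumption; the published CBS algorithm assesses significance by permutation), and the noise estimate from first differences, the threshold of 5.0, and the minimum segment length are hypothetical choices for illustration.

```python
import math
import statistics


def estimate_sigma(x):
    """Robust noise estimate from first differences (MAD-style).
    Assumes breakpoints are rare, so most differences are pure noise. Must be > 0."""
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    return statistics.median(diffs) / (0.6745 * math.sqrt(2))


def max_circular_z(x, sigma, min_len=2):
    """Scan every arc x[i:j]; its complement wraps around the ends (the 'circle').
    Returns (max |z|, i, j) for the two-sample statistic arc vs. complement."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            inside, outside = j - i, n - (j - i)
            if outside < min_len:
                continue
            arc_sum = prefix[j] - prefix[i]
            diff = arc_sum / inside - (total - arc_sum) / outside
            z = abs(diff) / (sigma * math.sqrt(1 / inside + 1 / outside))
            if z > best[0]:
                best = (z, i, j)
    return best


def cbs(x, sigma, threshold=5.0, min_len=2, offset=0):
    """Recursively split x at the best arc while |z| >= threshold.
    Returns sorted breakpoint indices into the full array."""
    if len(x) < 2 * min_len:
        return []
    z, i, j = max_circular_z(x, sigma, min_len)
    if z < threshold:
        return []
    # An arc touching an end of x yields one cut; an interior arc yields two.
    cuts = sorted({c for c in (i, j) if 0 < c < len(x)})
    bounds = [0] + cuts + [len(x)]
    found = [offset + c for c in cuts]
    for a, b in zip(bounds, bounds[1:]):
        found += cbs(x[a:b], sigma, threshold, min_len, offset + a)
    return sorted(found)


def segments(x, breakpoints):
    """Return (start, end, mean) for each segment; end is exclusive."""
    bounds = [0] + list(breakpoints) + [len(x)]
    return [(a, b, sum(x[a:b]) / (b - a)) for a, b in zip(bounds, bounds[1:])]
```

On a toy profile with a ten-probe drop of about -0.6 between two flat stretches of 20 probes each, `cbs(x, estimate_sigma(x))` reports breakpoints at indices 20 and 30. The exhaustive arc scan is O(n²) per segment, which is fine for a sketch but not for genome-scale arrays.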
To our knowledge, CNV Workshop represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants. CNV Workshop has been successfully utilized for assessment of genomic variation in healthy individuals and disease cohorts and is an ideal platform for coordinating multiple associated projects.
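At its core, the automatic annotation of detected CNVs for gene content and known disease loci is an interval-overlap query. The sketch below is a hypothetical in-memory version. CNV Workshop itself stores results in MySQL, and the `Interval` record, the 0-based half-open coordinate convention, and the placeholder gene and locus names are assumptions for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(cnvs, genes, disease_loci):
    """Map each CNV name to the sorted names of overlapping genes and disease loci.
    Linear scan per CNV; a database would use an indexed range query instead."""
    result = {}
    for cnv in cnvs:
        result[cnv.name] = {
            "genes": sorted(g.name for g in genes if overlaps(cnv, g)),
            "disease_loci": sorted(d.name for d in disease_loci if overlaps(cnv, d)),
        }
    return result
```

With half-open coordinates, a CNV ending exactly where a gene starts does not count as overlapping it. This avoids off-by-one false positives at shared boundaries.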
Available on the web at:



Available from: Peter S White
    • "…matched, disease-free children was used as a healthy control group. All case and control samples were genotyped using the Illumina HumanHap550K BeadChip and a single consistent protocol. Genotype data were uniformly analyzed for CNVs using Illumina's GenomeStudio software in combination with CNV Workshop and PennCNV (Wang et al., 2007; Gai et al., 2010)."
    ABSTRACT: Background: We sought to characterize the landscape of structural variation associated with the subset of congenital cardiac defects characterized by left-sided obstruction. Methods: Cases with left-sided cardiac defects (LSCD) and pediatric controls were uniformly genotyped and assessed for copy number variant (CNV) calls. Significance testing was performed to ascertain differences in overall CNV incidence, and for CNV enrichment of specific genes and gene functions in LSCD cases relative to controls. Results: A total of 257 cases of European descent and 962 ethnically matched, disease-free pediatric controls were included. Although there was no difference in CNV rate between cases and controls, a significant enrichment in rare LSCD CNVs was detected overall (p = 7.30 × 10⁻³, case/control ratio = 1.26) and when restricted either to deletions (p = 7.58 × 10⁻³, case/control ratio = 1.20) or duplications (p = 3.02 × 10⁻³, case/control ratio = 1.43). Neither gene-based, functional, nor knowledge-based analyses identified genes, loci, or pathways that were significantly enriched in cases as compared to controls when appropriate corrections for multiple tests were applied. However, several genes of interest were identified by virtue of their association with cardiac development, known human conditions, or reported disruption by CNVs in other patient cohorts. Conclusion: This study examines the largest cohort to date with LSCD for structural variation. These data suggest that CNVs play a role in disease risk and identify numerous genes disrupted by CNVs of potential disease relevance. These findings further highlight the genetic heterogeneity and complexity of these disorders.
    Birth Defects Research Part A: Clinical and Molecular Teratology 12/2014; 100(12). DOI:10.1002/bdra.23279 · 2.09 Impact Factor
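The LSCD abstract reports case/control ratios with p-values for rare-CNV enrichment but does not say which test produced them. As a hypothetical illustration only, one simple way to test excess CNV burden is a one-sided permutation test on per-individual rare-CNV counts. The function names, the permutation count, and the fixed seed are assumptions, not the study's method.

```python
import random


def rate_ratio(case_counts, control_counts):
    """Mean rare-CNV count per case divided by mean count per control."""
    case_rate = sum(case_counts) / len(case_counts)
    control_rate = sum(control_counts) / len(control_counts)
    return case_rate / control_rate


def burden_permutation_p(case_counts, control_counts, n_perm=10_000, seed=0):
    """One-sided permutation p-value for excess CNV burden in cases.
    With group sizes and the pooled total fixed, the case sum orders
    permutations the same way as the rate ratio, so it is used directly
    (this also avoids dividing by a zero control rate after shuffling)."""
    rng = random.Random(seed)
    pooled = list(case_counts) + list(control_counts)
    n_case = len(case_counts)
    observed = sum(case_counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if sum(pooled[:n_case]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)
```

For toy counts `cases = [1, 0, 2, 1]` and `controls = [0, 1, 0, 0, 1, 0, 0, 0]`, the rate ratio is 4.0. The exact hypergeometric tail is 47/495 ≈ 0.095, and the permutation estimate should land close to it.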
    • "In this study, we combined the genomics data generated from multiple genome-wide association studies (GWAS) consisting of 3,017 unrelated Thai subjects with no undiagnosed genetic disorders. We carried out CNV discovery from these dataset using the two commonly used CNV calling algorithms, PennCNV [13] and CNV Workshop [14], to identify the most accurate set of CNVs, and put together the first large reference CNV database for Thais. Furthermore, we performed population Copy Number Variation Region (CNVR) frequency comparison between Thais and 11 HapMap3 populations, and identified unique CNVRs in Thais as well as CNVs overlapping with genes associated with Thai population."
    ABSTRACT: Copy number variation (CNV) is a major genetic polymorphism contributing to genetic diversity and human evolution. Clinical application of CNVs for diagnostic purposes largely depends on sufficient population CNV data for accurate interpretation. CNVs from the general population in currently available databases help classify CNVs of uncertain clinical significance, and benign CNVs. Earlier studies of CNV distribution in several populations worldwide showed that a significant fraction of CNVs are population specific. In this study, we characterized and analyzed CNVs in 3,017 unrelated Thai individuals genotyped with the Illumina Human610, Illumina HumanOmniexpress, or Illumina HapMap550v3 platform. We employed hidden Markov model and circular binary segmentation methods to identify CNVs, extracted 23,458 CNVs consistently identified by both algorithms, and cataloged these high-confidence CNVs into our publicly available Thai CNV database. Analysis of CNVs in the Thai population identified a median of eight autosomal CNVs per individual. Most CNVs (96.73%) did not overlap with any known chromosomal imbalance syndromes documented in the DECIPHER database. When compared with CNVs in the 11 HapMap3 populations, CNVs found in the Thai population shared several characteristics with CNVs characterized in HapMap3. Common CNVs in Thais had similar frequencies to those in the HapMap3 populations, and all high-frequency CNVs (>20%) found in Thai individuals could also be identified in HapMap3. The majority of CNVs discovered in the Thai population, however, were of low frequency or uniquely identified in Thais. When performing hierarchical clustering using CNV frequencies, the CNV data were clustered into Africans, Europeans, and Asians, in line with the clustering performed with single nucleotide polymorphism (SNP) data. As CNV data are specific to the population of origin, our population-specific reference database will serve as a valuable addition to the existing resources for the investigation of clinical significance of CNVs in Thais and related ethnicities.
    PLoS ONE 08/2014; 9(8):e104355. DOI:10.1371/journal.pone.0104355 · 3.23 Impact Factor
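The Thai study kept only CNVs "consistently identified by both algorithms", but the abstract does not define "consistent". The sketch below assumes one common convention: same sample, same CNV type, and at least 50% reciprocal overlap. The `CNV` record, the 0.5 threshold, and the half-open coordinates are assumptions for illustration.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    sample: str
    chrom: str
    start: int    # 0-based, inclusive (assumed convention)
    end: int      # exclusive
    cn_type: str  # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap length as a fraction of the longer call (the smaller of the two fractions)."""
    if a.chrom != b.chrom:
        return 0.0
    ov = min(a.end, b.end) - max(a.start, b.start)
    if ov <= 0:
        return 0.0
    return min(ov / (a.end - a.start), ov / (b.end - b.start))


def consensus(calls_a: List[CNV], calls_b: List[CNV], min_ro: float = 0.5) -> List[CNV]:
    """Keep calls from caller A that caller B confirms in the same sample with the same type."""
    return [
        a for a in calls_a
        if any(
            a.sample == b.sample and a.cn_type == b.cn_type
            and reciprocal_overlap(a, b) >= min_ro
            for b in calls_b
        )
    ]
```

Requiring overlap in both directions stops a small call nested inside a much larger one from being treated as confirmation.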
    • "With this increased resolution comes additional multiple testing burden, although multiple probes are needed to call a given CNV and many probes may not detect any CNVs (conservative standard is P < 5 × 10⁻⁴ [(22); see ‘Materials and Methods’ section]). Assessment of CNVs across the genome has continued to improve (30–35). Recent reports of the extent of discordance between different arrays and CNV calling algorithms have been published (17)."
    ABSTRACT: A number of copy number variation (CNV) calling algorithms exist; however, comprehensive software tools for CNV association studies are lacking. We describe ParseCNV, unique software that takes CNV calls and creates probe-based statistics for CNV occurrence in both case-control design and in family based studies addressing both de novo and inheritance events, which are then summarized based on CNV regions (CNVRs). CNVRs are defined in a dynamic manner to allow for a complex CNV overlap while maintaining precise association region. Using this approach, we avoid failure to converge and non-monotonic curve fitting weaknesses of programs, such as CNVtools and CNVassoc, and although Plink is easy to use, it only provides combined CNV state probe-based statistics, not state-specific CNVRs. Existing CNV association methods do not provide any quality tracking information to filter confident associations, a key issue which is fully addressed by ParseCNV. In addition, uncertainty in CNV calls underlying CNV associations is evaluated to verify significant results, including CNV overlap profiles, genomic context, number of probes supporting the CNV and single-probe intensities. When optimal quality control parameters are followed using ParseCNV, 90% of CNVs validate by polymerase chain reaction, an often problematic stage because of inadequate significant association review. ParseCNV is freely available at
    Nucleic Acids Research 01/2013; 41(5). DOI:10.1093/nar/gks1346 · 9.11 Impact Factor
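ParseCNV's own statistics and its dynamic CNVR definition are not reproduced here. As a simplified, hypothetical sketch of the general idea of "probe-based statistics summarized into regions", the code below does three things. It counts case and control carriers at each probe, computes a one-sided Fisher exact p-value, and merges runs of adjacent probes with identical carrier counts. The merge rule, the tuple layouts, and all names are assumptions, not ParseCNV's algorithm.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact test, P(X >= a), for the 2x2 table
    [[a, b], [c, d]] = [[cases with CNV, cases without], [controls with, controls without]]."""
    n_case, n_with = a + b, a + c
    total = a + b + c + d
    hi = min(n_case, n_with)
    tail = sum(comb(n_with, k) * comb(total - n_with, n_case - k) for k in range(a, hi + 1))
    return tail / comb(total, n_case)


def probe_stats(probes, case_calls, control_calls, n_cases, n_controls):
    """probes: (chrom, pos) in genomic order; calls: (sample, chrom, start, end), half-open."""
    rows = []
    for chrom, pos in probes:
        # Count each sample at most once per probe, even if it has overlapping calls.
        n_case = len({s for s, ch, st, en in case_calls if ch == chrom and st <= pos < en})
        n_ctrl = len({s for s, ch, st, en in control_calls if ch == chrom and st <= pos < en})
        p = fisher_greater(n_case, n_cases - n_case, n_ctrl, n_controls - n_ctrl)
        rows.append((chrom, pos, n_case, n_ctrl, p))
    return rows


def merge_regions(rows):
    """Merge runs of adjacent probes on one chromosome with identical carrier counts."""
    regions = []
    for chrom, pos, n_case, n_ctrl, p in rows:
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and (last["cases"], last["controls"]) == (n_case, n_ctrl):
            last["end"] = pos
            last["n_probes"] += 1
        else:
            regions.append({"chrom": chrom, "start": pos, "end": pos,
                            "cases": n_case, "controls": n_ctrl, "p": p, "n_probes": 1})
    return regions
```

Keeping `n_probes` on each region mirrors the abstract's point that the number of probes supporting a CNV is useful when judging whether an association is trustworthy.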