Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools

Department of Epidemiology, University of Texas, MD Anderson Cancer Center, Houston, TX, USA.
Bioinformatics (Impact Factor: 4.62). 12/2011; 28(3):421-2. DOI: 10.1093/bioinformatics/btr667
Source: PubMed

ABSTRACT Storing, annotating and analyzing variants from next-generation sequencing projects can be difficult due to the availability of a wide array of data formats, tools and annotation sources, as well as the sheer size of the data files. Useful tools, including the GATK, ANNOVAR and BEDTools, can be integrated into custom pipelines for annotating and analyzing sequence variants. However, building flexible pipelines that support the tracking of variants alongside their samples, while enabling updated annotation and reanalyses, is not a simple task.
We have developed variant tools, a flexible annotation and analysis toolset that greatly simplifies the storage, annotation and filtering of variants and the analysis of the underlying samples. variant tools can be used to manage and analyze genetic variants obtained from sequence alignments, and the command-line driven toolset could be used as a foundation for building more sophisticated analytical methods.
variant tools consists of two command-line driven programs, vtools and vtools_report. It is freely available and distributed under a GPL license.

  • ABSTRACT: Two recent reports have identified the Endothelial Protein C Receptor (EPCR) as a key molecule implicated in severe malaria pathology. First, it was shown that EPCR in the human microvasculature mediates sequestration of Plasmodium falciparum-infected erythrocytes. Second, microvascular thrombosis, one of the major processes causing cerebral malaria, was linked to a reduction in EPCR expression in cerebral endothelial layers. It was speculated that genetic variation affecting EPCR functionality could influence susceptibility to severe malaria phenotypes, rendering PROCR, the gene encoding EPCR, a promising candidate for an association study. Here, we performed an association study including high-resolution variant discovery of rare and frequent genetic variants in the PROCR gene. The study group, which previously has proven to be a valuable tool for studying the genetics of malaria, comprised 1,905 severe malaria cases aged 1-156 months and 1,866 apparently healthy children aged 2-161 months from the Ashanti Region in Ghana, West Africa, where malaria is highly endemic. Association of genetic variation with severe malaria phenotypes was examined on the basis of single variants, reconstructed haplotypes, and rare variant analyses. A total of 41 genetic variants were detected in regulatory and coding regions of PROCR, 17 of which were previously unknown genetic variants. In association tests, none of the single variants, haplotypes or rare variants showed evidence for an association with severe malaria, cerebral malaria, or severe malaria anemia. Here we present the first analysis of genetic variation in the PROCR gene in the context of severe malaria in African subjects and show that genetic variation in the PROCR gene in our study population does not influence susceptibility to major severe malaria phenotypes.
    PLoS ONE 12/2014; 9(12):e115770. DOI:10.1371/journal.pone.0115770 · 3.53 Impact Factor
  • ABSTRACT: Parkinson's disease (PD) can be divided into familial (Mendelian) and sporadic forms. A number of causal genes have been discovered for the Mendelian form, which constitutes 10-20% of the total cases. Genome-wide association studies (GWASs) have successfully uncovered a number of susceptibility loci for sporadic cases, but those only explain a small fraction (6-7%) of PD heritability. It has been observed that some genes that confer susceptibility to PD through common risk variants also contain rare causal mutations for the Mendelian forms of the disease. These results suggest a possible functional link between Mendelian and sporadic PD and led us to investigate the role that rare and low-frequency variants could have in the sporadic form. Through a targeting approach, we have resequenced at 49X coverage the exons and regulatory regions of 38 genes (including Mendelian and susceptibility PD genes) in 249 sporadic PD patients and 145 unrelated controls of European origin. Unlike susceptibility genes, Mendelian genes show a clear general enrichment of rare functional variants in PD cases, observed directly as well as with Tajima's D statistic and several collapsing methods. Our findings suggest that rare variation in PD Mendelian genes may have a role in the sporadic forms of the disease.
    Human Molecular Genetics 12/2014; 24(7). DOI:10.1093/hmg/ddu616 · 6.68 Impact Factor
  • ABSTRACT: Next-generation sequencing technologies (NGS) have revolutionized the field of genetics and are trending toward clinical diagnostics. Exome and targeted sequencing in a disease context represent a major NGS clinical application, considering its utility and cost-effectiveness. With the ongoing discovery of disease-associated genes, various gene panels have been launched for both basic research and diagnostic tests. However, the fundamental inconsistencies among the diverse annotation sources, software packages, and data formats have complicated the subsequent analysis. To manage disease-associated NGS data, we developed Vanno, a web-based application for in-depth analysis and rapid evaluation of disease causative genome sequence alterations. Vanno integrates information from biomedical databases, functional predictions from available evaluation models, and mutation landscapes from TCGA cancer types. A highly integrated framework that incorporates filtering, sorting, clustering and visual analytic modules is provided to facilitate exploration of oncogenomics datasets at different levels, such as gene, variant, protein domain or 3D structure. Such a design is crucial for extracting knowledge from sequence alterations and translating biological insights into clinical applications. Taken together, Vanno supports almost all disease-associated gene tests and exome sequencing panels designed for NGS, providing a complete solution for targeted and exome sequencing analysis. Vanno is freely available.
    Human Mutation 02/2015; 36(2). DOI:10.1002/humu.22684 · 5.05 Impact Factor

