Paralogous annotation of disease-causing variants in long QT syndrome genes

Medical Research Council Clinical Sciences Centre, Imperial College London, London, United Kingdom.
Human Mutation (Impact Factor: 5.05). 08/2012; 33(8):1188-91. DOI: 10.1002/humu.22114
Source: PubMed

ABSTRACT Discriminating between rare benign and pathogenic variation is a key challenge in clinical genetics, particularly as increasing numbers of nonsynonymous single-nucleotide polymorphisms (SNPs) are identified in resequencing studies. Here, we describe an approach for the functional annotation of nonsynonymous variants that identifies functionally important, disease-causing residues across protein families using multiple sequence alignment. We applied the methodology to long QT syndrome (LQT) genes, which cause sudden death, and their paralogues, which largely cause neurological disease. This approach accurately classified known LQT disease-causing variants (positive predictive value = 98.4%) with a better performance than established bioinformatic methods. The analysis also identified 1078 new putative disease loci, which we incorporated along with known variants into a comprehensive and freely accessible long QT resource, based on newly created Locus Reference Genomic sequences. We propose that paralogous annotation is widely applicable for Mendelian human disease genes.
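The core idea of the abstract, transferring disease annotations between equivalent residues of aligned paralogues, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the gene names, the toy alignment, the residue numbers, and the function names below are all hypothetical, and the published method may apply further filters that the abstract does not describe. The sketch also shows how a positive predictive value is computed from true-positive and false-positive counts.

```python
def column_to_residue(aligned_seq):
    """Map each alignment column to a 1-based residue number (None at gaps)."""
    mapping = []
    position = 0
    for char in aligned_seq:
        if char == "-":
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping


def transfer_annotations(alignment, known_variants, target):
    """Project known disease residues of paralogues onto the target protein.

    alignment      : {gene: gapped sequence}, all of equal length
    known_variants : {gene: [1-based residue numbers with known disease variants]}
    Returns {target_residue: [(paralogue, paralogue_residue), ...]}.
    """
    column_maps = {gene: column_to_residue(seq) for gene, seq in alignment.items()}
    target_map = column_maps[target]
    hits = {}
    for gene, residues in known_variants.items():
        if gene == target:
            continue
        # Invert the column map: residue number -> alignment column index.
        residue_to_column = {
            res: col for col, res in enumerate(column_maps[gene]) if res is not None
        }
        for res in residues:
            col = residue_to_column.get(res)
            if col is None:
                continue  # residue number not present in this sequence
            target_res = target_map[col]
            if target_res is not None:  # skip columns where the target has a gap
                hits.setdefault(target_res, []).append((gene, res))
    return hits


def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Hypothetical toy data (not real LQT sequences or variants).
alignment = {
    "GENE_A": "MK-LVRT",
    "GENE_B": "MKALV-T",
}
known_variants = {"GENE_B": [3, 4, 6]}

hits = transfer_annotations(alignment, known_variants, target="GENE_A")
print(hits)
```

In this toy case, residue 3 of GENE_B falls in a column where GENE_A has a gap, so only residues 4 and 6 are projected onto GENE_A.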

  • ABSTRACT: Locus Reference Genomic (LRG) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable 'fixed' section and a regularly updated 'updatable' section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1198 · 8.81 Impact Factor
  • ABSTRACT: Sudden cardiac death (SCD) resulting from ventricular tachyarrhythmia is a major contributor to mortality. Clinical management of SCD, currently based on clinical markers of SCD risk, can be improved by integrating genetic information. The identification of multiple disease-causing gene variants has already improved patient management and increased our understanding of the rare Mendelian diseases associated with SCD risk in the young, but marked variability in disease severity suggests that additional genetic modifiers exist. Next-generation DNA sequencing could be crucial to the discovery of SCD-associated genes, but large data sets can be difficult to interpret. SCD usually occurs in patients with an average age of 65 years who have complex cardiac disease stemming from multiple, common, acquired disorders. Heritable factors are largely unknown, but are likely to have a role in determining the risk of SCD in these patients. Numerous genetic loci have been identified that affect electrocardiogram indices, which are regarded as intermediate phenotypes for tachyarrhythmia. These loci could help to identify new molecules and pathways affecting cardiac electrical function. These loci are often located in intergenic regions, so our evolving understanding of the noncoding regulatory regions of the genome is likely to aid in the identification of novel genes that are important for cardiac electrical function and possibly SCD.
    Nature Reviews Cardiology 12/2013; DOI:10.1038/nrcardio.2013.186 · 10.40 Impact Factor
  • ABSTRACT: NECTAR (Non-synonymous Enriched Coding muTation ARchive) is a database and web application to annotate disease-related and functionally important amino acids in human proteins. A number of tools are available to facilitate the interpretation of DNA variants identified in diagnostic or research sequencing. These typically identify previous reports of DNA variation at a given genomic location, predict its effects on transcript and protein sequence and may predict downstream functional consequences. Previous reports and functional annotations are typically linked by the genomic location of the variant observed. NECTAR collates disease-causing variants and functionally important amino acid residues from a number of sources. Importantly, rather than simply linking annotations by a shared genomic location, NECTAR annotates variants of interest with details of previously reported variation affecting the same codon. This provides a much richer data set for the interpretation of a novel DNA variant. NECTAR also identifies functionally equivalent amino acid residues in evolutionarily related proteins (paralogues) and, where appropriate, transfers annotations between them. As well as accessing these data through a web interface, users can upload batches of variants in variant call format (VCF) for annotation on-the-fly. The database is freely available to download from the FTP site.
    Nucleic Acids Research 12/2013; 42(D1). DOI:10.1093/nar/gkt1245 · 8.81 Impact Factor
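
The codon-level linking described in the NECTAR abstract above (annotating a variant with previously reported variation in the same codon rather than only at the same position) can be sketched in Python as follows. This is an illustration under stated assumptions, not NECTAR's actual schema or code: the gene name, the coding-sequence positions, the labels, and the function names are hypothetical.

```python
from collections import defaultdict


def codon_number(cds_position):
    """Convert a 1-based coding-sequence position to a 1-based codon number."""
    return (cds_position - 1) // 3 + 1


def group_by_codon(reported_variants):
    """Index reported variants by (gene, codon).

    reported_variants: iterable of (gene, cds_position, label) tuples.
    """
    by_codon = defaultdict(list)
    for gene, cds_position, label in reported_variants:
        by_codon[(gene, codon_number(cds_position))].append((cds_position, label))
    return by_codon


def annotate(query, by_codon):
    """Return reported variants that affect the same codon as the query variant."""
    gene, cds_position = query
    return by_codon.get((gene, codon_number(cds_position)), [])


# Hypothetical reports (not real variants).
reported = [
    ("GENE_X", 10, "report 1"),
    ("GENE_X", 12, "report 2"),
    ("GENE_X", 13, "report 3"),
]
index = group_by_codon(reported)

# Nothing has been reported at position 11 itself, but positions 10 and 12
# lie in the same codon (codon 4), so both reports are returned.
matches = annotate(("GENE_X", 11), index)
print(matches)
```

A position-only lookup would return nothing for the query at position 11; keying on the codon is what surfaces the two neighbouring reports.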

