Paralogous annotation of disease-causing variants in long QT syndrome genes.
ABSTRACT Discriminating between rare benign and pathogenic variation is a key challenge in clinical genetics, particularly as increasing numbers of nonsynonymous single-nucleotide polymorphisms (SNPs) are identified in resequencing studies. Here, we describe an approach for the functional annotation of nonsynonymous variants that identifies functionally important, disease-causing residues across protein families using multiple sequence alignment. We applied the methodology to long QT syndrome (LQT) genes, which cause sudden death, and their paralogues, which largely cause neurological disease. This approach accurately classified known LQT disease-causing variants (positive predictive value = 98.4%) with a better performance than established bioinformatic methods. The analysis also identified 1078 new putative disease loci, which we incorporated along with known variants into a comprehensive and freely accessible long QT resource (http://cardiodb.org/Paralogue_Annotation/), based on newly created Locus Reference Genomic sequences (http://www.lrg-sequence.org/). We propose that paralogous annotation is widely applicable for Mendelian human disease genes.
Paralogous Annotation of Disease-Causing
Variants in Long QT Syndrome Genes
James S. Ware,1∗† Roddy Walsh,2† Fiona Cunningham,3 Ewan Birney,3 and Stuart A. Cook1,2
1Medical Research Council Clinical Sciences Centre, Imperial College London, London, United Kingdom; 2Cardiovascular Biomedical Research Unit, Royal Brompton and Harefield NHS Trust, London, United Kingdom; 3European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Communicated by Raymond Dalgleish
Received 14 November 2011; accepted revised manuscript 19 April 2012.
Published online 11 May 2012 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22114
Hum Mutat 33:1188–1191, 2012. © 2012 Wiley Periodicals, Inc.
KEY WORDS: variant annotation; paralogue; nonsynonymous; long QT syndrome; inherited heart disease
Inherited long QT syndrome (LQT; MIM# 192500) is a life-threatening Mendelian disease caused by genetic variants in ion
Additional Supporting Information may be found in the online version of this article.
†First two authors have contributed equally to this work.
∗Correspondence to: James S. Ware, Medical Research Council Clinical Sciences Centre, Imperial College London, London W12 0NN, United Kingdom. E-mail:
Contract grant sponsors: Wellcome Trust (087183/Z/08/Z [to JSW]); British Heart Foundation (SP/10/10/28431 [to SAC and EB]); Medical Research Council (UK); Royal Brompton and Harefield Cardiovascular Biomedical Research Unit, National Institute for Health Research, UK.
While clinical guidelines [Ackerman et al., 2011] indicate that genetic testing of patients with LQT should be performed, it is often difficult to reach a conclusive genetic diagnosis when a new or rare sequence change is detected [Cooper and Shendure, 2011]. This
relates to uncertainty concerning the significance of novel nonsyn-
annotation of genomic coordinates of known disease variants, and
databases [Cooper and Shendure, 2011; Dalgleish et al., 2010].
In one study, it was noted that variation at an equivalent residue in KCNQ1 (expressed in the heart; MIM# 607542) and KCNQ4 (expressed in the inner ear; MIM# 603537) causes LQT and autosomal dominant deafness, respectively [Kubisch et al., 1999] (MIM#s 192500, 600101). To date, pathogenic variation in gene paralogues represents an unexploited resource for the functional annotation of nsSNPs in disease genes. We hypothesized that systematic analysis of disease-causing amino acid substitutions in LQT paralogues could be used to annotate new pathogenic residues in LQT genes for clinical research and molecular diagnostics.
paralogues predict pathogenic variation in LQT genes. Potential
of the IUPHAR Ion Channel Database [Sharman et al., 2011] and
BLAST [Altschul et al., 1990], and disease-causing nsSNPs in these
genes were retrieved from HGMD professional 2011.1 [Stenson
et al., 2003] and locus-specific databases (LSDBs) where available
http://grenada.lumc.nl/LOVD2/FHM/). We analyzed 12 of the
13 genes known to cause LQT [Hedley et al., 2009; Yang et al.,
2010]. Eight have one or more paralogues (n = 47) that contain
variants causing autosomal dominant disease (Supp. Table S1).
No disease-causing nsSNPs were found in paralogues of four LQT
genes (LQT9-12: CAV3, SCN4B, AKAP9, SNTA1; MIM#s 601253,
608256, 604001, 601017). LQT13 (KCNJ5; MIM# 600734) was not
included as only one disease-causing variant has been reported in a
single study. Typically, LQT paralogues represent ion channels and
cause diseases such as familial epilepsy, ataxia, and deafness that are
attributable to perturbed neuronal ion channel function.
genes and their paralogues using the M-Coffee algorithm [Wallace
of reliability of the alignment in any given region, so that mappings
alogues without disease-causing nsSNPs were included to achieve
the best possible alignments. Using these multiple sequence align-
protein. In total, known disease-causing amino acid substitutions
in LQT paralogues mapped to 1277 residues across the eight LQT
proteins (Table 1).
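The mapping of a paralogue residue onto the equivalent residue of an LQT protein can be illustrated with a minimal sketch. It assumes two rows taken from a multiple sequence alignment (gaps written as "-") and a list of 1-based paralogue residue numbers carrying disease-causing substitutions. The function names and the toy sequences are hypothetical, and the sketch ignores the alignment-reliability scoring mentioned above; it is not the authors' pipeline.

```python
GAP = "-"

def column_to_residue(aligned_seq):
    """Return, for each alignment column, the 1-based residue number (None at gaps)."""
    mapping = []
    position = 0
    for char in aligned_seq:
        if char == GAP:
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping

def map_paralogue_variants(aligned_paralogue, aligned_target, paralogue_positions):
    """Map paralogue residue numbers to target residue numbers via shared columns.

    Positions that align to a gap in the target are dropped.
    """
    if len(aligned_paralogue) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    paralogue_cols = column_to_residue(aligned_paralogue)
    target_cols = column_to_residue(aligned_target)
    paralogue_to_target = {
        p: t
        for p, t in zip(paralogue_cols, target_cols)
        if p is not None and t is not None
    }
    return {
        pos: paralogue_to_target[pos]
        for pos in paralogue_positions
        if pos in paralogue_to_target
    }

# Toy alignment rows (hypothetical sequences, not real channel proteins).
aligned_target = "MK-LVRG"
aligned_paralogue = "MKALV-G"

# Paralogue residues 3, 4 and 6 carry hypothetical disease-causing variants.
mapped = map_paralogue_variants(aligned_paralogue, aligned_target, [3, 4, 6])
print(mapped)
```

Paralogue residue 3 falls opposite a gap in the target and is therefore not mapped, whereas residues 4 and 6 map to target residues 3 and 6.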
We next annotated all known disease-causing amino acid
substitutions in these eight LQT proteins using HGMD pro-
fessional 2011.1 [Stenson et al., 2003], dbSNP build 132
[Sherry et al., 2001], LSDBs (Gene Connection for the Heart,
http://www.fsm.it/cardmoc/, Human Variome project in China,
http://www.genomed.org/LOVD/), and additional variants re-
trieved from the published literature [Kapa et al., 2009]. All online
for each LQT gene was used for this analysis, using new Locus Ref-
these isoforms are shown in Supp. Table S1.
All variants reported as definite causes of long QT, short QT (SQT;
MIM# 609620), Brugada syndrome (BrS; MIM# 601144), or vari-
ants thereof were grouped together under the label “LQT.” LQT,
SQT and BrS are caused by variants in the same set of genes, but
grouped together so that any deleterious altered function (gain or
of direction of effect. Variants causing other inherited diseases were
categorized as Other Disease Phenotype (ODP). Disease associa-
tions (e.g., from genome-wide association studies) were excluded.
ODP variants included those reported as “possible” causes of LQT,
BrS or SQT, or as “definite” causes of an intermediate phenotype
such as cardiac arrhythmia. Many or most of these variants may
in fact cause LQT, but have been annotated distinctly to allow for
comparison of only the most robustly phenotyped variants.
Published variants described as benign in LSDBs or literature case series, or reported in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) with no disease phenotype and a population frequency of >1% in any population (which we consider incompatible with the population frequency of LQT), were categorized as Benign polymorphisms. All other missense mutations in dbSNP with no reported cardiac phenotype were classified as Probably Benign (PB); because many variants have been reported in dbSNP without associated phenotype data, and LQT is not always highly penetrant, there may be some false negatives in this dataset. Instances where the same variant or residue was classified as disease causing and benign in different reports were labeled as conflicts.
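The classification scheme described above can be written down as a small set of rules. The following is a sketch only: the field names, the label strings, and the precedence used when several reports disagree are assumptions made for illustration. In particular, it assumes that only an explicit Benign label (not PB) conflicts with a disease label.

```python
ARRHYTHMIA_SYNDROMES = {"LQT", "SQT", "BrS"}

def classify_report(phenotype=None, certainty=None,
                    described_benign=False, max_population_frequency=None):
    """Classify a single published report of a variant.

    phenotype: reported disease (None if no phenotype was reported)
    certainty: "definite" or "possible" (hypothetical encoding)
    described_benign: True if an LSDB or case series calls the variant benign
    max_population_frequency: highest frequency seen in any population, as a fraction
    """
    if described_benign:
        return "Benign"
    if phenotype is None:
        # No phenotype: common variants (>1%) are Benign, the rest Probably Benign.
        if max_population_frequency is not None and max_population_frequency > 0.01:
            return "Benign"
        return "PB"
    if phenotype in ARRHYTHMIA_SYNDROMES and certainty == "definite":
        return "LQT"
    # Other diseases, "possible" LQT/SQT/BrS, and intermediate phenotypes.
    return "ODP"

def classify_residue(report_labels):
    """Combine per-report labels for one residue (precedence is an assumption)."""
    labels = set(report_labels)
    if labels & {"LQT", "ODP"} and "Benign" in labels:
        return "Conflict"
    for label in ("LQT", "ODP", "Benign", "PB"):
        if label in labels:
            return label
    return "UN"  # no reported variation at this residue

print(classify_report(phenotype="LQT", certainty="definite"))
print(classify_report(phenotype="LQT", certainty="possible"))
print(classify_report(max_population_frequency=0.05))
print(classify_residue(["LQT", "Benign"]))
```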
Of the 1277 mappable pathogenic variants from LQT gene
paralogues, 185 mapped to LQT residues where variation has
previously been unambiguously defined as either pathogenic
(n = 182, 98.4%) or benign (n = 3, 1.6%). This demonstrates that
LQT residues at equivalent sites to known pathogenic variation in
LQT paralogues are significantly enriched for LQT-causing variants
(Figure 1; see also Supp. Figure S1). Comparing the most robust
a positive predictive value (PPV) of LQT pathogenicity of 98.4%.
When including less reliable annotations, that is, including ODP as additional true positives and PB as false positives, the PPV of the method remains high (96.4%), and the enrichment significant (P = 2.0 × 10^-7). By comparison, SIFT [Kumar et al., 2009] (version 4.0.3b) and PolyPhen-2 [Adzhubei et al., 2010] (version 2.1.0, HumVar classifier) have PPVs of 93.6–95.5% and 90.6–95.9%, respectively, using the same dataset (Supp. Table S3).
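The headline PPV follows directly from the counts given above (182 mapped residues previously defined as pathogenic, 3 as benign). A short check, with a hypothetical helper name:

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

# Counts reported in the text: 182 pathogenic and 3 benign among 185 mapped residues.
ppv = positive_predictive_value(182, 3)
print(f"{ppv:.1%}")  # 98.4%
```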
Mendelian pathogenic variants are more frequently located in
culate domain-specific estimates of the probability of pathogenicity
of variants in three LQT genes [Kapa et al., 2009]. We annotated
Table 1. Functional Classification of Residues in Long QT Proteins
Residues with published variants in LQT proteins
Residues with disease variants mapped from paralogues
Left panel: for each protein, residues are categorized according to the phenotype associated with variants at that residue: variation at the residue causes definite long QT, short QT, or Brugada syndrome (LQT); causes other disease phenotype (ODP); is benign (Benign); or is probably benign (PB). A number of residues have conflicting reports of pathogenicity in the literature (Conflict), and many residues have no reported variation and are unannotated (UN). Right panel: residues identified by
mapping of disease-causing variants from paralogues are significantly enriched for known disease-causing variants (P = 4.8 × 10, Fisher’s exact test), and annotate 1078 novel putative disease-causing loci.
HUMAN MUTATION, Vol. 33, No. 8, 1188–1191, 2012
Schematic representation of SCN5A protein showing known disease-causing residues and those predicted to be disease causing by paralogous annotation. Previously annotated disease-causing residues are shown in black. Predicted disease-causing residues are shown in red.
of protein domains. All known variants and novel paralogue mappings for this and for the other seven LQT genes are shown in Supp. Figure S1.
each protein with its protein structure, using the domain annota-
and SCN5A), or annotations from Swiss-Prot (for KCNE1, KCNE2,
KCNJ2, and CACNA1C). ANK2 is not presented due to a paucity of
domain features and small number of mapped variants. Expected
and observed numbers of variants in each protein region were tab-
ulated, and chi-square tests used to identify genes with nonuniform
distributions of variants (Supp. Table S4). We observed that vari-
ants in paralogous genes mapped to LQT residues with patterns
of domain enrichment similar to those previously reported, and
we suggest that the location of mapped paralogue variants may be
used to inform domain-specific estimates of pathogenicity for less
well-characterized LQT genes (e.g., CACNA1C and KCNJ2).
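The comparison of expected and observed variant counts per protein region can be sketched as a chi-square goodness-of-fit calculation. The sketch assumes that, under a uniform distribution, the expected count in each region is proportional to the region's length; the text does not state how expected counts were derived, so this is an assumption. The region names, lengths, and counts below are invented for illustration.

```python
def chi_square_uniformity(region_lengths, observed_counts):
    """Chi-square statistic for variants spread uniformly along a protein.

    Expected counts are taken as proportional to region length (an assumption).
    Returns the statistic and the per-region expected counts.
    """
    total_length = sum(region_lengths.values())
    total_observed = sum(observed_counts.get(r, 0) for r in region_lengths)
    statistic = 0.0
    expected = {}
    for region, length in region_lengths.items():
        exp = total_observed * length / total_length
        expected[region] = exp
        statistic += (observed_counts.get(region, 0) - exp) ** 2 / exp
    return statistic, expected

# Hypothetical protein with three regions (lengths in residues).
region_lengths = {"N-terminus": 100, "transmembrane": 200, "C-terminus": 100}
observed_counts = {"N-terminus": 5, "transmembrane": 30, "C-terminus": 5}

statistic, expected = chi_square_uniformity(region_lengths, observed_counts)

# With 3 regions there are 2 degrees of freedom; 5.991 is the 5% critical value.
CRITICAL_VALUE_DF2_P05 = 5.991
print(statistic, statistic > CRITICAL_VALUE_DF2_P05)
```

In this invented example the statistic exceeds the critical value, so the uniform distribution would be rejected; a statistics library would normally be used to obtain an exact P value.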
Paralogous annotation of pathogenic Mendelian variation is widely applicable. By inference, the technique can be applied in a reciprocal fashion across the gene families that we have studied, using annotated variants in LQT genes to interpret variation in inherited epilepsy genes (Supp. Table S5). We examined all disease genes in HGMD professional 2011.1 [Stenson et al., 2003], and identified 1824 genes (45.5% of the dataset) with one or more paralogues that contain disease-causing variants (average 3.2 paralogues per gene). This preliminary analysis suggests that there are over 150,000 potentially informative annotations from disease-causing variants in paralogues of human disease genes.
The accuracy of the method we describe here depends on re-
liable phenotype data associated with genetic variants that need
to be in an accessible format and readily available (e.g., via the
European Genome-Phenome Archive, http://www.ebi.ac.uk/ega/).
The correct and standardized annotation of genetic variants in
these datasets is fundamental. In the course of this study, we
identified numerous errors arising from the use of alternate ref-
erence sequences (Supp. Table S6). The Human Genome Varia-
tion Society recommends the use of a Locus Reference Genomic
(LRG) sequence [Dalgleish et al., 2010] to overcome these de-
ficiencies (http://www.lrg-sequence.org/). Hence, to enable accu-
rate annotation of LQT variants for clinical application we es-
tablished new LRG coordinates for all 13 LQT genes. For the
eight LQT genes studied here, we have collated published vari-
ants, submitted annotations to dbSNP and created a compre-
hensive and freely available resource for LQT research and clini-
cal diagnostics (http://cardiodb.org/Paralogue_Annotation/; Supp.
In summary, we applied systematic multiple sequence alignment
This identified novel putative disease-causing variants in ∼10% of previously unannotated LQT residues, which adds significantly to the
functional annotation of LQT genes. We generated a comprehen-
along with existing annotations, for the wider scientific and clinical
community that will become increasingly powerful with cumulative annotations from resequencing studies [Cooper and Shendure, 2011]. The technique we describe here is widely transferable to human disease genes and, we believe, important for clinical interpretation of novel missense mutations.
Published variants in LQT genes collated in this paper have been submitted
to dbSNP with batch ID 1056584, submitter handle RBH_CV_BRU.
Ackerman MJ, Priori SG, Willems S, Berul C, Brugada R, Calkins H, Camm AJ, Ellinor
PT, Gollob M, Hamilton R, Hershberger RE, Judge DP, et al. 2011. HRS/EHRA
expert consensus statement on the state of genetic testing for the channelopathies
and cardiomyopathies: this document was developed as a partnership between
the Heart Rhythm Society (HRS) and the European Heart Rhythm Association
(EHRA). Europace 13:1077–1109.
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nat Methods 7:248–249.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment
search tool. J Mol Biol 215:403–410.
Campuzano O, Beltran-Alvarez P, Iglesias A, Scornik F, Perez G, Brugada R. 2010.
Genetics and cardiac channelopathies. Genet Med 12:260–267.
Cooper GM, Shendure J. 2011. Needles in stacks of needles: finding disease-causal
variants in a wealth of genomic data. Nat Rev Genet 12:628–640.
WM, Larsson P, Vaughan BW, Beroud C, Dobson G, et al. 2010. Locus Refer-
ence Genomic sequences: an improved basis for describing human DNA variants.
Genome Med 2:24.
Hedley PL, Jorgensen P, Schlamowitz S, Wangari R, Moolman-Smook J, Brink PA,
Kanters JK, Corfield VA, Christiansen M. 2009. The genetic basis of long QT and
short QT syndromes: a mutation update. Hum Mutat 30:1486–1511.
mutations from benign variants. Circulation 120:1752–1760.
Kubisch C, Schroeder BC, Friedrich T, Lutjohann B, El-Amraoui A, Marlin S, Petit C,
Jentsch TJ. 1999. KCNQ4, a novel potassium channel expressed in sensory outer
hair cells, is mutated in dominant deafness. Cell 96:437–446.
Kumar P, Henikoff S, Ng PC. 2009. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc 4:1073–
Sharman JL, Mpamhanga CP, Spedding M, Germain P, Staels B, Dacquet C, Laudet
V, Harmar AJ. 2011. IUPHAR-DB: new receptors and tools for easy search-
ing and visualization of pharmacological data. Nucleic Acids Res 39:D534–
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001.
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311.
Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S,
Krawczak M, Cooper DN. 2003. Human Gene Mutation Database (HGMD):
2003 update. Hum Mutat 21:577–581.
Tester DJ, Ackerman MJ. 2011. Genetic testing for potentially lethal, highly treat-
able inherited cardiomyopathies/channelopathies in clinical practice. Circulation
Yang Y, Liang B, Liu J, Li J, Grunnet M, Olesen SP, Rasmussen HB, Ellinor PT, Gao L,
Lin X, Li L, Wang L, et al. 2010. Identification of a Kir3.4 mutation in congenital
long QT syndrome. Am J Hum Genet 86:872–880.