Paralogous Annotation of Disease-Causing
Variants in Long QT Syndrome Genes
James S. Ware,1∗† Roddy Walsh,2† Fiona Cunningham,3 Ewan Birney,3 and Stuart A. Cook1,2
1Medical Research Council Clinical Sciences Centre, Imperial College London, London, United Kingdom; 2Cardiovascular Biomedical Research
Unit, Royal Brompton and Harefield NHS Trust, London, United Kingdom; 3European Bioinformatics Institute, Wellcome Trust Genome Campus,
Hinxton, Cambridge, United Kingdom
Communicated by Raymond Dalgleish
Received 14 November 2011; accepted revised manuscript 19 April 2012.
Published online 11 May 2012 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22114
Discriminating between rare benign and pathogenic variation is a key challenge in clinical genetics,
particularly as increasing numbers of nonsynonymous
single-nucleotide polymorphisms (SNPs) are identified in
resequencing studies. Here, we describe an approach for
the functional annotation of nonsynonymous variants that
identifies functionally important, disease-causing residues
across protein families using multiple sequence alignment.
We applied the methodology to long QT syndrome (LQT)
genes, which cause sudden death, and their paralogues,
which largely cause neurological disease. This approach
accurately classified known LQT disease-causing variants
(positive predictive value = 98.4%) with a better per-
formance than established bioinformatic methods. The
analysis also identified 1078 new putative disease loci,
which we incorporated along with known variants into
a comprehensive and freely accessible long QT resource
newly created Locus Reference Genomic sequences
(http://www.lrg-sequence.org/). We propose that paralo-
gous annotation is widely applicable for Mendelian human
Hum Mutat 33:1188–1191, 2012. © 2012 Wiley Periodicals, Inc.
KEY WORDS: variant annotation; paralogue; nonsynony-
mous; long QT syndrome; inherited heart disease
Discriminating between rare benign and
Inherited long QT syndrome (LQT; MIM# 607542) is a life-
threatening Mendelian disease caused by genetic variants in ion
While clinical guidelines [Ackerman et al., 2011] indicate that ge-
netic testing of patients with LQT should be performed, it is often
Additional Supporting Information may be found in the online version of this article.
†First two authors have contributed equally to this work.
∗Correspondence to: James S. Ware, Medical Research Council Clinical Sci-
ences Centre, Imperial College London, London W12 0NN, United Kingdom. E-mail:
Contract grant sponsors: Wellcome Trust (087183/Z/08/Z [to JSW]); British Heart
Foundation (SP/10/10/28431 [to SAC and EB]); Medical Research Council (UK); Royal
Brompton and Harefield Cardiovascular Biomedical Research Unit, National Institute
for Health Research, UK.
difficult to reach a conclusive genetic diagnosis when a new or rare
sequence change is detected [Cooper and Shendure, 2011]. This
relates to uncertainty concerning the significance of novel nonsyn-
annotation of genomic coordinates of known disease variants, and
databases [Cooper and Shendure, 2011; Dalgleish et al., 2010].
In one study, it was noted that variation at an equivalent residue
in KCNQ1 (expressed in the heart; MIM# 607542) and KCNQ4
(expressed in the inner ear; MIM# 603537) causes LQT and autoso-
mal dominant deafness respectively [Kubisch et al., 1999] (MIM#s
192500, 600101). To date, pathogenic variation in gene paralogues
represents an unexploited resource for the functional annotation
of nsSNPs in disease genes. We hypothesized that systematic anal-
ysis of disease-causing amino acid substitutions in LQT paralogues
could be used to annotate new pathogenic residues in LQT genes
for clinical research and molecular diagnostics.
paralogues predict pathogenic variation in LQT genes. Potential
of the IUPHAR Ion Channel Database [Sharman et al., 2011] and
BLAST [Altschul et al., 1990], and disease-causing nsSNPs in these
genes were retrieved from HGMD professional 2011.1 [Stenson
et al., 2003] and locus-specific databases (LSDBs) where available
http://grenada.lumc.nl/LOVD2/FHM/). We analyzed 12 of the
13 genes known to cause LQT [Hedley et al., 2009; Yang et al.,
2010]. Eight have one or more paralogues (n = 47) that contain
variants causing autosomal dominant disease (Supp. Table S1).
No disease-causing nsSNPs were found in paralogues of four LQT
genes (LQT9-12: CAV3, SCN4B, AKAP9, SNTA1; MIM#s 601253,
608256, 604001, 601017). LQT13 (KCNJ5; MIM# 600734) was not
included as only one disease-causing variant has been reported in a
single study. Typically, LQT paralogues represent ion channels and
cause diseases such as familial epilepsy, ataxia, and deafness that are
attributable to perturbed neuronal ion channel function.
genes and their paralogues using the M-Coffee algorithm [Wallace
of reliability of the alignment in any given region, so that mappings
alogues without disease-causing nsSNPs were included to achieve
the best possible alignments. Using these multiple sequence align-
protein. In total, known disease-causing amino acid substitutions
in LQT paralogues mapped to 1277 residues across the eight LQT
proteins (Table 1).
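The mapping step described above can be illustrated with a minimal sketch. It assumes two rows taken from an already computed multiple sequence alignment (gaps written as "-") and transfers a residue position from a paralogue to the equivalent position in an LQT protein. The function names and the toy sequences are hypothetical, and the alignment reliability scores mentioned in the text are not modelled here.

```python
from typing import Dict, Optional

GAP = "-"


def column_index(aligned_seq: str) -> Dict[int, int]:
    """Map 1-based residue positions of a gapped sequence to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    position = 0
    for column, char in enumerate(aligned_seq):
        if char != GAP:
            position += 1
            mapping[position] = column
    return mapping


def map_residue(source_aln: str, target_aln: str, source_pos: int) -> Optional[int]:
    """Return the 1-based target residue aligned to source_pos, or None if it faces a gap."""
    if len(source_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    column = column_index(source_aln).get(source_pos)
    if column is None or target_aln[column] == GAP:
        return None
    # Count the non-gap characters in the target up to and including this column.
    return len(target_aln[: column + 1].replace(GAP, ""))


# Toy alignment rows (hypothetical sequences, not real channel proteins).
paralogue_row = "MK-TLGAV"
lqt_row = "MKATL-AV"

print(map_residue(paralogue_row, lqt_row, 3))  # paralogue residue 3 aligns with LQT residue 4
print(map_residue(paralogue_row, lqt_row, 5))  # paralogue residue 5 faces a gap, so no mapping
```

Returning None for positions that face a gap keeps unmappable variants out of the annotation rather than assigning them to a neighbouring residue.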
We next annotated all known disease-causing amino acid
substitutions in these eight LQT proteins using HGMD pro-
fessional 2011.1 [Stenson et al., 2003], dbSNP build 132
[Sherry et al., 2001], LSDBs (Gene Connection for the Heart,
http://www.fsm.it/cardmoc/, Human Variome project in China,
http://www.genomed.org/LOVD/), and additional variants re-
trieved from the published literature [Kapa et al., 2009]. All online
for each LQT gene was used for this analysis, using new Locus Ref-
these isoforms are shown in Supp. Table S1.
All variants reported as definite causes of long QT, short QT (SQT;
MIM# 609620), Brugada syndrome (BrS; MIM# 601144), or vari-
ants thereof were grouped together under the label “LQT.” LQT,
SQT and BrS are caused by variants in the same set of genes, but
grouped together so that any deleterious altered function (gain or
of direction of effect. Variants causing other inherited diseases were
categorized as Other Disease Phenotype (ODP). Disease associa-
tions (e.g., from genome-wide association studies) were excluded.
ODP variants included those reported as “possible” causes of LQT,
BrS or SQT, or as “definite” causes of an intermediate phenotype
such as cardiac arrhythmia. Many or most of these variants may
in fact cause LQT, but have been annotated distinctly to allow for
comparison of only the most robustly phenotyped variants.
Published variants described as benign in LSDBs or liter-
ature case series, or reported in dbSNP (dbSNP, http://www
.ncbi.nlm.nih.gov/projects/SNP/) with no disease phenotype and
a population frequency of >1% in any population (which we con-
sider incompatible with the population frequency of LQT), were
categorized as Benign polymorphisms. All other missense muta-
tions in dbSNP with no reported cardiac phenotype were classified
as Probably Benign (PB). Because many variants have been reported in
dbSNP without associated phenotype data, and LQT is not always
highly penetrant, this dataset may contain some false negatives.
Instances where the same variant or residue was classified as disease
causing and benign in different reports were labeled as conflicts.
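The classification rules above can be sketched as a small decision procedure. This is an illustration under stated assumptions, not the procedure used in the study: the Report fields are hypothetical, a conflict is raised only when a disease label co-occurs with a Benign label (PB is not treated as conflicting), and the precedence order among non-conflicting labels is our own choice.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ARRHYTHMIA_SYNDROMES = {"LQT", "SQT", "BrS"}


@dataclass
class Report:
    """One published or database report for a variant (hypothetical structure)."""
    phenotype: Optional[str] = None   # e.g. "LQT", "BrS", "epilepsy"; None if none reported
    definite: bool = True             # False when reported only as a "possible" cause
    described_benign: bool = False    # flagged benign in an LSDB or literature case series
    max_population_freq: float = 0.0  # highest frequency seen in any population (0 to 1)


def classify_report(report: Report) -> str:
    """Assign one of the labels LQT, ODP, Benign, or PB to a single report."""
    if report.phenotype is not None and not report.described_benign:
        if report.phenotype in ARRHYTHMIA_SYNDROMES and report.definite:
            return "LQT"
        return "ODP"  # other disease, "possible" LQT/SQT/BrS, or intermediate phenotype
    if report.described_benign or report.max_population_freq > 0.01:
        return "Benign"
    return "PB"


def classify_residue(reports: Iterable[Report]) -> str:
    """Combine all reports at one residue; UN means no reported variation."""
    labels = {classify_report(r) for r in reports}
    if not labels:
        return "UN"
    if labels & {"LQT", "ODP"} and "Benign" in labels:
        return "Conflict"
    # Assumed precedence when several non-conflicting labels are present.
    return next(label for label in ("LQT", "ODP", "Benign", "PB") if label in labels)


print(classify_residue([Report(phenotype="LQT"), Report(described_benign=True)]))
```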
Of the 1277 mappable pathogenic variants from LQT gene
paralogues, 185 mapped to LQT residues where variation has
previously been unambiguously defined as either pathogenic
(n = 182, 98.4%) or benign (n = 3, 1.6%). This demonstrates that
LQT residues at equivalent sites to known pathogenic variation in
LQT paralogues are significantly enriched for LQT-causing variants
(Figure 1; see also Supp. Figure S1). Comparing the most robust
a positive predictive value (PPV) of LQT pathogenicity of 98.4%.
When including less reliable annotations, that is, including ODP
as additional true positives and PB as false positives, the PPV of
the method remains high (96.4%), and the enrichment significant
(P = 2.0 × 10⁻⁷). By comparison, SIFT [Kumar et al., 2009] (version
4.0.3b) and PolyPhen-2 [Adzhubei et al., 2010] (version 2.1.0,
HumVar classifier) have PPVs of 93.6–95.5% and 90.6–95.9%, re-
spectively, using the same dataset (Supp. Table S3).
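The headline PPV follows directly from the counts reported above (182 mapped residues previously defined as pathogenic, 3 as benign). The short sketch below reproduces that arithmetic; the function name is ours, and the counts behind the 96.4% figure are not given in the text, so they are not reproduced.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("at least one positive prediction is required")
    return true_positives / total


# Counts taken from the text: 182 pathogenic and 3 benign among 185 mapped residues.
ppv = positive_predictive_value(182, 3)
print(f"PPV = {ppv:.1%}")  # PPV = 98.4%
```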
Mendelian pathogenic variants are more frequently located in
culate domain-specific estimates of the probability of pathogenicity
of variants in three LQT genes [Kapa et al., 2009]. We annotated
Table 1. Functional Classification of Residues in Long QT Proteins
Residues with published variants in LQT proteins
Residues with disease variants mapped from paralogues
Left panel: for each protein, residues are categorized according to the phenotype associated with variants at that residue: variation at the residue causes definite long QT, short QT, or Brugada syndrome (LQT); causes other disease phenotype (ODP);
is benign (Benign); or is probably benign (PB). A number of residues have conflicting reports of pathogenicity in the literature (Conflict) and many residues have no reported variation and are unannotated (UN). Right panel: residues identified by
mapping of disease-causing variants from paralogues are significantly enriched for known disease-causing variants (P = 4.8 × 10, Fisher’s exact test), and annotate 1078 novel putative disease-causing loci.
HUMAN MUTATION, Vol. 33, No. 8, 1188–1191, 2012
Figure 1. Schematic representation of SCN5A protein showing known disease-causing residues and those predicted to be disease causing by
paralogous annotation. Previously annotated disease-causing residues are shown in black. Predicted disease-causing residues are shown in red.
of protein domains. All known variants and novel paralogue mappings for this and for the other seven LQT genes are shown in Supp. Figure S1.
each protein with its protein structure, using the domain annota-
and SCN5A), or annotations from Swiss-Prot (for KCNE1, KCNE2,
KCNJ2, and CACNA1C). ANK2 is not presented due to a paucity of
domain features and small number of mapped variants. Expected
and observed numbers of variants in each protein region were tab-
ulated, and chi-square tests used to identify genes with nonuniform
distributions of variants (Supp. Table S4). We observed that vari-
ants in paralogous genes mapped to LQT residues with patterns
of domain enrichment similar to those previously reported, and
we suggest that the location of mapped paralogue variants may be
used to inform domain-specific estimates of pathogenicity for less
well-characterized LQT genes (e.g., CACNA1C and KCNJ2).
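The comparison of expected and observed variant counts per protein region can be sketched as a chi-square goodness-of-fit calculation. The text does not state how expected counts were derived, so this sketch assumes they are proportional to region length under a uniform distribution. The region names and counts are hypothetical, and only the test statistic and degrees of freedom are computed.

```python
from typing import Dict, Tuple


def chi_square_uniform(observed: Dict[str, int],
                       region_lengths: Dict[str, int]) -> Tuple[float, int]:
    """Chi-square statistic for observed counts versus length-proportional expectations."""
    if observed.keys() != region_lengths.keys():
        raise ValueError("observed counts and region lengths must cover the same regions")
    total_variants = sum(observed.values())
    total_length = sum(region_lengths.values())
    statistic = 0.0
    for region, length in region_lengths.items():
        # Assumption: a uniform distribution puts variants in proportion to region length.
        expected = total_variants * length / total_length
        statistic += (observed[region] - expected) ** 2 / expected
    return statistic, len(observed) - 1


# Hypothetical protein with three regions (lengths in residues) and mapped variant counts.
lengths = {"N-terminus": 100, "transmembrane": 200, "C-terminus": 100}
counts = {"N-terminus": 5, "transmembrane": 30, "C-terminus": 5}

statistic, dof = chi_square_uniform(counts, lengths)
print(f"chi-square = {statistic:.1f}, degrees of freedom = {dof}")
```

A P value can then be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(statistic, dof)` if SciPy is available.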
Paralogous annotation of pathogenic Mendelian variation is
widely applicable. By inference, the technique can be applied in a
reciprocal fashion across the gene families that we have studied,
using annotated variants in LQT genes to interpret variation in in-
herited epilepsy genes (Supp. Table S5). We examined all disease
genes in HGMD professional 2011.1 [Stenson et al., 2003], and identified
1824 genes (45.5% of the dataset) with one or more paralogues
that contain disease-causing variants (average 3.2 paralogues per
gene). This preliminary analysis suggests there are over 150,000 po-
tentially informative annotations from disease-causing variants in
paralogues of human disease genes.
The accuracy of the method we describe here depends on reliable
phenotype data associated with genetic variants; these data need
to be in an accessible format and readily available (e.g., via the
European Genome-Phenome Archive, http://www.ebi.ac.uk/ega/).
The correct and standardized annotation of genetic variants in
these datasets is fundamental. In the course of this study, we
identified numerous errors arising from the use of alternate ref-
erence sequences (Supp. Table S6). The Human Genome Varia-
tion Society recommends the use of a Locus Reference Genomic
(LRG) sequence [Dalgleish et al., 2010] to overcome these de-
ficiencies (http://www.lrg-sequence.org/). Hence, to enable accu-
rate annotation of LQT variants for clinical application we es-
tablished new LRG coordinates for all 13 LQT genes. For the
eight LQT genes studied here, we have collated published vari-
ants, submitted annotations to dbSNP and created a compre-
hensive and freely available resource for LQT research and clini-
cal diagnostics (http://cardiodb.org/Paralogue_Annotation/; Supp.
In summary, we applied systematic multiple sequence alignment
This identified novel putative disease-causing variants in ∼10% of
previously unannotated LQT residues, which adds significantly to the
functional annotation of LQT genes. We generated a comprehen-
along with existing annotations, for the wider scientific and clinical
community that will become increasingly powerful with cumula-
tive annotations from resequencing studies [Cooper and Shendure,
2011]. The technique we describe here is widely transferable to
human disease genes and, we believe, important for the clinical
interpretation of novel missense mutations.
Published variants in LQT genes collated in this paper have been submitted
to dbSNP with batch ID 1056584, submitter handle RBH_CV_BRU.
Ackerman MJ, Priori SG, Willems S, Berul C, Brugada R, Calkins H, Camm AJ, Ellinor
PT, Gollob M, Hamilton R, Hershberger RE, Judge DP, et al. 2011. HRS/EHRA
expert consensus statement on the state of genetic testing for the channelopathies
and cardiomyopathies: this document was developed as a partnership between
the Heart Rhythm Society (HRS) and the European Heart Rhythm Association
(EHRA). Europace 13:1077–1109.
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov
AS, Sunyaev SR. 2010. A method and server for predicting damaging missense
mutations. Nat Methods 7:248–249.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment
search tool. J Mol Biol 215:403–410.
Campuzano O, Beltran-Alvarez P, Iglesias A, Scornik F, Perez G, Brugada R. 2010.
Genetics and cardiac channelopathies. Genet Med 12:260–267.
Cooper GM, Shendure J. 2011. Needles in stacks of needles: finding disease-causal
variants in a wealth of genomic data. Nat Rev Genet 12:628–640.
WM, Larsson P, Vaughan BW, Beroud C, Dobson G, et al. 2010. Locus Refer-
ence Genomic sequences: an improved basis for describing human DNA variants.
Genome Med 2:24.
Hedley PL, Jorgensen P, Schlamowitz S, Wangari R, Moolman-Smook J, Brink PA,
Kanters JK, Corfield VA, Christiansen M. 2009. The genetic basis of long QT and
short QT syndromes: a mutation update. Hum Mutat 30:1486–1511.
mutations from benign variants. Circulation 120:1752–1760.
Kubisch C, Schroeder BC, Friedrich T, Lutjohann B, El-Amraoui A, Marlin S, Petit C,
Jentsch TJ. 1999. KCNQ4, a novel potassium channel expressed in sensory outer
hair cells, is mutated in dominant deafness. Cell 96:437–446.
Kumar P, Henikoff S, Ng PC. 2009. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc 4:1073–
Sharman JL, Mpamhanga CP, Spedding M, Germain P, Staels B, Dacquet C, Laudet
V, Harmar AJ. 2011. IUPHAR-DB: new receptors and tools for easy search-
ing and visualization of pharmacological data. Nucleic Acids Res 39:D534–
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001.
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311.
Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S,
Krawczak M, Cooper DN. 2003. Human Gene Mutation Database (HGMD):
2003 update. Hum Mutat 21:577–581.
Tester DJ, Ackerman MJ. 2011. Genetic testing for potentially lethal, highly treat-
able inherited cardiomyopathies/channelopathies in clinical practice. Circulation
Yang Y, Liang B, Liu J, Li J, Grunnet M, Olesen SP, Rasmussen HB, Ellinor PT, Gao L,
Lin X, Li L, Wang L, et al. 2010. Identification of a Kir3.4 mutation in congenital
long QT syndrome. Am J Hum Genet 86:872–880.