
Credibility Analysis of Putative Disease-Causing Genes Using Bioinformatics

Olubunmi Abel1, John F. Powell2, Peter M. Andersen3,4, Ammar Al-Chalabi1*
1 King’s Health Partners Centre for Neurodegeneration Research, King’s College London, Department of Clinical Neuroscience, London, United Kingdom,
2 Department of Neuroscience, King’s College London, London, United Kingdom,
3 Institute of Pharmacology and Clinical Neuroscience, Section for Neurology, Umeå University, Umeå, Sweden,
4 Department of Neurology, University of Ulm, Ulm, Germany
Abstract
Background:
Genetic studies are challenging in many complex diseases, particularly those with limited diagnostic certainty,
low prevalence or of old age. The result is that genes may be reported as disease-causing with varying levels of evidence,
and in some cases, the data may be so limited as to be indistinguishable from chance findings. When there are large
numbers of such genes, an objective method for ranking the evidence is useful. Using the neurodegenerative and complex
disease amyotrophic lateral sclerosis (ALS) as a model, and the disease-specific database ALSoD, the objective is to develop
a method using publicly available data to generate a credibility score for putative disease-causing genes.
Methods:
Genes with at least one publication suggesting involvement in adult onset familial ALS were collated following an
exhaustive literature search. SQL was used to generate a score by extracting information from the publications and
combined with a pathogenicity analysis using bioinformatics tools. The resulting score allowed us to rank genes in order of
credibility. To validate the method, we compared the objective ranking with a rank generated by ALS genetics experts.
Spearman’s Rho was used to compare rankings generated by the different methods.
Results:
The automated method ranked ALS genes in the following order: SOD1, TARDBP, FUS, ANG, SPG11, NEFH, OPTN,
ALS2, SETX, FIG4, VAPB, DCTN1, TAF15, VCP, DAO. This compared very well to the ranking of ALS genetics experts, with
Spearman’s Rho of 0.69 (P = 0.009).
Conclusion:
We have presented an automated method for scoring the level of evidence for a gene being disease-causing. In
developing the method we have used the model disease ALS, but it could equally be applied to any disease in which there
is genotypic uncertainty.
Citation: Abel O, Powell JF, Andersen PM, Al-Chalabi A (2013) Credibility Analysis of Putative Disease-Causing Genes Using Bioinformatics. PLoS ONE 8(6): e64899.
doi:10.1371/journal.pone.0064899
Editor: Bart Dermaut, Pasteur Institute of Lille, France
Received January 24, 2013; Accepted April 19, 2013; Published June 5, 2013
Copyright: © 2013 Abel et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors are especially grateful for the long-standing and continued funding of this project from the ALS Association and the MND Association of
Great Britain and Northern Ireland. They also thank ALS Canada, MNDA Iceland and the ALS Therapy Alliance for support. The research leading to these results has
received funding from the European Community’s Health Seventh Framework Programme FP7/2007–2013 under grant agreement number 259867. AA-C
receives salary support from the National Institute for Health Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation
Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. Aleks
Radunovic, Nigel Leigh, and Ian Gowrie originally conceived ALSoD. ALSoD is a joint project of the World Federation of Neurology (WFN) and European Network
for the Cure ALS (ENCALS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: AA-C is a consultant for Biogen Idec and Cytokinetics. There are no patents, products in development or marketed products to declare.
This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.
* E-mail: al-chalabi@kcl.ac.uk
Introduction
Genetic studies are challenging in many complex diseases,
particularly those with limited diagnostic certainty, low incidence
and prevalence, or those of old age. Association studies suffer a
reduction in power when there is phenotypic heterogeneity
resulting from difficulty with diagnosis, and linkage studies are
limited because the older generations are not available and the
younger generations have not yet reached the age of risk. The
result is that genes are reported as causative with varying levels of
evidence and it can be difficult for those not in the field to assess
how credible any genetic evidence is.
One such condition is amyotrophic lateral sclerosis (ALS). This
is an adult onset neurodegenerative syndrome of upper and lower
motor neuron degeneration, with a mean age of onset of 56 in
diagnosed familial cases (FALS) and 60 to 70 years in apparently
sporadic cases, and an average survival of 3 to 5 years from
symptom onset [1] [2]. Illustrating the complexity and difficulty in
performing genetic research on ALS, the reported frequency of
familial ALS varies from 0.8% [3] to 17–18% [4] although all
studies agree that most cases are apparently sporadic [5]. There is,
however, a genetic basis both to familial and apparently sporadic
ALS [6,7,8]. All genes reported mutated in familial ALS have also
been found mutated in sporadic ALS. Because of the late age of
onset and poor prognosis, suitable families are difficult to collect
for linkage, and large populations are difficult to collect for
association.
PLOS ONE | www.plosone.org 1 June 2013 | Volume 8 | Issue 6 | e64899
The first gene identified for familial ALS was SOD1 [9] [10].
Through linkage and association studies of SNPs, microsatellites
and copy number variants, as well as through direct sequencing of
candidate genes and whole exome sequencing using high
throughput methods, over 100 genes have now been implicated
in the cause of ALS [11]. The level of supporting evidence for each
gene or gene variant varies from small to overwhelming, and is in
some cases contradictory. Furthermore, the increasing cooperation
between ALS researchers internationally, and the understanding
that large datasets are needed, coupled with advances in
technology, mean that the rate of detection of putative new ALS
genes is rapid and increasing. This leads to two immediate
problems: first, it is difficult to keep up with what is an ‘‘accepted’’
ALS gene, and second, there is no simple, objective way to define
the list of ALS-causing genes. As a result, researchers may find
themselves unable to agree on whether any one gene is an ALS
gene or not. The situation is further compounded by the loose
definition of ALS, which for genetic purposes has a far wider
phenotypic definition than most ALS researchers would accept in
a clinical setting [12]. For example, ALS2 includes an infantile,
slowly progressive upper motor neuron syndrome that is most
similar to hereditary spastic paraparesis, rather than an adult onset
mixed upper and lower motor neuron syndrome with a poor
prognosis for survival. Similarly, ALS with frontotemporal
dementia is regarded as a slightly different entity from ALS even
though frontotemporal dementia and ALS are in at least some
cases a continuum of disease, and in many cases ALS genes and
frontotemporal dementia genes are the same as genes for ALS with
frontotemporal dementia.
One solution to this problem is to design some method for
objectively scoring the level of evidence supporting a gene or gene
variant as disease causing. This would have the advantage that the
phenotype could be defined by the user, allowing a loose definition
or more stringent definition as required.
The ALSoD database stores data on putative ALS genes using
information derived from publications and directly input by
researchers. We have therefore explored the possibility of using
these data to generate a credibility score for ALS genes with the
aim of producing a system that can be generalized to other similar
conditions.
Methods
The PRISMA statement [13] was taken into consideration in the
development of the method and the reporting of results (Checklist S1).
Data Collection
Genes with at least one publication suggesting involvement in
adult onset familial ALS were studied [14]. We excluded genes
with limited clinical data, absent mutational data or unreplicated
results. Publicly listed variants for the included genes derived from
ALSGene, Uniprot, ALS Mutation and HGMD databases were
merged with variant lists in ALSoD, and filtered for duplicates
(Figure 1).
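The merging and duplicate-filtering step can be pictured with a minimal sketch. It assumes each source supplies variants as (gene, protein change) pairs; the function names, the normalisation rule and the assignment of example variants to sources are illustrative assumptions, not the ALSoD implementation.

```python
from typing import Dict, Iterable, List, Tuple

# (gene symbol, protein-level change), e.g. ("SOD1", "A4V")
Variant = Tuple[str, str]


def normalise(gene: str, change: str) -> Variant:
    """Canonical form so that trivially different spellings compare equal.

    Assumption: upper-casing and dropping a leading 'p.' is enough for
    this illustration; real variant nomenclature needs more care.
    """
    change = change.strip().upper()
    if change.startswith("P."):
        change = change[2:]
    return gene.strip().upper(), change


def merge_variant_lists(sources: Dict[str, Iterable[Variant]]) -> Dict[Variant, List[str]]:
    """Merge per-database variant lists, keeping one entry per unique variant.

    The value records which sources reported the variant.
    """
    merged: Dict[Variant, List[str]] = {}
    for source_name, variants in sources.items():
        for gene, change in variants:
            seen_in = merged.setdefault(normalise(gene, change), [])
            if source_name not in seen_in:
                seen_in.append(source_name)
    return merged


if __name__ == "__main__":
    # Hypothetical input: which database lists which variant is made up here.
    example = {
        "ALSoD": [("SOD1", "A4V"), ("SOD1", "D90A")],
        "HGMD": [("sod1", "p.A4V"), ("SOD1", "I113T")],
    }
    for variant, seen_in in merge_variant_lists(example).items():
        print(variant, seen_in)
```

Keeping the list of contributing sources alongside each unique variant is a design choice of this sketch; it makes it possible to audit where a merged entry came from.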
Pathogenicity Analysis Using Bioinformatic Tools
PANTHER (Protein Analysis Through Evolutionary Relationships)
[15], SIFT (Sorting Intolerant From Tolerant) [16] and
POLYPHEN (Polymorphism Phenotyping) [17] programs were
used to analyse variants for possible pathogenicity. These tools
generated a set of scores for the variants analysed, which for
PANTHER are given as a subPSEC (substitution position-specific
evolutionary conservation) score and for POLYPHEN are given as
score differences for PSIC (position-specific independent counts).
In PANTHER, all possible mutations for each gene were
generated using Perl scripts and run on the web service in batches.
SubPSEC scores <= -5.0 were defined as damaging and
subPSEC scores > -5.0 as not damaging. In SIFT, all
possible unique codons in each gene were generated using a Perl
script, with scores <= 0.05 defined as damaging and scores > 0.05
defined as not damaging. In POLYPHEN, all mutations available
for a gene on ALSoD were run through the web service one after
the other, with PSIC score differences >= 1.5 defined as damaging
and PSIC score differences < 1.5 as not damaging.
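The three cut-offs can be expressed as a small classifier. The sketch below is illustrative only: the function and constant names are hypothetical, the PANTHER cut-off is read as a subPSEC of -5.0, and each tool contributes 1 for a damaging call and 0 otherwise, following the 0/1 scoring described under Automated Gene Ranking.

```python
from typing import Dict

# Cut-offs as stated in the text (names are hypothetical).
PANTHER_SUBPSEC_CUTOFF = -5.0   # subPSEC <= -5.0 is damaging
SIFT_SCORE_CUTOFF = 0.05        # SIFT score <= 0.05 is damaging
POLYPHEN_PSIC_CUTOFF = 1.5      # PSIC score difference >= 1.5 is damaging


def tool_calls(subpsec: float, sift: float, psic_diff: float) -> Dict[str, bool]:
    """Return a damaging / not damaging call for each of the three tools."""
    return {
        "PANTHER": subpsec <= PANTHER_SUBPSEC_CUTOFF,
        "SIFT": sift <= SIFT_SCORE_CUTOFF,
        "POLYPHEN": psic_diff >= POLYPHEN_PSIC_CUTOFF,
    }


def combined_score(subpsec: float, sift: float, psic_diff: float) -> int:
    """Sum of the binary calls: 0 (no tool predicts damage) to 3 (all do)."""
    return sum(tool_calls(subpsec, sift, psic_diff).values())


if __name__ == "__main__":
    # Hypothetical scores for a single variant, not taken from the paper.
    print(tool_calls(-6.2, 0.01, 0.9))
    print(combined_score(-6.2, 0.01, 0.9))
```

A variant can then be called pathogenic at whichever combined-score threshold the analysis requires, for example a score above 0 or above 1.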
Data Extraction from Publications
We conducted a systematic review of all publications related to
ALS genetics with an exhaustive combination of search queries on
the 15 genes mentioned above (Flow Diagram S1 and Protocol
S1).
In the PubMed database, we used title keywords consisting of
the gene name, ‘‘mutation’’ and ‘‘ALS’’ or ‘‘Motor Neuron
Disease’’, or gene name and ‘‘novel’’ to identify key publications
and then used the related citations function to generate a list of
publications for data extraction. For example, (SOD1[Title] OR
(superoxide dismutase[Title])) AND (mutation[Title] OR
novel[Title]) AND ((Amyotrophic Lateral Sclerosis[Title]) OR (Motor
Neuron Disease[Title]) OR ALS[Title]) yielded 181 results.
These results were further filtered by choosing ‘‘Humans’’ as
Species and sorted by ‘‘Recently Added’’ thereby displaying 160
unique publications. From the list displayed, we also searched the
‘‘Related citations’’ link on the first publication [10] of the selected
gene SOD1 yielding 204 results.
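The same title query can be generated for each gene rather than typed by hand. The helper below is a hedged sketch: its name and parameters are hypothetical, it only builds the query string using the [Title] field tag shown above, and it does not contact PubMed.

```python
from typing import Sequence


def title_query(
    gene_terms: Sequence[str],
    change_terms: Sequence[str] = ("mutation", "novel"),
    disease_terms: Sequence[str] = (
        "Amyotrophic Lateral Sclerosis",
        "Motor Neuron Disease",
        "ALS",
    ),
) -> str:
    """Build a title-restricted query: any gene term AND any change term AND any disease term."""

    def any_of(terms: Sequence[str]) -> str:
        # One parenthesised OR-group, each term restricted to the title field.
        return "(" + " OR ".join(f"{term}[Title]" for term in terms) + ")"

    return " AND ".join(any_of(group) for group in (gene_terms, change_terms, disease_terms))


if __name__ == "__main__":
    print(title_query(["SOD1", "superoxide dismutase"]))
```

Grouping each set of alternatives in a single pair of parentheses avoids the nesting mistakes that are easy to make when such queries are written manually.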
We used Google Scholar (http://scholar.google.co.uk/) to
identify publications for import into the ALSoD database, starting
with basic search queries to generate a large number of
publications. For example, ‘‘SOD1’’ gave about 28,600 results
but ‘‘SOD1 novel mutations variants ALS "amyotrophic lateral
sclerosis" "motor neuron disease"’’ gave 2050 results. We went
through the first 20 pages of results, each containing 20 publications
sorted by relevance. Publications with animal models
or associated with other diseases were excluded from the long list.
The list was then compared manually with the publications already
found through PubMed, and duplicates were excluded.
Manually curated data extracted from all publications included
family history, El Escorial category [18,19], mutations per gene,
number of cases and controls used in the studies, mutations in the
same codon, number of patients with family history (FALS),
number of patients without family history, mutations replicated in
other studies, number of countries replicating the mutation and for
linkage studies, LOD scores. Several genes implicated in ALS are
also implicated in other diseases, including frontotemporal
dementia, spinocerebellar ataxia and parkinsonism. To avoid the
problem of non-ALS patients being included in the database, we
restricted data curation to publications specifying ALS.
Automated Gene Ranking
Eleven queries stored as procedures were performed on data
collated. These were: 1. The total number of affected patients with
El Escorial defined ALS having a mutation in each gene [14,15].
2. The total number of ALS affected patients used in each study.
This measure was used to account for sampling variance and
power [20]. 3. The total number of healthy individuals with a
mutation reported in each study. 4. The total number of healthy
individuals used in each study. 5. The total number of mutations
sharing the same codon. 6. The total number of variants detected
in ALS patients for each gene. 7. The total number of mutations
with positive pathogenic predictions from the use of the three
bioinformatics tools described above. 8. The number of patients
with a family history defined as at least one other affected member
of the family. 9. The number of patients without a family history of
ALS. 10. The number of times a particular variation was
replicated across different studies. 11. The number of unique
populations where affected patients originated.
For each procedure above, a query was generated using
Structured Query Language (SQL) on Microsoft SQL Server
2008 and displayed on the ASP.NET platform webpage, ranking
the gene. The predicted pathogenicity score for each tool was
scored 1 for predicted pathogenic and 0 for predicted not
pathogenic and then summed to generate a final score for ranking
(http://alsod.iop.kcl.ac.uk/Statistics/pathogenicity.aspx). The
rank score for each query was summed to generate an overall
rank for the gene under study. For example, from Figure 2, the last
row for the DAO gene gives the column score 15 for Rank_Mutations,
14 for Rank_Patients and 9 for Rank_Pathogenicity. This
produces a total of 38 (that is 15+14+9) in the Rank_Sum column.
The generated Rank_Sum values for all the genes are arranged in
ascending order, placing DAO 12th by final rank. On the other
hand, FUS is placed 3rd by final rank as the corresponding scores
are 3+3+2=8.
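The rank-sum arithmetic for the two genes in this example can be reproduced with a short sketch. The data structure and function names are assumptions made for illustration; only the per-criterion ranks for FUS and DAO come from the text, and the production system performs this step in SQL.

```python
from typing import Dict, List, Tuple

# Per-criterion ranks quoted in the text for two genes (Figure 2 example).
CRITERION_RANKS: Dict[str, Dict[str, int]] = {
    "FUS": {"Rank_Mutations": 3, "Rank_Patients": 3, "Rank_Pathogenicity": 2},
    "DAO": {"Rank_Mutations": 15, "Rank_Patients": 14, "Rank_Pathogenicity": 9},
}


def rank_sums(criterion_ranks: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Rank_Sum for each gene: the sum of its ranks over the selected criteria."""
    return {gene: sum(ranks.values()) for gene, ranks in criterion_ranks.items()}


def order_by_rank_sum(sums: Dict[str, int]) -> List[Tuple[str, int]]:
    """Genes in ascending Rank_Sum order (lower sum means stronger evidence).

    Ties are broken alphabetically here purely to make the output stable.
    """
    return sorted(sums.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sums = rank_sums(CRITERION_RANKS)
    print(sums)                      # FUS: 3+3+2, DAO: 15+14+9
    print(order_by_rank_sum(sums))
```

With only two genes present the sketch cannot reproduce the final positions quoted above (3rd and 12th), which depend on the Rank_Sum values of all genes.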
There are two possible ways of ranking results in SQL. The
default method allocates rank based on the true position, such that
if two genes are given equal first position for example, the next
gene is in third position, not second. The dense rank method
allocates the next gene as second so that there are no gaps in the
rank numbering. We used the dense ranking system.
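The difference between the two ranking schemes, which in SQL Server are available as the RANK and DENSE_RANK functions, can be shown with a minimal sketch. The gene labels and scores below are hypothetical, and lower scores are treated as better, matching the ascending Rank_Sum ordering.

```python
from typing import Dict


def standard_rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank by true position: tied items share a rank and the next rank is skipped."""
    ordered = sorted(scores.values())
    # list.index returns the first position of a value, so ties share that position.
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def dense_rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Dense rank: tied items share a rank and no rank numbers are skipped."""
    distinct = sorted(set(scores.values()))
    position = {value: i + 1 for i, value in enumerate(distinct)}
    return {name: position[value] for name, value in scores.items()}


if __name__ == "__main__":
    # Hypothetical Rank_Sum values with a tie for first place.
    example = {"GENE_A": 8, "GENE_B": 8, "GENE_C": 12}
    print(standard_rank(example))  # GENE_C is third
    print(dense_rank(example))     # GENE_C is second
```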
Validation of the Method
The purpose of the credibility score tool is to generate a list of
genes in order of the weight of evidence supporting involvement in
ALS. Such a list should correlate closely with one generated by
ALS genetics experts, since such experts should have a good
working knowledge of the available evidence. We therefore
conducted a survey of ALS genetic experts, defined as being
individuals who had published as first or senior author on ALS
genetics. Experts were surveyed using the freely available online
questionnaire tool Surveymonkey at
http://www.surveymonkey.com/s/WRDW5WT (Figure 3). The survey link
showed the genes in a different random order every time it was clicked, to
prevent bias in the responses that might arise from the ordering.
We also embedded the questionnaire as a submenu on the
feedback menu of the ALSoD website. Experts were randomly
assigned to one of two groups, one in which the same rank could
be assigned to several genes, and one in which responders were
forced to rank each gene in order. The first group mimics the final
score of the automated method closely, while the second group
mimics the detail of the automated ranking method closely, since
the automated method is forced to rank each query uniquely but
the combined ranking could result in the same value for different
genes.
Figure 1. Overview of credibility analysis method.
doi:10.1371/journal.pone.0064899.g001
User Interface
The Credibility Analysis page
(http://alsod.iop.kcl.ac.uk/Statistics/credibility.aspx) allows criteria to be
selected by users in the form of checkboxes. Clicking the ‘Analyse’ button then
displays the ranked result. A detailed summary of the ranked
credibility data is also displayed for further reference by users,
giving the outcome of each procedure-based query. Any
combination of queries can be included in generating the score,
except Number of patients and Number of mutations found in
each gene, which are mandatory selections.
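The selection rule, in which any subset of criteria may be chosen but two are always included, can be sketched as follows. The criterion identifiers and the function name are hypothetical placeholders and do not reflect how the ASP.NET page is implemented.

```python
from typing import Iterable, List

# The two criteria the text describes as mandatory (identifiers are hypothetical).
MANDATORY = frozenset({"number_of_patients", "number_of_mutations"})


def selected_criteria(checked: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the criteria to score on: the user's ticks plus the mandatory ones.

    The result follows the order of `available` so that output columns are stable.
    """
    available = list(available)
    checked = set(checked)
    unknown = checked - set(available)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    chosen = checked | MANDATORY
    return [criterion for criterion in available if criterion in chosen]


if __name__ == "__main__":
    # A hypothetical subset of the available criteria.
    available = [
        "number_of_patients",
        "number_of_mutations",
        "predicted_pathogenic",
        "replication_count",
    ]
    print(selected_criteria(["predicted_pathogenic"], available))
```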
Statistical Methods
Spearman’s Rho [21,22,23] was used to compare rankings
generated by the automated method and the ALS genetics experts.
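For readers unfamiliar with the statistic, Spearman's Rho is the Pearson correlation of the two sets of ranks. The sketch below is a plain-Python illustration that assigns tied values their average rank; it is not the software used for the analysis reported here, and the example rankings are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one ranking is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical automated and expert ranks for five genes.
    automated = [1, 2, 3, 4, 5]
    expert = [2, 1, 4, 3, 5]
    print(spearman_rho(automated, expert))
```

Using average ranks for ties matters for the unforced expert rankings, in which several genes could be given the same rank.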
Results
For the pathogenicity prediction, using a threshold score >1
(that is, where the combination score is 2 or 3) to define
pathogenicity, just 110 mutations out of 425 were identified as
pathogenic, with particularly poor predictions for FUS and
TARDBP when compared with biological evidence of pathogenicity.
Using a threshold score of >0 (that is, where the combination
score is 1 or 2 or 3) to define pathogenicity brought the number of
pathogenic mutations to 198, suggesting that about 50% of
recorded FALS mutations are pathogenic based on bioinformatics
predictions.
There were 14 genes that fulfilled the inclusion criteria for
generation of a credibility score at the time of the survey, and had
sufficient data manually curated from publications as explained in
the data extraction process above. These were ALS2, FUS, DAO,
VCP, VAPB, ANG, DCTN1, FIG4, SETX, SOD1, TARDBP, SPG11,
NEFH, and OPTN.
Figure 2. Credibility Analysis webpage.
doi:10.1371/journal.pone.0064899.g002
Using the full set of 11 procedures, the automated method
ranked these as ALS-causing genes in the following order: SOD1,
TARDBP, FUS, ANG, SPG11, NEFH, OPTN, ALS2, SETX, FIG4,
VAPB, DCTN1, TAF15, VCP, DAO.
Subsets of the 11 procedures may be defined by the user if
needed. This allows flexibility in which evidence is regarded as
useful. For example in Figure 3, using the number of mutations
reported in a single gene and the number predicted as pathogenic
as test criteria ranks the genes in the following order: SOD1,
TARDBP, FUS, ANG, OPTN, SETX, ALS2, SPG11, FIG4, DCTN1,
VAPB, VCP, DAO. The output shows that the first six genes, SOD1,
TARDBP, FUS, ANG, OPTN and SETX, have a total of 121, 17, 19,
12, 5 and 4 pathogenic mutations respectively and, for example,
the I113T, D90A and A4V pathogenic mutations of the SOD1
gene were replicated in 17, 14 and 12 studies. It also shows there
are 6 different mutations in codon 93 of SOD1 and 5 different
mutations in codon 521 of FUS. Other displayed information
includes the number of countries in which gene mutations have
been reported. For example, SOD1 mutation has been reported in
34 countries with representation from every continent of the
world, while TARDBP, ALS2, ANG, FUS, SETX and NEFH have
been reported in 13, 9, 7, 7, 6 and 5 unique countries respectively.
Genes such as FIG4, DPP6, DCTN1, UBQLN2 and TAF15, which were
recorded in only 1 country each, have the lowest ranks.
Of 25 ALS genetics experts selected on the basis of having published
at least one paper on ALS genetics, 8 responded. Comparison of the
full automated method with the ALS genetics experts’ rankings
gave a Spearman’s Rho of 0.69 (P = 0.009) for the forced expert
rankings, and 0.57 (P = 0.042) for the unforced rankings,
indicating a good correlation between the methods.
Discussion
We have presented an automated method for using published
information to score the level of evidence supporting a causative
relationship between gene mutation and a disease. The information
on which the credibility analysis is based is collected routinely
by locus-specific databases and the method can therefore be
generalized to other diseases. The method used has been applied
to amyotrophic lateral sclerosis but could equally be applied to any
disease in which there is phenotypic and genotypic heterogeneity.
Figure 3. Surveymonkey survey tool for ranking 14 genes.
doi:10.1371/journal.pone.0064899.g003
A strength of this method is that multiple lines of evidence are
used to generate an objective opinion as to the credibility of a gene
as a disease gene, and while publication bias will affect the score,
this is minimized by several factors. First, in this study unpublished
data are used since the database includes directly input
information from researchers who have not published. Second, a
major part of the score is generated using theoretical models of
pathogenicity. Third, once published, any information remains
useable, and not prone to the vagaries of scientific fashion, or the
bias of individual opinion leaders. The effects of these components
on the score can be seen by comparing the automated ranking and
the ranking generated by both groups of ALS genetics experts. In
general the rankings were in agreement. For example, with one
exception, the top five genes were the same for all three methods.
For some genes there were strikingly different ranks. ANG was
ranked 9 of 13 by the experts who could give equal ranks, but in
the top five for the other two methods. The biggest discrepancies
were otherwise for ALS2, NEFH, and VAPB, each of which was
ranked in the bottom two for one of the methods and in the middle
for the other two methods.
Similar approaches have been used in association studies. In
previous work, three criteria used to determine how credible a
disease gene might be were the amount of evidence, manifest as
number of studies and population size studied, replicability of a
result, and protection from bias by good study design [24]. We
have tried to follow similar principles in generating this credibility
score.
A weakness of this method is that it relies on an agreed set of
criteria for analysis to generate the score, but there is no way to
decide objectively whether the criteria are reasonable or what their
relative weights should be. For example, we have not included
pathogenicity demonstrated in animal models in the score but
others might regard this as a vital component. Although we have
tried to build in flexibility so that researchers can include or
exclude certain criteria, unless the available criteria are exhaustive
there will always be the possibility that the method is incomplete.
Similarly, because the criteria can be user-selected, there can be no
truly universal measure of credibility using this system.
Since this tool was developed, pathological expansion in the
C9orf72 gene has been identified as a cause of ALS and
frontotemporal dementia [25,26]. At the time of our survey of
experts this was not the case and it has therefore been excluded
from the analysis presented.
A major advantage of this tool is the automation which changes
the rank of a gene depending on the evidence provided on the
database. This system could be applied to other complex diseases
where multiple genes are responsible for a phenotype.
Supporting Information
Checklist S1
(DOC)
Flow Diagram S1
(DOC)
Protocol S1
(DOC)
Author Contributions
Conceived and designed the experiments: OA JFP PMA AA-C. Wrote the
paper: OA JFP PMA AA-C. Advised on criteria used and literature review:
JFP PMA AA-C. Contributed genetic data: PMA AA-C. Survey and
Statistical Analysis of data: OA AA-C.
References
1. Charcot J, Joffroy A (1869) Deux cas d’atrophie musculaire progressive. Arch
Physiol Norm Pathol 2: 744–760.
2. Cleveland DW, Rothstein JD (2001) From Charcot to Lou Gehrig: deciphering
selective motor neuron death in ALS. Nature Reviews Neuroscience 2: 806–819.
3. Fong GCY, Kwok KHH, Song Y, Cheng T, Ho PWL, et al. (2006) Clinical
phenotypes of a large Chinese multigenerational kindred with autosomal
dominant familial ALS due to Ile149Thr SOD1 gene mutation. Amyotrophic
Lateral Sclerosis 7: 142–149.
4. Eisen A, Mezei MM, Stewart HG, Fabros M, Gibson G, et al. (2008) SOD1
gene mutations in ALS patients from British Columbia, Canada: clinical
features, neurophysiology and ethical issues in management. Amyotrophic
Lateral Sclerosis 9: 108–119.
5. Byrne S, Walsh C, Lynch C, Bede P, Elamin M, et al. (2011) Rate of familial
amyotrophic lateral sclerosis: a systematic review and meta-analysis. Journal of
Neurology, Neurosurgery and Psychiatry 82: 623.
6. Al-Chalabi A, Lewis CM (2011) Modelling the effects of penetrance and family
size on rates of sporadic and familial disease. Human Heredity 71: 281–288.
7. Hanby MF, Scott KM, Scotton W, Wijesekera L, Mole T, et al. (2011) The risk
to relatives of patients with sporadic amyotrophic lateral sclerosis. Brain.
8. Al-Chalabi A, Leigh PN (2005) Trouble on the pitch: are professional football
players at increased risk of developing amyotrophic lateral sclerosis? Brain 128:
451.
9. Siddique T, Figlewicz DA, Pericak-Vance MA, Haines JL, Rouleau G, et al.
(1991) Linkage of a gene causing familial amyotrophic lateral sclerosis to
chromosome 21 and evidence of genetic-locus heterogeneity. New England
Journal of Medicine 324: 1381–1384.
10. Rosen D, Siddique T, Patterson D, Figlewicz D, Sapp P, et al. (1993) Mutations
in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic
lateral sclerosis.
11. Lill CM, Abel O, Bertram L, Al-Chalabi A (2011) Keeping up with genetic
discoveries in amyotrophic lateral sclerosis: The ALSoD and ALSGene
databases. Amyotrophic Lateral Sclerosis 12: 238–249.
12. Hamosh A, Scott A, Amberger J, Valle D, McKusick V (2000) Online
Mendelian Inheritance in Man (OMIM) Hum. Mutat 15: 57–61.
13. Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for
systematic reviews and meta-analyses: the PRISMA statement. PLoS medicine
6: e1000097.
14. Andersen PM, Al-Chalabi A (2011) Clinical genetics of amyotrophic lateral
sclerosis: what do we really know? Nature Reviews Neurology.
15. Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, et al. (2003)
PANTHER: a browsable database of gene products organized by biological
function, using curated protein family and subfamily classification. Nucleic Acids
Research 31: 334.
16. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect
protein function. Nucleic Acids Research 31: 3812.
17. Sunyaev S, Ramensky V, Koch I, Lathe W III, Kondrashov AS, et al. (2001)
Prediction of deleterious human alleles. Human Molecular Genetics 10: 591.
18. Brooks BR (1994) El Escorial World Federation of Neurology criteria for the
diagnosis of amyotrophic lateral sclerosis. Subcommittee on Motor Neuron
Diseases/Amyotrophic Lateral Sclerosis of the World Federation of Neurology
Research Group on Neuromuscular Diseases and the El Escorial ‘‘Clinical limits
of amyotrophic lateral sclerosis’’ workshop contributors. Journal of the
Neurological Sciences 124: 96.
19. Brooks BR, Miller RG, Swash M, Munsat TL (2000) El Escorial revisited:
revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotrophic
lateral sclerosis and other motor neuron disorders: official publication of the
World Federation of Neurology, Research Group on Motor Neuron Diseases 1:
293.
20. Agency for Toxic Substances and Disease Registry (2011) National Amyotrophic
Lateral Sclerosis (ALS) Registry.
21. David F, Mallows C (1961) The variance of Spearman’s rho in normal samples.
Biometrika 48: 19–28.
22. Ramsey PH (1989) Critical values for Spearman’s rank order correlation.
Journal of Educational and Behavioral Statistics 14: 245–253.
23. Yue S, Pilon P, Cavadias G (2002) Power of the Mann-Kendall and Spearman’s
rho tests for detecting monotonic trends in hydrological series. Journal of
Hydrology 259: 254–271.
24. Ioannidis J, Boffetta P, Little J, O’Brien TR, Uitterlinden AG, et al. (2008)
Assessment of cumulative evidence on genetic associations: interim guidelines.
International Journal of Epidemiology 37: 120.
25. DeJesus-Hernandez M, Mackenzie IR, Boeve BF, Boxer AL, Baker M, et al.
(2011) Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of
C9ORF72 Causes Chromosome 9p-Linked FTD and ALS. Neuron.
26. Renton AE, Majounie E, Waite A, Simón-Sánchez J, Rollinson S, et al.
(2011) A Hexanucleotide Repeat Expansion in C9ORF72 Is the Cause of
Chromosome 9p21-Linked ALS-FTD. Neuron.
... uk/Statistics/pathogenicity.aspx referenced in [43], [47,48]. Fig 1 lists genes with known causal ALS mutations, along with their chromosomal position and number of "pathogenic" ALS variants reported in at least one of three databanks queried. ...
... We next checked whether any of the remaining variants corresponded to one of the "pathogenic" ALS mutations catalogued in at least one of three databases queried (the databank http://alsod.iop.kcl.ac.uk/Statistics/pathogenicity.aspx referenced in [43,47,48]). ...
Article
Full-text available
ALS is a rapidly progressive, devastating neurodegenerative illness of adults that produces disabling weakness and spasticity arising from death of lower and upper motor neurons. No meaningful therapies exist to slow ALS progression, and molecular insights into pathogenesis and progression are sorely needed. In that context, we used high-depth, next generation RNA sequencing (RNAseq, Illumina) to define gene network abnormalities in RNA samples depleted of rRNA and isolated from cervical spinal cord sections of 7 ALS and 8 CTL samples. We aligned >50 million 2X150 bp paired-end sequences/sample to the hg19 human genome and applied three different algorithms (Cuffdiff2, DEseq2, EdgeR) for identification of differentially expressed genes (DEGs). Ingenuity Pathways Analysis (IPA) and Weighted Gene Co-expression Network Analysis (WGCNA) identified inflammatory processes as significantly elevated in our ALS samples, with tumor necrosis factor (TNF) found to be a major pathway regulator (IPA) and TNFα-induced protein 2 (TNFAIP2) as a major network “hub” gene (WGCNA). Using the oPOSSUM algorithm, we analyzed transcription factors (TFs) controlling expression of the nine DEG/hub genes in the ALS samples and identified TFs involved in inflammation (NFkB, REL, NFkB1) and macrophage function (NR1H2::RXRA heterodimer). Transient expression in human iPSC-derived motor neurons of TNFAIP2 (also a DEG identified by all three algorithms) reduced cell viability and induced caspase 3/7 activation. Using high-density RNAseq, multiple algorithms for DEG identification, and an unsupervised gene co-expression network approach, we identified significant elevation of inflammatory processes in ALS spinal cord with TNF as a major regulatory molecule. Overexpression of the DEG TNFAIP2 in human motor neurons, the population most vulnerable to die in ALS, increased cell death and caspase 3/7 activation. We propose that therapies targeted to reduce inflammatory TNFα signaling may be helpful in ALS patients.
... [21] The genes that encode the protein products are from the amyotrophic lateral sclerosis online genetics database (ALSOD) (alsod.org). [22] The first STRING analysis presented a complex network centered on ubiquitin C (UBC) and ubiquitin B (UBB). Genes widely verified as ALS risk factors (SOD1, TARDBP, FUS, C9orf72, and TBK1) [23] or identified by GWASs in the Chinese Han population (TYW3, CRYZ, FGD4, H3F3C, SUSD2, and CAMK1G) were selected to construct a subnetwork. ...
Article
Objective: A 2-stage genome-wide association study was conducted to explore the genetic etiology of amyotrophic lateral sclerosis (ALS) in the Chinese Han population. Methods: In total, 700 cases and 4,027 controls were genotyped in the discovery stage using Illumina Human660W-Quad BeadChips. Top associated single nucleotide polymorphisms from the discovery stage were then genotyped in an independent cohort with 884 cases and 5,329 controls. Combined analysis was conducted by combining all samples from the 2 stages. Results: Two novel loci, 1p31 and 12p11, showed strong associations with ALS. These novel loci explained 2.2% of overall variance in disease risk. Expression quantitative trait loci searches identified TYW3/CRYZ and FGD4 as risk genes at 1p31 and 12p11, respectively. Conclusions: This study identifies novel susceptibility genes for ALS. Identification of TYW3/CRYZ in the current study supports the notion that insulin resistance may be involved in ALS pathogenesis, whereas FGD4 suggests an association with Charcot-Marie-Tooth disease.
... The ALSoD database currently lists genetic variants in 126 genes as associated with ALS. Gene-gene and gene-environment interactions have also been suggested as playing a major role in the disease's appearance and phenotype (Andersen and Al-Chalabi, 2011; van Blitterswijk et al., 2012; Abel et al., 2013; Al-Chalabi et al., 2013; Leblond et al., 2014; Renton et al., 2014; Jones et al., 2015; Al-Chalabi et al., 2017; Brown and Al-Chalabi, 2017; Hardiman et al., 2017; van Es et al., 2017; Chia et al., 2018). ...
Article
Despite the genetic heterogeneity reported in familial amyotrophic lateral sclerosis (ALS) (fALS), Cu/Zn superoxide-dismutase (SOD1) gene mutations are the second most common cause of the disease, accounting for around 20% of all families (ALS1) and isolated sporadic cases (sALS). At least 186 different mutations in the SOD1 gene have been reported to date. The possibility of a single founder and separate founders have been investigated for D90A (p.D91A) and A4V (p.A5V), the most common mutations worldwide. High-throughput single nucleotide polymorphism genotyping studies have suggested two founders for A4V (one for the Amerindian population and another for the European population) although the possibility that the two populations are descended from a single ancient founder cannot be ruled out. We used 15 genetic variants spanning the human chromosome 21 from the SOD1 gene to the SCAF4 gene, comparing them with the population reference panels, to demonstrate that the first A4V Spanish pedigree shared the genetic background reported in the European population.
... Three other genes (SOD1, TARDBP, and FUS) harbor pathogenic variants which together account for ~20% of fALS cases (Renton et al. 2014) (MIM #105400, #612069, #608030, respectively). In total, reported variants that fulfill adequate objective criteria for causation (Abel et al. 2013; MacArthur et al. 2014; Stenson et al. 2014) are in ~30 genes (Morgan et al. 2015). Most of these are autosomal dominant variants located in exonic regions of protein-coding genes; however, many ALS-associated variants, even those in well-known genes, have not been sufficiently confirmed as pathogenic (Eisen et al. 2008). ...
Article
Background: Gene discovery has provided remarkable biological insights into amyotrophic lateral sclerosis (ALS). One challenge for clinical application of genetic testing is critical evaluation of the significance of reported variants. Methods: We use whole exome sequencing (WES) to develop a clinically relevant approach to identify a subset of ALS patients harboring likely pathogenic mutations. In parallel, we assess if DNA methylation can be used to screen for pathogenicity of novel variants since a methylation signature has been shown to associate with the pathogenic C9orf72 expansion, but has not been explored for other ALS mutations. Australian patients identified with ALS-relevant variants were cross-checked with population databases and case reports to critically assess whether they were “likely causal,” “uncertain significance,” or “unlikely causal.” Results: Published ALS variants were identified in >10% of patients; however, in only 3% of patients (4/120) could these be confidently considered pathogenic (in SOD1 and TARDBP). We found no evidence for a differential DNA methylation signature in these mutation carriers. Conclusions: The use of WES in a typical ALS clinic demonstrates a critical approach to variant assessment with the capability to combine cohorts to enhance the largely unknown genetic basis of ALS.
... Amyotrophic lateral sclerosis (ALS) is an untreatable, relentlessly progressive degenerative disorder of motor neurons that is lethal within 3-5 years. Recent studies have documented that susceptibility to amyotrophic lateral sclerosis can be influenced by variants in multiple genes (1). While heritability in familial cases is typically transmitted in a Mendelian-dominant manner, it is now evident that seemingly sporadic ALS may also reflect genetic variants acting individually or potentially in combination (2,3). ...
Article
A series of studies suggests that susceptibility to ALS may be influenced by variants in multiple genes. While analyses of the 10% of cases of familial origin have identified more than 33 monogenic ALS-causing genetic defects, little is known about genetic factors that influence susceptibility or phenotype in sporadic ALS (SALS). We and others conducted a genome-wide association study (GWAS) in a cohort of 1014 ALS cases from Western Europe, England and the United States, and identified an intronic single nucleotide polymorphism (SNP) rs1541160 in the KIFAP3 gene that was statistically associated with improved survival. We have now completed an additional survival analysis examining the impact of the rs1541160 genotype in a cohort of 264 ALS and progressive bulbar palsy (PBP) cases. In the combined cohort of 264 patients, the CC, CT and TT genotypes for rs1541160 were detected, respectively, in 8.3% (22), 41.7% (110) and 50.0% (132). This study does not show an influence of KIFAP3 variants on survival in the studied Swiss and Swedish cohort. There was a difference in survival between the US and English patients and the patients from the Netherlands. The effect of KIFAP3 variants may be population specific, or the rs1541160 association reported previously may have been a false-positive.
... To date, over 100 genes have been implicated in ALS (Lill et al., 2011). The level of supporting evidence for each gene or gene variant varies from small to overwhelming and is in some cases contradictory (Abel et al., 2013). Therefore, most authors assign ALS genes to different categories of certainty, although there is no consistent nomenclature. ...
Article
Evidence of genetic heterogeneity in ALS has been found, with at least 31 genes being identified to date as causing ALS, and other genes being suggested as risk factors for susceptibility to the disease and for phenotype modifications. In recent years, new molecular genetic methodologies, especially GWAS and exome sequencing, have contributed to the identification of new ALS genes. Some of these genes (SOD1, TARDBP, FUS, and C9orf72) have homogenous frequencies in different populations. However, a few genes are rare in populations other than those in which they were first identified. Here we investigate the frequency of the PFN1 gene in a Catalan ALS population. A mutational analysis of the PFN1 gene was carried out on a Catalan cohort of 42 ALS families (FALS) and 423 sporadic ALS patients (SALS). The screening included 600 healthy controls. No PFN1 mutations were identified in either the FALS or SALS group. We also found no mutations in the control group. Our results are consistent with those described in other populations with very low frequencies, suggesting that PFN1 is a very rare cause of ALS worldwide. Together with the absence of a distinctive phenotype associated with ALS18, these results mean that this gene should be a second or third line for inclusion in screening in patients requesting genetic counseling.
Article
Systematic reviews should build on a protocol that describes the rationale, hypothesis, and planned methods of the review; few reviews report whether a protocol exists. Detailed, well-described protocols can facilitate the understanding and appraisal of the review methods, as well as the detection of modifications to methods and selective reporting in completed reviews. We describe the development of a reporting guideline, the Preferred Reporting Items for Systematic reviews and Meta-Analyses for Protocols 2015 (PRISMA-P 2015). PRISMA-P consists of a 17-item checklist intended to facilitate the preparation and reporting of a robust protocol for the systematic review. Funders and those commissioning reviews might consider mandating the use of the checklist to facilitate the submission of relevant protocol information in funding applications. Similarly, peer reviewers and editors can use the guidance to gauge the completeness and transparency of a systematic review protocol submitted for publication in a journal or other medium.
Article
Systematic reviews and meta-analyses have become increasingly important in health care. Clinicians read them to keep up to date with their field [1],[2], and they are often used as a starting point for developing clinical practice guidelines. Granting agencies may require a systematic review to ensure there is justification for further research [3], and some health care journals are moving in this direction [4]. As with all research, the value of a systematic review depends on what was done, what was found, and the clarity of reporting. As with other publications, the reporting quality of systematic reviews varies, limiting readers' ability to assess the strengths and weaknesses of those reviews. Several early studies evaluated the quality of review reports. In 1987, Mulrow examined 50 review articles published in four leading medical journals in 1985 and 1986 and found that none met all eight explicit scientific criteria, such as a quality assessment of included studies [5]. In 1987, Sacks and colleagues [6] evaluated the adequacy of reporting of 83 meta-analyses on 23 characteristics in six domains. Reporting was generally poor; between one and 14 characteristics were adequately reported (mean = 7.7; standard deviation = 2.7). A 1996 update of this study found little improvement [7]. In 1996, to address the suboptimal reporting of meta-analyses, an international group developed a guidance called the QUOROM Statement (QUality Of Reporting Of Meta-analyses), which focused on the reporting of meta-analyses of randomized controlled trials [8]. In this article, we summarize a revision of these guidelines, renamed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses), which have been updated to address several conceptual and practical advances in the science of systematic reviews (Box 1). 
Box 1: Conceptual Issues in the Evolution from QUOROM to PRISMA. Completing a Systematic Review Is an Iterative Process: The conduct of a systematic review depends heavily on the scope and quality of included studies: thus systematic reviewers may need to modify their original review protocol during its conduct. Any systematic review reporting guideline should recommend that such changes can be reported and explained without suggesting that they are inappropriate. The PRISMA Statement (Items 5, 11, 16, and 23) acknowledges this iterative process. Aside from Cochrane reviews, all of which should have a protocol, only about 10% of systematic reviewers report working from a protocol [22]. Without a protocol that is publicly accessible, it is difficult to judge between appropriate and inappropriate modifications.
Article
Online Mendelian Inheritance In Man (OMIM) is a public database of bibliographic information about human genes and genetic disorders. Begun by Dr. Victor McKusick as the authoritative reference Mendelian Inheritance in Man, it is now distributed electronically by the National Center for Biotechnology Information (NCBI). Material in OMIM is derived from the biomedical literature and is written by Dr. McKusick and his colleagues at Johns Hopkins University and elsewhere. Each OMIM entry has a full text summary of a genetic phenotype and/or gene and has copious links to other genetic resources such as DNA and protein sequence, PubMed references, mutation databases, approved gene nomenclature, and more. In addition, NCBI's neighboring feature allows users to identify related articles from PubMed selected on the basis of key words in the OMIM entry. Through its many features, OMIM is increasingly becoming a major gateway for clinicians, students, and basic researchers to the ever-growing literature and resources of human genetics. Hum Mutat 15:57–61, 2000. © 2000 Wiley-Liss, Inc.
Article
Introductory statistics texts have been noted to have inaccuracies in the tables of critical values for Spearman's correlation. Even the best texts currently available use critical values from the exact distribution only for N ≤ 11. Zar's table gives critical values for N ≤ 100 but does not use the most accurate approximation procedure available. This paper provides a table of critical values based on the exact distribution for 3 ≤ N ≤ 18 and very accurate critical values for 19 ≤ N ≤ 100 estimated using the Edgeworth approximation.
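As a minimal sketch of the statistic to which such critical values apply, the following pure-Python function computes Spearman's rho for two rankings of the same N items using the closed form 1 - 6 * sum(d^2) / (N * (N^2 - 1)). The function name `spearman_rho` and the example ranks are hypothetical and are not taken from any of the studies listed here. The closed form assumes there are no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items (assumes no ties)."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared differences between paired ranks
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ranks of five items from two raters (illustrative only)
automated = [1, 2, 3, 4, 5]
expert = [2, 1, 3, 5, 4]
rho = spearman_rho(automated, expert)
print(round(rho, 3))  # prints 0.8
```

To judge significance, the computed value would be compared against a tabulated critical value for the same N and the chosen significance level.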