Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities

Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany.
Human Mutation (Impact Factor: 5.14). 12/2012; 33(12). DOI: 10.1002/humu.22161
Source: PubMed


The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has fueled expectations that individual risk can also be quantified from that genetic architecture. So far, however, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the magnitude of the effect sizes of risk variants, the heritability of a disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1–5%) and rare (frequency <1%) variants in the etiology of complex diseases. Using a cross-validation model, we achieved predictions with an area under the receiver operating characteristic curve (AUC) of ∼0.88 for T1D, highlighting its strong heritable component (∼90%). In contrast, for PD we were unable to achieve a satisfactory prediction (AUC ∼0.56; heritability ∼38%). Our simulations showed that the simultaneous inclusion of uncommon and rare variants in GWAS would eventually make disease risk prediction feasible for complex diseases such as PD. The software used is available at
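
The pipeline the abstract describes (genotypes in, SVM risk scores out, performance summarized as a cross-validated AUC) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' software: genotypes are assumed to be coded as minor-allele counts (0/1/2), the SVM is assumed to be linear and is fitted with the Pegasos sub-gradient method, and the cohort is synthetic, drawn from a hypothetical additive log-odds model. All function names and parameter values (simulate_cohort, train_linear_svm, cross_validated_auc, lam, beta, and so on) are illustrative. AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control.

```python
import math
import random

# Hypothetical illustration only; not the published tool from this paper.


def simulate_cohort(n, n_snps=20, n_causal=5, beta=0.8, seed=1):
    """Synthetic case-control cohort with genotypes coded as minor-allele counts (0/1/2).

    Only the first `n_causal` SNPs affect risk, via an additive log-odds model
    with per-allele effect `beta`. This is simulated data, not GWAS data.
    """
    rng = random.Random(seed)
    mafs = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
    X, y = [], []
    for _ in range(n):
        # Two independent allele draws per SNP give a genotype in {0, 1, 2}
        g = [sum(rng.random() < p for _ in range(2)) for p in mafs]
        # Centre causal genotypes at their expectation so cases and controls are roughly balanced
        logit = sum(beta * (g[j] - 2 * mafs[j]) for j in range(n_causal))
        X.append(g)
        y.append(1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0)
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    A constant 1.0 is appended to each sample so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    data = [row + [1.0] for row in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            yi = 1.0 if y[i] == 1 else -1.0
            margin = yi * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss active: move towards classifying sample i correctly
                w = [wj + eta * yi * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Linear SVM score; larger values mean 'more case-like'."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic: P(case score > control score), ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """k-fold CV: each sample is scored by a model that never saw it; AUC uses the pooled scores."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = [0.0] * len(X)
    for f, fold in enumerate(folds):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=f)
        for i in fold:
            scores[i] = decision(w, X[i])
    return auc(scores, y)


if __name__ == "__main__":
    X, y = simulate_cohort(400, beta=2.0)
    print(f"cross-validated AUC: {cross_validated_auc(X, y):.2f}")
```

Setting beta=0.0 in simulate_cohort yields a null cohort whose cross-validated AUC should hover around 0.5, a quick sanity check that no information leaks from test folds into training. Pooling out-of-fold scores before computing a single AUC is one common convention; averaging per-fold AUCs is another, and the abstract does not say which was used.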



Available from: Manu Sharma
  • Citation context: "Until now, the underlying genetic and molecular mechanisms of PD have not been completely understood. Mittag et al. showed in their study that it is not possible to predict the disease risk for PD with top-validated single-nucleotide polymorphisms, although such a prediction is possible for type 1 diabetes [8]. Thus, in the case of PD, genetic markers alone cannot explain the disease outbreak."
    ABSTRACT: Parkinson's disease is an age-related disease whose pathogenesis is not completely understood. Animal models exist for investigating the disease, but not all results can be easily transferred to humans. Therefore, mathematical or probabilistic models of the human disease need to be constructed in silico to predict specific processes within a cell, such as dopamine metabolism and transport in a neuron. We present a Systems Biology Markup Language (SBML) model of a whole dopaminergic nerve cell, consisting of 139 reactions and 111 metabolites, which includes, among others, dopamine metabolism and transport, oxidative stress, aggregation of alpha-synuclein (alphaSYN), lysosomal and proteasomal degradation, and mitophagy. The predictive power of the model was investigated using flux balance analysis to identify steady model states. To this end, we performed six experiments: (i) investigation of normal cell behavior, (ii) increase of O2, (iii) increase of ATP, (iv) influence of neurotoxins, (v) increase of alphaSYN in the cell, and (vi) increase of dopamine synthesis. The SBML model is available in the BioModels database under identifier MODEL1302200000. The developed model can simulate the normal behavior of an in vivo nerve cell. We show that the model is sensitive to neurotoxins and oxidative stress. Further, an increased level of alphaSYN induces apoptosis, and an increased flux of alphaSYN to the extracellular space was observed.
    Article · Nov 2013 · BMC Neuroscience
  • ABSTRACT: Malaria is one of the principal health problems in Mozambique, mostly affecting children. Accurate prediction of future incidence is crucial for implementing appropriate intervention and disease-control policies that strengthen the health system. We propose a model based on support vector machines (SVM) for predicting yearly malaria incidence in children aged 0–4 years in Maputo province, Mozambique. The predictive model is trained on two years of historical malaria data combined with climatic and malaria-control factors. A grid-search parameter tuning procedure was first employed to find the best parameters and select the kernel. To determine the most influential factors, variable importance was estimated from the impact of permuting feature values on predictive performance. The most important predictors of malaria incidence turned out to be temperature variation, followed by Matutuine (district), April (month) and Namaacha (district).
    Article · May 2013
  • ABSTRACT: Correctly and efficiently mapping drugs and enzymes to their possible interaction networks is important in modern drug research. In this work, a novel approach was introduced that encodes drug molecules with physicochemical molecular descriptors and enzyme molecules with pseudo amino acid composition. Based on this encoding, Random Forest was adopted to build the drug-enzyme interaction network. After selecting the optimal features representing the main factors of drug-enzyme interaction, a total of 129 features were obtained, which cluster into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. Geometry features were further found to be the most important of all. Our prediction model achieved a Matthews correlation coefficient (MCC) of 0.915 and a sensitivity of 87.9% at a specificity of 99.8% in 10-fold cross-validation, and an MCC of 0.895 and a sensitivity of 95.7% at a specificity of 95.4% on the independent test set. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications.
    Article · Jul 2013 · Biochimica et Biophysica Acta
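
The malaria study above ranks predictors by how much predictive performance degrades when a feature's values are permuted. A minimal, model-agnostic sketch of that idea follows. It assumes mean absolute error as the performance metric (the abstract does not name one), and toy_model is a hypothetical stand-in for a fitted SVM regressor; none of the names come from that study.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mean_absolute_error, n_repeats=10, seed=0):
    """Importance of feature j = mean increase in error after shuffling column j.

    `predict` maps a list of feature rows to a list of predictions. The model is
    never refitted: permuting a column only breaks that feature's link with the target.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(rows):
    """Hypothetical fitted regressor: uses feature 0 and ignores feature 1."""
    return [3.0 * r[0] for r in rows]


if __name__ == "__main__":
    X = [[float(i), float((7 * i) % 5)] for i in range(50)]
    y = toy_model(X)
    print(permutation_importance(toy_model, X, y))
```

Feature 1 scores exactly zero here because the toy model never reads it. On real data, irrelevant features typically get small, noisy scores that can even be slightly negative, which is why averaging over several shuffles helps.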
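
The drug-enzyme study reports the Matthews correlation coefficient together with sensitivity at a given specificity. The sketch below computes all three from a confusion matrix using their standard definitions. The counts in the example are hypothetical and are not taken from that study; they only illustrate why MCC is useful when negatives far outnumber positives, since it uses all four confusion-matrix cells rather than one row.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and MCC; undefined ratios are reported as 0.0."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical, imbalanced counts: 100 positives, 1000 negatives
    m = classification_metrics(tp=90, tn=990, fp=10, fn=10)
    print(f"sensitivity {m['sensitivity']:.2f}, specificity {m['specificity']:.2f}, MCC {m['mcc']:.2f}")
    # prints: sensitivity 0.90, specificity 0.99, MCC 0.89
```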