Article

The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data.

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA.
Journal of the American Medical Informatics Association (impact factor: 3.61). 07/2011; 18(4):370-5. DOI:10.1136/amiajnl-2011-000101
Source: PubMed

ABSTRACT Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. The large number of measurements (eg, single nucleotide polymorphisms (SNPs)), however, makes this task computationally challenging. This paper evaluates an algorithm that predicts patient outcomes from genome-wide data by efficiently averaging over an exponential number of naive Bayes (NB) models.
This model-averaged naive Bayes (MANB) method was applied to predict late-onset Alzheimer's disease in 1411 individuals, each of whom had 312,318 SNP measurements available as genome-wide predictive features. Its performance was compared with that of a naive Bayes algorithm without feature selection (NB) and with feature selection (FSNB).
Performance of each algorithm was measured in terms of area under the ROC curve (AUC), calibration, and run time.
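As an illustration of the first two measures, the sketch below computes AUC as the fraction of case/control pairs ranked correctly (ties count one half) and summarizes calibration by binning predicted probabilities. The function names `auc` and `calibration_bins`, the equal-width binning, and the default of 10 bins are assumptions made for this example; the abstract does not state how the study computed either measure.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted probabilities (or any ranking score)
    labels: 1 for a case, 0 for a control
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def calibration_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns a list of (mean predicted probability, observed case fraction,
    count) for each non-empty bin. A well-calibrated model has the first
    two values close to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, l in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, l))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(l for _, l in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table
```

The pairwise AUC is quadratic in the number of test cases, which is acceptable for a dataset of this size (1411 individuals); a rank-based formulation would be the usual choice for much larger test sets.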
The training time of MANB (16.1 s) was comparable to that of NB (15.6 s), whereas FSNB (1684.2 s) was considerably slower. Each of the three algorithms required less than 0.1 s to predict the outcome of a test case. MANB had an AUC of 0.72, which is significantly better than the AUC of 0.59 by NB (p<0.00001), but not significantly different from the AUC of 0.71 by FSNB. MANB was better calibrated than NB, and FSNB was better calibrated still. A limitation is that only one dataset and two comparison algorithms were included in this study.
MANB performed comparatively well in predicting a clinical outcome from a high-dimensional genome-wide dataset. These results provide support for including MANB in the methods used to predict outcomes from large, genome-wide datasets.
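The model averaging described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes discrete features coded 0, 1, 2 (for example, genotype counts), a symmetric Dirichlet prior with parameter `alpha`, and an independent prior probability `prior_arc` that each feature depends on the outcome; these settings and all function names are hypothetical. Under those assumptions, the average over all 2^n feature subsets factorizes into one mixture per feature, which is why training cost can stay close to that of plain NB.

```python
import math


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a
    symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def fit_manb(X, y, n_values=3, n_classes=2, prior_arc=0.5, alpha=1.0):
    """Fit a model-averaged naive Bayes sketch.

    X: list of rows, each a list of feature values in range(n_values)
    y: list of class labels in range(n_classes)
    prior_arc: assumed prior probability that a feature is relevant
    """
    n = len(y)
    class_counts = [0] * n_classes
    for c in y:
        class_counts[c] += 1
    features = []
    for j in range(len(X[0])):
        # Counts of each feature value within each class, and pooled.
        joint = [[0] * n_values for _ in range(n_classes)]
        for row, c in zip(X, y):
            joint[c][row[j]] += 1
        pooled = [sum(joint[c][v] for c in range(n_classes))
                  for v in range(n_values)]
        # Score "feature depends on class" against "feature is independent".
        log_with = math.log(prior_arc) + sum(
            log_marginal(joint[c], alpha) for c in range(n_classes))
        log_without = math.log(1.0 - prior_arc) + log_marginal(pooled, alpha)
        m = max(log_with, log_without)
        w_with = math.exp(log_with - m)
        w_without = math.exp(log_without - m)
        p_arc = w_with / (w_with + w_without)  # posterior relevance
        # Smoothed estimates of P(x | class) and P(x).
        cond = [[(joint[c][v] + alpha) / (class_counts[c] + n_values * alpha)
                 for v in range(n_values)] for c in range(n_classes)]
        marg = [(pooled[v] + alpha) / (n + n_values * alpha)
                for v in range(n_values)]
        features.append((p_arc, cond, marg))
    prior = [(class_counts[c] + alpha) / (n + n_classes * alpha)
             for c in range(n_classes)]
    return prior, features


def predict_proba(model, x):
    """Posterior class probabilities for one case, averaged over models."""
    prior, features = model
    log_post = [math.log(p) for p in prior]
    for (p_arc, cond, marg), v in zip(features, x):
        for c in range(len(prior)):
            # Mixture of the relevant and irrelevant versions of the feature.
            log_post[c] += math.log(p_arc * cond[c][v] + (1.0 - p_arc) * marg[v])
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

In this sketch a feature with little evidence of association receives a small `p_arc`, so its contribution is nearly the same for every class and it has little effect on the prediction. Plain NB corresponds to fixing `p_arc` at 1 for every feature.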

    Article: Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms.
    ABSTRACT: Currently, single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) of >5% are preferentially used in case-control association studies of common human diseases. Recent technological developments enable inexpensive and accurate genotyping of a large number of SNPs in thousands of cases and controls, which can provide adequate statistical power to analyze SNPs with MAF <5%. Our purpose was to determine whether evaluating rare SNPs in case-control association studies could help identify causal SNPs for common diseases. We suggest that slightly deleterious SNPs (sdSNPs) subjected to weak purifying selection are major players in genetic control of susceptibility to common diseases. We compared the distribution of MAFs of synonymous SNPs with that of nonsynonymous SNPs (1) predicted to be benign, (2) predicted to be possibly damaging, and (3) predicted to be probably damaging by PolyPhen. Our sources of data were the International HapMap Project, ENCODE, and the SeattleSNPs project. We found that the MAF distribution of possibly and probably damaging SNPs was shifted toward rare SNPs compared with the MAF distribution of benign and synonymous SNPs that are not likely to be functional. We also found an inverse relationship between MAF and the proportion of nsSNPs predicted to be protein disturbing. On the basis of this relationship, we estimated the joint probability that a SNP is functional and would be detected as significant in a case-control study. Our analysis suggests that including rare SNPs in genotyping platforms will advance identification of causal SNPs in case-control association studies, particularly as sample sizes increase.
    The American Journal of Human Genetics 01/2008; 82(1):100-12. · 10.60 Impact Factor

Keywords

clinical care
clinical outcome
comparison algorithms
feature selection
FSNB
genome-wide data
genome-wide datasets
genome-wide measurements
genome-wide predictive features
high-dimensional genome-wide dataset
model-averaged naive Bayes
naive Bayes
naive Bayes algorithm
one dataset
onset Alzheimer's disease
Predicting patient outcomes
predicts patient outcomes
ROC curve
single nucleotide polymorphisms
three algorithms

Wei Wei