Comparing Linear Discriminant Function With Logistic Regression for the Two-Group Classification Problem

The Journal of Experimental Education (Impact Factor: 1.09). 03/1999; 67(3):265-286. DOI: 10.1080/00220979909598356


The performances of predictive discriminant analysis (PDA) and logistic regression (LR) for the 2-group classification problem were compared. The authors used a fully crossed 3-factor experimental design (sample size, group proportions, and equal or unequal covariance matrices) and 2 data patterns. When the 2 groups had equal covariance matrices, PDA and LR performed comparably for the conditions of both equal and unequal group proportions. When the 2 groups had unequal covariance matrices (4:1, as implemented in this study) and very different group proportions, PDA and LR differed somewhat with regard to the classification error rates of the 2 groups, but the classification error rates of the 2 methods for the total sample remained comparable. Sample size played a relatively minor role in the classification accuracy of the 2 methods, except when LR was used under relatively small sample-size conditions.
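The comparison described above can be sketched in code. The following is a minimal Monte Carlo illustration (not the authors' original design): two groups are simulated with unequal covariance matrices in a 4:1 ratio, and the total-sample classification error rates of a linear discriminant rule (the 2-group PDA rule) and logistic regression are compared. The sample sizes, group means, and use of scikit-learn are illustrative assumptions.

```python
# Illustrative sketch, assuming scikit-learn: compare total-sample error
# rates of a linear discriminant rule (PDA) and logistic regression (LR)
# when the two groups have a 4:1 covariance ratio, as in the study.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n1, n2):
    # Group 0: unit variance; group 1: variance 4 (sd 2), i.e., a 4:1 ratio.
    X1 = rng.normal(loc=0.0, scale=1.0, size=(n1, 2))
    X2 = rng.normal(loc=1.0, scale=2.0, size=(n2, 2))
    X = np.vstack([X1, X2])
    y = np.array([0] * n1 + [1] * n2)
    return X, y

X_train, y_train = sample(500, 500)   # equal group proportions
X_test, y_test = sample(2000, 2000)   # large holdout to estimate error

for name, clf in [("PDA/LDA", LinearDiscriminantAnalysis()),
                  ("LR", LogisticRegression())]:
    err = 1.0 - clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name} total error rate: {err:.3f}")
```

Under this setup the two total-sample error rates come out close to each other, consistent with the abstract's finding that the methods remain comparable overall even under covariance heterogeneity.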

  • Source
    • "In addition, prior probabilities, which are the proportions of group members that exist in the populations, also affect the classification results of BDA and LR. For instance, Fan and Wang (1999) and Lei and Koehly (2003) compared the classification error rates of LDA and LR using Monte Carlo simulation under different prior probabilities in the binary case. Consequently, both methods are highly applicable in debris flow prediction and worth studying."
    ABSTRACT: In this study, the high-risk debris flow areas of Sichuan Province, Panzhihua and Liangshan Yi Autonomous Prefecture, were taken as the study areas. Using rainfall and environmental factors as predictors, and based on different prior-probability combinations for debris flows, debris flow predictions in these areas were compared between two statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). Comprehensive analysis of the results shows that (a) with mid-range prior probabilities, the overall predictive accuracy of BDA is higher than that of LR; (b) with equal and extreme prior probabilities, the overall predictive accuracy of LR is higher than that of BDA; and (c) regional debris flow prediction models using rainfall factors alone perform worse than those that also incorporate environmental factors, and the predictive accuracies for occurrence and nonoccurrence of debris flows change in opposite directions as information is supplemented.
    Geomorphology 11/2013; DOI:10.1016/j.geomorph.2013.06.003 · 2.79 Impact Factor
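The role of prior probabilities discussed in this entry can be illustrated concretely. The sketch below (illustrative only, not the debris-flow models themselves) uses scikit-learn's `LinearDiscriminantAnalysis`, which accepts explicit class priors, to show how a skewed prior shifts the decision boundary so that more borderline cases are assigned to the favored class; the data and prior values are assumptions for the example.

```python
# Illustrative sketch, assuming scikit-learn: the same discriminant rule
# fitted with equal vs. skewed class priors assigns borderline cases
# differently, which is how priors affect per-class error rates.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),   # class 0
               rng.normal(1.0, 1.0, size=(200, 2))])  # class 1
y = np.array([0] * 200 + [1] * 200)

equal = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)
skewed = LinearDiscriminantAnalysis(priors=[0.9, 0.1]).fit(X, y)

# A prior weighted toward class 0 labels more borderline points as class 0.
print("class-0 predictions, equal priors: ", (equal.predict(X) == 0).sum())
print("class-0 predictions, skewed priors:", (skewed.predict(X) == 0).sum())
```

This mirrors the point made in the quoted passage: the prior mainly redistributes error between the classes rather than changing the rule itself.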
  • Source
    • "However, adjustments for prior probabilities are unlikely to affect classification results if the sample groups are very distinct because a minimal number of cases will be near the "borderlines" between the groups (Klecka, 1980, p. 47). Fan and Wang (1999) also suggested that the size of a prior may affect classification error rates for predicting each class, but not for predicting both classes. They indicated that the decision between using equal versus nonequal priors seems to be less relevant when researchers are concerned with the overall predictive accuracy."
    ABSTRACT: The authors examined the distributional properties of 3 improvement-over-chance, I, effect sizes, each derived from linear and quadratic predictive discriminant analysis and from logistic regression analysis for the 2-group univariate classification. These 3 classification methods (3 levels) were studied under varying data conditions, including population separation (3 levels), variance pattern (3 levels), total sample size (3 levels), and prior probabilities (5 levels). The results indicated that the choice of effect size is primarily determined by the variance pattern and the prior probabilities. Some of the I indices performed well in certain small-sample cases, and the quadratic predictive discriminant analysis I tended to work well with extreme variance heterogeneity and differing prior probabilities.
    The Journal of Experimental Education 08/2013; 82(2). DOI:10.1080/00220973.2013.813359 · 1.09 Impact Factor
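The improvement-over-chance effect size I studied in this entry is commonly defined (in the Huberty tradition) as the observed hit rate's gain over chance, scaled by the maximum possible gain. The sketch below computes that definition; the function name and the example numbers are illustrative assumptions, not values from the study.

```python
# Hedged sketch of an improvement-over-chance effect size as commonly
# defined: I = (H_o - H_e) / (1 - H_e), where H_o is the observed hit
# rate and H_e the hit rate expected by chance (e.g., from the priors).
def improvement_over_chance(observed_hit_rate, chance_hit_rate):
    # Fraction of the possible above-chance improvement actually achieved.
    return (observed_hit_rate - chance_hit_rate) / (1.0 - chance_hit_rate)

# Example: 80% observed accuracy against a 50% chance baseline.
print(round(improvement_over_chance(0.80, 0.50), 3))
```

Because I rescales by the chance baseline, it depends directly on the priors, which is consistent with the abstract's finding that prior probabilities drive the choice among the I indices.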
  • Source
    • "[34,47,58,67,68]. However, controversy still prevails regarding the effects on classifiers' performance of different combinations of predictors, data assumptions, sample sizes, and parameter tuning [16,17,31,58,69,70]. Different applications with different data sets (both real and simulated) have failed to produce a classifier that ranks best in all applications, as shown in the studies by Michie et al. [71] (the StatLog project, with 23 different classifiers evaluated on 22 real data sets), Lim et al. [72] (33 classifiers evaluated on 16 real data sets), and Meyer et al. [34] (24 classifiers, available in the R software, evaluated on 21 data sets)."
    ABSTRACT: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures for Mild Cognitive Impairment (MCI), but presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines, and Random Forests, can improve the accuracy, sensitivity, and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID, and QUEST Classification Trees, and Random Forests) were compared with three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve, and Press's Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of the classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Press's Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90); however, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73), and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66), and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or below a median value of 0.5. Taking sensitivity, specificity, and overall classification accuracy into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested for prediction of dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity, and specificity of dementia predictions from neuropsychological testing.
    BMC Research Notes 08/2011; 4(1):299. DOI:10.1186/1756-0500-4-299
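The 5-fold cross-validation comparison this entry describes can be sketched in a few lines. The example below is in the spirit of the study but uses synthetic data (not the neuropsychological tests) and scikit-learn implementations of a subset of the classifiers; all data and parameter choices are illustrative assumptions.

```python
# Illustrative sketch, assuming scikit-learn: 5-fold cross-validation of
# traditional classifiers (LDA, QDA, LR) vs. Random Forest on synthetic
# data, summarized by median accuracy as in the study's reporting style.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic 2-class problem standing in for the 10 neuropsychological tests.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           n_redundant=0, random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold CV accuracy
    print(f"{name}: median accuracy {np.median(scores):.2f}")
```

A full replication would additionally compute sensitivity, specificity, and the area under the ROC curve per fold, and compare the resulting distributions with Friedman's test, as the authors did.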