Comparing Linear Discriminant Function With Logistic Regression for the Two-Group Classification Problem

The Journal of Experimental Education (Impact Factor: 1.09). 01/1999; 67(3):265-286. DOI: 10.1080/00220979909598356

ABSTRACT The performances of predictive discriminant analysis (PDA) and logistic regression (LR) for the 2-group classification problem were compared. The authors used a fully crossed 3-factor experimental design (sample size, group proportions, and equal or unequal covariance matrices) and 2 data patterns. When the 2 groups had equal covariance matrices, PDA and LR performed comparably for the conditions of both equal and unequal group proportions. When the 2 groups had unequal covariance matrices (4:1, as implemented in this study) and very different group proportions, PDA and LR differed somewhat with regard to the classification error rates of the 2 groups, but the classification error rates of the 2 methods for the total sample remained comparable. Sample size played a relatively minor role in the classification accuracy of the 2 methods, except when LR was used under relatively small sample-size conditions.
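The kind of design the abstract describes (sample size crossed with group proportions and equal or unequal covariance matrices) can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure. The 4:1 covariance ratio comes from the abstract, but the number of predictors, the mean separation `delta`, the sample sizes, the group proportions, and the replication count are all assumptions chosen for illustration. The two data patterns used in the study are not reproduced. Both classifiers are written in plain NumPy, and the function names are hypothetical.

```python
import numpy as np

def simulate(n, prop1, cov_ratio, rng, p=2, delta=1.0):
    """Draw two Gaussian groups; group 1 has covariance cov_ratio * I (assumed setup)."""
    n1 = max(2, int(round(n * prop1)))
    n0 = n - n1
    X0 = rng.normal(0.0, 1.0, size=(n0, p))
    X1 = rng.normal(delta, np.sqrt(cov_ratio), size=(n1, p))
    return np.vstack([X0, X1]), np.r_[np.zeros(n0), np.ones(n1)]

def fit_lda(X, y):
    """Linear discriminant rule with pooled covariance and sample-proportion priors."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    b = -0.5 * w @ (m0 + m1) + np.log(len(X1) / len(X0))
    return w, b

def fit_lr(X, y, iters=25, ridge=1e-6):
    """Logistic regression by Newton's method; the tiny ridge term only aids stability."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-np.clip(A @ beta, -30, 30)))
        hess = A.T @ (A * (prob * (1 - prob))[:, None]) + ridge * np.eye(A.shape[1])
        beta += np.linalg.solve(hess, A.T @ (y - prob) - ridge * beta)
    return beta[1:], beta[0]

def error_rates(w, b, X, y):
    """Return (group 0 error, group 1 error, total error) for the rule w.x + b > 0."""
    pred = (X @ w + b > 0).astype(float)
    return np.array([np.mean(pred[y == 0] != 0),
                     np.mean(pred[y == 1] != 1),
                     np.mean(pred != y)])

def run_study(reps=50, seed=0, n_test=4000):
    """Fully crossed design: sample size x group proportion x covariance ratio."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in (60, 200):                    # assumed sample sizes
        for prop1 in (0.5, 0.1):           # assumed group proportions
            for cov_ratio in (1.0, 4.0):   # equal vs 4:1 covariance
                acc = {"lda": [], "lr": []}
                for _ in range(reps):
                    Xtr, ytr = simulate(n, prop1, cov_ratio, rng)
                    Xte, yte = simulate(n_test, prop1, cov_ratio, rng)
                    acc["lda"].append(error_rates(*fit_lda(Xtr, ytr), Xte, yte))
                    acc["lr"].append(error_rates(*fit_lr(Xtr, ytr), Xte, yte))
                results[(n, prop1, cov_ratio)] = {
                    m: np.mean(v, axis=0) for m, v in acc.items()}
    return results

if __name__ == "__main__":
    for cond, by_method in run_study(reps=20).items():
        print(cond, {m: np.round(e, 3) for m, e in by_method.items()})
```

Each printed row gives the per-group and total error rates for one cell of the design, so the separate group error rates can be compared against the total, which is the distinction the abstract draws for the unequal-covariance, unequal-proportion conditions.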

  • Source
    • "In addition, prior probabilities, which are the proportions of group members that exist in the populations, also affect the classification results of BDA and LR. For instance, Fan and Wang (1999) and Lei and Koehly (2003) compared the classification error rates of LDA and LR by using the Monte Carlo simulation under different prior probabilities in the binary cases. Consequently, both methods are very applicable in debris flow prediction and worthy of study."
    ABSTRACT: In this study, the high risk areas of Sichuan Province with debris flow, Panzhihua and Liangshan Yi Autonomous Prefecture, were taken as the studied areas. By using rainfall and environmental factors as the predictors and based on the different prior probability combinations of debris flows, the prediction of debris flows was compared in the areas with statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). The results through the comprehensive analysis show that (a) with the mid-range scale prior probability, the overall predicting accuracy of BDA is higher than those of LR; (b) with equal and extreme prior probabilities, the overall predicting accuracy of LR is higher than those of BDA; (c) the regional predicting models of debris flows with rainfall factors only have worse performance than those introduced environmental factors, and the predicting accuracies of occurrence and nonoccurrence of debris flows have been changed in the opposite direction as the supplemented information.
    Geomorphology 11/2013; DOI:10.1016/j.geomorph.2013.06.003 · 2.79 Impact Factor
  • Source
    • "However, adjustments for prior probabilities are unlikely to affect classification results if the sample groups are very distinct because a minimal number of cases will be near the "borderlines" between the groups (Klecka, 1980, p. 47). Fan and Wang (1999) also suggested that the size of a prior may affect classification error rates for predicting each class, but not for predicting both classes. They indicated that the decision between using equal versus nonequal priors seems to be less relevant when researchers are concerned with the overall predictive accuracy."
    ABSTRACT: The authors examined the distributional properties of 3 improvement-over-chance, I, effect sizes each derived from linear and quadratic predictive discriminant analysis and from logistic regression analysis for the 2-group univariate classification. These 3 classification methods (3 levels) were studied under varying levels of data conditions, including population separation (3 levels), variance pattern (3 levels), total sample size (3 levels), and prior probabilities (5 levels). The results indicated that the decision of which effect size to choose is primarily determined by the variance pattern and prior probabilities. Some of the I indices performed well for some small sample cases and quadratic predictive discriminant analysis I tended to work well with extreme variance heterogeneity and differing prior probabilities.
    The Journal of Experimental Education 08/2013; 82(2). DOI:10.1080/00220973.2013.813359 · 1.09 Impact Factor
  • Source
    • "Logistic regression is a more appropriate technique for Credit scoring cases [12]. Fan and Wang [13] and Sautory et al. [14] recommend the use of the binary logistic regression in Credit scoring cases, when discriminant analysis application conditions are not obtainable. This choice becomes imperative if qualitative variables get involved in the model [15]."
    ABSTRACT: The credit scoring risk management is a fast growing field due to consumer's credit requests. Credit requests, of new and existing customers, are often evaluated by classical discrimination rules based on customers information. However, these kinds of strategies have serious limits and don't take into account the characteristics difference between current customers and the future ones. The aim of this paper is to measure credit worthiness for non customers borrowers and to model potential risk given a heterogeneous population formed by borrowers customers of the bank and others who are not. We hold on previous works done in generalized discrimination and transpose them into the logistic model to bring out efficient discrimination rules for non customers' subpopulation. Therefore we obtain seven simple models of connection between parameters of both logistic models associated respectively to the two subpopulations. The German credit data set is selected as the experimental data to compare the seven models. Experimental results show that the use of links between the two subpopulations improve the classification accuracy for the new loan applicants. Index Terms: Logistic model, Gaussian discrimination, Subpopulation links, Credit scoring, Subpopulations mixture
    Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011, part of the IEEE Symposium Series on Computational Intelligence 2011, April 11-15, 2011, Paris, France; 01/2011