Comparison of logistic regression and linear discriminant analysis: a simulation study

01/2004; 1:143-161.

ABSTRACT Two of the most widely used statistical methods for analyzing categorical outcome variables are linear discriminant analysis (LDA) and logistic regression (LR). While both are appropriate for the development of linear classification models, LDA makes more assumptions about the underlying data. It is therefore commonly assumed that LR is the more flexible and more robust method when these assumptions are violated. In this paper we consider the problem of choosing between the two methods and set some guidelines for making that choice. The comparison is based on several measures of predictive accuracy, and the performance of the methods is studied by simulation. We start with an example where all the assumptions of LDA are satisfied and observe the impact of changes in the sample size, the covariance matrix, the Mahalanobis distance and the direction of the distance between group means. Next, we compare the robustness of the methods to categorisation and non-normality of the explanatory variables in a closely controlled way. We show that the results of LDA and LR are close whenever the normality assumptions are not too badly violated, and set some guidelines for recognizing these situations. We discuss the inappropriateness of LDA in all other cases.
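The baseline scenario described in the abstract (two groups, all LDA assumptions satisfied) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' simulation design: the sample sizes, the identity covariance matrix, the Mahalanobis distance of 2, the seed, and all function names are hypothetical choices, and logistic regression is fitted here by plain gradient ascent rather than by whatever routine the paper used.

```python
import math
import random

def simulate(n_per_group, delta, rng):
    """Two bivariate normal groups, identity covariance (assumed), means delta apart on axis 1."""
    x0 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_group)]
    x1 = [(rng.gauss(delta, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_group)]
    return x0, x1

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_lda(x0, x1):
    """Return (w, b); classify as group 1 when w.x + b > 0 (equal priors assumed)."""
    m0, m1 = mean(x0), mean(x1)
    sxx = sxy = syy = 0.0
    for pts, m in ((x0, m0), (x1, m1)):
        for p in pts:
            dx, dy = p[0] - m[0], p[1] - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(x0) + len(x1) - 2          # pooled within-group covariance
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix
    w = ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x0, x1, steps=1000, lr=0.5):
    """Maximum-likelihood logistic regression by batch gradient ascent (illustrative only)."""
    data = [(p, 0.0) for p in x0] + [(p, 1.0) for p in x1]
    n = len(data)
    w0 = w1 = b = 0.0
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for (px, py), y in data:
            err = y - sigmoid(w0 * px + w1 * py + b)
            g0 += err * px
            g1 += err * py
            gb += err
        w0 += lr * g0 / n
        w1 += lr * g1 / n
        b += lr * gb / n
    return (w0, w1), b

def accuracy(w, b, x0, x1):
    """Share of correctly classified test points under the linear rule w.x + b > 0."""
    correct = sum(1 for p in x0 if w[0] * p[0] + w[1] * p[1] + b <= 0)
    correct += sum(1 for p in x1 if w[0] * p[0] + w[1] * p[1] + b > 0)
    return correct / (len(x0) + len(x1))

def compare(seed=0, n_train=200, n_test=2000, delta=2.0):
    """Fit both models on one training sample and score them on one test sample."""
    rng = random.Random(seed)
    train0, train1 = simulate(n_train, delta, rng)
    test0, test1 = simulate(n_test, delta, rng)
    acc_lda = accuracy(*fit_lda(train0, train1), test0, test1)
    acc_lr = accuracy(*fit_logistic(train0, train1), test0, test1)
    return acc_lda, acc_lr

if __name__ == "__main__":
    print(compare())
```

Under these assumed settings both rules estimate nearly the same linear boundary, so their test accuracies should be close to each other. Replacing the Gaussian draws in `simulate` with skewed or categorised variables is one way to probe the robustness question the abstract raises.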

  • ABSTRACT: In this paper, we report a comparison of seven nonparametric classifiers (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) with Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression, tested in a real data application: the conversion of elderly patients with mild cognitive impairment to dementia. When classification results are compared on overall accuracy, specificity and sensitivity, Linear Discriminant Analysis and Random Forests rank first among all the classifiers.
  • ABSTRACT: Citation: Rahman K, Bowen A, Muhajarine N (2014) Examining the Factors that Moderate and Mediate the Effects on Depression during Pregnancy and Postpartum. J Preg Child Health 1: 116. doi:10.4172/jpch.1000116
    Copyright: © 2014 Rahman K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Introduction: Maternal depression encompasses a spectrum of depressive conditions that can affect expectant mothers and those up to twelve months postpartum [1]. Estimates of antenatal and postpartum depression in the general population range from 12 to 20% [2,3]. Antenatal depression is a relatively new area of study compared to postpartum depression, and the depth and sophistication of this research are still developing. Studies have found that the prevalence of antenatal depression could be higher than that of postpartum depression [4]. We also noted a higher prevalence of antenatal depression (14.1% in early pregnancy; 10.4% in late pregnancy) than of postpartum depression (8.1%) [5]. Maternal depression has both immediate and longer-term consequences. Mothers with depression may have diminished capacity for self-care, as well as for the care of their infants [6]. They reported more sleep disturbances and anxiety [7]. They were likely to have less frequent antenatal care [6] and reduced optimal fetal monitoring during pregnancy [8]. Antenatal depression was associated with preterm delivery [9], lower birth weight, and small size for gestational age [10]. Studies have found maternal postpartum depression to hamper a child's cognitive, emotional, and social development in infancy and early childhood [11-13].
    Given the high prevalence and serious consequences of antenatal and postpartum depression, a review of the empirical literature revealed a range of antecedent risk factors, but very little on the specific role of those risk factors, for example whether they moderate or mediate depression. Studies examining the mediating or moderating role of antecedent risk factors in relation to antenatal and postpartum depression are relatively rare in epidemiological research. A mediator is defined as an intermediate variable that accounts for the relationship between a predictor and an outcome variable [14]. Mediators attempt to describe 'why' and 'how' effects occur [14]. In behavioral research, psychosocial variables such as social support and self-efficacy are often hypothesized to play mediating roles [15]. Moderator variables, on the other hand, specify the conditions under which a variable exerts its effect, such as ethnicity and gender [14]. Moderators attempt to describe 'when' and in 'whom' effects may occur. Understanding the mediating and moderating role of risk factors in predicting maternal depression could not only help explain the mechanism of depression, but also greatly enable us to intervene to minimize the harmful effects of depression by focusing on certain factors or on certain patient groups. We hypothesized that socio-demographic factors such as younger maternal age, Aboriginal ethnicity, low education, low income, and single mother status will increase the depression status in late pregnancy and early postpartum.
    Abstract. Background: This research report will address the knowledge gap in understanding the role of risk factors as moderators or mediators to explain the variability in the magnitude of exposure and the causal pathway for antenatal and postpartum depression.
  • ABSTRACT: Landslide susceptibility zonation mapping helps researchers understand the spatial distribution of slope failure probability in a region. Such maps are extremely useful in reducing landslide hazards and can be produced using both qualitative and quantitative methods. In the present study, a multivariate statistical method, logistic regression, was used to assess landslide susceptibility in the Hashtchin region, situated west of the Alborz Mountains in northwest Iran. Two kinds of independent (predictor) variables, categorical and continuous, were used together in the model. Aerial photographs, field studies and topographic maps were used to identify the region’s landslides. A geographic information system (GIS) was used to prepare the database of factors affecting the region’s landslides and to determine landslide zones. Using this information, landslide susceptibility modeling was carried out. The data on factors causing landslides were extracted as independent variables in each cell (50 m × 50 m cells) and then entered into SPSS, Version 18. The prepared database was analyzed using logistic regression with the forward stepwise method, based on maximum likelihood estimation. The regression equation was determined from the estimated constants and coefficients, and the computed landslide susceptibility of the area's grid cells (pixels) ranged between 0 and 0.9954. The Receiver Operating Characteristic (ROC) curve was used to assess the accuracy of the logistic regression model. The predictive ability of the model, measured by the area under the ROC curve, was 84.1%. Finally, the degree of success of landslide susceptibility zonation mapping was estimated to be 79%.
    Journal of the Geological Society of India 07/2014; 84(1):68-86. DOI:10.1007/s12594-014-0111-5 · 0.51 Impact Factor
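
The area under the ROC curve, reported in the last abstract as the measure of predictive ability, can be computed directly from predicted probabilities and observed outcomes. The sketch below uses the rank-based (Mann-Whitney) formulation; the function name `roc_auc` and the toy scores are hypothetical and are not taken from any of the listed studies.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (made-up scores): three of the four positive/negative pairs
# are ordered correctly.
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation would be the usual choice for large grids of cells.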
