Metodološki zvezki, Vol. 1, No. 1, 2004, 143-161

Comparison of Logistic Regression and Linear Discriminant Analysis: A Simulation Study

Maja Pohar¹, Mateja Blas², and Sandra Turk³

Abstract

Two of the most widely used statistical methods for analyzing

categorical outcome variables are linear discriminant analysis and logistic

regression. While both are appropriate for the development of linear

classification models, linear discriminant analysis makes more assumptions

about the underlying data. Hence, it is assumed that logistic regression is

the more flexible and more robust method in case of violations of these

assumptions. In this paper we consider the problem of choosing between the

two methods, and set some guidelines for proper choice. The comparison

between the methods is based on several measures of predictive accuracy.

The performance of the methods is studied by simulations. We start with an

example where all the assumptions of the linear discriminant analysis are

satisfied and observe the impact of changes regarding the sample size,

covariance matrix, Mahalanobis distance and direction of distance between

group means. Next, we compare the robustness of the methods towards

categorisation and non-normality of explanatory variables in a closely

controlled way. We show that the results of LDA and LR are close

whenever the normality assumptions are not too badly violated, and set

some guidelines for recognizing these situations. We discuss the

inappropriateness of LDA in all other cases.

¹ Department of Medical Informatics, University of Ljubljana; maja.pohar@mf.uni-lj.si
² Postgraduate student of Statistics, University of Ljubljana; mateja.blas@guest.arnes.si
³ Sandra Turk, Krka d.d., Novo mesto; sandra.turk@krka.biz

1 Introduction

Linear discriminant analysis (LDA) and logistic regression (LR) are widely used multivariate statistical methods for analysis of data with categorical outcome variables. Both of them are appropriate for the development of linear classification models, i.e. models associated with linear boundaries between the groups.

Nevertheless, the two methods differ in their basic idea. While LR makes no assumptions on the distribution of the explanatory data, LDA has been developed for normally distributed explanatory variables. It is therefore reasonable to expect LDA to give better results when the normality assumptions are fulfilled, whereas in all other situations LR should be more appropriate. The theoretical properties of LR and LDA are thoroughly dealt with in the literature; in practice, however, the choice of method often has more to do with the field of statistics in which one works than with whether the assumptions are actually fulfilled.

The goal of this paper is not to discourage the current practice but rather to set some guidelines as to when the choice of either one of the methods is still appropriate. While LR is much more general and has a number of theoretical properties, LDA must be the better choice if we know the population is normally distributed. In practice, however, the assumptions are nearly always violated, and we have therefore checked the performance of both methods with simulations. This kind of research demands careful control, so we have decided to study just a few chosen situations, looking for a pattern in the behaviour of the methods before considering how it extends to more general cases. We have confined ourselves to comparing only the predictive power of the methods.

The article is organized as follows. Section 2 briefly reviews LR and LDA and

explains their graphical representation. Section 3 details the criteria chosen to

compare both methods. Section 4 describes the process of the simulations. The

results obtained are presented and discussed in Section 5, starting with the case where all the assumptions of LDA are fulfilled and continuing with cases where normality is violated in the sense of categorization and skewness. It is shown how

violation of the assumptions of LDA affects both methods and how robust the

methods are. The paper concludes with some guidelines for the choice between the

models and a discussion.

2 Logistic regression and linear discriminant analysis

The goal of LR is to find the best fitting and most parsimonious model to describe

the relationship between the outcome (dependent or response variable) and a set of

independent (predictor or explanatory) variables. The method is relatively robust,

flexible and easily used, and it lends itself to a meaningful interpretation. In LR,

unlike in the case of LDA, no assumptions are made regarding the distribution of

the explanatory variables.

Contrary to popular belief, both methods can be applied to more than two categories (Hosmer and Lemeshow, 1989, p. 216). To simplify, we focus only on the case of a dichotomous outcome variable (Y). The LR model can be expressed as

$$P(Y_i = 1 \mid X_i) = \frac{e^{\beta^T X_i}}{1 + e^{\beta^T X_i}} \tag{2.1}$$

where the Yi are independent Bernoulli random variables. The coefficients of this

model are estimated using the maximum likelihood method. LR is discussed

further by Hosmer and Lemeshow (1989).
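As an illustration of the maximum likelihood estimation mentioned above, the following minimal sketch fits model (2.1) with an intercept and a single explanatory variable by Newton's method. The function name `fit_logistic`, the toy data and the fixed number of iterations are assumptions made for this example; the paper does not state which fitting routine was used.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood fit of P(Y=1|x) = 1 / (1 + exp(-(b0 + b1*x)))
    by Newton's method (intercept plus one explanatory variable)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g with the closed-form 2 x 2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data with overlapping groups, so that the estimate exists.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
```

At the maximum likelihood estimate the score vector is zero, so the fitted probabilities sum to the number of observed ones. If the two groups were perfectly separated the estimates would diverge, which is the convergence problem the paper mentions for large Mahalanobis distances.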

Linear discriminant analysis can be used to determine which variable

discriminates between two or more classes, and to derive a classification model for

predicting the group membership of new observations (Worth and Cronin, 2003).

For each of the groups, LDA assumes the explanatory variables to be normally

distributed with equal covariance matrices. The simplest LDA has two groups. To

discriminate between them, a linear discriminant function that passes through the

centroids of the two groups can be used. LDA is discussed further by Kachigan

(1991). The standard LDA model assumes that the conditional distribution of X|y

is multivariate normal with mean vector µy and common covariance matrix Σ.

With some algebra we can show that the probability of assigning x to group 1 is

$$P(1 \mid x) = \frac{1}{1 + e^{-(\alpha + \beta^T x)}} \tag{2.2}$$

where the coefficients α and β are

$$\beta = \Sigma^{-1}(\mu_1 - \mu_0), \qquad \alpha = -\log\frac{\pi_0}{\pi_1} - \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0) \tag{2.3}$$

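π1 and π0 are the prior probabilities of belonging to group 1 and group 0. In practice the parameters π1, π0, µ1, µ0 and Σ will be unknown, so we replace them by their sample estimates, i.e.:

$$\begin{aligned}
\hat\pi_1 &= \frac{n_1}{n}, \qquad \hat\pi_0 = \frac{n_0}{n}, \\
\hat\mu_1 &= \bar{x}_1 = \frac{1}{n_1}\sum_{y_i = 1} x_i, \qquad \hat\mu_0 = \bar{x}_0 = \frac{1}{n_0}\sum_{y_i = 0} x_i, \\
\hat\Sigma &= \Big[\sum_{y_i = 1}(x_i - \bar{x}_1)(x_i - \bar{x}_1)^T + \sum_{y_i = 0}(x_i - \bar{x}_0)(x_i - \bar{x}_0)^T\Big] / n
\end{aligned} \tag{2.4}$$

where n1 and n0 are the group sizes and n = n1 + n0.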

Equation (2.2) has the same form as the LR model. Hence, the two methods do not differ in functional form; they differ only in how the coefficients are estimated.
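To make the plug-in estimation concrete, here is a small sketch for two explanatory variables. It assumes the standard LDA form β = Σ⁻¹(µ1 − µ0) and α = log(π1/π0) − ½(µ1 + µ0)ᵀβ, with the pooled covariance matrix divided by n as in (2.4). The function names and the toy data are hypothetical.

```python
import math


def lda_coefficients(group1, group0):
    """Plug-in estimates of alpha and beta for two explanatory variables.
    Each group is a list of (x1, x2) pairs."""
    n1, n0 = len(group1), len(group0)
    n = n1 + n0

    def mean(points):
        return [sum(p[k] for p in points) / len(points) for k in (0, 1)]

    m1, m0 = mean(group1), mean(group0)
    # Pooled within-group covariance matrix, divided by n.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((group1, m1), (group0, m0)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for a in (0, 1):
                for b in (0, 1):
                    s[a][b] += d[a] * d[b] / n
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    beta = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    total = (m1[0] + m0[0], m1[1] + m0[1])
    # log(pi1 / pi0) with the priors estimated by n1/n and n0/n.
    alpha = math.log(n1 / n0) - 0.5 * (total[0] * beta[0] + total[1] * beta[1])
    return alpha, beta


def lda_probability(x, alpha, beta):
    """Estimated probability of group 1, in the logistic form of (2.2)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta[0] * x[0] + beta[1] * x[1])))


# Hypothetical toy data: two unit squares of points.
group0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group1 = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
alpha, beta = lda_coefficients(group1, group0)
midpoint_probability = lda_probability((1.5, 1.5), alpha, beta)
```

With equal group sizes the midpoint between the two centroids lies on the border, so its estimated probability is 0.5.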

2.1 Graphical representation: An explanation

When the values of α and β are known, the expression for a set of points with equal probability of allocation can be derived as

$$\frac{e^{\alpha + \beta^T x}}{1 + e^{\alpha + \beta^T x}} = 0.5 \;\Rightarrow\; 0 = \alpha + \beta^T x \tag{2.5}$$

In two-dimensional perspective this set of points is a line, while in three dimensions it is a plane.

Figure 1 shows the scatterplot for two explanatory variables. Each of the two groups is plotted with a different character. The linear borders presented are calculated on the basis of the estimates of each method. The ellipses indicate the distributions assumed by the LDA.

[Figure 1: scatterplot of x2 against x1 with the borders labelled "logistic regression" and "discriminant analysis"]

Figure 1: The linear borders between the groups for LR (solid) and LDA (dotted line).
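In two dimensions the border defined by (2.5) can be written explicitly as a line, x2 = −(α + β1 x1) / β2. The short sketch below evaluates it; the function name and the coefficient values are illustrative assumptions, not estimates from the paper.

```python
def boundary_x2(x1, alpha, beta):
    """Return x2 on the border alpha + beta[0]*x1 + beta[1]*x2 = 0."""
    if beta[1] == 0:
        raise ValueError("the border is vertical: x1 = -alpha / beta[0]")
    return -(alpha + beta[0] * x1) / beta[1]


# Illustrative coefficients (hypothetical values).
alpha, beta = -24.0, (8.0, 8.0)
x2_on_border = boundary_x2(1.0, alpha, beta)
```

Drawing this line once with the LR estimates and once with the LDA estimates gives the kind of comparison shown in Figure 1.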

3 Comparison criteria

The simplest and the most frequently used criterion for comparison between the two methods is classification error (percent of incorrectly classified objects; CE). However, classification error is a very insensitive and statistically inefficient measure (Harrell, 1997). The classification error is usually nearly the same in both methods, but, when differences exist, they are often overestimated (for example, if the threshold for “yes” is 0.50, a prediction of 0.99 rates the same as one of 0.51). The classification error is least informative in the case of categorical explanatory variables. The boundary lines in Figures 2a and 2b differ approximately equally in coefficients, but the classification errors provide different information. In Figure 2a, one of the possible outcomes lies in the area where the lines are different, and therefore the predictions will differ in all objects with this outcome. In contrast, the area between the lines in Figure 2b covers none of the possible outcomes. The classification error therefore does not reveal any difference.

[Figure 2: two scatterplot panels (2a and 2b) with axes x1 and x2, axis ticks 1 to 5]

Figure 2a and 2b: Examples of categorised explanatory variables.

Since more information is needed regarding the predictive accuracy of the

methods than just a binary classification rule, Harrell and Lee (1985) proposed

four different measures of comparing predictive accuracy of the two methods.

These measures are indexes A, B, C and Q. They are better and more efficient

criteria for comparisons and they tell us how well the models discriminate between

the groups and/or how good the prediction is. Theoretical insight and experience with simulations have shown that some indexes are more appropriate than others under different assumptions. In this work, we focus on three measures of predictive accuracy, the B, C and Q indexes. Because of its intuitive clarity, we sometimes add the classification error (CE) as well.

The C index is purely a measure of discrimination (discrimination refers to the

ability of a model to discriminate or separate values of Y). It is written as follows

$$C = \sum_{\substack{i = 1 \\ Y_i = 0}}^{n_0} \sum_{\substack{j = 1 \\ Y_j = 1}}^{n_1} \Big[ I(P_j > P_i) + \tfrac{1}{2} I(P_j = P_i) \Big] / (n_0 n_1) \tag{3.1}$$

where Pk denotes an estimate of P(Yk=1|Xk) from (2.1) and I is an indicator

function.

We can see that the value of the C index depends only on how the predicted probabilities of the two groups are ordered, not on how close they are to the actual group membership (Y). As such it is only a measure of discrimination between the groups, and not a measure of accuracy of prediction. A C index of 1 indicates perfect discrimination; a C index of 0.5 indicates random prediction.
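The definition in (3.1) translates directly into a double loop over all pairs formed by one object from each group. The sketch below is a plain illustration; the function name `c_index` and the example values are assumptions.

```python
def c_index(probs, ys):
    """C index: share of (group 0, group 1) pairs in which the group 1
    object has the higher predicted probability; ties count one half."""
    p0 = [p for p, y in zip(probs, ys) if y == 0]
    p1 = [p for p, y in zip(probs, ys) if y == 1]
    total = 0.0
    for pi in p0:
        for pj in p1:
            if pj > pi:
                total += 1.0
            elif pj == pi:
                total += 0.5
    return total / (len(p0) * len(p1))


# Hypothetical predicted probabilities and actual group membership.
probs = [0.10, 0.40, 0.35, 0.80]
ys = [0, 0, 1, 1]
c = c_index(probs, ys)
```

Three of the four pairs are ordered correctly here, giving 0.75. Any order-preserving change of the probabilities leaves the index unchanged, which is why it measures discrimination only.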

The B and Q indexes can be used to assess the accuracy of the outcome

prediction. The B index measures an average of squared difference between an

estimated and actual value:

$$B = 1 - \sum_{i = 1}^{n} (P_i - Y_i)^2 / n \tag{3.2}$$

where Pi is the predicted probability that object i belongs to group 1, Yi is the actual group membership (1 or 0), and n is the total sample size of both groups. The values of the B index are on the interval [0,1], where 1 indicates perfect prediction. In the case of random prediction in two equally sized groups, the value of the B index is 0.75.

The Q index is similar to the B index and is also a measure of predictive

accuracy:

$$Q = 1 + \sum_{i = 1}^{n} \log_2\!\big(P_i^{Y_i} (1 - P_i)^{1 - Y_i}\big) / n \tag{3.3}$$

A score of 1 of the Q index indicates perfect prediction. A Q index of 0 indicates

random predictions, and values less than 0 indicate worse than random predictions.

When predicted probabilities of 0 or 1 exist, the Q index is undefined. The B, C

and Q indexes are discussed further by Harrell and Lee (1985).
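The two accuracy indexes are simple averages over the test sample, as the following sketch shows. The function names and example values are assumptions; the reference values 0.75 and 0 for random prediction are those stated in the text.

```python
import math


def b_index(probs, ys):
    """B index: one minus the mean squared difference between the
    predicted probability of group 1 and the actual membership."""
    return 1.0 - sum((p - y) ** 2 for p, y in zip(probs, ys)) / len(ys)


def q_index(probs, ys):
    """Q index: one plus the mean base-2 log of the probability assigned
    to the group actually observed. math.log2 raises ValueError when that
    probability is 0, matching the 'undefined' case in the text."""
    logs = [math.log2(p if y == 1 else 1.0 - p) for p, y in zip(probs, ys)]
    return 1.0 + sum(logs) / len(ys)


# Random prediction (probability 0.5 for everyone) in two equal groups.
ys = [0, 1, 0, 1]
b_random = b_index([0.5] * 4, ys)
q_random = q_index([0.5] * 4, ys)

# Hypothetical, fairly accurate predictions.
b_good = b_index([0.1, 0.9, 0.2, 0.8], ys)
q_good = q_index([0.1, 0.9, 0.2, 0.8], ys)
```

Unlike the C index, both measures change when the probabilities move closer to or further from the observed outcomes, even if their ordering stays the same.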

While the C index is purely a measure of discrimination, the B and Q indexes

(besides discrimination) also consider accuracy of prediction. Hence, we can

expect these two indexes to be the most sensitive measures in our simulations.

Instead of comparing the indexes directly, we will often focus only on the

proportion of simulations in which LR predicts better than LDA. As we always

perform 50 simulations, this proportion will be statistically significant whenever it

lies outside the interval [0.36, 0.64].
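One way to arrive at an interval of this kind is sketched below. It assumes a two-sided 5% test of the hypothesis that LR is better in half of the simulations, using the normal approximation to the binomial distribution; the paper does not state how its interval was computed, so this is only a plausible reading.

```python
import math


def null_interval(n_sims, z=1.96):
    """Range of 'LR better' proportions compatible with a true proportion
    of 0.5, using the normal approximation to the binomial."""
    half_width = z * math.sqrt(0.25 / n_sims)
    return 0.5 - half_width, 0.5 + half_width


low, high = null_interval(50)
```

For 50 simulations the bounds round to 0.36 and 0.64, in agreement with the interval quoted above.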

4 Description of the Simulations

4.1 Basic function

The basic function enables us to draw random samples of size n and m from two multivariate normal populations with different mean vectors but an equal covariance matrix Σ. The mean vector of one group is always set at (0,0). The distance to the other one is measured using the Mahalanobis distance, while the direction is set as the angle (denoted by υ) to the direction of the eigenvector of the covariance matrix.

Each sample is then randomly divided into two parts, a training and a test sample. The coefficients of LDA and LR are computed using the first sample and then predictions are made in the second one. The sampling experiment is replicated 50 times. Each time the indexes for both methods are computed. Finally, the average value of the indexes and the proportion of simulations in which LR performs better are recorded.
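The sampling step of the basic function can be sketched as follows for two variables with unit variances and correlation r. The authors worked in R and their code is not given, so everything here is an assumption: the function names, the choice of the eigenvector with the larger eigenvalue as the reference direction, and the Cholesky-based generator.

```python
import math
import random


def group_mean(m_dist, angle, r):
    """Mean of the second group at Mahalanobis distance m_dist from (0, 0),
    at the given angle to the first eigenvector of [[1, r], [r, 1]]."""
    lam1, lam2 = 1.0 + r, 1.0 - r            # eigenvalues
    root = math.sqrt(0.5)
    e1, e2 = (root, root), (root, -root)      # unit eigenvectors
    c, s = math.cos(angle), math.sin(angle)
    direction = (c * e1[0] + s * e2[0], c * e1[1] + s * e2[1])
    # Rescale so that mu' Sigma^{-1} mu equals m_dist squared.
    scale = m_dist / math.sqrt(c * c / lam1 + s * s / lam2)
    return (scale * direction[0], scale * direction[1])


def mahalanobis(mu, r):
    """Mahalanobis distance of mu from the origin under [[1, r], [r, 1]]."""
    q = (mu[0] ** 2 - 2.0 * r * mu[0] * mu[1] + mu[1] ** 2) / (1.0 - r * r)
    return math.sqrt(q)


def draw_sample(size, mean, r, rng):
    """Draw bivariate normal points with the given mean and correlation r."""
    points = []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, r], [r, 1]] applied to (z1, z2).
        points.append((mean[0] + z1,
                       mean[1] + r * z1 + math.sqrt(1.0 - r * r) * z2))
    return points


rng = random.Random(1)
mu1 = group_mean(1.25, math.pi / 4, 0.5)
sample0 = draw_sample(50, (0.0, 0.0), 0.5, rng)
sample1 = draw_sample(50, mu1, 0.5, rng)
```

Each replication would then split the pooled data into training and test parts, fit both methods on the former and compute the indexes on the latter.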

4.2 Categorization

After sampling, the normally distributed variables can be categorised, either only

one or both of them. The minimum and maximum value are computed, then the

whole interval is divided into a certain number of categories of equal size.
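A minimal sketch of this equal-width categorisation is given below. Coding the categories as integers 0 to k − 1 is an assumption made for the example; the paper does not say which values the categories receive.

```python
def categorise(values, k):
    """Split the range [min, max] of the values into k intervals of equal
    width and return the interval index (0 .. k-1) of each value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no interval to divide")
    width = (hi - lo) / k
    # The maximum lies on the upper edge, so it goes into the last category.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical measurements split into four categories.
codes = categorise([0.0, 0.9, 1.0, 2.5, 3.9, 4.0], 4)
```

With k = 2 every variable becomes binary, which is the extreme case discussed in Section 5.2.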

4.3 Skewness

As in the case of categorization, we can also decide here to transform only one of the two explanatory variables or both of them. A Box-Cox type of transformation (Box and Cox, 1964) is used to make the normal distribution skewed.
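The sketch below shows the standard Box-Cox power transformation together with a moment-based coefficient of skewness. How the authors applied the transformation to normal variables, which can be negative, is not described, so the restriction to positive inputs and the example values are assumptions.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation needs positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical symmetric data, transformed with two different exponents.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [box_cox(v, 2.0) for v in symmetric]
left_skewed = [box_cox(v, 0.0) for v in symmetric]
```

Exponents above 1 stretch the upper tail and give positive skewness, while exponents below 1 (including the logarithm at 0) give negative skewness.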

4.4 Remarks

To ensure clarity of the graphical representation, we have confined ourselves to a two-dimensional perspective, i.e. two explanatory variables. We have nevertheless made some simulations in more dimensions, and the trends of the results seemed to follow the same pattern.

In most of the simulations we have also set an upper limit for the Mahalanobis distance, in order to prevent LR from failing to converge and LDA from giving unreliable results.

To simplify, we have fixed the two group sizes as the same. Unequally sized groups (or unequal a priori probabilities in LR) only shift the border line closer to the smaller group (the one with the less probable outcome); this affects only the constant, while the coefficient estimates remain the same.

All the simulations and computations were performed using the statistical software package R.

5 Results

5.1 Comparison of methods when LDA assumptions are satisfied

We start from the situation where both explanatory variables are normally distributed. We observe the impact of changes connected with the parameters: sample size, covariance matrix, Mahalanobis distance and direction of distance between the group means.

The sample size has the most obvious impact on the difference between methods. LDA assumes normality and the errors it makes in prediction are only due to the errors in estimation of the mean and variance on the sample. On the contrary, LR adapts itself to distribution and assumes nothing about it. Therefore, in the case of small samples, the difference between the distribution of the training sample and that of the test sample can be substantial. But, as the sample size increases, the sampling distributions become more stable, which leads to better results for LR. Consequently, the results of the two methods get closer, because the populations are normally distributed.

Table 1: Simulation results for the effect of sample size (n).

             B                 C                 Q                 CE
n         LR      LDA       LR      LDA       LR      LDA       LR      LDA
40      0.7747  0.7861    0.7190  0.7199    0.0489  0.1089    0.1785  0.1700
60      0.7846  0.7925    0.7405  0.7405    0.1029  0.1334    0.1693  0.1647
100     0.7939  0.7993    0.7593  0.7590    0.1313  0.1541    0.1591  0.1527
200     0.7967  0.7982    0.7536  0.7537    0.1456  0.1514    0.1593  0.1585
1000    0.8008  0.8011    0.7609  0.7609    0.1595  0.1608    0.1550  0.1543

The proportion of simulations in which LR performs better

             B                  C                  Q                  CE
n       LR better  same    LR better  same    LR better  same    LR better  same
40        0.18     0.00      0.36     0.18      0.14     0.00      0.24     0.32
60        0.20     0.00      0.36     0.28      0.20     0.00      0.36     0.28
100       0.20     0.00      0.48     0.16      0.22     0.00      0.26     0.18
200       0.24     0.00      0.48     0.08      0.24     0.00      0.36     0.24
1000      0.26     0.00      0.62     0.00      0.30     0.00      0.32     0.18

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ = π/4

[Figure 3: three scatterplot panels with axes x1 and x2]

Figure 3: The impact of sample size of n=50 (left), n=100 (middle) and n=200 (right).

The results from Table 1 confirm the consideration above. As the sample size increases, the LDA coefficient estimates become more accurate and therefore all four indexes improve (in the original tables, bold face highlights the method that performs better). The LR indexes increase even faster, thus approaching those of LDA. The decreasing difference between the two methods is best shown by the Q index, which is the most sensitive one. As the differences between index means are negligible, it is also interesting to look at the proportion of simulations where LR performs better. The proportions to which we pay special attention, those for the B index and the Q index, increase steadily with the sample size.

In the case of other changes (Tables 2 and 3) the results of the two methods remain very close; in fact LDA is only slightly better than LR. The exception appears in the case of large Mahalanobis distance presented in Table 4. We can see that for low values of Mahalanobis distance LDA yields better results, but as this distance increases and takes values above 2, LR performs better.

Table 2: Simulation results for the effect of correlation between explanatory variables (σ).

             B                 C                 Q                 CE
σ         LR      LDA       LR      LDA       LR      LDA       LR      LDA
0       0.7938  0.7979    0.7536  0.7533    0.1340  0.1499    0.1623  0.1587
0.20    0.7909  0.7967    0.7490  0.7495    0.1215  0.1456    0.1629  0.1587
0.50    0.7925  0.7965    0.7497  0.7498    0.1291  0.1456    0.1601  0.1580
0.90    0.7961  0.7990    0.7568  0.7567    0.1403  0.1535    0.1575  0.1561

The proportion of simulations in which LR performs better

             B                  C                  Q                  CE
σ       LR better  same    LR better  same    LR better  same    LR better  same
0         0.20     0.00      0.54     0.12      0.26     0.00      0.30     0.22
0.20      0.12     0.00      0.32     0.12      0.18     0.00      0.20     0.36
0.50      0.20     0.00      0.44     0.12      0.20     0.00      0.34     0.22
0.90      0.20     0.00      0.46     0.18      0.26     0.00      0.32     0.30

Parameters: υ = π/4, m=n=50

Table 3: Simulation results for the effect of direction of distance between group means (υ).

             B                 C                 Q                 CE
υ         LR      LDA       LR      LDA       LR      LDA       LR      LDA
0       0.7928  0.7969    0.7502  0.7501    0.1322  0.1475    0.1629  0.1609
π/4     0.7957  0.7989    0.7548  0.7547    0.1392  0.1524    0.1579  0.1565
π/3     0.7991  0.8029    0.7642  0.7645    0.1491  0.1644    0.1511  0.1480
π/2     0.7966  0.8012    0.7620  0.7619    0.1428  0.1613    0.1579  0.1569

The proportion of simulations in which LR performs better

             B                  C                  Q                  CE
υ       LR better  same    LR better  same    LR better  same    LR better  same
0         0.18     0.00      0.44     0.14      0.16     0.00      0.28     0.36
π/4       0.30     0.00      0.44     0.26      0.36     0.00      0.34     0.18
π/3       0.22     0.00      0.40     0.14      0.28     0.00      0.24     0.34
π/2       0.22     0.00      0.36     0.30      0.24     0.00      0.32     0.30

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, m=n=50

Table 4: Simulation results for the effect of Mahalanobis distance (M).

             B                 C                 Q                 CE
M         LR      LDA       LR      LDA       LR      LDA       LR      LDA
0.50    0.7687  0.7697    0.6769  0.6767    0.0525  0.0554    0.1889  0.1871
1.00    0.7947  0.7985    0.7552  0.7551    0.1331  0.1512    0.1606  0.1569
1.25    0.8014  0.8067    0.7741  0.7747    0.1568  0.1799    0.1486  0.1458
2.00    0.8305  0.8315    0.8372  0.8374    0.2612  0.2650    0.1241  0.1224
3.00    0.8570  0.8557    0.8857  0.8860    0.3575  0.3492    0.1026  0.0975
4.50    0.8922  0.8816    0.9310  0.9305    0.4994  0.4398    0.0756  0.0747

The proportion of simulations in which LR performs better

             B                  C                  Q                  CE
M       LR better  same    LR better  same    LR better  same    LR better  same
0.50      0.46     0.00      0.46     0.22      0.52     0.00      0.38     0.26
1.00      0.24     0.00      0.42     0.24      0.28     0.00      0.30     0.30
1.25      0.20     0.00      0.20     0.22      0.16     0.00      0.22     0.38
2.00      0.56     0.00      0.36     0.28      0.60     0.00      0.28     0.30
3.00      0.60     0.00      0.38     0.22      0.70     0.00      0.26     0.24
4.50      0.90     0.00      0.42     0.08      0.90     0.00      0.26     0.40

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ = π/4, m=n=50

To sum up, we can say that in the case of normality LDA yields better results

than LR. However, for very large sample sizes the results of the two methods

become really close.

5.2 The effect of categorisation

The effect of categorisation is studied under the assumption that the explanatory

variables are in fact normally distributed, but measured only discretely. This

means they only have a limited number of values or categories. When the number

of categories is big enough not to disturb the accuracy of the estimates, the

categorisation will not cause any changes in our results. But when the values are

forced into just a few categories, we can expect more discrepancies.

All the simulations in this section are performed in the following way: First,

the values of the indexes for LR and LDA are calculated for the samples from the

normally distributed population. We start from the situation, where the LDA

performs better as shown in the previous section (in the tables, these results are

denoted with ∞). These samples are then categorised into a certain number of

categories and the indexes are again calculated and compared.

As expected, the effect of the categorisation depends somewhat on the data

structure (the correlation among the variables), but nevertheless, in all the

simulations similar trends can be observed.

Linear discriminant analysis proves to be rather robust. Its prediction power is

not much lower when the values are in 5 or more categories, and it usually

performs better than LR. The story changes when the number of categories is low,

and LR is the only appropriate choice in the binary case.

The effect of categorisation also depends on the significance of the effect of a

certain explanatory variable on the outcome. This is understandable – a

nonsignificant variable will not change the model if transformed. On the other

hand, if two covariates, equally powerful when predicting the result, are

categorised, each of them will have a similar impact on the result.

[Figure 4: three scatterplot panels (4a, 4b and 4c) with axes x1 and x2]

Figure 4a, 4b and 4c: The basic situations used in the study. The ellipses describe the

distributions within the groups.

We have studied the impact of categorisation in two extreme cases and one intermediate case. Figure 4 presents the situations that were the basis of our simulations. Figure 4a presents two uncorrelated explanatory variables with a similar impact on the outcome. In Figure 4b only one of the variables is significant, while in Figure 4c the covariates are correlated and both have a significant but different impact on the outcome variable.

Table 5a summarizes the results of the situation shown in Figure 4c. The upper part of this table contains the Q indexes for the case in which both covariates are categorised. It can be seen that the categorisation into only two categories severely lowers the predictive power of the two variables (the Q index falls close to zero) and that this effect is greater with LDA. For better clarity, the lower part of this table concentrates only on the proportion of the simulations in which LR performs better (with regard to index Q) and compares these results with the categorisation of only one variable at a time. In the binary case LR outperforms LDA in the majority of simulations. As discussed above, this effect is greater when we categorise the more significant variable (x2) and even more so when we categorise both explanatory variables.

The results summed up in Table 5b are similar. The effect of both x1 and x2 is similar and therefore the trends are even more comparable. However, logistic regression is not truly better even in the two-category case. That is probably due to the large “head start” of LDA. When categorising both covariates the advantages of LR are again more obvious.

Table 5a: Simulation results for different number of categories (Figure 4c).

                           Q
Num. of categ.     LR       LDA      LR better
2                0.0712   0.0579       0.88
3                0.0891   0.0839       0.78
4                0.1084   0.1076       0.58
5                0.1267   0.1281       0.46
10               0.1467   0.1505       0.18
∞                0.1553   0.1595       0.20

The proportion of simulations in which LR performs better (Q index)

Num. of categ.     x1       x2       Both
2                 0.58     0.70      0.88
3                 0.50     0.44      0.78
4                 0.36     0.36      0.58
5                 0.30     0.26      0.46
∞                 0.20     0.20      0.20

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ=0, m=n=200

Table 5b: Simulation results for different number of categories (Figure 4a).

The proportion of simulations in which LR performs better (Q index)

Num. of categ.     x1       x2       Both
2                 0.48     0.40      0.74
3                 0.28     0.24      0.46
4                 0.26     0.24      0.32
5                 0.24     0.26      0.24
∞                 0.26     0.26      0.26

Parameters: $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, υ=π/4, m=n=200

Table 5c clearly shows the absence of any effect on the result when we categorise an insignificant variable (x1). The results in the second and the third column are practically the same, because categorising only the x2 variable is the same as categorising both.

Table 5c: Simulation results for different number of categories (Figure 4b).

The proportion of simulations in which LR performs better (Q index)

Num. of categ.     x1       x2       Both
2                 0.20     0.78      0.76
3                 0.18     0.48      0.48
4                 0.22     0.34      0.34
5                 0.20     0.30      0.30
∞                 0.26     0.20      0.20

Parameters: $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, υ=0, m=n=200

If the study of the categorisation effect is done by taking smaller samples, the

advantages of LDA are greater (see the previous section). Therefore they do not

tail off even in the case of a small number of categories. Table 5d presents the

results of an identical situation as in the lower part of Table 5a, but the sample

size is shrunk to 100 units.

Table 5d: Simulation results for different number of categories (Figure 4c).

The proportion of simulations in which LR performs better (Q index)

Num. of categ.     x1       x2       Both
2                 0.42     0.24      0.54
3                 0.34     0.32      0.42
4                 0.24     0.22      0.26
5                 0.24     0.30      0.26
∞                 0.22     0.22      0.20

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ=0, m=n=100

The results in this table tend to vary a bit. Too small a sample size, and at the

same time a small number of outcomes, causes the results to be unreliable. This is

even more obvious when the Mahalanobis distance is increased, because LR often

has problems with convergence.

5.3 The effect of non-normality

In the case of categorical explanatory variables above, the assumption of normality has been preserved and only the consequences of discrete measurement have been studied. Now, we are interested in the robustness of LDA when the normality assumptions are not met and in how much better LR can be in these cases. As non-normality is a very broad term, we have confined ourselves to transforming normal distributions with a Box-Cox transformation and thus making them skewed.

Again we begin with the three situations shown in Figure 4 and transform them into what is shown in Figure 5.

[Figure 5: three scatterplot panels (5a, 5b and 5c) with axes x1 and x2]

Figure 5a, 5b and 5c: Examples of right skewed distributions (to make groups more

discernible, a part of the convex hull has been drawn for each of them).

Table 6a: Simulation results for different degree of skewness (Figures 4c, 5c).

                   Q
CS*        LR       LDA      LR better
-0.5     0.3149   0.2969       0.88
-0.4     0.2685   0.2610       0.78
-0.2     0.2262   0.2259       0.60
-0.1     0.1885   0.1920       0.28
0.1      0.1269   0.1293       0.44
0.2      0.1025   0.1007       0.60
0.4      0.0648   0.0494       0.88
0.5      0.0505   0.0267       0.96

*Coefficient of skewness

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ=0, m=n=200

The performance of LDA and LR does not depend on the sign of the skewness. Therefore we have used the same transformation function to check the impact of the extent of separation of the groups at the same time. Right skewness thus also means less separated groups. This is obvious in Table 6a, as index Q is constantly decreasing.

To be able to compare LR and LDA solely in terms of skewness we again focus on the proportion of simulations where LR does better. Tables 6b, 6c and 6d show the results for all the three cases we have described in Figures 4 and 5. The first two columns always show the results when only one of the two explanatory variables is skewed, while in the third column both are transformed.

The trends we can see are rather similar. When the skewness is small and therefore the distribution close to normal, LDA performs better. But when the skewness increases, LR becomes more and more consistently better.

Table 6b: Simulation results for different degree of skewness (Figures 4c, 5c).

The proportion of simulations in which LR performs better (Q index)

CS*        x1       x2       Both
-0.5      0.68     0.74      0.88
-0.4      0.50     0.44      0.78
-0.2      0.38     0.26      0.60
-0.1      0.24     0.24      0.28
0.1       0.28     0.28      0.44
0.2       0.38     0.42      0.60
0.4       0.52     0.54      0.88
0.5       0.58     0.64      0.96

*Coefficient of skewness

Parameters: $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, υ=0, m=n=200

If both explanatory variables are skewed, the highest value of skewness under which LDA is still more appropriate is about ±0.2. We can observe that these boundaries are the same regardless of the separation of the groups.

If only one of the covariates is asymmetric and the other one is left as normal, LDA is, as expected, more robust: the interval widens a bit and the trends again remain similar for positive and negative skewness. The same effect on robustness can be seen by lowering the sample size, as discussed in the previous sections.

Table 6d again shows that transforming insignificant variables has no impact on the results. However, it is impossible to control the simulations to the extent where we could say anything exact about how the boundaries depend on the significance of the variables.

Table 6c: Simulation results for different degree of skewness (Figures 4a, 5a).

The proportion of simulations in which LR performs better (Q index)

CS*        x1       x2       Both
-0.5      0.72     0.68      0.96
-0.4      0.54     0.58      0.80
-0.2      0.38     0.32      0.62
-0.1      0.16     0.18      0.30
0.1       0.16     0.20      0.32
0.2       0.28     0.28      0.56
0.4       0.50     0.40      0.86
0.5       0.58     0.50      0.94

*Coefficient of skewness

Parameters: $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, υ=π/4, m=n=200

Table 6d: Simulation results for different degree of skewness (Figures 4b, 5b).

The proportion of simulations in which LR performs better (Q index)

CS*        x1       x2       Both
-0.5      0.26     0.92      0.92
-0.4      0.26     0.84      0.84
-0.2      0.26     0.64      0.64
-0.1      0.26     0.32      0.32
0.1       0.26     0.38      0.38
0.2       0.26     0.62      0.60
0.4       0.26     0.92      0.92
0.5       0.26     0.98      0.98

*Coefficient of skewness

Parameters: $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, υ=0, m=n=200