
COMSIG REVIEW

Volume 4 • Number 3 • November 1995

61

THE CHI SQUARE TEST

An introduction

ANTONY UGONI B.Sc. (Hons).¬

BRUCE F. WALKER D.C., M.P.H.†

Abstract: The Chi square test is a statistical test which measures the association between two categorical variables. A working knowledge of tests of this nature is important for the chiropractor and osteopath in order to be able to critically appraise the literature.

Key Indexing Terms: Chi square, chiropractic, osteopathy.

THE CHI SQUARE TEST

The constant collection of data in medical research provides statisticians and researchers with various types of data. The most recognizable of these is data in a quantitative form. For example, measuring straight leg raising (SLR) in degrees, in subjects able to raise their legs above 0 degrees, allows us to calculate the average SLR for, say, two groups and compare them with a t-test.

Unfortunately, not all data is in this quantitative form. For example, instead of measuring an individual's SLR, we may be interested in the patients' subjective improvement (using just “Yes” or “No” responses) after two types of treatment. Can we then calculate the average improvement for each group and perform a t-test? Is it possible to calculate the difference between levels of improvement? Is it possible to calculate the ratio of improvement?

The answer to all these questions, of course, is a resounding ‘no’, and other methods need to be employed. The most common method used to analyze such data is the Chi Squared ($\chi^2$) test of association, and the outline for the simplest scenario is given below in table 1.

¬ DEPARTMENT OF PUBLIC HEALTH & COMMUNITY MEDICINE, THE UNIVERSITY OF MELBOURNE, PARKVILLE, VIC 3052.

† PRIVATE PRACTICE, 33 WANTIRNA RD, RINGWOOD, VIC 3134. PH 03 879 5555.

Table 1

                        Category II
                        1        2
Category I      1       a        b        $n_1 = a + b$
                2       c        d        $n_2 = c + d$
                                          $n = n_1 + n_2$

In words, the elements of the table are:

a = number of individuals who are of type 1 in category I and type 1 in category II
b = number of individuals who are of type 1 in category I and type 2 in category II
c = number of individuals who are of type 2 in category I and type 1 in category II
d = number of individuals who are of type 2 in category I and type 2 in category II
$n_1$ = the number of individuals who are of type 1 in category I
$n_2$ = the number of individuals who are of type 2 in category I
$n$ = total number of individuals studied

To illustrate this, consider two groups of patients with sciatica who undergo either 6 weeks of spinal manipulative therapy (SMT) or 6 weeks of intermittent motorized traction (IMT). We wish to know whether there is an association between improvement and the type of treatment received for these sciatica patients. In our example, 190 patients receive IMT and 200 receive SMT. After 6 weeks we ask them whether they have improved. For IMT, 95 reply ‘Yes’ and 95 reply ‘No’, and for SMT, 45 reply ‘Yes’ and 155 reply ‘No’.

We can display this data in a 2×2 contingency (frequency) table, shown in table 2.

Table 2

                  Improved
              Yes          No           Total
IMT           95 (a)       95 (b)       190
SMT           45 (c)      155 (d)       200
Total        140          250           390
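In practice a table like table 2 is produced by counting individual responses. The sketch below assumes one hypothetical record per patient, a (treatment, response) pair; the records are regenerated from the counts in table 2 purely for illustration, and this is not presented as the authors' procedure.

```python
from collections import Counter

# Hypothetical per-patient records (treatment, improved), recreated from the
# counts in table 2 for illustration only. Each patient appears exactly once.
records = ([("IMT", "Yes")] * 95 + [("IMT", "No")] * 95
           + [("SMT", "Yes")] * 45 + [("SMT", "No")] * 155)

treatments = ["IMT", "SMT"]  # rows (category I)
responses = ["Yes", "No"]    # columns (category II)

# Count how many records fall into each (treatment, response) cell.
counts = Counter(records)
table = [[counts[(t, r)] for r in responses] for t in treatments]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
print(table)       # [[95, 95], [45, 155]]
print(row_totals)  # [190, 200]
print(col_totals)  # [140, 250]
```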


In our example, our observations are categorical and not quantitative, so our focus should move from means to proportions. We now display table 3 to explain.

Table 3

                        Category II
                        1          2
Category I      1       $p_1$      $p_2$
                2       $q_1$      $q_2$

where

$p_1$ = the proportion of individuals of type 1 in category I who are of type 1 in category II (that is, $a/n_1$)
$p_2$ = the proportion of individuals of type 1 in category I who are of type 2 in category II (that is, $b/n_1$)
$q_1$ = the proportion of individuals of type 2 in category I who are of type 1 in category II (that is, $c/n_2$)
$q_2$ = the proportion of individuals of type 2 in category I who are of type 2 in category II (that is, $d/n_2$)

Notice that $p_1 + p_2 = q_1 + q_2 = 1$. Thus $p_1$ and $p_2$ can be thought of as the way people who are of type 1 in category I are distributed across category II, and $q_1$ and $q_2$ can be thought of as the way people who are of type 2 in category I are distributed across category II.

In an earlier paper (1), it was stated that the statistical hypothesis of interest is always that nothing happens (the null hypothesis). This can be extended to this case by testing the hypothesis that $p_1 = q_1$ and $p_2 = q_2$. That is, the distribution of individuals across category II is the same for every type of category I. In other words, the distribution of individuals across category II is independent of category I.

To test this hypothesis, we need to compare what would be expected if the hypothesis were true against what has actually been observed.

If we analyse our example above, we observed 140 patients who subjectively improved. This represents 140 out of the total 390 in the trial, or 36%. So, if there is no association between treatment and improvement (as hypothesised), then we would expect 36% of each treatment group to improve regardless of management. Therefore, using our example again,

36% of 190 = 68 on IMT should improve, and
36% of 200 = 72 on SMT should improve.

But what about the patients who did not improve? We observed 250 out of the 390 who did not improve (that is, 64%). So, if there is no association between treatment and improvement, then we would expect 64% of both treatment groups not to improve. That is,

64% of 190 = 122 on IMT should not improve, and
64% of 200 = 128 on SMT should not improve.

So our contingency table can be drawn thus (table 4), where the figures in brackets are the expected frequencies.

Table 4

                  Improved
              Yes          No            Total
IMT           95 (68)      95 (122)      190
SMT           45 (72)     155 (128)      200
Total        140          250            390

There is a simple formula to calculate the expected value for any cell in the above table.

Equation 1

$$\text{Expected value} = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$

For example, the expected number of individuals who receive IMT and improve is

$$\frac{190 \times 140}{390} = 68.2 \approx 68$$

It should be noted that the expected cell frequencies add up to the same row and column totals as the observed frequencies. It should also be noted that the expected cell frequencies are calculated under the null hypothesis of no association between treatment and improvement.

Having obtained these expected values, we now need to compare them with what has actually been observed. To do this, we calculate the $\chi^2$ statistic, which is shown below.

Equation 2

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

That is, take each observed value and subtract the corresponding expected value. Square this result, and divide by the corresponding expected value. Calculate this quantity for each cell in the table, and add the results together.
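Equations 1 and 2 translate directly into a few lines of code. The sketch below is a minimal Python illustration that works for a table of observed counts of any size; the function names `expected_counts` and `chi_square` are our own, not from the article. It uses unrounded expected values (68.2, 121.8, 71.8, 128.2), so for table 2 it gives about 32.02, whereas the hand calculation in table 5, which first rounds the expected values to whole numbers, arrives at 32.53.

```python
def expected_counts(observed):
    """Equation 1: (row total) x (column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Equation 2: sum over all cells of (Observed - Expected)^2 / Expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Table 2: rows are IMT and SMT, columns are Yes and No.
table2 = [[95, 95],
          [45, 155]]

expected = expected_counts(table2)
print([[round(e, 1) for e in row] for row in expected])  # [[68.2, 121.8], [71.8, 128.2]]
print(round(chi_square(table2), 2))                      # 32.02
```

Either value is far above the cut-off discussed below, so the rounding does not change the conclusion.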


The calculations for the example above are shown below in table 5.

Table 5

Cell          Obs    Exp    Obs-Exp    (Obs-Exp)²    (Obs-Exp)²/Exp
IMT, Yes       95     68      27          729            10.72
IMT, No        95    122     -27          729             5.98
SMT, Yes       45     72     -27          729            10.13
SMT, No       155    128      27          729             5.70
Total                                                    32.53

Thus, the value of $\chi^2$ is 32.53.

Inspection of the formula for $\chi^2$ shows that the value of $\chi^2$ will tend to be small when the null hypothesis is true. This is because the expected values are calculated under the assumption that the null hypothesis is true, so the term (Observed - Expected) will be small when the observed data lie close to the expected data. Conversely, if the null hypothesis is false, the observed values will tend to lie far from the expected values, and the value of $\chi^2$ will be large.
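The argument above can be checked with two small hypothetical tables, invented for illustration and not study data: one in which both rows improve at the same rate, so observed and expected values coincide, and one showing complete association. The sketch recomputes equation 2 using expected values from equation 1.

```python
def chi_square(observed):
    """Equation 2, with each expected value taken from equation 1."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            total += (obs - exp) ** 2 / exp
    return total


# Hypothetical table: both rows improve at the same rate (40%),
# so every observed value equals its expected value.
no_association = [[20, 30],
                  [40, 60]]

# Hypothetical table: every row-1 individual improves, no row-2 individual does.
strong_association = [[50, 0],
                      [0, 100]]

print(chi_square(no_association))                # 0.0
print(round(chi_square(strong_association), 6))  # 150.0
```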

The question to be addressed now is ‘How large should $\chi^2$ be to reject the null hypothesis?’

Under the null hypothesis, the value of $\chi^2$ approximately follows a Chi Square distribution. This distribution is defined by a single parameter, known as the degrees of freedom. The degrees of freedom depend on the size of the table being studied, and can be calculated using the following simple formula.

Equation 3

$$\text{Degrees of freedom} = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$
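Equation 3 written as a small helper; a minimal sketch in which the function name is chosen for illustration. The 2×5 case corresponds to the second example later in this article.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Equation 3: (number of rows - 1) x (number of columns - 1)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 2))  # 1, as for tables 2 and 4
print(degrees_of_freedom(2, 5))  # 4, as for table 6
```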

A Chi Squared distribution with 1 degree of freedom is shown in figure 1.

Figure 1: Chi Squared distribution with 1 degree of freedom (horizontal axis plotted from 0 to 7; the full range of the horizontal axis is 0 → ∞).

The p-value associated with our test (or any Chi Squared test with a 2×2 table, which has 1 degree of freedom) is the area under the curve and to the right of the calculated value of Chi Squared. The area under the curve and to the right of 6.64 is less than 0.01 (or 1%). Since the calculated value of Chi Squared is 32.53, well beyond 6.64, the p-value is clearly less than 0.01 (2). The conclusion is that we reject the null hypothesis. That is, the proportion of IMT patients who improved is different from the proportion of SMT patients who improved.
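Reading the p-value from printed tables (2) can also be replaced by a direct calculation. The sketch below uses only Python's standard library and relies on two exact closed forms: for 1 degree of freedom the area to the right of x equals erfc(√(x/2)), and for an even number of degrees of freedom it is a short finite series, which covers the 4 degrees of freedom of the second example below. The function name is our own, and other degrees of freedom are deliberately not handled; for those, a statistics library such as SciPy (`scipy.stats.chi2.sf(x, df)`) would be the usual choice.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Area under the Chi Square curve to the right of x, i.e. the p-value.

    Only closed forms needing no special library are handled: df = 1
    (complementary error function) and even df (finite series).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # With 1 df, chi-square is the square of a standard normal Z,
        # so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2))
    if df >= 2 and df % 2 == 0:
        # With even df = 2m: exp(-x/2) * sum over j = 0..m-1 of (x/2)^j / j!
        half = x / 2
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("this sketch handles df = 1 or even df only")


print(chi2_upper_tail(6.64, 1) < 0.01)     # True
print(chi2_upper_tail(32.53, 1) < 0.01)    # True
print(round(chi2_upper_tail(7.43, 4), 4))  # 0.1148
```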

In many trials involving improvement, more than two levels of improvement are used. For example, let us examine a comparison trial between spinal manipulation with the use of hot packs (Trt 1) and spinal manipulation with the use of cold packs (Trt 2) for acute low back pain. For our improvement scale we could use a 5-point categorical scale such as that shown in table 6.

Table 6

           None   Mild   Noticeable   Definite   Complete   Total
Trt 1       39     43        89          126         87       384
Trt 2       12     32        65           98         65       272
Total       51     75       154          224        152       656

The null hypothesis is that the distribution of improvement is the same for both treatments. Expected values need to be calculated first, and equation 1 can be applied. The expected value for the Trt 1/None cell is 384×51/656 = 29.85. For the Trt 1/Mild cell, 384×75/656 = 43.90, and so on. Once all the expected values are calculated, the value for Chi Square can be computed (table 7).

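Table 7

Cell                 Obs    Exp      Obs-Exp   (Obs-Exp)²   (Obs-Exp)²/Exp
Trt 1, None           39    29.85      9.15      83.72         2.80
Trt 1, Mild           43    43.90     -0.90       0.81         0.02
Trt 1, Noticeable     89    90.15     -1.15       1.32         0.01
Trt 1, Definite      126   131.12     -5.12      26.21         0.20
Trt 1, Complete       87    88.98     -1.98       3.92         0.04
Trt 2, None           12    21.15     -9.15      83.72         3.96
Trt 2, Mild           32    31.10      0.90       0.81         0.03
Trt 2, Noticeable     65    63.85      1.15       1.32         0.02
Trt 2, Definite       98    92.88      5.12      26.21         0.28
Trt 2, Complete       65    63.02      1.98       3.92         0.06
Total                                                          7.43

Entries are rounded to two decimal places, so the printed values in the last column sum to 7.42; the total of the unrounded values is 7.43.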

Thus, the value of $\chi^2$ is 7.43.
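The second example can be reproduced in the same way as the first. This minimal sketch (function names are illustrative, not from the article) applies equation 1 to every cell of table 6 and sums the per-cell terms of equation 2; the expected values it produces should agree with table 7 to two decimal places.

```python
# Table 6: rows are Trt 1 and Trt 2; columns are None, Mild, Noticeable,
# Definite and Complete.
table6 = [[39, 43, 89, 126, 87],
          [12, 32, 65, 98, 65]]


def expected_counts(observed):
    """Equation 1 applied to every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_terms(observed):
    """(Obs - Exp)^2 / Exp for each cell, listed row by row as in table 7."""
    expected = expected_counts(observed)
    return [(o - e) ** 2 / e
            for obs_row, exp_row in zip(observed, expected)
            for o, e in zip(obs_row, exp_row)]


expected = expected_counts(table6)
terms = chi_square_terms(table6)
print([round(e, 2) for e in expected[0]])  # [29.85, 43.9, 90.15, 131.12, 88.98]
print(round(sum(terms), 2))                # 7.43
```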

Using equation 3, the degrees of freedom are (2-1)×(5-1) = 4. A Chi Square distribution with 4 degrees of freedom is shown in figure 2.


Figure 2: Chi Square distribution with 4 degrees of freedom (horizontal axis plotted from 0 to 14).

The p-value is the area beneath the curve and to the right of 7.43. This turns out to be 0.1148. If we use a significance level of 0.05, then we do not reject the null hypothesis. Therefore we do not have evidence of a difference between the two treatment outcomes. To interpret this further, consider table 8, where the data has been transformed into row percentages.

Table 8

           None     Mild     Noticeable   Definite   Complete
Trt 1      10.2%    11.2%      23.2%        32.8%      22.7%
Trt 2       4.4%    11.8%      23.9%        36.0%      23.9%

Strictly speaking, these distributions differ from each other (10.2% ≠ 4.4%, 11.2% ≠ 11.8%, ..., 22.7% ≠ 23.9%). However, when we consider the possibility of random error being present in the data, we do not have enough evidence to state that the differences observed are indicative of a true underlying difference.
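Table 8 is simply each cell of table 6 divided by its row total. A short sketch, assuming the table is held as a list of rows (the helper name is illustrative):

```python
# Table 6 counts: rows are Trt 1 and Trt 2; columns None to Complete.
table6 = [[39, 43, 89, 126, 87],
          [12, 32, 65, 98, 65]]


def row_percentages(observed):
    """Express each cell as a percentage of its own row total."""
    return [[100 * x / sum(row) for x in row] for row in observed]


for name, row in zip(["Trt 1", "Trt 2"], row_percentages(table6)):
    print(name, [round(p, 1) for p in row])
# Trt 1 [10.2, 11.2, 23.2, 32.8, 22.7]
# Trt 2 [4.4, 11.8, 23.9, 36.0, 23.9]
```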

There are key assumptions which must be satisfied when using the $\chi^2$ test. They are:

1. Each individual appears in the table once only.
2. The result for each individual is independent of all other individuals.
3. At least 80% of the values in the table of expected values should be 5 or more.
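Assumptions 1 and 2 concern how the data were collected and cannot be checked from the table alone, but assumption 3 can. The sketch below is a hypothetical helper, not a standard library function; it reads "greater than 5" as "at least 5", the usual form of this rule of thumb (use `>` instead of `>=` for the strict reading).

```python
def expected_counts(observed):
    """Equation 1 applied to every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def satisfies_expected_rule(observed, threshold=5.0, required_share=0.8):
    """Assumption 3 (assumed reading): at least 80% of expected values >= 5."""
    expected = [e for row in expected_counts(observed) for e in row]
    share = sum(e >= threshold for e in expected) / len(expected)
    return share >= required_share


print(satisfies_expected_rule([[95, 95], [45, 155]]))   # True (table 2)
print(satisfies_expected_rule([[39, 43, 89, 126, 87],
                               [12, 32, 65, 98, 65]]))  # True (table 6)
print(satisfies_expected_rule([[3, 1], [2, 4]]))        # False (hypothetical)
```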

CONCLUSION

The chi-square test is a statistical test of association between two categorical variables. It is used very commonly in clinical research, and a good understanding of the test is useful for chiropractors and osteopaths to be able to critically appraise the literature.

REFERENCES

1. Ugoni A. On the subject of hypothesis testing. COMSIG Review 1993; 2(2): 45-8.

2. Neave HR. Statistics Tables for Mathematicians, Engineers, Economists, and the Behavioural and Management Sciences. Unwin Hyman Ltd, 1988: 42-3.