
The Chi square test: an introduction


Volume 4 • Number 3 • November 1995
Abstract: The Chi square test is a statistical test
which measures the association between two
categorical variables. A working knowledge of tests of
this nature is important for the chiropractor and
osteopath in order to be able to critically appraise the
literature.
Key Indexing Terms: Chi square, chiropractic,
The constant collation of data in medical research
provides statisticians and researchers with various
types of data. The most recognizable of these is data
in a quantitative form. For example, measuring straight
leg raising (SLR) in subjects able to raise their legs
more than 0 degrees allows us to calculate the
average SLR for, say, two groups and perform a t-test.
Unfortunately, not all data are in this quantitative form.
For example, instead of measuring an individual's SLR
we may be interested in the patient's subjective
improvement (using just “Yes” or “No” responses)
after 2 types of treatment. Can we then calculate the
average improvement for each group and perform a
t-test? Is it possible to calculate the difference between
levels of improvement? Is it possible to calculate the
ratio of improvement?
The answer to all these questions, of course, is a
resounding ‘no’, and other methods need to be
employed. The most common method used to analyze
such data is the Chi Squared (χ²) test of association,
and the outline for the simplest scenario is given
below in table 1.
Table 1
                  Category II
                  1      2
Category I   1    a      b      n₁
             2    c      d      n₂
In words, the elements of the table are,
a = number of individuals who are of type 1 in
category I and type 1 in category II
b = number of individuals who are of type 1 in
category I and type 2 in category II
c = number of individuals who are of type 2 in
category I and type 1 in category II
d = number of individuals who are of type 2 in
category I and type 2 in category II
n₁ = the number of individuals who are of type 1
in category I
n₂ = the number of individuals who are of type 2
in category I
n = total number of individuals studied
To illustrate this, consider for example two groups of
patients with sciatica who undergo 6 weeks of spinal
manipulative therapy (SMT) or 6 weeks of intermittent
motorized traction (IMT). We wish to know whether
there is an association between improvement and the
type of treatment received for these sciatica patients.
In our example 190 patients receive IMT and 200
receive SMT. After 6 weeks we ask them whether
they have improved. For IMT, 95 reply ‘Yes’ and 95
reply ‘No’, and for SMT 45 reply ‘Yes’ and 155 reply
‘No’.
We can display this data in a 2×2 contingency
(frequency) table, shown in table 2.
Table 2
         Improved
         Yes    No     Total
IMT      95     95     190
SMT      45     155    200
Total    140    250    390
In our example our observations are categorical and
not quantitative, so our focus should move from means
to proportions. We now display the following table
(table 3) to explain.
Table 3
                  Category II
                  1      2
Category I   1    p₁     p₂
             2    q₁     q₂
p₁ = the proportion of individuals of type 1 in
category I who are of type 1 in category II
p₂ = the proportion of individuals of type 1 in
category I who are of type 2 in category II
q₁ = the proportion of individuals of type 2 in
category I who are of type 1 in category II
q₂ = the proportion of individuals of type 2 in
category I who are of type 2 in category II
Notice that p₁ + p₂ = 1 and q₁ + q₂ = 1. Thus p₁ and p₂
can be thought of as the way people who are of type 1 in
category I are distributed across category II, and q₁
and q₂ can be thought of as the way people who are of
type 2 in category I are distributed across category II.
In an earlier paper (1), it was stated that the statistical
hypothesis of interest is always that nothing happens
(the null hypothesis). This can be extended to this case
by testing the hypothesis that p₁ = q₁ and p₂ = q₂. That
is, the distribution of individuals across category II is
the same for all types of category I. In other words, the
distribution of individuals across category II is
independent of category I.
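As a rough illustration of these proportions, the sketch below estimates them from the observed counts of the sciatica example (95 ‘Yes’ and 95 ‘No’ for IMT, 45 ‘Yes’ and 155 ‘No’ for SMT, as tabulated in table 4). The function name row_proportions is our own label and does not come from the article.

```python
def row_proportions(row):
    """Return each cell of a row as a proportion of that row's total."""
    total = sum(row)
    return [count / total for count in row]

# Observed counts, ordered [Yes, No]
imt = [95, 95]    # type 1 in category I (IMT)
smt = [45, 155]   # type 2 in category I (SMT)

p1, p2 = row_proportions(imt)  # sample estimates for the first row
q1, q2 = row_proportions(smt)  # sample estimates for the second row

print(p1, p2)
print(q1, q2)
```

Each pair sums to 1 by construction. The null hypothesis says the two rows share the same underlying proportions; the test that follows asks whether the gap between the sample estimates is larger than chance alone would explain.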
To test this hypothesis, we need to compare what
would be expected if the hypothesis were true, against
what has actually been observed.
If we analyse our example above, we observed 140
patients who subjectively improved. This represents
140 out of the total 390 in the trial, or 36%. So, if
there is no association between treatment and
improvement (as hypothesised), then we would expect
36% of each treatment group to improve regardless of
treatment.
Therefore, using our example again,
36% of 190 = 68 on the IMT should improve, and
36% of 200 = 72 on the SMT should improve.
But what about the “no improvement” patients? We
observed 250 out of the 390 who did not improve (i.e.
64%). So, if there is no association between treatment
and improvement then we would expect 64% of both
treatment groups not to improve. That is,
64% of 190 = 122 on the IMT should not improve, and
64% of 200 = 128 on the SMT should not improve.
So our contingency table can be drawn thus (table 4),
where the figures in brackets are the expected
values.
Table 4
         Improved
         Yes        No          Total
IMT      95 (68)    95 (122)    190
SMT      45 (72)    155 (128)   200
Total    140        250         390
There exists a simple formula to calculate the expected
value for any cell in the above table.
Equation 1
Expected value = (Row total)×(Column total)/(Grand total)
For example, the expected number of individuals who
receive IMT and improve is,
190×140/390 = 68.2 ≈ 68
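Equation 1 can be applied to every cell at once. The following is a minimal sketch using only the Python standard library; the function name expected_counts is our own, and the observed counts are the ones shown in table 4.

```python
def expected_counts(observed):
    """Expected cell counts under no association (Equation 1):
    (row total) x (column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts, ordered [Yes, No]
observed = [[95, 95],    # IMT
            [45, 155]]   # SMT

expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The article rounds these values to whole numbers (68, 122, 72, 128). The unrounded values reproduce the observed row and column totals exactly.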
It should be noted that the expected cell frequencies
add up to the same row and column totals as the
observed frequencies. It should also be noted that the
expected cell frequencies are calculated under the null
hypothesis of no association between treatment and
improvement.
Having obtained these expected values, we now need
to compare them with what has actually been
observed. To do this, we calculate the χ² statistic,
which is shown below.
Equation 2
χ² = Σ (Observed - Expected)² / Expected
That is, take each observed value and subtract the
corresponding expected value. Square this result, and
divide by the corresponding expected value. Calculate
this quantity for each cell in the table, and add
the results together.
The calculations for the example above are shown
below in table 5.
Table 5
Obs    Exp    Obs-Exp    (Obs-Exp)²    (Obs-Exp)²/Exp
95     68     27         729           10.72
95     122    -27        729           5.98
45     72     -27        729           10.13
155    128    27         729           5.70
Thus, the value of χ² is 32.53.
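The same arithmetic can be sketched in a few lines of Python. The function name chi_square_statistic is our own. Note one assumption that differs from the table: the function keeps the expected counts unrounded, so it returns roughly 32.0 rather than 32.53. The second calculation uses the whole-number expected counts from table 5 and lands close to the article's figure; the remaining small gap comes from rounding each cell's contribution before adding.

```python
def chi_square_statistic(observed):
    """Sum of (Observed - Expected)^2 / Expected over every cell (Equation 2)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r_total, row in zip(row_totals, observed):
        for c_total, obs in zip(col_totals, row):
            exp = r_total * c_total / grand_total  # Equation 1, unrounded
            statistic += (obs - exp) ** 2 / exp
    return statistic

observed = [[95, 95],    # IMT: Yes, No
            [45, 155]]   # SMT: Yes, No
chi2 = chi_square_statistic(observed)

# Same sum, but with the whole-number expected counts used in table 5
rounded_expected = [[68, 122], [72, 128]]
chi2_rounded = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, rounded_expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi2, 2), round(chi2_rounded, 2))
```

Either way the statistic is far larger than the values usually needed to reject the null hypothesis for a 2×2 table, so the rounding does not change the conclusion.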
Inspection of the formula for χ² will show that the
value of χ² will be small when the null hypothesis is
true. This is due to the fact that expected values are
calculated under the assumption that the null
hypothesis is true, and that the term (Observed -
Expected) will be small if the observed data lies close
to the expected data. Alternatively, if the null
hypothesis is false, then the expected values will not
be close to the observed values, and the value of χ²
will be large.
The question to be addressed now is ‘How large
should χ² be to reject the null hypothesis?’
The value of χ² comes from a Chi Square distribution.
This distribution is defined by 1 parameter, which is
known as the degrees of freedom. The degrees of
freedom is dependent on the size of the table being
studied, and can be calculated using the following
simple formula.
Equation 3
Degrees of freedom = (# Rows - 1) × (# Columns - 1)
A Chi Squared distribution with 1 degree of freedom
is shown in figure 1.
Figure 1. Chi Squared distribution with 1 degree of freedom.
N.B. The range of the horizontal axis is 0 to ∞.
The p-value associated with our test (or any Chi
Squared test with a 2×2 table) is the area under the
curve and to the right of the calculated value of Chi
Squared. The area under the curve and to the right of
6.64 is less than 0.01 (or 1%). Since the calculated
value of Chi Squared is 32.53, it is clear that the
p-value is less than 0.01 (2). The conclusion is that we
reject the null hypothesis. That is, the proportion of
individuals who received IMT and improved is different
from the proportion of individuals who received SMT
and improved.
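For readers without statistical tables to hand, the tail area for 1 degree of freedom can be checked numerically. The sketch below relies on a standard identity that the article does not state: a Chi Squared variable with 1 degree of freedom is the square of a standard normal variable, so the area to the right of x equals erfc(√(x/2)). The function names are our own.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Equation 3: (# Rows - 1) x (# Columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_1df(chi2):
    """Area to the right of chi2 under the Chi Squared curve with 1 df.
    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z."""
    return math.erfc(math.sqrt(chi2 / 2.0))

df = degrees_of_freedom(2, 2)
p_at_cutoff = p_value_1df(6.64)    # tabulated 1% cut-off quoted in the text
p_example = p_value_1df(32.53)     # statistic from the sciatica example

print(df, p_at_cutoff < 0.01, p_example < 0.01)
```

This only covers the 2×2 case. Tables with more rows or columns have more degrees of freedom and need a different tail calculation or a printed table such as reference (2).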
In many trials involving improvement, more than 2
levels of improvement are used. For example, let us
examine a comparison trial between spinal
manipulation with the use of hot packs (Trt 1) and
spinal manipulation with the use of cold packs (Trt 2)
for acute low back pain. For our improvement scale
we could use a 5 point categorical scale such as that
shown in table 6.
Table 6
         None   Mild                           Total
Trt 1    39     43     89     126    87        384
Trt 2    12     32     65     98     65        272
Total    51     75     154    224    152       656
The null hypothesis is that the distribution of
improvement is the same for both treatments.
Expected values need to be calculated first, and
equation 1 can be applied. The expected value for the
Trt 1/None cell is 384×51/656=29.85. For the Trt
1/Mild cell, 384×75/656=43.90 etc. Once all the
expected values are calculated, the value for Chi
Square can be computed (table 7).
Table 7
Obs    Exp       Obs-Exp    (Obs-Exp)²    (Obs-Exp)²/Exp
39     29.85     9.15       83.72         2.80
43     43.90     -0.90      0.81          0.02
89     90.15     -1.15      1.32          0.02
126    131.12    -5.12      26.21         0.20
87     88.98     -1.98      3.92          0.04
12     21.15     -9.15      83.72         3.96
32     31.10     0.90       0.81          0.03
65     63.85     1.15       1.32          0.02
98     92.88     5.12       26.21         0.28
65     63.02     1.98       3.92          0.06
Thus, the value of χ² is 7.43.
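The calculation generalises to any number of rows and columns. The sketch below, with a function name of our own choosing, applies equations 1 to 3 to the counts in table 6 and returns both the statistic and the degrees of freedom. It keeps expected values unrounded, so individual terms may differ slightly from those printed in table 7.

```python
def chi_square_with_df(observed):
    """Return (Chi Squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r_total, row in zip(row_totals, observed):
        for c_total, obs in zip(col_totals, row):
            exp = r_total * c_total / grand_total    # Equation 1
            statistic += (obs - exp) ** 2 / exp      # Equation 2
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # Equation 3
    return statistic, df

table6 = [[39, 43, 89, 126, 87],   # Trt 1
          [12, 32, 65, 98, 65]]    # Trt 2

statistic, df = chi_square_with_df(table6)
print(round(statistic, 2), df)
```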
Using equation 3, the degrees of freedom are
(2-1)×(5-1)=4. A Chi Square distribution with 4 degrees
of freedom is shown in figure 2.
Figure 2. Chi Squared distribution with 4 degrees of freedom.
The p-value is the area beneath the curve and to the
right of 7.43. This turns out to be 0.1148. If we use a
significance level of 0.05, then we do not reject the
null hypothesis. Therefore we have no evidence of a
difference between the two treatment outcomes. To
interpret this further, consider table 8, where the data
have been transformed into row percentages.
Table 8
         None     Mild     Noticeable
Trt 1    10.2%    11.2%    23.2%         32.8%    22.7%
Trt 2    4.4%     11.8%    23.9%         36.0%    23.9%
Strictly speaking, these distributions differ from each
other (10.2% ≠ 4.4%, 11.2% ≠ 11.8%, ..., 22.7% ≠ 23.9%).
However, when we consider the possibility of random
error being present in the data, we do not have enough
evidence to state that the differences observed are
indicative of a true underlying difference.
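The p-value of 0.1148 quoted for this example can be checked without tables. The sketch below uses a closed form that holds only when the degrees of freedom are even, a standard result that the article does not state: the area to the right of x is exp(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to df/2 - 1. The function name is our own.

```python
import math

def p_value_even_df(chi2, df):
    """Area to the right of chi2 for a Chi Squared distribution whose
    degrees of freedom are a positive even number."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = chi2 / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series

p = p_value_even_df(7.43, 4)
reject_null = p < 0.05   # significance level used in the text

print(round(p, 4), reject_null)
```

Because the p-value is above 0.05, the null hypothesis is not rejected, which matches the reading of table 8: the row percentages differ, but not by more than random error could plausibly produce.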
There are key assumptions which need to be adhered
to when using the χ² test. They are,
1. Each individual appears in the table once only.
2. The result for each individual is independent of
all other individuals.
3. The table of expected values should have 80% of
all expected values greater than 5.
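The third assumption is easy to check before running the test. The sketch below computes the share of expected values that exceed 5. The function name, the default threshold argument and the small second table are our own illustrative choices; the small table is hypothetical data, not taken from the article.

```python
def share_of_expected_above(observed, threshold=5):
    """Fraction of expected cell values that are greater than threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    return sum(e > threshold for e in expected) / len(expected)

# Sciatica example: every expected value is well above 5
share_example = share_of_expected_above([[95, 95], [45, 155]])

# Hypothetical small table: two of the four expected values are only 1.5
share_sparse = share_of_expected_above([[1, 9], [2, 8]])

print(share_example >= 0.8, share_sparse >= 0.8)
```

A table that fails this check is a warning that the Chi Squared approximation may be unreliable for that data.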
The chi-square test is a statistical test of association
between two categorical variables. It is used very
commonly in clinical research and a good
understanding of the test is useful for chiropractors
and osteopaths to be able to critically appraise the
literature.
1. Ugoni A. On the subject of hypothesis testing.
COMSIG Review, 1993; 2(2): 45-8.
2. Neave HR. Statistics Tables for Mathematicians,
Engineers, Economists, and the Behavioural and
Management Sciences. Unwin Hyman Ltd, 1988: