When you have undertaken a Wilcoxon Signed Rank Test on a questionnaire that has been distributed to the same subjects on two different occasions...
And the scores on the questionnaire are on a Likert scale, e.g. 1 = none, 2 = rarely, 3 = a few times, 4 = often, 5 = very often: should you then run 'Descriptives' in SPSS to get the mean scores and see at which time point the scores were higher? I have just done this and got Mean = 3.24 (time point 1) and Mean = 3.70 (time point 2). Should I round the two numbers off to i) time point 1 mean = 3 (a few times) and ii) time point 2 mean = 4 (often)?
All Answers (36)
Technically you do not have to round off. But from what you are saying and how the variable is coded, it looks like you can, because each value stands for a category and the decimals may not have any meaning.
However, it is important to note that the mean may mask where exactly the change took place. You may also want to look at the score difference (time 1 score - time 2 score), so that you can identify what type of individuals changed and characterize how and where the change occurred. I have tested both ways, and in my experience I learned more by looking at the score difference.
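A minimal sketch of the score-difference idea, in Python rather than SPSS. The six pairs of scores are made up for illustration, and the sign convention (time 2 minus time 1, so a positive value means the score went up) is my assumption:

```python
from collections import Counter

# Hypothetical paired answers (1-5 Likert-type item) for six subjects.
time1 = [3, 2, 4, 3, 5, 1]
time2 = [4, 2, 5, 2, 5, 3]

# Per-subject change: positive means the score rose between the two occasions.
diffs = [t2 - t1 for t1, t2 in zip(time1, time2)]

# How many subjects moved by each amount.
change_counts = Counter(diffs)
for change in sorted(change_counts):
    print(f"change {change:+d}: {change_counts[change]} subject(s)")

increased = sum(d > 0 for d in diffs)
decreased = sum(d < 0 for d in diffs)
unchanged = sum(d == 0 for d in diffs)
print(f"increased={increased}, decreased={decreased}, unchanged={unchanged}")
```

Two samples with the same mean change can give very different tables here, which is the kind of detail the mean alone hides.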
Hope this helps.
John, thanks for your advice about rounding off to 1 decimal place, but I see Susie's point that rounding to 1 decimal place does not have much meaning (because of what the whole numbers represent on the corresponding Likert scale). Do you agree?
Susie, when you say to report the score difference instead, i.e. the 0.46 difference between 3.24 (time 1 score) and 3.70 (time 2 score), what do you mean by identifying what type of individuals changed and characterizing how and why the change occurred? I am not sure how to do this when all I have from them is their questionnaire answer (score on the Likert scale) at both time points. I have no other information about the subjects answering the questionnaire. Please excuse my silliness if this should be clear to me :)
Sergio, I think a histogram would be really useful. When you say to report the mode at each time point, do you mean gathering ALL participants' answers (71 participants in total) to that particular question and reporting the mode on a histogram? I am hoping to publish the results directly in a peer-reviewed manuscript, and this may take up a lot of room.
Susie, I have age and gender variables, they were children measured initially five years ago (time point one) and this year when they are all teenagers (time point two).
I suggest plotting your data as a bar chart (I apologize for my English: a histogram is used for a continuous variable), with frequency on the vertical axis and category on the horizontal axis, clustered by time point. Using all answers, you will estimate the modal value (= the value that appears most often) at each time point for each question. And you will make comments … for each question, or for a cluster of them if it applies.
You can do the same for each subject, to estimate his/her prevalent answer.
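To make the frequency and mode suggestion concrete, here is a rough Python sketch. The answers are invented, and the helper names (`frequency_table`, `modal_category`) are mine, not from any package. When two categories tie, this version reports the lowest one, which is only one possible convention.

```python
from collections import Counter

LABELS = {1: "none", 2: "rarely", 3: "a few times", 4: "often", 5: "very often"}

# Hypothetical answers to one question at the two occasions.
time1 = [3, 3, 2, 4, 3, 1, 5, 3]
time2 = [4, 4, 3, 4, 5, 2, 5, 4]

def frequency_table(scores):
    """Count of answers in each category 1..5 (zero if a category is unused)."""
    counts = Counter(scores)
    return [counts.get(category, 0) for category in sorted(LABELS)]

def modal_category(scores):
    """Most frequent category; ties resolve to the lowest category."""
    counts = Counter(scores)
    return max(sorted(counts), key=counts.get)

freq1 = frequency_table(time1)
freq2 = frequency_table(time2)
mode1 = modal_category(time1)
mode2 = modal_category(time2)

# Crude text version of the clustered bar chart.
for category in sorted(LABELS):
    i = category - 1
    print(f"{category} {LABELS[category]:<12} t1 {'#' * freq1[i]:<5} t2 {'#' * freq2[i]}")
print(f"mode at time 1: {mode1} ({LABELS[mode1]}), mode at time 2: {mode2} ({LABELS[mode2]})")
```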
If you want to evaluate a one-group pretest-posttest design, and if it makes sense to collapse your data into a dichotomous variable (scores 1-3 vs. scores 4 and 5, or as you prefer), you can apply the McNemar test.
For example: n subjects are administered a pretest on a dichotomous dependent variable. Following the pretest, all of the subjects are exposed to an experimental treatment, after which they are administered a posttest on the same variable. The hypothesis evaluated is whether or not there is a significant difference between the pretest and posttest scores of subjects on the dependent variable.
I have provided an example (see attachment).
In this case, and maybe in yours, you can see the change, estimate its direction, and apply a statistical test (McNemar test).
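As an illustration only, the collapsed comparison could be sketched in Python like this. The data are hypothetical, the cutoff (4 and 5 count as "high") is just the example split mentioned above, and I use the exact binomial form of the McNemar test on the discordant pairs rather than the chi-square approximation:

```python
from math import comb

def mcnemar_exact(before, after, cutoff=4):
    """Exact two-sided McNemar test after dichotomizing at `cutoff`.

    b = pairs going low -> high, c = pairs going high -> low.
    Under H0 each discordant pair is equally likely to go either way.
    """
    b = c = 0
    for x, y in zip(before, after):
        high_before, high_after = x >= cutoff, y >= cutoff
        if not high_before and high_after:
            b += 1
        elif high_before and not high_after:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, nothing to test
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical paired answers for eight subjects.
time1 = [3, 2, 4, 3, 5, 1, 3, 2]
time2 = [4, 4, 5, 2, 5, 4, 4, 3]

b, c, p_value = mcnemar_exact(time1, time2)
print(f"low->high: {b}, high->low: {c}, exact p = {p_value:.3f}")
```

Only the subjects who cross the cutoff contribute to the test, so a lot of information in the 1-5 scores is thrown away by collapsing.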
"During the past three months, my family or friends encouraged me not to eat unhealthy foods when I was tempted to do so". The participants have to give one score for their family and one score for their friends using the likert type scale.
They haven't undergone any intervention; I just want to see if there have been any changes in these influences as the children have aged (they are now all teenagers). I first delivered the questionnaire to them five years ago, when they were still in primary school. Do you still think that I am using and reporting the right means? Many thanks for your advice.
Likert-Type Data:
Central Tendency > Median or mode;
Variability > Frequencies;
Associations > Kendall tau B or C;
Other Statistics > Chi-square.
If you were combining four or more Likert-type items (e.g., four or more questions relating to the influence of family, as a measure of 'family influence'), then you could have proceeded with the following measures:
Likert Scale Data:
Central Tendency > Mean;
Variability > Standard deviation;
Associations > Pearson's r;
Other Statistics > ANOVA, t-test, regression.
Have a look at this website for more info: http://www.joe.org/joe/2012april/tt2.php
Get back with any more queries.
Combined with your original Wilcoxon signed rank test result, which says whether a significant shift in the location of the values has taken place, you can use the 5x5 table to identify where the shift took place (i.e., which off-diagonal cells have high frequencies).
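A small sketch of how such a 5x5 table can be built from the paired answers (rows = time 1, columns = time 2). The data are hypothetical; in SPSS the equivalent would be a Crosstabs of the two variables.

```python
# Hypothetical paired answers (1-5) for eight subjects.
time1 = [3, 2, 4, 3, 5, 1, 3, 2]
time2 = [4, 4, 5, 2, 5, 4, 4, 3]

# table[i][j] = number of subjects answering i+1 at time 1 and j+1 at time 2.
table = [[0] * 5 for _ in range(5)]
for t1, t2 in zip(time1, time2):
    table[t1 - 1][t2 - 1] += 1

print("        t2=1 t2=2 t2=3 t2=4 t2=5")
for i, row in enumerate(table, start=1):
    print(f"t1={i}  " + "".join(f"{count:5d}" for count in row))

# Off-diagonal cells, largest first: these show where the shift happened.
off_diagonal = sorted(
    ((table[i][j], i + 1, j + 1)
     for i in range(5) for j in range(5)
     if i != j and table[i][j] > 0),
    reverse=True,
)
for count, t1, t2 in off_diagonal:
    print(f"{count} subject(s) moved from {t1} to {t2}")

moved_up = sum(table[i][j] for i in range(5) for j in range(5) if j > i)
moved_down = sum(table[i][j] for i in range(5) for j in range(5) if j < i)
```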
The link you give deals with independence tests, at first glance.
If I understood Tara's question correctly, the marginal homogeneity test seems better suited (it compares the frequencies of each score on the Likert scale at the two times; any change detected by the test means a change in influences, but one can also imagine situations where the marginal frequencies are the same even though changes occurred, I guess). The tests for that are the Stuart-Maxwell test and the Bhapkar test.
I think the Wilcoxon test for paired samples is also suited to this problem, unless half of the changes go one way and half go the other way.
All these assuming I correctly understood the question...
Good discussion so far. I have one question though. You mentioned that at Time 1 the participants were children, and when they completed the survey at Time 2 they were teenagers. Is the instrument you used valid for both developmental ages? For example, how do small children understand the concept of "unhealthy foods", and how does this concept change over time? How does this impact the scores? Just a thought.
http://www2.sas.com/proceedings/forum2008/3822008.pdf
The best I could find quickly for doing the test in SPSS was a workaround, but I could be wrong and the newer SPSS may have the test (I use STATA so am not running it myself):
http://zomobo.net/McNemartest
I also found evidence for how to perform in STATA in the STATA bulletin...
http://www.stata.com/products/stb/journals/stb7.pdf
The command and output follow:
******
mcc3i 35 5 0 15 20 5 10 5 5

              Controls
Cases  |   A    B    C  | Total
-------+----------------+------
  A    |  35    5    0  |   40
  B    |  15   20    5  |   40
  C    |  10    5    5  |   20
-------+----------------+------
Total  |  60   30   10  |  100

3X3 Matched Case-Control Tests
Stuart-Maxwell Chi2 = 14.00   Pr>chi2(2) = 0.0009
Extend McNemar Chi2 = 15.00   Pr>chi2(3) = 0.0018
Fleiss-Everitt Chi2 = 12.86   Pr>chi2(1) Pre  = 0.0003
(ordered cells)               Pr>chi2(2) Post = 0.0016

Summary Differences Between Cases and Controls
Diff 1 = -20
Diff 2 = 10
Diff 3 = 10
********
My understanding is that only if you have a significant result from one of these can you proceed to interpret the contingency table, perhaps by Emmanuel's suggestion (Wilcoxon test for paired samples). I don't know offhand, but I would start there. Ideally you should plan a priori what you would want to look at if a significant Stuart-Maxwell or Bhapkar test were found. Also, be aware of the family-wise error rate when you do your planning, as each test performed increases it. Hopefully Emmanuel or Jeremy can clarify if what I've said is in error.
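For anyone without STATA, the Stuart-Maxwell statistic for the 3x3 example table above can be checked by hand. This is a sketch for exactly three categories (it inverts a 2x2 matrix explicitly), not a general routine; rows are cases and columns are controls, as in the mcc3i output. The p-value uses the closed form of the chi-square tail for 2 degrees of freedom.

```python
from math import exp

# The 3x3 table from the mcc3i example (rows = cases, columns = controls).
table = [[35, 5, 0],
         [15, 20, 5],
         [10, 5, 5]]

row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(3)) for j in range(3)]

# Differences between the margins; they always sum to zero.
d = [row_totals[i] - col_totals[i] for i in range(3)]

# Covariance matrix of the first two differences under marginal homogeneity.
v11 = row_totals[0] + col_totals[0] - 2 * table[0][0]
v22 = row_totals[1] + col_totals[1] - 2 * table[1][1]
v12 = -(table[0][1] + table[1][0])

# Quadratic form d' V^-1 d, written out for the 2x2 case.
det = v11 * v22 - v12 * v12
chi2 = (v22 * d[0] ** 2 - 2 * v12 * d[0] * d[1] + v11 * d[1] ** 2) / det

# Chi-square upper tail with 2 df has the closed form exp(-x / 2).
p_value = exp(-chi2 / 2)

print(f"margin differences: {d}")
print(f"Stuart-Maxwell chi2 = {chi2:.2f}, df = 2, p = {p_value:.4f}")
```

This reproduces the 14.00 and 0.0009 in the output, and the margin differences match the "Summary Differences" lines.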
The Chi-square test and the McNemar test won't be powerful any more for an ordinal variable with 5 categories.
Sometimes we want to study the reproducibility of the scores at time 1 and time 2, and not only the difference between means, and I think it's a good idea in your case: you can use the intraclass correlation coefficient, or the Spearman correlation since the scores are not normally distributed.
I do not think that the fact that you are interested in several questions, and not only one, precludes the use of the paired Wilcoxon test (aka Wilcoxon signed rank test) for each question individually. But it may raise some discussion about multiplicity corrections, as mentioned by Larry.
In addition, for each question that can have more than two answers (that is, all questions with your 1-5 Likert scale), an overall Bhapkar or Stuart-Maxwell test revealing something could, as described by Larry, be followed by more specific comparisons, for which multiplicity correction must also be discussed.
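To illustrate what a multiplicity correction does, here is a sketch of Holm's step-down adjustment. Nobody in this thread named a specific method, so Holm is my choice for the example (Bonferroni would be the simpler alternative), and the four raw p-values are invented, one per question.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Adjusted values must not decrease as the raw p-values grow.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted

# Hypothetical raw p-values from four per-question Wilcoxon tests.
raw = [0.004, 0.030, 0.020, 0.200]
adjusted = holm_adjust(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f} -> Holm-adjusted p = {q:.3f}")
```

Comparing the adjusted values with the usual 0.05 keeps the family-wise error rate at 5% across the whole set of questions.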
There exist some variants of the Stuart-Maxwell test that handle ordinal variables, so they should be comparable to Wilcoxon tests. You should also bear in mind that the Wilcoxon test and the Bhapkar test may not test exactly the same thing if you have questions with more than two answers:
- Wilcoxon tests will test for a (« median ») shift in the answers with time;
- the Bhapkar test will test for equality of the proportions of each answer at each time ("marginal homogeneity" of the contingency table).
One can have situations where the tests will give different results because of that. Imagine for instance that you have 3 patients. At t = 0, all of them answer "3" (on a 5-point scale). At t = 1, the answers are (2, 3, 4). The median shift is 0 ==> the Wilcoxon test should not detect anything. The frequencies of the answers change ==> the Bhapkar test should detect something (well, not with 3 patients, but this is the idea).
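The three-patient example can be worked through in a few lines. This sketch only computes the signed-rank sums and the marginal frequencies (no p-values, since 3 patients are far too few). Dropping zero differences and averaging tied ranks are the usual conventions for the signed rank test.

```python
from collections import Counter

before = [3, 3, 3]  # answers at t = 0
after = [2, 3, 4]   # answers at t = 1

# Signed-rank ingredients: non-zero differences, ranked by absolute size.
diffs = [a - b for b, a in zip(before, after) if a != b]
abs_sorted = sorted(abs(d) for d in diffs)

def average_rank(value):
    """Average of the 1-based positions that `value` occupies (handles ties)."""
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

w_plus = sum(average_rank(abs(d)) for d in diffs if d > 0)
w_minus = sum(average_rank(abs(d)) for d in diffs if d < 0)

# Marginal frequencies of each answer at the two times.
marginal_before = Counter(before)
marginal_after = Counter(after)

print(f"W+ = {w_plus}, W- = {w_minus}")
print(f"t = 0 frequencies: {dict(marginal_before)}")
print(f"t = 1 frequencies: {dict(marginal_after)}")
```

The positive and negative rank sums are equal, so there is no evidence of a shift, while the marginal frequencies are clearly different.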
Note also that if all patients swap their answers, neither test will detect anything, but maybe this is informative to you. This could be detected with agreement measures, so that is another set of techniques that may be of interest to you. The simplest way is to study the proportion of off-diagonal patients in your contingency tables.
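A toy version of the "everybody swaps" situation, with made-up answers, shows why the off-diagonal proportion is worth reporting alongside the tests:

```python
from collections import Counter

# Hypothetical case: every subject changes, but the changes cancel out.
before = [2, 4, 2, 4, 1, 5]
after = [4, 2, 4, 2, 5, 1]

# Same answer frequencies at both times, so marginal homogeneity holds exactly.
same_marginals = Counter(before) == Counter(after)

# Increases and decreases balance, so there is no net shift either.
net_shift = sum(a - b for b, a in zip(before, after))

# Share of subjects off the diagonal of the time 1 x time 2 table.
changed = sum(b != a for b, a in zip(before, after))
off_diagonal_proportion = changed / len(before)

print(f"same marginal frequencies: {same_marginals}")
print(f"net shift: {net_shift}")
print(f"proportion of subjects who changed answer: {off_diagonal_proportion:.2f}")
```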
If I understood the data description, we have three dimensions: (i) the subjects, (ii) the questions and (iii) the occasions (incidentally, how many subjects or questions?).
The 5x5 table format appears inadequate to me: suppose that for one question the responses are binary (Yes/No for instance), then the table is no longer feasible. We lose some features of the observations by collapsing them into this contingency table. In my opinion it is related to the fact that a response "5" (as an example) is counted the same whatever the question.
As already pointed out, a Wilcoxon test (or the like) also neglects details such as the fact that the answers to the different questions are given by the same subject and are hence correlated, a fact that is also neglected when summing up the Likert scores. Besides, there will probably be numerous ties, which is an issue with this test.
I do not have at the moment a clear vision of a proper way to handle these data. I would suggest taking a look at the Rasch model, whose native form was for a binary response at one occasion, but which since the 1950s (or so) has certainly been developed to accommodate polytomous responses and more than one occasion.
A bold treatment would be to treat the Likert scale as if it were continuous and perform an ANOVA. This is certainly horrifying for any serious statistician, but it nevertheless maintains the three dimensions: subjects, questions and occasions.
Below is a quote from the (excellent) book:
Rost & Langeheine (eds), Applications of latent traits and latent class models in social sciences, Waxmann,
available online:
http://www.ipn.unikiel.de/aktuell/buecher/rostbuch/ltlc.htm
«1.2 The multifacet extension of the Rasch model
In the previous section the dichotomous Rasch model was introduced as a kind of main effects model for two factors, i.e., persons and items. In many applications more than two factors are involved, e.g., when persons are tested at different time points or when the performances or free responses to test items are rated by several judges. In these cases, three factors or facets (persons x items x timepoints, persons x items x judges) or even more facets have to be considered.»
Statistical treatment of questionnaires is more common in psychology than in biometry, but it may be worth the walk for biometricians as well.
Careful analysis of the 5x5 table, acknowledging the paired structure of your data, is what I would recommend. Larry mentioned an extension of the McNemar test to more than two categories, implemented in the STATA routine mcc3i. The same tests can be carried out with the built-in STATA procedure symmi:
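. symmi 35 5 0\15 20 5\10 5 5, exact

----------+----------------------------
          |            col
      row |     1     2     3    Total
----------+----------------------------
        1 |    35     5     0       40
        2 |    15    20     5       40
        3 |    10     5     5       20
----------+----------------------------
    Total |    60    30    10      100

                                         chi2    df   Prob>chi2
----------------------------------------------------------------
Symmetry (asymptotic)                 | 15.00     3     0.0018
Marginal homogeneity (Stuart-Maxwell) | 14.00     2     0.0009
----------------------------------------------------------------
Symmetry (exact significance probability)               0.0006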
In fact, the symmetry test is also available in SPSS under
Analyze > Descriptive Statistics > Crosstabs > Statistics > McNemar
For the example Larry provided it will give you a Chi-Square statistic of 15.000 on 3 df and an asymptotic 2-sided P-value of .002. The name of the extended McNemar test in SPSS is the McNemar-Bowker Test, just to add to the confusion.
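The symmetry (McNemar-Bowker) statistic is easy to verify by hand for the same 3x3 table: each pair of mirror-image cells contributes (n_ij - n_ji)^2 / (n_ij + n_ji). A sketch follows; note the p-value line uses a closed form of the chi-square tail that is valid for 3 degrees of freedom only, which is what this table has.

```python
from math import erfc, exp, pi, sqrt

# Same 3x3 example table as in the STATA output.
table = [[35, 5, 0],
         [15, 20, 5],
         [10, 5, 5]]

k = len(table)
chi2 = 0.0
df = 0
for i in range(k):
    for j in range(i + 1, k):
        pair_total = table[i][j] + table[j][i]
        if pair_total > 0:  # empty pairs carry no information
            chi2 += (table[i][j] - table[j][i]) ** 2 / pair_total
            df += 1

# Upper tail of the chi-square distribution, closed form for 3 df only.
assert df == 3
p_value = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)

print(f"Bowker symmetry chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

This agrees with the 15.00 on 3 df (p = 0.0018, or .002 as SPSS rounds it) reported above.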
Then you obtain, for each subject, a latent value that represents his or her affinity with the matter at each time. These are continuous variables with a logistic or normal distribution. Then you can compare the two times using a t-test for paired observations.
http://www.statsoft.com/textbook/nonparametricstatistics/
http://sites.google.com/site/deborahhilton/ and then click on favourite links: there is a good stats book that I refer to, it includes the chapter above, and I think I've just found the gold nugget of an answer. Please send me a comment for my website if you use this in the end, as I need to get some comments because my work is really slack at the moment. Cheers, Debbie
Spearman R. Spearman R (Siegel & Castellan, 1988) assumes that the variables under consideration were measured on at least an ordinal (rank order) scale, that is, that the individual observations can be ranked into two ordered series. Spearman R can be thought of as the regular Pearson product moment correlation coefficient, that is, in terms of proportion of variability accounted for, except that Spearman R is computed from ranks. I'm also just wondering if the gamma may be even better; I know nothing about it, but it says it is for tied observations.
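The "Pearson on ranks" description translates almost literally into code. A sketch in plain Python with hypothetical scores follows. Tied scores get the average of the ranks they occupy, which matters a lot with 1-5 answers. It will fail with a division by zero if everyone gives the same answer at one time point.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1
        tie_count = ordered.count(v)
        rank_of[v] = first_position + (tie_count - 1) / 2
    return [rank_of[v] for v in values]

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman R = Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical paired scores for six subjects.
time1 = [3, 2, 4, 3, 5, 1]
time2 = [4, 2, 5, 2, 5, 3]

print(f"ranks at time 1: {average_ranks(time1)}")
print(f"Spearman R between time 1 and time 2: {spearman(time1, time2):.3f}")
```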