© 2007 The Authors. Journal compilation © 2007 ESVD and ACVD. 18; 260–266
Blackwell Publishing Ltd
Frequency of appropriate and inappropriate
presentation and analysis methods of ordered
categorical data in the veterinary dermatology
literature from January 2003 to June 2006
Jon D. Plant*, Jack N. Giovanini† and
Aurora Villarroel*
*Department of Clinical Sciences, College of Veterinary Medicine,
Oregon State University, Corvallis, Oregon, USA
†Department of Statistics, College of Science, Oregon State
University, Corvallis, Oregon, USA
Correspondence: J.D. Plant, 287 Magruder Hall, College of Veterinary
Medicine, Oregon State University, Corvallis, Oregon, 97331, USA.
Tel. +1 541 737 4810; Fax: +1 541 737 4818;
E-mail: jon.plant@oregonstate.edu
What is known about the topic of this paper
• Ordered categorical level data do not contain as much
information as interval level data.
• Ordered categorical scales require statistical methodology
different than that required for continuous data in
order to be consistent with the structure of the data.
• Ordinal data are frequently presented and analysed with
inappropriate methods in the human medical literature.
What this paper adds to the field of veterinary
dermatology
• Ordered categorical scales are frequently reported in the
recent veterinary dermatology literature.
• The frequency of inappropriate presentation and analysis
methods of ordered categorical scales is similar to that
reported in the human medical literature.
Abstract
Clinical outcomes that are difficult to measure directly
are often graded with ordinal scales in the veterinary
dermatology literature to approximate objective
evaluation. Ordered categorical scales require statistical
presentation and analysis methods consistent with
the structure of the data. The objective of this study
was to determine the frequency of inappropriate
presentation and analysis methods of ordered categorical
data in the recent veterinary dermatology
literature. A total of 62 articles published between 1
January 2003 and 30 June 2006 in 16 journals reported
categorical scales and were included in the study.
The presentation and analysis methods of ordered
categorical data were classified as appropriate or
inappropriate based on published recommendations.
Forty articles (64.5%) utilized a median of four ordinal
scales (range 1–13). Inappropriate presentation methods
of ordered categorical data were identified in 23 of 40
articles (57.5%). These included reporting inappropriate
summary statistics (n = 17) and summation of ad hoc
numerical rating scales (n = 15). Inappropriate analytical
methods were used in nine of 40 articles (22.5%).
These included inappropriate use of t-tests (n = 3) and
analysis of variance (ANOVA, n = 6). The frequency of
inappropriate presentation and analysis methods of
ordered categorical data in the veterinary dermatology
literature is similar to that reported for several fields
in the human medical literature. In order to reduce the
likelihood of making unwarranted implications or conclusions
regarding ordinal data, authors should follow
established guidelines for methods of presentation
and analysis of ordered categorical scales.
Accepted 22 May 2007
Introduction
Veterinarians and physicians often use ordered categorical
scales to approximate an objective evaluation of outcome
variables for which precise measurements on continuous
scales are not available [1–3]. One advantage of assigning
numerical values is that the severity of conditions can
be ranked. However, these numerical values cannot be
analysed as continuous data, because the values assigned
in ordered categorical scales describe a ranking, not a
measurement. Clinical outcomes measured with ordinal
scales are often presented and analysed with inappropriate
statistical methods in the medical literature. In the fields of
anaesthesia, rheumatology, and nursing, several studies
indicated that ordinal data were presented appropriately
in only 39–49% and analysed appropriately in 57–63% of
journal articles [4–6].
Data can be classified into four increasing levels of quality,
as described by Stevens [7]. Nominal data represent the
lowest quality level because they simply describe the
nature of the data. There is no ‘numerical’ value attached
to them. Examples include breed, colour and diagnosis.
Because of the descriptive nature, the categories cannot
be ranked. In other words, a black animal is different than
a white one, but neither is ‘higher’ or is ‘more’ than the
other. The next quality level is represented by ordinal data,
which can be ranked in mutually exclusive hierarchical
categories. Ordinal data may be graded with scales incorporating
groups such as ‘mild’, ‘moderate’, and ‘severe.’
The main limitation of these data is that the interval or
distance between the groups is unknown. The third and
fourth levels are represented by continuous or ‘numerical’
data with equal spacing between adjacent ranks and are
called interval and ratio scales [8,9]. Interval scales are
characterized by having an arbitrary zero value. In contrast,
ratio scales have a meaningful zero reference point.
Measurement of body temperature is an example of an
interval scale (zero degrees Celsius is an arbitrary point),
while length is a ratio scale, with zero signifying a complete
absence of distance.
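The difference between interval and ratio scales can be checked numerically. The following is a minimal Python sketch using three hypothetical temperature readings (illustrative values, not data from this study). It suggests that a ratio of Celsius values changes when the unit, and hence the arbitrary zero, changes, whereas a ratio of differences stays the same.

```python
import math


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15


# Three hypothetical readings in degrees Celsius (illustrative values only).
a, b, c = 10.0, 20.0, 40.0

# Ratios of raw interval-scale values depend on where the scale puts its zero.
ratio_celsius = b / a
ratio_fahrenheit = celsius_to_fahrenheit(b) / celsius_to_fahrenheit(a)
# Kelvin has a meaningful zero, so this is the only ratio with a physical meaning.
ratio_kelvin = celsius_to_kelvin(b) / celsius_to_kelvin(a)


def difference_ratio(convert):
    """Ratio of two temperature differences after converting the unit."""
    return (convert(c) - convert(b)) / (convert(b) - convert(a))


print("value ratios:", ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 4))
print("difference ratios:",
      difference_ratio(lambda x: x),
      difference_ratio(celsius_to_fahrenheit),
      round(difference_ratio(celsius_to_kelvin), 4))
```

On an interval scale, statements such as "twice as warm" are therefore not meaningful, while comparisons of differences are.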
It is important to understand the differences between
these types of data, because the various data types
require different data presentation and statistical analysis
methods to account for the intrinsic characteristics of the
data. For example, it is inappropriate to calculate the mean
or average colour of a group of dogs that include white,
yellow and black dogs. It is possible, however, to determine
the most frequent colour (mode). Similarly, it is inappropriate
to calculate the mean severity of a lesion in a group of
animals that have mild, moderate and severe lesions.
Although these grading scores are mutually exclusive and
encompass all possible outcomes, they do not represent
equal spacing between adjacent ranks, as occurs in interval
or ratio data. Accordingly, calculating the sum, product,
mean, or standard deviation of ordinal data is not appropriate
because these functions assume that there is equal spacing
between adjacent values [3,10].
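The dependence of the mean on equal spacing can be illustrated with a small Python sketch. The two groups and both numeric codings below are hypothetical; each coding preserves the order mild < moderate < severe, so either is an equally defensible way to number the categories. Under these assumptions the comparison of group means reverses when the coding changes, while the median category does not.

```python
from statistics import mean, median_low

# Two order-preserving but otherwise arbitrary numeric codings (both hypothetical).
coding_a = {"mild": 1, "moderate": 2, "severe": 3}
coding_b = {"mild": 1, "moderate": 2, "severe": 10}

# Hypothetical lesion grades for two groups of five animals.
group_1 = ["moderate"] * 5
group_2 = ["mild", "mild", "mild", "severe", "severe"]


def mean_score(group, coding):
    """Mean of the numeric codes; its value depends on the chosen coding."""
    return mean(coding[label] for label in group)


def median_category(group, coding):
    """Median reported as a category label; depends only on the ordering."""
    decode = {value: label for label, value in coding.items()}
    return decode[median_low(coding[label] for label in group)]


for name, coding in (("coding_a", coding_a), ("coding_b", coding_b)):
    print(name,
          "means:", mean_score(group_1, coding), mean_score(group_2, coding),
          "medians:", median_category(group_1, coding), median_category(group_2, coding))
```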
Ranking ordinal data into alphabetical categories (e.g. A
through E) makes these limitations more intuitive than
ranking the same data numerically (e.g. 1 through 5); adding
or multiplying letters together does not make sense. An
appropriate method to describe ordinal data is the median,
which represents the middle value of an ordered data set
(i.e. half of the values will be lower and half of the values
will be higher than the median). For example, assume a
lesion severity score with five possible values: 1–5, signifying
mild through severe. If 100 lesions were evaluated and
the median post-treatment severity score were 2, it would
mean that at least half of the lesions would be scored
as a 1 or a 2, while the remainder would include scores of
2, 3, 4 and/or 5. It would not, however, be justified to state
that a post-treatment score of 2 represents a 50% reduction
in lesion severity from a pretreatment score of 4, as this
would require multiplication or division of ordinal data.
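A summary of this kind can be sketched in Python as follows. The scores are hypothetical, and the quartile rule used here (low median of the lower half, high median of the upper half) is only one of several conventions, chosen as an assumption because it always returns an observed rank rather than an interpolated value.

```python
from collections import Counter
from statistics import median_high, median_low

# Hypothetical post-treatment lesion severity scores on a 1-5 scale (n = 20).
scores = [1] * 6 + [2] * 6 + [3] * 4 + [4] * 3 + [5] * 1


def summarize_ordinal(values):
    """Median, interquartile range, range and percentage within each rank."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]
    counts = Counter(ordered)
    return {
        # median_low returns an observed rank instead of averaging two ranks
        "median": median_low(ordered),
        "iqr": (median_low(lower_half), median_high(upper_half)),
        "range": (ordered[0], ordered[-1]),
        "percent_by_rank": {rank: 100 * counts[rank] / n for rank in sorted(counts)},
    }


summary = summarize_ordinal(scores)
for key, value in summary.items():
    print(key, value)
```

No sums, means or ratios of the scores are computed; every reported quantity is either an observed rank or a proportion of animals.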
Parametric statistical methods (e.g. t-tests and ANOVA)
that are used to analyse interval or ratio data assume
normality of the data [3,11]. However, ordinal data do not follow
a normal (Gaussian) distribution and cannot be analysed
with these methods. Presentation and analysis of ordered
categorical data with methods that are inconsistent with the
structure of the data may lead to unjustified implications
and conclusions [3]. Appropriate statistical presentation and
analysis methods of ordinal scales account for the limitations
of the data while preserving the information inherent
in the rankings.
Lesion severity [1], pruritus severity [2], intradermal test
reactivity [12] and global assessment scales [13] are examples
Many scales are used in an ad hoc manner for lack of a
validated alternative. When used in an ad hoc manner, the
accuracy and reliability of scales are uncertain. Scale validity
and reliability testing require a significant investment of
resources [14]. Often numerical rating scales (NRS) using
nonnegative integers (e.g. 0–5) are reported [1,15], although
visual analogue scales (VAS) are also employed in order
to approximate a continuous range of possible values
(e.g. mm per 100 mm) [16,17]. However, VAS are not a direct
measurement of the nominal phenomenon (e.g. pruritus)
and should not be treated as continuous data [3,18,19].
Composite scores or indices that are derived from the
summation of disparate ordinal scales are commonly
utilized in medical research. In the field of veterinary
dermatology, the Canine Atopic Dermatitis Extent and
Severity Index (CADESI) or ad hoc modifications of the
CADESI are often reported [1,2,20]. The CADESI is derived by
summing erythema, excoriation, and lichenification scores
(0–3) from 40 body sites. Doing so implicitly attributes equal
significance to each site and lesion type. The NRS are
summed, despite the ordinal nature of the data. By utilizing
a composite scale the researcher is at risk of losing the
ordinal information contained in the component scales [18].
These limitations can potentially be overcome by weighting
individual components and testing the scale’s reliability
and validity [14]. To the authors’ knowledge, only a single,
small-scale, preliminary CADESI reliability study with few
details provided had been published until the recent
description and validation of the new CADESI-03 [21,22].
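The loss of component information in a summed composite can be illustrated with a deliberately simplified Python sketch. It is not the CADESI itself: the two dogs, the four body sites and all scores below are hypothetical, and only the 0–3 grading of erythema, excoriation and lichenification is taken from the description above.

```python
LESION_TYPES = ("erythema", "excoriation", "lichenification")

# Hypothetical 0-3 grades at four body sites for two dogs (illustrative only).
dog_a = {
    "erythema": [3, 3, 3, 3],
    "excoriation": [0, 0, 0, 0],
    "lichenification": [0, 0, 0, 0],
}
dog_b = {
    "erythema": [1, 1, 1, 1],
    "excoriation": [1, 1, 1, 1],
    "lichenification": [1, 1, 1, 1],
}


def composite_score(dog):
    """Unweighted sum over all sites and lesion types."""
    return sum(sum(dog[lesion]) for lesion in LESION_TYPES)


def component_profile(dog):
    """Number of sites at each grade 0-3, kept separately per lesion type."""
    return {lesion: [dog[lesion].count(grade) for grade in range(4)]
            for lesion in LESION_TYPES}


print("composite:", composite_score(dog_a), composite_score(dog_b))
print("profile A:", component_profile(dog_a))
print("profile B:", component_profile(dog_b))
```

The two dogs receive the same composite score although one has severe erythema only and the other has mild lesions of every type, which is the information a reader loses when only the sum is reported.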
Well-designed studies with proper statistical presentation
and analysis methods are essential given the prominence
of evidence-based medicine in veterinary dermatology [23,24].
Because of the extensive use of ordinal scales, researchers,
reviewers, editors and readers of the veterinary dermatology
literature should be familiar with the statistical properties
governing ordinal data. Yet, upon reviewing the curricula of
29 North American veterinary colleges, we found that only
six incorporate a course in biostatistics, potentially leaving
many veterinarians poorly prepared to critically evaluate
statistical methodology.
The frequency of inappropriate presentation and analysis
methods of ordinal data in the veterinary dermatology
literature has not been previously reported. Thus, the
objective of this study was to determine the frequency of
inappropriate ordered categorical scale presentation and
analysis methods in the recent veterinary dermatology
literature.
Materials and methods
Selection of publications
Prospective clinical trials in the field of veterinary dermatology
published since January 2003 were included in this study. We
searched for articles with PubMed using the terms (‘veterinary’
[Subheading] OR ‘Veterinary Medicine’[MeSH] OR ‘Animals’[MeSH]
NOT ‘Humans’[MeSH]) AND (‘Dermatology’[MeSH Major Topic] OR
‘Skin Diseases’[MeSH Major Topic]). A search limited to English
language, clinical trials, and randomized controlled trials published
from 1 January 2003 to 30 June 2006 returned 74 articles in 23 journals.
Twelve of these were judged to be either outside the usual scope of
veterinary dermatology (n = 11, e.g. mastitis, human psoriasis, and
oligogalactic syndrome) or were case reports (n = 1). The remaining
62 articles from 16 journals were screened for the use of one or more
ordered categorical outcome variables.
Review of statistical methods
We defined appropriate presentation and analysis methods of ordinal
outcome data as conforming to the recommendations summarized
in Table 1 [5,18,25,26]. A veterinary dermatologist (JDP) and a statistician
(JNG) classified articles as reporting either appropriate or inappropriate
presentation methods and as reporting either appropriate or inappropriate
analysis methods. When presentation or analysis results were not
reported, articles were classified as having none. The presentation of
ad hoc composite scores was classified as inappropriate unless each
component scale was also reported. However, if ad hoc composite
scores were analysed according to the recommendations in Table 1,
the analysis method was considered appropriate. The number of
articles which presented graphs of ordinal data depicting data points
connected by lines was recorded.
Citation per article and journal impact factor
The ISI Web of Science® was accessed on 2 November 2006 to
determine the number of citations to each article. Additionally, the
2005 Journal Citation Reports® journal impact factor (JIF) for the
source journal was recorded. JIF is a measure of the average number
of times one journal’s articles are referenced in the 2 years following
publication [27]. The 2005 JIF were used in the analysis because they
were the most recently published at the time this study was conducted.
Within disciplines, JIF usually change slowly [27].
Statistical analysis
The frequency of appropriately and inappropriately presented and
analysed articles during the period of January 2003 to June 2006 was
expressed as a percentage of all articles reporting ordinal outcome
variables. The frequency of specific inappropriate methodologies was
determined in the same manner. The median number of citations
and median JIF were compared between those articles containing
appropriate and inappropriate presentation methods, and between
those containing appropriate and inappropriate analysis methods by
use of the Mann–Whitney U-test. Articles from journals not covered
in the veterinary science section of the Journal Citation Report® (n = 2)
were excluded from JIF analysis, because comparisons between
different disciplines are not valid. Statistical significance was set at
5% (two-sided P < 0.05). Calculations were performed with a statistical
software program (StatsDirect, StatsDirect Ltd, Cheshire, UK).
Statistical simulation
To assess the probability that inappropriately analysing pruritus NRS
data would lead to a type I error (rejecting a true null hypothesis), a
simulation was carried out using a statistical software program (SAS,
version 9.1; SAS Institute, Cary, NC, USA). Two groups of 15 hypothetical
subjects were randomly assigned integers from 0 to 5 using a discrete
uniform distribution. Both the two-sample t-test and the exact Wilcoxon
rank sum test were performed on the data set to compare groups
and the two-sided P-values were recorded. A total of 10 000 iterations
were run. McNemar’s test was performed to determine if the two
tests resulted in different conclusions. Statistical significance was set
at 5% (two-sided P < 0.05).
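The structure of such a simulation can be sketched in Python with the standard library. This is not the SAS program used in the study and it departs from it in two labelled ways: the rank sum test uses the normal approximation with midranks and a tie correction rather than the exact test, and the t-test decision uses a fixed critical value (2.048407 for 28 degrees of freedom) instead of a computed P-value. The iteration count and seed are arbitrary choices, so the resulting counts are not expected to match Tables 4 and 5.

```python
import math
import random
from statistics import NormalDist, mean, variance

N_PER_GROUP = 15                       # group size described in the text
T_CRIT = 2.048407                      # two-sided 5% point of Student's t, 28 df
Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided 5% point of the normal distribution


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    if pooled == 0:
        return 0.0  # every score identical: no evidence of a difference
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def rank_sum_z(x, y):
    """Wilcoxon rank sum z statistic (normal approximation, tie-corrected)."""
    pooled = sorted(x + y)
    n = len(pooled)
    midrank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    nx, ny = len(x), len(y)
    w = sum(midrank[value] for value in x)
    var = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 0.0
    return (w - nx * (n + 1) / 2) / math.sqrt(var)


def simulate(iterations=2000, seed=1):
    """Return counts keyed by (t-test rejects, rank sum test rejects)."""
    rng = random.Random(seed)
    table = {(t, w): 0 for t in (False, True) for w in (False, True)}
    for _ in range(iterations):
        # Both groups come from the same discrete uniform distribution on 0-5,
        # so every rejection is a type I error.
        x = [rng.randint(0, 5) for _ in range(N_PER_GROUP)]
        y = [rng.randint(0, 5) for _ in range(N_PER_GROUP)]
        t_reject = abs(t_statistic(x, y)) > T_CRIT
        w_reject = abs(rank_sum_z(x, y)) > Z_CRIT
        table[(t_reject, w_reject)] += 1
    return table


table = simulate()
total = sum(table.values())
print("t-test type I error rate:", (table[(True, False)] + table[(True, True)]) / total)
print("rank sum type I error rate:", (table[(False, True)] + table[(True, True)]) / total)
print("discordant decisions:", table[(True, False)] + table[(False, True)])
```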
Results
Forty of the 62 evaluated articles (64.5%) from 16 journals
reported ordinal outcome data. In these 40 reports, the
median number of ordinal scales used was 4 (range 1–13).
NRS were more frequently used than VAS (Table 2). For
both grading methods, severity of lesions and pruritus
were the most commonly reported variables.
Of the 40 articles reporting ordinal outcome data, 23
(57.5%) showed an inappropriate presentation of the data
(Table 3). The most common error was the presentation of
a mean value for ordinal data. Statistical analysis methods
were inappropriate in nine articles (22.5%). The use of ANOVA
to analyse ordinal data was the most common error for the
analysis. Additionally, five articles (12.5%) did not report
data analysis, and thus could not be classified as either
appropriate or inappropriate.
The use of inappropriate presentation methods was
about 2.5 times more frequent than the use of inappropriate
analysis methods in the articles examined (Table 3). Nine
articles used two inappropriate presentation methods
each. Eleven articles included graphs of ordinal data in
which data points were connected by lines. None of the
nine articles using inappropriate analysis methods reported
the raw data required to reanalyse the data appropriately.
Table 1. Appropriate methods of presentation and analysis of
ordered categorical data

Category      Method
Presentation  Median
              Range or interquartile range
              Percentage within each rank of a numerical rating scale
Analysis      Wilcoxon signed rank
              Wilcoxon rank sum
              Mann–Whitney U
              Kruskal–Wallis
              Spearman rank correlation
              Kendall’s rank correlation
              Logistic regression
              Cohen’s kappa
Table 2. Number of ordinal outcome scales reported in 62 veterinary
dermatology journal articles (January 2003 to June 2006)
Scale type NRS VAS
Lesion severity 122 4
Pruritus severity 14 5
Global assessment 12
Lameness 5
Pain 3
Intradermal reactivity 3
Lesion extent 2
Medication score 2
Cellular infiltrate 2
Fungal culture 1
Wood’s light 1
Fluorescence intensity 1
Radiographic improvement 1
Total 167 11
NRS, numerical rating scale; VAS, visual analogue scale.
Table 3. Frequency of appropriate and inappropriate presentation and analysis
methods for ordered categorical data in 40 veterinary dermatology journal articles
during a 42-month period (January 2003 to June 2006)

Statistical method          Appropriate    Inappropriate     None
Presentation                40% (n = 16)   57.5% (n = 23)*   2.5% (n = 1)
  Mean, without median                     42.5% (n = 17)
  Ad hoc composite scale                   37.5% (n = 15)
Analysis                    65% (n = 26)   22.5% (n = 9)     12.5% (n = 5)
  t-test                                   7.5% (n = 3)
  ANOVA                                    15.0% (n = 6)

*Nine articles used two inappropriate presentation methods.
There was no difference in the median JIF between groups
of articles containing appropriate presentation, appropriate
analysis, inappropriate presentation, or inappropriate analysis
methods of ordered categorical data (median JIF = 1.2 for
all groups). This value corresponded to the journal Veterinary
Dermatology, which published more articles (n = 18)
included in the study than any other journal. No significant
differences were observed between the median number
of citations to articles with appropriate or inappropriate
data presentation or analysis methods (P = 0.60 and 0.93,
respectively, data not shown).
The type I error rates (α) of the simulation data analyses
at the nominal 0.05 level for the t-test and exact Wilcoxon
rank sum test are displayed in Table 4. The confidence
interval of the t-test does not encompass the nominal level
(0.05). Because each test was computed on the same set
of data, the main diagonal of Table 5 indicates that the tests
gave the same conclusion. We tested the off-diagonal
elements to determine if the two methods of analysis
gave different conclusions. In 105 (69 + 36) out of 10 000
iterations (1%), the t-test and exact Wilcoxon rank sum
test led to different conclusions at the nominal 0.05 level
(P = 0.0013). An example of an iteration that generated a
noteworthy difference in P-values when analysed with the
two methods is depicted in Fig. 1.
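The reported summary values can be approximately recomputed from the counts in Table 5. The text does not state which confidence interval or which form of McNemar’s test was used; the Python sketch below assumes a Wilson score interval and the uncorrected chi-square form of McNemar’s test, which appear to agree closely with the tabulated values.

```python
import math
from statistics import NormalDist

# Counts from Table 5 (10 000 iterations).
BOTH_DO_NOT_REJECT = 9411
ONLY_T_TEST_REJECTS = 69
ONLY_WILCOXON_REJECTS = 36
BOTH_REJECT = 484
N = BOTH_DO_NOT_REJECT + ONLY_T_TEST_REJECTS + ONLY_WILCOXON_REJECTS + BOTH_REJECT


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a proportion (assumed interval type)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def mcnemar_p(b, c):
    """McNemar chi-square P-value without continuity correction (assumed form)."""
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


t_rejections = ONLY_T_TEST_REJECTS + BOTH_REJECT            # 553
wilcoxon_rejections = ONLY_WILCOXON_REJECTS + BOTH_REJECT   # 520

print("t-test:", t_rejections / N, wilson_interval(t_rejections, N))
print("Wilcoxon:", wilcoxon_rejections / N, wilson_interval(wilcoxon_rejections, N))
print("McNemar P:", mcnemar_p(ONLY_T_TEST_REJECTS, ONLY_WILCOXON_REJECTS))
```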
Discussion
This study confirms that ordered categorical scales are
frequently used and relatively frequently inappropriately
presented or analysed in the veterinary dermatology literature.
According to our results, articles with inappropriate
presentation and analysis methods of ordinal data appear
in the veterinary dermatology literature with frequencies
similar to those reported in several disciplines of the medical
literature [4].
The inappropriate treatment of ordinal data could have
several explanations. Most probably, authors lack the
statistical training to understand the limitations of
ordinal data. With the availability of user-friendly statistical
software programs, authors may neglect to obtain the
advice of a statistician or someone knowledgeable in
statistical analyses. Based on our results, it is possible that
researchers may consult statisticians regarding analysis
but not presentation methods, which would explain the
higher frequency of inappropriate presentation methods
compared to analysis methods. Another explanation may
be that researchers may emulate previously published
statistical methodology under the erroneous assumption
that the ordinal data therein were appropriately presented
and analysed. Finally, authors may rely on the peer-review
process to expose statistical errors, although journal
guidelines emphasize the authors’ ultimate responsibility.
We found 13 types of outcome phenomena measured
with ordered categorical scales. NRS were the most
common type of ordinal scale in the articles examined,
used most frequently for evaluation of lesion severity.
When composite scales were used, these were often
summed in an ad hoc manner, without appropriate validation
of the resulting composite scale. Validation and reliability
testing of composite scales is an area which should receive
more attention in the field of veterinary dermatology,
given their frequent usage. Unless the composite scale
has been validated, authors should consider presenting
and analysing each scale used separately.
Table 4. Type I error rates (α) for the analyses of 10 000 iterations of
simulated numerical rating scale comparisons at the nominal 0.05
level

Test                          α
t-test                        0.0553 (0.0510, 0.0600)
Exact Wilcoxon rank sum test  0.0520 (0.0478, 0.0565)

Mean (95% confidence interval).
Table 5. Agreement of appropriate (exact Wilcoxon rank sum test)
and inappropriate (t-test) analyses of 10 000 iterations of numerical
rating scale comparisons (level = 0.05)

                 Exact Wilcoxon rank sum test
t-test           Do not reject    Reject
Do not reject    9411             36
Reject           69               484

McNemar’s P-value = 0.0013.
Figure 1. (a) Appropriate (median, interquartile range, range, exact Wilcoxon rank sum P-value) and (b) inappropriate (mean, standard error of the
mean, t-test P-value) presentation and analysis methods of identical pruritus numerical rating scale data for hypothetical control and treatment
groups.
Although not classified as an inappropriate presentation
method in this study because of the lack of specific
guidelines, the graphical depiction of data points with
connecting lines is best reserved for interval scale data and
should be avoided when presenting ordered categorical
data, for which an interval relationship may not exist [18].
One purpose of depicting lines in a graph is to emphasize
slopes and trends, but these may have little meaning for
ordered categorical data. In fact, a precise interval measurement
of a phenomenon (continuous data) under identical
study conditions might produce a line with a very different
slope than when measured with an ordinal scale. On the
other hand, lines may help direct the reader’s eye between
related points, where this relationship might otherwise
be unclear. Authors should consider whether the clarity
gained by connecting ordinal data points outweighs the
potential to make an inference based on the magnitude of
the difference between them. Inappropriate graphical
presentations have a great potential to mislead the reader
because of their strong visual impact [25,28].
It is unclear whether or not unjustified conclusions were
drawn because of inappropriate analysis in the articles
evaluated in this study, as none with inappropriate analysis
reported raw data that would have allowed repeating
the analysis appropriately. Determining whether or not
unjustified conclusions were drawn in articles classified as
having appropriate analyses was beyond the scope of this
study. In order to estimate the likelihood that inappropriate
analysis methods would support unwarranted conclusions,
a statistical simulation was carried out comparing the use
of nonparametric (appropriate) and parametric (inappropriate)
analysis methods for ordinal scale data. Nonparametric
tests appropriate to ordinal data are less powerful than
inappropriate parametric tests, such as t-tests, and may
produce different P-values on a given set of data. A lower
statistical power represents a lower probability of rejecting
a false null hypothesis. The inappropriate use of the t-test
on our simulated NRS data sets led to a type I error rate
(false positive) confidence interval which did not encompass
the nominal level (0.05), indicating that the t-test
rejected the null hypothesis more often than it should
have. An increased type I error rate is cause for concern
because incorrect conclusions about treatments are made.
Researchers would too often conclude that two treatment
groups were significantly different when in fact there was
no difference. Although this occurred in less than 1% of
the iterations, a situation such as the one depicted in Fig. 1
demonstrates that the impact of inappropriate analysis on
the resulting conclusion could be large.
We found no significant difference in the median impact
factor for the source journal between articles that presented
or analysed ordinal data appropriately and those that did
not. Our results suggest that articles that inappropriately
presented or analysed ordinal data were not published in
less frequently cited journals than articles with appropriate
presentation and analysis methods. Neither was there a
significant difference between the median numbers of
citations to articles with appropriately or inappropriately
reported or analysed ordinal data. The relatively high proportion
of articles published in a single journal, the small
sample size and the resulting low power may explain why
we did not detect a difference in either JIF or number of
citations per article. The source JIF and number of citations
to each article are imperfect measures of a publication’s
importance. The number of citations is influenced by the
date of publication, which we did not analyse. A comparison
of the actual number of citations to the expected citation
rate may have proven more informative; however, the
recent time period evaluated and the limited availability of
the expected citation rate precluded the application of this
approach.
Unless precise interval or ratio level measurements of
pruritus and lesion severity become practical, ordered
categorical scales will continue to be widely used by
researchers in the field of veterinary dermatology. Appropriate
presentation and analysis methods for ordinal data
can readily be achieved with the assistance of statistical
software and a statistician. Failure of authors to follow
established guidelines for the presentation and analysis of
ordered categorical scales may result in the reporting of
unjustified conclusions. Finally, it is important that journal
reviewers and editors are familiar with appropriate statistical
methods to ensure the utmost quality of scientific studies
published in their journals.
References
1. Iwasaki T, Hasegawa A. A randomized comparative clinical trial of
recombinant canine interferon-gamma (kt-100) in atopic dogs
using antihistamine as control. Veterinary Dermatology 2006; 17:
195–200.
2. Steffan J, Horn J, Gruet P et al. Remission of the clinical signs of
atopic dermatitis in dogs after cessation of treatment with
cyclosporin A or methylprednisolone. Veterinary Record 2004;
154: 681–4.
3. Jakobsson U. Statistical presentation and analysis of ordinal data
in nursing research. Scandinavian Journal of Caring Sciences
2004; 18: 437–40.
4. Jakobsson U, Westergren A. Statistical methods for assessing
agreement for ordinal data. Scandinavian Journal of Caring Sciences
2005; 19: 427–31.
5. LaValley MP, Felson DT. Statistical presentation and analysis of
ordered categorical outcome data in rheumatology journals.
Arthritis and Rheumatism 2002; 47: 255–9.
6. Avram MJ, Shanks CA, Dykes MH et al. Statistical methods in
anesthesia articles. An evaluation of two American journals
during two six-month periods. Anesthesia and Analgesia 1985;
64: 607–11.
7. Stevens SS. On the theory of scales of measurement. Science
1946; 103: 677–80.
8. Petrie A, Watson P. Statistics for Veterinary and Animal Science.
Oxford: Blackwell Science, 1999: 4.
9. Dunn G, Everitt B. Clinical Biostatistics: An Introduction to Evidence-
Based Medicine. London: Halsted Press, 1995.
10. Triola MM, Triola MF. Biostatistics for the Biological and Health
Sciences. Boston: Pearson Addison-Wesley 2006: 7–9.
11. Montiani-Ferreira F, Cardoso FF, Petersen-Jones S. Basic
concepts in statistics for veterinary ophthalmologists. Veterinary
Ophthalmology 2004; 7: 79–85.
12. Marsella R, Nicklin CF, Saglio S et al. Investigation on the effects
of topical therapy with 0.1% tacrolimus ointment (protopic)
on intradermal skin test reactivity in atopic dogs. Veterinary
Dermatology 2004; 15: 218–24.
13. Burton G, Burrows A, Walker R et al. Efficacy of cyclosporin in the
treatment of atopic dermatitis in dogs – combined results from
two veterinary dermatology referral centres. Australian Veterinary
Journal 2004; 82: 681–5.
14. Streiner DL, Norman GR. Health Measurement Scales: a Practical
Guide to Their Development and Use, 3rd edn. Oxford: Oxford
University Press, 2003.
15. Steffan J, Parks C, Seewald W. Clinical trial evaluating the efficacy
and safety of cyclosporine in dogs with atopic dermatitis. Journal of
the American Veterinary Medical Association 2005; 226: 1855–63.
16. Steffan J, Strehlau G, Maurer M et al. Cyclosporin A pharmacokinetics
and efficacy in the treatment of atopic dermatitis in dogs.
Journal of Veterinary Pharmacology and Therapeutics 2004; 27:
231–8.
17. Saevik BK, Bergvall K, Holm BR et al. A randomized, controlled
study to evaluate the steroid sparing effect of essential fatty acid
supplementation in the treatment of canine atopic dermatitis.
Veterinary Dermatology 2004; 15: 137–45.
18. Forrest M, Andersen B. Ordinal scale and statistics in medical
research. British Medical Journal 1986; 292: 537–8.
19. Altman DG. Practical Statistics for Medical Research. London:
Chapman & Hall, 1991: 16.
20. Olivry T, Dunston SM, Rivierre C et al. A randomized controlled
trial of misoprostol monotherapy for canine atopic dermatitis:
Effects on dermal cellularity and cutaneous tumour necrosis
factor-alpha. Veterinary Dermatology 2003; 14: 37–46.
21. Olivry T, Rivierre C, Jackson HA et al. Cyclosporine decreases
skin lesions and pruritus in dogs with atopic dermatitis: a blinded
randomized prednisolone-controlled trial. Veterinary Dermatology
2002; 13: 77–87.
22. Olivry T, Marsella R, Iwasaki T et al. Validation of CADESI-03, a
severity scale for clinical trials enrolling dogs with atopic dermatitis.
Veterinary Dermatology 2007; 18: 78–86.
23. Mueller RS. Treatment protocols for demodicosis: an evidence-
based review. Veterinary Dermatology 2004; 15: 75–89.
24. Olivry T, Mueller RS. Evidence-based veterinary dermatology:
a systematic review of the pharmacotherapy of canine atopic
dermatitis. Veterinary Dermatology 2003; 14: 121–46.
25. Bland M. An Introduction to Medical Statistics, 3rd edn. Oxford:
Oxford University Press, 2000.
26. Woolson RF, Clarke WR. Statistical Methods for the Analysis of
Biomedical Data, 2nd edn. New York: Wiley-Interscience, 2002:
275–88.
27. Garfield E. Journal impact factor: a brief review. Canadian Medical
Association Journal 1999; 161: 979–80.
28. Wainer H. How to display data badly. The American Statistician
1984; 38: 137–47.
Résumé
Clinical outcomes that are difficult to measure directly are often graded with
ordinal scores in the veterinary dermatology literature to mimic an objective evaluation.
Categorical clinical scores require statistical presentation and analysis methods compatible
with the structure of the data. The aim of this study was to determine the frequency of
inappropriate presentation and analysis methods in the veterinary dermatology literature. A total of 62 articles
published between January 2003 and June 2006 in 16 journals were included in this study. The presentation and
analysis methods of categorical data were classified as appropriate or inappropriate
based on published recommendations. Forty articles (64.5%) used a median of four scales
(range 1–13). Inappropriate presentation methods of categorical data were identified
in 23 of 40 articles (57.5%). These included inappropriate statistics (n = 17) and the summation of ad
hoc numerical scores (n = 15). Inappropriate analytical techniques were used in 9 of 40 articles
(22.5%). These included inappropriate use of t-tests (n = 3) or analysis of variance (ANOVA, n = 6).
The frequency of inappropriate presentation and analysis of categorical data in the
veterinary literature is the same as that reported in human medicine. To reduce the risk of
erroneous conclusions, authors should follow established guidelines for the presentation and analysis of
categorical scale data.
Resumen
Clinical outcomes that are difficult to measure directly are frequently graded
on ordinal scales in the veterinary medical literature to approximate an objective assessment. Ordered
categorical scales require statistical presentation and analysis methods consistent
with the structure of the data. The objective of this study was to determine the frequency of inappropriate
presentation and analysis methods for ordered categorical data in the recent veterinary
dermatology literature. A total of 62 articles published between 1 January 2003 and 30 June
2006 in 16 journals reported categorical scales and were included in the study. The presentation and
analysis methods for ordered categorical data were classified as appropriate or inappropriate on the basis
of published recommendations. Forty articles (64.5%) used a median of four ordinal scales
(range 1–13). Inappropriate presentation methods for ordered categorical data were identified
in 23/40 articles (57.5%). These included the reporting of inappropriate summary statistics (n = 17)
and the summation of scales with ad hoc numerical values (n = 15). Inappropriate analysis methods were
used in 9/40 articles (22.5%). These included the inappropriate use of t-tests (n = 3) and analysis
of variance (ANOVA, n = 6). The frequency of inappropriate presentation and analysis methods for ordered
categorical data is similar to that published in several fields of the human medical literature. To
reduce the likelihood of drawing erroneous conclusions or implications from ordinal data,
authors should follow established guidelines for the presentation and analysis methods of
ordered categorical scales.
Zusammenfassung
Clinical outcomes that are difficult to measure directly are often graded with ordinal scales
in the veterinary dermatology literature to approximate an objective evaluation.
Ordered categorical scales require statistical presentation and analysis
methods that are consistent with the structure of the data. The aim of this study was to determine the frequency of
inappropriate presentation and inadequate analysis methods for ordered categorical data in the
recent veterinary dermatology literature. A total of 62 articles that were published between 1
January 2003 and 30 June 2006 in 16 journals and reported categorical scales
were included in this study. The presentation and analysis methods for the ordered categorical data
were classified as appropriate or inappropriate on the basis of published recommendations. Forty
articles (64.5%) used a median of four ordinal scales (range 1–13). Inappropriate
presentation methods for ordered categorical data were identified in 23/40 articles (57.5%).
These included the reporting of inappropriate summary statistics (n = 17) and the summation
of ad hoc numerical rating scales (n = 15). Inadequate analytical methods were used in 9/40
articles (22.5%). These included inappropriate use of the t-test (n = 3) and of
analysis of variance (ANOVA, n = 6). The frequency of inappropriate presentation and analysis methods for ordered
categorical data in the veterinary dermatology literature is similar to that published in various
fields of the human medical literature. To reduce the likelihood of drawing
unwarranted implications or conclusions from ordinal data, authors should
follow existing guidelines for the presentation and analysis methods of ordered categorical
scales.