In the second in the series, Professor Ruud Halfens and Dr Judith Meijers give an overview of statistics, both descriptive and inferential. They describe the first principles of statistics, including some relevant inferential tests.
Back to basics: an introduction to statistics
Descriptive statistics are used to describe
data that have been collected, such as
the number of wounds in a hospital or
the number of patients with diabetes.
However, in many instances you will
want to draw conclusions beyond the specific data
you have collected. For instance, if you have found
that a specific wound treatment is effective in your
hospital, you will probably want to generalise this
conclusion to the whole population of wounds. In
that case, the hospital’s data would be considered
to be a sample of a bigger population. Inferential
statistics make this possible.
In short, descriptive statistics describe what is
going on in a data set and inferential statistics make
it possible to generalise beyond the data observed.
Descriptive statistics
To describe the data collected, there are two essential
concepts: variables and frequency distributions.
Variables are characteristics of the population under
study, for example, gender, age, body mass index
(BMI) and the number or colour of wounds. Variables
can be measured at four different measurement levels.
• Nominal measurements are the lowest level of
measurement. They involve assigning numbers to
classify characteristics into categories (such as
males and females). Although numbers are used in
nominal measurements, they cannot be treated
mathematically. For instance, it makes no sense to
calculate the average gender of a sample; however,
a category’s frequency can be stated (percentage of
the sample that is male).
• Ordinal measurements are the next level of
measurement, in which the characteristics are
ordered according to some criteria, such as the
classification of pressure ulcers (PU), which are
ordered according to their severity. Although the
data are arranged in a specific order (PU category II
is more severe than PU category I), the order does
not say anything about the size of the difference in
severity between the categories (the difference in
severity between categories I and II is not necessarily
the same as between categories II and III).
• Interval measurements are those where the difference
between successive values is the same, so the degree
of difference between measurements is meaningful.
A classic example is temperature on the
Celsius scale: 25°C is 5°C warmer than 20°C, which
is 5°C warmer than 15°C. However, 20°C is not
twice as warm as 10°C. This is because the zero is
arbitrarily defined and is not an absolute value.
• Ratio measurements are the highest level of
measurement. All mathematical calculations are possible
at this level. For example, age or number of wounds
both have an absolute zero, which makes it possible
to say that two wounds are twice as many as one wound,
as well as allowing for the degree of difference.
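
As a minimal sketch of what each measurement level permits, the Python snippet below summarises four small variables. All values and variable names are hypothetical and invented for illustration only; they are not data from the article.

```python
from collections import Counter
from statistics import median_low

# Hypothetical data, invented for illustration only.
gender = ["male", "female", "female", "male", "female"]  # nominal
pu_category = [1, 2, 2, 3, 4]        # ordinal (PU category I to IV)
temperature_c = [15.0, 20.0, 25.0]   # interval (degrees Celsius)
wounds = [1, 2]                      # ratio (number of wounds)

# Nominal: only frequencies (counts or percentages) are meaningful.
percent_male = 100 * Counter(gender)["male"] / len(gender)

# Ordinal: the order is meaningful, so a median category can be reported.
median_category = median_low(pu_category)

# Interval: differences are meaningful, ratios are not.
difference = temperature_c[2] - temperature_c[1]

# Ratio: an absolute zero makes ratios meaningful as well.
ratio = wounds[1] / wounds[0]

print(percent_male, median_category, difference, ratio)
```

The point of the sketch is the choice of summary, not the numbers: a mean of the `gender` codes would be computable if they were stored as numbers, but it would not be interpretable.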
Furthermore, a distinction has to be made between
dependent and independent variables. The dependent
variable is usually the variable that the researcher
is interested in, while the independent variable is the
one that the researcher expects to influence the
dependent variable. The independent variable is also
known as the manipulated or treatment variable.
After data are obtained, they can be summarised in
several ways. First, the frequency distribution of the
variables can be explored to give an overview of the
data. It is especially important to look at the shape
of the distributions for interval and ratio variables.
Some distributions are found so frequently that
they have special names. A normal distribution
means that the scores are clustered near the middle
of the range of observed values and there is a gradual
and symmetric decrease in frequency in both directions
away from the middle area (Fig 1). Examples of normally
distributed variables are height and intelligence.
Another distribution shape is the skewed distribution,
which means that scores are clustered more towards the
right side of the figure, with a long tail to the left
(negative skew; Fig 2a), or towards the left side, with a
long tail to the right (positive skew; Fig 2b). An example
of a positively skewed distribution is income: most
people have a low to moderate income, while
relatively few people have a high or very high
income. Age at death, on the other hand, is an
example of a negatively skewed distribution because
most people die at an older age.

R.J.G. Halfens,1 PhD, Associate Professor;
J.M.M. Meijers,1 RN, PhD;
1 Department of Health Services Research, School
for Public Health and Primary Care (CAPHRI),
Maastricht University, the Netherlands.
Email: r.halfens@

Using this sort of frequency distribution is a good
way to get insight into the data and to clarify patterns.
However, it is often impractical to present a frequency
table or figure for every variable, so it is better to
summarise the data into one score per variable.
Calculating the average (central tendency) can do
this. The mean is the most commonly used average
measurement; it is calculated by dividing the sum
of the scores by the number of scores. Other measures
of central tendency are the mode and the
median. The mode is simply the most common
score, while the median is the middle value of a set
of scores arranged in numerical order.
For example, assume you have data from nine
patients with wounds: four patients have one
wound, three have two wounds, one has three
wounds and one has nine wounds. The mean would
be 2.4 (22/9), the median would be 2
(1,1,1,1,2,2,2,3,9) and the mode would be 1. This
shows that the mean is influenced by one extreme
score, whereas the median is a more stable (robust)
measure that is hardly influenced by extreme scores.
The mode shows that most patients have only one
wound, which, together with a mean above the median,
points to a skewed distribution.
Most researchers present only a variable’s
mean, as it is versatile, but also presenting the mode
and the median gives more information about
the frequency distribution.
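
The wound example above can be checked with a few lines of Python. This is only a sketch using the standard library; the list `wounds` holds the nine scores from the text, and the second list, in which the extreme score is made even larger, is a hypothetical variation added to illustrate robustness.

```python
from collections import Counter
from statistics import mean, median, mode

# The nine scores from the example in the text.
wounds = [1, 1, 1, 1, 2, 2, 2, 3, 9]

# Frequency distribution: how many patients have each number of wounds.
frequencies = Counter(wounds)

wound_mean = mean(wounds)      # 22 / 9
wound_median = median(wounds)  # middle value of the ordered scores
wound_mode = mode(wounds)      # most common score

# Hypothetical variation: make the extreme score ten times larger.
wounds_extreme = [1, 1, 1, 1, 2, 2, 2, 3, 90]

print(sorted(frequencies.items()))
print(round(wound_mean, 1), wound_median, wound_mode)
print(round(mean(wounds_extreme), 1), median(wounds_extreme))
```

In the variation, the mean rises sharply while the median stays at 2, which is what is meant by calling the median a robust measure.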
In addition to the average, another important char-
acteristic of a variable is the variability. As shown
in the examples above, a variable’s scores always
vary. The variability of scores can be expressed in
several indexes; the most common are the range
and the standard deviation.
The range is simply the highest score minus the
lowest score, so the range in the above wounds
example is 8 (9–1). However, range is an unstable
characteristic, as it depends on extreme scores.
A better index for variation is the standard deviation
(often abbreviated as SD). This is an indication
of the average amount of deviation of the scores from
the mean. Just like the mean, the standard deviation
is calculated based on all scores. Sometimes the variance
is used instead, which is simply the square of
the standard deviation (SD²). The standard deviation
can be interpreted as the average deviation from the
mean, to either side. That does not mean that all
scores lie within one standard deviation (±1 SD) of the
mean. In a normal distribution, about 68% of
the cases fall within ±1 SD of the mean, about 95% fall
within ±2 SD and about 99.7% within ±3 SD (Fig 1). In a
normally distributed sample with a mean of three wounds
and a standard deviation of 1, about 68% of the sample
would have a score between two and four wounds.
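
The range, the standard deviation and the percentages quoted for the normal distribution can be reproduced with Python's standard library. This is a sketch under one stated assumption: the text does not say which formula for the standard deviation is meant, so the sample formula (dividing by n - 1) is used here. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist, stdev, variance

# The nine scores from the wound example in the text.
wounds = [1, 1, 1, 1, 2, 2, 2, 3, 9]

value_range = max(wounds) - min(wounds)  # highest minus lowest score
sd = stdev(wounds)                       # sample SD (assumption: n - 1)
var = variance(wounds)                   # sample variance, equal to sd ** 2


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(value_range, round(sd, 2), round(var, 2))
for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

Note that the wound data are strongly skewed, so the 68%, 95% and 99.7% figures describe a normal distribution and should not be expected to hold for this small sample.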
Inferential statistics
After you have described the data, more conclusions
can be drawn. Most studies only measure a sample
of a population, but you may want to generalise
conclusions to a bigger population. Inferential
statistics can be used to make an educated guess
about a population’s characteristics.
A statistical inference can be made based on a
sample’s characteristics. There are different types of
samples, such as a probability sample, a simple
random sample, a stratified sample or a systematic
sample. Discussing all the types of samples is
beyond the scope of this article, but it is important
to realise that a sample must be suitable for the goal
of the study. For instance, if you want to say something
about the frequency of a characteristic in a
population, you need to have a representative sample
of the population; however, if you want to draw
a conclusion about relationships, it is more important
that the full range of possible scores on each variable
is represented in the sample.

Fig 1. The normal (Gaussian) distribution. Horizontal axis: –3σ to 3σ around the mean (σ = standard deviation); the mean, median and mode coincide at the centre.
Fig 2. Negative (a) and positive (b) skewed distributions

Further reading:
• Polit, D.F., Beck, C.T. Nursing Research: Generating and Assessing Evidence for Nursing Practice (9th edn). Wolters Kluwer Health/Lippincott Williams & Wilkins, 2012.
• Huck, S.W. Reading Statistics and Research (6th edn). Pearson, 2012.

Special attention needs to be given to non-response
within a study. Are the reasons for non-response
related to the aim of the study? For example,
if you invite older people to come to a research
institute for a study about mobility, you will clearly
miss a lot of people who are not mobile.
A sample is never an exact copy of the population.
Each time you draw a sample from a population
it will have slightly different characteristics. If you
were to draw a very large number of samples of sufficient
size from a population, the distribution of the sample
means of the characteristic under study would
approximately follow the normal frequency distribution,
centred on the population mean. The standard deviation
of this distribution of sample means is known as the
standard error. As was stated earlier, 68% of scores in
a normal distribution fall within ±1 SD, so the mean of a
randomly drawn sample has a 68% chance of falling within
one standard error of the population mean, but what does
this mean?
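
The behaviour of sample means described above can be illustrated with a small simulation. This is a sketch, not a proof: the population is a hypothetical skewed (exponential) one with mean 2 and standard deviation 2, and the function name, sample size, number of samples and seed are all arbitrary choices made for illustration.

```python
import random
from statistics import mean


def simulate_sample_means(n_samples=2000, sample_size=30, seed=1):
    """Draw repeated samples and summarise the distribution of their means."""
    rng = random.Random(seed)
    # Hypothetical skewed population: exponential, mean 2 and SD 2.
    population_mean = 2.0
    population_sd = 2.0
    # Standard error: SD of the sample means for this sample size.
    standard_error = population_sd / sample_size ** 0.5

    sample_means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1 / population_mean)
                  for _ in range(sample_size)]
        sample_means.append(mean(sample))

    # Share of sample means within one standard error of the population mean.
    within = sum(abs(m - population_mean) <= standard_error
                 for m in sample_means)
    return mean(sample_means), within / n_samples


average_of_means, share_within_one_se = simulate_sample_means()
print(round(average_of_means, 2), round(share_within_one_se, 2))
```

Even though the individual scores are far from normally distributed, the share of sample means within one standard error of the population mean should come out close to the 68% expected under a normal distribution.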
Condence intervals
When we have found a mean score of a characteris-
tic of a sample (such as the number of wounds),
then we want to make an inference of the mean in
the total population. Since samples are different, it
is clear that we cannot generalise the sample’s mean
score to the population. A condence interval (CI)
can be used to show within which interval the pop-
ulation’s mean score probably will fall. Most
researchers use a CI of 95%.
Using a CI allows you to go one step further.
Researchers often look at differences (such as comparing
two wound treatments, A and B). Suppose that with
treatment A the mean wound healing time was 30 days
and with treatment B it was 40 days. Generalising
this result to the whole population depends on the
CI of the difference between both treatments. If the
mean difference is 0, it suggests there is no difference
between the two treatments. Therefore, if 0 falls
within the agreed CI, it can be concluded that there
is no statistically significant difference. However, when
0 lies outside the CI, researchers will conclude that there
is a statistically significant difference.
By using a CI of 95%, researchers accept that there
is a 5% chance of concluding that there is a difference
when in fact there is none.
Furthermore, it is important to realise that statistical
significance is not the same thing as an important
or clinically significant difference. The greater a
sample size, the easier it is for a difference to become
statistically significant; for example, a difference in
body mass index (BMI) of 0.5 kg/m² (21.5 kg/m² vs
22.0 kg/m²) may be statistically significant when a
researcher has data from more than 1000 patients,
but it has no clinical value.
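
A minimal sketch of the confidence interval for a difference between two treatments is given below. The healing times are hypothetical, chosen only so that the group means are 30 and 40 days as in the example above. The function name `difference_ci` is also hypothetical, and the interval uses a normal approximation; for samples this small a t-distribution would normally be used and would give a somewhat wider interval.

```python
from statistics import NormalDist, mean, variance


def difference_ci(group_a, group_b, level=0.95):
    """CI for mean(group_b) - mean(group_a), normal approximation."""
    diff = mean(group_b) - mean(group_a)
    # Standard error of the difference between two independent means.
    se = (variance(group_a) / len(group_a)
          + variance(group_b) / len(group_b)) ** 0.5
    # Critical value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical healing times in days, invented for illustration only.
treatment_a = [26, 28, 29, 30, 30, 30, 31, 32, 32, 32]  # mean 30
treatment_b = [35, 37, 38, 40, 40, 40, 41, 42, 43, 44]  # mean 40

diff, lower, upper = difference_ci(treatment_a, treatment_b)
zero_inside = lower <= 0 <= upper
print(diff, round(lower, 2), round(upper, 2), zero_inside)
```

With these invented data the interval lies entirely above 0, so the difference would be called statistically significant; whether a difference of this size matters clinically is a separate judgement.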
Statistical tests
Researchers use statistical tests to calculate whether
differences are statistically significant. There are two
broad classes of tests: parametric and non-parametric
tests. Parametric tests require several assumptions,
for instance that the data are normally distributed.
The assumptions used in non-parametric
tests are less strict, so variables that are not normally
distributed can still be used; however, they are less
precise than parametric tests, so it is generally
recommended to use parametric tests with large sample
sizes, even if not all assumptions are fulfilled.
l Categorical (nominal) tests This category of
tests can be used when the dependent, or outcome,
variable is categorical (nominal), such as the dif-
ference between two wound treatments and the
healing of the wound (healed versus non-healed).
One of the most used tests in this category is the
chi-squared test (χ2). The chi-squared statistic is
calculated by comparing the differences between
the observed and the expected frequencies. The
expected frequencies are the frequencies that
would be found if there was no relationship
between the two variables. Based on the calculated
χ2 statistic, a probability (p-value) is given, which
indicates the probability that the two means are
not different from each other. As discussed above,
researchers are often satised if the probability is
5% or less, which means that the researchers would
conclude that for p < 0.05, there is a signicant dif-
ference. A p-value 0.05 suggests that there is no
signicant difference between the means.
• Continuous tests. This category of tests can be
used when the dependent variable is continuous
(interval and ratio measurements). One of the most
used tests in this category is the Student’s t-test.
This t-test can be used to test differences between
two groups (t-test for independent groups) or
between two measures of the same person (paired
t-test). For instance, a t-test can be used to compare
the effect of two wound treatments on the duration
of healing (in days). The test calculates a t-value,
which can be converted into a p-value: the probability
of finding a difference at least this large if the two
mean durations were in fact the same.
With p < 0.05, the researcher can
conclude that the two treatments require a different
number of healing days.
lGroups of measurements A t-test is used for two
groups or measurements; when we want to analyse
more than two groups or measurements, we need to
use another statistic, the F-ratio, which is calculated
with an analysis of variance (ANOVA). Several forms
of ANOVA exist, such as the one-way and multi-
factor ANOVAs. The one-way ANOVA tests the rela-
tionship between one categorically independent
variable (different groups/interventions) and one
continuous (interval/ratio) variable. For example, it
can be used to compare the relationship between
the use of three wound treatments and the time
taken for the wound to heal. The ANOVA analysis
results in an F-value, which can be translated into a
p-value. If the p-value is less than 0.05 or 0.01, it can
be concluded that there is a difference between the
treatments and the duration. However this does not
tell you which treatment is better than the others;
that would require a post hoc test, which analyses
the differences between the treatments.
lLinear regression analyses The last category that
we will discuss here are the tests where both the
independent and the dependent variable are
continuous. Suppose you want to know if age is
related to the duration of healing, you could use a
t-test by dividing age into two groups or make an
ANOVA by dividing it into three groups. However,
it is better to use the independent variable as a con-
tinuous variable and calculate the relationship with
linear regression analyses.
Linear regression analyses describe the relationship
between both variables as a straight line. The
regression analysis tests whether there is a relationship
and estimates its size (in our example, by how many days
the duration of healing of the wound increases with each
year of age). For example, an unstandardised coefficient
of 0.3 first suggests that there is a positive relationship
between the two variables, and also shows that
with each year of age the mean duration of healing
is prolonged by 0.3 days. However, the unstandardised
coefficient depends on the units in which the variables
are measured, so it is not useful for comparing the
relevance of several independent variables. For this we
need the standardised regression coefficient, which can
be compared between the variables. In a regression with
a single independent variable, the standardised
coefficient is equal to Pearson’s correlation coefficient
(r). When we take the square of this coefficient, it tells
us the proportion of explained variance in the duration of
healing by age (how much of the variability of duration
can be understood by the variability of age).
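
Two of the calculations described in this section can be sketched with the standard library alone: the chi-squared statistic for a 2×2 table of treatment by healing outcome, and the coefficients of a linear regression of healing time on age. Everything here is illustrative: the function names are hypothetical, the counts and the age and duration values are invented (the latter chosen so that the slope comes out at 0.3, as in the text), and the chi-squared function applies no continuity correction.

```python
import math


def chi_squared_2x2(table):
    """Chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if treatment and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def simple_linear_regression(x, y):
    """Unstandardised slope, intercept and Pearson's r for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical counts: rows are treatments A and B, columns healed / not healed.
chi2, p = chi_squared_2x2([[30, 20], [18, 32]])

# Hypothetical ages (years) and healing times (days).
ages = [40, 50, 60, 70, 80]
days = [24, 29, 32, 33, 37]
slope, intercept, r = simple_linear_regression(ages, days)

print(round(chi2, 2), round(p, 3))
print(round(slope, 2), round(intercept, 1), round(r ** 2, 2))
```

In the regression part, `slope` is the unstandardised coefficient (extra days of healing per year of age), `r` equals the standardised coefficient because there is only one independent variable, and `r ** 2` is the proportion of explained variance.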
Nowadays more and more advanced and sophisticated
analyses are used, but these are beyond the scope
of this article. Here, we have described the simpler
analyses that readers are likely to be confronted with
and hope this will help them interpret the results
presented in articles.