Journal of Nursing Measurement, Volume 28, Number 1, 2020
A Practical Guide for Item Generation in Measure Development: Insights From the Development of a Patient-Reported Experience Measure of Compassion

Shane Sinclair, PhD
Faculty of Nursing, University of Calgary, Calgary, AB, Canada
Compassion Research Lab, Faculty of Nursing, University of Calgary, Calgary, AB, Canada
Department of Oncology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

Priya Jaggi, MSc
Faculty of Nursing, University of Calgary, Calgary, AB, Canada
Compassion Research Lab, Faculty of Nursing, University of Calgary, Calgary, AB, Canada

Thomas F. Hack, PhD
College of Nursing, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
Research Institute in Oncology and Hematology, CancerCare Manitoba, Winnipeg, MB, Canada
Psychosocial Oncology and Cancer Nursing Research, IH Asper Clinical Research Institute, Winnipeg, MB, Canada

Susan E. McClement, PhD
College of Nursing, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
Research Institute in Oncology and Hematology, CancerCare Manitoba, Winnipeg, MB, Canada

Lena Cuthbertson, MEd
Office of Patient-Centred Measurement, British Columbia Ministry of Health, Vancouver, BC, Canada
Background and Purpose: Although various measure development guidelines exist, practical guidance on how to systematically generate items is nascent. This article provides practical guidance on item generation in measure development and the use of a Table of Specifications (TOS) in this process. Methods: In addition to a review of the literature, the item generation process within an ongoing study to develop a valid and reliable patient-reported measure of compassion is provided. Results: Consensus on an initial pool of 109 items and their response scale was achieved with the aid of a TOS. Conclusions: Dynamic, experiential, and relational care constructs such as compassion lie at the heart of nursing. Practical guidance on item generation is needed to allow nurses to identify, measure, and improve compassion in research and practice.

© 2020 Springer Publishing Company. http://dx.doi.org/10.1891/JNM-D-19-00020
Keywords: measure development; item generation; table of specifications; psychometrics; instrument construction; content validity
Item generation is an imperative step in the developmental stage of instrument construction (DeVellis, 2003; Grant & Davis, 1997; Lynn, 1986; Morgado, Meireles, Neves, Amaral, & Ferreira, 2017; Rattray & Jones, 2007). When performed well, item generation ensures that an instrument's items (i.e., questions; Deshpande, Rajan, Sudeepthi, & Nazir, 2011; US Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, Center for Devices and Radiological Health, 2009) accurately and comprehensively cover the construct (i.e., topic of interest; Avila et al., 2015; Deshpande et al., 2011; FDA Center for Drug Evaluation and Research et al., 2009; Appendix A). Despite its reputed importance, there is little by way of practical guidance and recommended best practices to enhance rigor in this critical step (Nichols & Sugrue, 1999) of developing patient-reported measures in healthcare. As a result, the validity of patient-reported measures utilized by nurse researchers is compromised, as is their clinical utility and reliability in nursing practice.
Developing a robust measure involves a combination of both inductive and deductive processes (Cheng & Clark, 2017; Morgado et al., 2017). The iterative nature of item generation begins with domain identification, which is an inductive process that involves establishing a theoretical foundation of the construct of interest, based on the literature, subject matter expert (SME) opinion, or qualitative research with the population of interest (Brod & Tesler, 2009; Cheng & Clark, 2017; DeVellis, 2003; Netemeyer, Bearden, & Sharma, 2003). In patient-reported measure development, a domain is considered to be a sub-concept within the larger, overarching construct that is being measured (Deshpande et al., 2011; FDA Center for Drug Evaluation and Research et al., 2009; Appendix A). Once a theoretical foundation has been established inductively, it becomes the basis for item generation through a deductive process, which involves utilizing the derived theoretical foundation as a guiding source to generate an initial pool of items (Hinkin, 1998). Although it is recognized and suggested that a combination of inductive and deductive processes is required for item generation (Brod & Tesler, 2009; Morgado et al., 2017), studies under-report which processes were utilized and whether a theoretical foundation was established to inform the development of a measure (Hinkin, 1995). As a result of this methodological gap, the International Society for Quality of Life Research (ISOQOL) group proposed that the "documentation of sources from which items were derived, modified, and prioritized during measure development" be considered a best practice guideline (Reeve et al., 2013, p. 9). Thus, although guidelines have been established on numerous accounts to facilitate measure development in general (DeVellis, 2003; FDA Center for Drug Evaluation and Research et al., 2009; Hinkin, 1998; Mokkink et al., 2010; Rattray & Jones, 2007; Reeve et al., 2013; Saris, 2014; Terwee et al., 2007; Valderas et al., 2008), and while it has been noted that item generation is a fundamental step of measure development (Hinkin, 1995; Morgado et al., 2017), there is limited practical guidance on how to generate items. Without guidance to facilitate the generation and revision of the initial pool of items, and the linkage of items to their defined construct(s) and associated domain(s), the robustness of the final measure may be compromised or criticized (Bowling, 1997; Brod & Tesler, 2009; Buchbinder et al., 2011; Hinkin, 1995; Nichols & Sugrue, 1999; Rattray & Jones, 2007).
A Table of Specifications (TOS) is a tool long utilized within the field of nursing education to ensure accurate and comprehensive item coverage in test construction (Billings & Halstead, 2016, p. 425), and can provide a rigorous and pragmatic framework to aid test/measure developers in the item generation process. A TOS is defined as a table that "aligns objectives, instruction, and assessment" (Fives & DiDonato-Barnes, 2013, p. 1; Appendix A) to facilitate the construction of tests through rigorous means. Since it is not feasible for educators to measure every facet of a course or topic, or every aspect of the subject of interest, a TOS allows educators to identify key content and item coverage, while ensuring that the content is comprehensible to the target population, enhancing the validity of the test as a result (Fives & DiDonato-Barnes, 2013). Nursing educators rely heavily on a TOS as their guiding framework to inform the essential content (i.e., categories or domains of content to be measured) during test construction (Billings & Halstead, 2016, p. 423). Adopting this framework to develop patient-reported measures of health experience or outcome may also provide researchers with a systematic means of ensuring accurate coverage and rigor in linking the overarching construct of interest to its associated domains, and domains to individual items, thereby ensuring the fidelity of the final measure.
The purpose of this article is to provide practical guidance on the process of item generation and the utilization of a TOS in the healthcare context, as illustrated in the development of a patient-reported compassion measure for patients living with an incurable, life-limiting illness.
METHODS

Measure Development: Process Overview
Item generation is a highly iterative process of defining, re-defining, re-visiting, refining, and modifying a measure throughout the course of its development. In the absence of a guiding framework, there appears to exist a quantum leap from defining the construct of interest to the completed pool of items, with little information describing the process of generating items from the empirical literature or a theoretical model. To address this gap, we provide an example of how we utilized a TOS not only to ensure adequate content coverage, but also to ensure the fidelity of proposed items to the underlying construct of interest: compassion within healthcare. Having defined the construct of compassion and its associated theoretical domains within the Patient Compassion Model (PCM) derived from qualitative interviews with patients (Sinclair et al., 2016), the goal herein is to describe our experience of and insights into this process. While we highlight the individual phases (Figure 1) we followed to achieve our pool of items, this process can be adapted to any measure of interest. Figure 1 illustrates the process of item generation undertaken by our team, spanning the initial process of domain identification, generation of initial items, and refinement up until the construction of the draft measure and subsequent assessment by SMEs and exploratory and confirmatory factor analyses.

Figure 1. Measure development overview: Five phases of item generation.
RESULTS

Phase 1. Establishing the Scope and Purpose of the Measure Using a Conceptual Model: The "What" and "Why" of Measure Development
An important preliminary task in establishing and confirming the scope and purpose of a measure is to determine precisely what (i.e., content) is being measured and why. The purpose of the measure needs to be solidified at the outset, as this inevitably will affect the content and types of items that are developed, their respective response scales (e.g., satisfaction or frequency scale), and the target population (Hinkin, 1998). To determine the scope and purpose of our proposed compassion measure, we conducted a comprehensive and critical review of the compassion literature in healthcare (Sinclair, Russell, Hack, Kondejewski, & Sawatzki, 2016) and conducted a large qualitative study with patients with advanced cancer (Sinclair et al., 2016). This qualitative study informed the development of a theoretical PCM delineating the construct of interest, its associated domains, and their relationship with one another. The transferability of the PCM was then further verified through qualitative interviews with noncancer patients living with an incurable and life-limiting illness, to ensure that each facet of the model was adequately represented and generalizable to patients with varying life-limiting illnesses. In developing the PCM and assessing its transferability in other patient populations (Sinclair et al., 2018), we recognized a gap between what patients consider an integral component of quality care and healthcare providers' (HCPs') ability to assess and deliver it (Sinclair et al., 2016), including the absence of a valid and reliable measure of compassion to assist researchers in studying this topic (Sinclair et al., 2016). Thus, our primary aim was to develop a valid and reliable measure to aid researchers in studying compassion in healthcare, and our secondary aim was to develop a clinically informed and relevant measure to assess patients' experiences of compassion from their HCPs. After a number of team meetings and lengthy discussions surrounding the domains of the PCM and the associated coding schema (containing over 600 individual codes representing individualized patient views on compassion, derived from the qualitative study described earlier; Sinclair et al., 2016), a two-fold purpose for the measure was agreed upon: (a) to measure the patient's experience of compassion based on the emanated behaviors (what), skills (how), and qualities (who) of the HCPs that are exemplified in care; and (b) to determine the extent to which patients feel that the care they received was compassionate.
While determining the scope and purpose of the measure may seem like an intuitive or obvious step, the value of investing time and effort in this phase should not be underestimated, nor should scope and purpose be determined retrospectively. This critical "a priori" phase is also essential in the development of a TOS, which can: (a) function as a framework for deductively generating items; and (b) control against investigator biases, by providing an important reference point for revisiting the content that is to be measured throughout the subsequent item evaluation phases. Additional considerations that we found important and helpful to discuss at this phase of developing a measure included: item tense (past vs. present); recall period of interest (i.e., present day, past week vs. month vs. year); and singular vs. plural (HCP(s) to be evaluated).
Phase 2. Developing a TOS: "How" to Best Measure the Construct of Interest
After agreeing on the scope and purpose of the measure, we developed a TOS by identifying the key measurement domains of the construct of compassion, to ensure that the items within the measure adequately covered each domain, much like ensuring that exam questions adequately cover course learning objectives. Chase (1999) identifies three steps in the creation of a TOS: (a) choosing the [measurement] domains to be covered; (b) distilling the [selected] domains into key content or independent parts such as concepts, terms, procedures, and applications; and (c) constructing the table itself (Chase, 1999). Having identified the measurement domains in the PCM, the second step of distilling the domains into key content or independent parts consisted of revisiting the identified domains of the PCM (Figure 2; Sinclair et al., 2016) and confirming their alignment to the scope and purpose of the measure. An important consideration at this phase is determining "how" to best measure the construct of interest by deciding on the granularity of measurement. In relation to the PCM (Figure 2), the seven domains comprising the model are the highest level of measurement, with the 27 associated themes being a secondary level (Figure 3; Appendix A), and the individualized codes within these themes being the most granular (Sinclair et al., 2016).
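The three-level hierarchy described above (domains → themes → codes) can be sketched as a small data structure, which also makes it easy to audit content coverage at each candidate level of measurement. The class of structure and the sample entries below are illustrative only, not the published PCM:

```python
# A minimal sketch of a domain -> theme -> codes hierarchy used to reason
# about the level of measurement. Entry names are illustrative examples.
from collections import Counter

pcm = {
    "Virtuous Response": {
        "Knowing the Person": ["understands what patient is going through",
                               "sensitive to patient's situation"],
        "Genuine Concern": ["concerned", "looks after you"],
    },
    "Attending to Needs": {
        "Action": ["helping/doing things for others", "little acts of caring"],
    },
}

def codes_per_theme(model):
    """Count the codes under each theme -- a quick check that a candidate
    level of measurement has adequate content coverage beneath it."""
    return Counter({theme: len(codes)
                    for themes in model.values()
                    for theme, codes in themes.items()})

print(codes_per_theme(pcm))
```

A lopsided count (one theme with dozens of codes, another with one) is an early warning that measuring at the thematic level may over- or under-represent parts of the construct.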
As a team, we also needed to decide whether compassion should be measured at the domain, thematic, or individual code level. Additional factors to consider in determining the level of measurement included: (a) ensuring that each level of measurement has appropriate content coverage, as some domains or themes may have more or fewer codes within them to inform the prospective items; (b) domain or theme overlap (i.e., are there similar themes across domains, or codes across the themes and/or domains); and (c) estimating the number of items required for exploratory factor analysis. In revisiting and reviewing the purpose(s) of the compassion measure and our data from previous studies that informed this process, our research team decided that the level of measurement was thematic, as we felt confident that the themes within the PCM (Figure 3; Sinclair et al., 2016) accurately and comprehensively depicted the key components associated with patients' experiences of compassion. Individual codes within these respective themes, however, served a secondary purpose by providing the team with more granularity surrounding the content of the themes, serving as a potential source for specific measurement indicators (Appendix A) of the themes, and informing item wording. As a result, we reanalyzed, clustered, and collapsed each of the qualitative codes within their respective themes, paying particular attention to areas of overlap and/or redundancy, thereby refining our understanding of the construct and further delineating each of the domains and their relationship to one another (Table 1).

Figure 2. Patient compassion model.
Source. From Sinclair, S., McClement, S., Raffin-Bouchal, S., Hack, T., Hagen, A., McConnell, S., & Chochinov, H. (2016). Compassion in health care: An empirical model. Journal of Pain and Symptom Management, 51, 193–203. Reprinted with permission.
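For factor (c), estimating the number of items supportable by exploratory factor analysis, a common rule of thumb is 5 to 10 respondents per item; the ratio is our assumption for illustration, not a recommendation made in this study. A trivial sketch:

```python
def max_items_for_efa(expected_n, respondents_per_item=10):
    """Upper bound on item-pool size for exploratory factor analysis,
    using a subjects-to-items ratio rule of thumb (commonly 5:1 to 10:1).
    The default ratio is an illustrative assumption, not a fixed standard."""
    return expected_n // respondents_per_item

# An anticipated sample of 300 patients supports roughly 30 items
# at a conservative 10:1 ratio, or 60 items at a 5:1 ratio.
print(max_items_for_efa(300))
print(max_items_for_efa(300, respondents_per_item=5))
```

Running this calculation before item generation helps a team decide how aggressively the code-level content must be condensed into themes.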
Having solidified the scope and purpose of the measure (Phase 1), we then considered additional aspects of how to best measure the construct of interest. This involved revisiting the purpose(s) and determining whether the measure was intended to assess the frequency of particular facets of compassion (e.g., How often did a patient experience compassionate behavior(s) from their HCPs?) and/or the adequacy of the HCPs' behavior(s) in keeping with the patient's subjective experience of compassion (e.g., Were they satisfied with the compassionate care provided, or did they agree that the care they received was compassionate?). Second, we considered: (a) the type(s) of items (i.e., subjective perception or observed HCP behavior); (b) item difficulty (i.e., how easy/difficult it is for HCPs to score well on the items); (c) the cognitive level of the items for the target population; and (d) potential question types (e.g., open vs. closed ended; Dillman, Smyth, & Christian, 2014). For example, with respect to item difficulty, it is important to generate items with a range of difficulty to ensure greater response variance, thereby mitigating the risk of floor or ceiling effects. In the context of measuring a construct like compassion, an "easy" item is one on which the majority of HCPs are anticipated to receive high ratings, while only the most compassionate HCPs would receive high ratings on a "difficult" item. While this phase of creating a TOS is highly iterative and complex, it allows measure development teams to err on the side of content comprehensiveness while setting clear boundaries to delineate what is considered to be "compassion" from the patient experience and what is not. Comprehensiveness is important, as construct underrepresentation is a challenge in measure development (Buchbinder et al., 2011) that can be mitigated by using a TOS, thereby preventing arbitrary or premature elimination of items.

TABLE 1. Table of Specifications: Determining the Core Content of a Patient-Reported Compassion Measure

| Item Content | Specific Examples From Data Sources | Item Difficulty (Low, Medium, or High) | Type of Item (Perception vs. Behavior) | Sample Patient Quotes |
|---|---|---|---|---|
| ESSENTIAL DOMAIN #1: VIRTUOUS RESPONSE, Theme #1: Knowing the Person | | | | |
| TBD in subsequent phases of TOS development | Understand what patient is going through; sensitive to patient's situation | Medium | Perception | "To get to know them and understand a little bit about what they're going through" |
| TBD | Genuine concern (1); concerned | Low | Perception | "Well, people that are concerned about you, and they look after you, or help you when they can" |
| And so on… | … | … | … | … |
| ESSENTIAL DOMAIN #2: ATTENDING TO NEEDS, Theme #1: Action | | | | |
| TBD | Helping/doing things for others | Low | Perception | "Goodwill, willing to help others and a willingness to complete the needs of others" |
| TBD | Little acts of caring | Medium | Behavior | ". . . it wasn't an easy thing to do, but she arranged it and she knew that that would really brighten my day and it did" |
| And so on… | … | … | … | … |

Figure 3. Elements of the patient compassion model.
Permission to reproduce this figure was obtained from the Journal of Pain and Symptom Management, Elsevier Inc.
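The floor/ceiling concern described above can be checked empirically once pilot data exist. In the sketch below, the 15% cutoff, item wordings, and response data are illustrative assumptions, not values from the study:

```python
def floor_ceiling_flags(responses, scale_min=1, scale_max=5, threshold=0.15):
    """Flag items where the proportion of responses at the lowest or highest
    scale point exceeds a threshold (15% here -- a commonly used but
    arbitrary cutoff), signalling a potential floor or ceiling effect."""
    flags = {}
    for item, scores in responses.items():
        n = len(scores)
        floor = sum(s == scale_min for s in scores) / n
        ceiling = sum(s == scale_max for s in scores) / n
        flags[item] = {"floor": floor > threshold, "ceiling": ceiling > threshold}
    return flags

# Hypothetical pilot responses on a 1-5 scale for two items.
pilot = {
    "My HCPs were concerned about me": [5, 5, 5, 4, 5, 5],  # an "easy" item
    "My HCPs went the extra mile":     [2, 3, 1, 4, 3, 2],  # a "difficult" item
}
print(floor_ceiling_flags(pilot))
```

An item pool spanning a range of difficulties should produce few such flags; a pool of uniformly "easy" items would show ceiling flags across the board.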
At this phase, a subset of the measure development team developed the TOS (Table 1), detailing the specific examples from data sources (i.e., codes), level of item difficulty, item type, and quotes from our PCM study and examples from the literature. The TOS was then circulated to the larger team in advance of an item generation meeting.
Phase 3. Developing Indicators for Each Measurement Domain and Content Category
A 3-day item generation meeting commenced with the team organizing specific examples from data sources (Table 1) into indicators, paralleling the validity-driven approach highlighted by Buchbinder et al. (2011) that involves "organizing ideas into groups that would form the basis for the hypothesized scales to be included in the measurement tool" (Buchbinder et al., 2011, p. 2). While we recognize that different measures and measure development teams may require different approaches, given the scope and complexity of our construct of interest (i.e., compassion in healthcare), we found that meeting face-to-face, consecutively over the course of a few days, allowed us to generate our initial item pool in a focused manner, maximizing retention of information and decision recall in the process, while also providing dedicated time for important discussions surrounding the preliminary item generation process.
Once important decisions such as the scope and purpose of the measure and intended target population were agreed upon, the team developed measurement indicators, which are observable elements used to assess the overarching construct of interest (Avila et al., 2015). There are two types of indicators: (a) reflective or psychometric indicators, which "are a manifestation of the same underlying construct" and are in that sense interchangeable; and (b) formative or clinimetric indicators, "which collectively form a construct," but are therefore not interchangeable (Bagozzi, 2011; Fayers & Hand, 2002; Fayers & Machin, 2016; Mokkink et al., 2010; Appendix A). A key characteristic of reflective indicators is that the items co-vary in a consistent fashion in relation to a common overarching construct. In our example of developing a measure of compassion, based on our theoretical PCM (Figure 2), we determined that reflective indicators were most appropriate, as we view compassion as a construct that is reflected in people's experiences, but not necessarily defined or caused by those experiences (i.e., there are other causal factors that are external to the measurement of compassion). As a result, we generated indicators that reflected compassion, with the idea that each item would be indicative of the extent to which patients experience compassion (Fayers & Machin, 2016). Thus, by way of a practical example, the more that patients experience HCPs as "treating their patients as fellow human beings," the more this would be reflective of higher compassion.
Since we aimed to measure compassion at the thematic level, we developed reflective indicators for these themes of the PCM (Figure 3) by grouping the specific examples from data sources (Table 1) together based on their shared attributes. While the goal of this process was to condense the codes within the exhaustive TOS into reflective indicators, we felt it was important to err on the side of inclusion, knowing that subsequent measure development stages (the judgment stage of content validity with SMEs and cognitive interviews with patients, and further assessments of construct validity) would eliminate additional, poorly performing items.
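Whether a set of reflective indicators actually co-varies can later be checked on pilot data with an internal-consistency statistic such as Cronbach's alpha, since reflective items should track one another. A self-contained sketch with hypothetical ratings:

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of item-score lists (one list per item,
    with respondents in the same order). A high alpha is consistent with
    reflective indicators that co-vary around one underlying construct."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-respondent sums
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Three hypothetical 1-5 ratings from six patients; the items track each other.
items = [
    [1, 2, 3, 4, 5, 5],
    [1, 2, 3, 4, 5, 5],
    [1, 2, 3, 4, 4, 5],
]
print(round(cronbach_alpha(items), 2))  # ~0.99 for these near-duplicate items
```

For formative (clinimetric) indicators, by contrast, low inter-item consistency is not evidence of a problem, which is one practical reason the reflective/formative decision must precede the psychometric analysis.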
After achieving consensus on the indicators, the measure development team consolidated them into content categories. Depending on the nature of the measure, these content categories could represent either a set of behaviors, knowledge, skills, abilities, attitudes, or other characteristics to be measured by a test (American Psychological Association [APA], 2014), which can then be utilized to generate an initial pool of items. As an example, in developing a compassion measure, our previously identified measurement indicators such as "helping/doing things for others; communicating information; helping navigate the system; anticipating needs; and responding to needs" were consolidated into a content category of "Help Me," whereas the indicators of "providing comfort; pain and symptom management; and gentle physical care" were consolidated into a content category of "Relief/Comfort" (Figure 4). Further, this process of developing content categories on the whiteboard led our team to collapse the domain of Virtues into the domain of Virtuous Response (Figure 2), as we realized the high degree of overlap between the indicators within each domain. This decision was further informed by reconfirming that the aim of the measure was to focus on patients' experiences of compassion, as opposed to the assessment of the innate qualities of their HCPs. After determining the content categories, our measure development team then verified that the categories were congruent with the TOS and the preidentified measurement domains (Phase 2) to ensure adequate coverage. These content categories became the definitive source of item generation and associated response scaling in Phase 4.

Figure 4. Conceptual model of compassionate care: measurement domains and their content categories.
*Level 1 = Measurement Domains (identified in Phase 2).
**Level 2 = Content Categories (derived from reflective indicators in Phase 3).
Phase 4. Instrument Construction: Item Generation and Response Scaling Determination
The content categories derived in Phase 3 served as the template for the generation of a comprehensive and detailed item pool in the form of statements or questions alongside their variants (i.e., alternatively worded items), and their potential response scales. Prior to generating the specific items, our team reviewed basic item development guidelines, such as those provided by Bradburn, Wansink, and Sudman (2004); Dillman et al. (2014); Hinkin (1998); and Streiner, Norman, and Cairney (2015; Appendix B). Specifically, these guidelines helped to ensure that the items were concise and comprehensible to our target population, and not double-barrelled (i.e., measuring two aspects of compassion in a single item; Appendix B). An example of an item generated in the form of a statement from the content category "Going the Extra Mile" (Phase 3, Figure 4) was "My Healthcare Providers went the extra mile." From this item we then identified the following variant items: "My HCPs went above and beyond their job"; "My HCPs went above and beyond their call of duty"; and "My HCPs did more than I expected."
After carefully generating the items in the form of statements along with their possible variants, the team turned their attention to determining response scale options. Different scaling options can be utilized depending on the type of measure that is being constructed (Dillman et al., 2014). Examples of types of scales include Agreement/Disagreement, Satisfaction, and Unipolar or Bipolar scales, each with varying purposes. For the compassion measure, we identified the following as possible response scales:

Agreement/Disagreement Scale
Satisfaction Scale
Unipolar Scale (anchored at "0" with each point scoring higher than the previous)
Unique Bipolar Scale (e.g., "When I needed help, I got it": Very slowly—Slowly—Neutral—Quickly—Very quickly; or "I felt that my HCPs were": Very cold—Cold—Neutral—Warm—Very warm)
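In an electronic survey implementation, each candidate scale can be encoded as an ordered list of anchor labels whose positions serve as numeric scores. The agreement-scale anchors and the 0-based numeric coding below are our own illustrative assumptions, not part of the published measure:

```python
# Candidate response scales as ordered anchor lists; list position = score.
# Anchor wordings for the agreement scale are assumed for illustration.
SCALES = {
    "agreement": ["Strongly disagree", "Disagree", "Neutral",
                  "Agree", "Strongly agree"],
    "frequency": ["Never", "Sometimes", "Usually", "Always"],
    "bipolar_warmth": ["Very cold", "Cold", "Neutral", "Warm", "Very warm"],
}

def score(scale_name, label):
    """Convert a respondent's chosen label on a given scale to a numeric score."""
    return SCALES[scale_name].index(label)

print(score("frequency", "Usually"))  # third point on the 4-point scale
```

Keeping every candidate scale in one structure like this makes it trivial to pilot the same item under two scaling contenders and compare the resulting distributions.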
Upon determining potential response scale options, we found that some items had more than one scaling contender, such as "My HCPs were gentle with me," which could work with an Agree/Disagree Scale or a Unique Bipolar Scale ("My HCP(s) were . . .": Very rough with me—Rough with me—Neutral—Gentle with me—Very gentle with me). We then proceeded to determine, item-by-item, the potential scales that would be most appropriate to each respective item. As a result of this process, we discovered that the Agree/Disagree scale and a frequency scale (i.e., Never—Sometimes—Usually—Always) were optimal options for every item in our pool. The generated items and response scale options formed the draft measure, which was then reviewed by the team to finalize response scales for each individual item.
Phase 5. Consensus and Modification of the Draft Instrument Among the Measure Development Team
After each member of the team completed their independent review of the draft measure and made their recommendation(s) regarding response scale options for each item, feedback was collated to identify and flag any items which appeared to be problematic (i.e., lack of clarity, suggested rewording, and points of disagreement). We decided that if any item was flagged, it would require further discussion, debate, and consensus among the full team via a face-to-face or videoconference meeting before proceeding to content review by SMEs. We felt that individual opinions from our diverse, interdisciplinary team were valuable and needed to be fully considered before consensus was reached, rather than adopting a "majority rules" approach. In order to reach consensus in an efficient manner, we only discussed items where differences of opinion existed (Table 2). After achieving consensus on the item pool and scaling, the item pool was ready for a detailed review by SMEs in the judgment stage of content validity, along with cognitive interviews with patients, in order to further identify issues related to "clarity," "assumptions," and "response categories" (Figure 1).
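The collation step, deciding which items need discussion at the consensus meeting, amounts to flagging every item whose reviewers did not all make the same recommendation. A sketch with hypothetical reviewer data:

```python
def flag_disagreements(reviews):
    """Given {item: {reviewer: recommended_scale}}, return the items whose
    reviewers did not all recommend the same response scale -- only these
    need discussion at the consensus meeting."""
    return sorted(item for item, recs in reviews.items()
                  if len(set(recs.values())) > 1)

# Hypothetical scale recommendations from three reviewers for two items.
reviews = {
    "My HCPs were gentle with me": {"r1": "agree/disagree",
                                    "r2": "bipolar",
                                    "r3": "agree/disagree"},
    "My HCPs went the extra mile": {"r1": "agree/disagree",
                                    "r2": "agree/disagree",
                                    "r3": "agree/disagree"},
}
print(flag_disagreements(reviews))  # only the first item is flagged
```

This mirrors the team's choice to discuss only items with differences of opinion rather than reviewing every item in plenary.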
DISCUSSION

The complex process of item generation within measure development is nascent within the healthcare literature, leaving researchers with a lack of guidance on this important and laborious process. While guidelines have been established to facilitate measure development in general, there are no best practice guidelines on item generation specifically. The purpose of this article was to begin to address this gap through a discussion of the use of a TOS, a pragmatic framework to aid measure development teams in this process, and by detailing the item generation phases related to the development of a patient-reported compassion measure. While we caution against an overly prescriptive approach, we feel that a guiding framework is imperative in addressing this methodological gap, enhancing the rigor and robustness of measure development in the process.
We illustrated the process of item generation based on our ongoing study to develop
and validate a patient-reported compassion measure. In doing so, we provide readers with
a “real life snapshot” of the iterative nature of this process, which is particularly important
for experiential and dynamic constructs like compassion that often lie at the heart of both
the patient experience and high-quality nursing practice (Sinclair et al., 2016). While mea-
sure developers often indicate that items for their specific measures were generated from a
theoretical foundation, we were unable to locate any studies detailing this process within
the healthcare literature. We also found little detail and guidance related to the iterative
process of refining, redefining, and rerefining the construct of interest, domains, items, and
response scales. Further, although measure development experts argue for the necessity of
ensuring the fidelity between the content of the measure and its underlying construct (Buch-
binder et al., 2011; Morgado et al., 2017; Nichols & Sugrue, 1999), historically, measure
developers have been left to their best efforts in substantiating the link between the characteristics of the items, the overarching construct, and the associated domains. Thus, while the
item generation process we describe herein is potentially simplistic and not overly novel,
it has evaded the purview of scholarship, impeding measure development, particularly among
researchers and clinicians who have limited practical experience in measure development.
In our experience in developing a patient-reported experience measure for compassion, we
found a TOS to be highly valuable as a systematic framework for identifying what is to be
measured and how it is to be measured, while accounting for the relationship between the
key measurement “domains” that compose the overarching construct of interest.
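As a rough illustration of this idea, a TOS can be thought of as a nested mapping from domains to themes to planned item counts, which makes coverage checkable before drafting begins. The domain and theme names and counts below are invented placeholders, not the study's actual blueprint.

```python
# Minimal sketch of a Table of Specifications (TOS) as a data structure.
# Domain/theme names and item counts are illustrative placeholders only.
from collections import Counter

# construct -> {domain: {theme: planned item count}}
tos = {
    "Virtuous response": {"genuineness": 4, "kindness": 4},
    "Relational communicating": {"demeanor": 3, "engaged": 3},
    "Attending to needs": {"timely": 3, "person-centered": 3},
}


def items_per_domain(table: dict) -> Counter:
    """Total planned items per domain, to spot over- or under-represented domains."""
    return Counter({domain: sum(themes.values()) for domain, themes in table.items()})


totals = items_per_domain(tos)
print(totals)                # planned items per domain
print(sum(totals.values()))  # planned size of the initial item pool
```

Auditing the counts this way surfaces imbalances between domains early, before items are drafted against the blueprint.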
Based on our experience, creating a TOS prior to item generation facilitates imperative
discussions, debate, and decisions among key stakeholders, which when absent can result
in unnecessary disagreements on the purpose and scope of the measure, and ultimately an
inaccurate, inadequate, and perhaps biased measure. We found that using a TOS forced our
measure development team to revisit the conceptualization of the construct on numerous
occasions, honing the purpose and scope as a result, while also eliciting additional perspectives that would not otherwise have been recognized. This also emphasizes the benefit of having a
heterogeneous (substantive expertise, methodological approaches, disciplines, gender, ethnicity, etc.) group of experts within the measure development team to enhance the breadth
and depth of the construct of interest and the associated items, augmenting the perspectives
of patients from our initial qualitative research (Sinclair et al., 2016).

TABLE 2. Sample Draft of Initial Measure and Instructions for Measure Development Team Review

Column A: Items
Column B: Default Response Scale
Column C: Other Response Scale Contenders
Column D: Indicate Your Agreement With Inclusion of Item (Y = "Yes," N = "No")
Column E: Indicate Your Agreement With Item Wording (If "agree," leave column empty. If "disagree," suggest alternative wording here)
Column F: Indicate Your Choice of Response Scale (If your choice of scale is not on the list, please suggest here)
Column G: Collated Feedback: Inclusion Consensus as per "Column D" (i.e., 5/6 reviewers said "Yes" to item inclusion)
While we found that there were numerous benefits in creating and implementing a TOS
in the item generation stage of our compassion measure, we recognize that this is a single
case illustration focused on the development of a measure of a specific construct. Further
research is therefore required to determine the applicability of this item generation pro-
cess to other constructs, populations, and measure development teams. Further, while we
suggest that best practice guidelines need to be developed, it is our hope that the stepwise
process highlighted herein (Figure 1), including the use of a TOS to facilitate item generation, will at the very least provide a starting point for these important discussions and
possibly serve as a future best practice.
Establishing an a priori scope and purpose of measurement (Phase 1) as a necessary,
nonnegotiable preliminary phase of item generation helps measure development teams
remain grounded in the task at hand and can mitigate tangential discussions, which, while
interesting, distract from the focus of measure development. At the
same time, the ability to revisit the scope and purpose of the measure and previous decision
points is an ongoing and beneficial exercise throughout the subsequent phases of measure
development, which should not be left to the good intentions or residual memories of the
research team. In our process of developing a patient-reported compassion measure, other
important considerations in the item generation phase included: item tense, recall
period, singular vs. plural HCP(s), response scale, and mode of administration. While other research teams may find themselves periodically revisiting these factors,
having these discussions at the outset is valuable not only in enhancing the efficiency of
the team, but also in anticipating potential risks before they become unmodifiable issues
during the validation stages of measure development.
As a point of consideration, identifying these issues in advance does not
prevent them from arising in subsequent stages. In fact, we discovered that many
of the potential issues we identified could not be addressed prospectively and instead had to
be "parked" and revisited as the measure development process unfolded. For example, as
determining the mode of administration was not a priority or necessity at the early stages
of measure development, discussion on this issue was deferred until later in the item generation
process.
RELEVANCE TO NURSING PRACTICE, EDUCATION, OR RESEARCH
This article is intended to provide pragmatic guidance on the process and complexity of the
item generation step of measure development by utilizing a TOS, illustrated herein in the
development of a patient-reported compassion measure. Although compassion has been
identified as a pillar of quality nursing care, valid and reliable measures to identify, evalu-
ate, and improve compassion in both nursing practice and education are lacking. Likewise,
nursing researchers are currently impeded in conducting research in this area due to the
aforementioned shortcoming, leaving much of nursing scholarship to anecdotal and theoretical discourses on this vital topic. While we are not suggesting that the creation and
implementation of a TOS will rectify this matter or that it will become a standard of mea-
sure development, we feel that by drawing on the rich history of TOS usage in educational
testing, a TOS can be considered a valuable tool in the item generation process for patient-
reported measures, thereby providing an evidence-based approach to an otherwise neb-
ulous step of measure development—improving the quality, robustness, and rigor of the
final measure in the process.
REFERENCES
American Educational Research Association, American Psychological Association, National Council
on Measurement in Education, Joint Committee on Standards for Educational and Psycholog-
ical Testing (U.S.). (2014). Standards for educational and psychological testing. Washington,
DC: AERA.
Avila, M., Stinson, J., Kiss, A., Brandao, R., Uleryk, E., & Feldman, M. (2015). A critical
review of scoring options for clinical measurement tools. BMC Research Notes, 8, 612.
https://doi.org/10.1186/s13104-015-1561-6
Bagozzi, R. (2011). Measurement and meaning in information systems and organizational
research: Methodological and philosophical foundations. MIS Quarterly,35(2), 261–292.
https://doi.org/10.2307/23044044
Billings, D., & Halstead, J. (2016). Teaching in nursing: A guide for faculty. Retrieved from
https://books.google.ca/books?isbn=032329054X
Bowling, A. (1997). Research methods in health. Buckingham, UK: Open University Press.
Bradburn, N., Wansink, B., & Sudman, S. (2004). Asking questions: The definitive guide to question-
naire design—For market research, political polls, and social and health questionnaires (Rev
ed.). San Francisco, CA: Jossey-Bass.
Brod, M., & Tesler, L. (2009). Qualitative research and content validity: Developing best
practices based on science and experience. Quality of Life Research, 18, 1263–1278.
https://doi.org/10.1007/s11136-009-9540-9
Buchbinder, R., Batterham, R., Elsworth, G., Clermont, D., Irvin, E., & Osborne, R. (2011).
A validity-driven approach to the understanding of the personal and societal burden of low back
pain: Development of a conceptual and measurement model. Arthritis Research & Therapy,
13(5), R152. https://doi.org/10.1186/ar3468
Chase, C. I. (1999). Contemporary assessment for educators. New York, NY: Longman.
Cheng, K., & Clark, A. (2017). Qualitative methods and patient-reported outcomes: Mea-
sures development and adaptation. International Journal of Qualitative Methods,16, 1–3.
https://doi.org/10.1177/1609406917702983
Deshpande, P., Rajan, S., Sudeepthi, L., & Nazir, A. (2011). Patient-reported outcomes:
A new era in clinical research. Perspectives in Clinical Research,2(4), 137–144. https://
doi.org/10.4103/2229-3485.86879
DeVellis, R. (2003). Scale development: Theory and applications (2nd ed.). Newbury Park, CA: Sage.
Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The
tailored design method. Somerset, UK: John Wiley & Sons.
Fayers, P., & Hand, D. J. (2002). Causal variables, indicator variables and measurement scales:
An example from quality of life. Journal of the Royal Statistical Society. Series A (Statistics in
Society), 165(2), 233–253. https://doi.org/10.1111/1467-985X.02020
Fayers, P., & Machin, D. (2016). Quality of life: The assessment, analysis, and reporting of
patient-reported outcomes (3rd ed.). Chichester, West Sussex, UK: John Wiley & Sons.
Fives, H., & DiDonato-Barnes, N. (2013). Classroom test construction: The power of a table of
specifications. Practical Assessment, Research and Evaluation, 18(4), 1–7.
Grant, J., & Davis, L. (1997). Selection and use of content experts for instrument development.
Research in Nursing & Health,20, 269–274.
Hinkin, T. (1995). A review of scale development practices in the study of organizations. Journal of
Management,21(5), 967–988. https://doi.org/10.1177/014920639502100509
Hinkin, T. (1998). A brief tutorial on the development of measures for use in survey
questionnaires. Organizational Research Methods, 1(1), 104–121.
https://doi.org/10.1177/109442819800100106
LaVela, S., & Gallan, A. (2014). Evaluation and measurement of patient experience. Patient Experience Journal, 1(1), Article 5.
Lynn, M. (1986). Determination and quantification of content validity. Nursing Research,35(6),
382–385. https://doi.org/10.1097/00006199-198611000-00017
Mokkink, L., Terwee, C., Patrick, D., Alonso, J., Stratford, P., Knol, D., ... de Vet, H. (2010). The
COSMIN checklist for assessing the methodological quality of studies on measurement prop-
erties of health status measurement instruments: An international delphi study. Quality of Life
Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabil-
itation,19(4), 539–549. https://doi.org/10.1007/s11136-010-9606-8
Morgado, F., Meireles, J., Neves, C., Amaral, A., & Ferreira, M. (2017). Scale development:
Ten main limitations and recommendations to improve future research practices. Psicologia:
Reflexão e Crítica,30(3). https://doi.org/10.1186/s41155-016-0057-1
Netemeyer, R., Bearden, W., & Sharma, S. (2003). Scaling procedures: Issues and applications.
London, UK: Sage.
Nichols, P., & Sugrue, B. (1999). The lack of fidelity between cognitively complex constructs and
conventional test development practice. Educational Measurement: Issues and Practice,18,
18–29. https://doi.org/10.1111/j.1745-3992.1999.tb00011.x
Rattray, J., & Jones, M. (2007). Essential elements of questionnaire design and development. Journal
of Clinical Nursing,16, 234–243. https://doi.org/10.1111/j.1365-2702.2006.01573.x
Reeve, B., Wyrwich, K., Wu, A., Velikova, G., Terwee, C., Snyder, C., ... Butt, Z. (2013). ISO-
QOL recommends minimum standards for patient-reported outcome measures used in patient-
centred outcomes and comparative effectiveness research. Quality of Life Research, 22, 1889.
https://doi.org/10.1007/s11136-012-0344-y
Saris, W. (2014). Design, evaluation, and analysis of questionnaires for survey research. Hoboken,
NJ: Wiley.
Sinclair, S., McClement, S., Raffin-Bouchal, S., Hack, T., Hagen, A., McConnell, S., & Chochinov,
H. (2016). Compassion in health care: An empirical model. Journal of Pain and Symptom Man-
agement,51, 193–203. https://doi.org/10.1016/j.jpainsymman.2015.10.009
Sinclair, S., Russell, L. B., Hack, T. F., Kondejewski, J., & Sawatzky, R. (2016). Measuring
compassion in healthcare: A comprehensive and critical review. The Patient, 10(4), 389–405.
https://doi.org/10.1007/s40271-016-0209-5
Sinclair, S., Jaggi, P., Hack, T. F., McClement, S., Raffin-Bouchal, S., & Singh, P. (2018). Assess-
ing the credibility and transferability of the patient compassion model in non-cancer palliative
populations. BMC Palliative Care,17, 108. https://doi.org/10.1186/s12904-018-0358-5
Streiner, D., Norman, G., & Cairney, J. (2015). Health measurement scales: A practical guide to their
development and use (5th ed.). Oxford, UK: Oxford University Press.
Terwee, C., Bot, S., de Boer, M., van der Windt, D., Knol, D., Dekker, J., ... de Vet, H. (2007).
Quality criteria were proposed for measurement properties of health status questionnaires. Journal
of Clinical Epidemiology, 60(1), 34–42. https://doi.org/10.1016/j.jclinepi.2006.03.012
US Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics
Evaluation and Research, Center for Devices and Radiological Health. (2009). Guidance for
industry patient-reported outcome measures: Use in medical product development to support
labeling claims. Retrieved from http://purl.access.gpo.gov/GPO/LPS113413.
Valderas, J., Ferrer, M., Mendivil, J., Garin, O., Rajmil, L., Herdman, M., & Alonso, J. (2008). Devel-
opment of EMPRO: A tool for the standardized assessment of patient-reported outcome mea-
sures. Value Health,11(4), 700–708. https://doi.org/10.1111/j.1524-4733.2007.00309.x
Disclosure. The authors have no relevant financial interest or affiliations with any commercial interests related to the subjects discussed within this article.
Acknowledgments. This study was approved by the University of Calgary Conjoint Health Research Ethics Board (REB #16-1460). The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding. This study was funded through a Canadian Institutes of Health Research Project Scheme
Grant (#364041). This research was undertaken in part, thanks to funding from the Canada Research
Chairs Program.
Correspondence regarding this article should be directed to Shane Sinclair, PhD, University of
Calgary, Calgary, Alberta, Canada. E-mail: sinclair@ucalgary.ca
Appendix A: Item Development Guidelines

1. Item statements should be simple and as short as possible (prevent boredom and fatigue from the respondent perspective)
2. Language should be familiar to target respondents
3. Items should be written at a 12-year-old reading level
4. Avoid item ambiguity (e.g., words like "recently" can have different meanings for different individuals, and hence should be described)
5. Avoid jargon (e.g., words that are not used on an everyday basis by patients, such as "benevolence," "beneficence," or "proactive")
6. Items should measure separate aspects of the topic or construct of interest (e.g., behavior vs. affect) and should not be intermixed
7. Avoid double-barrelled items to avoid representing more than one construct
8. Avoid any leading questions to prevent response bias
9. Avoid items that would result in the same response from all individuals (i.e., floor and ceiling effects and therefore no variance)
10. If using negatively worded items, ensure that they are very carefully worded for appropriate responses (negatively worded questions can help identify patients who answer a certain way due to fatigue or because they do not understand the question, resulting in response acquiescence).
    a. Note: negatively worded items have lower validity coefficients than positively worded ones
    b. Scales with both positive and negative wording are less reliable than those with wording in the same direction
    c. To create a balanced scale, all of the items should be positively worded, but one half should tap one direction of the trait and the other half should tap the opposite direction of it
11. Avoid item redundancy
12. Include items with heavily endorsed codes (e.g., representing individualized patient views on compassion derived from qualitative interviews)

Source: Bradburn, N., Wansink, B., & Sudman, S. (2004). Asking questions: The definitive guide to questionnaire design—For market research, political polls, and social and health questionnaires (Rev ed.). San Francisco, CA: Jossey-Bass; Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Somerset, UK: John Wiley & Sons; Hinkin, T. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104–121. https://doi.org/10.1177/109442819800100106; Streiner, D., Norman, G., & Cairney, J. (2015). Health measurement scales: A practical guide to their development and use (5th ed.). Oxford, UK: Oxford University Press.
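Guideline 3 above (a 12-year-old reading level, roughly grade 6–7) can be screened automatically during drafting. The sketch below is illustrative and not part of the source guidelines: it applies the Flesch-Kincaid grade formula with a crude vowel-group syllable heuristic, so results should be treated as a screening aid rather than a verdict.

```python
# Illustrative sketch (not from the source article): approximate the
# Flesch-Kincaid grade level of a draft item. The syllable counter is a
# crude vowel-group heuristic, so scores are approximate.
import re


def count_syllables(word: str) -> int:
    # Count runs of vowels as syllables; every word has at least one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


print(round(fk_grade("My healthcare provider took the time to listen to me."), 1))
```

An item scoring well above grade 7 under this check would be a candidate for simpler wording before team review.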
Appendix B. Glossary of Terms

Item: An individual question, statement, or task (and its standardized response options) that is evaluated by the patient to address a particular concept (Deshpande et al., 2011; FDA Center for Drug Evaluation and Research et al., 2009).

Construct: An abstract phenomenon of interest (Avila et al., 2015).

Domain: A sub-concept that measures a larger concept comprised of multiple domains (Deshpande et al., 2011; FDA Center for Drug Evaluation and Research et al., 2009).

Theme: A sub-concept that measures a larger domain (Sinclair et al., 2016).

Table of specifications: A table that "aligns objectives, instruction, and assessment" to facilitate the construction of tests through rigorous means (Fives & DiDonato-Barnes, 2013).

Measurement indicators: The observable elements (i.e., variables) used to assess the construct of interest (Avila et al., 2015; Bagozzi, 2011).

Reflective indicators: Variables that measure a key domain of the construct, are a manifestation of the construct, and are interchangeable (Bagozzi, 2011; Fayers & Hand, 2002; Fayers & Machin, 2016; Mokkink et al., 2010).

Formative indicators: Variables that collectively form a construct but are not interchangeable (Bagozzi, 2011; Fayers & Hand, 2002; Fayers & Machin, 2016; Mokkink et al., 2010).

Content category: A set of behaviors, knowledge, skills, abilities, attitudes, or other characteristics to be measured by a test, which can be utilized to generate an initial pool of items (APA, 2014).
... Our previous review of compassion measures in healthcare between 1985 and 2016 concluded that no single measure available measured compassion in healthcare in a comprehensive or sufficiently methodologically rigorous fashion [42]. Since then, additional testing has been conducted on several measures and new compassion measures have been proposed [46][47][48][49][50][51][52][53][54][55][56][57][58]. The objective of the present study was to provide a critical and comparative review of the design and psychometric properties of recently updated or newly published compassion measures to identify a "gold standard" for measuring compassion in healthcare research, clinical practice, and healthcare policy development. ...
... The characteristics of the included studies are shown in Tables 1 and 2. Measures that underwent additional testing since our original review included the Compassion Competence Scale (CCS) [46,71], the Compassionate Care Assessment Tool (CCAT) © [47,72], and the Schwartz Center Compassionate Care Scale (SCCCS)™ [48,49,73]. New compassion measures included the Sussex-Oxford Compassion for Others Scale (SOCS-O), a self-report measure of compassion for others [50]; the Bolton Compassion Strengths Indicators (BSCI), a self-report measure of the characteristics (strengths) associated with a compassionate nurse [51]; a five-item Tool to Measure Patient Assessment of Clinician Compassion (TMPACC) [52][53][54]; and the SCQ, a 15-item patient-reported compassion measure developed for use in research and clinical practice [55][56][57][58]. ...
... The Sinclair Compassion Questionnaire (SCQ) The SCQ was developed as a patient-reported measure of compassion. Patients are asked to rate their experience of compassion from their healthcare providers using a 5-point Likert scale of agreement (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree) [55][56][57][58]. ...
Article
Full-text available
Our previous review of compassion measures in healthcare between 1985 and 2016 concluded that no available measure assessed compassion in healthcare in a comprehensive or methodologically rigorous fashion. The present study provided a comparative review of the design and psychometric properties of recently updated or newly published compassion measures. The search strategy of our previous review was replicated. PubMed, MEDLINE, CINAHL, and PsycINFO databases and grey literature were searched to identify studies that reported information on instruments that measure compassion or compassionate care in clinicians, physicians, nurses, healthcare students, and patients. Textual qualitative descriptions of included studies were prepared. Instruments were evaluated using the Evaluating Measures of Patient-Reported Outcomes (EMPRO) tool. Measures that underwent additional testing since our last review included the Compassion Competence Scale (CCS), the Compassionate Care Assessment Tool (CCAT)©, and the Schwartz Center Compassionate Care Scale (SCCCS)™. New compassion measures included the Sussex-Oxford Compassion for Others Scale (SOCS-O), a self-report measure of compassion for others; the Bolton Compassion Strengths Indicators (BSCI), a self-report measure of the characteristics (strengths) associated with a compassionate nurse; a five-item Tool to Measure Patient Assessment of Clinician Compassion (TMPACC); and the Sinclair Compassion Questionnaire (SCQ). The SCQ was the only measure that adhered to measure development guidelines, established initial construct validity by first defining the concept of interest, and included the patient perspective across all stages of development. 
The SCQ had the highest EMPRO overall score at 58.1, almost 9 points higher than any other compassion measure, and achieved perfect EMPRO subscale scores for internal consistency, reliability, validity, and respondent burden, which were up to 43 points higher than any other compassion measure. These findings establish the SCQ as the 'gold standard' compassion measure, providing an empirical basis for evaluations of compassion in routine care.
... Boateng et al. (2018) distinguish three phasesitem development, scale development, and scale evaluation. These steps are listed and explained in (Sinclair et al., 2020). To transparently document and solidify our understanding of the construct and its domains, and following the examples of other authors who dedicated their papers to rigorously discuss these initial steps of measure construction (Amendola et al., 2021;Barreca et al., 2004;Ismail et al., 2021;Ruksakulpiwat, 2021;Sinclair et al., 2020), within this paper we focus exclusively on the first of the three phases: item development. ...
... These steps are listed and explained in (Sinclair et al., 2020). To transparently document and solidify our understanding of the construct and its domains, and following the examples of other authors who dedicated their papers to rigorously discuss these initial steps of measure construction (Amendola et al., 2021;Barreca et al., 2004;Ismail et al., 2021;Ruksakulpiwat, 2021;Sinclair et al., 2020), within this paper we focus exclusively on the first of the three phases: item development. ...
Preprint
Full-text available
In this paper we draw on value theory in social psychology to conceptualize the range of motives that may influence research-related attitudes, decisions, and actions of researchers. To conceptualize academic research values, we integrate theoretical insights from the personal, work, and scientific work values literature, as well as the responses of 6 interviewees and 255 survey participants about values relevant to academic research. Finally, we propose a total of 246 academic research value items spread over 11 dimensions and 36 sub-themes. We relate our conceptualization and item proposals to existing work and provide recommendations for future measurement development. Gaining a better understanding of the different values researchers have, is useful to improve scientific careers, make science attractive to a more diverse group of individuals, and elucidate some of the mechanisms leading to exemplary and questionable science.
... The current study began with qualitative interviews with patients to establish the transferability of the model across our study populations and focus groups with HCP, educators and administrators (n=24) to determine the feasibility, challenges, facilitators and clinical utility of the proposed measure (see online supplemental table S1). The results of this first study stage, 24 along with the findings of our afore-mentioned literature review and model development directly informed the item generation stage of the study, 25 in accordance with development guidelines. [20][21][22] Finally, the content validity of the draft measure was established through a Delphi process with international subject matter experts and patient advisors, along with cognitive interviews with patients. ...
... The SCQ contains items that cover patients' experiences of compassion within each of the theoretical domains of the Patient Compassion Model 1 with our results showing that these domains are subsumed under a single latent construct of compassion. These results are a defining feature of reflective measures, 25 whereby individual items each reflect the underlying construct, underscoring the necessity of conducting foundational research 16 25 26 1 and initial validation studies to establish construct validity 24-26 -an essential, but overlooked stage in the development of compassion measures 11 and measure in general. [19][20][21] As a result, the SCQ has excellent internal consistency (Cronbach's alpha of 0.96) and testretest reliability (ranging from 0.74 to 0.89). ...
Article
Full-text available
Objectives Compassion is a key indicator of quality care that is reportedly eroding from patients’ care experience. While the need to assess compassion is recognised, valid and reliable measures are lacking. This study developed and validated a clinically informed, psychometrically rigorous, patient-reported compassion measure. Design Data were collected from participants living with life-limiting illnesses over two study phases across four care settings (acute care, hospice, long term care (LTC) and homecare). In phase 1, data were analysed through exploratory factor analysis (EFA), with the final items analysed via confirmatory factor analysis (CFA) in phase 2. The Schwartz Center Compassionate Care Scale (SCCCS), the revised Edmonton Symptom Assessment Scale (ESAS-r) and Picker Patient Experience Questionnaire (PPEQ) were also administered in phase 2 to assess convergent and divergent validity. Setting and participants 633 participants were recruited over two study phases. In the EFA phase, a 54-item version of the measure was administered to 303 participants, with 330 participants being administered the final 15-item measure in the CFA phase. Results Both EFA and CFA confirmed compassion as a single factor construct with factor loadings for the 15-item measure ranging from 0.76 to 0.86, with excellent test–retest reliability (intraclass correlation coefficient range: 0.74–0.89) and excellent internal reliability (Cronbach’s alpha of 0.96). The measure was positively correlated with the SCCCS (r=0.75, p<0.001) and PPEQ (r=0.60, p<0.001). Participants reporting higher experiences of compassion had significantly greater well-being and lower depression on the ESAS-r. Patients in acute care and hospice reported significantly greater experiences of compassion than LTC residents. Conclusions There is strong initial psychometric evidence for the Sinclair Compassion Questionnaire (SCQ) as a valid and reliable patient-reported compassion measure. 
The SCQ provides healthcare providers, settings and administrators the means to routinely measure patients experiences of compassion, while providing researchers a robust measure to conduct high-quality research.
... Item generation is a crucial step in the development of an instrument. When done correctly, it ensures that items of an instrument accurately and comprehensively cover the construct measured [44]. In most studies, clients or patients participated in item generation to determine what quality of care means to clients of health care services, which is necessary for the elusive and evolving concept of patient-centered care [3]. ...
Article
Full-text available
Background Perspectives of patients as clients on healthcare offer unique insights into the process and outcomes of care and can facilitate improvements in the quality of services. Differences in the tools used to measure these perspectives often reflect differences in the conceptualization of quality of care and personal experiences. This systematic review assesses the validity and reliability of instruments measuring client experiences and satisfaction with healthcare in low- and middle-income countries (LMICs). Methods We performed a systematic search of studies published in PubMed, SCOPUS, and CINAHL. This review was reported according to the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) guidelines. Studies describing the development and psychometric properties of client experience and satisfaction with general health care were included in the review. Critical appraisal of study design was undertaken using the Appraisal tool for Cross-Sectional Studies (AXIS). The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist and Terwee’s criteria were used to appraise the psychometric properties of the included studies. A narrative synthesis approach was used in the interpretation of the findings. Results Of the 7470 records identified, 12 studies with 14 corresponding instruments met the inclusion criteria and were included in the final review. No study assessed all the psychometric properties highlighted by the COSMIN criteria. In most instruments, we found evidence that initial development work incorporated client participation. The most evaluated measurement properties were content validity, internal consistency, and structural validity. Measurement error and responsiveness were not reported in any study. 
Conclusion Reliability and validity should be considered essential elements when choosing or developing an instrument for use within a given population. Our review identified limitations in the psychometric properties of patient experience and satisfaction instruments, and none met all methodological quality standards. Future studies should focus on further developing and testing available measures for their effectiveness in clinical practice. Furthermore, the development of new instruments should incorporate clients' views and be rigorously tested or validated in studies with high methodological quality. Trial registration CRD42020150438.
... Age-related differences in self-reported opinions, attitudes or behaviors about health can also be influenced by age-induced changes in cognitive and communicative functioning [18]. There is a need to advance understandable and accurate communication between the patient and healthcare personnel and the patient's appropriate and sufficient involvement in decision making that addresses his/her needs [19,20]. ...
Article
Full-text available
Background Developing machine learning models to support health analytics requires increased understanding about statistical properties of self-rated expression statements used in health-related communication and decision making. To address this, our current research analyzes self-rated expression statements concerning the coronavirus COVID-19 epidemic and with a new methodology identifies how statistically significant differences between groups of respondents can be linked to machine learning results. Methods A quantitative cross-sectional study gathering the “need for help” ratings for twenty health-related expression statements concerning the coronavirus epidemic on an 11-point Likert scale, and nine answers about the person’s health and wellbeing, sex and age. The study involved online respondents between 30 May and 3 August 2020 recruited from Finnish patient and disabled people’s organizations, other health-related organizations and professionals, and educational institutions (n = 673). We propose and experimentally motivate a new methodology of influence analysis concerning machine learning to be applied for evaluating how machine learning results depend on and are influenced by various properties of the data which are identified with traditional statistical methods. Results We found statistically significant Kendall rank-correlations and high cosine similarity values between various health-related expression statement pairs concerning the “need for help” ratings and a background question pair. With tests of Wilcoxon rank-sum, Kruskal-Wallis and one-way analysis of variance (ANOVA) between groups we identified statistically significant rating differences for several health-related expression statements with respect to groupings based on the answer values of background questions, such as the ratings of suspecting to have the coronavirus infection and having it depending on the estimated health condition, quality of life and sex.
Our new methodology enabled us to identify how statistically significant rating differences were linked to machine learning results, thus helping to develop better human-understandable machine learning models. Conclusions The self-rated “need for help” concerning health-related expression statements differs statistically significantly depending on the person’s background information, such as his/her estimated health condition, quality of life and sex. With our new methodology, statistically significant rating differences can be linked to machine learning results, enabling the development of better machine learning models to identify, interpret, and address the patient’s needs for well-personalized care.
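The abstract above names a standard battery of rank-based tests for comparing ordinal Likert ratings between respondent groups. As a brief illustration of the test calls only (this is not the study's code, and the ratings below are invented for demonstration), the same battery can be run with `scipy.stats`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical "need for help" ratings (0-10) for two respondent groups
group_a = rng.integers(0, 11, size=50)
group_b = rng.integers(2, 11, size=50)

# Tests for rating differences between the two groups
w_stat, w_p = stats.ranksums(group_a, group_b)   # Wilcoxon rank-sum
h_stat, h_p = stats.kruskal(group_a, group_b)    # Kruskal-Wallis
f_stat, f_p = stats.f_oneway(group_a, group_b)   # one-way ANOVA

# Kendall rank-correlation between two expression-statement rating vectors
tau, tau_p = stats.kendalltau(group_a, group_b)
```

Each call returns a test statistic and a p-value; the non-parametric tests (rank-sum, Kruskal-Wallis, Kendall's tau) are the usual choices when Likert ratings are treated as ordinal rather than interval data.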
Article
Full-text available
Background: Information needs are one of the most common unmet supportive care needs of those living with cancer. Little is known about how existing tools for assessing information needs in the cancer context have been created or the role those with lived cancer experience played in their development. Objectives: This review aimed to characterize the development and intended use of existing cancer-specific information needs assessment tools. Methods: A systematic scoping review was conducted using a peer-reviewed protocol informed by recommendations from the Joanna Briggs Institute and the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist. Results: Twenty-one information needs assessment tools were included. Most tools were either breast cancer (n = 8) or primary tumor nonspecific (n = 8). Patients and informal carers participated in initial identification of questionnaire items in the minority of cases (n = 6) and were more commonly involved in reviewing the final questionnaire before use or formal psychometric testing (n = 9). Most questionnaires were not assessed for validity or reliability using rigorous quantitative psychometric testing. Significance of results: Existing tools are generally not designed to provide a rigorous assessment of informational needs related to a specific cancer challenge and are limited in how they have been informed by those with lived cancer experience. Tools are needed that both rigorously address information needs for specific cancer challenges and that have been developed in partnership with those who have experienced cancer. Future directions should include understanding barriers and facilitators to developing such tools.
Preprint
Developing machine learning models to support health analytics requires increased understanding about statistical properties of self-rated expression statements. We analyzed self-rated expression statements concerning the coronavirus COVID-19 epidemic to identify statistically significant differences between groups of respondents and to detect the patient's need for help with machine learning. Our quantitative study gathered the "need for help" ratings for twenty health-related expression statements concerning the coronavirus epidemic on an 11-point Likert scale, and nine answers about the person's health and wellbeing, sex and age. Online respondents between 30 May and 3 August 2020 were recruited from Finnish patient and disabled people's organizations, other health-related organizations and professionals, and educational institutions (n=673). We analyzed rating differences and dependencies with Kendall rank-correlation and cosine similarity measures and tests of Wilcoxon rank-sum, Kruskal-Wallis and one-way analysis of variance (ANOVA) between groups, and carried out machine learning experiments with a basic implementation of a convolutional neural network algorithm. We found statistically significant correlations and high cosine similarity values between various health-related expression statement pairs concerning the "need for help" ratings and a background question pair. We also identified statistically significant rating differences for several health-related expression statements with respect to groupings based on the answer values of background questions, such as the ratings of suspecting to have the coronavirus infection and having it depending on the estimated health condition, quality of life and sex. Our experiments with a convolutional neural network algorithm showed the applicability of machine learning to support detecting the need for help in the patient's expressions.
Article
Full-text available
Background Although compassionate care is considered a cornerstone of quality palliative care, there is a paucity of valid and reliable measures to study, assess, and evaluate how patients experience compassion/compassionate care in their care. Objective The aim was to develop a patient-reported compassion measure for use in research and clinical practice with established content-related validity evidence for the items, question stems, and response scale. Methods Content validation for an initial 109 items was conducted through a two-round modified Delphi technique, followed by cognitive interviews with patients. A panel of international Subject Matter Experts (SMEs) and a Patient Advisory Group (PAG) assessed the items for their relevancy to their associated domain of compassion, yielding an Item-level Content Validity Index (I-CVI), which was used to determine content modifications. The SMEs and the PAG also provided narrative feedback on the clarity, flow, and wording of the instructions, questions, and response scale, with items being modified accordingly. Cognitive interviews were conducted with 16 patients to further assess the clarity, comprehensibility, and readability of each item within the revised item pool. Results The first round of the Delphi review produced an overall CVI of 72% among SMEs and 80% among the PAG for the 109 items. Delphi panelists then reviewed a revised measure containing 84 items, generating an overall CVI of 84% for SMEs and 86% for the PAG. Sixty-eight items underwent further testing via cognitive interviews with patients, resulting in an additional 14 items being removed. Conclusions Having established this initial validity evidence, further testing to assess internal consistency, test–retest reliability, factor structure, and relationships to other variables is required to produce the first valid, reliable, and clinically informed patient-reported measure of compassion.
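The Item-level Content Validity Index (I-CVI) reported above is a simple proportion: the number of panelists rating an item as relevant (typically 3 or 4 on a 4-point relevance scale) divided by the total number of panelists. A minimal sketch, using hypothetical expert ratings rather than the study's data:

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Item-level CVI: proportion of experts rating the item as
    relevant (3 or 4 on a 4-point relevance scale)."""
    return sum(r in relevant for r in ratings) / len(ratings)

def scale_cvi(items):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    return sum(item_cvi(r) for r in items) / len(items)

# Hypothetical ratings from five experts for three candidate items
items = [
    [4, 3, 4, 4, 3],   # all five rate it relevant -> I-CVI = 1.0
    [4, 3, 2, 4, 3],   # four of five              -> I-CVI = 0.8
    [2, 1, 4, 3, 2],   # two of five               -> I-CVI = 0.4
]
print([item_cvi(r) for r in items])  # [1.0, 0.8, 0.4]
print(round(scale_cvi(items), 2))    # 0.73
```

In practice a cutoff (often I-CVI below 0.78 for panels of six or more) flags items for revision or removal; the overall percentages quoted in the abstract correspond to the scale-level average.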
Article
Full-text available
Background: A lack of evidence and psychometrically sound measures of compassion necessitated the development of the first known, empirically derived, theoretical Patient Compassion Model (PCM) generated from qualitative interviews with advanced cancer inpatients. We aimed to assess the credibility and transferability of the PCM across diverse palliative populations and settings. Methods: Semi-structured, audio-recorded qualitative interviews were conducted with 20 patients with life-limiting diagnoses, recruited from 4 settings (acute care, homecare, residential care, and hospice). Participants were first asked to share their understandings and experiences of compassion. They were then presented with an overview of the PCM and asked to determine whether: 1) the model resonated with their understanding and experiences of compassion; 2) the model required any modification(s); 3) they had further insights on the model's domains and/or themes. Members of the research team analyzed the qualitative data using constant comparative analysis. Results: Both patients' personal perspectives of compassion prior to viewing the model and their specific feedback after being provided an overview of the model confirmed the credibility and transferability of the PCM. While new codes were incorporated into the original coding schema, no new domains or themes emerged from this study sample. These additional codes provided a more comprehensive understanding of the nuances within the domains and themes of the PCM that will aid in the generation of items for an ongoing study to develop a patient-reported measure of compassion. Conclusions: A diverse palliative patient population confirmed the credibility and transferability of the PCM within palliative care, extending the rigour and applicability of the PCM that was originally developed within an advanced cancer population.
The views of a diverse palliative patient population on compassion helped to validate previous codes and supplement the existing coding schema, informing the development of a guiding framework for the generation of a patient-reported measure of compassion.
Article
Full-text available
Despite the increasing presence of a variety of measures of patient health care experiences in research and policy, there remains a lack of consensus regarding measurement. The objectives of this paper were to: (1) explore and describe what is known about measures and measurement of patient experience and (2) describe evaluation approaches/methods used to assess patient experience. Patient experience does not simply reflect clinical outcomes or adherence-driven outcomes; rather it seeks to represent a unique encompassing dimension that is challenging to measure. Several challenges exist when measuring patient experience, in part, because it is a complex, ambiguous concept that lacks a common or ubiquitous definition and also because there are multiple cross-cutting terms (e.g., satisfaction, engagement, perceptions, and preferences) in health care that make conceptual distinction (and therefore measurement) difficult. However, there are many measurement and evaluation approaches that can be used to obtain meaningful insights that can generate actionable strategies and plans. Measuring patient experience can be accomplished using mixed methods, quantitative, or qualitative approaches. The strength of the mixed methods design lies not only in obtaining the “full picture,” but in triangulating (i.e., cross-validating) qualitative and quantitative data to see if and where findings converge, and what can be learned about patient experience from each method. Similar to deciding which measures to use, and which approaches to utilize in measurement, the timing of measurement must also fit the need at hand, and make both practical and purposeful sense and be interpreted in light of the timeframe context.
Eliciting feedback from patients and engaging them in their care and health care delivery affords an opportunity to highlight and address aspects of the care experience that need improvement, and to monitor performance with regard to meeting patient experience goals in the delivery of care. The use of core patient-reported measures of patient experience as part of systematic measurement and performance monitoring in health care settings would markedly improve measurement of the ‘total’ patient experience and would heighten our understanding of the patient experience within and across settings. Experience Framework This article is associated with the Policy & Measurement lens of The Beryl Institute Experience Framework. (http://bit.ly/ExperienceFramework)
Article
Full-text available
The scale development process is critical to building knowledge in human and social sciences. The present paper aimed (a) to provide a systematic review of the published literature regarding current practices of the scale development process, (b) to assess the main limitations reported by the authors in these processes, and (c) to provide a set of recommendations for best practices in future scale development research. Papers were selected in September 2015, with the search terms “scale development” and “limitations” from three databases: Scopus, PsycINFO, and Web of Science, with no time restriction. We evaluated 105 studies published between 1976 and 2015. The analysis considered the three basic steps in scale development: item generation, theoretical analysis, and psychometric analysis. The study identified ten main types of limitation in these practices reported in the literature: sample characteristic limitations, methodological limitations, psychometric limitations, qualitative research limitations, missing data, social desirability bias, item limitations, brevity of the scale, difficulty controlling all variables, and lack of manual instructions. Considering these results, various studies analyzed in this review clearly identified methodological weaknesses in the scale development process (e.g., smaller sample sizes in psychometric analysis), but only a few researchers recognized and recorded these limitations. We hope that a systematic knowledge of the difficulties usually reported in scale development will help future researchers to recognize their own limitations and especially to make the most appropriate choices among different conceptions and methodological strategies.
Article
Full-text available
The aim of this paper is twofold: (1) to describe the fundamental differences between formative and reflective measurement models, and (2) to review the options proposed in the literature to obtain overall instrument summary scores, with a particular focus on formative models. An extensive literature search was conducted using the following databases: MEDLINE, EMBASE, PsycINFO, CINAHL and ABI/INFORM, using “formative” and “reflective” as text words; relevant articles’ reference lists were hand searched. Reflective models are most frequently scored by means of simple summation, which is consistent with the theory underlying these models. However, our review suggests that formative models might be better summarized using weighted combinations of indicators, since each indicator captures unique features of the underlying construct. For this purpose, indicator weights have been obtained using choice-based, statistical, researcher-based, and combined approaches. Whereas simple summation is a theoretically justified scoring system for reflective measurement models, formative measures likely benefit from the use of weighted scores that preserve the contribution of each of the aspects of the construct.
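The scoring contrast described above reduces to a few lines of arithmetic: simple summation for reflective models, and a weighted combination that preserves each indicator's unique contribution for formative models. The responses and weights below are hypothetical, standing in for the choice-based, statistical, or researcher-based weights the review discusses:

```python
def reflective_score(item_responses):
    """Reflective model: items are interchangeable reflections of one
    construct, so a simple sum is theoretically justified."""
    return sum(item_responses)

def formative_score(item_responses, weights):
    """Formative model: each indicator captures a unique facet, so a
    weighted combination preserves each item's contribution."""
    return sum(w * r for w, r in zip(weights, item_responses))

responses = [4, 5, 3, 2]         # hypothetical 5-point Likert responses
weights = [0.4, 0.3, 0.2, 0.1]   # hypothetical indicator weights

print(reflective_score(responses))           # 14
print(formative_score(responses, weights))   # 3.9
```

The choice between the two scorers is driven by the measurement model, not convenience: reweighting a reflective scale adds nothing, while unit-weighting a formative one discards the distinct information each indicator carries.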
Article
Full-text available
Context: Compassion is frequently referenced as a hallmark of quality care by patients, health care providers, health care administrators and policy makers. Despite its putative centrality, including its institution in recent health care reform, an empirical understanding based on the perspectives of patients, the recipients of compassion, is lacking, making compassion one of the most referenced aspects of quality care that we know little about. Objectives: The objective of this study was to investigate palliative cancer patients' understanding and experiences of compassion in order to provide a critical perspective on the nature and importance of compassion. Methods: This grounded theory study used semi-structured interviews to investigate how patients understand and experience compassion in clinical care. Utilizing convenience and theoretical sampling, 53 advanced cancer inpatients were recruited over a seven-month period from a specialized palliative care unit and hospital-wide palliative care service within a Canadian urban setting. Data were analyzed by four members of the research team through the three stages of Straussian grounded theory. Results: Qualitative analysis yielded seven categories, each containing distinct themes and subthemes. Together, they constitute components of the compassion model, the first empirically based clinical model of compassion. The model defines compassion as a virtuous response that seeks to address the suffering and needs of a person through relational understanding and action. Conclusion: The components of the compassion model provide insight into how patients understand and experience compassion, providing the necessary empirical foundation to develop future research, measures, training and clinical care based on this vital feature of quality care.
Article
Content experts frequently are used in the judgment-quantification stage of content validation of instruments. However, errors in instrumentation may arise when important steps in selecting and using these experts are not carefully planned. The systematic process of choosing, orienting, and using content experts in the judgment-quantification stage of instrument development is addressed, with particular attention to the often neglected, important step of familiarizing these experts with the conceptual underpinnings and measurement model of the instrument. An example using experts to validate content for a measure of caregiver burden is used to illustrate this stage of instrument review. © 1997 John Wiley & Sons, Inc. Res Nurs Health 20: 269–274, 1997
Book
Clinicians and those in health sciences are frequently called upon to measure subjective states such as attitudes, feelings, quality of life, educational achievement and aptitude, and learning style in their patients. This fifth edition of Health Measurement Scales enables these groups to both develop scales to measure non-tangible health outcomes, and better evaluate and differentiate between existing tools. Health Measurement Scales is the ultimate guide to developing and validating measurement scales that are to be used in the health sciences. The book covers how the individual items are developed; various biases that can affect responses (e.g. social desirability, yea-saying, framing); various response options; how to select the best items in the set; how to combine them into a scale; and finally how to determine the reliability and validity of the scale. It concludes with a discussion of ethical issues that may be encountered, and guidelines for reporting the results of the scale development process. Appendices include a comprehensive guide to finding existing scales, and a brief introduction to exploratory and confirmatory factor analysis, making this book a must-read for any practitioner dealing with this kind of data.
Article
‘This is an excellent book which introduces the underlying concepts and practical issues related to psychosocial measurement and scale development’ - Statistics in Medicine. Effective measurement is a cornerstone of scientific research. Yet many social science researchers lack the tools to develop appropriate assessment instruments for the measurement of latent social-psychological constructs. Scaling Procedures: Issues and Applications examines the issues involved in developing and validating multi-item self-report scales of latent constructs. Distinguished researchers and award-winning educators Richard G Netemeyer, William O Bearden, and Subhash Sharma present a four-step approach for multi-indicator scale development. With these steps, the authors include relevant empirical examples and a review of the concepts of dimensionality, reliability, and validity. Scaling Procedures: Issues and Applications supplies cutting-edge strategies for developing and refining measures. Providing concise chapter introductions and summaries, as well as numerous tables, figures, and exhibits, the authors present recommended steps and overlapping activities in a logical, sequential progression. Designed for graduate students in measurement//psychometrics, structural equation modeling, and survey research seminars across the social science disciplines, this book also addresses the needs of researchers and academics in all business, psychology, and sociology-related disciplines.