Figure - uploaded by Elspeth Nolen
Source publication
The use of reliable, valid measures in implementation practice will remain limited without pragmatic measures. Previous research identified the need for pragmatic measures, though the characteristic identification used only expert opinion and literature review. Our team completed four studies to develop a stakeholder-driven pragmatic rating criteri...
Contexts in source publication
Context 1
... all stakeholders believed that "easy to interpret" was a very important property of pragmatic measures. Table 2 shows the mean scores and standard deviations for all terms and phrases within categories, and those for all terms and phrases. Members were identified through our professional networks as leading experts in implementation science who have published on measurement issues. ...
Similar publications
Background
Implementation science (IS) has the potential to improve the implementation and impact of policies, programs and interventions. Most of the training, guidance and experience has focused on implementation research, which is only one part of the broader field of IS. In 2018, the Society for Implementation Science in Nutrition borrowed conc...
Despite the utility of formative evaluation in implementation research, few published projects in low- and middle-income countries have used this approach and incorporating qualitative data into implementation projects can be challenging. Implementation science and qualitative formative evaluation can help inform the delivery of evidence-based prac...
Background:
Maternal health outcomes in the USA are far worse than in peer nations. Increasing implementation research in maternity care is critical to addressing quality gaps and unwarranted variations in care. Implementation research priorities have not yet been defined or well represented in the plans for maternal health research investments in...
Background:
A community-based dengue fever intervention was implemented in Burkina Faso in 2017. The results achieved vary from one area to another. The objective of this article is to analyze the implementation of this intervention, to better understand the process, and to explain the contextual elements of performance variations in implementatio...
Background
The provision of contraceptive care for incarcerated individuals has been largely inconsistent and has contributed to, at best, inadequate care, and at worst reproductive abuses, violence, and coercion. While previous research has identified strategies to remedy known issues, to date, very few recommendations have been implemented across...
Citations
... There have been recent efforts to quantify the IOF, though available measures are limited, either because they assess only a few implementation outcomes or because they do not transfer from one type of setting/intervention to others [23,28,29,30,31]. There is still a need for consistent approaches to measuring implementation outcomes that are both pragmatic (i.e., acceptable, compatible, easy, and useful [32,33]) and psychometrically sound [34], thereby permitting longitudinal assessment [35,36]. The current study, an evaluation of multiple interventions in multiple settings conducted over multiple years, provided an opportunity to test an adapted measure to assess implementation outcomes that can be used in a variety of organizational contexts and at various stages of implementation. ...
In 2017, the Health Resources and Services Administration’s HIV/AIDS Bureau funded an Evaluation Center (EC) to assess the rapid implementation of 11 evidence-informed interventions at 25 HIV care and treatment providers across the U.S. The EC conducted an implementation science-based evaluation, including longitudinal assessment of implementation outcomes as defined by Implementation Outcome Framework (IOF) of the Conceptual Model for Implementation Research. The EC adapted a measure originally designed for implementation readiness to capture seven implementation outcomes and administered the measure to site leadership every six months, from intervention launch through the end of the initiative. The adapted measure demonstrated adequate internal consistency within and across time periods. Individual outcomes changed over the course of implementation, with the greatest period of growth during the first six months. Longitudinal relationships between outcomes posited to be most relevant at early, mid- or late-implementation were not evident in these analyses; rather, relationships between the outcomes were significant within time periods. Finally, there were differences in the trajectory of outcomes based on characteristics of the site’s larger context. The use of this adapted measure across multiple implementation settings, assessing multiple interventions, is an important step forward in the comparability of implementation outcomes more broadly.
... This helps to enhance content validity by capturing a broader range and scope of ideas and can facilitate the creation of culturally sensitive measures (Rosas, 2023). In fact, GCM has been found to identify constructs not previously derived through top-down methods, like literature reviews or expert opinion (Adams et al., 2021;Stanick et al., 2021). Furthermore, by creating conceptual maps, GCM moves beyond identifying dimensions of a concept to effectively and succinctly illustrating relationships among ideas. ...
... Our preliminary research identified five core goals (functions) and distinct group activities that promote a cocreation engagement process. 16 The novel measure will address identified functions. Second, our proposed cocreation measure is grounded in systems thinking 11 17 18 and equity implementation research. ...
... We will validate the cocreation measure using the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) to ensure usability for patient-centred research and other community engagement research initiatives. 16 PAPERS is the first stakeholder-driven, validated rating scale for assessing whether a measure is pragmatic, providing a list of pragmatic categories. 16 See figure 2 for an overview of the measure development and validation project. ...
Introduction
Cocreation, a collaborative process of key interested partners working alongside researchers, is fundamental to community-engaged research. However, the field of community-engaged research is currently grappling with a significant gap: the lack of a pragmatic and validated measure to assess the quality of this process. This protocol addresses this significant gap by developing and testing a pragmatic cocreation measure with diverse community and research partners involved in participatory health-related research. A valid measure for evaluating the quality of the cocreation process can significantly promote inclusive research practices and outcomes.
Methods and analysis
The measure consists of two components: (1) an iterative group assessment to prioritise cocreation principles and identify specific activities for achieving those principles and (2) a survey assessing individual partner experience. An expert panel of 16–20 patients, community, healthcare providers and research partners, will participate in a modified Delphi process to assist in construct delineation and assess content validity using group discussions and rating exercises. We will compute survey items using an Item-Level Content Validity Index and a modified kappa statistic to adjust for chance agreement with panel members’ ratings. We will then conduct cognitive interviews with a new group of 40 participants to assess survey item comprehension and interpretation, applying an iterative coding process to analyse the data. Finally, we will assess the measure’s psychometric and pragmatic characteristics with a convenience sample of 300 participants and use the Psychometric and Pragmatic Evidence Rating Scale. Construct validity will be assessed by examining survey data using confirmatory and exploratory factor analysis.
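The Item-Level Content Validity Index and the chance-adjusted kappa described above follow a standard computation (e.g., Polit, Beck, and Owen's k*). A minimal sketch in Python, with function names of our own choosing rather than the authors':

```python
from math import comb

def item_cvi(n_relevant: int, n_experts: int) -> float:
    """I-CVI: proportion of experts rating an item as relevant (3 or 4 on a 4-point scale)."""
    return n_relevant / n_experts

def modified_kappa(n_relevant: int, n_experts: int) -> float:
    """k*: I-CVI adjusted for chance agreement among the panel."""
    # Probability that exactly n_relevant of n_experts agree by chance,
    # assuming each expert rates "relevant" with probability 0.5.
    pc = comb(n_experts, n_relevant) * 0.5 ** n_experts
    icvi = item_cvi(n_relevant, n_experts)
    return (icvi - pc) / (1 - pc)
```

For example, if 4 of 5 panelists rate an item relevant, I-CVI = 0.80 and k* is roughly 0.76, which commonly cited benchmarks would rate as excellent.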
Ethics and dissemination
This funded study (years 2024–2025) has been approved by the Institutional Review Board at the University of Colorado, Denver. The team will share the study findings online, with key partners, and by publishing results in a peer-reviewed journal.
... The barrier buster tool was developed through interviews with clinicians to create a simplified, easy-to-use version of CFIR built on the 14 constructs the clinicians identified as most important, drawn from 4 of the 5 domains [30]. The authors of this tool used an objective criterion to assess the pragmatism of a measurement instrument, i.e., the barrier buster tool, and found the tool to be relatively pragmatic [31]. ...
Background: Physical inactivity in the U.S. poses a significant risk of developing chronic health factors associated with cardiovascular disease. Children from rural communities are especially vulnerable to inactivity. The Hoosier Sport program aims to address this by working to increase physical activity in 6th and 7th grade students in a rural Indiana middle school. Hoosier Sport uses sport participation coupled with health education delivered by college-service learning students to establish healthy behaviors that children can sustain throughout their life. The purpose of this prospective longitudinal study was to evaluate the implementation of Hoosier Sport in a rural middle school, using a multi-component evaluation approach. Methods: This prospective program evaluation study utilized The Consolidated Framework for Implementation Research (CFIR) to assess feasibility outcomes such as recruitment, retention, fidelity, attendance, acceptability, and cost. CFIR was incorporated through surveys completed by Hoosier Sport team members to identify facilitators and barriers. Fidelity was measured using SOSPAN and SOFIT tools. SOSPAN (System for Observation of Staff Promotion of Activity and Nutrition) monitored staff interactions with children during physical education classes. SOFIT (System of Observing Fitness Instruction Time) evaluated the duration and type of activities in each lesson context. For our descriptive analysis, we calculated means and standard deviation for continuous variables and percentages for categorical variables. Results: All feasibility measures met or exceeded the a priori threshold, indicating high success. Fidelity was high among college student implementers and child participants. SOSPAN showed that staff did not use physical activity as punishment, engaged in physical activity 62.5% of the time, provided verbal encouragement 87.5% of the time, and used elimination games only 2.5% of the time. 
SOFIT revealed significant promotion of moderate-to-vigorous physical activity, with 94% during the 4-week strength training intervention and 95% during the 4-week basketball intervention. The barrier buster tool identified general agreement with most statements, indicating promising system-level acceptability. Conclusion: The study results demonstrate successful feasibility, high fidelity, and promising system-level acceptability. These findings underscore the importance of continued refinement and repeated evaluation of the program in alignment with the ORBIT model. The use of college student implementers presents a sustainable model that benefits all participants involved.
... According to Glasgow and Riley [8], important criteria for pragmatic measures include, among others: important to stakeholders, low respondent burden, actionable, sensitive to change, broadly applicable, and able to serve as a benchmark. Efforts to establish criteria to evaluate pragmatic properties of IS measures have yielded substantial conceptual clarity and are pushing the field of IS measurement development forward to achieve greater scientific rigor and practical impact [7,[9][10][11]. Nevertheless, still largely missing in the literature is a detailed account of the process of developing and validating a pragmatic IS measure, including how stakeholders such as program implementers are engaged to enhance the measure's utility, a key property defined as whether a measure and its items account for the meaningful aspects of the implementation contexts (e.g., cultural relevance, environmental resources, and program processes). ...
Background
Few implementation science (IS) measures have been evaluated for validity, reliability and utility – the latter referring to whether a measure captures meaningful aspects of implementation contexts. We present a real-world case study of rigorous measure development in IS that assesses Barriers and Facilitators in Implementation of Task-Sharing in Mental Health services (BeFITS-MH), with the objective of offering lessons-learned and a framework to enhance measurement utility.
Methods
We summarize conceptual and empirical work that informed the development of the BeFITS-MH measure, including a description of the Delphi process, detailed translation and local adaptation procedures, and concurrent pilot testing. As validity and reliability are key aspects of measure development, we also report on our process of assessing the measure’s construct validity and utility for the implementation outcomes of acceptability, appropriateness, and feasibility.
Results
Continuous stakeholder involvement and concurrent pilot testing resulted in several adaptations of the BeFITS-MH measure’s structure, scaling, and format to enhance contextual relevance and utility. Adaptations of broad terms such as “program,” “provider type,” and “type of service” were necessary due to the heterogeneous nature of interventions, type of task-sharing providers employed, and clients served across the three global sites. Item selection benefited from the iterative process, enabling identification of relevance of key aspects of identified barriers and facilitators, and what aspects were common across sites. Program implementers’ conceptions of utility regarding the measure’s acceptability, appropriateness, and feasibility clustered across several common categories.
Conclusions
This case study provides a rigorous, multi-step process for developing a pragmatic IS measure. The process and lessons learned will aid in the teaching, practice and research of IS measurement development. The importance of including experiences and knowledge from different types of stakeholders in different global settings was reinforced and resulted in a more globally useful measure while allowing for locally-relevant adaptation. To increase the relevance of the measure it is important to target actionable domains that predict markers of utility (e.g., successful uptake) per program implementers’ preferences. With this case study, we provide a detailed roadmap for others seeking to develop and validate IS measures that maximize local utility and impact.
... Pragmatic criteria scores from the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) were used across identified measures (Stanick et al., 2021). The criteria evaluated by PAPERS include factors such as the cost of the measure, its length, language readability, and the burden on assessors in terms of training and interpretation. ...
Purpose
A variety of assessment tools (e.g., questionnaires) measure the type and degree of bilingualism in children in both research and clinical settings. Although these tools are often assumed to evaluate the same constructs and be interchangeable, this may not be the case, as indicated by other recent reviews. This review critically evaluated existing measures of child bilingualism, focusing on item-content overlap, measure development, and pragmatic quality.
Method
A database and manual search identified studies on child bilingualism measure development, which were then appraised using the Psychometric and Pragmatic Evidence Rating Scale and the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN).
Result
Analysis across the six identified measures showed weak between-measure content overlap, with less than one quarter of items shared on average, suggesting they assess different constructs. Ratings indicated varied pragmatic quality, especially in assessor burden (training, interpretation). COSMIN evaluations also highlighted shortcomings in measure design and development.
Conclusion
The findings underscore the need for improved content validity and better pragmatic criteria for the clinical use of these tools. We offer recommendations for measure selection dependent on use case (e.g., setting-specific needs) and suggestions for future bilingualism measure development, prioritizing a pragmatic approach.
... The short CSAT measure has 7 domains, for 21 items. We were able to reduce the number of items of both measures while maintaining strong psychometric properties and, in doing so, reduce the burden of assessment and facilitate measurement of sustainability capacity in research, evaluation, and practice settings [5,17,18,22]. Especially for research and evaluation in clinical settings, where clinician time is extremely limited, shorter versions of the tool will help support more effective participation among clinicians. This work will aid in the development of interventions and strategies that respond to identified gaps based on these measures, which will serve to further advance sustainability of evidence-informed programs and practices [1]. ...
... Within implementation science, the PAPERS scale has been adopted as a way to assess the pragmatic nature of measures and tools. While the PAPERS scale has a cutoff of 10 items for a short tool [22], reducing the PSAT and CSAT still reduces the burden of assessment. Our previous work has commented on the length of the assessment and suggested that shortening the tool would result in higher usability [11]. ...
... This work responds to calls within implementation science for pragmatic measures [17,18,22], which include a criterion of shorter tools to reduce the burden of assessment. This manuscript contributes shorter measures for sustainability capacity that are both theoretically driven and empirically tested. ...
Background
Although significant advances have been made in the conceptualization of sustainability, having pragmatic, psychometrically valid tools remains a need within the field. Our previous work has developed frameworks and tools to assess both program sustainability and clinical sustainability capacity. This work presents new, psychometrically tested short versions of the Program Sustainability Assessment Tool (PSAT) and the Clinical Sustainability Assessment Tool (CSAT).
Methods
These methods were conducted in identical, parallel processes for the CSAT and PSAT. Previously collected data for these instruments were obtained across a variety of settings, contexts, and participants. We first conducted testing to determine Cronbach's alpha for the shortened domains (3 items each) and then conducted confirmatory factor analysis to ensure that the domains were still appropriate for the tool. Afterward, the team met to review the results and determine the final versions of the short PSAT and short CSAT.
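Cronbach's alpha for a shortened domain can be computed directly from an item-score matrix. A minimal sketch (not the authors' code), assuming rows are respondents and columns are the domain's items:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # per-item sample variances
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```

Perfectly parallel items yield alpha = 1.0; values in the 0.82 to 0.92 range, as reported for these tools, indicate strong internal consistency for 3-item domains.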
Results
The short PSAT retained Cronbach's alphas of 0.82–0.91 for each domain of the tool, which maintains excellent reliability. Confirmatory factor analysis highlights that the short PSAT retains conceptual distinction across the 8 domains, with CFI scores greater than 0.90, RMSEA scores below 0.06, and SRMR scores less than 0.08. The short CSAT had Cronbach's alphas of 0.84–0.92 for each of the domains of the tool, also suggesting excellent reliability of the domains within the measure after dropping two items per domain. Confirmatory factor analysis of the short CSAT meets the same specifications as above, again highlighting conceptual distinction across the domains.
Conclusion
Each tool could be shortened to three items per domain while maintaining strong psychometric properties. The result is a tool that takes less time to complete, meeting one of the key calls for pragmatic measures within implementation science. This advances our ability to measure and test sustainability within implementation science.
... This step was followed by thematic analysis [44] of interview transcripts to identify overarching themes from feedback with a focus on pragmatic issues with the measurement tools to help inform changes to the survey, following a similar approach to previously published research [39]. Once surveys were finalized, the team used the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) standardized scale [45] to rate the measures on five pragmatic properties: brevity, cost, training, interpretation, and readability. The PAPERS has been used in prior work such as systematic reviews of policy implementation measurement tools [16,17]; its use in the current study facilitates comparison of the resulting measures against existing tools within the field of policy implementation science. ...
Background: Policy implementation measurement lacks an equity focus, which limits understanding of how policies addressing health inequities, such as Universal School Meals (USM), can elicit intended outcomes. We report findings from an equity-focused measurement development study, which had two aims: (1) identify key constructs related to the equitable implementation of school health policies and (2) establish face and content validity of measures assessing key implementation determinants, processes, and outcomes. Methods: To address Aim 1, study participants (i.e., school health policy experts) completed a survey to rate the importance of constructs identified from implementation science and health equity by the research team. To accomplish Aim 2, the research team developed survey instruments to assess the key constructs identified from Aim 1 and conducted cognitive testing of these survey instruments among multiple user groups. The research team iteratively analyzed the data; feedback was categorized into “easy” or “moderate/difficult” to facilitate decision-making. Results: The Aim 1 survey had 122 responses from school health policy experts, including school staff (n = 76), researchers (n = 22), trainees (n = 3), leaders of non-profit organizations (n = 6), and others (n = 15). For Aim 2, cognitive testing feedback from 23 participants was predominantly classified as “easy” revisions (69%) versus “moderate/difficult” revisions (31%). Primary feedback themes comprised (1) comprehension and wording, (2) perceived lack of control over implementation, and (3) unclear descriptions of equity in questions. Conclusions: Through adaptation and careful dissemination, these tools can be shared with implementation researchers and practitioners so they may equitably assess policy implementation in their respective settings.
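PAPERS ratings of the kind referenced above are typically reported per criterion and summed into an overall score. A hypothetical sketch: the anchor labels and point values below are assumptions modeled on published PAPERS descriptions (a -1 "poor" to 4 "excellent" scale), not values taken from this study; the criterion names come from the snippet above.

```python
# Assumed anchor labels and point values (not from this study).
ANCHORS = {"poor": -1, "none": 0, "minimal": 1, "adequate": 2, "good": 3, "excellent": 4}

def pragmatic_total(ratings: dict) -> int:
    """Convert per-criterion anchor labels to points and sum them."""
    return sum(ANCHORS[label] for label in ratings.values())

# Hypothetical ratings for one measure on the five pragmatic criteria.
example = {"brevity": "good", "cost": "excellent", "training": "adequate",
           "interpretation": "good", "readability": "excellent"}
```

Summing per-criterion points makes measures directly comparable, which is what enables the cross-tool comparisons described in the surrounding reviews.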
... 4 However, previous critical appraisals of the literature have reported that a substantial proportion of validated patient empowerment tools have significant limitations related to their acceptability and generalizability to the general population, despite demonstrating relatively robust psychometric measurement properties. [5][6][7][8] The primary aim of this rapid evidence review was to examine the range of patient empowerment tools available within the published literature and to measure their respective psychometric and pragmatic properties against the Psychometric and Pragmatic Evidence Rating Scale. The secondary aim was to establish which tools performed best against these psychometric and pragmatic criteria. ...
... 5 PAPERS has previously been shown to assess the quality of existing tools and inform development of implementation measures. 5 As PAPERS is stakeholder-driven, the items in the pragmatic criteria can be modified to reflect the current situation within a complex system. Relevant data from the thematic analysis from the scoping exercise was used to determine these new items (Table 1). ...
... Previous critical appraisals of the literature have reported that a substantial proportion of validated patient empowerment tools have significant deficiencies limiting their acceptability and generalizability to the general population, despite demonstrating relatively robust psychometric measurement properties. [5][6][7][8] The ability of the PAPERS criteria to detect diverse populations has arguably been demonstrated within this study, as the tools included under the Long-term Conditions domain scored the worst for generalizability, given the specificity of the long-term conditions tools to target a particular population of typical healthcare service-users. ...
Self-management of long-term conditions requires health professionals to understand and develop capabilities that empower the population they serve. A rapid evidence review was undertaken to assess the current evidence based on the psychometric properties of patient empowerment tools. MEDLINE was searched, and data were extracted for each publication and scored using a modified Psychometric and Pragmatic Evidence Rating Scale (PAPERS) evidence rating scale. The results were grouped into the following domains: (a) health literacy; (b) patient activation; (c) long-term conditions; (d) self-management needs and behaviors. A full-text review of 65 publications led to the inclusion of 29 primary studies. The highest scoring tools were selected with respect to performance for each domain: (a) Newest Vital Sign and the Brief Health Literacy Screen; (b) Consumer Health Activation Index and PAM-13; (c) LTCQ and LTCQ8; and (d) SEMCD and Patient Enablement Instrument. PAPERS was a useful tool in determining the generalizability, validity, and reliability of these patient empowerment tools. However, further research is required to establish whether an individual's health literacy status influences patient empowerment tool outcomes.
... We designed the TA Engagement Scale with several goals in mind: i) to provide an expert-informed measure that captures multiple dimensions (domains) of TA relationships, ii) to bridge the science and practice of TA through a scale development process involving an in-depth cross-walk of research literature and TA expert input, and iii) to create a user-friendly measure of TA engagement that serves as a practical implementation tool. Given the scale's relative brevity, TA providers can administer the scale regularly with little time burden on the recipients (< 12 min to complete), making it a practical tool for regularly assessing engagement over time and aligning with calls for more pragmatic approaches to implementation monitoring and tailoring [39,40]. ...
Background
Technical assistance (TA) is a tailored approach to capacity building that is commonly used to support implementation of evidence-based interventions. Despite its widespread applications, measurement tools for assessing critical components of TA are scant. In particular, the field lacks an expert-informed measure for examining relationship quality between TA providers and recipients. TA relationships are central to TA and significantly associated with program implementation outcomes. The current study seeks to address the gap in TA measurement tools by providing a scale for assessing TA relationships.
Methods
We utilized a modified Delphi approach involving two rounds of Delphi surveys and a panel discussion with TA experts to garner feedback and consensus on the domains and items that compose the TA Engagement Scale.
Results
TA experts represented various U.S. organizations and TA roles (e.g., provider, recipient, researcher) with 25 respondents in the first survey and 26 respondents in the second survey. The modified Delphi process resulted in a scale composed of six domains and 22 items relevant and important to TA relationships between providers and recipients.
Conclusion
The TA Engagement Scale is a formative evaluation tool intended to offer TA providers the ability to identify strengths and areas for growth in the provider-recipient relationship and to communicate about ongoing needs. As a standard measurement tool, it lends a step toward more systematic collection of TA data, the ability to generate a more coherent body of TA evidence, and enables comparisons of TA relationships across settings.