Choy-Brown et al. Implementation Science Communications (2023) 4:39
https://doi.org/10.1186/s43058-023-00419-1
RESEARCH Open Access
© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Psychometric evaluation of a pragmatic measure of clinical supervision as an implementation strategy

Mimi Choy-Brown1*, Nathaniel J. Williams2, Nallely Ramirez2 and Susan Esp2
Abstract
Background Valid and reliable measurement of implementation strategies is essential to advancing implementation
science; however, this area lags behind the measurement of implementation outcomes and determinants. Clinical
supervision is a promising and highly feasible implementation strategy in behavioral healthcare for which pragmatic
measures are lacking. This research aimed to develop and psychometrically evaluate a pragmatic measure of clini‑
cal supervision conceptualized in terms of two broadly applicable, discrete clinical supervision techniques shown to
improve providers’ implementation of evidence‑based psychosocial interventions—(1) audit and feedback and (2)
active learning.
Methods Items were generated based on a systematic review of the literature and administered to a sample of 154
outpatient mental health clinicians serving youth and 181 community‑based mental health providers serving adults.
Scores were evaluated for evidence of reliability, structural validity, construct‑related validity, and measurement invari‑
ance across the two samples.
Results In sample 1, confirmatory factor analysis (CFA) supported the hypothesized two‑factor structure of scores on
the Evidence‑Based Clinical Supervision Strategies (EBCSS) scale (χ2=5.89, df=4, p=0.208; RMSEA=0.055, CFI=0.988,
SRMR=0.033). In sample 2, CFA replicated the EBCSS factor structure and provided discriminant validity evidence
relative to an established supervisory alliance measure (χ2=36.12, df=30, p=0.204; RMSEA=0.034; CFI=0.990;
SRMR=0.031). Construct‑related validity evidence was provided by theoretically concordant associations between
EBCSS subscale scores and agency climate for evidence‑based practice implementation in sample 1 (d = .47 and .55)
as well as measures of the supervision process in sample 2. Multiple group CFA supported the configural, metric, and
partial scalar invariance of scores on the EBCSS across the two samples.
Conclusions Scores on the EBCSS provide a valid basis for inferences regarding the extent to which behavioral
health providers experience audit and feedback and active learning as part of their clinical supervision in both clinic‑
and community‑based behavioral health settings.
Trial registration ClinicalTrials.gov NCT04096274. Registered on 19 September 2019.
Keywords Evidence‑based practice, Implementation, Clinical supervision, Measure development
*Correspondence:
Mimi Choy‑Brown
mchoybro@umn.edu
Full list of author information is available at the end of the article
Contributions to the literature
• Measurement of implementation strategies lags
behind other implementation constructs. Limited
accurate and practical measurement stalls efforts to
make evidence-informed decisions about effective
methods to promote the implementation of programs
and practices.
• This study advances the conceptualization of clinical
supervision as an implementation strategy and pro-
vides evidence for the validity of a pragmatic measure
of evidence-based clinical supervision strategies.
• This study helps fill a measurement gap in implemen‑
tation science by providing a tool for implementa-
tion researchers and practitioners to evaluate and
optimize embedded clinical supervision techniques
as a lever to promote routine integration of evidence-
based practices.
Background
Sound measurement is foundational to implementation
science, and while many authors have noted the need for
improved measurement of implementation outcomes
and determinants [1], far less attention has been paid to
the measurement of implementation strategies, which
arguably represent the heart of the field [2]. Implemen-
tation strategies are the methods used to change health-
care practice; they represent the means through which
patient or provider behavior is modified to improve the
use of evidence-based treatments [3]. Much attention has
been devoted to operationalizing [2, 4, 5] and categorizing
[6–8] implementation strategies, often with the explicit
goal of facilitating their precise measurement [2]. How-
ever, despite these advances, the development of measures
of implementation strategies has lagged far behind other
areas [1, 9]. This measurement deficit has stalled efforts to
assess the use of implementation strategies in community
settings—for the purpose of identifying areas of strength
and targets for improvement [2]—and has hindered
the consolidation of research findings on the effects of
implementation strategies across studies [10]. This paper
describes the development and psychometric evaluation
of a measure of one implementation strategy—clinical
supervision—which is highly feasible for acting on numer-
ous implementation outcomes across stages of implemen-
tation in settings where behavioral healthcare is delivered.
Operationalizing clinical supervision as an implementation
strategy
Clinical supervision is included within taxonomies
of implementation strategies, which define it broadly
as “provid[ing] clinicians with ongoing supervision
focusing on the innovation” and “provid[ing] training
for clinical supervisors who will supervise clinicians
who provide the innovation” [6]. While these defini-
tions are useful for distinguishing the overarching pro-
cess of clinical supervision from other implementation
strategies, such as expert consultation, we propose
that precise measurement of clinical supervision as an
implementation strategy benefits from a more granu-
lar conceptualization of the specific techniques used
within supervision time to facilitate practice change
[11]. Delineation and measurement of techniques used
by supervisors to facilitate specific implementation out-
comes will enable greater clarity regarding exactly what
facilitates implementation outcomes and will enhance
harmonization of scientific findings across studies.
Thus, we propose that the measurement of clinical
supervision as an implementation strategy should focus
on discrete supervision techniques that (a) occur within
broader supervision interactions and (b) have the high-
est potential for impact on implementation outcomes
within community behavioral healthcare.
Research on clinical supervision has identified two
discrete techniques which are associated with improved
implementation outcomes and are applicable across psy-
chosocial behavioral health interventions: (1) audit and
feedback and (2) active learning [12–15]. Both of these
techniques include behaviors that could occur outside of
supervision; however, both fit naturally within the super-
vision process and have long been considered impor-
tant elements of effective clinical supervision [16–18]. A
recent systematic review [11] confirmed that these two
supervision techniques, long considered “gold standard”
components of supervision by researchers [12], are asso-
ciated with improved implementation of clinical prac-
tices in behavioral health settings. Given the importance
of pragmatism in implementation measurement [19],
and the possibility that these techniques may represent
a “minimum intervention necessary for change” [20],
we propose that the assessment of these two techniques
within the context of clinical supervision represents a
valuable starting point for operationalizing and measur-
ing clinical supervision as an implementation strategy.
Gaps in measuring clinical supervision
as an implementation strategy
Guidelines for the development of implementation meas-
ures stress the importance of optimization with regard
to three criteria—reliability, validity, and pragmatism
[19, 21]. No available measures of clinical supervision
strategies are optimal on all three criteria [22]. Coder-
rated observational measures, such as the Supervision
Process Observational Coding System [12], can be con-
sidered gold-standard measures with strong evidence
of reliability and validity [12, 23]; however, the require-
ments of coding audio-recorded sessions using trained
raters (a rare practice outside of training clinics or clini-
cal trials) significantly limit their pragmatism [24, 25].
Measures that rely on clinician or supervisor report are
more feasible [26–29]; however, available measures are
either too narrow, focusing in great depth on only a sin-
gle clinical intervention, or too broad, assessing only the
duration, format, and general functions of supervision
(e.g., crisis assessment) rather than the use of specific
supervision techniques that facilitate implementation
across clinical interventions. Furthermore, many meas-
ures lack strong evidence of score reliability or validity. In
sum, the field lacks measures of clinical supervision that
have strong evidence of validity and that meet criteria
for pragmatism, including being free, brief, easy to administer,
and understandably written [30]. This is a significant bar‑
rier to the widespread evaluation of clinical supervision
as an implementation strategy in both routine care and
research trials.
Study aims
The aim of this research was to develop and evaluate a
reliable, valid, and pragmatic measure of clinical super-
vision, conceptualized as an overarching implementation
strategy comprised of two evidence-based and broadly
applicable techniques: (1) audit and feedback and (2)
active learning. In aim 1, investigators developed items
for the Evidence-Based Clinical Supervision Strategies
(EBCSS) scale and evaluated evidence of score reliabil-
ity, structural validity, and construct-related validity in
a sample of clinicians delivering outpatient psychother-
apy to youth and their families. In aim 2, the items were
administered to a sample of providers delivering commu-
nity-based mental health services to adults and evidence
of score validity was assessed with regard to measures of
theoretically important supervision constructs. In aim
3, investigators tested the extent to which scores on the
EBCSS exhibited measurement invariance across the two
samples from aims 1 and 2.
Methods
Item generation
Items were generated for the EBCSS within two
domains: (1) audit and feedback and (2) active learn‑
ing. Audit and feedback was defined as the review and
use of information regarding a supervisee’s clinical
performance to identify ways to optimize the deliv-
ery of new programs or practices [6]. Three types of
clinical performance information could be incorpo-
rated into the audit and feedback process: symptom
monitoring, which involves examining data from client
outcome measures; review of practice, which involves
the supervisor’s observation of therapeutic interac-
tions between the practitioner and the client (either in
person, via audio or video recordings, or through doc-
umentation); and fidelity assessment, which involves
examining data about the practitioner’s use of an evi-
dence-based treatment as intended by the developers
[31]. A recent systematic review and meta-analysis
concluded that the effects of audit and feedback were
strongest when feedback was delivered by supervisors
as compared to other sources [32]. Providing feedback
informed by clinical performance information has
been key to improving the competent delivery of care
[25, 33] and is successfully used as an implementation
strategy in nearly every supervision outcomes study to
support high-fidelity delivery of evidence-based prac-
tices (EBP) [11, 34]. On their own, neither observa-
tion (audit) nor feedback is sufficient to promote the
implementation and sustainment of new clinical inter-
ventions; consequently, they were conceptualized and
measured as an integrated unit.
Active learning was defined as using behavioral strat-
egies to solidify the application of concepts into prac-
tice [16, 35]. According to experiential learning theory,
skills and knowledge are acquired through a process of
practical experience, reflection, conceptualization, and
planning [36]. Clinical supervision provides a hold-
ing environment for this learning process, grounded
in practice experience and contextual adaptation, and
facilitated by the supervisor-supervisee relationship
[37]. Using active learning strategies, such as behavioral
rehearsal (also referred to as role play), in supervision
sessions has been associated with improved adoption
and fidelity to EBP in subsequent treatment sessions
with clients [13, 38]. In addition, behavioral rehearsal
within supervision is a pragmatic and valid method for
evaluating clinicians’ fidelity [35, 39].
After generating definitions of each domain based
on the literature, the research team reviewed existing
supervision measures, including observational meas-
ures (e.g., SPOCS) [12], for potentially relevant item
stems and content [40–43]. Items were then drafted to
elicit supervisee reports of their supervision experi-
ence during the prior 30-day period. The research team
and two consulting clinical supervisors reviewed and
revised items iteratively until a consensus was reached
on item content and wording. For the audit and feed-
back domain, items included three primary sources
of clinical performance feedback: symptom ratings,
observation of practice, and documentation. Items for
the active learning domain included both behavioral
rehearsal and supervisor modeling of skills.
Participants and procedures
The aim 1 sample included clinicians who participated in
a baseline survey of a larger study aimed at understand-
ing how to support the implementation of EBPs in mental
health settings serving youth. Outpatient mental health
clinics were eligible to participate if they provided psy-
chotherapy to youth and their families and were located
in one of three western States in the USA targeted for
enrollment. Clinicians working in these agencies were
eligible to participate if they delivered psychotherapy to
youth on a 50% or greater full-time equivalent basis.
Participating clinicians in this sample received an email
invitation from the research team to complete a confi-
dential web-based survey in October and November of
2019. Participants provided electronic informed consent
prior to responding and received a $30 gift card. In total,
N = 21 agencies employing N = 193 eligible clinicians
participated in the study; N = 177 clinicians responded to
the survey, representing a response rate of 92%. The final
analytic sample included N = 154 clinicians who indicated
they participated in clinical supervision. To evaluate the
statistical power associated with this sample size, we
used guidelines and Monte Carlo simulation code pro-
vided by Wolf et al. [44]. Assuming a two-factor confirm‑
atory factor analysis (CFA) model with the hypothesized
factor structure, small to moderate factor loadings of
0.65, and a moderate factor correlation of 0.50 (based on
the anticipated correlation of the two supervision tech-
niques), N=140 participants were adequate to generate
0.9 statistical power for all parameters of interest [45]. All
procedures were approved by the affiliated Institutional
Review Board.
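To make the simulation logic concrete, the sketch below illustrates a Wolf et al.-style Monte Carlo power check in Python. The study itself used Mplus; the semopy package, the item names x1-x5, and these helper functions are assumptions for illustration only, not the authors' code.

```python
import numpy as np
import pandas as pd
import semopy  # open-source SEM package (assumption; the study used Mplus)

# Population model assumed for the power check: two factors correlated at .50,
# five unit-variance indicators with loadings of .65 (per the text above).
MODEL_DESC = """
audit  =~ x1 + x2 + x3
active =~ x4 + x5
"""

def simulate_sample(n, loading=0.65, factor_r=0.50, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    factors = rng.multivariate_normal(
        [0.0, 0.0], [[1.0, factor_r], [factor_r, 1.0]], size=n)
    lam = np.zeros((5, 2))
    lam[:3, 0] = loading                 # audit-and-feedback items
    lam[3:, 1] = loading                 # active-learning items
    resid_sd = np.sqrt(1.0 - loading ** 2)
    x = factors @ lam.T + rng.normal(0.0, resid_sd, size=(n, 5))
    return pd.DataFrame(x, columns=[f"x{i}" for i in range(1, 6)])

def estimated_power(n, reps=500, alpha=0.05, seed=42):
    """Proportion of replications in which every free parameter is significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        model = semopy.Model(MODEL_DESC)
        model.fit(simulate_sample(n, rng=rng))
        # inspect() returns the parameter table; fixed parameters have no p-value
        pvals = pd.to_numeric(model.inspect()["p-value"], errors="coerce").dropna()
        hits += int((pvals < alpha).all())
    return hits / reps

print(estimated_power(140))  # the text reports power of 0.9 at N = 140
```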
The aim 2 sample included providers working on Asser‑
tive Community Treatment (ACT) teams in two States in
the USA. ACT is an interdisciplinary team-based model
providing community-based health services for adults
diagnosed with a severe mental illness [46]. All assertive
community treatment team leaders (N=52) working in
these two states received an electronic invitation to enroll
their teams in the survey and 77% (N=40) of the teams
were enrolled. Providers (N=181) working on an enrolled
team responded to an email invitation to participate in
the web-based survey from May to July 2021, represent-
ing an average provider response rate of 50%. Partici-
pants were asked to provide electronic informed consent
prior to participation and received a $20 electronic gift
card. All procedures were approved by the affiliated Insti-
tutional Review Board. Based on simulation procedures
described by Wolf et al. [44], we determined that a sam‑
ple size of N=180 was adequate to achieve power >0.8 for
all parameters of interest in aim 2, assuming the hypoth-
esized CFA factor structure, medium factor loadings
of 0.65 [45], and small to moderate factor correlations
ranging from 0.40 to 0.55, based on the anticipated rela-
tionship between the EBCSS subscales and the measure
of supervisory alliance.
The aim 3 sample comprised the samples from
aims 1 and 2. Simulation research by Sass and colleagues
[47] indicates our total sample of N=335 participants
provides adequate statistical power (>0.8) to test our
measurement invariance hypotheses given our data (i.e.,
ordinal categorical indicators), model specification, and
choice of estimator.
The STROBE checklist of items to include in reports of
observational studies was used for this study (see Addi-
tional File 2).
Measures
The extent to which supervisees experienced audit and
feedback and active learning in their clinical supervision
during the last 30 days was assessed using the five EBCSS
items developed for this project as described above. Each
item included a statement describing a specific supervi-
sion experience and clinicians indicated how often it
occurred during the last 30 days, using a 5-point Likert-
type scale from 1 (“Never”) to 5 (“Almost Always”). Coef‑
ficient alpha for both subscales was acceptable in both
samples (i.e., α > 0.7).
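For reference, coefficient alpha reduces to a one-line computation over the item-score matrix. This is a generic sketch with hypothetical variable names, not the authors' SPSS syntax:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of Likert responses (1-5)."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1.0)) * (1.0 - sum_item_var / total_var)

# e.g., audit-and-feedback subscale (first three EBCSS items):
# alpha = cronbach_alpha(responses[:, :3])  # "acceptable" here means alpha > 0.7
```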
In addition, participants reported on general supervi-
sion characteristics including total hours of supervision
time in a typical week; percentage of supervision time
typically focused on clinical content (e.g., case conceptu-
alization, treatment interventions), administrative con-
tent (e.g., billing), or “other” content (e.g., professional
development); and perceptions of their supervisor’s
availability when they have a question, ranging from 1
(“almost never”) to 5 (“almost always”).
In addition to the measures described above, clinicians
in sample 1 rated their agency’s EBP implementation cli-
mate using the 18-item Implementation Climate Scale
(ICS) [48]. The ICS assesses the extent to which clinicians
share perceptions that they are expected, supported, and
rewarded to use EBP in their clinical work with clients.
Scores on the ICS have demonstrated excellent reliability
and evidence of construct-related validity [49–52], includ-
ing positive associations with EBP-related content in clini-
cal supervision [28]. Items were rated on a Likert-type
scale from 0 (“not at all”) to 4 (“a very great extent”). Coef-
ficient alpha was 0.93 in this sample. In accordance with
theory and prior research, clinician responses to the ICS
were aggregated to the agency level for analysis following
an assessment of interrater agreement among clinicians
within each agency using the rwg(j) index with a null distri-
bution [53]. In this sample, all values of rwg(j) were above
the recommended cutoff of 0.7 (M = 0.92, SD = 0.07), sup-
porting the use of the agency-level aggregate scores [54].
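The rwg(j) agreement index used to justify aggregation has a closed form under a uniform null distribution (James et al. [53]). A minimal sketch, assuming a raters x items matrix per agency (hypothetical helper, not the authors' code):

```python
import numpy as np

def rwg_j(ratings: np.ndarray, n_options: int = 5) -> float:
    """Within-agency agreement index rwg(j) with a uniform null distribution.

    ratings: raters x items matrix for a single agency.
    n_options: number of response options; the ICS 0-4 scale has 5.
    """
    sigma2_eu = (n_options ** 2 - 1) / 12.0          # uniform null variance
    mean_obs_var = ratings.var(axis=0, ddof=1).mean()  # mean observed item variance
    j = ratings.shape[1]
    num = j * (1.0 - mean_obs_var / sigma2_eu)
    return num / (num + mean_obs_var / sigma2_eu)

# Aggregate to the agency level only where agreement clears the 0.7 cutoff:
# if rwg_j(agency_ratings) >= 0.7:
#     agency_climate = agency_ratings.mean()
```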
Providers in sample 2 completed four measures of
their supervision experience in addition to the EBCSS
items. The quality of supervisees’ working alliance with
their supervisors was assessed using the five-item Brief
Supervisory Working Alliance Inventory - Trainee Form
(BSWAI-T) [55]. Providers indicated the frequency
with which each item characterized their work with
their supervisor along two dimensions: rapport and cli-
ent focus. Items are scored on a Likert-type scale from 1
(“Almost never”) to 7 (“Almost always”). Prior research
offers strong evidence supporting the reliability and
validity of scores on the BSWAI-T [55]. Cronbach’s alpha
in this sample was α = 0.81.
The quality of the supervisory exchange between supervisees
and their supervisors was assessed using the
7-item Leader-Member Exchange [56]. The scale was
generated to capture the quality of supervisor-supervi-
see interactions [57]. An example item is: “How would
you characterize your working relationship with your
leader?” Scores on the scale range from 7 (very low-qual-
ity exchanges) to 35 (high-quality exchanges). Decades of
prior research have established the psychometric validity
and utility of this measure for characterizing supervisory
process and relationships [58] and it has been used in
mental health treatment settings [59]. Coefficient alpha
was excellent in this sample (α = 0.92).
The extent to which supervisors engaged in leadership
behaviors that supported ACT implementation (ACT
leadership) was assessed using 11 items generated from a
study in which ACT experts rated the importance of spe-
cific supervisor behaviors for supporting high adherence
to the ACT model [45]. Behaviors included in this scale
were rated as extremely important by experts (> 6 on a 1
to 7 scale) and addressed four domains, including facili-
tating team meetings, enhancing provider skills, moni-
toring outcomes, and quality improvement. Coefficient
alpha was excellent in this sample (α = 0.95).
The extent to which supervisees experienced inad‑
equate supervision behaviors in their supervision was
assessed using ten items from the harmful and inad-
equate supervision scale [60]. This scale is grounded in
theory and expert ratings of supervisory behaviors that
may insufficiently support supervisees and has been
tested in the USA and Ireland [60, 61]. Seven items were
selected for this study from the “inadequate” supervision
behaviors subscale, representing global experiences of
supervision (e.g., supervision is a waste of time, supervi-
see provided consent or a contract for supervision) that
were consistent with supervision models in mental health
[62] and not redundant with other items in the study. In
addition, three items were generated for this study that
focused specifically on attention to racism and power in
supervision (e.g., supervisor interest in staff experiences
of racism in their work). Coefficient alpha was good in
this sample (α = 0.85).
Data analysis
For aim 1, internal consistency reliability of the EBCSS
subscale scores was evaluated using Cronbach’s alpha
(SPSS version 27). Confirmatory factor analysis (CFA)
was used to assess structural validity evidence. Given the
hypothesized two-factor structure, a correlated 2-factor
model was specified, with items assessing active learning
forced to load on one factor and items assessing audit and
feedback forced to load on another factor. Models were
estimated in Mplus 8.0 using robust maximum likelihood
estimation (MLR) which is appropriate for nonnormally
distributed variables and small samples [63–66]. Model
fit was evaluated using the model chi-square test, root
mean square error of approximation (RMSEA), com-
parative fit index (CFI), and standardized root mean
square residual (SRMR) [67]. A non-significant model
chi-square test supports the hypothesized model by fail-
ing to reject it [68]. Commonly accepted thresholds of
RMSEA are <0.05 for close fit, <0.08 for reasonable fit,
and >0.10 indicating poor fit [67, 68]. Values of CFI ≥
0.95 and values of SRMR ≤0.05 indicate good model fit
(Schreiber et al., 2006). To further test the hypothesized
factor structure, an alternative 1-factor model was esti-
mated to evaluate if responses to items were caused by a
single latent construct.
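The model comparison can be sketched as follows. The study fit these models in Mplus 8 with MLR, which the open-source semopy package does not implement, so this approximation (hypothetical item names x1-x5; column names assumed to follow semopy's calc_stats output) would yield slightly different indices:

```python
import pandas as pd
import semopy

# Hypothesized correlated two-factor model and competing one-factor model.
TWO_FACTOR = """
audit  =~ x1 + x2 + x3
active =~ x4 + x5
"""
ONE_FACTOR = "general =~ x1 + x2 + x3 + x4 + x5"

def fit_indices(desc: str, data: pd.DataFrame) -> pd.Series:
    model = semopy.Model(desc)
    model.fit(data)
    stats = semopy.calc_stats(model)  # one-row DataFrame of fit statistics
    return stats[["DoF", "chi2", "chi2 p-value", "CFI", "RMSEA"]].iloc[0]

# data = pd.read_csv("ebcss_items.csv")   # 154 rows, columns x1..x5
# print(fit_indices(TWO_FACTOR, data))    # hypothesized model
# print(fit_indices(ONE_FACTOR, data))    # rejected if fit is poor per the cutoffs above
```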
Construct-related validity evidence for sample 1 was
generated by using two-level linear mixed effects regres-
sion models to test the hypothesis that scores on the
EBCSS would be higher in agencies with higher levels of
EBP implementation climate. These models incorporated
random agency intercepts [69, 70] and were implemented
in Mplus [66] using the TYPE=TWOLEVEL command
and default MLR estimator. Clinician years of experi-
ence and level of education (doctoral vs. non-doctoral)
were included as covariates to isolate the association of
climate with the EBCSS subscales. Because agency cli-
mate should only influence supervisors who work within
an agency, the sample for this analysis was restricted to
clinicians who reported receiving agency-based supervi-
sion (N=147). Missing data (fewer than 2% of cases) were
addressed using Bayesian multiple imputation (N=10
datasets). Effect sizes were calculated using an analogue
to Cohen’s d [69]. Values represent the standardized mar-
ginal mean difference, comparing clinicians in agencies ±
1 standard deviation from the mean of EBP implementa-
tion climate. Cohen [71] suggested d could be interpreted
as small (0.2), medium (0.5), or large (0.8).
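An analogous random-intercept model can be sketched with statsmodels; the study used Mplus (TYPE=TWOLEVEL), and the column names and the d computation below are one plausible operationalization of the ±1 SD standardized contrast, not the authors' code:

```python
import pandas as pd
import statsmodels.formula.api as smf

def climate_effect(df: pd.DataFrame, outcome: str = "audit_feedback"):
    """Random-intercept regression of an EBCSS subscale on agency EBP climate.

    df columns (hypothetical): outcome, climate (agency-level ICS score),
    years_exp, doctoral (0/1), agency (cluster id).
    """
    fit = smf.mixedlm(f"{outcome} ~ climate + years_exp + doctoral",
                      data=df, groups=df["agency"]).fit(reml=True)
    b = fit.params["climate"]                              # fixed effect of climate
    sd_climate = df.groupby("agency")["climate"].first().std()
    total_sd = (fit.cov_re.iloc[0, 0] + fit.scale) ** 0.5  # between + within SD
    # Cohen's d analogue: adjusted mean difference at +/-1 SD of climate,
    # standardized by the total outcome SD.
    d = (2.0 * sd_climate * b) / total_sd
    return b, d
```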
In aim 2, two CFA models were estimated to assess
structural and discriminant validity evidence for scores
on the EBCSS. The first model tested the hypothesized
factor structure of the EBCSS alongside the hypothesized
factor structure of the BSWAI-T (supervisory work-
ing alliance) (see Fig.1A). Based on prior research [55],
BSWAI-T items were forced to load onto two first-order
latent factors, representing the subscales of rapport and
client focus, and these first-order factors were forced to
load onto a single second-order factor representing the
overall supervisory working alliance (see Fig. 1A). The
EBCSS items were forced to load onto their respective
factors and these were correlated with each other and
with the BSWAI-T second-order factor. Good fit of this
model provided evidence supporting (1) the structural
validity of scores on the EBCSS and (2) the discriminant
validity of scores on the EBCSS relative to the supervi-
sory working alliance.
Fig. 1 Hypothesized 3-factor model (A) and competing 1-factor model (B) of EBCSS and BSWAI items. Note: N = 181 clinicians. Models estimated using robust maximum likelihood estimation; standardized estimates shown. EBCSS, evidence-based clinical supervision strategies scale; BSWAI-T, brief supervisory working alliance inventory—trainee form; active, active learning subscale of the EBCSS; audit, audit and feedback subscale of the EBCSS; alliance, second-order supervision working alliance factor of the BSWAI-T; focus, client focus subscale of the BSWAI-T; rapport, rapport subscale of the BSWAI-T. Model A: χ2 = 36.12, df = 30, p = 0.204; RMSEA = 0.034; CFI = 0.990; SRMR = 0.031. Model B: χ2 = 55.13, df = 31, p = 0.005; RMSEA = 0.066; CFI = 0.962; SRMR = 0.067. Results of a Satorra-Bentler scaled chi-square difference test indicated Model A fit significantly better than Model B (S-B scaled χ2 Δ = 39.40, df = 1, p < 0.001)

The second CFA tested a competing hypothesis: scores on the EBCSS and BSWAI-T measure a single, overarching construct (e.g., general likability of the supervisor). In this model, the two EBCSS factors and the two BSWAI-T first-order factors were forced to load onto a single second-order factor (see Fig. 1B). Good fit of this model
would undermine the discriminant validity of scores
on the EBCSS by suggesting all the scores (BSWAI-T
+ EBCSS) reflect a single latent construct. A Satorra-
Bentler chi-square difference test [72] was used to deter-
mine whether the hypothesized 3-factor model fit better
than the competing 1-factor model. All models were esti-
mated in Mplus 8 using MLR estimation as described
above.
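The scaled difference test used throughout has a simple closed form (Satorra and Bentler [72]). A minimal helper, taking the scaled chi-squares, degrees of freedom, and scaling correction factors as reported by Mplus (a sketch, not the authors' code):

```python
from scipy.stats import chi2

def sb_scaled_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference test.

    (t0, df0, c0): scaled chi-square, df, and scaling correction factor for the
    more constrained (nested) model; (t1, df1, c1): the less constrained model.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling factor for the difference
    trd = (t0 * c0 - t1 * c1) / cd            # scaled difference statistic
    ddf = df0 - df1
    return trd, ddf, chi2.sf(trd, ddf)        # statistic, df, p-value
```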
Construct validity evidence for aim 2 was assessed by
calculating Pearson correlations between EBCSS scores
and other measures of supervision using SPSS 27.
For aim 3, multiple group CFA was used to test the
extent to which scores on the EBCSS exhibited measure-
ment invariance across the aim 1 and 2 samples. Meas-
urement invariance is desirable because it suggests item
scores assess the same latent construct(s) in the same
way across populations, thus supporting generalizability
and comparability across populations. This is important
because the supervisory actions assessed by the EBCSS
are believed to apply across psychosocial EBPs and
behavioral health settings.
Following well-established guidelines [73, 74], meas-
urement invariance of scores on the EBCSS was tested by
fitting a series of increasingly restrictive multiple group
CFA models to data from the samples in aims 1 and 2 and
examining the extent to which model fit deteriorated at
each step. Specific models provide evidence for differ-
ent aspects of measurement invariance. The first (least
restrictive) model tested configural invariance by impos-
ing the same factor structure in both groups but allow-
ing all parameters to freely vary (i.e., factor loadings,
item intercepts, error variances). Support for configural
invariance indicates the number of latent constructs, and
the alignment of item scores with those constructs is the
same across groups [75]. The second (more restrictive)
model tested metric invariance. Support for metric invar-
iance indicates the magnitudes of the factor loadings are
equal and implies the item scores measure the latent con-
structs to the same degree in both groups [75]. The third
(most restrictive) model tested scalar invariance. Sup-
port for scalar invariance indicates “mean differences in
the latent constructs capture all mean differences in the
shared variance of the items” [74].
The fit of the configural model was evaluated using the
model chi-square test and the RMSEA, CFI, and SRMR
goodness of fit indices as described above. The extent
to which model fit deteriorated when moving from the
configural model to subsequent (more restrictive) mod-
els was evaluated using the Satorra-Bentler chi-square
difference test [72] and by examining change (Δ) in CFI,
RMSEA, and SRMR. Measurement invariance was not
supported if the model chi-square difference test was
statistically significant or if there was a change in CFI ≤
−.005, a change in RMSEA ≥ .010, or a change in SRMR
≥ .025 [76]. Given the possibility that full metric or sca-
lar invariance may not be supported, we planned a priori
to test for partial metric or scalar invariance as needed
following procedures described by Byrne and colleagues
[77].
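Taken together, the decision rule at each step can be expressed compactly. This is a sketch of the stated cutoffs, not a substitute for inspecting the models:

```python
def invariance_supported(sb_p: float, d_cfi: float,
                         d_rmsea: float, d_srmr: float) -> bool:
    """Apply the cutoffs above: reject invariance if the scaled difference test
    is significant, or if CFI drops by .005 or more, RMSEA rises by .010 or
    more, or SRMR rises by .025 or more."""
    return (sb_p >= 0.05 and d_cfi > -0.005
            and d_rmsea < 0.010 and d_srmr < 0.025)

# Configural -> metric step reported in Table 4:
# invariance_supported(0.341, -0.001, -0.006, 0.010)  # True
```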
Results
Table 1 presents the characteristics of the samples for
aims 1 and 2. Table 2 presents descriptive statistics and
reliability coefficients for the EBCSS items and subscales.
Both subscales exhibited adequate score variation; how-
ever, as expected, a sizeable proportion of clinicians indi-
cated they had not received any audit and feedback (25%,
N = 38) or active learning (19%, N = 29) during supervi-
sion in the last 30 days.
Reliability
Coefficient alpha for both subscales was acceptable (i.e.,
α > 0.7). Examination of the corrected item-total corre-
lations indicated Item 1 (supervision includes feedback
about practice based on supervisor’s in vivo observa-
tions or review of audio or video recordings) was not
as strongly related to its latent construct as the other
items; however, it was retained due to its theoretical
importance.
Structural validity evidence
Results of the CFA analyses for aim 1 supported the
hypothesized 2-factor structure of scores on the EBCSS.
The model was not rejected by the model chi-square
test (χ2 = 5.89, df = 4, p = 0.208) and all other fit indi-
ces were in the good to excellent range (RMSEA = 0.055,
CFI = 0.988, SRMR = 0.033). All unstandardized factor
loadings were statistically significant at p < 0.001 and the
standardized factor loadings ranged from 0.55 to 0.82
(see Table 2). The two factors were moderately correlated
(r = 0.58, p < 0.001), providing evidence that the items
assessed related but unique supervision experiences
(see Fig. 2). The competing 1-factor model, in which all
items were forced to load onto a single factor, did not fit
the data well and was rejected based on all criteria (χ2 =
51.40, df = 5, p < 0.001; RMSEA = 0.245, CFI = 0.712,
SRMR = 0.077).
Construct‑related validity evidence
Results of the linear mixed-effects regression models for
aim 1, which assessed the relationships between agency
EBP implementation climate and scores on the EBCSS
subscales, are shown in Fig. 3. As expected, higher agency EBP implementation climate predicted greater exposure to audit and feedback in supervision (B = 0.28, p = 0.010) after controlling for all other variables in the model. This represents a medium effect of d = 0.55 (95% CI = 0.13 to 0.96) when comparing the amount of audit and feedback experienced by clinicians in agencies with high (+1 SD) versus low (−1 SD) levels of EBP implementation climate (see Fig. 3A). Clinicians working in agencies with higher levels of EBP implementation climate also reported more exposure to active learning strategies in supervision (B = 0.28, p = 0.036), representing a medium effect (d = 0.47; 95% CI = 0.03 to 0.92) (see Fig. 3B).
Structural and discriminant validity evidence
Results of the CFA for aim 2, which tested the hypothe-
sized 3-factor model, are presented in Fig. 1A. This model
demonstrated excellent fit based on all indices (χ2 =
36.12, df = 30, p = 0.204; RMSEA = 0.034; CFI = 0.990;
SRMR = 0.031). All unstandardized item factor loadings
were statistically significant at p < .001 and standard-
ized factor loadings were high (range = 0.65–0.93). As
expected, scores on the two EBCSS subscales were cor-
related (r = 0.75, p < 0.001) and had moderate but lower
magnitude correlations with scores on the supervisory
working alliance (r = 0.59 and r = 0.53, all ps < 0.001).
The CFA testing the competing 1-factor model (see
Fig. 1B) for aim 2 did not fit the data well and was
rejected by the model chi-square test (χ2 = 55.13, df =
31, p = 0.005). Furthermore, the Satorra-Bentler chi-
square difference test comparing the 1- versus 3-factor
models indicated that the 1-factor model fit significantly
worse (Δ = 39.40, df = 1, p < 0.001); consequently, it was
rejected. These results offer structural and discriminant
validity evidence for scores on the EBCSS.
Construct validity evidence
Table 3 shows correlations between the EBCSS subscales
and the other supervision measures completed as part of
aim 2. As expected, small-to-moderate correlations were
observed between scores on the EBCSS subscales and the
quality of the supervisory exchange and supervisor avail-
ability (r = .23 to .29). Also consistent with expectations,
correlations between the EBCSS subscales and ACT
leadership were larger and in the medium range (r = .49
and .51, respectively). Finally, inadequate supervision had
the anticipated inverse relationships with both EBCSS
subscales (see Table 3). These results provide construct
validity evidence by showing that scores on the EBCSS
are related to, but distinct from, other aspects of supervi-
sion in theoretically concordant ways.
Table 4 presents model fit statistics and change in
model fit statistics for the CFA models testing measure-
ment invariance of scores on the EBCSS across the two
samples (aim 3). The configural invariance model fit the
data well based on all criteria (see Table 4). There was
no evidence of significant deterioration in model fit
when moving from the configural to the metric invari-
ance model based on the Satorra-Bentler chi-square dif-
ference test (Δ = 3.35, df = 3, p = 0.341) or on changes
in CFI, RMSEA, or SRMR.

Table 1 Characteristics of study participants and supervision

Characteristic                                    Aim 1 (N = 154)   Aim 2 (N = 181)
Participants
  Years of clinical experience (mean ± SD)        6.5 ± 6.2         7.1 ± 32.6
  Years tenure in agency (mean ± SD)              3.3 ± 3.8         5.6 ± 5.5
  Age in years (mean ± SD)                        38.9 ± 9.9        42.2 ± 11.9
Employment model, N (%)
  Salaried                                        66 (42.9)         86 (47.5)
  Fee-for-service/contractor                      87 (56.5)         95 (52.5)
Race, N (%)
  Asian                                           4 (2.6)           5 (2.8)
  Black or African American                       2 (1.3)           6 (3.3)
  American Indian or Alaska Native                0 (0)             2 (1.1)
  Native Hawaiian or Other Pacific Islander       2 (1.3)           0 (0)
  More than one race                              2 (1.3)           NA
  White                                           125 (81.2)        161 (89.0)
  Prefer to self-identify                         7 (4.5)           4 (2.2)
  Prefer not to respond                           12 (7.8)          6 (3.3)
Ethnicity, N (%)
  Identify as Hispanic/Latino                     18 (11.7)         8 (4.4)
  Do not identify as Hispanic/Latino              134 (87.0)        172 (95.0)
Gender, N (%)
  Man                                             26 (16.9)         29 (16.0)
  Woman                                           122 (79.2)        147 (81.2)
  Transgender                                     NA                1 (0.6)
  Non-binary/non-conforming                       NA                3 (1.7)
  Prefer to self-identify                         5 (3.2)           1 (0.6)
  Prefer to not respond                           NA                2 (1.1)
Education, N (%)
  Doctoral degree                                 6 (3.9)           7 (3.9)
  Non-doctoral degree                             148 (96.1)        174 (96.1)
Supervision (mean ± SD)
  Total hours per week                            2.4 ± 1.7         5.1 ± 4.3
  Percent of time on clinical content             59.5 ± 25.0       54.3 ± 28.5
  Percent of time on administrative content       29.5 ± 23.4       24.1 ± 24.4
  Supervisor availability (1–5 scale)             5.6 ± 1.5         4.5 ± 0.8

NA, not available. Missing responses are not included and the percentages do not add up to 100. Aims 1 and 2 did not ask the same question about gender; gender categories were expanded in the table.

Table 2 Summary statistics and confirmatory factor analysis (CFA) factor loadings for Evidence-based Clinical Supervision Strategies scale items in samples 1 and 2

Item                                                  M     SD    Min–Max  r     Loading  α
Aim 1 (N = 154)
Clinical performance feedback                         2.17  1.01  1–4.67                  0.73
  Feedback based on observations                      2.05  1.33  1–5      0.46  0.55
  Feedback based on outcome data                      1.95  1.15  1–5      0.61  0.75
  Feedback based on chart review                      2.52  1.31  1–5      0.59  0.80
Active learning strategies                            2.71  1.20  1–5.00                  0.76
  Role play or rehearsal of a clinical intervention   2.28  1.30  1–5      0.61  0.75
  Supervisor demonstration of a clinical intervention 3.14  1.38  1–5      0.61  0.82
Aim 2 (N = 181)
Clinical performance feedback                         2.76  1.12  1–5.00                  0.79
  Feedback based on observations                      2.36  1.37  1–5      0.56  0.65
  Feedback based on outcome data                      2.87  1.32  1–5      0.72  0.85
  Feedback based on chart review                      3.03  1.31  1–5      0.63  0.77
Active learning strategies                            2.47  1.19  1–5.00                  0.80
  Role play or rehearsal of a clinical intervention   2.15  1.29  1–5      0.66  0.82
  Supervisor demonstration of a clinical intervention 2.78  1.31  1–5      0.66  0.81

CFA estimated using robust maximum likelihood estimation. r is the corrected item-total correlation; Loading is the standardized factor loading.

Fig. 2 Aim 2 confirmatory factor analysis model

In contrast, results of the Satorra-Bentler chi-square difference test indicated the scalar invariance model fit the data significantly worse than the metric invariance model (Δ = 16.59, df = 3, p = 0.001) and therefore should be rejected. This conclusion was also supported by deterioration in the values of CFI, RMSEA, and SRMR (see Table 4). Given these results, a partial scalar invariance model was estimated by allowing
the intercept for Item 2 to vary freely across groups (“my supervision included feedback about my practice based on data about the people I serve”). As is shown in Table 4, this model exhibited excellent fit based on all criteria (χ2 = 18.77, df = 13, p = 0.130; RMSEA = 0.051; CFI = 0.98; SRMR = 0.042) and there was no evidence of significant deterioration in model fit on any criteria when comparing the partial scalar invariance model to the metric invariance model. Consequently, this model was accepted as final. These results support the configural, metric, and partial scalar invariance of scores on the EBCSS across these two provider samples.

Fig. 3 Adjusted mean differences in clinicians’ experience of EBCSS clinical supervision techniques by level of agency climate for EBP implementation. Note: K = 21 mental health clinics, N = 147 clinicians. Adjusted means are estimated using linear 2-level mixed effects regression models with random intercepts; all models control for clinician years of experience and education. EBCSS, Evidence-based Clinical Supervision Strategies scale. ICC[1] for Audit and Feedback = 0.095; ICC[1] for Active Learning = 0.241
Discussion
The goal of this research was to develop a pragmatic,
reliable, and valid measure of clinical supervision as an
implementation strategy. Drawing on the literature, clini-
cal supervision was conceptualized as an overarching
implementation strategy consisting of two widely appli-
cable, evidence-based techniques: (1) audit and feedback
and (2) active learning. The evidence presented here sug‑
gests scores on the EBCSS provide a reliable and valid
basis for making inferences about the extent to which
behavioral health providers experience these techniques
as part of their clinical supervision. Across both samples,
scores on the EBCSS subscales demonstrated acceptable
internal consistency and evidence of structural validity.
Construct validity evidence was generated in aim 1 by
showing that scores on the EBCSS subscales were higher
in agencies with higher levels of EBP implementation cli-
mate, an outcome supported by theory and prior research
[28]. Aim 2 provided construct validity evidence. Scores
on the EBCSS covaried with scores on other measures
of the clinical supervision process in anticipated ways,
including moderate positive associations with the super-
visory alliance and ACT leadership behaviors and nega-
tive associations with inadequate supervision behaviors.
Aim 3 provided evidence of measurement invariance,
suggesting scores on the EBCSS generalize across two
settings and populations of behavioral health providers,
albeit with some variation in the mean level of data-based
feedback provided to the two groups (i.e., partial scalar
invariance). Measurement invariance is an important
property of scores on implementation measures given the
need to evaluate implementation across a range of EBPs
and settings.
In addition to its promising psychometric characteris-
tics, the EBCSS aligns well with criteria for pragmatism
as described by the PAPERS (Psychometric And Prag-
matic Evidence Rating Scale) framework for implemen-
tation measures [30]. Specifically, the EBCSS is free (see
Additional File 1), brief (5 items), low burden to admin-
ister (requires no training), easy to analyze, and under-
standably written. Because perceptions of pragmatism
can vary across stakeholder groups, an important direc-
tion for future research is to evaluate the extent to which
potential users view the EBCSS as pragmatic across these
and other criteria [19, 21].
The EBCSS fills a gap in pragmatic and valid measure‑
ment with important applications in research and prac-
tice. It can facilitate the identification and optimization
of supervision strategies within embedded supervi-
sion time in order to promote and sustain provider
behavior change.

Table 3 Aim 2 (N = 181) construct-based validity evidence: correlations for EBCSS subscales

                                   Clinical performance feedback   Active learning strategies
                                   r (p)                           r (p)
Quality of supervisory exchange    .23 (.002)                      .29 (<.001)
ACT leadership                     .46 (<.001)                     .48 (<.001)
Availability of supervisor         .28 (<.001)                     .24 (.001)
Inadequate supervision             −.28 (<.001)                    −.30 (<.001)

ACT, assertive community treatment; EBCSS, Evidence-based Clinical Supervision Strategies scale

Table 4 EBCSS measurement invariance model fit statistics and comparisons

Model                        CFI    RMSEA  SRMR   χ2      df  p      S-B χ2 Δ  Δdf  Δ p    ΔCFI    ΔRMSEA  ΔSRMR
Configural invariance        0.991  0.054  0.028  11.855  8   0.158
Metric invariance            0.990  0.048  0.038  15.293  11  0.170  3.350     3    0.341  −0.001  −0.006  0.010
Scalar invariance            0.962  0.082  0.050  29.875  14  0.008  16.589    3    0.001  −0.028  0.034   0.012
Partial scalar invariance a  0.986  0.051  0.042  18.770  13  0.130  3.600     2    0.165  −0.004  0.003   0.004

N = 335 clinicians (n = 154 working in outpatient mental health, n = 181 working in assertive community treatment). Models estimated using robust maximum likelihood estimation. S-B scaled χ2 Δ, Satorra-Bentler scaled chi-square difference test; CFI, comparative fit index; EBCSS, Evidence-based Clinical Supervision Strategies scale; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual.
a The intercept for Item 2 (“supervision included feedback about practice based on data about the people I serve”) was allowed to vary freely across groups; all other intercepts and factor loadings were constrained equal.

How clinical supervisors use routine supervision time to mediate policy and practice, sell the implementation effort to providers, and diffuse and
synthesize information remains less understood [78,
79]. This is particularly important to evaluate across
clinical and community-based settings and stages of
implementation (i.e., exploration, preparation, imple-
mentation, and sustainment). Such research can also
unpack the links between a host of organizational
context factors (e.g., climate for EBP implementation)
and provider implementation behavior [28, 80]. Addi-
tionally, including this 5-item measure in clinical and
implementation trials will identify effective supervision
targets for improved implementation outcomes. Prac-
tice applications include evaluating workforce super-
vision experiences as part of ongoing assessments or
quality improvement efforts in order to understand
the strengths and gaps in available supports. While
rates of these supervision techniques were low, which
is consistent with previous literature [12], such gaps
highlight the need for growth and improvement to sup-
port implementation. Supervision-focused workforce
development initiatives could target these techniques
to support competent delivery of EBPs. Pursuit of these
research and practice applications will help optimize
the infrastructure to support widespread and equitable
EBP access in routine care.
Further evaluation of the EBCSS is needed. Essential
next aims include generation of concurrent criterion-
related validity evidence by testing whether scores on this
clinician-reported measure correspond with behaviors
as rated by trained observers (e.g., via the SPOCS). Stud-
ies that generate predictive validity evidence, assess the
responsiveness of scores on the EBCSS to changes over
time, and further evaluate potential moderating effects
of other supervision characteristics and potential expan-
sion to include additional supervision techniques are also
needed. Analysis of EBCSS scores using item response
theory would further enhance the evaluation of scores
derived from the measure.
Conclusions
This paper advances the conceptualization and meas‑
urement of clinical supervision as an implementation
strategy. The study presented offers validity evidence
indicating scores on the EBCSS form a valid basis for
inferences about the extent to which clinicians experi-
ence two theoretically grounded, evidence-based clinical
supervision techniques that promote the implementation
of EBP: audit and feedback and active learning. Findings
highlight promising directions for future discovery and
provide a tool for stakeholders to optimize the embedded
infrastructure of clinical supervision in support of prac-
tice improvement.
Abbreviations
BSWAI-T Brief Supervisory Working Alliance Inventory—Trainee Form
CFA Confirmatory factor analysis
CFI Comparative fit index
EBCSS Evidence-Based Clinical Supervision Strategies scale
EBP Evidence-based practice
MLR Robust maximum likelihood estimation
RMSEA Root mean square error of approximation
SPOCS Supervision Process Observational Coding System
SRMR Standardized root mean square residual
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s43058-023-00419-1.

Additional file 1: Evidence-Based Clinical Supervision Strategies Scale (EBCSS).

Additional file 2: STROBE checklist.
Acknowledgements
The authors wish to thank the participating providers for sharing their work
experiences with us.
Authors’ contributions
MCB and NJW conceptualized and designed the study, contributed to the
data acquisition, conducted the data analyses, interpreted the findings, and
drafted the manuscript. NR contributed to the data acquisition and analysis
and provided substantive revisions of the manuscript. SE contributed to the
study design, data acquisition, and interpretation. All authors approved the
submitted version.
Funding
This work was supported by the National Institute of Mental Health under
award number R01MH119127 (PI: Williams). The content is solely the
responsibility of the authors and does not necessarily represent the official
views of the National Institutes of Health. This work was also supported
by the National Institute of Food and Agriculture under award number
1026688 and the University of Minnesota Grant‑In‑Aid program (PI:
Choy‑Brown).
Availability of data and materials
NJW and MCB had full access to all the data in the study and take responsibil‑
ity for the integrity of the data and the accuracy of the data analysis. Requests
for access to deidentified data can be sent to Nate Williams at
natewilliams@boisestate.edu.
Declarations
Ethics approval and consent to participate
All procedures were approved by the Boise State University and University of
Minnesota Institutional Review Boards. Written formal consent was obtained
for all study participants.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 University of Minnesota, Twin Cities, 1404 Gortner Avenue, St. Paul, MN 55108,
USA. 2 Boise State University, 1910 University Drive, Education Suite 717, Boise,
ID 83725‑1940, USA.
Received: 25 February 2022 Accepted: 16 March 2023
References
1. Rabin BA, Lewis CC, Norton WE, Neta G, Chambers D, Tobin JN, et al.
Measurement resources for dissemination and implementation research
in health. Implement Sci. 2015;11:1–9.
2. Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recom‑
mendations for specifying and reporting. Implement Sci. 2013;8(1):1–11.
3. Eccles MP, Mittman BS. Welcome to implementation science. Implement
Sci. 2006;1:1–3.
4. Perry CK, Damschroder LJ, Hemler JR, Woodson TT, Ono SS, Cohen DJ.
Specifying and comparing implementation strategies across seven large
implementation interventions: a practical application of theory. Imple‑
ment Sci. 2019;14(1):1–13.
5. Rudd BN, Davis M, Beidas RS. Integrating implementation science in
clinical research to maximize public health impact: a call for the reporting
and alignment of implementation strategy use with implementation
outcomes in clinical research. Implement Sci. 2020;15(1):1–11.
6. Powell BJ, Waltz TJ, Chinman MJ, Damschroder LJ, Smith JL, Matthieu MM,
et al. A refined compilation of implementation strategies: Results from
the Expert Recommendations for Implementing Change (ERIC) project.
Implement Sci. 2015;10(1):21.
7. Leeman J, Birken SA, Powell BJ, Rohweder C, Shea CM. Beyond “imple‑
mentation strategies”: Classifying the full range of strategies used in
implementation science and practice. Implement Sci. 2017;12(1):125.
8. Cook CR, Lyon AR, Locke J, Waltz T, Powell BJ. Adapting a Compilation of
Implementation Strategies to Advance School‑Based Implementation
Research and Practice. Prev Sci. 2019;20(6):914–35.
9. Lewis CC, Stanick CF, Martinez RG, Weiner BJ, Kim M, Barwick M, et al. The
society for implementation research collaboration instrument review
project: A methodology to promote rigorous evaluation. Implement Sci.
2015;10(1):1–18.
10. Lewis CC, Weiner BJ, Stanick C, Fischer SM. Advancing implementation
science through measure development and evaluation: a study protocol.
Implement Sci. 2015;10(1):1–10.
11. Choy‑Brown M, Baslock D, Cable C, Marsalis S, Williams N. In search of
the common elements of clinical supervision: A systematic review. Adm
Policy Ment Health. 2022;49(4):623–43. https://doi.org/10.1007/s10488-022-01188-0.
12. Dorsey S, Kerns SEU, Lucid L, Pullmann MD, Harrison JP, Berliner L, et al.
Objective coding of content and techniques in workplace‑based supervi‑
sion of an EBT in public mental health. Implement Sci. 2018;13(1).
13. Dorsey S, Pullmann MD, Deblinger E, Berliner L, Kerns SE, Thompson K,
et al. Improving practice in community‑based settings: a randomized trial
of supervision ‑ study protocol. Implement Sci. 2013;8(1):1–11.
14. Bearman SK, Schneiderman RL, Zoloth E. Building an Evidence Base for
Effective Supervision Practices: An Analogue Experiment of Supervision
to Increase EBT Fidelity. Adm Policy Ment Health. 2017;44(2):293–307.
15. Bearman SK, Weisz JR, Chorpita BF, Hoagwood K, Ward A, Ugueto AM,
et al. More practice, less preach? The role of supervision processes and
therapist characteristics in EBP implementation. Adm Policy Ment Health.
2013;40(6):518–29.
16. Milne D. Evidence‑based clinical supervision: principles and practice.
Hoboken: Wiley; 2009.
17. Borders LDA, Glosoff HL, Welfare LE, Hays DG, DeKruyf L, Fernando DM,
et al. Best practices in clinical supervision: evolution of a counseling
specialty. Clinical Supervisor. 2014;33(1):26–44.
18. Sewell KM. Social work supervision of staff: a primer and scoping review
(2013–2017). Clin Soc Work J. 2018;46(4):252–65.
19. Powell BJ, Stanick CF, Halko HM, Dorsey CN, Weiner BJ, Barwick MA, et al.
Toward criteria for pragmatic measurement in implementation research
and practice: a stakeholder‑driven approach using concept mapping.
Implement Sci. 2017;12:1–7.
20. Glasgow RE, Fisher L, Strycker LA, Hessler D, Toobert DJ, King DK, et al.
Minimal intervention needed for change: definition, use, and value for
improving health and health research. Transl Behav Med. 2014;4(1):26–33.
21. Stanick CF, Halko HM, Dorsey CN, Weiner BJ, Powell BJ, Palinkas LA, et al.
Operationalizing the “pragmatic” measures construct using a stake‑
holder feedback and a multi‑method approach. BMC Health Serv Res.
2018;18(1):88.
22. Lewis CC, Dorsey C. Advancing implementation science measurement.
In: Albers B, Shlonsky A, Mildon R, editors. Implementation Science 30.
Switzerland: Springer Nature; 2020. p. 227–51.
23. Bailin A, Bearman SK, Sale R. Clinical Supervision of Mental Health Profes‑
sionals Serving Youth: Format and Microskills. Adm Policy Ment Health.
2018;45(5):800–12. https://doi.org/10.1007/s10488-018-0865-y.
24. Stirman SW, Pontoski K, Creed T, Xhezo R, Evans AC, Beck AT, et al. A non-
randomized comparison of strategies for consultation in a community-
academic training program to implement an evidence-based psycho-
therapy. Adm Policy Ment Health. 2017;44(1):55–66.
25. Creed TA, Kuo PB, Oziel R, Reich D, Thomas M, Connor SO, et al. Knowl‑
edge and attitudes toward an artificial intelligence‑based fidelity meas‑
urement in community cognitive behavioral therapy supervision. Adm
Policy Ment Health. 2022; 49(3):343–56.
26. Schoenwald SK, Sheidow AJ, Chapman JE. Clinical supervision in treat‑
ment transport: effects on adherence and outcomes. J Consult Clin
Psychol. 2009;77(3):410–21.
27. Lucid L, Meza R, Pullmann MD, Jungbluth N, Deblinger E, Dorsey S.
Supervision in community mental health: understanding intensity of EBT
focus. Behav Ther. 2018;49(4):481–93.
28. Pullmann MD, Lucid L, Harrison JP, Martin P, Deblinger E, Benjamin KS,
et al. Implementation climate and time predict intensity of supervi-
sion content related to evidence-based treatment. Front Public Health.
2018;6:280.
29. Accurso EC, Taylor RM, Garland AF. Evidence‑based practices addressed
in community‑based children’s mental health clinical supervision. Train
Educ Prof Psychol. 2011;5(2):88–96.
30. Lewis CC, Mettert KD, Stanick CF, Halko HM, Nolen EA, Powell BJ, et al. The
psychometric and pragmatic evidence rating scale (PAPERS) for measure
development and evaluation. Implement Res Pract. 2021;2:1–6.
31. Tracey TJG, Wampold BE, Lichtenberg JW, Goodyear RK. Expertise in
psychotherapy: An elusive goal? Am Psychol. 2014;69(3):218–29.
32. Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard‑Jensen J,
French SD, et al. Audit and feedback: Effects on professional
practice and healthcare outcomes. Cochrane Database Syst Rev.
2012;2012(6):CD000259.
33. Creed TA, Frankel SA, German RE, Green KL, Jager-Hyman S, Taylor KP,
et al. Implementation of transdiagnostic cognitive therapy in community
behavioral health: the Beck Community Initiative. J Consult Clin Psychol.
2016;84(12):1116–26.
34. Roth AD, Pilling S, Turner J. Therapist training and supervision in clini‑
cal trials: Implications for clinical practice. Behav Cogn Psychother.
2010;38(3):291–302.
35. Beidas RS, Cross W, Dorsey S. Show me, don’t tell me: behavioral
rehearsal as a training and analogue fidelity tool. Cogn Behav Pract.
2014;21(1):1–11.
36. Kolb DA. Experiential learning: experience as the source of learning and
development. Englewood Cliffs: Prentice-Hall; 1984.
37. Milne D, Aylott H, Fitzpatrick H, Ellis MV. How does clinical supervision
work? Using a “best evidence synthesis” approach to construct a basic
model of supervision. Clinical Supervisor. 2008;27(2):170–90.
38. Herschell AD, Kolko DJ, Baumann BL, Davis AC. The role of therapist training
in the implementation of psychosocial treatments: a review and critique
with recommendations. Clin Psychol Rev. 2010;30(4):448–66.
39. Beidas RS, Maclean JC, Fishman J, Dorsey S, Schoenwald SK, Mandell DS,
et al. A randomized trial to identify accurate and cost‑effective fidelity meas‑
urement methods for cognitive‑behavioral therapy: Project FACTS study
protocol. BMC Psychiatry. 2016;16(1):323.
40. Ellis MV, Krengel M, Ladany N, Schult D. Clinical supervision research
from 1981 to 1993: a methodological critique. J Couns Psychol.
1996;43(1):35–50.
41. Schriger SH, Becker-Haimes EM, Skriner L, Beidas RS. Clinical supervision in
community mental health: characterizing supervision as usual and explor-
ing predictors of supervision content and process. Community Ment Health
J. 2020;57:552–66. https://doi.org/10.1007/s10597-020-00681-w.
42. Schriger SH, Becker-Haimes EM, Skriner L, Beidas RS. Clinical supervision
in community mental health: characterizing supervision as usual and
exploring predictors of supervision content and process. Community Ment
Health J. 2020;57:552–66. https://doi.org/10.1007/s10597-020-00681-w.
43. Wolf EJ, Harrington KM, Clark SL, Miller MW. Sample size requirements for
structural equation models: an evaluation of power, bias, and solution
propriety. Educ Psychol Meas. 2013;73(6):913–34.
44. Wolf EJ, Harrington KM, Clark SL, Miller MW. Sample size requirements for
structural equation models: an evaluation of power, bias, and solution
propriety. Educ Psychol Meas. 2013;73(6):913–34.
45. Carlson L, Rapp CA, Eichler MS. The experts rate: supervisory behaviors that
impact the implementation of evidence‑based practices. Community Ment
Health J. 2012;48(2):179–86.
46. Monroe-DeVita M, Teague GB, Moser LL. The TMACT: a new tool for measur-
ing fidelity to assertive community treatment. J Am Psychiatr Nurses Assoc.
2011;17(1):17–29.
47. Sass DA, Schmitt TA, Marsh HW. Evaluating model fit with ordered categori‑
cal data within a measurement invariance framework: a comparison of
estimators. Struct Equ Modeling. 2014;21(2):167–80.
48. Ehrhart MG, Aarons GA, Farahnak LR. Assessing the organizational context
for EBP implementation: the development and validity testing of the Imple‑
mentation Climate Scale (ICS). Implement Sci. 2014;9(1):157.
49. Williams NJ, Ramirez N, Esp S, Watts A, Marcus SC. Organization-level
variation in therapists’ attitudes toward and use of measurement-based
care. Adm Policy Ment Health. 2022;49(6):927–42.
50. Williams NJ, Ehrhart MG, Aarons GA, Marcus SC, Beidas RS. Linking molar
organizational climate and strategic implementation climate to clinicians’
use of evidence-based psychotherapy techniques: cross-sectional and
lagged analyses from a 2-year observational study. Implement Sci.
2018;13(1):85. https://doi.org/10.1186/s13012-018-0781-2.
51. Williams NJ, Hugh ML, Cooney DJ, Worley JA, Locke J, et al. Testing
a theory of implementation leadership and climate across autism evidence-
based interventions of varying complexity. Behav Ther. 2022. https://doi.org/10.1016/j.beth.2022.03.001.
52. Williams NJ, Benjamin Wolk C, Becker-Haimes EM, Beidas RS. Testing a theory
of strategic implementation leadership, implementation climate, and clini-
cians’ use of evidence-based practice: a 5-year panel analysis. Implement Sci.
2020;15(1):10. https://doi.org/10.1186/s13012-020-0970-7.
53. James LR, Demaree RG, Wolf G. rwg: an assessment of within-group inter-
rater agreement. J Appl Psychol. 1993;78(2):306–9.
54. LeBreton JM, Senter JL. Answers to 20 questions about interrater reliability
and interrater agreement. Organ Res Methods. 2008;11(4):815–52. https://doi.org/10.1177/1094428106296642.
55. Sabella SA, Schultz JC, Landon TJ. Validation of a brief form of the Supervi-
sory Working Alliance Inventory. Rehabil Couns Bull. 2020;63(2):115–24.
56. Graen G, Uhl-Bien M. Relationship-based approach to leadership: develop-
ment of leader–member exchange (LMX) theory of leadership over 25 years:
applying a multi-level multi-domain perspective. Leadersh Q. 1995;6(2):219–47.
57. Dulebohn JH, Bommer WH, Liden RC, Brouer RL, Ferris GR. A meta‑analysis
of antecedents and consequences of leader‑member exchange: integrating
the past with an eye toward the future. J Manage. 2012;38(6):1715–59.
58. Liden RC, Wu J, Cao X, Wayne SJ. Leader–member exchange measurement.
In: Bauer TN, Erdogan B, editors. The Oxford handbook of leader–member
exchange. New York: Oxford University Press; 2015.
59. Fenwick KM, Brimhall KC, Hurlburt M, Aarons G. Who wants feedback?
Effects of transformational leadership and leader‑member exchange on
mental health practitioners’ attitudes toward feedback. Psychiatr Serv.
2019;70(1):11–8.
60. Ellis MV, Berger L, Hanus AE, Ayala EE, Swords BA, Siembor M. Inadequate
and harmful clinical supervision: testing a revised framework and assessing
occurrence. Couns Psychol. 2014;42(4):434–72.
61. Ellis MV, Creaner M, Hutman H, Timulak L. A comparative study of clinical
supervision in the Republic of Ireland and the United States. J Couns
Psychol. 2015;62(4):621–31.
62. Hoge MA, Migdole S, Cannata E, Powell DJ. Strengthening supervision in
systems of care: exemplary practices in empirically supported treatments.
Clin Soc Work J. 2014;42(2):171–81.
63. Yang-Wallentin F, Jöreskog KG, Luo H. Confirmatory factor analysis of
ordinal variables with misspecified models. Struct Equ Modeling.
2010;17:392–423.
64. Li CH. Confirmatory factor analysis with ordinal data: comparing robust
maximum likelihood and diagonally weighted least squares. Behav Res
Methods. 2016;48(3):936–49.
65. Lei PW. Evaluating estimation methods for ordinal data in structural equa‑
tion modeling. Qual Quant. 2009;43(3):495–507.
66. Muthén LK, Muthén BO. Mplus user’s guide: statistical analysis with latent
variables. Los Angeles: Muthén & Muthén; 1998.
67. Schreiber JB, Nora A, Stage FK, Barlow EA, King J. Reporting structural equa‑
tion modeling and confirmatory factor analysis results: a review. J Educ Res.
2006;99(6):323–38.
68. Kline RB. Principles and practice of structural equation modeling. New York:
Guilford Press; 2015.
69. Raudenbush S, Bryk A. Hierarchical linear models: applications and data
analysis methods. Thousand Oaks: Sage Publications; 2002.
70. Hox JJ, Moerbeek M, van de Schoot R. Multilevel analysis: techniques and
applications. 3rd ed. New York: Routledge; 2017.
71. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed.
New York: Routledge; 1988.
72. Satorra A, Bentler PM. Ensuring positiveness of the scaled difference chi‑
square test statistic. Psychometrika. 2010;75(2):243–8.
73. Meredith W, Teresi JA. An essay on measurement and factorial invariance.
Med Care. 2006;44(11 Suppl 3):S69‑77.
74. Putnick DL, Bornstein MH. Measurement invariance conventions and report‑
ing: the state of the art and future directions for psychological research. Dev
Rev. 2016;41:71–90.
75. Rhudy JL, Arnau RC, Huber FA, Lannon EW, Kuhn BL, Palit S, et al. Examining
configural, metric, and scalar invariance of the Pain Catastrophizing Scale in
Native American and non-Hispanic White adults in the Oklahoma Study of
Native American Pain. J Pain Res. 2020;13:961–9.
76. Chen FF. Sensitivity of goodness of fit indexes to lack of measurement
invariance. Struct Equ Modeling. 2007;14(3):464–504.
77. Byrne BM, Shavelson RJ, Muthén B. Testing for the equivalence of factor
covariance and mean structures: The issue of partial measurement invari‑
ance. Psychol Bull. 1989;105(3):456–66.
78. Birken SA, Lee SYD, Weiner BJ, Chin MH, Schaefer CT. Improving the effec‑
tiveness of health care innovation implementation: middle managers as
change agents. Med Care Res Rev. 2013;70(1):29–45.
79. Birken SA, Lee SYD, Weiner BJ. Uncovering middle managers’ role in health‑
care innovation implementation. Implement Sci. 2012;7(1):28.
80. Bunger AC, Birken SA, Hoffman JA, MacDowell H, Choy‑Brown M, Magier E.
Elucidating the influence of supervisors’ roles on implementation climate.
Implement Sci. 2019;14(1):93.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in pub‑
lished maps and institutional affiliations.