REVIEW PAPER
A Systematic Review and Evaluation of Video Modeling,
Role-Play and Computer-Based Instruction as Social Skills
Interventions for Children and Adolescents
with High-Functioning Autism
Anna McCoy¹ & Jennifer Holloway¹ & Olive Healy² & Mandy Rispoli³ & Leslie Neely⁴
Received: 21 July 2015 /Accepted: 18 November 2015
© Springer Science+Business Media New York 2016
Abstract An impaired development in social interaction is a
defining characteristic of high-functioning autism (HFA).
Video modeling (VM), role play, and computer-based instruc-
tion (CBI) have received empirical evaluation in the literature
and are increasingly used in clinical practice as treatment ap-
proaches for increasing social skills in this population.
Systematic reviews of the efficacy and evidence base of these
interventions for children and adolescents with HFA are lim-
ited to date and are primarily narrative in methodology. It is
true that much of what we know about VM, role play, and CBI
is derived from reviews of the broader Autism Spectrum
Disorder (ASD) population, which highlights the need to evaluate
the effects of these interventions on more homogeneous
groups of ASD (i.e., HFA). The current study provides a fo-
cused review of the efficacy and evidence base of VM, role
play, and CBI for teaching social skills to children and ado-
lescents with HFA. In addition, a set of stringent criteria were
used to evaluate the status of these interventions as evidence-
based practice (EBP; Reichow 2011). According to Reichow’s
(2011) criteria, only one of the three interventions evaluated
(i.e., CBI) had the accumulated evidence necessary to be clas-
sified as an established EBP, while both VM and role play did
not. Areas for future research and recommendations for prac-
tice are discussed.
Keywords Autism spectrum disorder · High functioning autism · Social skills · Video modeling · Role play · Computer-based instruction · Evidence-based practice
Children and adolescents with a diagnosis of Autism
Spectrum Disorder (ASD) present with impairments in
social functioning, which is one of the defining diagnos-
tic features of the condition. Such deficits vary within
the spectrum resulting in social skills profiles that can
present in multiple ways across different individuals
(Flynn and Healy 2012). Other factors including verbal
and intellectual difficulties can further impact on the rep-
ertoire of social skills of an individual with ASD.
Children and adolescents with ASD who are considered
to be “high functioning” (high-functioning autism; HFA)
do not present with IQ or language impairments and as a
result the nature of their social skills deficits is considered
to be different from that of individuals diagnosed with
autism who present with more complex needs
(Diagnostic and Statistical Manual of Mental Disorders
5th ed.; American Psychiatric Association 2013).
Individuals with HFA are often motivated to interact socially
with others but may not have the required social
skills to successfully fulfill such interaction (Schopler
and Mesibov 1983; Tse et al. 2007).
The effects of social interaction deficits can be ad-
verse, transcending from childhood into adulthood
(Schmidt and Stichter 2012). For children and adolescents
with HFA, these deficits can negatively impact
their management of everyday interactions with others,
their development and maintenance of meaningful peer
relationships, and their inclusion opportunities in both
school and community settings (Koegel et al. 2013).
Long-term outcome studies have documented the
continued struggle with inclusion, where adults with
HFA experience difficulties securing meaningful relationships
(Goode et al. 1999), maintaining employment
(Szatmari et al. 1989) and achieving independence, placing
them at risk for increased stress, loss of self-esteem,
anxiety, and depression (Howlin 2004).

* Olive Healy
olive.healy@tcd.ie
¹ National University of Ireland, Galway, Ireland
² Trinity College Dublin, Dublin, Ireland
³ Purdue University, West Lafayette, IN, USA
⁴ The University of Texas at San Antonio, San Antonio, TX, USA
Rev J Autism Dev Disord
DOI 10.1007/s40489-015-0065-6

Within the field of behavior analysis, there is a breadth of
research investigating the effects of social skills interventions
for remediating social challenges and teaching social skills to
children and adolescents with HFA. Social skills interventions
typically break down complex social behaviors that are deter-
mined by chronological and developmental age levels into
more feasible steps (Cappadocia and Weiss 2011). They aim
to produce generalized social outcomes that can be applied to
everyday social situations. Examples of social skills interven-
tions include, but are not limited to, video modeling (VM; Axe
and Evans 2012; Bellini et al. 2007), peer-mediated interven-
tions (PMI; Chan et al. 2009), role play (Gutman et al. 2010),
social stories (Sansosti and Powell-Smith 2008; Scattone et al.
2006), and computer-based instruction (CBI; Beaumont and
Sofronoff 2008; Mitchell et al. 2007).
Of the available social skills interventions for children and
adolescents with HFA, VM, role play and, more recently, CBI
have been increasingly applied and empirically examined for
teaching social skills. For example, VM has been applied in
the literature to teach perspective taking and problem solving
skills, greetings and social initiation, and conversational skills
(Kagohara et al. 2013; Radley et al. 2014; Sansosti and
Powell-Smith 2008). Similar to this, interventions that include
role play have been shown to have positive outcomes, effec-
tively increasing social skills such as sportsmanship (i.e., giv-
ing compliments and making positive post-game comments)
and conversational turn taking and maintenance (Ferguson
et al. 2013; Leaf et al. 2012a, b). Animated or virtual reality
(VR) CBI interventions have been successful in teaching con-
versational turn taking and maintenance, recognition of body
language, facial expressions, and emotions in the self and
others (Ke and Im 2013; LaCava et al. 2007; Stichter et al.
2014).
For over a decade, ASD intervention research has experi-
enced a growing demand to identify interventions that have
the accumulated evidence necessary to be considered
evidence-based practices (EBPs). The identification of EBPs
originated in the medical field but spread quickly to other
disciplines including the social sciences (Reichow et al.
2008). EBPs exist where quality research provides empirical
evidence to inform practice and to identify the best interventions
available in a field (Camargo et al. 2014; Reichow et al.
2008). While research synthesis organizations and professional
associations (e.g., Cochrane Collaboration, Campbell
Collaboration, and What Works Clearinghouse) propose standards
for determining evidence-based practices, there is debate
over the types (i.e., research design), strength, and
amount/magnitude of evidence that is required (Odom et al.
2005). Most standards for determining evidence-based practice
equate evidence of effectiveness and research quality to
randomized controlled trials (i.e., RCTs), in so far as they offer
the greatest capacity to control for threats to internal validity
(Odom et al. 2005). This highlights a particular issue in the
ASD and special education literature, where the heterogeneity
of participant characteristics poses a significant challenge to research
designs that require establishing equivalent groups
(i.e., RCTs) and, therefore, where single-subject research de-
signs (SSRDs) are frequently used to evaluate intervention
outcomes (Odom et al. 2005). While SSRDs are being acknowledged
for their contribution to the identification of
EBPs, standards for determining strength of evidence or re-
search quality are required. In addition to this, methods for
synthesizing the strength of evidence of RCTs and SSRDs are
especially required where different research methodologies
are used to evaluate intervention outcomes. Reichow et al.
(2008) addressed this issue by developing the Evaluative
Method for Determining Evidence-Based Practice in
Autism. The evaluative method accounts for the lack of an
operational method for evaluating research quality by includ-
ing a rubric for determining research quality or report rigor
and guidelines for accumulating results from the rubrics to
inform the strength of the research report. Following from this,
the evaluative method sets forth criteria for determining EBPs.
Outcomes from efforts to identify EBPs are further en-
hanced when quantitative methods are used to assess the mag-
nitude of effects. In group comparison research, the calcula-
tion of effect size is commonly employed. In single-subject
research, non-overlap methods have long been employed as
an indicator of performance difference between phases
(Sidman 1960) and, therefore, as a calculation of effect size.
According to Parker and Vannest (2009), non-overlap
methods measure the extent to which baseline (A) versus in-
tervention (B) phases do not overlap. There are a number of
non-overlap methods available for synthesizing single-subject
research, however, there is no agreement on which method
best measures and interprets the strength of treatment effects
(Peterson-Brown et al. 2012; Carr et al. 2015). Efforts to address
this have focused on evaluating and comparing each
method according to the presence of an underlying distribu-
tion, their relationship to other established effect sizes, their
ability to discriminate among published studies, their statisti-
cal power for small N studies, and lastly, their ease of calcu-
lation (Parker et al. 2011a, 2011b). Of the nine available over-
lap methods, non-overlap of all pairs (i.e., NAP; Parker and
Vannest 2009) and tau for non-overlap with baseline trend
control (i.e., Tau-U; Parker et al. 2011a, 2011b) have been
(1) credited as “complete” non-overlap indices as both NAP
and Tau-U compare all phase A and phase B data points, (2)
shown to have the greatest statistical power in that they demonstrate
the ability to reliably identify smaller treatment
effects, (3) shown to discriminate better among results from a
large group of published studies (Parker et al. 2011a, b), and
lastly, (4) are compatible with results from visual analysis
(Rakap 2015). Tau-U extends NAP by controlling for undesirable
positive baseline trends (i.e., trends in a therapeutic
direction; Parker et al. 2011a, b). In a recent review, the suitability
of applying NAP to studies in both the self-management
and exercise single-subject research for the
ASD population was evaluated (Carr et al. 2015). As a measure
of effect size, NAP was unrestricted by the volume of
data points collated during intervention and therefore was
found to be an appropriate index to apply to studies that employ
SSRDs (Carr et al. 2015). At the time of this current study,
a similar comparison of the Tau-U has yet to be conducted.
Given the similarities between NAP and Tau-U, both are con-
sidered appropriate indices of effect size in single-subject
research.
Systematic reviews that have evaluated VM interventions
for teaching social skills have applied both quantitative
methods for assessing treatment efficacy (i.e., percentage of
non-overlapping data; PND) and methods for determining
evidence-based practice (Bellini and Akullian 2007; Wang
and Spillane 2009). These reviews, however, document the
efficacy and evidence base of VM for children and adoles-
cents across the broader ASD spectrum. Reviews and evalua-
tion of the evidence-base of the VM intervention for children
and adolescents with a diagnosis of HFA are limited to one
study (Reichow and Volkmar 2010). A similar situation exists
for the evaluation of CBI interventions for increasing social
skills. Systematic reviews conducted to date have applied both
methods for assessing treatment efficacy and determining
evidence-based practice but reviews are limited in number
(i.e., n=2) and include children and adolescents with ASD
alone (Ramdoss et al. 2012; Reed et al. 2011). For role play
interventions, a synthesis and evaluation of the literature
through systematic review, for children and adolescents with
a diagnosis of ASD and its subgroup, HFA, have yet to be
conducted.
While there are reviews of the literature that evaluate a
myriad of social skills interventions specifically for children
and adolescents with a diagnosis of HFA, they are narrative in
methodology and do not evaluate the efficacy or evidence
base of the interventions (Cappadocia and Weiss 2011; Rao
et al. 2008). In order to determine the best practices for ho-
mogenous groups on the autism spectrum (i.e., HFA), evalu-
ation through systematic review, with methodologies for de-
termining EBP and treatment efficacy, is necessary. This cri-
tique of the literature would provide valuable information for
both progression of research in the area and for clinical
practice.
The purpose of the current review was to evaluate the ev-
idence base of the extant literature on VM, role play, and CBI
as interventions for increasing the social skills of children and
adolescents with HFA. This review extends upon past reviews
in the area by evaluating VM, role play, and CBI as interven-
tions for the HFA subgroup of ASD. In addition to this, both a
measure of intervention efficacy for SSRDs (i.e., NAP) and an
evaluative method that includes rigorous research quality stan-
dards (i.e., the Evaluative Method for Determining Evidence-
Based Practice in Autism) were used to determine the efficacy
and evidence base of the studies identified for each respective
intervention. Lastly, the current review is focused in that
three interventions that are increasing in clinical
use are compared according to their efficacy and evidence
base.
Method
Search Procedures
Systematic searches were carried out using the following six
electronic databases: Educational Resources Information
Centre (ERIC), PsycINFO, Psychology and Behavioral
Sciences, Web of Science, Scopus, and MedLine. In all
databases, searches were conducted by inputting the term “social
skills” in combination with the following keywords: “role
play” or “video*” or “computer*” plus “autis*” or
“asperger*” (e.g., social skills AND video* AND autis*).
The abstracts of the resulting studies were reviewed to identify
studies for inclusion. The reference lists for studies meeting
the inclusion criteria, together with the reference lists from
relevant review articles, were then manually reviewed to iden-
tify additional articles for inclusion. These searches were con-
ducted up to October 2015 and were limited to peer-reviewed
studies written in the English language.
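The keyword combinations described above can be enumerated mechanically. The following is a minimal sketch, assuming a generic Boolean AND syntax; the helper name `build_queries` is hypothetical, and the query syntax accepted by each database may differ from what is shown here.

```python
from itertools import product

# Search terms as listed in the text. The "AND" joining syntax is an
# assumption: each database has its own query language.
BASE_TERM = '"social skills"'
INTERVENTION_TERMS = ['"role play"', "video*", "computer*"]
POPULATION_TERMS = ["autis*", "asperger*"]


def build_queries():
    """Return every base AND intervention AND population combination."""
    return [
        f"{BASE_TERM} AND {intervention} AND {population}"
        for intervention, population in product(INTERVENTION_TERMS, POPULATION_TERMS)
    ]


if __name__ == "__main__":
    for query in build_queries():
        print(query)
```

Three intervention terms crossed with two population terms give six query strings per database.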
Inclusion Criteria
In order to be included in this review, a study was required to
meet the following criteria: (a) all participants in the study
were required to have a diagnosis of HFA, with an IQ>85 or
Asperger Syndrome (AS); (b) the study was required to report
an evaluation of either a role play, VM, or CBI intervention,
used alone or as a primary component of a treatment package,
to improve one or more social skill(s) of the participant with
AS or HFA; (c) the evaluation of the intervention must have
been conducted using SSRDs (e.g., alternating treatments,
multiple baseline, and reversal/withdrawal designs) or group
research designs; (d) the study must have included children,
up to and including 12 years, or adolescents from age 13 to
17 years; (e) the study must have been published or accepted
for publication with online availability in English within a
peer-reviewed journal.
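As an illustration only, criteria (a) to (e) can be read as a screening predicate applied to each candidate study. The sketch below assumes hypothetical record types (`Participant`, `Study`) and field names that do not come from the review. It also assumes that a participant with HFA and no reported IQ fails criterion (a), which the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

ELIGIBLE_INTERVENTIONS = {"role play", "VM", "CBI"}
ELIGIBLE_DESIGNS = {"SSRD", "group"}


@dataclass
class Participant:
    diagnosis: str            # "HFA" or "AS"; any other label is ineligible
    iq: Optional[float]       # None when IQ is not reported
    age_years: float


@dataclass
class Study:
    participants: List[Participant]
    primary_intervention: str     # intervention used alone or as primary component
    targets_social_skill: bool
    design: str                   # "SSRD" or "group"
    peer_reviewed_english: bool


def participant_eligible(p: Participant) -> bool:
    # Criterion (a): HFA with IQ > 85, or Asperger Syndrome.
    if p.diagnosis == "AS":
        diagnosis_ok = True
    elif p.diagnosis == "HFA":
        # Assumption: an unreported IQ is treated as not meeting the criterion.
        diagnosis_ok = p.iq is not None and p.iq > 85
    else:
        diagnosis_ok = False
    # Criterion (d): children up to 12 years or adolescents aged 13 to 17.
    return diagnosis_ok and p.age_years < 18


def study_eligible(s: Study) -> bool:
    return (
        bool(s.participants)
        and all(participant_eligible(p) for p in s.participants)
        and s.primary_intervention in ELIGIBLE_INTERVENTIONS   # criterion (b)
        and s.targets_social_skill                             # criterion (b)
        and s.design in ELIGIBLE_DESIGNS                       # criterion (c)
        and s.peer_reviewed_english                            # criterion (e)
    )
```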
Data Extraction
Studies selected for inclusion in this review were then sum-
marized in terms of: (a) study design; (b) participant charac-
teristics, including total number of participants, age range,
gender, diagnosis, co-occurring diagnosis, and cognitive func-
tioning; (c) dependent variables; (d) standardized and behav-
ioral measures used; (e) intervention characteristics, including
intervention type, intervention density (duration of a session
and/or the number of sessions per week, and/or the total length
of the intervention), setting, intervention delivery agent; (f)
generalization and/or maintenance procedures (including as-
sessments); (g) treatment integrity; (h) social validity mea-
sures, and (i) intervention outcomes. Generalization proce-
dures, treatment fidelity, and social validity were further coded
according to the type of procedures used.
Generalization Each study was coded according to the presence
of Stokes and Baer’s (1977) nine technologies of generalization.
These include train and hope, sequential modification,
applying naturally maintaining contingencies, training
sufficient exemplars, training loosely, using indiscriminable
contingencies, programming common stimuli, mediating gen-
eralization, and training “to generalize”. Each study was fur-
ther coded to determine if technologies of generalization were
applied alone or in combination.
Treatment Integrity Each study was coded according to the
method used to monitor treatment integrity. The methods in-
cluded (a) pre-intervention training taught to a pre-determined
criterion (i.e., 90 % accuracy of treatment implementation);
(b) direct observation, with or without feedback, and/or inter-observer
agreement on the occurrence and non-occurrence
of the target social behavior; (c) intervention checklists, com-
pleted during the intervention and by the interventionists and/
or independent observers; and (d) use of a manual that provid-
ed a step by step guide of intervention implementation.
Social Validity Each study was coded according to the type of
method used to monitor intervention satisfaction/social valid-
ity. The types of methods included (1) standardized scales
developed to evaluate the acceptability of an intervention,
(2) child, parent or teacher completed surveys or question-
naires, and (3) satisfaction interviews.
Determining Treatment Efficacy
Nonoverlap of All Pairs (NAP; Parker and Vannest 2009)
The NAP statistic was applied to determine the treatment ef-
ficacy of each study by calculating the overlap between each
phase A datapoint and each phase B datapoint, in turn. NAP
equals the number of comparison pairs showing no overlap,
divided by the total number of comparisons. Where studies
included several experiments, NAP scores were combined by
calculating the weighted average. A treatment was categorized
as showing a “large” effect where the NAP effect size was
calculated between 0.93 and 1.0. If the NAP effect size was
calculated between 0.66 and 0.92, a treatment was categorized
as showing a “medium” effect, and if the NAP effect size was
calculated between 0 and 0.65, a treatment was categorized as
showing a “small” effect (Parker and Vannest 2009). To cal-
culate the overall NAP effect size for each intervention inves-
tigated in this study, the median was calculated.
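The NAP calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own procedure. Tied pairs are counted as half a non-overlapping pair, following the usual NAP definition, a detail the text does not spell out. The weights used when combining experiments are left to the caller, since the text does not specify them. Values are rounded to two decimals before banding, because the bands are quoted to two decimals. All data shown are hypothetical.

```python
from statistics import median


def nap(baseline, intervention, increase_is_improvement=True):
    """Nonoverlap of All Pairs for one phase A vs. phase B comparison."""
    if not baseline or not intervention:
        raise ValueError("each phase needs at least one data point")
    score = 0.0
    for a in baseline:
        for b in intervention:
            diff = (b - a) if increase_is_improvement else (a - b)
            if diff > 0:
                score += 1.0  # phase B point improves on phase A point: no overlap
            elif diff == 0:
                score += 0.5  # tie counted as half a pair (assumption, see lead-in)
    return score / (len(baseline) * len(intervention))


def weighted_average(scores, weights):
    """Combine NAP scores from several experiments within one study."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def effect_category(nap_value):
    """Map a NAP value onto the bands quoted in the text."""
    value = round(nap_value, 2)  # bands are stated to two decimals
    if value >= 0.93:
        return "large"
    if value >= 0.66:
        return "medium"
    return "small"


if __name__ == "__main__":
    # Hypothetical data, not taken from any reviewed study.
    phase_a = [2, 3, 3, 4]
    phase_b = [4, 5, 6, 6, 7]
    value = nap(phase_a, phase_b)
    print(value, effect_category(value))
    # Overall effect for an intervention: median of study-level NAP scores.
    print(median([value, 0.70, 0.60]))
```

In the hypothetical data, 19 of the 20 pairs show improvement and one pair is tied, so NAP is 19.5/20.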
Evidence-Based Practice (EBP; Reichow 2011) The pur-
pose of this review was to classify the empirical support for
role play, VM, and CBI as interventions for teaching social
skills to children and adolescents with HFA. This was
achieved by applying Reichow’s (2011) evaluative method
for determining if an intervention constituted EBP. This meth-
od was selected because it was designed to evaluate research
involving specific interventions, and it is equally suited to the
evaluation of research that employed either single-subject or
group comparison designs (Reichow 2011). Other researchers
have demonstrated the utility of this method in evaluating both
single-subject and group designs (Lydon et al. 2013).
The evaluative method involved a comprehensive protocol
implemented across three stages. Firstly, research report rigor
was rated according to two rubrics, one for research conducted
using group comparison research designs and the other for
research conducted using SSRDs. Each rubric provided a
grading scheme that evaluated the quality of the methodological
elements pertaining to the design. In addition to this, and
within each rubric, methodological elements were categorized
according to primary and secondary quality indicators.
According to Reichow (2011), primary quality indicators are
methodological elements that are considered critical for dem-
onstrating the validity of a study. Primary quality indicators
for group comparison designs include (1) information on par-
ticipant characteristics; (2) independent and dependent vari-
ables that are described with replicable precision; (3) the use
of a comparison condition; (4) the demonstration of the link
between the research question and the data analysis, and lastly,
(5) the use of accurate statistical analysis. Primary quality
indicators for SSRDs include (1) information on participant
characteristics; (2) independent and dependent variables de-
scribed with replicable precision; (3) the demonstration of a
stable baseline condition, and (4) the demonstration of exper-
imental control. Across both rubrics, each primary indicator
was rated as either “high quality” (H), “acceptable quality”
(A), or “unacceptable quality” (U). According to Reichow
(2011), secondary quality indicators are elements of the re-
search design that, although important, are not considered
necessary for determining the validity of a study. Secondary
quality indicators for group comparison designs include (1)
random assignment; (2) the use of interobserver agreement
and blind raters; (3) the measurement of treatment fidelity; (4)
details of participant attrition; (5) the measurement of gener-
alization and maintenance outcomes; (6) the reporting of treat-
ment effect sizes, and (7) the demonstration of social validity.
Secondary quality indicators for SSRDs include (1) the use of
interobserver agreement and blind raters; (2) the calculation of
the Kappa statistic; (3) the measurement of treatment fidelity;
(4) the measurement of generalization and maintenance out-
comes, and (5) the demonstration of social validity. Across
both rubrics, secondary quality indicators were rated on a
dichotomous scale, where indicators were either present or
absent.
Following this, ratings of primary and secondary quality
indicators were synthesized using a scoring criterion whereby
each study received a strength rating (i.e., “strong”, “adequate”,
or “weak”). Similar to the primary and secondary
quality indicators, the scoring criterion for the strength ratings
differed according to the research design. For example, to
score a “strong” strength rating, a study using a group com-
parison design needed to receive high quality grades on all
primary quality indicators and to show evidence of four or
more secondary quality indicators, whereas a study using a
single-subject research design needed to receive high quality
grades on all primary quality indicators and to show evidence
of three or more secondary quality indicators.
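The “strong” rule given as an example above can be written as a short check. This sketch encodes only that rule, because the text does not restate the thresholds for “adequate” and “weak” ratings. The function name `is_strong` and its argument layout are hypothetical.

```python
# Minimum number of secondary quality indicators required for a "strong"
# rating, by research design, as stated in the text.
SECONDARY_REQUIRED_FOR_STRONG = {"group": 4, "SSRD": 3}


def is_strong(primary_grades, secondary_present, design):
    """Return True when a study meets the "strong" rule quoted in the text.

    primary_grades: one grade per primary indicator, each "H", "A", or "U".
    secondary_present: count of secondary indicators rated as present.
    design: "group" or "SSRD".
    """
    if not primary_grades:
        raise ValueError("at least one primary quality indicator is required")
    all_high = all(grade == "H" for grade in primary_grades)
    return all_high and secondary_present >= SECONDARY_REQUIRED_FOR_STRONG[design]
```

For example, a single-subject study graded "H" on all four primary indicators with three secondary indicators present satisfies the rule, whereas a group study with the same count of secondary indicators does not.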
Lastly, for each intervention, studies that received “strong” or “adequate”
strength ratings were collated: by the number of participants for studies
that used a single-subject research design, and by the number of studies
for group comparison designs. The following formula was then applied,
per intervention, to determine EBP ($\mathrm{Group}_S$ equals the total
number of group design studies with an overall “strong” rating,
$\mathrm{Group}_A$ equals the total number of group design studies
with an overall “adequate” rating, $\mathrm{SSED}_S$ equals the total
number of single-subject studies with a “strong” rating, and
$\mathrm{SSED}_A$ equals the total number of single-subject studies
with an “adequate” rating):
$$(\mathrm{Group}_S \times 30) + (\mathrm{Group}_A \times 15) + (\mathrm{SSED}_S \times 4) + (\mathrm{SSED}_A \times 2) = Z$$
The Z score, indicating the total number of points per
intervention, was then interpreted, with ≥60 points indicating
“established EBP” and >30 points indicating “probable EBP”.
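The point formula and cut-offs above translate directly into a few lines of code. The sketch below is illustrative: the function names are hypothetical, the label returned below the “probable EBP” threshold is a placeholder not taken from the text, and the example counts are invented rather than results of this review. Whether the single-subject terms are counted in studies or in participants is left to the caller, since the text describes both.

```python
def ebp_points(group_strong, group_adequate, ssed_strong, ssed_adequate):
    """Total points (Z) for one intervention, using the weights in the formula."""
    return (
        group_strong * 30
        + group_adequate * 15
        + ssed_strong * 4
        + ssed_adequate * 2
    )


def ebp_status(z):
    """Classify a Z score using the cut-offs quoted in the text."""
    if z >= 60:
        return "established EBP"
    if z > 30:
        return "probable EBP"
    # Placeholder label: the text does not name this category.
    return "below probable EBP threshold"


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    z = ebp_points(group_strong=1, group_adequate=1, ssed_strong=3, ssed_adequate=2)
    print(z, ebp_status(z))  # 30 + 15 + 12 + 4 = 61
```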
Inter-Rater Agreement
Reliability measures were collected for each step of the data
extraction and evaluation for EBP. For the data summaries, the
first author extracted information from each study to develop
an initial summary. In order to ensure accuracy of the infor-
mation extracted, the fifth author evaluated the accuracy of
each summary using a checklist developed by Lang et al.
(2012). The checklist included nine questions, specifically:
(a) Is this an accurate description of the participants?; (b) Is
this an accurate description of the diagnosis and cognitive
functioning of the participants?; (c) Is this an accurate descrip-
tion of the assessment procedures?; (d) Is this an accurate
description of the dependent variables?; (e) Is this an accurate
description of the intervention procedures?; (f) Is this an ac-
curate description of the design of the study?; (g) Is this an
accurate description of outcomes?; (h) Is this an accurate description
of the treatment efficacy for single-subject research
design?; (i) Is this an accurate description of the certainty of
evidence or evidence-based practice? From the checklist, there
were 216 items (i.e., 24 studies with nine checklist items per
study) on which there could be agreement and disagreement.
Initial agreement was obtained on 201 items (93 %). When
information extracted was considered inaccurate, the co-
authors discussed the study and reached agreement. This pro-
cess was repeated until 100 % accuracy on the summaries was
achieved. The resulting summaries were used to create
Tables 1, 2, and 3.
Inter-rater agreement for NAP was calculated on 89.47 %
of the included SSRDs. An agreement was defined as both
raters recording the same percentage of non-overlapping data
per behavior. Overall agreement was determined by the fol-
lowing formula:
$$\frac{\#\ \text{agreements}}{\#\ \text{agreements} + \#\ \text{disagreements}} \times 100 = \%\ \text{agreement}$$
Initial inter-rater agreement for the calculation of NAP was
89 %. When NAP calculations were considered inaccurate,
the co-authors reached agreement through discussion. This
process was repeated until 100 % agreement was achieved.
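The agreement formula can be checked against the data-summary figures reported earlier (initial agreement on 201 of 216 checklist items). A minimal sketch, with a hypothetical function name:

```python
def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements, times 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no items were rated")
    return agreements / total * 100


if __name__ == "__main__":
    # 201 agreements out of 216 items leaves 15 disagreements.
    print(round(percent_agreement(201, 216 - 201)))
```

Rounded to the nearest whole number, this reproduces the 93 % figure given for the data summaries.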
Inter-rater agreement for EBP was calculated on 48.27 %
(N =14) of the studies identified in the literature search.
Agreement was defined as obtaining identical quality and
strength ratings for EBP. The first and fifth authors calculated
all inter-rater agreement for this study. Inter-rater agreement
was calculated using the same formula outlined above. Initial
agreement for the calculation of primary and secondary quality
indicators was 87.90 %, and inter-rater agreement for research
strength ratings was 79.16 %. When primary and secondary quality
indicators and strength ratings were considered inaccurate,
the co-authors reached agreement through discussion of the
indicators evaluated. This process was repeated until 100 %
agreement was achieved.
Results
The systematic search procedures and the application of the
pre-determined inclusion criteria resulted in the inclusion of
29 studies in this review. Of these 29 studies, eight studies
were categorized as evaluating the use of VM, either alone
or within a multi-component treatment package, of which all
Table 1 Summary of participant characteristics, intervention characteristics, methodological characteristics, and methodological outcomes for included video modeling studies
Reference N
value
No. of P.
incl. with
Diagnosis
Gender (M:
F) and age
Intervention
and setting
Density Design and measures G/M Fidelity Social validity TSB and outcomes
Allen et al.
(2010)
41×AS
1×HFA
2:0
16 and
17 years
VM
Retail
warehouse
Up to approx. 45
min×1 day
SSRD-MB, across
participants
Direct observation (i.e.,
partial interval
recording) of the
TSBs at baseline,
intervention and
maintenance phases
G—No
M—Yes
No Yes—
participant
completed
survey
Increases in the TSB’s
(i.e., waving, shaking
hands, and giving high
fives) in an average of
79 % of intervals
during the intervention
phase and of 70 % of
intervals during the
maintenance phase
Appleetal.
(2005)
42×AS
2×HFA
3:1
5–5.9 years
VM
Preschool
classroom
1 session×3 times
perweekfora
total duration of
1to4weeks
SSRD-MB, across
participants
Direct observation (i.e.,
frequency count per
observation period-
15 min) of the TSBs
at baseline,
intervention and
generalization
phases. Parent and
teacher reported
social skills
questionnaires were
administered pre- and
post-intervention
G—Yes; Training
sufficient exemplars for
stimuli & responses,
Intro. of natural
maintaining
contingencies, Prog.
common stimuli
M—No
Yes No Increases in one TSB
(i.e., compliments
responded to) only;
For 2 participants,
when VM was
combined with
tangible reinforcement
there were increases in
the remaining TSB
(i.e., compliments
initiated); for 2
participants, there was
no change in scores on
the social skills
questionnaire from
pre- to post-
intervention, however,
the remaining 2
participants received a
score that was one
point higher post-
intervention
Gena et al.
(2005)
31×HFA1:0
3.11
VM
Preschooler’s
homes
2–4 times per
week, with each
session lasting
15–20 min
SSRD-MB, across
participants with
embedded ABAC
design Direct
observation of the
TSB at baseline,
intervention,
generalization, and
maintenance phases
G—Yes: Training
sufficient exemplars for
stimuli & responses,
Intro. of natural
maintaining
contingencies, Prog.
common stimuli.
M—Yes
Yes No Increases in all three TSB
(i.e., appropriate
affective responses for
appreciation,
sympathy, and
disapproval type
situations)
Kagohara
et al.
(2013)
22×AS1:1P1&P2:
10 years
VM with social
stories
School
NP SSRD-MB, across
participants Direct
observation (i.e.,
frequen
cy count per
session) of the TSBs
at baseline,
intervention and
maintenance phases
G—No
M—Yes
Yes No Increases in the TSB (i.e.,
partial greetings)
during both the social
stories and VM
phases; increases in
the TSB (i.e., full
greetings) during the
VM phase only;
results were replicated
across both
participants
Nikopoulos
and
71×AS0:1
9 years
VM
School
35 s × for an
average of 37
sessions
SSRD-Multiple
treatment with
reversal design Direct
G—Yes: Training
sufficient exemplars for
stimuli, training loosely
No Yes—
observation
of the TSBs
Increase in appropriate
play, with a reduced
Rev J Autism Dev Disord
Table 1 (continued)
Reference N
value
No. of P.
incl. with
Diagnosis
Gender (M:
F) and age
Intervention
and setting
Density Design and measures G/M Fidelity Social validity TSB and outcomes
Keenan
(2003)
observations of the
TSBs at baseline,
intervention,
generalization and
maintenance phases
for responses, Prog.
common stimuli
M—Yes
by mothers of
school-aged
children
latency to socially
initiate
Radley et al.
(2014)
3 1×A 1:010.7 years VM with role
play (i.e.,
Superheroes
Social Skills
Manual)
University-
based clinic
10×1.5–2h
sessions, across
5 weeks
SSRD-MB, across
behaviors
Direct observations of
the TSBs (i.e., probes
of the target social
behavior) at baseline,
intervention and
generalization
phases; Standardized
measures included
ASSP (Bellini and
Hopf 2007) & PSI/SF
(Abidin 1995)
G—Yes: Training
sufficient exemplars for
stimuli & responses,
sequential
modification, Intro. of
natural maintaining
contingencies, Prog.
common stimuli M—
Yes
Yes Yes : B IR S
(Elliot and
Treuting
1991)
Increases in all four TSB
(i.e., participation,
conversation,
perspective taking &
problem solving);
generalization across
stimuli, persons and
settings; maintenance
at levels greater than
baseline for three
TSBs (i.e.,
participation,
conversation &
perspective taking);
observed
improvements, though
not significant, on the
measure of social
functioning (i.e.,
ASSP); significant and
clinically meaningful
reduction was found in
total stress scores (i.e.,
PSI/SF)
Sansosti and
Powell-
Smith
(2008)
31×AS1:0
6.6 years
Treatment
package SS
&VM
School
1 session×10 days SSRD-MB, across
participants
Direct observation (i.e.,
partial interval
recording) of the
TSBs at baseline,
intervention and
generalization phases
G—Yes: Training
sufficient exemplars for
responses, Intro. of
natural maintaining
contingencies, Prog.
common stimuli
M—Yes
Yes—Self
report
checklist
Yes—IRP-15
(Martens et al
1985)
Increases in the TSB (i.e.,
initiating and
maintaining
conversations) in
100 % of intervals
during intervention
plus prompts phase
Scattone
(2008)
1 1×AS 1:0 9 years Treatment
package SS
&VM
Medical center
24 sessions, across
15 weeks
SSRD-MB, across
behaviors
Direct observation (i.e.,
partial interval
recording) of the
TSBs at baseline,
intervention and
generalization phases
G—Yes: Train and hope
M—No
Yes: Parent
completed
checklist
Yes—IRP-15
(Martens et al
1985)
Increases in two TSBs
(i.e., eye contact and
initiations) during
intervention; large
generalized gains were
demonstrated for one
TSB (i.e., eye contact)
and smaller gains were
demonstrated for the
two remaining TSBs
(i.e., smiling and
initiations)
Note. P participants, Incl. included, AS asperger syndrome, HFA high-functioning autism, VM video modeling, SS social stories, NP not provided, SSRD single-subject research design, MB multiple
baseline, MP multiple probe, ASSP Autism Social Skills Profile, PSI/SF parenting stress index-short form, G generalization, Intro. introduction, Prog. programing, M maintenance, BIRS Behavior
Intervention Rating Scale, IRP Intervention Rating Profile, TSB target social behavior.
Table 2 Summary of participant characteristics, intervention characteristics, methodological characteristics and methodological outcomes for included role play studies
Reference N value No. of P.
incl. with
Diagnosis
Gender (M:
F) and age
Intervention and
setting
Density Design and measures G/M Fidelity Social validity TSB and outcomes
Dotson et al.
(2010)
6 1×AS 1:0
17 years
Role play delivered in a
treatment package—
Teaching Interaction
(TI) procedure
Classroom of a
University
1.5 h sessions×2
sessions /week
SSRD-MP, across behaviors
Direct observation of the TSBs
(i.e., frequency count of the
steps of each skill during
probes) measured during
teaching and generalization
probes, conducted at
baseline, intervention and
maintenance
G—Yes: Train loosely
for stimuli
M—Yes
Yes No Increases for the TSBs (i.e.,
conversational basics,
providing feedback &
asking & answering on-
topic questions) in 84 % of
teaching probes and 60 % of
generalization probes across
intervention and
maintenance phases
Ferguson
et al.
(2013)
9 3×AS 3:0
7–10 years
Role play treatment
package—Teaching
Interaction (TI)
procedure
Clinic
90 min×1 session/
week×10 weeks
SSRD-MB, across participants
Direct observation (i.e.,
frequency count) of the TSB
during the baseline and
intervention phases.
Generalization probes were
conducted following the
intervention phase.
G—Yes: Train loosely
for stimuli &
responses
M—No
No No Increases in the TSBs (i.e.,
sportsmanship skills: giving
compliments, taking turns,
making positive postgame
comments) across three
participants
Kassardjian
et al
(2014)
3 3×HFA 2:1
P1, P2 & P3:
5 years
Role play delivered in a
treatment package—
Teaching Interaction
(TI) procedure
Setting—not provided
45 min×1 session,
ranging from 4 to 8
sessions for the TI
procedure
Alternating treatment design
Direct observation of the
TSB (i.e., frequency count
of the steps of each skill
represented as percent
correct) during performance
probes, conducted at
baseline, intervention and
maintenance
G—No
M—Yes
Yes No Increases in the TSBs (i.e.,
changing the game when
bored) that received the TI
procedure across all three
participants
Leaf et al.
(2012a,
b)
3 2×HFA 2:0
4 and 8 years
Role play treatment
package: Cool
versus Not Cool
Procedure
Clinic
Approx. 60 min×1–5
sessions/week,
ranging from 3 to 25
sessions
SSRD-MB across behaviors
Direct observation of the
TSB (e.g., frequency count
of the steps of each skill and
intervals correct) during
probes, conducted at
baseline, intervention and
maintenance phases
G—No
M—Yes
Yes No Increases in the TSBs (i.e., joint
attention, changing the
conversation and eye
contact) when the
intervention (i.e., cool
versus not cool procedure)
was combined with
participant role play
Leaf et al.
(2012a,
b)
6 1×HFA
1×AS
2:0
5 and 6 years
Role play delivered in a
treatment package—
Teaching Interaction
(TI) procedure
P1: Both the research
room in a university
and home
P2: Research room in a
university
45-min sessions×3–6
times /week
SSRD-MB across behaviors,
with a parallel treatment
design
Direct observation of the
TSB (i.e., frequency count
of the steps of each skill)
during performance and
generalization probes,
conducted at baseline,
intervention and
maintenance
G—Yes: Train loosely
for stimuli &
responses, Intro. of
natural maintaining
contingencies
M—Yes
Yes No Increases in the TSB (i.e.,
joining a game, clarifying
instructions, losing
graciously, remaining on
topic during a conversation,
and cheering up a friend)
during intervention and
maintenance phases;
responding on the TSB
“losing graciously” was
variable for the performance
(i.e., P2) and generalization
probes (i.e., P1) during the
maintenance phase
Leaf et al.
(2010)
5 1×AS
2×HFA
3:0
4–5 years
Role play delivered in a
treatment package—
Teaching Interaction
(TI) procedure
Pre-school classroom at a
university
2×1.5-h sessions, across
20–28 weeks
SSRD-MP, across behaviors
Direct observation of the TSBs
(i.e., frequency count of the
steps of each skill) during
teaching and generalization
probes, conducted at
baseline, intervention and
maintenance
G—
Yes: Train loosely
for stimuli &
responses, &
sequential
modification
M—Yes
Yes Yes: 17-item
parental survey
Increases for the TSB (i.e.,
making an empathetic
statement & changing the
game) in 91 % of teaching
probes and 56 % of
generalization probes across
intervention and
maintenance phases
Leaf et al.
(2009)
3 2×HFA 2:0
5 and 6 years
Role play delivered in a
treatment package—
Teaching Interaction
(TI) procedure
Summer school
program—setting not
provided
30-min sessions×3 days/
week×8 weeks
SSRD-MB, across behaviors
Generalization probes of the
TSB following each
teaching session during the
intervention phase and
across the maintenance
phase
G—Yes: Train loosely
for stimuli &
responses, Intro. to
natural maintaining
contingencies
M—Yes
No No Increases in TSBs (i.e., absence
of inappropriate
conversations, playing what
his friend wants to play,
giving a compliment,
choosing the same friend for
activities, sharing,
remaining on-topic during a
conversation, choosing the
same friend for activities) in
96 % of generalization
probes during the
intervention phase; gains
maintained at intervention
levels for one behavior for
P1 and for three behaviors
for P2
Palmen
et al.
(2008)
9 2×HFA 2:0
17 years
Role play delivered in a
treatment package
(i.e., self-management
and feedback)
Therapy room
1 h/week×6 weeks SSRD
non-concurrent MB, across
participant groups
Direct observation of the TSB
(i.e., partial interval
recording) during a weekly
one hour meeting with the
participants' coach
G—No
M—Yes
No Yes: coach and
participant
completed
survey
Increases in the TSB (i.e.,
question asking during a
conversation & response
efficiency) in 100 % of
sessions; Gains were
maintained at follow up
Tse et al.
(2007)
46 46×AS/
HFA
NP 13–18 years Role play-adapted from
the book:
Skillstreaming the
Adolescent
Conference room of a
child and adolescent
psychiatry clinic
1–1/2 h×12 weeks Pre-test, Posttest, with no group
comparison Standardized
measures: SRS (Constantino
et al. 2000), ABC (Aman,
Singh, Stewart, and Field
1985), and the N-CBRF
(Aman, Tasse, Rojahn, and
Hammer 1996)
No No Yes: participant
and parental
surveys
Significant pre- to
posttreatment gains were
found on measures of both
social competence and
problem behaviors
associated with AS/HFA
Note. P participants, NP not provided, Incl. included, AS asperger syndrome, HFA high-functioning autism, VM video modeling, CBI computer-based instruction, SSRD single-subject research design, MB
multiple baseline, MP multiple probe, SRS social responsiveness scale, ABC aberrant behavior checklist, N-CBRF Nisonger Child Behavior Rating Form, G generalization, Intro. Introduction, M
maintenance, TSB target social behavior.
Table 3 Summary of participant characteristics, intervention characteristics, methodological characteristics and methodological outcomes for included Computer-Based Instruction studies
Study N No. of P.
incl. with
diagnosis
Gender (M:F)
and age
Intervention and
setting
Density Design and measures G/M Fidelity Social validity TSB and outcomes
Bauminger-
Zviely et al
(2013)
22 22×HFA 18:4
M=9.83 years
CBI—“Join in” and “No
Problem”
School
12×45 min
lessons
GCD
Direct social cognitive
measures (not
standardized), Problem
solving measure, concept
clarification: Cooperation
and social conversation;
non-direct social cognitive
measure (not
standardized), Theory of
Mind (ToM): Strange
Story measure; Direct
observation of overt social
engagement (i.e.,
collaboration and social
conversation)-
Companionship Measure-
The Drawing Task
(Bauminger 2007)
No No No Significant increases in the
understanding of the concepts
of collaboration and social
conversation and in the
participants' ability to generate
active problem solving
solutions; significant increases
in social engagement
following intervention for the
entire sample, with greater
improvements in collaborative
social engagement skills for
participants who received the
“Join-In” intervention first
Beaumont and
Sofronoff
(2008)
49 49×AS 44:5
7.5–11 years
CBI: Junior Detective Training
Program
University clinic
2×7 h sessions RCT
Standardized measures- SSQ
(Spence 1995a), ERSSQ,
Assessment of perception
of emotion from facial
expression (Spence
1995b), Assessment of
perception of emotion
from posture cues (Spence
1995b),
Knowledge of emotion
management strategies:
James and the Maths Test
(Attwood 2004a) & Dylan
is being teased (Attwood
2004b)
G—No
M—Yes
Yes—direct
observation
and checklist
No Participants in the experimental
group showed greater
improvements in social skills
(i.e., social functioning) over
the course of the intervention
Bernard-Opitz
et al. (2001)
16 8×HFA 6:2
5.8–8.5 years
CBI 10 sessions GCD
Frequency of novel ideas
during probe and training
sessions
No No No Steady increase in the TSBs (i.e.,
social problem solving) across
sessions; however, participants
with HFA generated fewer
alternative solutions than their
neurologically typical peers
Cheng et al.
(2010)
3 3×HFA 3:0
8, 9, and 10 years
Collaborative virtual
learning environment
(CVLE)
3D empathy system
School
1×40 min×22 days
SSRD-MB, across
participants
Standardized measures
ERS (Lin 2008)
G—No
M—Yes
No No Increases in the TSB (i.e.,
empathy) in 100 % of
intervention sessions;
maintenance of gains at follow
up
Cheng and Ye
(2010)
3 3×HFA 2:1
1×7 years;
2×8 years
Collaborative virtual
learning environment
(CVLE) 3D empathy
system
School
1×40 min
session for
5 days
SSRD-MP, across
participants
Direct observation (i.e.,
frequency occurrence) of
the TSB during baseline,
intervention and
G—No
M—Yes
No Yes—open-ended
questionnaire
Increases in all TSBs (i.e.,
appropriate answers,
understanding the expressive
feelings of others, recognizing
non-verbal behaviors-facial
expression and body language,
eye contact, appropriate
maintenance phases;
standardized measures
SSP and the BC (Jeanie et al.
2007)
manner & listening to others
during intervention) across all
participants; Maintenance of
gains at follow up
Gordon et al
(2014)
34 17×HFA/
ASD
HFA group:
M=10.89;
TD control group;
M=10.76
CBI- Facemaze 2 games (i.e.,
Happy and
Angry
maze) of
approx.
4 min in
duration
GCD
Direct observations of TSBs
pre- and post-intervention
A rating scale was used to
measure the presence and
absence of the
demonstration of the TSB
No No No Increases in the target social
behaviors (i.e., facial
expressions) from pretest to
posttest for participants in
the HFA/ASD group, with the
control “surprise” expression
showing no changes in quality
ratings; the quality of
the HFA/ASD group’s
“happy” expression was not
comparable to that of the TD control
group at pretest but was
comparable at posttest
Hopkins et al.
(2011)
49 24×HFA 21:3
M=10 years
CBI-FaceSay
School
10–25 min×12
sessions,
across
6 weeks
RCT
Direct observations of social
skills at baseline and post-
intervention
Standardized measures
SSRS (Gresham and Elliott
1990), & the Benton
Facial Recognition test
(Benton 1980); Other
measures- Emotion
Recognition test, with
photos selected from
Ekman and Friesen (1975)
No No No Significant improvement in the
TSBs-emotional and facial
recognition; while
improvements in social
interactions skills (i.e.,
measured by the SSRS) were
not significant there were
significant improvements in
the observational measure of
social skills
Ke and Im
(2013)
4 4×AS 2:2
3×10 years;
1×9 years
Virtual reality (VR) social
skills program
Home/school/parent’s office
60 min×2–3
sessions/
week for 6–
9 sessions
SSRD-MB, across
participants
Direct observation (i.e.,
frequency count) of the
TSB during baseline and
intervention phases;
Standardized
measures: SSQ (Spence
1995a, 1995b)
No No Yes—satisfaction
interview
P1 & P2: increases in 66.66 % of
TSB (i.e., recognizing body
gestures and facial
expressions, responding to and
maintaining interactions &
leading or initiating
interaction) by Task 3; P3 &
P4: increases in 50 % of TSB
LaCava et al.
(2007)
8 8×AS 6:2
8–11 years
CBI: Mindreading
program
Home (five participants)
& school (three
participants)
10 weeks Pretest, Posttest, with no
group comparison
Standardized measures: CAM-C
(Golan and Baron-Cohen 2006a, 2006b),
C-FAT (Golan et al. 2008),
RMF-C (Golan, Baron-Cohen,
and Golan 2008)
No Yes—software
use was
monitored on
data from the
computer
software
Yes—checklist Significant improvement from
pre- and posttest on all three
measures of the TSBs (i.e.,
emotional recognition of self
and others)
Mitchell et al.
(2007)
7 3×HFA 0:3
14.4–15.9 years
(M=14.9 years)
CBI, with virtual
environmental training
40-min
sessions
across
6 weeks
GCD, with crossover
Frequency of correct
responding to social
scenarios at three time
points
No No No Small to modest improvements in
judgments and explanations
about where to sit, both in a cafe
and a bus, across all three
participants; Modest
improvements in social
reasoning for two participants,
with a decrement in social
reasoning for the remaining
participant
Stichter et al.
(2014)
11 11× HFA 11:0
11–14 years
(12.6 years)
iSocial
School
31–45 min×2–3 days/week×10 weeks
Pretest, Posttest, with no
group comparison
Standardized measures: SRS
(Constantino and Gruber
2005), BRIEF (Gioia et al.
2000), Faux Pas Stories
(Baron-Cohen et al. 1999)
& Strange Stories (White
et al. 2009), D-KEFS
(Delis et al. 2001), & CPT-
II (Conners and Staff
2000); Other performance
measures include:
Reading the Mind in Eyes
test (Baron-Cohen et al.
2001), DANVA-2-CF
(Nowicki and Carton
1993)
No Yes—checklist Yes—IRP-15
(Martens et al
1985) and a
satisfaction
survey
(Wheeler et al
2002)
Significant pre- to posttreatment
gains were found on the
measure of social skills as rated
by parents only; participants’
recognition of others’
perspectives in social
situations (e.g., Faux Pas
Stories, Strange Stories Mental
States) was variable and
results did not indicate
improvement from pre- to
post-intervention; no statistically
significant improvements on
measures of executive
functioning
Tartaro et al
(2014)
7 6×HFA 5:2
8.11–12.0 years
CBI—authorable virtual
peer intervention
2 h×1 day/week×11 weeks
Within subjects, counter-
balanced design
Standardized measures: SRS
(Constantino and Gruber
2005); Direct observation
of TSB during role-play
No No No Significant increases in the
appropriate use of reciprocity
components directly following
use
of the authorable virtual
peer intervention and over the
course of the entire
intervention; significant
difference for the social
communication subscale of the
SRS
Note. P participants, M mean, TD typically developing, Incl. included, AS asperger syndrome, HFA high-functioning autism, VM video modeling, CBI computer-based instruction, VR virtual reality, SSRD
single-subject research design, MB multiple baseline, MP multiple probe, RCT randomized control trial, GCD group comparison design, SRS social responsiveness scale, D-KEFS delis–kaplan executive
functioning system, CPT-II conners’ continuous performance test-II, DANVA-2-CF diagnostic analysis of non-verbal accuracy-2, child facial expressions, SSQ social skills questionnaire, SSP social
situation pictures, BC behavior checklist, ERS empathy rating scale, SSRS social skills rating scale, ERSSQ emotional regulation and social skills questionnaire, CAM-C cambridge mindreading face–voice
battery for children, C-FAT child feature-based auditory task, RMF-C reading the mind in films test–children’s version, G generalization, M maintenance, TSB target social behavior.
were single-subject designs. Nine studies were categorized as
using role play, either alone or within a multi-component treat-
ment package. Of these studies, eight utilized SSRDs and one
utilized a group comparison design. The remaining 12 studies
used CBI as the primary intervention. Of these 12 studies,
three were SSRDs and nine were group comparison designs.
Table 1 provides a summary of the participant, intervention,
and methodological characteristics, as well as the intervention
outcomes for studies that employed a VM intervention.
Table 2 summarizes the participant, intervention, and method-
ological characteristics, as well as the intervention outcomes
for studies that employed a role play intervention. Table 3
summarizes participant, intervention, and methodological
characteristics, as well as the intervention outcomes for stud-
ies that employed a CBI intervention.
Participants
There were a total of 330 participants across the 29 studies. Of
these, results for 235 participants were included in this review,
having met the inclusion criteria (i.e., diagnosis of HFA with
an IQ>85 or AS). The mean age of the participants was 9 years
(range 3.11–17 years). Studies (n=8) that included the age
range of the participants were not included in the calculation
Table 4 Summary of treatment efficacy and strength ratings for included studies
Reference | NAP and confidence intervals (range) | Strength rating
Video modeling
Allen et al. (2010) | Intervention: 0.84, 90 % CI=0.59–1; maintenance: 0.80, 90 % CI=0.43–1 | Weak
Apple et al. (2005) | Intervention: 0.75, 90 % CI=0.52–0.99; generalization: 0.93, 90 % CI=0.45–1 | Weak
Gena et al. (2005) | Intervention: 0.69, 90 % CI=0.37–1; maintenance: 0.88, 90 % CI=0.38–1 | Weak
Kagohara et al. (2013) | *Unable to calculate NAP as the video modeling phase was not adjacent to the baseline phase | Weak
Nikopoulos and Keenan (2003) | Intervention: 0.76, 90 % CI=0–1; maintenance: 1, 90 % CI=0.31–1 | Weak
Radley et al. (2014) | Intervention: 0.99, 90 % CI=0.75–1; generalization: 0.96, 90 % CI=0.73–1; maintenance: 0.96, 90 % CI=0.73–1 | Weak
Sansosti and Powell-Smith (2008) | Intervention: 0.91, 90 % CI=0.36–1; generalization: 0.60, 90 % CI=0.47–0.87; maintenance: 1, 90 % CI=0.39–1 | Weak
Scattone (2008) | Intervention: 0.91, 90 % CI=0.62–1; maintenance: 1, 90 % CI=0.35–1 | Adequate
Role Play
Dotson et al. (2010) | Intervention: 0.93, 90 % CI=0.54–1; generalization: 0.51, 90 % CI=0.21–1; maintenance: 0.90, 90 % CI=0.52–1 | Weak
Ferguson et al. (2013) | *Unable to calculate NAP as only the mean for each phase (baseline, intervention & generalization) was provided | Weak
Kassardjian et al. (2014) | Intervention: 0.78, 90 % CI=0.36–1; maintenance: 0.93, 90 % CI=0.43–1 | Adequate
Leaf et al. (2012a, b) | Intervention: 0.81, 90 % CI=0.59–1; maintenance: 0.95, 90 % CI=0.67–1 | Adequate
Leaf et al. (2012a, b) | Intervention: 0.86, 90 % CI=0.65–1; generalization: 0.91, 90 % CI=0.61–1; maintenance: 0.98, 90 % CI=0.75–1 | Weak
Leaf et al. (2010) | Intervention: 0.92, 90 % CI=0.71–1; generalization: 0.73, 90 % CI=0.5–0.96; maintenance: 0.99, 90 % CI=0.77–1 | Adequate
Leaf et al. (2009) | Intervention: 0.97, 90 % CI=0.71–1; maintenance: 0.94, 90 % CI=0.65–1 | Adequate
Palmen et al. (2008) | Intervention: 1, 90 % CI=0.40–1; maintenance: 1, 90 % CI=0.13–1 | Weak
Tse et al. (2007) | N/A | Weak
Computer-Based Instruction
Bauminger-Zviely et al. (2013) | N/A | Adequate
Beaumont and Sofronoff (2008) | N/A | Strong
Bernard-Opitz et al. (2001) | N/A | Weak
Cheng et al. (2010) | Intervention: 1, 90 % CI=0.63–1; generalization: 1, 90 % CI=0.59–1 | Weak
Cheng and Ye (2010) | Intervention: 1, 90 % CI=0.60–1; generalization: 1, 90 % CI=0.48–1 | Weak
Gordon et al. (2014) | N/A | Weak
Hopkins et al. (2011) | N/A | Adequate
Ke and Im (2013) | *Unable to calculate NAP as data points were not clear on the graphs provided | Weak
LaCava et al. (2007) | N/A | Weak
Mitchell et al. (2007) | N/A | Weak
Stichter et al. (2014) | N/A | Weak
Tartaro et al. (2014) | N/A | Weak
Note. NAP nonoverlap of all pairs, CI confidence intervals, N/A not applicable.
of the mean age across studies. Twenty-three studies
(79.31 %) included children <13 years as the participant
sample, and five studies (17.24 %) included adolescents between
13 and 17 years. One study (3.45 %) included a mixed participant
sample of 11–14-year-old children and adolescents
(Stichter et al. 2014). Some 39.7 % of participants were
diagnosed with AS and 59.8 % were diagnosed with HFA. One
study (Tse et al. 2007) did not differentiate the diagnoses of
participants. Co-occurring diagnoses included ADD
(Dotson et al. 2010), ADHD (Kagohara et al. 2013; Ke and
Im 2013; Nikopoulos and Keenan 2003), dyslexia (Dotson
et al. 2010), and PDD-NOS.
Experimental Design
Tables 1, 2, and 3 provide a summary of the type of experimental
design employed by the studies reviewed. Nineteen
studies utilized SSRDs. Of these, 10 studies (52.63 %)
employed a multiple-baseline design across participants and
7 (36.84 %) employed a multiple-baseline design across
behaviors. The remaining studies utilized a multi-element
treatment design (n=1; 5.26 %) and an alternating treatments
design (n=1; 5.26 %).
Of the ten studies that employed group comparison designs,
two studies (20 %) used RCTs and four studies
(40 %) used group comparison designs without random
allocation to the experimental and control groups. The remaining
four studies (40 %) used a single-group design and examined
treatment effectiveness at pretest and posttest (LaCava et al.
2007; Stichter et al. 2014; Tartaro et al. 2014; Tse et al. 2007).
Dependent Variables
The 29 included studies examined a variety of dependent var-
iables. For the purpose of this review, dependent variables
were organized into the following categories: verbal and
non-verbal social communication, recognizing and expressing
emotions and showing empathy, social problem solving, and
social skills for play. Examples of verbal and non-verbal social
communication behaviors include greetings, initiati ng and
maintaining conversations, conversational turn taking, initiat-
ing and responding to compliments, identifying facial expres-
sions, eye contact, and smiling. Examples of behaviors includ-
ed in the category recognizing and expressing emotions and
showing empathy are identifying and expressing the feelings
of the self and others, perspective taking and providing empa-
thetic statements. Examples of social problem solving behav-
iors include negotiation and identifying and responding to
teasing and bullying. Lastly, time spent in appropriate play
and allowing a friend to choose a game are examples of be-
haviors included in the category social skills for play.
The majority of the studies (n=20; 68.97 %) targeted verbal
and non-verbal social communication either alone (n=10;
50 %) or in conjunction with recognizing and expressing
emotions and showing empathy (n=3; 15 %), social skills
for play (n=4; 20 %), or social problem solving (n=2; 10 %).
One study (n=1; 5 %) targeted verbal and non-verbal social
communication together with both recognizing and expressing
emotions and showing empathy and social problem solving.
Of the remaining studies (n=9; 31.03 %), three
targeted recognizing and expressing emotions and showing
empathy alone (n=3; 33.33 %), while one study targeted
recognizing and expressing emotions and showing empathy
together with social skills for play (n=1; 11.11 %). In addition,
social problem solving was targeted alone in two studies
(n=2; 22.22 %). Overall social functioning (n=2; 22.22 %)
and community social responses (i.e., responses required in a
café or on a bus) (n=1; 11.11 %) were social behaviors that
were targeted alone and did not fall into the four categories of
dependent variables described.
Measures
The measures used to assess intervention outcomes differed
according to study design (see Tables 1, 2, and 3). Of the 19
SSRDs, 14 studies (73.68 %) used direct observation with
event recording alone to measure the dependent variables during
baseline, intervention, and/or maintenance and generalization
phases, and one study (5.26 %) used standardized assessments
to measure the dependent variables and/or overall social
functioning. The remaining four studies used a combination of
direct observation and standardized assessment (n=3;
15.79 %) or direct observation with non-standardized measures
or questionnaires (n=1; 5.26 %) to measure the dependent
variables and overall social functioning. Of the studies
that employed group comparison designs, standardized measures
of social skills were used alone in four (40 %) of the ten
studies. In addition, three of the ten studies (30 %) used
direct observation (e.g., frequency counts) alone pre- and
post-intervention, and a further two studies (20 %) used both
standardized assessments and direct observation. The remaining
study (n=1; 10 %) combined non-standardized measures
with direct observation to measure the dependent variables.
Generalization and Maintenance
Of the 17 studies that employed SSRDs, 10 (58.82 %) included
measures of generalized outcomes, with five of these 10
studies (50 %) employing a VM intervention and the remaining
five (50 %) employing a role-play intervention. None of
the seven group comparison design studies programed for or
included a measure of generalized social outcomes. Likewise,
none of the studies employing a CBI intervention programed
for or measured generalized outcomes. Of the five VM studies
that programed for and measured generalized outcomes, all five
(100 %) employed a combination of technologies for programing
generalization (Stokes and Baer 1977). Of these studies, all
five (100 %) included “Training Sufficient Exemplars” and
“Programming for Common Stimuli”. In addition, four
(80 %) of the five studies included the “Introduction of
Natural Maintaining Contingencies”, one (20 %) of the five
studies included “Training Loosely”, and one (20 %) other
study included “Sequential Modification” to program for
generalization. Of the five studies that employed a role play
intervention, all five (100 %) included “Training
Loosely” to program for generalization. In addition,
one (20 %) of the five studies combined “Training Loosely”
with the “Introduction of Natural Maintaining Contingencies”,
and one (20 %) other study combined “Training Loosely” with
“Sequential Modification”. Tables 1, 2, and 3 describe the
combination of technologies employed in each study.
Similar overall results were found for maintenance:
15 of the 19 SSRD studies (78.95 %) and one of the 10
group comparison studies (10 %) measured the maintenance of
gains across time. Across these studies, maintenance assessments
ranged from 1 week to 5 months post-intervention.
Treatment Fidelity
Of the 29 studies reviewed, 14 studies (48.28 %) included a
measure of treatment fidelity. Of these 14, studies evaluating VM
(n=6; 42.86 %) accounted for the largest share, followed by
role play (n=5; 35.71 %) and CBI (n=3; 21.43 %).
Social Validity
With regard to the evaluation of social validity, 10 of the
19 SSRD studies (52.63 %) included a method to measure
intervention satisfaction. Three of the 10 SSRDs (30 %)
employed a standardized measure of social validity (i.e.,
Intervention Rating Profile-15), with the remaining studies
employing either parent- or participant-completed surveys
(n=4; 40 %), satisfaction interviews (n=1; 10 %), or
open-ended questionnaires (n=1; 10 %). One study
(n=1; 10 %) measured social validity by having blind
observers score the presence or absence of the target behavior.
Of the 10 studies that employed a group comparison
design, three studies (30 %) included a measure of
social validity. One study (n=1; 33.33 %) used a survey
completed by adolescents and parents to measure social validity
(Tse et al. 2007). Another study (n=1; 33.33 %) used a
checklist adapted from a standardized measure of social
validity (LaCava et al. 2007), and the remaining study
(n=1; 33.33 %) used a standardized measure of social
validity (i.e., Intervention Rating Profile-15; Stichter
et al. 2014).
Treatment Efficacy Calculations
It was possible to calculate treatment efficacy for 16 studies
employing SSRDs using the NAP statistic. Seven of these
studies used a VM intervention to increase social skills,
producing a median NAP effect size of 0.84, with a range of
0.69–0.99. According to Parker and Vannest (2009), a NAP
effect size between 0.66 and 0.92 reflects a treatment of
medium effect. Seven of the 16 SSRD studies used a role play
intervention to increase social skills, producing a median NAP
effect size of 0.92, with a range of 0.78–1.0, which also
reflects a treatment of medium effect (Parker and Vannest
2009). Two of the SSRD studies used a CBI intervention to
increase social skills, producing a median NAP effect size of
1.0. A NAP effect size between 0.93 and 1 reflects a treatment
of large effect. Table 4 provides a summary of the NAP effect
size for each individual study.
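As an illustration of the calculation described above, the following minimal Python sketch computes NAP for one pair of phases and reproduces the median intervention-phase values reported for each intervention type. It assumes that higher scores indicate improvement and that ties count as half an overlap. The function names, the "weak" label for values below 0.66, and the example data series are illustrative assumptions rather than material from the reviewed studies; the NAP values are transcribed from Table 4.

```python
from statistics import median


def nap(baseline, treatment):
    """Nonoverlap of all pairs: share of (baseline, treatment) pairs in which
    the treatment point exceeds the baseline point; ties count as half."""
    pairs = [(b, t) for b in baseline for t in treatment]
    score = sum(1.0 if t > b else 0.5 if t == b else 0.0 for b, t in pairs)
    return score / len(pairs)


def effect_label(value):
    """Bands cited in the text from Parker and Vannest (2009).
    The "weak" label below 0.66 is an assumption made for this sketch."""
    if value >= 0.93:
        return "large"
    if value >= 0.66:
        return "medium"
    return "weak"


# Intervention-phase NAP values transcribed from Table 4.
TABLE4_NAP = {
    "VM": [0.84, 0.75, 0.69, 0.76, 0.99, 0.91, 0.91],
    "Role play": [0.93, 0.78, 0.81, 0.86, 0.92, 0.97, 1.0],
    "CBI": [1.0, 1.0],
}

# Median NAP and its effect band for each intervention type.
summary = {
    name: (median(values), effect_label(median(values)))
    for name, values in TABLE4_NAP.items()
}

# Hypothetical single-case series, not taken from any reviewed study.
example_nap = nap(baseline=[2, 3, 2, 4], treatment=[4, 5, 6, 5])

if __name__ == "__main__":
    for name, (med, label) in summary.items():
        print(f"{name}: median NAP = {med:.2f} ({label} effect)")
    print(f"Example NAP = {example_nap:.3f}")
```

The medians recovered this way (0.84 for VM, 0.92 for role play, 1.0 for CBI) match the values reported in the text.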
Research Strength and Evidence-Based Practice
Evaluation
The research strength of the included studies across intervention
type was calculated in accordance with Reichow’s (2011)
criteria. For studies that used a VM intervention, one (12.5 %)
received an “adequate” rating and seven (87.5 %) received a
“weak” rating. None of the included studies that used a VM
intervention were rated as strong. An evidence-based status Z
score of 2 was calculated [(0*30)+(0*15)+(0*4)+(1*2)=2],
indicating that VM interventions could not be categorized as
evidence-based practice (Reichow 2011).
For the studies that used a role play intervention, four
(44.44 %) were rated as “adequate” and five (55.56 %) were
rated as “weak”. An evidence-based status Z score of 20 was
calculated [(0*30)+(0*15)+(0*4)+(10*2)=20], indicating
that role play interventions could not be categorized as
evidence-based practice (Reichow 2011).
For studies that used a CBI inte rvention, one (8.33 %)
study was rated as “strong” and two (16.66 %) studies were
rated as “adequate”. The remaining nine studies (75 %) were
rated as “weak”. An evidence-based status Z score of 45 was
calculated [(1*30)+(2*15)+(0*4)+(0*2)=60], indicating
that CBI interventions could be categorized as an established
evidence-based practice (Reichow 2011). Table 4 provides a
summary of the strength ratings for each study included in this
review.
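The bracketed calculations above can be reproduced with a short Python sketch. The weights (30, 15, 4, 2) are taken directly from the brackets. Everything else is an assumption made for illustration: the function names are hypothetical, the four counts are read in the order in which they appear in the brackets, and the cut-offs (60 or more for an established EBP, 31 to 59 for a promising EBP) are assumed from Reichow's (2011) criteria rather than stated in this section.

```python
# Weights taken from the bracketed calculations in the text.
WEIGHTS = (30, 15, 4, 2)


def ebp_z_score(counts):
    """Weighted sum of the four counts, in the order used in the brackets."""
    if len(counts) != len(WEIGHTS):
        raise ValueError("expected four counts")
    return sum(c * w for c, w in zip(counts, WEIGHTS))


def ebp_status(z):
    """Assumed cut-offs: 60+ established, 31-59 promising, otherwise not EBP."""
    if z >= 60:
        return "established"
    if z > 30:
        return "promising"
    return "not EBP"


# Counts as they appear in the bracketed terms for each intervention type.
reported = {
    "VM": (0, 0, 0, 1),
    "role play": (0, 0, 0, 10),
    "CBI": (1, 2, 0, 0),
}
for name, counts in reported.items():
    z = ebp_z_score(counts)
    print(name, z, ebp_status(z))
```

Under the assumed cut-offs, only the CBI terms sum to a score that reaches the established threshold, which matches the classification reported above.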
Discussion
The present review aimed to assess the extant literature on the
utility of VM, role play, and CBI as interventions for increas-
ing social skills in children and adolescents with a diagnosis of
HFA. For all studies included in this review (i.e., both SSRDs
and group comparison designs), an evaluation of the empirical
support was undertaken by applying Reichow’s (2011) methodology
for determining EBP. For studies that included a
SSRD only, treatment efficacy was assessed using the NAP
statistic (Parker and Vannest 2009).
Reichow’s (2011) criteria describe two levels of EBP:
an “established” EBP and a “promising” EBP.
The findings of the current review reveal that only CBI had the
accumulated evidence necessary to be classified as an
established EBP. Furthermore, according to Reichow’s
(2011) criteria and the inclusion criteria set forth in this current
review, VM and role play could not be classified as EBP. For
the VM intervention, this finding contrasts with other reviews
evaluating the evidence base of the intervention for
children and adolescents with ASD, in which VM has been
shown to produce positive outcomes for increasing social behavior
(Shulta-Mehta et al. 2010) and has been classified as a
promising EBP (Reichow and Volkmar 2010). This review
marks the first evaluation of the evidence base for role play
intervention for social skills.
While the findings of the review support the positive outcomes
for CBI interventions in the area of social skills training
and HFA (i.e., Beaumont and Sofronoff 2008; Bernard-Optiz
et al. 2001; Hopkins et al. 2011), they also reflect the current status of
the methodological rigor of the three interventions evaluated.
For example, only one CBI study received a strong strength
rating (Beaumont and Sofronoff 2008), with the variables responsible
for this including a large sample size and the use of a
randomized controlled trial (RCT) to evaluate intervention
outcomes.
Of the remaining CBI studies, two studies received an adequate
strength rating (Bauminger-Zviely et al. 2013; Hopkins
et al. 2011), and nine studies received a weak strength rating
(Bernard-Optiz et al. 2001; Cheng et al. 2010; Cheng and Ye
2010; Gordon et al. 2014; Ke and Im 2013; LaCava et al.
2007; Mitchell et al. 2007; Stichter et al. 2014; Tartaro et al.
2014). Six of the nine studies that received a weak strength
rating were group comparison in design. An analysis of
Reichow’s (2011) primary and secondary quality indicators showed that
failure to provide a comparison control group (Tartaro
et al. 2014) and operationalized information regarding the
participants’ characteristics affected the outcome (LaCava
et al. 2007; Mitchell et al. 2007; Stichter et al. 2014). Other
variables that contributed were an insufficient sample size
(Mitchell et al. 2007) and an insufficiently operationalized definition of
the dependent variable (Bernard-Optiz et al. 2001; Mitchell
et al. 2007). Consistent across all three SSRDs that received a
weak strength rating was the failure to describe the dependent
variable with operational precision (Cheng et al. 2010; Cheng
and Ye 2010; Ke and Im 2013). Further to this, the strength
ratings for these studies were negatively impacted by the de-
scription of the independent variable and baseline conditions
(Cheng et al. 2010; Cheng and Ye 2010).
For studies that employed a VM intervention, only one
study received an adequate rating (Scattone 2008) and seven
studies received weak ratings (Allen et al. 2010; Apple et al.
2005; Nikopoulos and Keenan 2003; Radley et al. 2014;
Sansosti and Powell-Smith 2008). Insufficient participant
and methodological characteristics were responsible for this
finding. For example, four of the seven studies failed to pro-
vide an operationalized definition of the diagnosis (i.e., in-
cluding the specific diagnosis and the diagnostic instrument)
for all participants or to provide adequate information on the
characteristics of the interventionists (i.e., Allen et al. 2010;
Apple et al. 2005; Nikopoulos and Keenan 2003). Further to
this, insufficient demonstration of experimental control was
consistent among these studies, where experimental effect var-
ied with the manipulation of the independent variable (i.e.,
Scattone 2008) or there was an insufficient number of demonstrations
(i.e., Apple et al. 2005; Nikopoulos and Keenan
2003).
The studies included in this review that employed
role play interventions received slightly more positive
strength ratings. For example, five of the nine studies
received a weak strength rating, with the remaining four
studies receiving adequate strength ratings. The variables
responsible for an adequate strength rating include
operationally defined participant characteristics (i.e., age,
gender, diagnosis, etc.) and independent and dependent
variables, as well as adherence to the characteristics of
the baseline condition and the criteria for demonstrating
experimental control. Expectedly, insufficient information
on the characteristics of the interventionist and the
description of baseline conditions (i.e., Ferguson et al.
2013; Palmen et al. 2008) were responsible for the receipt
of weak strength ratings.
In support of the established evidence base for CBI inter-
ventions is the finding that CBI interventions produce large
effects