Reviewing the research on instructional development programs for
academics. Trying to tell a different story: A meta-analysis
Marian D. Ilie a,b,∗, Laurențiu P. Maricuțoiu a,c, Daniel E. Iancu a, Izabella G. Smarandache a, Velibor Mladenovici a, Dalia C.M. Stoia a, Silvia A. Toth a
a Center of Academic Development at the West University of Timișoara, Romania
b Teacher Training Department at the West University of Timișoara, Romania
c Department of Psychology at the West University of Timișoara, Romania
ARTICLE INFO
Keywords:
Instructional development programs
Higher education
Meta-analysis
Effect size
ABSTRACT
This paper presents a meta-analysis that investigates the effectiveness of instructional development programs (IDPs) dedicated to academics. IDPs are instructional activities specifically planned to improve the quality of teaching in higher education by enhancing academics' instructional approach to support student learning. We analyzed 1555 unique results from online searches and from the reference lists of the previous nine reviews on this topic. The final sample contains 20 controlled studies, carried out between 1973 and 2019, in which 23 independent effect sizes were reported. The results indicated that IDPs have a small effect size (d = 0.385, SE = 0.055, Z = 6.985, p < .001, k = 23). We did not find evidence of publication bias. Additionally, we analyzed the influence of different trainees' characteristics and of IDP features and outcomes on the final effectiveness of such programs. IDPs with mandatory enrollment reported a stronger effect (d = 0.515), while IDPs that lasted less than 15 h reported the highest effect size (d = 0.571). The effect size of IDPs reported on students' outcomes is small (d = 0.396), but that on teachers' outcomes is even smaller (d = 0.315). For the first time, our findings summarize quantitative evidence regarding the overall effectiveness of IDPs.
1. Introduction
In the past 50 years, practitioners and researchers have invested substantial effort in the design of instructional development
programs (IDPs) aimed at increasing the quality of academics' instructional approach in higher education. After reviewing 38 meta-
analyses (published between 1980 and 2014), Schneider and Preckel (2017) concluded that academics with high-achieving students
invest effort and time in designing their instructional approach. Therefore, academics that are well prepared for their teaching role
have strong premises for obtaining high levels of student achievement. Unfortunately, there is scarce information regarding how to
improve the instructional approach of academics through IDPs (Stes, Min-Leliveld, Gijbels, & Van Petegem, 2010a). For example, too little is known about the features that make IDPs effective, or about the influence of trainees' characteristics on the final impact of such programs (Steinert et al., 2016; Stes et al., 2010a). Also, little is known about the overall effect of the IDPs because the previous
literature reviews had a qualitative approach. Thus, there is evidence regarding the impact of academics' instructional approach on
students' achievement, but we have little information about how academics' instructional approach could be improved through IDPs.
https://doi.org/10.1016/j.edurev.2020.100331
Received 25 May 2019; Received in revised form 25 March 2020; Accepted 30 March 2020
∗ Corresponding author. West University of Timișoara, Teacher Training Department, Center of Academic Development, No. 4 Vasile Pârvan Blvd., 300223 Timișoara, Romania.
E-mail address: marian.ilie@e-uvt.ro (M.D. Ilie).
Educational Research Review 30 (2020) 100331. Available online 10 April 2020. 1747-938X/© 2020 Elsevier Ltd. All rights reserved.
We have some general conclusions that IDPs work (e.g., Steinert et al., 2016) but we know very little regarding the ways we can
improve their effectiveness.
In this article, we aim to summarize the knowledge regarding the effect of IDPs, and to provide insights regarding how to improve
their effectiveness. We began by presenting the prior reviews on the effects of IDPs in higher education. Then, we advanced four main research questions, followed by a state-of-the-art description of: (1) reporting and interpreting the effect size of IDPs, (2) the trainees' characteristics that could influence the effects of IDPs, (3) the IDP features relevant for their effectiveness and (4) the
outcome levels of the IDPs. For each topic, we formulated subsequent research questions. Further, we presented the design, the
procedures and the results of our review. The effects of IDPs are presented and interpreted in terms of effect sizes for each of the
topics of interest. Also, we presented possible implications of this work for instructional development practice, and for future research
in the field. Additionally, we advanced a comprehensive framework for reporting the data on future research studies in the field.
2. Previous reviews of instructional development research
In the past 40 years, nine reviews summarized the research on the effect of IDPs in higher education (Amundsen & Wilson, 2012; De Rijdt, Stes, Van Der Vleuten, & Dochy, 2013; Levinson-Rose & Menges, 1981; McAlpine, 2003; Prebble et al., 2004; Steinert et al., 2016, 2006; Stes et al., 2010a; Weimer & Lenze, 1998).
Levinson-Rose and Menges (1981) conducted the first systematic literature review and selected 71 studies from the mid-sixties up to 1980. The authors intended to use the meta-analysis technique, but adopted a narrative review approach because of the poor quality of the quantitative data in the selected studies. The selected studies were clustered into five categories. Positive results were
reported for all types of IDPs (i.e., grants; workshops and seminars; feedback from student ratings; practice with feedback; and
concept-based training), but Levinson-Rose and Menges (1981) admitted that there were numerous poor-quality studies in the field.
Thus, they advanced different suggestions for future research as follows: to investigate individual participants’ characteristics; to
compare the impact of different types of IDPs; to use a mixed research approach or to conduct rigorous experimental investigations
(Levinson-Rose & Menges, 1981, p. 418–419).
In their review, Weimer and Lenze (1998) included 66 studies published in the eighties. The authors revised the five categories
which Levinson-Rose and Menges (1981) used to cluster IDPs. They kept three out of the five previous categories of IDPs (workshops,
seminars and programs; consultation; grants) and added two more (resource materials; and colleagues helping colleagues). The same five
levels of outcomes proposed by Levinson-Rose and Menges (1981) were used to report the impact of IDPs (i.e., self-reported change in
teacher attitude; the tested or the observed change in teacher knowledge; the observed change in teacher skill; the self-reported
change in student attitude; and the change observed in student learning). The reviewers described each type of IDP under the
following headings: history, prevalence, description, assessment and discussion. In addition, IDPs targeting new faculty and those targeting teaching assistants were given special attention. The authors acknowledged that their results were inconclusive and advanced the hope that future reviewers could "tell a different story" (Weimer & Lenze, 1998, p. 237). For future research studies, Weimer
and Lenze (1998) reiterated some suggestions of their predecessors such as: conducting studies with a mixed research approach and
experimental design, or considering the individual participants’ characteristics to investigate the impact of IDPs on specific target
groups (e.g., new faculty members, teaching assistants or specific faculties). Furthermore, they advanced the idea that IDP developers should
take into account evidence from other relevant disciplines (e.g., adult learning, studies on human motivation) in order to have a good
theoretical foundation for their studies (Weimer & Lenze, 1998).
McAlpine (2003) included ten studies published between 1983 and 2002 in her review. These studies were analyzed on the basis
of their characteristics (i.e., aim, modality of participants' enrollment – voluntary or not, number of participants, duration, and type
of activities). Also, McAlpine (2003) took into account the methodological characteristics of the studies (i.e., the design, the focus of
evaluation, and the instruments used) and used six levels of outcomes to investigate the impact of IDPs. Five levels of outcomes were
proposed by Levinson-Rose and Menges (1981), and the sixth concerned the IDPs effects on the culture of the organization. Based on
this analysis, McAlpine (2003, p. 65) suggested that successful workshops have: (1) direct and explicit focus on students and learning,
(2) activities aimed at the application of what is addressed in the participants' practice and feedback on this, (3) voluntary parti-
cipation, (4) at least 12 h of meetings spread over a minimum of one semester, and (5) opportunities for participants to try to
incorporate their learning in their practice. Like her predecessors, McAlpine (2003) recommended that researchers investigate the
impact of IDPs beyond the level of the individual participants at the level of students’ learning and/or the institution.
Prebble and his collaborators reviewed 33 studies published between 1990 and 2004 (Prebble et al., 2004). In order to cluster the
IDPs, Prebble et al. (2004) adapted the categories proposed in the first two reviews (i.e., Levinson-Rose & Menges, 1981; Weimer & Lenze, 1998) as follows: short training courses; in situ training; consulting, peer assessment and mentoring; student assessment of teaching; and intensive staff development. The effects of the IDPs reported in the studies selected were presented through the six categories of outcomes introduced by McAlpine (2003). Prebble et al. (2004) concluded that short training courses have only a limited impact on
teaching practices in higher education. Other types of IDPs were viewed as being more reliable in producing significant improve-
ments in the quality of academics' teaching and, consequently, in students’ learning. However, Prebble et al. (2004) did not formulate
suggestions for future developments in the field.
Steinert and her collaborators (2006) conducted a discipline-specific review on medical studies. Fifty-three studies (between 1980
and 2002) were included. In order to cluster the IDPs presented in these studies, they used the following taxonomy: workshops; short courses; seminar series; longitudinal programs and fellowships; and others (e.g., grants, student feedback, or consultation). The authors
used an adapted version of Kirkpatrick’s (1994) model of educational outcomes to cluster the IDPs outcomes including the following
levels: Reaction (satisfaction); Learning (a. attitudes and b. knowledge and skills); Behaviour (a. self-reported changes and b. observed
changes) and Results (a. organizational practices and b. change in students, residents or colleagues). In addition to the models of
educational outcomes used by the previous reviewers, the model proposed by Steinert et al. (2006) added the following levels: reaction, behavior, residents and colleagues. The main conclusion of this work was that medical IDPs are able to produce high levels of
satisfaction and to produce positive changes in medical teachers' attitudes, knowledge, skills and behavior. However, Steinert et al.
(2006) admitted that there were several weaknesses in the studies in the field, and advanced several recommendations. It was noted that it is important to use valid and reliable research instruments and to report data on the characteristics of the tools used. More studies are needed that measure the impact of IDPs at the level of students' learning and/or at the institutional level. In particular, studies should investigate the impact of IDPs in the long term, not just right after the end of the IDP. Studies aimed at analyzing the interaction between different factors and/or comparing different IDPs should be given more attention (Steinert et al., 2006).
In 2010, Stes and her collaborators (2010a) published a new systematic review of the literature (not limited in time or in the
source of publications). The authors clustered the 36 selected studies by their research design (quantitative, qualitative and mixed-
method), and categorized the IDPs on four dimensions: intervention duration (one-time event or extended over time), nature of in-
tervention (collective and course-like, alternative or hybrid), target group intervention (teaching assistants, new faculty, another/no
specific target group, and discipline-specific group) and outcome measure. In order to evaluate outcomes of IDPs, Stes et al. (2010a)
adapted the model used by Steinert et al. (2006), as follows: change within teachers (learning, subdivided into - change in attitudes,
change in conceptions, change in knowledge and change in skills; behavior), institutional impact and change within students (subdivided
into - change in perceptions, change in study approaches and change in learning outcomes). Stes et al. (2010a) advanced several
interpretations for each type of IDPs according to different categories of program outcomes and research design. However, their
dataset was very diverse and not suitable for making meaningful comparisons about the effectiveness of different types of IDPs. For example, the authors identified the most frequently used types of IDPs, but they did not draw conclusions regarding the effectiveness of the different types. In addition to the proposals advanced for future research by the previous reviewers, Stes et al. (2010a) added some new areas, such as: paying attention to effect sizes in order to detect effects that are relevant from a practical point of view, describing the evaluated IDPs in more detail, or involving more participants.
Amundsen and Wilson (2012) published a conceptual review in which they included 137 studies (between 1995 and 2008). They
argued that the focus on ways of evaluating the impact of IDPs did not result in a significant increase in knowledge in the field.
Because the previous reviews focused on the IDPs impact, they excluded many papers that presented IDPs focusing on the instruc-
tional process rather than on specific outcomes (Amundsen & Wilson, 2012). In fact, the two authors argued that the previous reviews
did not ask the right questions. Consequently, they took into consideration the following questions (Amundsen & Wilson, 2012, p.
91): how are IDPs designed? and what is the rationale underpinning the design of IDPs? The authors clustered the IDPs by their instructional
focus and used the following categories: skill focus, method focus, reflection focus, disciplinary focus, institutional focus and action
research focus. For each type, Amundsen and Wilson (2012) described the core characteristics of an IDP (goal, activities planned, and
evidence collected to present the impact) and identified the network of researchers and the most cited person. The authors suggested that
their framework for presenting details of an IDP is a useful approach because it provides enough information for other experts to
replicate the described IDP. In future research, this framework for presenting details of IDPs should be taken into account (Amundsen
& Wilson, 2012). This suggestion was also made by Stes et al. (2010a) two years earlier.
De Rijdt and her collaborators (2013) conducted another review concerning the effects of IDPs on the transfer of teachers' learning
to the academic workplace and selected 47 studies. The authors developed a conceptual framework including possible moderators of
the process of transferring the learning contents to the higher education teaching activities (De Rijdt et al., 2013). In the development
of this framework, the authors combined information from the fields of management, human resources development and organi-
zational psychology. In a second step, the authors adapted the framework to the specific topic of IDPs for academics and introduced
four new criteria (amount of experience, nature of the intervention, amount of time spent, and learning climate). The review
highlighted that 80% of the studies showed positive results, which suggested that learning from IDPs actually transfers to the
academics’ job performance. Because they observed that this is in sharp contrast with what has been found in other fields (e.g., only
10% of the results are positive in management or organizational psychology), De Rijdt et al. (2013) invited researchers in the field of
IDPs to report studies with negative results. Based on their analysis, De Rijdt et al. (2013) highlighted variables (e.g., motivation to
learn, motivation to transfer or active learning) and moderators (e.g., lag versus no time lag or self-measure versus other measures of
transfer) that require further analysis in future studies.
Ten years after the first discipline-specific review on medical studies, Steinert and her collaborators (2016) published an updated
review (i.e., studies between 2002 and 2012), and analyzed 113 studies. In this new work, the research team used the same approach
to cluster the IDPs and to present the educational outcomes as that used in the previous synthesis (Steinert et al., 2006). Steinert et al.
(2016) highlighted some changes in practice that had taken place during the years between the two reviews. They found that more
recent IDPs took into consideration the relevance of the instructional content to the participants' daily work or addressed the par-
ticipants' professional needs. In addition, the program duration was a key feature that was positively associated with IDPs effec-
tiveness. Steinert et al. (2016) advanced several suggestions for future research: to use a theoretical or conceptual framework for the design and implementation of research, to evaluate the impact of IDPs on teachers' behavior, students' learning and/or at the institutional level, and to analyze the key features of IDPs.
Generally, the authors mentioned above conducted a systematic investigation of the literature aiming to carry out discipline-
general reviews (limited in time, but not in the source of publications). However, there are several exceptions to this typical research approach: some were not based on systematic searches of the literature (e.g., Weimer & Lenze, 1998), some were focused only on medical studies (i.e., Steinert et al., 2006, 2016), and others were not limited in time (e.g., Stes et al., 2010a). Most previous reviews aimed to evaluate the impact of IDPs (except for the conceptual review published by Amundsen & Wilson, 2012). Thus,
Table 1
The evolution of the recommendations for future studies highlighted in the previous reviews of studies that investigate the impact of IDPs.

Reviews (columns, in order of publication year): Levinson-Rose and Menges (1981); Weimer and Lenze (1998); McAlpine (2003); Prebble et al. (2004); Steinert et al. (2006); Stes et al. (2010a); Amundsen and Wilson (2012); De Rijdt et al. (2013); Steinert et al. (2016).

Topics of proposed recommendations for future studies:
t1. The IDP characteristics: (1) describe programs in more detail, (2) present the core/key characteristics of the IDPs, (3) compare different types of programs. ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
t2. The possible variables: (4) consider the individual participants' characteristics, or (5) those of their students; (6) consider the institutional characteristics; (7) consider specific target groups. ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
t3. The researchers: (8) consider cross-campus collaborations; take into account (9) collaborations with researchers from other fields or with (10) participants in IDPs. ⊗ ⊗ ⊗
t4. The process of evaluation: (11) define the evaluation outcomes very clearly, (12) organize the process of evaluating outcomes in accordance with the goal of the IDPs evaluated; measure the impact of IDPs on (13) students and on (14) the institution, not only on participants; (15) use more than one level of outcomes to evaluate an IDP; (16) evaluate the impact of IDPs in the long term. ⊗ ⊗ ⊗ ⊗ ⊗
t5. Design and methodology of the research studies: (17) randomized controlled designs, (18) comparison groups, (19) mixed designs, and/or (20) multiple data sources to allow for triangulation of data; (21) develop and use valid and reliable research instruments. ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
t6. The theoretical foundation: (22) use a theoretical framework to develop the IDPs and (23) in the design of the studies which investigate the impact of IDPs; (24) learn from practices and theories used in other fields. ⊗ ⊗ ⊗
t7. The process of reporting research data: (25) report enough data to be useful in other studies or reviews, (26) report effect sizes, (27) report nonsignificant results and negative effects. ⊗ ⊗ ⊗

Note: ⊗ = the authors of that review made at least one recommendation from the topic described in the first column.
Italics = recommendations highlighted by the more recent reviews (De Rijdt et al., 2013; Steinert et al., 2016).
authors have designed and used two types of conceptual frameworks: one for presenting the selected programs (e.g., one-time event vs. extended over time, or collective and course-like, alternative or hybrid, Stes et al., 2010a) and another one for classifying the outcomes of these programs (e.g., change in teacher attitude, change in teacher skill, change in student attitude etc., Levinson-Rose & Menges, 1981). These two types of conceptual frameworks have been adapted and improved over time. For example, the framework proposed by Stes et al. (2010a) for the evaluation of IDP outcomes integrates most elements of the frameworks developed by previous reviews. Ad-
ditionally, some authors investigated the research design of the selected studies (Stes et al., 2010a). Also, the work proposed by De
Rijdt and her collaborators (2013) is the only one in which the authors adopted an interdisciplinary approach to combine useful
information from different fields (i.e., management, human resources development, organizational psychology and instructional
development).
In the 35 years that passed from the first review (Levinson-Rose & Menges, 1981) to the latest one (Steinert et al., 2016), there has
been substantial growth in the field. For example, Amundsen and Wilson concluded that the field ″is becoming more grounded in an
established theory base″ (Amundsen & Wilson, 2012, p. 112). Thus, we have specific frameworks for investigations on characteristics
and/or outcomes of IDPs (Amundsen & Wilson, 2012; Stes et al., 2010a). Also, we know which training practices and/or research designs are the most frequent (Stes et al., 2010a). Moreover, the analysis of previous reviews allowed us to summarize an extensive list of recommendations for future development and research in the field (see Table 1). These recommendations should help the IDPs research field to provide more compelling evidence which, in turn, will provide conclusive answers to many of the fundamental questions in the field. Unfortunately, too little is known about the answer to the central question that has been addressed until now in the field: what is the impact of the IDPs on higher education? Even though the previous reviewers have searched for the impact of the IDPs, they used a qualitative framework to present their results. This approach was due to the scarcity of quantitative data reported in the literature and, moreover, to the incomplete and/or inconsistent data reported in many studies that have been published in the
field. In this review, we are trying to tell a different story by adopting a quantitative framework for presenting and interpreting the
results.
3. The need for a new synthesis of the instructional development literature
If we aim to improve instructional development practices, we need to understand which types of IDPs are most effective and
which variables and moderators may increase or reduce their effectiveness. Such questions can be addressed using a quantitative approach (i.e., meta-analysis) to summarize the results of previous studies. Such an approach may allow for more reliable
comparisons between different features, outcomes of IDPs or influences of different trainees’ characteristics on the final impact of
IDPs. Almost 40 years ago, Levinson-Rose and Menges (1981) intended to conduct a quantitative review of the evidence regarding
IDPs effectiveness. To the best of our knowledge, this objective has not yet been reached. Thus, the two main reasons for conducting a new
review study are: a) to have a quantitative estimation of IDPs effectiveness, and b) to address, in a quantitative framework, some of
the research questions regarding the moderator variables of IDPs effectiveness. Based on these considerations, we advanced the
following main research questions:
Q1 What is the overall effect size of the IDPs?
Q2 Is the overall effect size of IDPs related to the trainees' characteristics?
Q3 What are the features of IDPs that make them effective?
Q4 What are the effect sizes of IDPs on different levels of evaluation outcomes?
In the following paragraphs, we presented the state-of-the-art description for each of these central questions. Consequently, we
advanced subsequent research questions.
3.1. The effect size of IDPs
The Standards for Reporting on Empirical Social Science Research in AERA Publications (American Educational Research Association [AERA], 2006) emphasized the need to report and to interpret the effect size for each statistical result. The effect size is defined as a statistical index that standardizes the mean differences (e.g., Cohen's d and Hedges's g) or the strength of associations between research variables (e.g., r, R-squared and eta-squared), thus allowing for scale-free comparisons of the effect, from one study to another.
Reporting the effect size is useful in at least three practical situations (Sun, Pan, & Wang, 2010). First, one can estimate an adequate
sample size for detecting statistically significant results in future research studies (i.e., in an a-priori power analysis). Second, one can
assess the practical significance of research studies. Third, one can synthesize findings across studies in a more accurate way (e.g.,
carrying out a meta-analysis).
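To illustrate the first of these uses, the sketch below (a hypothetical example relying on the Python statsmodels package, not part of the analyses reported in this paper) shows how a previously reported effect size can feed an a-priori power analysis when planning a future two-group controlled study.

```python
# Minimal sketch (illustrative only): using a previously reported Cohen's d
# to estimate the per-group sample size needed in a future two-group study.
from statsmodels.stats.power import TTestIndPower

reported_d = 0.385  # an illustrative small effect size

# Number of participants per group needed to detect reported_d
# with 80% power at a two-tailed alpha of .05.
n_per_group = TTestIndPower().solve_power(
    effect_size=reported_d, alpha=0.05, power=0.80, alternative='two-sided')
print(round(n_per_group))  # roughly 107 participants per group
```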
However, reporting the effect size alone is not enough. Researchers typically use Cohen's three levels (Cohen, 1988: “small”, “medium” and “large”) to evaluate the practical significance of an effect size. Yet the practical relevance of an effect size depends on
various factors, such as the effect sizes reported in prior studies, the resources used to obtain the effect size, the context of the studies
or the outcomes measured (Henson, 2006;Prentice & Miller, 1992). Thus, even a small effect size could be useful from a practical
point of view if it is obtained with few resources (i.e., if an IDP reported a small effect size, but few resources were used for its
implementation, then the program could be considered a useful investment).
In the instructional development literature, the practice of reporting and interpreting effect sizes is scarce; therefore, previous reviews did not analyze effect sizes. Some previous reviewers (De Rijdt et al., 2013; Stes et al., 2010a) have mentioned that none
of the studies included in their syntheses reported an effect size. In this review, for the first time in the field, we report the impact of
IDPs using an effect size index. Therefore, we formulated our first main question (i.e., Q1. What is the overall effect size of IDPs?).
In addition, we collected the publication year of each study. Because previous reviews cover a large time span (1981–2016) and
because all reviews formulated suggestions for enhancing the IDPs’ effects, we expected stronger effects in more recent studies, as
compared with early studies. Therefore, we formulated an additional research question: Does the overall effect size of IDPs vary over
time as a function of the publication year?
Recent reviews (Steinert et al., 2016; Stes et al., 2010a) highlighted the need to evaluate the long-term effects of IDPs. It seems
that very few studies investigate the impact of IDPs beyond the end of the program, therefore we formulated another additional
question: Does the overall effect size of IDPs vary as a function of the time lag between the end of the program and the evaluation moment?
3.2. The relationship between trainees' characteristics and the final impact of IDPs
The “what works for whom” in IDPs has not been sufficiently explored, as pointed out by the first review in the field (Levinson-
Rose & Menges, 1981) and was highlighted again by more recent reviews (De Rijdt et al., 2013; Stes et al., 2010a). Learning from studies published in other fields (management, human resource development and organizational psychology), De Rijdt et al. (2013) proposed eight trainee characteristics (cognitive ability, self-efficacy, motivation, personality, perceived utility, career variables, locus of control and amount of experience) that could influence the outcomes of the IDPs. Our intention was to use these eight characteristics and others such as gender, age, staff grade, specialization and type of enrollment (which are usually used by researchers in the field to describe their research samples). However, the studies selected for our review presented very little information regarding these trainee characteristics. We succeeded in collecting data only for the following: gender, participants' specialization and type of enrollment. Thus, in addition to our second main question (i.e., Q2. Is the overall effect size of IDPs related to the trainees' characteristics?), we advanced questions related only to these three characteristics, as follows.
3.2.1. Is the overall effect size of IDPs related to the participants' gender?
Empirical studies have presented participants' gender (e.g., Bailey, 1999; Stes, Gijbels, & Van Petegem, 2008) as a variable with little or no impact on the outcomes of IDPs. None of the previous reviews presented information about this research topic. We collected the gender distribution percentage from each study, and we investigated whether this variable could impact the outcomes of IDPs.
3.2.2. Does the overall effect size of IDPs vary as a function of the participants' specialization?
Previous research studies that investigated the relationship between the participants' specific discipline and the outcomes of IDPs
provided mixed findings. Some studies revealed no significant differences (Renta-Davids, Jiménez-González, Fandos-Garrido, &
González-Soto, 2016; Stes et al., 2008) while others showed significant results (Rienties, Brouwer, & Lygo-Baker, 2013; Stes,
Coertjens, & Van Petegem, 2010b). In their review, Stes and her collaborators (2010a) concluded that discipline-specific IDPs have an
impact similar to the impact of general IDPs. In this review, we classified our studies into two categories: IDPs targeting one specific
discipline group and IDPs targeting heterogeneous groups of academics regardless of their specialization, and we compared the effect
sizes of these two categories.
3.2.3. Does the overall effect size of IDPs vary as a function of the type of trainees’ enrollment?
Some previous empirical studies highlighted that teachers forced to attend IDPs are less likely to obtain good results, as compared
to those who choose to participate themselves (Cilliers & Herman, 2010). McAlpine (2003) presented voluntary participation of
academics in IDPs as a characteristic of successful training initiatives. We categorized the selected studies as: a) voluntary IDPs –
teachers decided to enroll themselves and b) mandatory IDPs – the enrollment was decided by a hierarchical superior and was
mandatory for teachers.
3.3. Features of IDPs that make them effective
Previous reviews used three criteria to cluster IDPs by their features: a) nature of instructional activities (e.g., Levinson-Rose &
Menges, 1981; Prebble et al., 2004; Stes et al., 2010a; Weimer & Lenze, 1998), b) amount of training time spent (e.g., De Rijdt et al., 2013; Stes et al., 2010a) and c) instructional focus (Amundsen & Wilson, 2012). We used all three of these criteria and added a supplementary one, named type of responsibility approach (Smith, 1992a, 1992b). For this topic, we formulated our third main question
(i.e., Q3. What are features of IDPs that make them effective?) and four additional questions, as follows.
3.3.1. Does the overall effect size of IDPs vary as a function of the nature of the instructional activities?
Levinson-Rose and Menges (1981) first clustered IDPs using the nature of the instructional activities as a criterion for comparing
effectiveness. The authors proposed five types of IDPs (e.g., grants for faculty projects; workshops and seminars). All other reviewers
used the classification proposed by Levinson-Rose and Menges (1981), while taking into account the evolution of academic training practice (Prebble et al., 2004; Steinert et al., 2016, 2006; Stes et al., 2010a; Weimer & Lenze, 1998). We used the classification proposed by Stes et al. (2010a) because it encompasses, in three categories, all types of IDPs proposed by the other reviewers: collective and
course-like (e.g., workshop, short course, seminar or seminar series, longitudinal program), alternative (e.g., instructional grant,
practice with feedback, feedback from student ratings, individual consultation, provision of resource materials, peer coaching, in situ
training of a group), and a hybrid form of the first two categories (e.g., a workshop followed by one-on-one-support) (Stes et al.,
2010a, p. 30). None of the previous reviews presented data regarding the impact of the various types of IDPs. Stes et al. (2010a) observed that collective and course-like was the most frequently implemented type of IDP, and they recommended future research on this topic. In reaction to the previous reviews, De Rijdt et al. (2013) proposed another typology based on the transfer of learning to the workplace. Thus, the authors proposed three types of IDPs (De Rijdt et al., 2013, p. 67): learning on the job, learning off the job and a
mixture between on the job and off the job. Learning on the job is situated in trainees' actual educational contexts involving their
current students, curriculum and instructional problems (e.g., action research, the community of practice, peer coaching and men-
toring etc.). The Learning off the job type includes IDPs that take place away from the trainees’ current work situation or instructional
practices (e.g., workshops, training sessions or seminars). Because the meaning of this IDPs typology (transfer of learning to the
workplace) is different from all previous proposals, we decided to use it in addition to the typology proposed by Stes et al. (2010a)
described above. As a preliminary conclusion, De Rijdt et al. (2013) presented IDPs that use an on the job learning approach as having a positive impact. However, the authors highlighted the need for future research on this topic.
3.3.2. Does the overall effect size of IDPs vary as a function of the amount of training time spent?
The first analysis of the effect of the amount of training time spent on the final impact of IDPs was conducted by Weimer and Lenze
(1998). Because of the poor quality of data collected, Weimer and Lenze did not advance any conclusions about this topic. The
following reviewers (De Rijdt et al., 2013; McAlpine, 2003; Steinert et al., 2016; Stes et al., 2010a) suggested that IDPs extended over
time tended to produce more positive effects, as compared with one-time IDPs. However, all previous reviewers concluded that future
studies were needed to formulate a conclusive answer to this research topic. It should be noted that there were divergences between
different reviewers when they defined one-time interventions. For example, Stes et al. (2010a) presented interventions ranging in duration from 1 h to 4 days as one-time IDPs, while De Rijdt et al. (2013) defined one-time events as interventions varying from 1 h or one day to two consecutive days. Also, Steinert et al. (2016) included programs ranging in duration from 1 h to six days in this one-time events category.
In this review, we used the option advanced by De Rijdt et al. (2013). Thus, a one-time intervention is an intervention that varies from 1 h or one day to two consecutive days. If the intervention lasted more than one day with a time gap between the sessions, or more than two consecutive days, it was considered an extended over-time event. Moreover, we calculated the exact number of hours of training time spent in each IDP identified in the selected studies. We used these two options (the dichotomy – one-time vs. extended over-time events – and the exact number of hours) to calculate the effect of the amount of training time on the final impact of IDPs.
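To make the adopted coding rule explicit, the following small helper (an illustrative Python sketch, not code used in the study) expresses the dichotomy described above.

```python
# Illustrative sketch of the coding rule adopted from De Rijdt et al. (2013):
# classify an IDP as a one-time event or an extended over-time event.
def classify_duration(n_days: int, has_gap_between_sessions: bool) -> str:
    """One-time = from 1 h up to two consecutive days; a program lasting
    longer, or one with a time gap between sessions, is extended over time."""
    if n_days <= 2 and not has_gap_between_sessions:
        return "one-time"
    return "extended over time"

print(classify_duration(2, False))  # one-time
print(classify_duration(2, True))   # extended over time
print(classify_duration(5, False))  # extended over time
```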
3.3.3. Does the overall effect size of IDPs vary as a function of the instructional focus of the program?
Amundsen and Wilson (2012) used instructional focus and core characteristics of IDPs as criteria to cluster instructional inter-
ventions rather than nature and/or amount of time spent. Therefore, we clustered our studies according to the six types of IDPs
proposed by Amundsen and Wilson: (1) skill focus cluster – aiming at the acquisition or enhancement of teaching skills and techniques;
(2) method focus cluster - aiming at mastery of a particular teaching method; (3) reflection focus cluster - aiming at a change in teacher
conceptions of instruction in higher education; (4) institutional focus cluster - focus on coordinated institutional plans to support
teaching improvement; (5) disciplinary focus cluster - aiming at disciplinary understanding in order to develop pedagogical knowledge;
(6) action research cluster - aiming to answer concrete instructional questions of interest for individuals or groups of faculty members
(Amundsen & Wilson, 2012, pp. 98–99). Unfortunately, the authors did not provide information regarding the impact of these types
of IDPs.
3.3.4. Does the overall effect size of IDPs vary as a function of the type of responsibility approach?
Smith’s (1992a,1992b) classification of IDPs presents information about the planning phase of IDPs and highlights the respon-
sibility for developing and implementing the intervention. Smith (1992a,1992b) divided IDPs into three types: management (initiated
by top management), shop-floor (initiated by participants) and partnership (initiated by staff developers). De Rijdt, Dochy, Bamelis,
and Van Der Vleuten (2016) showed that management IDPs are the most frequently implemented, partnership IDPs are the most satisfying model for participants, and shop-floor IDPs are the programs that report the highest impact on changing teaching conceptions.
However, the authors concluded that, generally, these types of IDPs resulted in limited differences in terms of effects on participants.
To conclude, in order to understand the features that make IDPs effective, we proposed the use of five different and complementary taxonomies of IDPs (Amundsen & Wilson, 2012 – instructional focus; De Rijdt et al., 2013 – duration of the program and transfer of learning to the workplace; Smith, 1992a, 1992b – responsibility approach; Stes et al., 2010a – nature of instruction) to cluster the selected studies. Additionally, we calculated the exact number of hours of training time spent in each IDP described in the selected studies. We used these data to calculate the effect size of the different types of IDPs arising from the various taxonomies. Also, we analyzed the impact of the number of training hours on the final impact of IDPs.
3.4. Levels and types of evaluation outcome of IDPs
In this review, we also clustered the IDPs effect sizes based on the levels of the outcomes. Previous reviews proposed different
models which could be adopted for such a process. We presented these models in the section dedicated to the description of the previous reviews. Thus, in this section, we explain our decision to use one of these models in particular.
Stes et al. (2010a) adapted the model for evaluating outcomes of IDPs developed by Steinert et al. (2006) and excluded the
elements that were typical for medical education. The model proposed by Stes et al. (2010a) encompassed all the levels of IDPs
outcomes proposed in previous reviews. Thus, the model proposed by Stes and her collaborators (2010a) is one of the most
comprehensive classifications of IDP outcomes.
Stes et al. (2010a) discussed the evaluation of IDPs at each level (i.e., teachers, institutional and students) and type of outcomes (e.g., teachers' behavior or students' perceptions), but did not advance meaningful conclusions about the effectiveness of instructional development initiatives for each outcome. Generally, Stes et al. (2010a) concluded that almost all the training initiatives reported positive results, or at least mixed results, regardless of the level of outcomes measured. Such a conclusion is also sustained by other previous studies (e.g., De Rijdt et al., 2013; Levinson-Rose & Menges, 1981). For example, De Rijdt et al. (2013) concluded that 80% of the IDPs analyzed reported positive results. Also, similar to other reviews (e.g., McAlpine, 2003; Steinert et al., 2016), Stes et al. (2010a) found that the impact of IDPs is most frequently evaluated through teachers' outcomes, and that few studies use institutional or students' outcomes. Moreover, Stes et al. (2010a) showed that different types of initiatives are more frequently analyzed on some outcomes and less on others (e.g., collective and course-like initiatives are more frequently evaluated on students' outcomes and less on teachers' behavior). From another point of view, Stes et al. (2010a) highlighted that none of the studies examining institutional impact utilized a control group. Based on these considerations, we formulated our last main question (i.e., Q4. What are the effect sizes of IDPs on different levels of evaluation outcomes?) and two additional questions: What are the effect sizes of IDPs on the teachers' level of outcomes and on the different types of outcomes at this level? and What are the effect sizes of IDPs on the students' level of outcomes and on the different types of outcomes at this level?
4. Method
4.1. Literature search
The flow of the study selection process is presented in Fig. 1. We had two approaches in the literature search. First, we analyzed
the reference lists of the 9 previous reviews discussed above, and we obtained an initial list with 567 references. We identified 72
duplicates (51 were cited by two reviews, 9 were cited by three reviews, and 1 was cited by four reviews). Of the remaining 495, we
found the full-text version for 450 studies, but 45 studies were not traceable despite several attempts to contact the authors.
The second approach in the systematic search of the literature involved a search on the following online databases: Academic
Search Complete, Academic Search Premier, Central and Eastern European Academic Source, EconLit, Education Research Complete,
MEDLINE, Middle Eastern and Central Asian Studies, Psychology and Behavioral Sciences Collection, PsycINFO, and Teacher
Reference Center. We used the following keywords: staff development, instructional development/training, academic development, faculty development/training, professional development, educational development/training, pedagogical training, curriculum development, university teacher, teaching, and higher education. The different combinations of these words were sought in titles and abstracts. No limit on sources of publication was imposed. Considering the work of Stes et al. (2010a)1 and the first step presented above, the search was limited to studies from 1 January 2008 to 27 February 2019 (the day of the search). An initial list of 1079 studies resulted. Out of these, we removed 19 duplicates (11 of which duplicated references listed during the first step).
4.2. Inclusion criteria
For a study to be included in the meta-analysis, the following criteria were considered: a) it had to involve instructional activities
for university teachers as defined by Taylor and Rege Colet (2009): initiatives specifically planned to improve quality of teaching in
higher education by enhancing academics’ instructional approach to support student learning; b) the evaluation of IDPs had to be the
aim of the study; c) the study had to have quantitative or mixed design (studies using only a qualitative approach were excluded); d)
only quasi-experimental or experimental studies with pretest and posttest and control groups were selected; and e) the control groups
must not have received any type of instructional intervention (e.g., no intervention group or waiting list group) and had to be
statistically equivalent to the experimental groups at the pretest time (i.e., the study reported nonsignificant differences at pretest, or conducted a random assignment of the participants). The 450 studies that resulted from the previous reviews were analyzed using the inclusion criteria presented above: 154 studies had only qualitative and/or quantitative descriptive data (e.g., percentages, frequencies); 166 studies did not have control groups; 43 studies did not evaluate the impact of IDPs; 30 studies did not have a pretest evaluation; and 11 studies were eliminated because they did not discuss instructional activities for academics, but rather for other categories of teachers (e.g., Collins, 1978; Kallenbach & Gall, 1969). The remaining 46 articles presented experimental studies; of these, 6 studies had active control groups that received a specific instructional intervention, 3 studies had nonequivalent control groups, and 1 conference paper reported a study which had already been included (as a journal paper). Consequently, these were removed. The remaining 36 met all the criteria for inclusion.
Out of the 1060 abstracts that resulted from our online search, 842 were eliminated after the abstracts were read because it was obvious that they did not meet at least one of the inclusion criteria (e.g., the evaluation of IDPs had not been the aim of the study or
only a qualitative research methodology was used). Out of the 218 full-text papers we analyzed, we excluded 125 references because
they presented only qualitative and/or quantitative descriptive data (e.g., percent, frequencies); 81 studies did not have control
groups and 12 did not include pretest evaluation. The remaining 6 articles met the criteria for inclusion. Thus, we included 42 studies
in the next stage of our review (36 from the first search approach and 6 from the second search approach).

1 We chose the review of Stes et al. (2010a) as a landmark because the newer reviews analyzed only specific studies in the field (e.g., transfer to the workplace – De Rijdt et al., 2013; medical universities – Steinert et al., 2016). Consequently, these can be less relevant for a general overview of the field.
4.3. Study coding
Study coding involved the extraction of quantitative data from the results section of each research study, and the assessment of
study characteristics using the coding scheme presented in the Supplemental Material. From each study, we extracted the following
statistical information: the descriptive statistics for the experimental group and for the control group (i.e., mean, standard deviation
and sample size) of each outcome, at each measurement moment. After a first analysis of the 42 articles that met the criteria, only 19
(e.g., Alemoni, 1978; Cole et al., 2004; Goldman, 1978; Hewson & Copeland, 1999; Skeff, Stratos, Campbell, Cooke, & Jones, 1986;
Stes, Coertjens, & Van Petegem, 2013a; Stes, Maeyer, Gijbels, & Van Petegem, 2012a, 2012b, 2013b) were included in the meta-
analysis as 23 studies presented insufficient data (e.g., did not present data for the control group at the posttest moment, or presented
only means without standard deviations). Because these studies were rather old (from 1973 to 2008), we only managed to contact 20
of their authors. Unfortunately, none of these authors could provide the necessary information for the inclusion of their articles in the
meta-analysis. An unpublished manuscript (Mladenovici, Ilie, Maricuțoiu, Iancu, & Smarandache, 2019) that met all the criteria of
selection was also included in the final sample of our review.
Regarding the assessment of study characteristics, data for each study were collected for all of the variables from the coding list (see Supplemental Material). Each selected study was coded by two different members of the research team, working separately from each other. The level of agreement between the two independent coders was interpreted based on the kappa statistic and Gwet's benchmark (Gwet, 2012): 0.00–0.20 = slight agreement, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial, and 0.81–1.00 = almost perfect agreement. The degree of agreement ranged from fair agreement (kappa = 0.30, for features of IDPs) to almost perfect agreement (kappa = 0.82, for levels and types of evaluation outcomes of IDPs, and kappa = 0.91, for trainees' characteristics). Divergences between the two coders were analyzed and clarified by the first and second authors working with the two coders (the first author has a PhD in educational sciences and more than ten years of experience in instructional development activities, and the second author has a PhD in psychology and five years of experience in conducting meta-analyses).
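As an illustration of this agreement check (a sketch with hypothetical codes rather than the actual coding sheets, assuming the scikit-learn package), Cohen's kappa between two independent coders can be computed as follows.

```python
# Minimal sketch of the inter-coder agreement check (hypothetical codes,
# not the actual coding data): Cohen's kappa between two independent coders.
from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical codes assigned by the two coders to ten studies
coder_1 = ["collective", "hybrid", "alternative", "collective", "hybrid",
           "collective", "alternative", "hybrid", "collective", "collective"]
coder_2 = ["collective", "hybrid", "collective", "collective", "hybrid",
           "collective", "alternative", "alternative", "collective", "collective"]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")  # interpreted against Gwet's (2012) benchmarks
```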
4.4. Data analysis
We introduced all quantitative data (i.e., means, standard deviations and sample size) into the Comprehensive Meta-Analysis
version 2.0 software (CMA - Borenstein, Hedges, Higgins, & Rothstein, 2005). We used the CMA software to compute all effect sizes,
and to conduct all calculations reported in this paper. In all computations, we weighted the studies based on the number of academics
involved in the control and experimental groups.
Fig. 1. Literature review and selection of articles for review.
Following criteria for evaluating intervention effectiveness (American Psychological Association [APA], 2002), we computed the
standardized mean difference between the experimental and control group, after the intervention (i.e., in post-test and follow-up).
For each outcome variable in each measurement moment, we used the Cohen (1988) formula for the standardized mean difference
(i.e., the d-Cohen effect size). Following the recommendations formulated by Lipsey and Wilson (2001), a positive effect size indicates
that the intervention had improvement effects on the participants (e.g., improved post-intervention student ratings in the experi-
mental group, as compared with the control group), while a negative effect size indicates that the intervention had negative effects.
We used the pretest results to check whether the experimental groups had similar evaluations before the intervention. We assessed
the between-studies heterogeneity using the classical Q-test (i.e., insignificant results suggest homogeneous results from one study to
another, Lipsey & Wilson, 2001) and the I² index (i.e., an estimation of the percentage of between-studies variance that might be attributed to unknown moderators – Borenstein, Hedges, Higgins, & Rothstein, 2009).
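The sketch below (hypothetical numbers, plain NumPy; the actual computations were carried out in CMA) illustrates the statistics just described: Cohen's d for a single controlled study, and the Q test and I² index across a small set of studies.

```python
# Simplified sketch (hypothetical data; the paper used the CMA software):
# Cohen's d for a single controlled study, then Q and I² across studies.
import numpy as np

def cohen_d(m_exp, sd_exp, n_exp, m_ctl, sd_ctl, n_ctl):
    """Standardized mean difference based on the pooled standard deviation."""
    sd_pooled = np.sqrt(((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                        / (n_exp + n_ctl - 2))
    return (m_exp - m_ctl) / sd_pooled

def d_variance(d, n_exp, n_ctl):
    """Approximate sampling variance of d (Borenstein et al., 2009)."""
    return (n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl))

# Hypothetical post-test summaries (experimental vs. control) for three studies
studies = [(3.9, 0.8, 25, 3.5, 0.9, 24),
           (4.2, 1.1, 40, 3.8, 1.0, 38),
           (3.6, 0.7, 18, 3.4, 0.8, 20)]

d = np.array([cohen_d(*s) for s in studies])
v = np.array([d_variance(di, s[2], s[5]) for di, s in zip(d, studies)])

# Q test and I² with inverse-variance weights
w = 1 / v
d_bar = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_bar) ** 2)
df = len(d) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
print(f"d = {np.round(d, 3)}, Q = {Q:.3f}, I² = {I2:.1f}%")
```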
In the present review we analyzed the results of 20 studies that reported post-test results of a control and an experimental group.
Three studies (i.e., Erickson & Sheehan, 1976; Payne & Hobbs, 1979; Skeff, 1983) had one control group and two independent
experimental groups, therefore we computed two effect sizes for each of these studies. Because the studies reported very different
IDPs, we assumed that the ‘true’ effect size varies randomly from one study to another. Therefore, we conducted a random-effects
meta-analysis. Most studies reported more than one intervention outcome, and we computed the d-Cohen for each outcome in each
post-intervention moment (i.e., post-intervention, follow-up). However, because effect sizes obtained from the same participants cannot be considered independent (Borenstein et al., 2009), we used the algorithms implemented in CMA (Borenstein et al., 2005) to average all effects reported by a study into a single effect size.
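As a rough illustration of these two steps (a simplified sketch, not the exact CMA algorithm), dependent effect sizes from the same study can first be averaged, and the resulting study-level effects can then be pooled with a DerSimonian-Laird random-effects model.

```python
# Simplified sketch (hypothetical values; the paper relied on the algorithms
# implemented in CMA): average dependent effects within a study, then pool
# the study-level effects with a DerSimonian-Laird random-effects model.
import numpy as np

def average_within_study(effects, variances):
    """Mean of a study's outcome effect sizes; taking the mean of their
    variances corresponds to assuming the outcomes are perfectly correlated,
    which is a conservative simplification."""
    return float(np.mean(effects)), float(np.mean(variances))

def random_effects_pool(d, v):
    """DerSimonian-Laird random-effects pooled effect and its standard error."""
    d, v = np.asarray(d), np.asarray(v)
    w = 1 / v                                   # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)
    Q = np.sum(w * (d - d_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(d) - 1)) / c)     # between-studies variance
    w_star = 1 / (v + tau2)                     # random-effects weights
    d_re = np.sum(w_star * d) / np.sum(w_star)
    return d_re, np.sqrt(1 / np.sum(w_star))

# Study 1 reported two outcomes on the same participants; studies 2-3 one each.
d1, v1 = average_within_study([0.45, 0.30], [0.08, 0.09])
d_re, se_re = random_effects_pool([d1, 0.52, 0.21], [v1, 0.06, 0.10])
print(f"pooled d = {d_re:.3f}, SE = {se_re:.3f}, Z = {d_re / se_re:.2f}")
```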
5. Results
The characteristics of the included studies are presented in Table S1 (see Supplemental Material), as follows: reference, country
and university, effect size, participants and IDPs characteristics (e.g., number, gender, specialization, length of IDP, instructional
focus, outcomes etc.).
5.1. The overall effect
Regarding the averaged effect, the results presented in Table 2 indicated that the IDPs effect is statistically significant (d = 0.385,
SE = 0.055, Z = 6.958, p < .001, k = 23). According to Cohen's guidelines regarding the magnitude of the effect size (Cohen, 1988),
the IDPs effect size can be considered a small effect size. The heterogeneity indices (Q(19) = 19.732, p = .060, I² < 0.001) suggested that the 23 effect sizes are similar from one study to another. However, we adopted a cautionary strategy regarding this result. As
Borenstein et al. (2009) explain, “while a significant p-value provides evidence that the true effects vary, the converse is not true”
(p.113). On the one hand, a non-significant Q test indicates that the observed between-studies variance is not significantly different
from the random variance that can be attributed to sampling error. On the other hand, finding a non-significant Q does not imply that
the observed between-study variance is exclusively random (or sampling error) variance, and that systematic variance (or variance
attributed to moderator variables) is not present. Therefore, we proceeded with our moderator analyses.
5.2. The year of publication
Because the first controlled study in our sample was published 46 years ago (i.e., Lewis & Orvis, 1973), we investigated whether
the effect size varies as a function of time. Therefore, we grouped our studies in four categories based on their publication year, and
we compared the averaged effect size of these groups. Although the results for each category (Table 3) seem to indicate a decrease in
the effect size of recent studies, the omnibus statistical test did not yield significant results: Q(3) = 2.916, p = .405. Based on this
result, we can conclude that the year in which the study was published did not have an influence on the IDPs effect size.
5.3. The moment of the evaluation of the impact
In our sample of studies, the moment of the post-intervention evaluation was very different from one study to another, and most
trials (i.e., 18 out of 20) reported the time lag between the end of the IDP and the evaluation moment. Eleven studies evaluated
Table 2
The overall effect size of the effectiveness of IDPs.

                 k    d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p      Q       df(Q)  p     I²
Overall effect   23   .385  .055    .003     .291          .494          6.958  <.001  19.732  22     .060  <.001

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var(d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
intervention efficacy up to a week after the end of the intervention. Six studies evaluated intervention efficacy up to three months
after the end of the intervention, while another six studies evaluated the efficacy one year after the end of the intervention. The results
presented in Table 4 suggested that the effect size is larger when intervention efficacy was measured one year after the intervention
(d = 0.497, SE = 0.120, Z = 4.138, p < .001). However, the differences between these categories should be interpreted with
caution because we did not conduct a direct comparison of the effect sizes. Finally, the studies that assessed IDPs efficacy in at least
two moments after the end of the intervention (i.e., post-intervention and follow-up) provided mixed findings. Ho, Watkins, and Kelly
(2001) reported a significant effect only in the post-intervention moment and not in the follow-up measures, while Baroffio et al.
(2006) reported slightly larger effects in the follow-up evaluation, as compared with the post-IDP evaluation.
5.4. The influence of trainees' characteristics on the final impact of IDPs
We analyzed the IDPs using the general framework presented in the Introduction section, and we compared IDPs effectiveness for
groups with at least three independent studies. Because each study was assigned to only one category, we compared the between-group
differences using the Q test (Borenstein et al., 2005).
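As a rough illustration of the between-groups Q logic used in these moderator analyses (a sketch under fixed-effect assumptions, not the CMA procedure applied here), the Python snippet below pools each subgroup with inverse-variance weights and then tests whether the subgroup means disperse around the grand mean more than sampling error alone would allow. The subgroup labels and all effect sizes are hypothetical.

```python
# Sketch of a between-groups (Q_between) comparison of pooled effects.
# Hypothetical data; fixed-effect pooling within each subgroup is assumed.
import numpy as np
from scipy import stats

def pool(d, se):
    # Inverse-variance (fixed-effect) pooled estimate and its standard error.
    w = 1.0 / np.asarray(se) ** 2
    d_bar = np.sum(w * np.asarray(d)) / np.sum(w)
    return d_bar, np.sqrt(1.0 / np.sum(w))

subgroups = {                                    # hypothetical moderator levels
    "mandatory": ([0.55, 0.45, 0.60], [0.20, 0.25, 0.30]),
    "voluntary": ([0.25, 0.30, 0.20, 0.35], [0.15, 0.18, 0.20, 0.22]),
}

pooled = {name: pool(d_i, se_i) for name, (d_i, se_i) in subgroups.items()}

# Q_between: weighted dispersion of the subgroup means around the grand mean.
w_g = np.array([1.0 / se_g ** 2 for _, se_g in pooled.values()])
d_g = np.array([d_bar for d_bar, _ in pooled.values()])
grand = np.sum(w_g * d_g) / np.sum(w_g)
q_between = np.sum(w_g * (d_g - grand) ** 2)
df = len(pooled) - 1
print(f"Q_between({df}) = {q_between:.3f}, p = {stats.chi2.sf(q_between, df):.3f}")
```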
5.4.1. Gender distribution
Out of the entire sample of studies, 12 studies reported this information and, except for Mladenovici et al. (2019), the percentage
of male participants ranged between 66.66% and 91.60%. We conducted a meta-regression analysis in which we used the percentage
of males to predict the effect size of each study. The results of the meta-regression indicated a near-zero relationship (B = 0.177,
SE = 0.447, Z = 0.395, p = .693) between the percentage of males and the effect of the IDP.
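A hedged sketch of this type of meta-regression is shown below: it approximates the analysis with a weighted least-squares regression of study effect sizes on the proportion of male participants. Dedicated meta-analysis software (such as the CMA package used in this review) treats the sampling variances as known and rescales the standard errors accordingly, so this is only an approximation run on hypothetical data, not a reproduction of the reported coefficients.

```python
# Sketch of a weighted meta-regression of study effect sizes on a continuous
# moderator (proportion of male participants). Hypothetical data only; the
# standard errors below are approximate (see the caveat in the text).
import numpy as np
import statsmodels.api as sm

d = np.array([0.30, 0.45, 0.20, 0.55, 0.35])          # hypothetical effect sizes
se = np.array([0.15, 0.20, 0.12, 0.25, 0.18])         # hypothetical standard errors
prop_male = np.array([0.67, 0.75, 0.70, 0.92, 0.80])  # hypothetical moderator

X = sm.add_constant(prop_male)                 # intercept + slope (B)
fit = sm.WLS(d, X, weights=1.0 / se**2).fit()  # inverse-variance weights
print(fit.params)   # [intercept, slope B for the moderator]
print(fit.bse)      # approximate standard errors
```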
5.4.2. Participants’ specialization
Most studies used a mixed sample of participants (Table 5). The studies that included participants from a single specialization
were mostly conducted on medicine academics (k = 8), and only Lewis and Orvis (1973) had participants from a different spe-
cialization (i.e., economics). Although the comparison test was not statistically significant (Q(1) = 3.692, p = .055), the IDPs that
involved academics from a single discipline reported larger effect sizes (d = 0.442, SE = 0.095, Z = 4.635, p < .001, k = 9), as
compared with IDPs that involved academics from different disciplines (d = 0.233, SE = 0.094, Z = 2.474, p = .013, k = 12).
5.4.3. Trainees' enrollment (mandatory vs. voluntary)
We found significant differences (Q(1) = 5.635, p = .018) between the effect of IDPs with mandatory participation (d = 0.515,
SE = 0.186, Z = 2.769, p = .006, k = 3) and the effect of IDPs with voluntary participation (d = 0.284, SE = 0.070, Z = 4.044,
p < .001, k = 18).
Table 3
The effectiveness of IDPs over time as a function of the year of publication.

Years          k   d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p     Q      df (Q)  p     I²
<1980          7   .359  .127    .016     .110          .609          2.280  .005  3.342  6       .765  <.001
1980–1999      4   .562  .168    .028     .234          .890          3.354  .001  2.144  3       .543  <.001
2000–2009      5   .371  .147    .022     .083          .658          2.523  .012  9.775  4       .044  59.077
2010–present   7   .225  .109    .012     .011          .440          2.061  .039  0.660  6       .995  <.001

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var (d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
Table 4
The effectiveness of IDPs as a function of the lag between the end of the program and the moment of evaluation.

Time lag     k    d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p      Q      df (Q)  p     I²
1–7 days     11   .309  .079    .006     .154          .463          3.925  <.001  2.694  10      .988  <.001
30–90 days   6    .137  .129    .018     -.116         .390          1.059  .290   0.114  5       .999  <.001
1 year       6    .497  .120    .014     .261          .732          4.138  <.001  6.423  5       .267  22.158

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var (d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
5.5. Features of IDPs that make them effective
5.5.1. Types of instructional activities
We classified the studies based on the types of activities used in the IDPs (i.e., alternative vs. collective vs. hybrid activities), but the
different groups of IDPs reported similar effect sizes (Q(2) = 0.090, p = .956). Next, we compared the average effect size of IDPs that
used a learning-on-the-job approach with that of IDPs that used a learning-off-the-job approach or a mixed approach. These
three approaches also yielded similar results (Q(2) = 1.053, p = .591); therefore, it seems that the transfer of learning is not related
to IDPs effectiveness.
5.5.2. Length of the IDP
In order to investigate the influence of training length on IDPs effectiveness, we used two approaches to group the studies
(Table 6). When we compared one-time events with IDPs that had more than one meeting, the difference was not statistically
significant (Q(1) = 2.069, p = .150), although one-time events reported the larger average effect. To refine this analysis, we grouped
the studies based on their overall duration (i.e., the IDPs' total number of hours), and we found significant differences
(Q(2) = 7.950, p = .019) in favor of the short IDPs. Taken together, these results suggest that short IDPs obtain superior effects, as
compared to long-term IDPs.
Table 5
The effectiveness of IDPs for trainees' characteristics.

Trainees' characteristics                                        k    d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p      Q       df (Q)  p     I²
Heterogeneity of the IDP groups (Q between(1) = 3.692, p = .055)
  Heterogeneous discipline groups                                12   .233  .094    .009     .049          .418          2.474  .013   3.621   11      .980  <.001
  Specific discipline groups (medicine – 8 studies,
  economics – 1 study)                                           9    .442  .095    .009     .255          .628          4.635  <.001  12.065  8       .148  33.691
Motivation to participate in the IDPs (Q between(1) = 5.635, p = .018)
  Mandatory participation                                        3    .515  .186    .035     .151          .880          2.769  .006   4.707   2       .095  57.150
  Voluntary participation                                        18   .284  .070    .005     .147          .422          4.044  <.001  6.428   16      .919  <.001

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var (d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
Table 6
The effectiveness of IDPs for different features of IDPs.

Features of IDPs                      k    d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p      Q       df (Q)  p     I²
Nature of instructional activities (Q between(2) = 0.090, p = .956)
  Alternative                         8    .378  .112    .013     .158          .598          3.371  .001   4.164   7       .761  <.001
  Collective and course-like          10   .353  .095    .009     .167          .538          3.725  <.001  11.514  9       .242  21.835
  Hybrid                              4    .401  .147    .022     .112          .689          2.720  .007   3.293   3       .349  8.895
Transfer of learning to the workplace (Q between(2) = 1.053, p = .591)
  Off the job IDPs                    7    .404  .107    .012     .193          .614          3.760  <.001  9.602   6       .142  37.514
  On the job IDPs                     8    .378  .112    .013     .158          .598          3.371  .001   4.164   7       .761  <.001
  Mixed IDPs (on & off the job)       7    .284  .133    .018     .024          .544          2.140  .032   4.242   6       .644  <.001
IDP length (Q between(1) = 2.069, p = .150)
  One-time events                     10   .453  .093    .009     .270          .635          4.858  <.001  11.098  9       .269  18.905
  Extended-over-time events           13   .277  .079    .006     .123          .431          3.521  <.001  4.893   12      .961  <.001
IDP duration (Q between(2) = 7.950, p = .019)
  Less than 15 h                      9    .571  .086    .007     .402          .739          6.627  <.001  7.650   8       .468  <.001
  Between 48 and 64 h                 3    .285  .136    .019     .018          .552          2.018  .044   .549    2       .760  <.001
  More than 126 h                     7    .210  .104    .011     .006          .415          2.018  .044   .508    6       .998  <.001
Instructional focus of IDP (Q between(1) = 3.576, p = .059)
  Focused on reflection               8    .221  .103    .011     .020          .423          2.157  .031   .967    7       .995  <.001
  Focused on skills                   11   .489  .085    .007     .321          .656          5.723  <.001  11.864  10      .294  15.708
Type of responsibility approach (Q between(1) = 0.229, p = .633)
  Management                          14   .380  .075    .006     .234          .527          5.085  <.001  14.923  13      .312  12.887
  Partnership                         7    .345  .113    .013     .123          .566          3.047  .002   3.733   6       .713  <.001

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var (d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
5.5.3. Instructional focus of the IDP
Regarding the focus of the training, we compared the effect sizes of IDPs focused on reflection with those of IDPs focused
on skills. The comparison test fell just short of statistical significance (Q(1) = 3.576, p = .059), with the IDPs focused on skills
reporting the larger effect size (d = 0.489, SE = 0.085, Z = 5.723, p < .001, k = 11).
5.5.4. Responsibility approach
The last IDP feature that we analyzed was Smith's (1992a, 1992b) classification of IDPs, which highlights who holds the responsibility for
developing and implementing the intervention (i.e., management responsibility, partnership, or shop-floor). However, we did not find
significant differences (Q(1) = 0.229, p = .633) between IDPs initiated by the management and IDPs that were initiated by the
staff developers (i.e., partnership IDPs).
5.6. Levels and types of evaluation outcome of IDPs
We conducted separate analyses (Table 7) for two outcome levels: outcomes assessed at the academics' level and outcomes assessed at
the students' level. Overall, we found that student-level effects (d = 0.396, SE = 0.079, Z = 6.072, p < .001) are slightly stronger
than the academics-level effects (d = 0.315, SE = 0.080, Z = 3.938, p < .001). However, given the evidence available,
this difference should not be treated as significant.
5.7. Publication bias
Publication bias is a concern for any literature review because studies with non-significant results are less likely to be
published than studies reporting significant results (Borenstein et al., 2009). If publication bias is
present, then the averaged effects presented in a meta-analysis may be inflated because studies with small
effect sizes were unavailable to the authors of the review.
To assess publication bias, we used the funnel plot. The funnel plot is a graphical representation in which each study is plotted in a
two-dimensional graph, with the standard error of the study on the vertical axis and the effect size of the study on the horizontal
axis (Borenstein et al., 2009). If publication bias is present, the funnel plot should be asymmetrical to the left or to the right of
the averaged effect size. Furthermore, CMA (Borenstein et al., 2005) computes two quantitative indices (Egger's test and the
Duval and Tweedie trim-and-fill procedure) for assessing funnel plot symmetry.
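To illustrate how Egger's test quantifies funnel plot asymmetry, the sketch below regresses the standardized effects on their precision and inspects the intercept; a clear departure of the intercept from zero signals asymmetry. This is a simplified stand-in for the CMA implementation, run on hypothetical data, and the halved two-tailed p-value is only a rough one-tailed approximation.

```python
# Sketch of Egger's regression test for funnel plot asymmetry (classic
# formulation: standardized effect regressed on precision). Hypothetical data.
import numpy as np
import statsmodels.api as sm

d = np.array([0.30, 0.45, 0.20, 0.55, 0.35, 0.40])    # hypothetical effect sizes
se = np.array([0.15, 0.20, 0.12, 0.25, 0.18, 0.22])   # hypothetical standard errors

snd = d / se                       # standardized effects
precision = 1.0 / se
X = sm.add_constant(precision)     # intercept tests for asymmetry
fit = sm.OLS(snd, X).fit()

intercept = fit.params[0]
ci_low, ci_high = fit.conf_int()[0]
print(f"Egger intercept = {intercept:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print(f"approximate one-tailed p = {fit.pvalues[0] / 2:.3f}")
```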
The funnel plot for our sample of studies is presented in Fig. 2. The quantitative indices suggested that there was no publication
bias in our sample of studies. Egger's test was not statistically significant (intercept = −0.497; CI: −1.634, 0.640; p = .185, 1-tailed),
and the trim-and-fill procedure indicated that no additional studies are needed in order to obtain a symmetrical funnel plot. Therefore, we
can conclude that publication bias is not present in the sample of studies included in our review.

Fig. 2. The funnel plot for publication bias.
6. Discussion and conclusion
In the past decades, researchers and practitioners have invested substantial effort in developing and evaluating IDPs aimed at
improving the instructional approach used by academics (Hodgson & Wilkerson, 2014; Saroyan & Trigwell, 2015). Although this vast
literature has already been reviewed several times, none of the nine previous reviews discussed the impact of IDPs in terms of effect sizes.
Therefore, we have limited information regarding the general effect size of IDPs in academia, or about the different factors that could
increase or decrease IDPs effectiveness. Thus, the main objective of the present work was to summarize the existing
quantitative evidence regarding the impact of IDPs in academia. To achieve this goal, we carried out a systematic literature search
that yielded 20 controlled studies (reporting 23 effect sizes).
Table 7
The effectiveness of IDPs for levels and types of evaluation outcome of the program.

Outcome level                        k    d     SE (d)  Var (d)  95% CI lower  95% CI upper  Z      p      Q       df (Q)  p     I²
Overall academics-level effects      11   .315  .080    .006     .158          .471          3.938  <.001  5.421   10      .861  <.001
  Teachers' behavior                 3    .357  .167    .028     .030          .685          2.137  .033   1.622   2       .444  <.001
  Teachers' conceptions              6    .281  .148    .022     -.009         .571          1.902  .057   1.360   5       .929  <.001
  Teachers' skills                   3    .444  .302    .091     -.147         1.035         1.471  .141   4.414   2       .110  54.690
Overall student-level effects        16   .396  .079    .006     .242          .550          6.072  <.001  16.722  15      .336  10.299
  Students' approaches to learning   4    .114  .176    .031     -.231         .458          .646   .518   0.151   2       .985  <.001
  Students' perceptions              14   .446  .081    .007     .288          .605          5.536  <.001  14.044  13      .371  7.435

Notes: k = the number of studies included in the analysis; d = the averaged effect size; SE = standard error of the averaged effect size; Var (d) = variance of the averaged effect size; Z = the statistical test used for computing the significance of the average effect size; Q = the statistical test used for the estimation of heterogeneity; I² = the degree of inconsistency in the studies' results.
6.1. Implications for instructional development practice and future research
Our overall results indicated that IDPs have a small effect size and that this effect is similar from one study to
another (i.e., it is a homogeneous effect size). This finding suggests that the development and implementation of IDPs enhanced
self-reported and student-reported teaching behaviors in higher education. Similar conclusions were advanced by Schneider and
Preckel (2017), who reviewed 38 meta-analyses published between 1980 and 2014. The studies included in these 38 meta-analyses
covered more than two million students. Based on this input, Schneider and Preckel presented 105 factors associated with
achievement in higher education and, in their conclusion section, argued that their findings highlight the importance of the
pedagogical training of teachers in higher education (Schneider & Preckel, 2017).
6.1.1. Trainees’ characteristics
Because the majority of the previous reviews emphasized the necessity to investigate the relevance of trainees' characteristics for
the final effect of an IDP (e.g., De Rijdt et al., 2013; Levinson-Rose & Menges, 1981; Stes et al., 2010a), we analyzed the impact of
three participants' characteristics: enrollment, specialization, and gender. Our results indicated that mandatory participation of
academics in IDPs generated stronger effects, as compared with voluntary participation. This result does not converge with previous
studies that presented voluntary participation as a characteristic of a successful IDP (e.g., Cilliers & Herman, 2010; McAlpine,
2003). However, we must emphasize that this result should be treated with caution because only three included studies had
mandatory participation (Ebrahimi & Kojuri, 2012; Lewis & Orvis, 1973; Notzer & Abramovitz, 2008). Moreover, we did not carry out a
direct comparison between mandatory IDPs and voluntary IDPs. Thus, it is more accurate to say that mandatory IDPs can be an
effective training practice than to claim that they are a better practice than voluntary IDPs. Therefore, this result should encourage
researchers to provide more evidence regarding the mandatory enrollment of academics in IDPs.
Stes and her collaborators (2010a) concluded that discipline-specific IDPs have an impact similar to that of discipline-general
IDPs. Our results suggested that this might not be the case, because IDPs in which participants have the same specialization tend to
have larger effect sizes than IDPs in which participants have different specializations. One possible explanation for this divergent
finding could be that the discipline-specific IDPs in our sample come almost exclusively from the medical sciences. Practices of
instructional development for medical academics have a longer history than similar practices in other fields (Hodgson &
Wilkerson, 2014). Therefore, our results suggest that medical instructional development practice is an important reference for
academic developers interested in discipline-general IDPs, or in other discipline-specific IDPs (e.g., economics or math).
Finally, because none of the previous reviews advanced conclusions on the relationship between participants' gender balance and
IDPs' effectiveness (e.g., Steinert et al., 2016; Stes et al., 2010a), we also investigated this potential moderator. Our results
indicated a near-zero relationship between participants' gender and the final impact of IDPs. However, these results should be treated
with caution because the majority of participants were male in most studies (i.e., 11 out of 12).
Our results suggested that participants' characteristics can be important for the success of an IDP. However, because of poor
reporting practices, we could analyze only three such characteristics. In order to improve our knowledge of the relevance of trainees'
characteristics, future research studies should provide an adequate report of participants' characteristics (as proposed in the third
section of the comprehensive framework, see Supplemental Material). Also, future research should consider the learner characteristics
proposed by De Rijdt et al. (2013) as variables that could influence the final impact of an IDP (e.g., amount of experience or
motivation).
6.1.2. Features of the IDPs
Previous literature reviews (Amundsen & Wilson, 2012; De Rijdt et al., 2016, 2013; Stes et al., 2010a) presented various
conclusions regarding the importance of IDP features but did not address the issue of IDPs effectiveness using effect sizes. We
analyzed the impact of different features of the IDPs (i.e., duration, instructional focus, nature of instruction, transfer of learning, and
responsibility approach). With the exception of duration and instructional focus, our results generally suggested that the features of IDPs
do not moderate the IDPs effect.
According to our results, one-time events reported stronger effect sizes, as compared with events that were extended over time.
The higher impact reported for short IDPs was also confirmed when we calculated the impact of IDPs based on their overall duration:
IDPs lasting less than 15 h reported the highest effect size. These results stand in sharp contrast with the conclusions drawn by
previous reviewers (e.g., De Rijdt et al., 2013; McAlpine, 2003; Steinert et al., 2016; Stes et al., 2010a), who suggested that extended-
over-time events tended to produce more positive effects than one-time events. A more thorough analysis of these short
IDPs highlighted that nine out of the ten one-time events aimed at improving teachers' skills. So, it seems that the relationship
between the duration of the program and its instructional focus is what makes those IDPs effective. A possible explanation for this
unexpected result comes from cognitive load theory (Sweller, 1976; Schnotz & Kürschner, 2007). From the cognitive load
perspective, it is possible that shorter, skill-focused IDPs have higher effects because they are less demanding (i.e., impose a smaller
cognitive load), as compared with long-term, more complex IDPs that impose a high cognitive load. However, this hypothesis should be
tested by future research studies. From a practical point of view, one-time IDPs appear to be effective when their aim is to increase
teachers' skills. Therefore, the general conclusion that one-time events reported higher effect sizes than extended-over-time events
should be treated with caution; we cannot formulate any conclusions regarding the effect of short-term IDPs that do not focus on
teaching skills. Furthermore, it is possible that the key to IDPs effectiveness lies in the internal connection between various elements
of IDPs, not in any single training feature. This conclusion was also anticipated by Stes et al.
(2010a) and Amundsen and Wilson (2012) in their reviews. Thus, future research could advance the knowledge in the field by
describing the IDP in detail (as proposed in the fourth and fifth sections of our comprehensive framework, see Supplemental Material)
and by adding more information regarding the core characteristics of the program.
6.1.3. Outcomes of the IDPs
Our results confirmed the positive effects of IDPs on almost all levels of outcomes, and the effect size was similar across
outcomes. Unfortunately, we could not estimate whether the effects of IDPs at the teacher level are also present at the student level,
because only three studies reported results on both levels (Gibbs & Coffey, 2004; Erickson & Sheehan, 1976 – for both experimental
conditions; Mladenovici et al., 2019). Like Stes et al. (2010a) nine years ago, we did not find any controlled studies that reported
results at the institutional level; therefore, we cannot formulate any conclusions regarding the effectiveness of IDPs at this level.
Based on our results, we can conclude that the literature needs more studies that investigate the effects of IDPs at the institutional
or student level (going beyond students' perceptions of teachers' behaviour, and considering students' approaches to learning or
students' achievement). Investigations into changes in teachers' behaviour as perceived by experts and/or peers (not only
participants' self-reports) could also be useful for completing this picture. In addition, there is an even stronger need to understand the
network of interrelationships between different IDP outcomes. Studies that investigate the impact of IDPs on three or four different
outcomes (e.g., teachers' behavior, students' perceptions, and students' achievement) would advance our understanding of IDPs
effects. Lessons from the organizational psychology field (e.g., theoretical frameworks, research designs etc.) could be very useful for
carrying out such studies (Bell, Tannenbaum, Ford, Noe, & Kraiger, 2017).
6.1.4. Duration of the IDPs effect
Our analysis regarding the time lag between the end of the IDP and the evaluation moment provided inconclusive results. Although
the highest effect size was reported when IDPs effectiveness was measured one year after the intervention, the studies that measured
the effectiveness of IDPs using time lags between one and three months reported the lowest effect size. Only three studies (i.e., Baroffio
et al., 2006; Ho et al., 2001; Mladenovici et al., 2019) measured the impact of IDPs at two or more moments (e.g., at the
end of the IDP and after one year). More evidence is needed to improve our understanding regarding the duration of the IDPs' effects,
using longitudinal designs that investigate the impact of IDPs at different moments in time.
6.2. Limitations
The findings of the present review have some limitations, as follows. Firstly, more than half of the IDPs analyzed in this review
were conducted in the USA (12 out of 23 IDPs), eight in Europe, and the remaining three in Asia. Therefore, more research is needed
to ensure higher levels of generalization. Secondly, during our literature review process, we identified 42 studies that met the
eligibility criteria and had an adequate research design for inclusion in a meta-analysis. Out of these, 23 papers did not include the
complete statistical data needed to compute an effect size and, consequently, were excluded. Despite this limitation, based on our
publication bias analysis we believe that it is unlikely that these studies would have had a considerable impact on the overall effect
size of IDPs. However, authors and editors should pay more attention to the process of reporting research data. In order to facilitate
such a process, we advanced a comprehensive framework for reporting future research studies in the field (see Supplemental
Material).
Another limitation of the present review is that we could not estimate the impact of study quality on our overall results. As
Cheung and Slavin (2016) concluded, the effect sizes reported in Education Sciences are stronger in published articles and quasi-
experimental, small-scale trials, as compared with unpublished and large-scale studies. Because we included only controlled trials, all
our eligible sources were published articles that reported quasi-experimental, small-scale trials. Therefore, we could not divide our
sample into high-quality and poorly designed studies in order to investigate this possible moderator variable. This limitation also
has a potential impact on the validity of the measured outcomes and on the types of IDPs included in these small-scale trials.
Finally, in the present review we estimated the IDPs' effectiveness by contrasting the outcomes of an intervention group with the
outcomes of a passive control group. Although this is a common practice in literature reviews aimed at assessing intervention
effectiveness, it also generates a limitation regarding our moderator analyses: we could only perform indirect comparisons
(e.g., mandatory IDPs were compared with their control groups, and voluntary IDPs were compared with their control groups),
not direct comparisons (i.e., we did not compute the difference between the outcomes of a mandatory group and a voluntary group).
Therefore, we encourage future research studies to include both passive and active control groups (i.e., groups that receive a
different IDP).
6.3. Conclusion
In the present review we found a small effect size of IDPs for academics. For the first time, we used a quantitative framework to
investigate the evidence regarding the impact of different moderators on the final effect of an IDP. Although it seems that not many
factors increase or decrease the final effect of an IDP, one of the most relevant factors would seem to be the internal connection
between features of instructional development (e.g., duration and goals).
Generally, all research design suggestions and topics that have been advanced by the previous reviewers are still valid for future
developments in the field. However, additional insights could be gained from longitudinal controlled studies that assess the effec-
tiveness of IDPs on different levels of outcomes and at different moments over time. In order to facilitate the understanding of IDPs
effects, more attention should be paid to describing the instructional programs and their institutional context in detail. Finally,
comprehensive reporting of statistical data would be very useful for increasing knowledge in the instructional development
literature.
CRediT authorship contribution statement
Marian D. Ilie: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project
administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Laurențiu P.
Maricuțoiu: Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision,
Validation, Visualization, Writing - original draft, Writing - review & editing. Daniel E. Iancu: Data curation, Formal analysis,
Investigation, Methodology, Project administration, Resources, Software, Validation. Izabella G. Smarandache: Data curation,
Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation. Velibor Mladenovici: Data
curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation. Dalia C.M. Stoia:
Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation. Silvia A. Toth:
Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation.
Acknowledgments
The first two authors contributed equally to this paper. The first author is also the corresponding author. This work was partially
supported by two grants of the Romanian Ministry of National Education, CNFIS-UEFISCDI, projects number CNFIS-FDI-2017-0518
and CNFIS-FDI-2018-0063. This organization had no role in the design and implementation of the study.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.edurev.2020.100331.
References
References marked with an asterisk indicate studies included in the present review.
∗Alemoni, L. M. (1978). The usefulness of student evaluation in improving college teaching. Instructional Science, 7, 95–105. https://doi.org/10.1007/BF00121277.
American Educational Research Association (2006). Standards on reporting on empirical social science research in AERA publications. Educational Researcher, 35,
33–40. https://doi.org/10.3102/0013189x035006033.
American Psychological Association (2002). Criteria for evaluating treatment guidelines. American Psychologist, 57, 1052–1059. https://doi.org/10.1037//0003-066X.
57.12.1052.
Amundsen, C., & Wilson, M. (2012). Are we asking the right questions? A conceptual review of the educational development literature in higher education. Review of
Educational Research, 82, 90–126. https://doi.org/10.3102/0034654312438409.
Bailey, J. G. (1999). Academics' motivation and self-efficacy for teaching and research. Higher Education Research and Development, 18, 343–359. https://doi.org/10.
1080/0729436990180305.
∗Baroffio, A., Nendaz, M. R., Perrier, A., Layat, C., Vermeulen, B., & Vu, N. V. (2006). Effect of teaching context and tutor workshop on tutorial skills. Medical Teacher,
28, 112–119. https://doi.org/10.1080/01421590600726961.
Bell, B. S., Tannenbaum, S. I., Ford, J. K., Noe, R. A., & Kraiger, K. (2017). 100 years of training and development research: What we know and where we should go.
Journal of Applied Psychology, 102, 305–323. https://doi.org/10.1037/apl0000142.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive meta-analysis (version 2) [computer software]. Englewood: Biostat.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.
Cheung, A. C., & Slavin, R. E. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45, 283–292. https://doi.org/10.3102/
0013189X16656615.
Cilliers, F. J., & Herman, N. (2010). Impact of an educational development program on teaching practice of academics at a research-intensive university. International
Journal for Academic Development, 15, 253–267. https://doi.org/10.1080/1360144X.2010.497698.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
∗Cole, A. K., Barker, L. R., Kolodner, K., Williamson, P., Wright, S. M., & Kern, D. E. (2004). Faculty development in teaching skills: An intensive longitudinal model.
Academic Medicine, 79, 469–480. Retrieved from: https://journals.lww.com/academicmedicine/Fulltext/2004/05000/Faculty_Development_in_Teaching_Skills__
An.19.aspx.
Collins, M. L. (1978). Effects of enthusiasm training on preservice elementary teachers. Journal of Teacher Education, 29, 53–57. https://doi.org/10.1177/
002248717802900120.
De Rijdt, C., Dochy, F., Bamelis, S., & Van Der Vleuten, C. (2016). Classification of staff development programmes and effects perceived by teachers. Innovations in
Education & Teaching International, 53, 179–190. https://doi.org/10.1080/14703297.2014.916543.
De Rijdt, C., Stes, A., Van Der Vleuten, C., & Dochy, F. (2013). Influencing variables and moderators of transfer of learning to the workplace within the area of staff
development in higher education: Research review. Educational Research Review, 8, 48–74. https://doi.org/10.1016/j.edurev.2012.05.007.
∗Ebrahimi, S., & Kojuri, J. (2012). Assessing the impact of faculty development fellowship in Shiraz University of medical sciences. Archives of Iranian Medicine, 15,
79–81 doi:012152/AIM.005.
∗Erickson, G. R., & Sheehan, D. S. (1976). An evaluation of a teaching improvement process for university faculty. Paper presented at the Annual meeting of the American
educational research association. San Francisco, April 19-23, 1976. Retrieved from https://eric.ed.gov/?id=ED131111.
∗Gibbs, G., & Coffey, M. (2004). The impact of training of university teachers on their teaching skills, their approach to teaching and the approach to learning of their
students. Active Learning in Higher Education, 5, 87–100. https://doi.org/10.1177/1469787404040463.
∗Goldman, J. A. (1978). Effect of a faculty development workshop upon self-actualization. Education, 98, 254–258. Retrieved from https://eric.ed.gov/?id=
EJ178233.
Gwet, K. L. (2012). Handbook of interrater reliability (3rd ed.). Maryland: Advanced Analytics, LLC.
Henson, R. K. (2006). Effect size measures and meta-analytic thinking in counseling psychology research. The Counseling Psychologist, 34, 601–629. https://doi.org/10.
1177/0011000005283558.
∗Hewson, M. G., & Copeland, H. L. (1999). Exploring the outcomes of faculty development programs. Academic Medicine, 74, s68–s71 doi not available.
Hodgson, C. S., & Wilkerson, L. (2014). Faculty development for teaching improvement. In Y. Steinert (Vol. Ed.), Faculty development in the health professions: A focus on
research and practice. Innovation and changes in professional education: Vol. 11, (pp. 29–52). Dordrecht, Netherlands: Springer Science + Business Media.
∗Ho, A., Watkins, D., & Kelly, M. (2001). The conceptual change approach to improving teaching and learning: An evaluation of a Hong Kong staff development
programme. Higher Education, 42, 143–169. https://doi.org/10.1023/A:1017546216800.
Kallenbach, W. W., & Gall, M. D. (1969). Microteaching versus conventional methods in training elementary intern teachers. Journal of Educational Research, 63,
136–141. https://doi.org/10.1080/00220671.1969.10883958.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler Publishers.
Levinson-Rose, J., & Menges, R. J. (1981). Improving college teaching: A critical review of research. Review of Educational Research, 51, 403–434. https://doi.org/10.
3102/00346543051003403.
∗Lewis, D. R., & Orvis, C. C. (1973). A training system for graduate student instructors of introductory economics at the University of Minnesota. The Journal of
Economic Education, 5, 38–46. https://doi.org/10.1080/00220485.1973.10845380.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
McAlpine, L. (2003). Het belang van onderwijskundige vorming voor student gecentreerd onderwijs [The importance of instructional development for student
centered teaching]. In N. Druine, M. Clement, & K. Waeytens (Eds.). Dynamiek in het hoger onderwijs [Dynamics in higher education] (pp. 57–71). Leuven, Belgium:
Universitaire Pers.
∗Mladenovici, V., Ilie, M. D., Maricuțoiu, L. P., Iancu, D. E., & Smarandache, I. G. (2019). Effectiveness of an instructional development program for academics: Teachers'
approaches to teaching, students' perceptions of teaching and approaches to learning. Unpublished manuscript.
∗Notzer, N., & Abramovitz, R. (2008). Can brief workshops improve clinical instruction? Medical Education, 42, 152–156. https://doi.org/10.1111/j.1365-2923.2007.
02947.x.
∗Payne, D. A., & Hobbs, A. M. (1979). The effect of college course evaluation feedback on instructor and student perceptions of instructional climate and effectiveness.
Higher Education, 8, 525–533. https://doi.org/10.1007/BF00139792.
Prebble, T., Hargraves, H., Leach, L., Naidoo, K., Suddaby, G., & Zepke, N. (2004). Impact of student support services and academic development programmes on student
outcomes in undergraduate tertiary study: A synthesis of the research. Report to the Ministry of education. Massey University College of Education.
Prentice, D. A., & Miller, D. T. (1992). When small effects are impressive. Psychological Bulletin, 112, 160–164. https://doi.org/10.1037/0033-2909.112.1.160.
Renta-Davids, A. I., Jiménez-González, J. M., Fandos-Garrido, M., & González-Soto, A. P. (2016). Organizational and training factors affecting academic teacher
training outcomes. Teaching in Higher Education, 21, 219–231. https://doi.org/10.1080/13562517.2015.1136276.
Rienties, B., Brouwer, N., & Lygo-Baker, S. (2013). The effects of online professional development on higher education teachers' beliefs and intentions towards learning
facilitation and technology. Teaching and Teacher Education, 29, 122–131. https://doi.org/10.1016/j.tate.2012.09.002.
Saroyan, A., & Trigwell, K. (2015). Higher education teachers' professional learning: Process and outcome. Studies in Educational Evaluation, 46, 92–101. https://doi.
org/10.1016/j.stueduc.2015.03.008.
Schneider, M., & Preckel, F. (2017). Variables associated with achievement in higher education: A systematic review of meta-analyses. Psychological Bulletin, 143,
565–600. https://doi.org/10.1037/bul0000098.
Schnotz, W., & Kürschner, C. (2007). A reconsideration of cognitive load theory. Educational Psychology Review, 19, 469–508. https://doi.org/10.1007/s10648-007-
9053-4.
∗Skeff, M. K. (1983). Evaluation of a method for improving the teaching performance of attending physicians. The American Journal of Medicine, 75, 465–470. https://
doi.org/10.1016/0002-9343(83)90351-0.
∗Skeff, M. K., Stratos, G., Campbell, M., Cooke, M., & Jones, H. W. (1986). Evaluation of the seminar method to improve clinical teaching. Journal of General Internal
Medicine, 1, 315–322. https://doi.org/10.1007/BF02596211.
Smith, G. (1992a). A categorization of models of staff development in higher education. British Journal of Educational Technology, 23, 39–47. https://doi.org/10.1111/j.
1467-8535.1992.tb00308.x.
Smith, G. (1992b). Responsibility for staff development. Studies in Higher Education, 17, 27–41. https://doi.org/10.1080/03075079212331382746.
Steinert, Y., Mann, K., Anderson, B., Barnett, B. M., Centeno, A., Naismith, L., et al. (2016). A systematic review of faculty development initiatives designed to enhance
teaching effectiveness: A 10-year update: BEME no. 40. Medical Teacher, 38, 769–786. https://doi.org/10.1080/0142159X.2016.1181851.
Steinert, Y., Mann, K., Centeno, A., Dolmans, D., Spencer, J., Gelula, M., et al. (2006). A systematic review of faculty development initiatives designed to improve
teaching effectiveness in medical education: BEME no. 8. Medical Teacher, 28, 497–526. https://doi.org/10.1080/01421590600902976.
∗Stes, A., Coertjens, L., & Van Petegem, P. (2010b). Instructional development for teachers in higher education: Impact on teaching approach. Higher Education, 60,
187–204. https://doi.org/10.1007/s10734-009-9294-x.
∗Stes, A., Coertjens, L., & Van Petegem, P. (2013a). Instructional development in higher education: Impact on teachers' teaching behaviour as perceived by students.
Instructional Science, 41, 1103–1126. https://doi.org/10.1007/s11251-013-9267-4.
Stes, A., Gijbels, D., & Van Petegem, P. (2008). Student-focused approaches to teaching in relation to context and teacher characteristics. Higher Education, 55,
255–267. https://doi.org/10.1007/s10734-007-9053-9.
∗Stes, A., Maeyer, S. D., Gijbels, D., & Van Petegem, P. (2012a). Instructional development for teachers in higher education: Effects on students' perceptions of the
teaching–learning environment. British Journal of Educational Psychology, 82, 398–419. https://doi.org/10.1111/j.2044-8279.2011.02032.x.
∗Stes, A., Maeyer, S. D., Gijbels, D., & Van Petegem, P. (2012b). Instructional development for teachers in higher education: Effects on students' learning outcomes.
Teaching in Higher Education, 17, 295–308. https://doi.org/10.1080/13562517.2011.611872.
∗Stes, A., Maeyer, S. D., Gijbels, D., & Van Petegem, P. (2013b). Effects of teachers' instructional development on students' study approaches in higher education.
Studies in Higher Education, 38, 2–19. https://doi.org/10.1080/03075079.2011.562976.
Stes, A., Min-Lelived, M., Gijbels, D., & Van Petegem, P. (2010a). The impact of instructional development in higher education: The state-of-the-art of the research.
Educational Research Review, 5, 25–49. https://doi.org/10.1016/j.edurev.2009.07.001.
Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in Academic journals in education and psychology.
Journal of Educational Psychology, 102, 989–1004. https://doi.org/10.1037/a0019507.
Sweller, J. (1976). The effect of task complexity and sequence on rule learning and problem solving. British Journal of Psychology, 67, 553–558. https://doi.org/10.
1111/j.2044-8295.1976.tb01546.x.
Taylor, L., & Rege Colet, N. (2009). Making the shift from faculty development to educational development: A conceptual framework grounded in practice. In A.
Saroyan, & M. Frenay (Eds.). Building teaching capacities in higher education: A comprehensive international model. Sterling, VA: Stylus Publishing.
Weimer, M., & Lenze, L. F. (1998). Instructional interventions: A review of the literature on effort to improve instruction. In R. Perry, & J. Smart (Eds.). Effective
teaching in higher education (pp. 205–240). New York, NY: Agathon Press.