LANGUAGE
TEACHING
RESEARCH
https://doi.org/10.1177/1362168820981403
Language Teaching Research
2023, Vol. 27(5) 1268–1292
© The Author(s) 2020
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/1362168820981403
journals.sagepub.com/home/ltr
The flipped classroom in
second language learning:
A meta-analysis
Joseph P. Vitta
Rikkyo University, Japan
Ali H. Al-Hoorie
Royal Commission for Jubail and Yanbu, Saudi Arabia
Abstract
Flipped learning has become a popular approach in various educational fields, including second
language teaching. In this approach, the conventional educational process is reversed so that
learners do their homework and prepare the material before going to class. Class time is then
devoted to practice, discussion, and higher-order thinking tasks in order to consolidate learning.
In this article, we meta-analysed 56 language learning reports involving 61 unique samples
and 4,220 participants. Our results showed that flipped classrooms outperformed traditional
classrooms, g = 0.99, 95% CI (0.81, 1.17), z = 10.90, p < .001. However, this effect had high
heterogeneity (about 86%), while applying the Trim and Fill method for publication bias made it
shrink to g = 0.58, 95% CI (0.37, 0.78). Moderator analysis also showed that reports published
in non-SSCI-indexed journals tended to find larger effects compared to indexed ones, conference
proceedings, and university theses. The effect of flipped learning did not seem to vary by age,
but it did vary by proficiency level in that the higher the proficiency, the higher the effects. Flipped
learning also had a clear and substantial effect on most language outcomes. In contrast, whether
the intervention used videos and whether the platform was interactive did not turn out to be
significant moderators. Meta-regression showed that longer interventions resulted in only a slight
reduction in the effectiveness of this approach. We discuss the implications of these findings and
recommend that future research moves beyond asking whether flipped learning is effective to
when and how its effectiveness is maximized.
Keywords
CALL, flipped learning, foreign language learning, research synthesis, second language learning
Joseph P. Vitta is now affiliated with Kyushu University, Japan
Corresponding author:
Joseph P. Vitta, Kyushu University, Fukuoka Prefecture, Fukuoka, 819-0395, Japan.
Email: vittajp@flc.kyushu-u.ac.jp
Article
Vitta and Al-Hoorie 1269
I Introduction
Education has traditionally been viewed as the transfer of information from the teacher
to learners within the context of the classroom, though the desire to move away from
this paradigm has existed for some time (e.g., Freire, 1968/1970). Although flipped
learning was not the first paradigm to challenge this traditional model, it has recently
emerged as a popular and topical alternative to teacher-dominated instruction across
various educational domains (van Alten, Phielix, Janssen, & Kester, 2019) and espe-
cially in the second language (L2) field (Mehring & Leis, 2018). Flipped learning (or
flipped classrooms) is colloquially described as a process of ‘flipping’ what has tradi-
tionally been done inside the classroom to independent homework activities preced-
ing the lesson. Thus, the lesson involves problem-solving and higher-order thinking
tasks traditionally assigned to subsequent homework activities (Låg & Sæle, 2019;
Mehring, 2016, 2018). Over the past few years, flipped learning has become
one of the most discussed trends in education for both practitioners and researchers.
Consider that a non-profit organization, the Flipped Learning Network™
(www.flippedlearning.org; see Hamdan, McKnight, McKnight, & Arfstrom, 2013), has also
been established to help teachers flip their classrooms more effectively, while conferences
are regularly taking place around the globe for teachers to share techniques and
tips on this approach.
Another clear indication of the interest flipped learning has generated is the amount
of research conducted on it. Meta-analyses and systematic reviews subsequently appeared
across varied domains such as higher education (Lundin, Bergviken Rensfeldt, Hillman,
Lantz-Andersson, & Peterson, 2018), engineering education (Lo & Hew, 2019), health
professions education (Hew & Lo, 2018), nursing education (Xu et al., 2019), and L2
learning (Turan & Akdag-Cimen, 2019). In all of these review studies, the number of
flipped reports has increased dramatically over time. The same trend is observed in com-
prehensive reviews comparing the effectiveness of flipped learning across educational
domains (e.g. Cheng, Ritzhaupt, & Antonenko, 2019; Låg & Sæle, 2019; Shi, Ma,
MacLeod, & Yang, 2020). In sum, flipped learning has grown to be one of the most
influential phenomena within the broad educational arena.
While flipped classroom research has grown exponentially in recent years and has
been the focus of several meta-analyses and systematic reviews, the effectiveness of the
flipped classroom for L2 learning has admittedly been under-researched. Consider a
recent synthesis by Turan and Akdag-Cimen (2019), who conducted a systematic review
of 43 published L2 reports. Because their work was a systematic review of published
reports, their article does not present summary effect size estimates or moderator analy-
ses, nor does its scope cover unpublished reports, thus raising the risk of publication bias.
In their recent meta-analysis of flipped classroom interventions across educational
domains, Shi et al. (2020) included only six L2 reports that were subsumed under a gen-
eral ‘social sciences’ label. Strelan, Osborn, and Palmer (2020) in a similar vein sub-
sumed second language flipped reports under a broader ‘humanities’ label. L2 reports
were also subsumed under humanities in another comprehensive recent flipped meta-
analysis (Låg & Sæle, 2019) with the authors noting that about 70% (k = 23) of these
humanities reports were L2-focused.
1270 Language Teaching Research 27(5)
Considering the above, meta-analytic work focused specifically on L2 flipped learning
appears to be lacking. This is problematic given the diverse range of skills (e.g. writing and
reading) and underlying competencies (e.g. vocabulary and pragmatics) that underpin L2
proficiency theory and frameworks (Council of Europe, 2001, 2011; Green, 2012;
Halliday & Matthiessen, 2014). These skills and competencies often act as L2 learning
outcomes, as highlighted in Green’s (2012) presentation of the English Profile Programme,
an L2 curriculum plan operationalizing the guidelines of the Common European Framework
of Reference for Languages (CEFR; Council of Europe, 2001, 2011). The subsuming of L2
studies under ‘humanities’ without further exploration (e.g. Strelan et al., 2020) does not fully account for the
effectiveness of L2 flipped learning given the range of skills and competencies that
underpin language learning. In other words, an L2-focused meta-analysis of flipped
learning interventions is needed to better understand how effective flipped learning is
within the multidimensional L2 space, and we address this gap in this current article. As
with recent meta-analyses of L2 classroom pedagogy, such as Bryfonski and McKay’s
(2019) task-based learning meta-analysis, the current meta-analysis has implications for
both future research and practice, and this has been considered in its construction and
analysis of findings.
II Flipped learning
1 Definition of flipped learning
Despite its popularity, flipped learning has been somewhat inconsistently defined by edu-
cational researchers and practitioners (Mehring & Leis, 2018; van Alten et al., 2019). In a
general sense, there is agreement on the ‘flipped’ or ‘inverted’ aspect of the approach,
where classroom teaching and independent learning are switched. The disagreement is on
what exactly flipping a classroom means. For some (e.g. Bergmann & Sams, 2012;
Mehring, 2018), the essence of the flipped approach is this pedagogical shift to presenting
new content before class, allowing the teacher and students to apply this new content in
meaningful ways during class time. The manner in which the content is presented to stu-
dents outside of the class (i.e. via technology or not) is assumed to be inconsequential.
From this perspective, flipped learning has been described as having its roots in the 1980s
when active learning emerged within educational circles emphasizing learning by doing
(Ryback & Sanders, 1980). For other flipped theorists (e.g. Adnan, 2017; Evseeva &
Solozhenko, 2015), flipped learning is heavily dependent on (digital) technology to allow
students to engage with new content. This latter definition appears to be especially popular
in recent L2 flipped learning scholarship, where both primary studies (e.g. Chen Hsieh,
Wu, & Marek, 2017; Hung, 2015, 2017) and research syntheses (e.g. Turan & Akdag-
Cimen, 2019) have included or emphasized technology in their definitions of flipped appli-
cations. In the context of the present study, we have followed the example of recent flipped
learning meta-analyses (e.g. Låg & Sæle, 2019) and adopted a general definition where: “A
flipped intervention first involves presentation of new content to learners to be indepen-
dently studied before class, and then class time is devoted to reinforcing and engaging
with the ‘flipped’ content”.
2 Anticipated effects of flipped learning
In addition to disagreement on certain definitional specifics, the case has been made both
for and against flipped learning. The case for flipped learning is grounded in its optimiza-
tion of class time use (Mehring & Leis, 2018; Voss & Kostka, 2019). Compared to tradi-
tional lecturing, flipped learning pushes learners toward developing ‘the upper cognitive
levels of the taxonomy where knowledge application and skill building are happening’
(Davis, 2016, p. 2). Concerns about flipped classrooms have been expressed along at least
three lines.
The first concern is that learners have to be able to comprehend the flipped content
independently. This can be challenging for learners with lower proficiency levels, espe-
cially when such material is in the target language. In this vein, Milman (2012) suggested
that flipped learning might not be ideal for such learners as they may not have the chance
to ask for clarifications in real time. Milman also posited that procedural, factual, con-
ceptual, and metacognitive knowledge were best for flipped interventions, while L2
learning outcomes such as initial vocabulary learning (meaning to form mapping; Nation,
2013) may not always clearly fit into these areas.
The second concern is that the demands of flipped learning can be impractical. Mehring
(2016), for instance, argues that flipping classes requires a substantial amount of effort
and planning from the teacher. On the student side, because this approach requires each
student to show a level of proactivity and self-directedness in learning the new content
before class, flipped lessons could fail if the students, in the aggregate, do not effectively
perform the activities assigned to them (Mehring, 2018). Language learners, especially
those with lower proficiency levels, may not be able to benefit fully from independent
study outside of class.
Finally, there is also the argument, as summarized by Webb and Doman (2016), that
modern language teaching approaches such as communicative language teaching and
thereby task-based language teaching (see Ellis, 2009) are essentially flipped learning by
another name. The argument here is that students prepare for the communicative activity and/or
task before engaging with it (e.g., pre-task; see Ellis, 2009) and thus the ‘flip’ has already
been baked into these approaches. Mehring (2016, 2018) noted, however, that flipped learning
is not defined by interaction or even pre-task planning but by the agency that students
are provided while studying content on their own and then by the active engagement with
that content during class time. Overall, the above concerns make it essential to find out how
effective flipped learning is when it comes to the long and arduous process of L2 learning,
and whether its effectiveness varies by learner level and target L2 outcome.
3 Applications of L2 flipped learning
Considering the various skills involved in L2 learning, research has examined the effec-
tiveness of the flipped approach on different learning outcomes. For example, some
interventions targeted writing performance (e.g. Leis, Cooke, & Tohei, 2015), vocabu-
lary development (e.g. Oh, 2017), and standardized tests such as TOEIC (e.g. Ishikawa
et al., 2015). These investigations were implemented on a range of age groups, from
elementary (e.g. Baş & Kuzucu, 2009) to adult learners (e.g. Karimi & Hamzavi, 2017).
This body of research has also relied heavily on technology. Some research reflected
the argument that flipped learning went hand in hand with technology by emphasizing
the use of technology, such as videos and apps, to deliver the content outside of the classroom
(Alnuhayt, 2018; Chen Hsieh et al., 2017), though little space is usually devoted to
explaining how extra class time was used. Some of these applications adopted a Web 1.0
framework (Lomicka & Lord, 2016). Mori, Omori, and Sato (2016), for instance, used
PowerPoint and other one-way technology to flip their teaching of Japanese writing
characters, kanji. On the other hand, technology was also used to facilitate student inter-
action outside of the class via learning management systems and well-known Web 2.0
applications such as chat boards and blogs (Lin & Hwang, 2018; Lin, Hwang, Fu, &
Chen, 2018).
Given the flexibility of this approach, the wide range of ways in which flipped learning
has been applied should not be surprising. Theorists such as Bergmann and Sams
(2012) and Mehring (2016, 2018) emphasized the need for flipped applications to maxi-
mize class time for higher-order thinking activities, while investigators such as Hung
(2015) and AlJaser (2017) detailed how the lesson was used to facilitate cognitively
engaging and student-centered tasks when describing their flipped interventions. On the
other hand, Chen Hsieh et al. (2017) and Alnuhayt (2018) focused more on how the fea-
tures of technology were used to ‘flip’ the content. L2 flipped learning applications have
thus varied in their contexts, learning outcomes, use and engagement with technology,
and focus on class time use.
III Past non-L2 flipped learning meta-analyses
While there have been arguments at the theoretical level for and against flipping class-
rooms in the context of L2 learning, the actual empirical evidence from these diverse
applications points to the approach being effective across many contexts and domains.
The results of a number of non-L2 meta-analyses conducted recently (see Table 1) show
that effect sizes tend to range from around 0.20 to just over 0.50. In one case (Xu et al.,
2019), the average effect size approached 1.80, which is substantial. This ‘extra’ large
effect might be attributed to the fact that this meta-analysis was limited to 22 published
reports related to the Chinese context, and so this magnitude may not be representative
of unpublished research and research published elsewhere.
Table 1. Examples of recent flipped learning meta-analyses in different disciplines.
Meta-analysis             Effect size (95% CI)     k     Domain
Strelan et al. (2020)     g = 0.50 (0.42, 0.57)    198   Cross-disciplinary
Cheng et al. (2019)       g = 0.19 (0.11, 0.27)    55    Cross-disciplinary
Låg and Sæle (2019)       g = 0.35 (0.31, 0.40)    272   Cross-disciplinary
Lo and Hew (2019)         g = 0.29 (0.17, 0.41)    29    Engineering education
Shi et al. (2020)         g = 0.53 (0.36, 0.70)    60    Cross-disciplinary
van Alten et al. (2019)   g = 0.36 (0.28, 0.44)    114   Cross-disciplinary
Xu et al. (2019)          d = 1.79 (1.32, 2.27)    22    Nursing education in China
Furthermore, all these meta-analyses found high heterogeneity in their effects. In an
attempt to explain this heterogeneity, the researchers employed various moderators.
Examples of commonly used moderators are age of learners (or educational level), sub-
ject of study, and duration of the intervention. Låg and Sæle (2019) found that subject of
study did not predict the effectiveness of flipped learning significantly. In contrast,
Cheng et al. (2019) found that it did vary by discipline. The largest effect was for arts and
humanities, g = 0.63, 95% CI (0.16, 1.10), but the effect was non-significant for engi-
neering education, g = −0.08, 95% CI (−0.25, 0.08). Cheng et al. (2019) also investi-
gated the role of study duration. Interventions that were less than one semester tended to
obtain larger effects, though this was not significantly different from interventions last-
ing one semester or longer. Similarly, whether the intervention used videos or not did not
seem to have an effect (Lo & Hew, 2019).
Some meta-analysts examined study quality as part of moderator analysis. van Alten
et al. (2019), for example, investigated three aspects related to study quality, all of
which turned out to be non-significant. They compared allocation type (i.e. non-ran-
dom, pre-existing groups, and individual allocation), group equivalence test (tested–
equal, tested–not equal, not tested–descriptive statement, and not tested–no descriptive
statement), and report source (journal article, conference proceeding, and university
thesis). Thus, heterogeneity has been consistently obtained in flipped learning meta-
analyses, though moderators used to date tend to either fail to explain it or explain it
inconsistently.
IV The present study
As reviewed above, past meta-analyses show that the effectiveness of flipped learning
can vary by discipline, and in some cases its effectiveness is non-significant (e.g. engi-
neering education; Cheng et al., 2019). This underscores the need for an L2-focused
meta-analysis, especially since within language learning there are different skills requir-
ing different learning and teaching strategies. The present study therefore aimed to con-
tribute to the research synthesis work on flipped learning by meta-analysing L2 flipped
learning interventions. As explained above, we adopted a broad definition of flipped
learning that includes, first, learners studying the material before class whether technol-
ogy-supported or not and, second, class time is then spent on learners engaging with that
material (Låg & Sæle, 2019). In cases where technology is employed to present new
content, flipped learning, according to our definition, also becomes a specific type of
blended learning as technology is being fused with face-to-face instruction (Mahmud,
2018; Zarrinabadi & Ebrahimi, 2019). Teng (2017) captured the intersection between
flipped and blended learning when stating that flipped learning is ‘a pedagogical method
to blended learning’ (p. 114).
To be more specific, we attempted to answer the following research questions:
1. To what extent does the flipped learning approach improve L2 learning compared
to traditional classroom teaching?
2. To what extent does the effectiveness of the flipped learning approach vary by L2
learning outcome?
3. To what extent do learner characteristics (educational level and L2 proficiency
level), report characteristics (peer review and journal indexing), flipped applica-
tion characteristics (use of videos and interactive technology), and methodologi-
cal characteristics (reliability, pretesting, and duration of the intervention)
account for the observed variation in the effectiveness of flipped learning in L2
settings?
V Method
1 Inclusion criteria
In order to qualify for inclusion in the present meta-analysis, the report had to satisfy the
following inclusion criteria:
1. The report must apply a (quasi-)experimental design, whether between- or
within-group. A between-group design must involve at least one group of learn-
ers learning the material outside of class time and a comparison group learning
the same material via a traditional face-to-face approach during class time. A
within-group design must include a comparison of these two approaches alternat-
ing on the same learners.1
2. The report must establish pre-treatment equivalency between experimental
groups/conditions by either an empirical measurement (i.e. a researcher-adminis-
tered pretest relative to the outcome variable) or an argument referencing stu-
dents’ L2 ability/performance in relation to the outcome variable vis-à-vis a
standard proficiency scale, such as the CEFR.
3. The participants must be learning a language, whether English or another lan-
guage, whether as a second or additional language, and whether in a second or
foreign language context.
4. The report must include a quantitative dependent variable measuring gains in the
target language learning outcomes (e.g. vocabulary and writing proficiency).
5. The report must provide an effect size or sufficient statistics to calculate it.
6. The language of the report must be English.
2 Literature search
Following standard practice in meta-analyses, we conducted a keyword-driven database
search to build our report pool. However, given the particular features of L2 research
(discussed below), we commenced our search at the journal level and then moved to the
database level. In total, our literature search process had four stages.
Stage 1. As our meta-analysis was L2-specific, we expected the bulk of L2 flipped learn-
ing studies to be found in L2 journals. We therefore focused the initial stage of our search
on these journals (the following stages expanded this scope). We first created a list of 73
L2 and educational technology journals adapted from previous bibliometric work and
relevant flipped literature (Al-Hoorie & Vitta, 2019; Mehring, 2016; Vitta & Al-Hoorie,
2017; Zhang, 2020; for the complete list; see Appendix A). Considering the inconsist-
ency in author-supplied keywords in L2 journals (see Lei & Liu, 2019), which could
limit our ability to obtain a comprehensive list of flipped learning reports, we then uti-
lized the Scopus search engine to search articles in these journals. The Scopus search
engine permits searching the title, abstract, keyword list, and other meta-data of each
article (Burnham, 2006). We used the keywords flip*, invert*, and blend*. We included
blend* because L2 researchers tend to view flipped learning as a pedagogic approach to
blended learning (Chen Hsieh et al., 2017; Hung, 2015; Teng, 2017). Journals were
searched with an ‘all time’ parameter where each journal was searched comprehensively
without time range limitations.
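The truncation keywords above can be illustrated with a short sketch. This is not Scopus query syntax; it is a minimal approximation in which each keyword is translated into a word-prefix regular expression, and the record titles are hypothetical examples rather than reports from the pool.

```python
import re

# Truncation keywords taken from the text; the regex translation is an assumption.
KEYWORDS = ["flip*", "invert*", "blend*"]


def keyword_to_regex(keyword: str) -> re.Pattern:
    """Turn a truncation keyword such as 'flip*' into a word-prefix pattern."""
    stem = re.escape(keyword.rstrip("*"))
    return re.compile(rf"\b{stem}\w*", re.IGNORECASE)


PATTERNS = [keyword_to_regex(k) for k in KEYWORDS]


def matches_search(text: str) -> bool:
    """Return True if any keyword matches the title, abstract, or keyword text."""
    return any(p.search(text) for p in PATTERNS)


# Hypothetical records, for illustration only.
records = [
    "Flipping the EFL classroom with online videos",
    "Blended learning in a Japanese university",
    "Task-based language teaching: a review",
]
hits = [r for r in records if matches_search(r)]
```

Because blend* also retrieves blended learning reports that are not flipped, a search of this kind favors recall and leaves the narrowing to the later screening steps.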
Although this step helped us avoid relying on author-supplied keywords, we still
wanted to ensure that our Scopus search was indeed comprehensive. We manually
inspected all articles in all issues of eight relevant journals (CALL-EJ, Computer Assisted
Language Learning, ReCALL, Language Learning & Technology, CALICO Journal,
Teaching English with Technology, JALTCALL, International Journal of Computer-
Assisted Language Learning and Teaching). Each journal was inspected after its auto-
mated processing, and this manual search did not uncover additional reports not captured
by the automated Scopus search, thus raising confidence in our search protocols.
Stage 2. We then expanded the search to EBSCO and ProQuest. Within the EBSCO database,
our search covered OpenDissertations, Academic Search Ultimate, ERIC, and Education
Research Complete. Within ProQuest, our search covered Educational Database,
Linguistics Database, Psychology Database, and Social Science Database, as well as
ProQuest Thesis and Dissertation Global. In addition to the search keywords above, we
further limited the search at this stage by adding L2-specific keywords (second language
or foreign language or L2 or ESL or EFL) to filter out research conducted on other par-
ticipants. As with Stage 1, there were no time constraints, and the search was performed
at the ‘full text’ level with subsequent relevance ordering to facilitate a quicker screening
of false positives.
Stage 3. In an attempt to minimize publication bias, we issued a call for papers request-
ing reports meeting our inclusion criteria. This call for papers was announced in various
L2 outlets including Linguist List, BAALmail, Korea TESOL, and IATEFL Research
SIG, as well as social media.
Stage 4. We finally conducted a saturation search to ensure our search was comprehen-
sive. We performed an ancestry search in three recent L2 flipped learning syntheses
(Filiz & Benzet, 2018; Mahmud, 2018; Turan & Akdag-Cimen, 2019) to find out whether
they included reports not captured by our search. We also searched two generic data-
bases: Google Scholar and AskZad. These two databases contain reports from non-
indexed journals as well as theses and dissertations not found in ProQuest.
Our literature search concluded in August 2019, resulting in 56 unique reports satisfying
our inclusion criteria (for the complete list; see Appendix B). Comparing the number of
reports in our pool to the domain-specific meta-analyses in Table 1, we note that it was
larger than that by Lo and Hew (2019, k = 29), Låg and Sæle (2019, k = 23), and Xu et al.
(2019, k = 22). It was also larger than the number of quantitative reports found in L2
flipped learning systematic reviews, including Turan and Akdag-Cimen (2019, k = 21) and
Filiz and Benzet (2018, k = 25). Figure 1 presents a flow diagram of our search process.
3 Moderators
To operationalize research questions 2 and 3, we coded the reports for three groups of
moderators related to learners, report source, and design characteristics, the latter sub-
suming flipped application and methodological design features.
Regarding learner characteristics, we coded for educational stage: elementary, inter-
mediate, secondary, and university. We coded adult learners as university learners (k = 2).
Identification (total initial reports): Scopus†, n = 54,630; EBSCO, n = 1,355; ProQuest, n = 20,600; ProQuest Theses, n = 148,202; Call for Papers, n = 4.
Screening (reports retained after inspection of titles and abstracts): Scopus, n = 1,781; EBSCO, n = 719; ProQuest, n = 1,700; ProQuest Theses, n = 1,300; Call for Papers, n = 4.
Eligibility (reports retained after inspection of full texts): Scopus, n = 41; EBSCO, n = 23; ProQuest, n = 7; ProQuest Theses, n = 13; Call for Papers, n = 4.
Saturation search (ancestry search, Google Scholar, AskZad): n = 14.
Included (total reports retained after removing duplicates): n = 56.
Figure 1. Flow diagram of the search process.
Notes. † Includes all articles in the 73 journals at Stage 1.
We eventually compared secondary and university learners only due to the small number
of reports on the other educational stages (k = 3 combined). In previous L2 meta-analyses
(e.g. Bryfonski & McKay, 2019), proficiency was omitted because of the inherent diffi-
culty of standardized proficiency judgments across reports. In light of this, we imple-
mented a three-category proficiency moderator: 1) below intermediate, 2) intermediate,
and 3) above intermediate. Intermediate was anchored to B1 according to the CEFR. As
an illustration, Ishikawa et al. (2015) was coded as ‘below intermediate’ as the reported
TOEIC scores were within the A2 range of 250 to 550; Karimi and Hamzavi (2017) was
coded as intermediate since reported Cambridge PET scores established a B1 level. The
remaining studies were coded in the same manner where either empirical evidence or an
argument anchoring the learners’ proficiency (e.g. to the CEFR) was presented as evi-
dence of the learners’ proficiency. Reports spanning multiple levels or omitting such pro-
ficiency evidence were not coded, and those reporting proficiency in a manner that makes
such anchoring not possible were likewise not coded.
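The coding rule for the proficiency moderator can be sketched as a small function. The text anchors ‘intermediate’ to CEFR B1; placing A1 and A2 below it and B2 to C2 above it is our reading of that anchor, and treating a report as uncoded whenever its learners fall into more than one category is likewise an assumption. The function name and mapping are hypothetical.

```python
from typing import List, Optional

# Assumed mapping: only the B1 anchor is stated in the text.
CEFR_TO_CATEGORY = {
    "A1": "below intermediate",
    "A2": "below intermediate",
    "B1": "intermediate",
    "B2": "above intermediate",
    "C1": "above intermediate",
    "C2": "above intermediate",
}


def code_proficiency(cefr_levels: List[str]) -> Optional[str]:
    """Return the moderator category, or None when the report is left uncoded."""
    categories = {CEFR_TO_CATEGORY.get(level.upper()) for level in cefr_levels}
    # Uncoded: no evidence, evidence that cannot be anchored, or mixed categories.
    if len(categories) != 1 or None in categories:
        return None
    return categories.pop()
```

For instance, `code_proficiency(["A2"])` gives the ‘below intermediate’ category assigned to Ishikawa et al. (2015), while a report covering both A2 and B1 learners would be left uncoded.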
As for report source, some reports did not undergo conventional editorial-driven peer
review (e.g. conference proceedings and university theses). Some methodologists rec-
ommend including such reports for comprehensiveness (e.g. Norris & Ortega, 2000), as
they may contain a higher proportion of statistically non-significant results (Dickersin,
2005). Similarly, it has been argued that reports published in journals have a higher like-
lihood of publication bias as significant results with noteworthy effects tend to be favored
(Fanelli, 2010). We therefore coded whether the report was published in a peer-reviewed
journal. Since there is also evidence suggesting that report quality can vary depending on
the indexing of the journal (Al-Hoorie & Vitta, 2019), we also compared these journals
in relation to their indexing in SSCI, Scopus, and other indices. Table 2 presents a break-
down of the report types in our pool.
We also examined the effect of certain design characteristics in two areas: flipped
application features and report methodological features. In relation to flipped applica-
tions, we examined the effect of whether the intervention utilized videos, and whether
the technology employed was interactive (Lo & Hew, 2019). An example of an ‘interac-
tive’ flipped intervention was Lin and Hwang (2018) where the content was presented
via Facebook, and students used the platform to discuss it with their peers and with the
instructor. In relation to methodological features, we examined whether the design
included an empirical pretest before the implementation of the treatment or relied on
Table 2. Types of reports satisfying our inclusion criteria.
Report type                    k
Journal:
  SSCI and Scopus              14
  Scopus only                  12
  Neither Scopus nor SSCI      19
Other:
  Conference proceeding        4
  Thesis/dissertation          7
pre-existing holistic judgements, whether the reliability of dependent variable scores was
reported (Al-Hoorie & Vitta, 2019; Brown, Plonsky, & Teimouri, 2018), and how long
the intervention lasted (Cheng et al., 2019).
Finally, we tested whether the effectiveness of the flipped approach was related to the
L2 outcome targeted in the report. We compared the effectiveness of flipped learning on
the four skills (listening, speaking, reading, and writing) and two competencies (vocabu-
lary and grammar). When scores were combined across two or more L2 outcomes, we
coded the report as ‘multi-outcome’. Four reports had outcomes targeting performance
on standardized tests combining reading and listening scores (e.g. TOEIC Listening and
Reading). We coded these as ‘standardized tests’.
4 Data analysis
a Software. We used Comprehensive Meta Analysis 3.3 (Borenstein, Hedges, Higgins,
& Rothstein, 2014) for all analyses. We applied a random-effects model as we had no
reason to assume one common effect size underlying all reports (see Borenstein, Hedges,
Higgins, & Rothstein, 2009). We also examined heterogeneity using the I² statistic and
its significance value. Significant heterogeneity suggests that the effect varies considerably
from report to report, and this variability could potentially be explained through moderator
analysis of certain report characteristics.
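The random-effects pooling and the I² statistic described in this subsection can be sketched as follows. This is a minimal illustration that assumes the DerSimonian-Laird estimator of the between-report variance; the text does not state which estimator the software applied, and the effect sizes and variances in the example are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool effect sizes under a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-report variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I-squared, in percent
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "g": pooled,
        "se": se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical effect sizes and sampling variances, for illustration only.
result = random_effects([0.2, 0.9, 1.5, 0.6], [0.04, 0.05, 0.06, 0.03])
```

When the reports disagree more than sampling error alone would predict, tau2 grows, the weights become more equal across reports, and the confidence interval around the pooled effect widens.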
b Publication bias. Publication bias can occur because of the tendency of journals to
favor significant results over non-significant ones. As a result, some non-significant
findings may not find their way to the research community, leading to what is commonly
known as the file-drawer problem (Rosenthal, 1979). We tested publication bias using
the Trim and Fill method (Duval & Tweedie, 2000a, 2000b). We also examined the
results of the classic fail-safe N test (Rosenthal, 1979), Orwin’s fail-safe N test (Orwin,
1983), and the p-curve (Simonsohn, Nelson, & Simmons, 2014) to further shed light on
potential bias.
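The two fail-safe N tests can be written out briefly. The sketch below uses the textbook forms of Rosenthal’s and Orwin’s formulas; the default critical value (1.645, one-tailed) and the assumption that missing reports average a zero effect are illustrative choices, not settings reported in the text, and the function names are hypothetical.

```python
def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Null reports needed to make the combined (Stouffer) result non-significant."""
    k = len(z_scores)
    return max(0.0, sum(z_scores) ** 2 / z_alpha ** 2 - k)


def orwin_fail_safe_n(mean_effect, k, trivial_effect, missing_mean=0.0):
    """Missing reports needed to pull the mean effect down to a trivial value."""
    if trivial_effect <= missing_mean:
        raise ValueError("trivial_effect must exceed missing_mean")
    return k * (mean_effect - trivial_effect) / (trivial_effect - missing_mean)


# Hypothetical inputs, for illustration only.
n_rosenthal = rosenthal_fail_safe_n([2.0, 2.0, 2.0, 2.0])
n_orwin = orwin_fail_safe_n(mean_effect=0.5, k=10, trivial_effect=0.1)
```

Rosenthal’s version asks how many null results would erase statistical significance, whereas Orwin’s asks how many would shrink the average effect to a size the analyst considers negligible.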
c Coding. Initially, 40 reports were coded independently by two coders against our
inclusion criteria. This procedure resulted in 85% agreement (Cohen’s κ = .70, p <
.001). All discrepancies were subsequently resolved by discussion until 100% agreement
was reached. The two coders then independently coded the effects of 16 reports (approximately
30%), resulting in 88% inter-coder agreement (κ = .86, p < .001). All discrepancies
were also resolved by discussion until 100% agreement was reached. When a study
had multiple data collection points (e.g. several quizzes and a final exam), we used the
last test for the analysis (k = 5). If the report had multiple assessments for one dependent
variable (e.g. essay subdomains and an overall score), we used the most comprehensive
measure (k = 7). In one case, a report had two outcome variables; we selected the one
with the best construct validity corresponding to modern ‘complexity–accuracy–fluency’
theory (Pallotti, 2009).
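To show how raw agreement and Cohen's κ relate, the following sketch computes κ from an agreement table. The 2 × 2 table is hypothetical: it was chosen to give 85% raw agreement across 40 reports and is not the coders' actual data. The function name is likewise an assumption.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: coder A, columns: coder B)."""
    n = sum(sum(row) for row in table)
    size = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(size)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical include/exclude decisions for 40 reports (34 agreements, 6 disagreements).
hypothetical_table = [[20, 3], [3, 14]]
kappa = cohens_kappa(hypothetical_table)  # about 0.69 for this made-up table
```

Because κ discounts the agreement expected by chance, it is lower than the raw percentage whenever the coders' marginal distributions make chance agreement likely.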
d Effect size computation. Effect sizes were computed using Comprehensive Meta
Analysis software with Hedges’ g being the effect size metric employed to correct for
smaller sample sizes. Each report was weighted by the inverse of its variance including
the estimated between-studies variance. Most effect sizes were directly estimated from
the means, standard deviations, and sample sizes. In cases where these data were una-
vailable, test statistics or other effect size metrics were used in tandem with sample size
to estimate g (for detailed formulae; see Borenstein et al., 2009). Thus, all selected
reports provided enough information to estimate effects. A small number of the reports
(k = 3) used within-participant designs. According to Lakens (2013), such effect sizes
are best estimated with g_av when meta-analysing them with between-participant effects.
Nevertheless, g_av values are always nearly identical to g_s (for between-participant effects;
Lakens, 2013), and this was the case with our data. Therefore, g is used here to subsume
both g_s and g_av.
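For the most common case, in which a report provides means, standard deviations, and sample sizes for two independent groups, the computation can be sketched as below. The formulae follow the standard presentation in Borenstein et al. (2009); the function name and the example values are hypothetical, and the within-participant variant (g_av) is not covered.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for two independent groups."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor J (approximation).
    j = 1 - 3 / (4 * df - 1)
    # Sampling variance of d, then of g.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


# Hypothetical flipped vs. traditional groups with 20 learners each.
g, var_g = hedges_g(80, 10, 20, 70, 10, 20)
weight = 1 / var_g  # inverse-variance weight before adding between-studies variance
```

In a random-effects analysis, the estimated between-studies variance would be added to var_g before taking the inverse, as described in the text.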
VI Results
The reports included in our pool were interventions conducted in different parts of the
world, though the target language in almost all of these reports was English. Only a
minority of reports tested the effectiveness of the flipped approach on learning other
languages, such as Chinese (k = 2), Japanese (k = 2), and Korean (k = 1). Also, only a few
studies reported the results for each gender separately (k_female = 5, k_male = 2),
whereas the remainder reported the results for the two genders combined. Some of these
reports were unpublished university theses/dissertations (k = 7). As mentioned above,
most of these reports adopted a between-participant design, whereas a few were within-
participant (k = 3). These reports involved 61 unique samples and 4,220 learners.
Using a random-effects model, the results showed that groups receiving the flipped
intervention performed significantly better than those receiving traditional face-to-face
teaching, g = 0.99, 95% CI (0.81, 1.16), z = 10.90, p < .001. This average effect size
exhibited substantial heterogeneity, Q(60) = 432.82, I² = 86.14%, p < .001. These results
indicate that around 86% of the observed dispersion reflects variation in true effects rather
than sampling error and is potentially explainable by certain moderator variables.
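The heterogeneity percentage follows directly from the reported Q statistic and its degrees of freedom. The short check below is a sketch of that arithmetic; the function name is an assumption.

```python
def i_squared(q, df):
    """Percentage of observed dispersion not attributable to sampling error."""
    return max(0.0, (q - df) / q) * 100


# Values reported in the text: Q(60) = 432.82.
i2 = i_squared(432.82, 60)
```

Here (432.82 − 60) / 432.82 gives roughly 0.861, in line with the reported figure of about 86%.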
In relation to publication bias, the classic fail-safe N test showed that 694 missing
reports would be required to bring the effect size down to zero, z = 26.02, p < .001.
Orwin’s fail-safe N also showed that 58 additional reports would be needed to reduce the effect
size to below 0.40, the generally recognized threshold for effective educational interven-
tions (Hattie, 2009). These results provide strong evidence of a non-zero effect size.
Similarly, the p-curve did not indicate evidence of questionable research practices such
as p-hacking (Figure 2). The p-curve included 45 statistically significant results (p <
.05), of which 38 were significant at p < .025.
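One way to read these counts is through a binomial comparison: if there were no true effect, significant p-values would be equally likely to fall below or above .025. The sketch below assumes a one-sided binomial test of that kind; the text does not state which p-curve tests were run, and the function name is hypothetical.

```python
from math import comb


def binomial_right_skew_p(n_significant, n_below_025):
    """One-sided probability of at least n_below_025 'low' p-values under a 50/50 split."""
    tail = sum(comb(n_significant, k) for k in range(n_below_025, n_significant + 1))
    return tail / 2 ** n_significant


# Counts reported in the text: 45 significant results, 38 of them below .025.
p_value = binomial_right_skew_p(45, 38)
```

With 38 of 45 results below .025, this probability is far smaller than .001, which is consistent with a right-skewed p-curve.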
However, the Trim and Fill method did suggest the possibility of publication bias. As
Figure 3 shows, reports with smaller samples tended to find larger effect sizes. This
analysis showed that there could be at least 17 missing reports. Adjusting for these miss-
ing reports made the average effect size shrink, g = 0.58, 95% CI (0.37, 0.78). This sug-
gests that the 0.99 effect size originally obtained might be inflated.2
In relation to research question 2, the moderator analysis revealed some interesting
results in relation to the target L2 outcomes investigated (Table 3). The findings showed
that flipped learning had a non-significant effect on reading and standardized tests, as
the 95% confidence intervals included zero. The confidence interval for the
reading outcome was also so wide that it was hardly informative, underscoring the
need for more research on reading. Vocabulary did show a significant effect, though the
lower bound of its confidence interval was barely above zero. In contrast, the effects were
substantial for writing, listening, grammar, and speaking, as well as for assessments
comprising multiple outcomes.
Figure 2. Results of p-curve analysis.
Figure 3. Funnel plot of standard error by Hedges's g, showing publication bias based on the fixed-effects model.
Note. Imputed results are filled dots.
Table 3. Results of moderator analyses.
Subgroup                      k      g  Lower CI  Upper CI  Q(df)       p
L2 outcome:                                                 43.70(7)    < .001
  Writing                    13   1.50      1.00      1.99
  Listening                   4   1.42      0.62      2.21
  Speaking                    8   1.14      0.81      1.48
  Multi-outcome              14   1.03      0.65      1.41
  Grammar                     5   1.01      0.38      1.63
  Vocabulary                  9   0.25      0.03      0.47
  Standardized tests          4   0.33     –0.07      0.72
  Reading                     3   1.25     –0.09      2.59
Educational level:                                          1.07(1)     .302
  Secondary                  10   1.21      0.65      1.77
  University                 48   0.90      0.72      1.08
Proficiency level:                                          7.12(2)     .028
  Below intermediate         13   0.65      0.34      0.96
  Intermediate               16   0.89      0.58      1.20
  Above intermediate         11   1.45      0.95      1.96
Peer review status:                                         5.27(1)     .022
  Journal article            50   1.07      0.85      1.28
  Other                      11   0.64      0.35      0.93
Report source:                                              10.76(3)    .013
  Not Scopus or SSCI         19   1.18      0.78      1.57
  Scopus only                13   1.39      0.94      1.84
  Scopus and SSCI            18   0.73      0.47      1.00
  Thesis/conference          11   0.64      0.35      0.93
Technology type:                                            < 0.001(1)  .987
  Video                      47   0.97      0.76      1.18
  No Video                   13   0.97      0.62      1.31
Interactive technology:                                     0.05(1)     .831
  Yes                        23   1.01      0.78      1.24
  No                         38   0.97      0.71      1.23
Reliability of DV:                                          0.82(1)     .364
  Reported                   31   1.07      0.82      1.32
  Not reported               30   0.90      0.64      1.15
Pre-test:                                                   1.49(1)     .223
  Empirical Pre-test         39   1.07      0.85      1.29
  Pre-existing evaluation    22   0.84      0.53      1.14
Note. k = unique samples.
Regarding research question 3, the results did not provide evidence that learner
age, specifically whether learners were at secondary or university level, was related to how
effective the flipped intervention was. In contrast, the effectiveness of flipped learning
varied significantly in relation to proficiency level. As the post hoc results in Table 4
show, learners with higher proficiency exhibited larger effect sizes.
Regarding the type of the report itself, the analysis showed that peer-reviewed journal
articles reported significantly larger effects than other types of reports such as conference
proceedings and unpublished theses. Furthermore, comparison by report source sug-
gested that the largest effect sizes came from journals not indexed in the SSCI (Table 4).
Analysis of whether the intervention used videos or not, and whether the technology
was interactive, did not result in a significant difference. Similarly, whether the research-
ers reported the reliability of their dependent variables did not seem to have an effect on
the results. The same applied to whether the researchers administered their own pretest
or relied on a pre-existing judgement or evaluation reported by learners.
Finally, we examined the relationship between the length of the intervention and its
effectiveness. Meta-regression analysis showed that there was a small negative effect of
duration of the study (see Figure 4 and Table 5), suggesting that the novelty of the
approach might slightly wane with time. One report lasted for 60 weeks, which was the
longest duration in our pool. Excluding that report led to only a minor change in the
coefficient, from −0.02 to −0.03 (see Figure 5).
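To give a sense of the size of this slope, the rounded coefficients reported in Table 5 (intercept 1.25, duration −0.02 per week) can be turned into predicted effect sizes. This is only an illustrative sketch: the function name is an assumption, the coefficients are rounded, and a linear trend should not be extrapolated far beyond the durations actually observed.

```python
def predicted_g(weeks, intercept=1.25, slope=-0.02):
    """Predicted Hedges' g for an intervention of the given length in weeks."""
    return intercept + slope * weeks


# Predicted effects for interventions of 4, 8, and 16 weeks.
predictions = {weeks: round(predicted_g(weeks), 2) for weeks in (4, 8, 16)}
# predictions == {4: 1.17, 8: 1.09, 16: 0.93}
```

On these figures, doubling the length of an intervention from 8 to 16 weeks lowers the predicted effect by about 0.16, which remains a large effect in absolute terms.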
Table 4. Q-values in post hoc analyses showing whether differences in moderator levels are significant.
Proficiency:                  1          2          3
1. Below intermediate         –
2. Intermediate               1.20       –
3. Above intermediate         7.11*      3.47†      –
Report source:                1          2          3          4
1. Neither SSCI nor Scopus    –
2. Scopus only                0.50       –
3. SSCI and Scopus            3.24†      6.09*      –
4. Thesis/conference          4.46*      7.52**     0.21       –
L2 outcome:                   1          2          3          4          5          6          7
1. Writing                    –
2. Listening                  0.03       –
3. Multi-outcome              2.14       0.73       –
4. Grammar                    1.47       0.64       0.005      –
5. Speaking                   0.53       0.05       0.14       0.46       –
6. Standardized tests         13.16***   5.78*      6.33*      3.25†      10.58**    –
7. Vocabulary                 20.50***   7.65**     12.12***   5.01*      18.79***   0.10       –
8. Reading                    0.11       0.04       0.09       0.10       0.0003     1.66       2.06
Notes. †p < .10, *p < .05, **p < .01, ***p < .001.
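The pairwise Q-values in Table 4 can be approximated from the subgroup estimates and confidence intervals in Table 3. The sketch below assumes that each standard error can be recovered from the width of the 95% confidence interval and that the two subgroups are independent; the function names are hypothetical, and because the inputs are rounded the result only approximates the tabled value.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Recover a standard error from the bounds of a 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def pairwise_q(g1, ci1, g2, ci2):
    """Q statistic (df = 1) for the difference between two independent subgroup means."""
    var1 = se_from_ci(*ci1) ** 2
    var2 = se_from_ci(*ci2) ** 2
    return (g1 - g2) ** 2 / (var1 + var2)


# Below-intermediate vs. above-intermediate proficiency, using the Table 3 values.
q_low_vs_high = pairwise_q(0.65, (0.34, 0.96), 1.45, (0.95, 1.96))
```

This gives a value of about 7.0, close to the 7.11 reported in Table 4; with one degree of freedom, any Q above 3.84 is significant at the .05 level.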
VII Discussion
The purpose of the present meta-analysis was to extend existing research synthesis work
on the effectiveness of flipped learning in the context of L2 learning. We aggregated
effect sizes in reports located through a broad literature search process that included dif-
ferent report types. In this section, we discuss the following three notable findings, in
relation to both research and practice, emerging from this meta-analysis:
• There is clear evidence that flipped learning is effective for L2 learning overall
  (research question 1).
• Flipped learning seems more effective under certain conditions and for certain L2
  outcomes (research questions 2 and 3).
• Publication bias and methodological issues seem to have impeded accurate estimation
  of the effect of flipped learning (research question 3).
Figure 4. Meta-regression of the relationship between effect size (Hedges's g) and duration of the
intervention in weeks.
Table 5. Results of meta-regression.
             Coefficient   SE     Lower 95%   Upper 95%   Z      p        VIF
Intercept    1.25          0.16   0.94        1.55        8.97   < .001   2.95
Duration     –0.02         0.01   –0.04       –0.0008     2.04   .041     1.00
1 Overall effectiveness of flipped learning
The main finding in this meta-analysis is that flipped learning seems to be an effective
approach for L2 learning. The overall effect size was substantial (g = 0.99), though this
magnitude might be somewhat inflated as discussed in more detail below. Virtually all
moderator analyses on L2 outcome displayed a positive effect size point estimate, with
the lowest being 0.25 in the case of vocabulary learning. These findings echo what many
teachers have probably noticed: Students who prepare for the lesson before class tend to
find it easier to understand the lesson during class. Flipped learning provides a more
systematic approach so that all learners prepare for the lesson, and then consolidate what
they have learned during class. These results, therefore, indicate that the flipped approach
is strongly recommended for language teachers.
As reviewed above, one potential argument is that the flipped approach is simply
communicative language teaching by another name (see Webb & Doman, 2016). While
the present meta-analysis was not designed to specifically engage with this debate,
closer examination of the reports in our pool supports Webb and Doman’s (2016) posi-
tion that the flipped approach and the communicative approach are not interchangeable.
Comparison groups in some studies tended to engage in communicative language
teaching-governed learning activities. As a demonstration, in Hung (2017) communica-
tive features were observed in both the flipped and non-flipped groups. Chen Hsieh
Figure 5. Meta-regression of the relationship between effect size (Hedges's g) and duration of the
intervention in weeks after excluding one potential outlier report.
et al. (2017) also had students draft ‘the final dialog collaboratively’ (p. 4) under the
conventional learning condition. Hung (2015) and Ishikawa et al. (2015) likewise had
their non-flipped learning groups engage in classroom discussions about the content
presented in class. While not all researchers intentionally used communicative activities
for their comparison groups, the fact that communicative features were observed in both
the flipped and non-flipped groups makes it unlikely that the results of the present meta-
analysis are attributable simply to communicative activities. Thus, flipped learning
applications in our report pool do not appear to be communicative language teaching by
another name, and are possibly superior in that additional, structured out-of-class activi-
ties are involved. Again, as this issue was not within the scope of the present meta-
analysis, direct comparative analysis between these two approaches seems an interesting
future direction.
2 Effectiveness according to learning outcomes and learner characteristics
As explained above, a clear implication of the results of this and previous meta-analyses
is that flipped learning is effective. The moderator analyses addressing research ques-
tions 2 and 3, however, uncovered variation in its effectiveness in relation to learning
outcomes and learner characteristics. In sum, flipped learning appears to be most effec-
tive for intermediate to above-intermediate proficiency learners where the learning out-
come is skill-based and procedural (e.g., writing).
The results showed that the weakest evidence of an effect was obtained for reading, vocabulary, and
standardized test performance. One explanation for this finding is the small number of
reports addressing these areas (see Table 3), underscoring the need for future research to
fill in this gap. We might additionally speculate that flipped learning may work best for
procedural and conceptual learning, where the teacher can lead the students through
higher-order thinking activities with the extra class time that flipping facilitates (Milman,
2012). With vocabulary, there may be less utility in this regard as vocabulary begins with
form to function mapping (Nation, 2013). There may be less room for the teacher to add
to this process when new lexical items are being learned. If this is true, then the effect of
flipped learning on vocabulary might be smaller (even if positive) than the effect on other
language outcomes. With writing and speaking, for example, a flipped classroom could
optimize students’ learning experience as the teacher has time to engage them in thinking
about processes involved and reinforce their learning utilizing the extra class time. At the
assessment level, furthermore, competence measurements such as speaking and writing
are often criterion-referenced where the content taught is assessed. Thus, students have a
better chance of demonstrating improvement. Standardized tests, on the other hand, may
not map as closely to content taught as do teacher-made course tests. It is an open
empirical question as to the extent to which flipped learning applications can be adapted
to maximize impact on vocabulary, reading, and performance on standardized tests.
Proficiency was the only learner characteristic to significantly moderate effect sizes
where the post-hoc comparison between below-intermediate (g = 0.65) and above-inter-
mediate (g = 1.45) was significant. Although the effect in both cases was large, the sig-
nificant difference might indicate that low-proficiency learners have less ability to sustain
student-centered engagement with material in the target language, which corresponds to
the concern voiced by Milman (2012). Willis and Willis (2019) in a similar vein posited
that beginner students might have trouble engaging with student-centered task-based
language teaching (see also Vitta, Jost, & Pusina, 2019). Thus, there seems to be a theo-
retical basis for the positive association between proficiency and the effectiveness of
flipped learning. Should teachers seek to implement flipped learning with low-profi-
ciency learners, then extra care may need to be taken in preparing accessible and appeal-
ing content so that these learners can remain engaged with it outside of class time.
Alternatively, flipped learning applications to low-proficiency learners could require
greater first language (L1) and extra-linguistic support.
3 Accurate estimation of the effect of flipped learning
Our findings correspond to the large effects observed in other L2 meta-analyses investi-
gating trends in instructed second language learning. Consider that both Zhang and
Zhang (2020) and Bryfonski and McKay (2019) observed large effects in meta-analyses
of the association between vocabulary and reading (r = .57) and the effects of task-based
language teaching interventions (.93 ⩽ d ⩽ .95), respectively. The effect size obtained in
the present meta-analysis was likewise substantial, g = 0.99. Such magnitudes exceed
Cohen’s (1992) classic threshold for a large effect (d = 0.80) and approach Plonsky and
Oswald’s (2014) empirically derived large effect size for between-group differences in
L2 outcomes (d = 1.0). These magnitudes also substantially exceed the large benchmark
for individual difference research (d = 0.60; Gignac & Szodorai, 2016) and the typical
effects from teachers in longitudinal studies (d = 0.15–0.40; Hattie, 2009).
However, the Trim and Fill method suggested the presence of publication bias. Smaller
studies tended to report larger effects, suggesting the possibility of a file-drawer effect
(Rosenthal, 1979) in our field. With the Trim and Fill method, we obtained an effect size
that is almost half of the original one (g = 0.58), which is closer to what Cheng et al.
(2019) found for the arts and humanities (g = 0.63). Moderator analysis further supported
the possibility of publication bias, showing that peer-reviewed articles – particularly in
non-SSCI-indexed journals – have larger effects than conference proceedings and unpub-
lished theses. Still, as can be seen from the p-curve analysis, this pattern does not seem to
have resulted from questionable research practices. Instead, one plausible explanation is
that some researchers publishing in low-impact journals might lack experience and/or
resources to conduct well-controlled interventions (for a discussion; see Paiva et al.,
2017). Conducting an educational intervention is no easy task. Classroom research is
fraught with obstacles and challenges due to the complexity of classroom realities (Hiver
& Al-Hoorie, 2020b; Rounds, 1996). Some unexpected factors that influence how valid
the results are might go unreported, ‘not out of any willful malfeasance, but because we
have been so conditioned to preserve methodological purism, however unrealistic a goal
that might have been’ (Larsen-Freeman, 1996, p. 157).
4 Future directions
As we mentioned above, the status of scholarship on flipped learning indicates that
researchers should move from the question of whether flipped learning is effective to
when and how it is so. To address these questions, we suggest two main future directions
for the field. First, research should target different underrepresented L2 learners. As is
the case in various L2 subdisciplines (Dörnyei & Al-Hoorie, 2017), L2 flipped learning
research has been English-biased in that learners of languages other than English
have seldom been investigated. As with Lundin et al. (2018), our report pool was dominated
by university-level learners by a ratio approaching 5:1. Younger learners were especially
underrepresented, making it unclear to what extent flipped learning is effective
with younger learners considering that this approach presupposes a level of commitment
and self-directedness without the teacher’s direct supervision. It is likely that the type of
content that can attract this type of learner will be very different, and possibly more
demanding to prepare. In addition to young learners, older learners and those not suffi-
ciently skilled in or familiar with technology, including those based outside the devel-
oped world (e.g. only one of 56 reports was situated within an African context; Hassan,
2018) might also require different applications of the flipped approach.
A second future direction we recommend for flipped learning research has to do with
intervention quality. Part of understanding when and how flipped learning is effective is
to understand what features maximize its effectiveness. Little comparative analysis has
been conducted to investigate the various online platforms available to L2 teachers and
how their features influence learning (e.g. for lower proficiency learners). Another aspect
of the quality of flipped learning interventions is the teacher’s skill in preparing and
handling online materials. We suspect that teachers who can create custom materials on
demand to suit emerging needs of their particular classes will most likely be more effec-
tive. Investigation of these aspects requires a more micro-analysis of intervention qual-
ity. A further aspect of study quality is rigor in design and statistical analysis (Al-Hoorie,
2018; Hiver & Al-Hoorie, 2020a, 2020b). While Al-Hoorie and Vitta’s (2019) systematic
review found that the statistical quality varies based on the impact of the journal, the
present meta-analysis additionally showed that the actual results also vary. Further
research is needed to understand why the findings of high- and low-impact journals can
be discrepant (see Paiva et al., 2017). Tips and strategies for effective flipped learning
implementation can be found in Mehring and Leis (2018).
Next, research on flipped learning should move to what Zanna and Fazio
(1982) called second-generation and third-generation questions. According to Zanna and
Fazio’s (1982) classification, first-generation research simply asks ‘is’ questions (e.g. is
flipped learning effective?). Second-generation questions move beyond this yes–no ques-
tion to ‘when’ questions (e.g. under what conditions does flipped learning become more
effective?) (see also Al-Hoorie & Al Shlowiy, 2020). Third-generation research asks ‘how’
questions (e.g. how is flipped learning effective?). This last type of question inquires after
the mechanism, or mediators, making flipped learning effective. While this type of question
is described as third-generation, thus implying a temporal lag, in reality second- and
third-generation questions are ‘linked inextricably’ (Zanna & Fazio, 1982, p. 284).
Understanding under what conditions a treatment is effective might shed light on why it is
effective, and vice versa. It is at this point that practitioners-as-researchers can contribute
to the future directions of flipped research in L2 contexts as localized studies will be essen-
tial in addressing the second- and third-generation questions. To provide a specific exam-
ple, our findings highlight the need for frontline teachers to pilot and report flipped
approaches that focus on vocabulary outcomes and with lower proficiency learners.
Finally, as part of intervention quality, researchers should investigate innovative adap-
tations of flipped learning. The prototypical design is that students engage with the mate-
rial before class independently and then have agency to further engage with and explore
the content during subsequent class time. Little research has examined whether and to
what extent group work before class can make flipped learning more effective. Little
research has also examined what we might describe as ‘interval flipping’, the process of
alternating between the flipped and the traditional approaches in order to prevent flipped
learning from losing its novelty over time. Indeed, even long-term retention of learning
from the flipped approach has hardly been compared to that from the traditional approach.
VIII Conclusions
The present study meta-analysed the effects of L2 flipped learning interventions.
Extending past meta-analyses on flipped learning, our literature search was able
to locate about double the number of L2 experimental reports analysed in past syntheses.
Future endeavors could add to our approach by considering gray literature, however. Our
results also clearly demonstrate the effectiveness of this approach over the traditional
face-to-face approach. Still, there was also wide heterogeneity in the results that could be
partially explained by certain moderators, including learner proficiency, study type, and
target L2 outcome. Future research should shift focus from whether flipped learning is
effective to when and how its effectiveness can be maximized.
Acknowledgements
We would like to thank Dr. Jeffrey G. Mehring for his comments on our literature search protocols.
We are also grateful to Alex Sutton and Daniël Lakens for comments on the analysis.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
ORCID iDs
Joseph P. Vitta https://orcid.org/0000-0002-5711-969X
Ali H. Al-Hoorie https://orcid.org/0000-0003-3810-5978
Supplemental material
Supplemental material for this article is available online.
Notes
1. The exclusion of true control (no learning intervention) comparisons was in line with the
methodologies of recent L2 (e.g. Bryfonski & McKay, 2019) and flipped learning (e.g.
Strelan et al., 2020) meta-analyses.
2. The Trim and Fill method was calculated using the fixed-effects model. The random-effects
model (not reported here) showed the opposite pattern, indicating that bias might be resulting
from reports with larger samples—which does not seem likely (see also Shi & Lin, 2019). The
funnel plot based on the random-effects model may be obtained from the authors.
References
Adnan, M. (2017). Perceptions of senior-year ELT students for flipped classroom: A materials
development course. Computer Assisted Language Learning, 30, 204–222.
Al-Hoorie, A.H. (2018). The L2 motivational self system: A meta-analysis. Studies in Second
Language Learning and Teaching, 8, 721–754.
Al-Hoorie, A.H., & Al Shlowiy, A.S. (2020). Vision theory vs. goal-setting theory: A critical
analysis. Porta Linguarum, 33, 217–229.
Al-Hoorie, A.H., & Vitta, J.P. (2019). The seven sins of L2 research: A review of 30 journals’
statistical quality and their CiteScore, SJR, SNIP, JCR Impact Factors. Language Teaching
Research, 23, 727–744.
AlJaser, A.M. (2017). Effectiveness of using flipped classroom strategy in academic achieve-
ment and self-efficacy among education students of Princess Nourah Bint Abdulrahman
University. English Language Teaching, 10, 67–77.
Alnuhayt, S.S. (2018). Investigating the use of the flipped classroom method in an EFL vocabulary
course. Journal of Language Teaching and Research, 9, 236–242.
Baş, G., & Kuzucu, O. (2009). Effects of CALL method and DynED language programme on stu-
dents’ achievement levels and attitudes towards the lesson in English classes. International
Journal of Instructional Technology and Distance Learning, 6, 31–44.
Bergmann, J., & Sams, A. (2012). Flip your classroom: Reach every student in every class every
day. Eugene, OR: International Society for Technology in Education.
Borenstein, M., Hedges, L.V., Higgins, J.P.T., & Rothstein, H.R. (2009). Introduction to meta-
analysis. Oxford: Wiley.
Borenstein, M., Hedges, L.V., Higgins, J.P., & Rothstein, H.R. (2014). Comprehensive meta anal-
ysis: Version 3.3. Englewood, NJ: Biostat.
Brown, A.V., Plonsky, L., & Teimouri, Y. (2018). The use of course grades as metrics in L2
research: A systematic review. Foreign Language Annals, 51, 763–778.
Bryfonski, L., & McKay, T.H. (2019). TBLT implementation and evaluation: A meta-analysis.
Language Teaching Research, 23, 603–632.
Burnham, J.F. (2006). Scopus database: A review. Biomedical Digital Libraries, 3(1).
Chen Hsieh, J.S., Wu, W.-C.V., & Marek, M.W. (2017). Using the flipped classroom to enhance
EFL learning. Computer Assisted Language Learning, 30, 1–21.
Cheng, L., Ritzhaupt, A.D., & Antonenko, P. (2019). Effects of the flipped classroom instructional
strategy on students’ learning outcomes: A meta-analysis. Educational Technology Research
and Development, 67, 793–824.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Council of Europe. (2001). Common European framework of reference for languages. Strasbourg:
Council of Europe.
Council of Europe. (2011). Common European framework of reference for languages: Learning,
teaching, assessment. Strasbourg: Council of Europe.
Davis, N.L. (2016). Anatomy of a flipped classroom. Journal of Teaching in Travel & Tourism,
16, 228–232.
Dickersin, K. (2005). Publication bias: Recognizing the problem, understanding its origins and
scope, and preventing harm. In Rothstein, H.R., Sutton, A.J., & M. Borenstein (Eds.),
Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 11–33).
Chichester: Wiley.
Dörnyei, Z., & Al-Hoorie, A.H. (2017). The motivational foundation of learning languages other
than Global English. The Modern Language Journal, 101, 455–468.
Duval, S., & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for pub-
lication bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot–based method of testing and
adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463.
Ellis, R. (2009). Task-based language teaching: Sorting out the misunderstandings. International
Journal of Applied Linguistics, 19, 221–246.
Evseeva, A., & Solozhenko, A. (2015). Use of flipped classroom technology in language learning.
Procedia – Social and Behavioral Sciences, 206, 205–209.
Fanelli, D. (2010). Do pressures to publish increase scientists’ bias? An empirical support from US
States Data. PLoS One, 5(4), e10271.
Filiz, S., & Benzet, A. (2018). A content analysis of the studies on the use of flipped classrooms in
foreign language education. World Journal of Education, 8, 72–86.
Freire, P. (1968/1970). Pedagogy of the oppressed. New York: Herder and Herder.
Gignac, G.E., & Szodorai, E.T. (2016). Effect size guidelines for individual differences research-
ers. Personality and Individual Differences, 102, 74–78.
Green, A. (2012). Language functions revisited: Theoretical and empirical bases for language
construct definition across the ability range. Cambridge: Cambridge University Press.
Halliday, M.A.K., & Matthiessen, C. (2014). Halliday’s introduction to functional grammar. 4th
edition. New York: Routledge.
Hamdan, M., McKnight, P.E., McKnight, K., & Arfstrom, K.M. (2013). A review of flipped learn-
ing. Flipped Learning Network. Available at: https://www.flippedlearning.org/wp-content/
uploads/2016/07/LitReview_FlippedLearning.pdf (accessed December 2020).
Hassan, S.R.R. (2018). Using the flipped learning model to develop EFL argumentative writing
skills of STEM secondary school students. Majalat Kuliyat Altarbiah (Education College
Journal), 70, 24–74.
Hattie, J.A.C. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achieve-
ment. New York: Routledge.
Hew, K.F., & Lo, C.K. (2018). Flipped classroom improves student learning in health professions
education: A meta-analysis. BMC Medical Education, 18, 38.
Hiver, P., & Al-Hoorie, A.H. (2020a). Reexamining the role of vision in second language
motivation: A preregistered conceptual replication of You, Dörnyei, & Csizér (2016). Language
Learning, 70, 48–102.
Hiver, P., & Al-Hoorie, A.H. (2020b). Research methods for complexity theory in applied
linguistics. Bristol: Multilingual Matters.
Hung, H.-T. (2015). Flipping the classroom for English language learners to foster active learning.
Computer Assisted Language Learning, 28, 81–96.
Hung, H.-T. (2017). Design-based research: Redesign of an English language course using a
flipped classroom approach. TESOL Quarterly, 51, 180–192.
Ishikawa, Y., Akahane-Yamada, R., Smith, C., et al. (2015). An EFL flipped learning course
design: Utilizing students’ mobile online devices. In Helm, F., Bradley, L., Guarda, M., & S.
Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova,
Italy (pp. 261–267). Dublin: Research-publishing.net.
Karimi, M., & Hamzavi, R. (2017). The effect of flipped model of instruction on EFL learners’
reading comprehension: Learners’ attitudes in focus. Advances in Language and Literary
Studies, 8, 95–103.
Låg, T., & Sæle, R.G. (2019). Does the flipped classroom improve student learning and
satisfaction? A systematic review and meta-analysis. AERA Open, 5, 3.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A
practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.
Larsen-Freeman, D. (1996). The changing nature of second language classroom research. In
Schachter, J., & S. Gass (Eds.), Second language classroom research: Issues and
opportunities (pp. 157–170). Mahwah, NJ: Lawrence Erlbaum.
Lei, L., & Liu, D. (2019). Research trends in applied linguistics from 2005 to 2016: A bibliometric
analysis and its implications. Applied Linguistics, 40, 540–561.
Leis, A., Cooke, S., & Tohei, A. (2015). The effects of flipped classrooms on English
composition writing in an EFL environment. International Journal of Computer-Assisted Language
Learning and Teaching (IJCALLT), 5, 37–51.
Lin, C.-J., & Hwang, G.-J. (2018). A learning analytics approach to investigating factors affecting
EFL students’ oral performance in a flipped classroom. Journal of Educational Technology
& Society, 21, 205–219.
Lin, C.-J., Hwang, G.-J., Fu, Q.-K., & Chen, J.-F. (2018). A flipped contextual game-based
learning approach to enhancing EFL students’ English business writing performance and reflective
behaviors. Journal of Educational Technology & Society, 21, 117–131.
Lo, C.K., & Hew, K.F. (2019). The impact of flipped classrooms on student achievement in
engineering education: A meta-analysis of 10 years of research. Journal of Engineering
Education, 108, 523–546.
Lomicka, L., & Lord, G. (2016). Social networking in language learning. In Farr, F., & L. Murray
(Eds.), The Routledge handbook of language learning and technology (pp. 225–268). New
York: Routledge.
Lundin, M., Bergviken Rensfeldt, A., Hillman, T., Lantz-Andersson, A., & Peterson, L. (2018).
Higher education dominance and siloed knowledge: A systematic review of flipped classroom
research. International Journal of Educational Technology in Higher Education, 15, 20.
Mahmud, M.M. (2018). Technology and language: What works and what does not: A
meta-analysis of blended learning research. Journal of Asia TEFL, 15, 365–382.
Mehring, J. (2016). Present research on the flipped classroom and potential tools for the EFL
classroom. Computers in the Schools, 33, 1–10.
Mehring, J. (2018). The flipped classroom. In Mehring, J., & A. Leis (Eds.), Innovations in
flipping the language classroom: Theories and practices (pp. 1–10). New York: Springer Berlin
Heidelberg.
Mehring, J., & Leis, A. (Eds.). (2018). Innovations in flipping the language classroom: Theories
and practices. New York: Springer Berlin Heidelberg.
Milman, N.B. (2012). The flipped classroom strategy: What is it and how can it best be used?
Distance Learning, 9, 85–87.
Mori, Y., Omori, M., & Sato, K. (2016). The impact of flipped online Kanji instruction on written
vocabulary learning for introductory and intermediate Japanese language students. Foreign
Language Annals, 49, 729–749.
Nation, I.S.P. (2013). Learning vocabulary in another language. 2nd edition. Cambridge:
Cambridge University Press.
Norris, J.M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and
quantitative meta-analysis. Language Learning, 50, 417–528.
Oh, E. (2017). The effect of peer teaching via flipped vocabulary learning on class engagement and
learning achievements. Multimedia-Assisted Language Learning, 20, 105–127.
Orwin, R.G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics,
8, 157–159.
Paiva, C.E., Araujo, R.L.C., Paiva, B.S.R., et al. (2017). What are the personal and professional
characteristics that distinguish the researchers who publish in high- and low-impact journals?
A multi-national web-based survey. ecancermedicalscience, 11, 718.
Pallotti, G. (2009). CAF: Defining, refining and differentiating constructs. Applied Linguistics,
30, 590–601.
Plonsky, L., & Oswald, F.L. (2014). How big is ‘big’? Interpreting effect sizes in L2 research.
Language Learning, 64, 878–912.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Bulletin, 86, 638–641.
Rounds, P.L. (1996). The classroom-based researcher as fieldworker: Strangers in a strange land.
In Schachter, J., & S. Gass (Eds.), Second language classroom research: Issues and
opportunities (pp. 45–59). Mahwah, NJ: Lawrence Erlbaum.
Ryback, D., & Sanders, J.J. (1980). Humanistic versus traditional teaching styles and student
satisfaction. Journal of Humanistic Psychology, 20, 87–90.
Shi, L., & Lin, L. (2019). The trim-and-fill method for publication bias. Medicine, 98(23), e15987.
Shi, Y., Ma, Y., MacLeod, J., & Yang, H.H. (2020). College students’ cognitive learning
outcomes in flipped classroom instruction: A meta-analysis of the empirical literature. Journal
of Computers in Education, 7, 79–103.
Simonsohn, U., Nelson, L.D., & Simmons, J.P. (2014). P-curve: A key to the file-drawer. Journal
of Experimental Psychology: General, 143, 534–547.
Strelan, P., Osborn, A., & Palmer, E. (2020). The flipped classroom: A meta-analysis of effects on
student performance across disciplines and education levels. Educational Research Review,
30, 100314.
Teng, M.F. (2017). Flipping the classroom and tertiary level EFL students’ academic performance
and satisfaction. Journal of Asia TEFL, 14, 605–620.
Turan, Z., & Akdag-Cimen, B. (2019). Flipped classroom in English language teaching: A
systematic review. Computer Assisted Language Learning, 33, 590–606.
van Alten, D.C.D., Phielix, C., Janssen, J., & Kester, L. (2019). Effects of flipping the classroom
on learning outcomes and satisfaction: A meta-analysis. Educational Research Review, 28,
100281.
Vitta, J.P., & Al-Hoorie, A.H. (2017). Scopus- and SSCI-indexed L2 journals: A list for the Asia
TEFL community. The Journal of Asia TEFL, 14, 784–792.
Vitta, J.P., Jost, D., & Pusina, A. (2019). A case study inquiry into the efficacy of four East Asian
EAP writing programmes: Presenting the emergent themes. RELC Journal, 50, 71–85.
Voss, E., & Kostka, I. (2019). Flipping academic English language learning: Experiences from an
American university. Singapore: Springer Nature Singapore.
Webb, M., & Doman, E. (2016). Does the flipped classroom lead to increased gains on learning
outcomes in ESL/EFL contexts? CATESOL Journal, 28, 39–67.
Willis, D., & Willis, J. (2019). Doing task-based teaching. Oxford: Oxford University Press.
Xu, P., Chen, Y., Nie, W., et al. (2019). The effectiveness of a flipped classroom on the
development of Chinese nursing students’ skill competence: A systematic review and meta-analysis.
Nurse Education Today, 80, 67–77.
Zanna, M.P., & Fazio, R.H. (1982). The attitude-behavior relation: Moving toward a third
generation of research. In Zanna, M.P., Higgins, E.T., & C.P. Herman (Eds.), Consistency in
social behavior: The Ontario symposium: Volume 2 (pp. 283–301). Hillsdale, NJ: Lawrence
Erlbaum.
Zarrinabadi, N., & Ebrahimi, A. (2019). Increasing peer collaborative dialogue using a flipped
classroom strategy. Innovation in Language Learning and Teaching, 13, 267–276.
Zhang, X. (2020). A bibliometric analysis of second language acquisition between 1997 and 2018.
Studies in Second Language Acquisition, 42, 199–222.
Zhang, S., & Zhang, X. (2020). The relationship between vocabulary knowledge and L2
reading/listening comprehension: A meta-analysis. Language Teaching Research. Epub ahead of
print 31 March 2020. DOI: 10.1177/1362168820913998.