Do Productive Skills Improve in Content and Language Integrated Learning Contexts? The Case of Writing

  • Universitat Internacional de Catalunya. Barcelona

Abstract

This study investigates the differential effects of two learning contexts, formal instruction (FI) and content and language integrated learning (CLIL), on the written production skills of intermediate-level Catalan Spanish adolescent learners of English as a foreign language. Written samples elicited through a composition at two data collection times over one academic year were quantitatively and qualitatively assessed for complexity, accuracy, and fluency and for task fulfilment, organization, grammar, and vocabulary, respectively. Based on the findings, the superiority of CLIL cannot be confirmed: although improvement in the case of the FI + CLIL group is shown, results were only significant in the domain of accuracy.
Second language acquisition (SLA) research has often related differences in
linguistic outcomes achieved by language learners to differences in learning
contexts (Freed 1998; Collentine and Freed 2004), and has examined what
such contexts offer in terms of language exposure and opportunities for
practicing the target language (DeKeyser 2007).
In recent decades, a new learning context called Content and Language
Integrated Learning (CLIL) has spread throughout European education. The
term has gained popularity as the way to refer to an educational approach
adopted in school programmes at primary and secondary levels, whereby a
second or foreign language is used as the medium of instruction for specific
curricular subjects (see Dalton-Puffer 2007).
The CLIL approach has been
repeatedly recommended by European institutions for its multidimensional
nature, as it aims at second/foreign language education, the promotion of
plurilingualism, pluriculturalism, and learners’ international stance (see Ruiz
de Zarobe and Lasagabaster 2010; Cenoz et al. 2014; Pérez-Vidal 2014).
Typically, CLIL has been adopted as part of a school strategy, being used at
specific points in the curriculum; the favourite CLIL subjects being Science,
Physical Education, Informatics, and Arts and Crafts.
Hand in hand with the growing popularity of CLIL programmes, an initial
body of research analysing progress made by learners in such European CLIL
contexts has been taking shape (to name but a handful of studies carried out in
Spain, see Muñoz and Navés 2007; Lorenzo et al. 2009; Lasagabaster and Ruiz
de Zarobe 2010; Pérez-Vidal 2011). To a certain extent, research in Europe has
tried to find out whether findings so far concerning the beneficial impact of
CLIL contexts are comparable with those reported in the literature on the
results of Canadian immersion programmes of different sorts. In these
Canadian programmes, students’ proficiency is described as comparable with
that of second language speakers who are relatively fluent and effective
communicators, but non-target like in terms of grammatical structure,
and non-idiomatic in their lexical choices and pragmatic expressions—in
comparison with native speakers of the same age (Cummins and Swain
1986; Genesee 1987; Swain and Lapkin 1995; Lyster 2007). One methodological
concern which most existing studies have raised regarding the effects of
CLIL contexts on learners’ linguistic progress is that, more often than not,
either the CLIL groups are pilot groups in schools, as CLIL programmes are
young, or students in them are selected, be it by the school itself or through a
process of natural selection, so that it is the best or most motivated students
who choose to join the programme (Moore 2007). In this respect, the current
article aims at filling a research gap in the literature and adding further evi-
dence to the already existing one, with a study in which the CLIL programme
is neither a pilot programme nor one in which learners are selected. The
programme had been thoroughly designed and planned by the school prior
to its implementation, and it involves one subject, Science, taught through the
medium of English as the only option for all learners alike, beginning at
primary education level (age 10). However, when implementation began, it was
introduced in stages, so, over one year, CLIL groups and non-CLIL groups
coexisted, which made a contrastive study possible. Hence, the current study
examines the effects of such a programme, in contrast with the effects of a
conventional formal instruction (FI) programme. Further, the study measures
development in the learners’ written abilities, one of the least examined to
date in the CLIL literature, across both quantitative and qualitative dimensions.
From an educational perspective, CLIL programmes have been described as
‘the second time around’ or an improvement on communicative language
teaching (CLT) after the success of the communicative approach (see
Pérez-Vidal 2009). In that respect, robust CLIL programmes may be claimed to
stretch learners’ use of languages, making them act as language users rather
than language novices in the classroom (Dalton-Puffer 2007) and, accordingly,
lead them to use the language in a more naturalistic manner than in FI. This is
central to the comparison established in this article, the fact that CLIL stands in
contrast to the experience which learners go through in FI classrooms. In
them, and even if CLT is adopted, communication is clearly not always
taking place through the medium of English, or, in the worst possible case,
it often deals with form rather than content, with few opportunities for
language practice in the target language to take place (Ellis 2001; Doughty 2006).
This is certainly the case in the setting in which our data were collected
because in this specific educational system (primary and secondary education in
Spain), English as a foreign language (EFL) courses are taught following a
centralized curriculum geared towards preparing EFL learners to sit an
external examination, with an emphasis on academic and receptive skills. Such an
emphasis has a backwash effect on teaching practices, which easily distances
them from CLT.
Taking a SLA perspective, CLIL programmes use the target language as the
vehicle for communication, and deal with curricular content; hence, they are
meaning-oriented and naturalistic, a central feature in CLT, instead of focusing
on language as a system as occurs in conventional FI in EFL classrooms (Muñoz
and Navés 2007). According to SLA research, those naturalistic settings for
language learning generate the optimal conditions for linguistic progress to take
place. Indeed, from the perspective of language acquisition models, such as
Krashen’s Input Hypothesis (1985), or cognitive interactionist models (Long
1996;Gass 1997;Sanz 2014), meaning-oriented classrooms offer high-quality
input and opportunities for output practice and negotiation of meaning, a
requisite for linguistic development to occur. From a cognitive perspective,
such contexts offer additional opportunities for input and output practice, allow-
ing learners to transform declarative knowledge into procedural knowledge,
and, in turn, into automatized knowledge (DeKeyser 2007).
Turning to the scope and findings of CLIL research in Europe, it is important
to highlight here that the existing studies so far suggest that the impact of
CLIL approaches is largely positive, both linguistically and content-wise.
If we refer to skill development, the perspective of the current article, the
area where a difference between CLIL and mainstream learners is most
noticeable is spontaneous oral production (Admiraal et al. 2006; Zydatiss 2007;
Lasagabaster 2008; Ruiz de Zarobe 2008); the specific general language areas of
improvement being morphosyntax, phonetics, and lexicon. Additionally, CLIL
students also seem to be more fluent and risk-taking, self-rating their abilities
significantly higher than non-CLIL students (Sylvén 2006; Dalton-Puffer et al.
2009).
Written abilities, the language competence investigated in this study, have
only recently begun to be measured (Jexenflicker and Dalton-Puffer 2010;
Ruiz de Zarobe 2010; Whittaker et al. 2011). In the Basque Country, Ruiz de
Zarobe (2010) analysed general written competence in two groups of bilingual
students who followed two different CLIL programmes, and another group
enrolled in a traditional EFL programme. The results show that both CLIL
groups scored better in written production in relation to the five categories
analysed through a holistic approach: content, organization, vocabulary,
language usage, and mechanics (with significant differences in content and
vocabulary). This suggests that there is a positive relationship between the
amount of exposure and opportunities for communicative interaction through
English and written foreign language proficiency. Since this advantage
increases with grade, it confirms the effectiveness of the CLIL approach as far
as written production outcomes are concerned.
When comparing the writing of CLIL and non-CLIL students in higher col-
leges of technology in Austria, Jexenflicker and Dalton-Puffer (2010) found
that, in the area of lexico-grammar, the CLIL students showed significant ad-
vantages throughout, as they did in vocabulary range and orthographic cor-
rectness. At the level of discourse abilities and textual organization, however,
differences were difficult to discern. The areas analysed using a rating scale
were task fulfilment, organization and structure, grammar, and vocabulary. In
short, current knowledge on the impact of CLIL on progress in writing might
be described as follows:
CLIL students have at their disposal a wider range not only of lexical
but also of morphosyntactic resources, which they deployed in
more elaborate and more complex structures. What was not to be
assumed outright given the focus on meaning (and not form) in
CLIL classrooms is the fact that CLIL students also show a higher
degree of accuracy, not only in inflectional affixation and tense use
but also in spelling. The greater pragmatic awareness of CLIL stu-
dents was shown in their better fulfilment of the communicative
intentions of writing tasks. There are, however, dimensions of writ-
ing on which CLIL experience seemed to have little or no effect.
(Dalton-Puffer 2011: 187)
Another interesting approach in research on written production has shed
significant light on this matter by comparing CLIL students’ L2 writing ability
with their subject writing in the L1, which, perhaps surprisingly, has not
been found to necessarily surpass CLIL-L2 abilities (Vollmer et al. 2006;
Lorenzo et al. 2009; Järvinen 2010; Llinares and Whittaker 2010).
Interesting practical as well as theoretical implications arise from this:
indeed, these findings suggest deficiencies in writing, both in CLIL and
non-CLIL classrooms (Llinares and Whittaker 2006; Vollmer et al. 2006). More
specifically, Vollmer et al. (2006) suggest the need for developing learners’
general writing competence. This is further stressed by Dalton-Puffer in the
following quotation:
Might we be justified in postulating some kind of general level of
writing development that has an impact on how learners deal with
a writing task independently of whether it is in their L1 or in L2?
This is an issue that needs to be further developed. (Dalton-Puffer
2011: 187)
In a similar vein, Whittaker et al. (2011) and Llinares et al. (2012) take a
further step in trying to come to terms with written competence from a de-
velopmental as well as a pedagogical perspective. They conducted a longitu-
dinal study on written discourse development with a sample of CLIL secondary
schools with EFL in history classes. It showed development in the control of
textual resources, as well as some increase in nominal group complexity, over
the four years of the study. The authors suggest that CLIL settings, which focus
primarily on the learning of content, provide suitable contexts in which to
develop written discourse.
Against this background on the differential effects of CLIL and non-CLIL
contexts on learners’ written development, the study presented here seeks
to fill a research gap. It sets out to contribute further evidence to the already
existing body, regarding the acquisition of written English in two differentiated
educational programmes: a conventional FI programme geared towards learners’
development in EFL, and a non-selective, well-established programme includ-
ing, in addition to FI, a CLIL component with a European perspective.
Furthermore, it examines CLIL gains in writing across both quantitative and
qualitative dimensions, taking into account that, to the best of our knowledge,
no study has done so to date.
With the above overarching objective in mind, the study addresses the fol-
lowing research question: How do different learning contexts affect the written
linguistic development of young bilingual secondary education EFL learners
when contrasting two groups with a highly similar number of accumulated hours
of exposure to EFL at the onset of the study? The learning contexts they experience
are different because one group is experiencing FI only, and the other group is
experiencing FI in combination with CLIL; the CLIL hours being the extra
hours which separate them from the FI-only group.
In order to address the above question, the study presented in this section was
conducted as part of a larger study.
Hence, the study focuses on the development of written abilities
over one year, as reflected through the learners’ composing skills measured
with a pre-test–post-test design. It aims at examining written development by
analysing, on one hand, the participants’ progress in the domains of syntactic
and lexical complexity, accuracy, and fluency, and, on the other, their
achievement in the domains of task fulfilment, organization, grammar, and
vocabulary. Quantitative and qualitative analyses were respectively applied.
It must be emphasized, as already mentioned above, that the total number of
hours of exposure to English at pre-test is highly similar, albeit not equal, for
both participant groups. Then, during the experimental period (between
pre-test and post-test), both groups had an equal number of FI hours; however, the
experimental group had 50 per cent more hours of exposure to English through
the CLIL programme, with the subject Science being taught in this language.
As Tables 1 and 2 display, this design has allowed us to
analyse time issues in the following manner: we contrast the effects of a FI
and a FI + CLIL programme, which are different in that one includes FI only
but the other includes FI and an additional CLIL component, while keeping the
number of hours of instruction highly similar at pre-test (Muñoz 2012). Thus,
the linguistic progress made by the two groups, A and B, is measured,
allowing us to gauge the impact of the additional (and also qualitatively
different) hours of exposure to English experienced by Group A (GA)
in the CLIL programme. Such a design represents a methodological alternative
to comparing FI with CLIL intact groups, which are more often than not
different from one another from the start. Indeed, as suggested above, CLIL
programmes have been said to attract the best students (Moore 2007). In this
respect, regarding the learners’ level of English at the onset of the study, the
CLIL students here cannot be described as the best students, as all students
enrol on the CLIL programme, and reportedly find themselves at an intermediate
level of English with an accumulated 1,300 h of instruction at pre-test (´az 2013).
What follows is a more detailed description of the CLIL programme in focus.
Context: the CLIL programme
The study is conducted within the Catalan/Spanish bilingual educational con-
text in Catalonia. The setting in which the subjects are immersed has been
defined as additive trilingualism (Cenoz and Valencia 1994). In Catalonia,
Catalan, the language of instruction, together with Spanish, are the majority
languages, and English is taught as the main foreign language in mainstream
education, with the recent introduction of CLIL programmes in some schools.
In this environment, the foreign target language, English, is hardly ever used
and heard outside the school setting.
Table 1: Participants (N = 50 per group)
Group          AoI in English               Data collection T1     Data collection T2
GA: FI + CLIL  FI: Nursery (5/6 years)      Grade 7 (12/13 years)  Grade 8 (13/14 years)
               CLIL: Grade 5 (10/11 years)
GB: FI         FI: Nursery (5/6 years)      Grade 8 (13/14 years)  Grade 9 (14/15 years)
Table 2: Design
Group          T1                                   T2
GA: FI + CLIL  Grade 7 (12/13 years)                Grade 8 (13/14 years)
               FI: 1,120 h + CLIL: 210 h = 1,330 h  FI: 1,260 h + CLIL: 280 h = 1,540 h
GB: FI         Grade 8 (13/14 years)                Grade 9 (14/15 years)
               FI: 1,260 h                          FI: 1,400 h
Learners are also studying a second foreign language in school at later ages
(see Vila-Moreno 2008 for a description of educational multilingualism in
Catalonia). It must be borne in mind that learning foreign languages through
the CLIL approach is not only a European recommendation which member
states have diligently taken on board, as mentioned above, but also an attract-
ive opportunity for schools and teachers. This is especially so in bilingual con-
texts in which two co-official languages coexist, such as the one presented here
(the Catalan one), as CLIL was initially the approach adopted to revitalize
Catalan after the Spanish dictatorship (all subjects except for Spanish language
have since been taught through Catalan, Pérez-Vidal 2002). However, CLIL is
also an attractive opportunity for schools and teachers who do not belong to
bilingual settings, as many monolingual communities such as Andalusia have
shown (see Lorenzo et al. 2009). All in all, Catalonia is a fertile ground for
conducting research on the effects of CLIL approaches to education, that is,
CLIL contexts of learning, in the case of foreign languages. As a case in point,
the school that generously offered to contribute to our research goal is a
state-run (ordinary government-supported) school, catering for education from
infant school to post-compulsory secondary education. In the year 2001, the
school considered the possibility of introducing a third language, English, as
the medium of instruction in the school curriculum, in parallel to the
conventional EFL programme. The school carefully prepared the programme over
one year, in collaboration with academic scholars specializing in CLIL
(Pérez-Vidal and Escobar 2004). They designed the programme and trained
teachers, who produced specific plans and materials, while all stakeholders
were being adequately informed and involved in the project (see Pérez-Vidal
and Escobar 2004 for an account of the planning process).
As regards the design and the
implementation of the CLIL programme, it was decided that it would start a
year later, in 2002, and in a staged manner, as already mentioned, with lear-
ners who were 8 and 10 years old, that is, at Grades 3 and 5 at the start, and
those in Grades 4 and 6 in the second year of implementation. Hence, the
school’s new Science CLIL programme progressively involved all learners as of
their third grade, year by year.
Participants were two groups of Catalan/Spanish bilingual EFL learners for
whom English was their L3. GA (N = 50) was the experimental group
experiencing FI plus CLIL, so they are the FI + CLIL group in the design of the
study (from now on GA: FI + CLIL group). Group B (GB) is the control group
(N = 50) experiencing only FI, so they are the FI group (from now on GB: FI
group). Each group is 50 per cent male and 50 per cent female. Ages range
from 12 to 15.
Having been together in the same school since nursery (at the age of three in
Spain), both groups had started learning English at the age of five/six, hence
shared the age of onset of instruction (AoI). By the time both groups of
learners were tested at T1, they had accumulated approximately 1,300 hours of
instruction. Data collection started at the end of their first year of secondary education
(Grade 7) at the age of 13. They had both then had eight years of FI. However,
GA: FI + CLIL had received three years of the extra CLIL hours, since the age of
10 (Grade 5). In order to make comparisons possible, GA: FI + CLIL was not
matched with GB: FI for age, which would have created a disadvantage in
terms of time of exposure to English at T1, but for a total number of hours of
exposure that was as similar as possible. Consequently, the latter group
included learners who were a year older than the former, as Table 1 displays.
Design and rationale of the study
The study has a longitudinal pre-test–post-test design as Table 2 shows. Both
groups of learners were measured respectively before (pre-test) and after (post-
test) one academic year in order to tap into gains obtained over the course of
that year. Then, as their respective accumulated hours of exposure to English
were very similar at pre-test, that is, at the first data collection time (T1),
although for GA: FI+CLIL some of the hours had been CLIL hours, the dif-
ference in gains obtained by each group over that year was calculated.
Since the quantity of hours of exposure received during the experimental
period was 50 per cent higher for GA: FI + CLIL, and the quality of
those hours, the CLIL hours, was different, any significant difference in
the results obtained by each group after a year’s treatment would reveal what
kind of effect the hours of CLIL treatment had on the GA: FI + CLIL learners’
linguistic knowledge.
GA: FI + CLIL learners were measured in secondary school when they were
13 (T1) and 14 (T2) years old, at the end of Grades 7 and 8, respectively. At T1,
they had had altogether eight years of FI and three years of CLIL. At T2, they had
had nine years of FI and four years of CLIL. GB: FI learners were measured in
Grades 8 and 9 when they were 14 (T1) and 15 (T2) years old, respectively,
also at the end of each academic year. They had had altogether 9 years of FI at
T1, and 10 years of FI at T2. As already mentioned, and it needs emphasizing,
the two groups were measured at different ages in order to have a similar
number of accumulated hours of total exposure to English, although in different
learning contexts (FI + CLIL for GA, FI only for GB) (see Table 2).
Table 2 displays the accumulated number of hours of English at T1 and T2
for each group. In the case of GA: FI + CLIL, at T1 data collection, in addition to
1,120 hours of FI (approximately 140 per year since AoI), they had had a total
of 210 of CLIL hours (70 per year). Their total exposure to English was 1,330 h.
One year later, at T2, GA had had 1,260 h of FI and 280 h of CLIL, that is,
1,540 h in total. GB: FI, at T1 data collection, had had a total of 1,260 h of FI
(approximately 140/year since AoI), and at T2, a total of 1,400 h.
In order to assess the differential degree of gain between both groups, GA: FI
+ CLIL gains between T1 and T2 are compared with gains by GB: FI, the control
group, throughout the same period. The design allows for a between-groups
comparison of the effect of 50 per cent more hours of exposure and CLIL
instruction over one year: 210 h (140 FI + 70 CLIL) in GA versus 140 h (FI)
in GB.
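The hour counts in Table 2 can be re-derived with a few lines of arithmetic. The sketch below is only an illustration: the per-year figures (about 140 FI hours and 70 CLIL hours) come from the text, while the function and variable names are hypothetical.

```python
# Per-year figures as reported in the text; names are illustrative only.
FI_HOURS_PER_YEAR = 140    # approximate FI hours per school year
CLIL_HOURS_PER_YEAR = 70   # CLIL (Science through English) hours per year


def accumulated_hours(fi_years, clil_years=0):
    """Total accumulated hours of exposure to English."""
    return fi_years * FI_HOURS_PER_YEAR + clil_years * CLIL_HOURS_PER_YEAR


ga_t1 = accumulated_hours(fi_years=8, clil_years=3)   # 1,120 + 210
ga_t2 = accumulated_hours(fi_years=9, clil_years=4)   # 1,260 + 280
gb_t1 = accumulated_hours(fi_years=9)
gb_t2 = accumulated_hours(fi_years=10)

# Hours received during the experimental year (T2 - T1).
ga_year = ga_t2 - ga_t1
gb_year = gb_t2 - gb_t1
extra_share = (ga_year - gb_year) / gb_year

print(ga_t1, ga_t2, gb_t1, gb_t2)
print(ga_year, gb_year, f"{extra_share:.0%}")
```

Under these figures the two groups differ by 70 h at T1 (1,330 h versus 1,260 h), and GA receives half as many hours again as GB over the experimental year.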
Instruments and data collection procedures
The written test administered in the current study was part of a larger battery
of tests tapping into both productive and receptive skills in a larger study, as
mentioned above. In order to gauge the participants’ written production, they
were administered a writing activity in class groups, in an exam-like situation.
They had to write a dialogue on the basis of a picture. It showed two policemen
talking to a mother and a boy at the entrance of their home. Learners were
shown the picture and then they were asked to answer the following two
questions:
Why did this happen?
How do you think the situation will end?
They were allowed 20 min to complete the task. The choice of a composition
was based on the number of subskills that come into play when learners write a
piece of text, and it is also a task that is practised in the classroom (Foster and
Skehan 1996). The picture and questions were chosen because it was thought
that the young boy in the picture would allow for a process of identification on
the part of our participants, something which should be inspiring for them
when writing (Foster and Skehan 1996;Tavakoli and Foster 2008). Finally,
since they were asked to write a dialogue and the answer to the two questions
(two very short narratives), different genres had to be used.
Analysis and measures
The participants’ compositions obtained from the writing test were transcribed
using the CLAN programme. As Table 3 shows, they were then analysed quan-
titatively for lexical (Guiraud’s index) and syntactic complexity (coordination
index), fluency (total words in 20 min), and accuracy features (errors per
word), following Wolfe-Quintero et al. (1998). The data were also analysed
qualitatively following a rating scale (Friedl and Auer 2007), whereby task
fulfilment, organization, grammar, and vocabulary features were measured.
Table 3: Measures used to analyse written development
Quantitative measures  Syntactic complexity: CI  Lexical complexity: GI  Accuracy: E/W  Fluency: WM
Qualitative measures   Task fulfilment           Organization            Grammar        Vocabulary
CI = coordination index; GI = Guiraud’s index; E/W = errors per word; WM = words per minute.
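Two of the ratios in Table 3 follow directly from simple counts. The sketch below illustrates errors per word and words per minute, assuming the 20-minute task time stated above; the function names and the sample counts are hypothetical and do not come from the study.

```python
TASK_MINUTES = 20  # time allowed for the composition, as stated in the text


def errors_per_word(n_errors, n_words):
    """Accuracy ratio: lower values mean more accurate writing."""
    if n_words <= 0:
        raise ValueError("a composition must contain at least one word")
    return n_errors / n_words


def words_per_minute(n_words, minutes=TASK_MINUTES):
    """Fluency ratio: total words divided by the time on task."""
    return n_words / minutes


# Hypothetical composition (not data from the study): 150 words, 12 errors.
print(errors_per_word(12, 150))
print(words_per_minute(150))
```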
Results were entered into a Stats Graphic matrix, and the formulae for each
ratio were calculated. After that, mean results for all measures per group were
obtained and compared by means of an analysis of variance (ANOVA).
In order to address the research question, comparisons between the progress
made in written abilities over one academic year (T2-T1) by the two groups of
participants, GA and GB, are established. That is, the effect of the number of
hours and treatment in GA, 210 h (140 FI + 70 CLIL), versus that of GB (140 h FI)
is scrutinized. First, the analyses of the data with quantitative measures are
presented, and then writing results measured qualitatively are displayed in the
following sections.
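One way to picture the comparison of progress (T2-T1) between the two groups is a one-way ANOVA on per-learner gain scores. The sketch below is an assumption about how such a comparison could be run, not the authors' procedure: the exact model is not specified in the text, and the degrees of freedom reported in the results, F(1,196), suggest an analysis over all 200 observations rather than over 100 gain scores. All scores below are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical errors-per-word scores for four learners per group
# (NOT data from the study).
ga_t1, ga_t2 = [0.12, 0.10, 0.14, 0.11], [0.08, 0.07, 0.09, 0.08]
gb_t1, gb_t2 = [0.09, 0.10, 0.08, 0.10], [0.08, 0.10, 0.07, 0.08]

# Gain score per learner: T2 - T1.
ga_gains = [t2 - t1 for t1, t2 in zip(ga_t1, ga_t2)]
gb_gains = [t2 - t1 for t1, t2 in zip(gb_t1, gb_t2)]

f_stat, df_b, df_w = one_way_anova_f(ga_gains, gb_gains)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The resulting F value would then be compared against the critical value for the chosen significance level; with real data a statistics package would normally report the p-value directly.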
Writing: quantitative measures
The participants’ results in the domains of syntactic and lexical complexity,
accuracy, and fluency are presented in this section.
Syntactic complexity
An ANOVA statistical analysis with the significance level set at p < 0.05 was
performed. The results appear in Figure 1. It must be noted that when using
the Coordination Index, a lower ratio, reflecting higher amounts of subordin-
ation, is expected in the post-test than in the pre-test. GA (FI + CLIL) obtained
a coordination index of 0.40 at T1, versus a coordination index of 0.39 at T2.
This results in a 0.01 improvement. In contrast, GB (FI) obtained an average of
0.47 at T1 versus an average of 0.49 at T2. This results in a loss of 0.02.
Therefore, there was a minor increase in the level of ability in GA (FI +
CLIL) versus a decrease in GB’s (FI) level of ability once the marks were
averaged. The comparison between GA’s progress (T2-T1) and GB’s progress
(T2-T1) shows that greater progress is made by GA than by GB. However, this
difference turned out not to be statistically significant [F(1,196) = 0.25,
p= .6201].
The previous results are further represented in Figure 2. It shows how GA
(FI + CLIL) makes a small amount of progress, since there is a 0.01 decrease in
the coordination index and therefore an increase in the use of subordination.
In contrast, such progress cannot be seen in GB (FI), as the increase in the
coordination index reflects the fact that it produces more coordinate clauses
at T2 than at T1. Hence, in a FI (GB) context our subjects make use of higher
levels of coordination than in a FI + CLIL (GA) context. These gains in
coordination actually occur at the expense of subordination. However, these
results are not significant.
One point deserves further attention here; Figures 1 and 2, in addition,
clearly show how GA (FI + CLIL) subjects start with a lower coordination
index, hence with higher levels of subordination. This fact must be emphasized
as it places GA (FI + CLIL) at a different onset level than GB (FI), something
worth taking into account for the discussion of results.
Figure 1: Average performance in the syntactic complexity measure
(coordination index) at T1 and T2
Figure 2: Progress in one year in GA (FI + CLIL) and GB (FI) syntactic
complexity (coordination index) measure
Lexical complexity
Guiraud’s index was used to measure changes in the vocabulary used by our
two groups of subjects after a FI + CLIL (GA) treatment and a purely FI (GB)
treatment.
In this case, the ANOVA revealed that the differential effect between GA’s
(FI + CLIL) progress (T2-T1) and GB’s (FI) progress (T2-T1) as regards lexical
complexity in the compositions analysed showed greater progress made by GB
(FI). However, the difference was not statistically significant
[F(1,196) = 0.69, p = .406]. Guiraud’s index was obtained by dividing the total
number of lexical types by the square root of the total number of lexical
tokens. Therefore, a higher value in the index indicated that the participants’
vocabulary was richer. Figure 3 shows the average performance in this area of
lexical richness and their graphic representation. On one hand, GA (FI + CLIL)
obtained a Guiraud’s index of 6.49 at T1 and 6.7 at T2. This is an improvement
of 0.21. On the other hand, GB’s (FI) Guiraud’s index was 6.3 at T1 versus 6.7
at T2. This is a 0.41 gain. Hence, the rate of progress was higher in GB (FI).
Such a difference in progress between the groups was not statistically
significant.
As in the previous area, GA (FI + CLIL) starts the treatment with a higher
onset level. This is not paired with higher gains throughout. In Figure 4, the
three phenomena are clearly shown: the one year progress in lexical richness
both in GA (FI + CLIL) and GB (FI), the faster rate of progress made by GB (FI),
and the higher starting level of GA (FI + CLIL).
Figure 3: Average performance in the lexical complexity measure (Guiraud’s
index) at T1 and T2
Figure 4: Progress in one year in GA (FI + CLIL) and GB (FI) lexical com-
plexity (Guiraud’s index) measure
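Guiraud’s index, described above as the number of types divided by the square root of the number of tokens, can be sketched in a few lines. The tokenization used here (lower-casing and splitting on letters and apostrophes) is a simplifying assumption; the study transcribed the compositions with CLAN and counted lexical types and tokens, which may have followed different conventions.

```python
import math
import re


def guiraud_index(text):
    """Types divided by the square root of tokens (higher = richer)."""
    # Assumed tokenization: runs of letters/apostrophes, case-insensitive.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)
    return len(types) / math.sqrt(len(tokens))


# Hypothetical sentence (not from the learners' compositions):
# 9 tokens, 7 types, because "the" occurs three times.
sample = "The police talk to the boy and the mother"
print(round(guiraud_index(sample), 2))
```

Dividing by the square root of the token count, rather than by the token count itself, makes the index less sensitive to text length than a plain type/token ratio.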
Accuracy
The third domain tapped in our quantitative analysis on writing is accuracy.
Accuracy was measured by means of errors per word. In this domain, GA’s (FI
+ CLIL) progress (T2-T1) was significantly higher than gains made by GB
(FI) (T2-T1) [F(1,196) = 4.41, p= .037].
As shown in Figure 5, GA (FI + CLIL) obtained an average performance of
0.12 errors per word at T1 versus a result of 0.078 errors per word at T2, hence
a decrease in errors per word. Therefore, GA improved by 0.042. In contrast,
progress was not so high in GB (FI), as they obtained 0.092 errors per word at
T1 versus 0.086 errors per word at T2. Hence, it only reached a 0.006
improvement.
In Figure 6, GA’s (FI + CLIL) higher advantage at T2 is evident if we compare
it with the progress made by GB (FI) in the same year. A higher decrease in the
number of errors per word in GA (FI + CLIL) is clearly shown. This, combined
with the fact that the group had started with a higher number of mistakes at
T1, makes their improvement in accuracy outstanding.
Figure 5: Average performance in the accuracy measure (errors per word) at
T1 and T2
Figure 6: Progress in one year in GA (FI + CLIL) and GB (FI) accuracy
(errors per word) measure
Fluency
The last domain scrutinized quantitatively in this study as far as writing is
concerned is fluency. The results of the ANOVA statistical analysis run with
the data obtained from the quantitative measure total number of words, when
comparing GA’s (FI + CLIL) progress (T2-T1) and GB’s (FI) progress (T2-T1),
showed that both groups decreased in fluency but GB more than GA.
However, this difference turned out not to be statistically significant
[F(1,196) = 0.08, p= .7801].
Figure 7 shows the average performance of each group in the domain of
fluency. At T1, GA (FI + CLIL) produced an average of 146.2 words in the
compositions analysed but, surprisingly, after one year’s treatment, the total
number of words produced decreased to 145.1, a decrease of 1.1 words.
Similarly, GB’s (FI) total number of words decreased from 149.1 at T1 to
144.7 at T2, a decrease of 4.4 words.
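To put the two fluency decreases on a common scale, the sketch below recomputes the absolute change from the reported means and also expresses it as a percentage of each group's T1 mean. The percentages are derived here for illustration only and are not reported in the study; all names are assumptions.

```python
# Mean total number of words per composition as reported: (T1, T2).
mean_words = {
    "GA (FI + CLIL)": (146.2, 145.1),
    "GB (FI)": (149.1, 144.7),
}


def change(t1, t2):
    """Return (absolute change in words, change as a percentage of T1)."""
    absolute = round(t2 - t1, 1)
    relative = round(100 * (t2 - t1) / t1, 2)
    return absolute, relative


fluency_change = {group: change(t1, t2) for group, (t1, t2) in mean_words.items()}

for group, (absolute, relative) in fluency_change.items():
    print(f"{group}: {absolute:+.1f} words ({relative:+.2f}% of the T1 mean)")
```

Both changes are small relative to the T1 means, which fits the non-significant ANOVA result reported above.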
Both groups are thus less fluent after one year’s treatment as measured by
total number of words. It is also remarkable that GB (FI) started with a
higher degree of fluency than GA (FI + CLIL), just as it had started with
fewer errors per word than GA (FI + CLIL).
Finally, the advantage of GB (FI) over GA (FI + CLIL) in lexical complexity
gains also deserves attention.
Figure 7: Average performance in the fluency measure (total number of
words) at T1 and T2
So far the quantitative results allow us to identify the following trends. First,
GA (FI + CLIL) significantly outperforms GB (FI) only in gains in the domain of
accuracy. However, GA (FI + CLIL) shows a tendency towards surpassing GB
(FI) as far as syntactic complexity goes. Another relevant result is that
in the domains of lexical and syntactic complexity GA (FI + CLIL) has a higher
onset level than GB (FI), whereas in the domains of accuracy and fluency it
has a lower onset level.
It is interesting at this point to gain an overall appraisal of each context in
turn and of how each group progresses. On the one hand, after the CLIL
treatment, GA (FI + CLIL) writes shorter texts, which nonetheless are more
accurate, lexically richer, and syntactically more complex. On the other hand,
after the FI treatment, GB (FI) also writes shorter texts, which are more
accurate and lexically richer but less syntactically complex. Hence, both
groups make some progress except in fluency, but GA’s (FI + CLIL) progress in
accuracy significantly exceeds that of GB (FI). If we now contrast their gains,
GA (FI + CLIL) gains more than GB (FI) in syntactic complexity and significantly
more in accuracy, having started at a lower level than GB (FI) in accuracy.
GB (FI) in turn gains more than GA (FI + CLIL) in lexical complexity, having
likewise started at a lower level.
Writing: qualitative measures
This section presents the results for the qualitative measures applied to the
compositions written by the participants: task fulfilment, organization,
grammar, and vocabulary.
Task fulfilment
In relation to task fulfilment, GA’s (FI + CLIL) progress (T2-T1) was higher
than GB’s (FI) progress (T2-T1). The difference between the two groups’
progress, however, was not statistically significant [F(1,96) = 0.20, p = .6572].
When the compositions were rated according to six behavioural levels on a
scale from 0 (not enough to evaluate) to 5 (very good), GA (FI + CLIL)
obtained an average performance of 2.92 at T1, versus 3.29 at T2. This is an
improvement of 0.37. Progress made by GB (FI) was smaller, since it obtained
an average result of 2.63 at T1, versus 2.87 at T2. This is a 0.24 improvement.
Figure 8 shows such progress: although it was higher in the case of GA (FI +
CLIL), the difference with GB (FI) was not statistically significant.
Organization
As far as the organization of the compositions analysed is concerned, GA’s
progress (T2-T1) was again higher than GB’s progress (T2-T1). However, as in
the previous subsection, the ANOVA revealed that the difference was not
significant [F(1,96) = 0.20, p = .6565]. When the compositions were rated
according to the six behavioural levels explained above, GA (FI + CLIL)
obtained an average performance of 2.84 at T1, versus 3.24 at T2.
This is a 0.4 improvement. In contrast, GB’s (FI) average results were 2.49 at
T1, versus 2.76 at T2. This is a 0.27 improvement.
In Figure 9, these results are displayed: it is interesting to note that in both
this and the previous measure, GA (FI + CLIL) had a higher starting level, as
happened with some of the quantitative measures.
Grammar
In the area of grammar (accuracy), the ANOVA of the participants’ performance
revealed that GA’s (FI + CLIL) results were again higher than GB’s (FI), but
that no significant differences between GA’s (FI + CLIL) progress (T2-T1) and
GB’s (FI) progress (T2-T1) were found [F(1,96) = 0.98, p = .3240].
As can be seen in Figure 10, participants in GA (FI + CLIL) obtained
an average performance of 2.4 at T1 versus an average of 3.06 at T2 (on the
six behavioural levels used in these measures). Hence, the improvement
amounted to 0.66. In contrast, progress in GB (FI) was smaller, since
they obtained an average result of 2.34 at T1, versus 2.7 at T2. This is a 0.36
improvement.
Figure 8: Progress in one year in GA (FI + CLIL) and GB (FI) task fulfilment
Figure 9: Progress in one year in GA (FI + CLIL) and GB (FI) organization
Vocabulary
As far as the vocabulary used in the compositions analysed is concerned,
GA’s (FI + CLIL) progress (T2-T1) was again higher than GB’s (FI) (T2-T1),
but the ANOVA revealed that the difference between the two groups was not
significant [F(1,96) = 2.37, p = .1256].
As in the previous subsections, taking into account the six behavioural
levels mentioned, GA (FI + CLIL) obtained an average performance of 2.52
at T1, versus 3.18 at T2. This is a 0.66 improvement. On the other hand, GB
(FI) obtained an average performance of 2.53 at T1, versus 2.74 at T2. This is a
0.21 improvement.
Figure 11 shows higher progress in GA (FI + CLIL) than in GB (FI) from T1 to
T2. However, the difference in progress, contrary to what the figure may
suggest, was not statistically significant.
The results obtained by the subjects in the qualitative measures of their
respective progress in writing can be summarized as follows. When analysed
with qualitative measures, GA (FI + CLIL) outperforms GB (FI) in all areas.
However, the differences do not reach statistical significance on any
measure. More specifically, GA (FI + CLIL) consistently tends to write a better
organized, more accurate, lexically richer, and more purposeful composition. It
is also interesting to highlight that GA (FI + CLIL) always has a higher
onset level except in the domain of vocabulary, despite being a year younger.
GB’s (FI) improvement in all domains of written competence is always inferior
to GA’s (FI + CLIL).
Figure 10: Average performance in the grammar measure at T1 and T2
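The summary above can be checked against the mean ratings reported in the four subsections. The following Python sketch, with illustrative names that are assumptions rather than part of the study, recomputes each gain on the 0 to 5 scale and lists the measures on which GA started higher. It works from the reported group means only and says nothing about statistical significance.

```python
# Mean ratings on the 0-5 scale as reported in the text: (T1, T2) per group.
qualitative_scores = {
    "task fulfilment": {"GA": (2.92, 3.29), "GB": (2.63, 2.87)},
    "organization": {"GA": (2.84, 3.24), "GB": (2.49, 2.76)},
    "grammar": {"GA": (2.40, 3.06), "GB": (2.34, 2.70)},
    "vocabulary": {"GA": (2.52, 3.18), "GB": (2.53, 2.74)},
}

# Gain per measure and group, computed as T2 minus T1.
qualitative_gains = {
    measure: {group: round(t2 - t1, 2) for group, (t1, t2) in groups.items()}
    for measure, groups in qualitative_scores.items()
}

# Does GA gain more than GB on every measure?
ga_gains_more_everywhere = all(
    gains["GA"] > gains["GB"] for gains in qualitative_gains.values()
)

# Measures on which GA has the higher onset (T1) level.
ga_higher_onset = [
    measure
    for measure, groups in qualitative_scores.items()
    if groups["GA"][0] > groups["GB"][0]
]

for measure, gains in qualitative_gains.items():
    print(f"{measure}: GA +{gains['GA']:.2f}, GB +{gains['GB']:.2f}")
print("GA gains more on every measure:", ga_gains_more_everywhere)
print("GA higher at T1 on:", ", ".join(ga_higher_onset))
```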
This article has sought to fill an existing gap in the literature by examining the
effects of a CLIL learning context, in the case of a well-established programme
initiated at primary education level, involving all students at a school located
in an urban area. It focuses on written development and includes quantitative
and qualitative analyses which, to our knowledge, have never been used in
combination in the few studies published to date on the effects of CLIL on
written development. The analyses presented above explore the impact of an EFL
programme including a CLIL component in addition to a FI component (GA), in
contrast with a programme only including a FI component (GB). They try to
pin down whether the differential factor between the two groups, that of the
additional CLIL hours in GA, has a significant impact on learners’ linguistic
progress in writing over the course of one year. When measured quantitatively,
GA’s (FI + CLIL) results, as far as written accuracy is concerned, show
an improvement of 0.042 errors per word over one academic year. This is
significantly higher than GB’s (FI) results, which improved by 0.006 from T1
to T2. Syntactic complexity also shows evidence of a tendency towards
improvement on the part of GA (FI + CLIL), although not to a significant level.
Lexical complexity is the only measure on which GB (FI) outperforms GA
(FI + CLIL), again non-significantly.
As far as the qualitative measures go, GA (FI + CLIL) consistently tended to
write a better organized, more accurate, lexically richer, and more purposeful
composition, although not to a significant level.
In a nutshell, these results reveal that both groups made progress on all
measures over the year under scrutiny, except for fluency and, in the case
of GB, also except for syntactic complexity. However, none of the differences
were significant, apart from accuracy, where GA (FI + CLIL) outperformed GB
(FI). Therefore, what stands out from this study is that the CLIL addition
actually means a significant gain in accuracy. On the one hand, this result was
not easy to predict, since prior findings in the CLIL literature had revealed that
the largest gains tend to be obtained in lexis and pragmatic features as far as
oral production goes. On the other hand, however, the results are consistent with
some prior research on CLIL writing, particularly with regard to accuracy
(Jexenflicker and Dalton-Puffer 2010).
Figure 11: Progress in one year in GA (FI + CLIL) and GB (FI) vocabulary
As reviewed in the theoretical background section above, the studies
conducted by Whittaker et al. (2011) and Llinares et al. (2012) on written discourse
in history classes had shown development in the control of textual resources,
as well as some increase in nominal group complexity, over the four years of
the study. The authors then suggested that CLIL settings which focus primarily
on the learning of content provide suitable contexts in which to develop
written discourse, since the secondary students involved in the study seemed to be
able to draw on a solid knowledge base from which to generate their texts. This
is an interpretation which might be extended to explain significant gains in
accuracy in the present study.
Another possible explanation for the significant advantage in accuracy
progress shown by GA (FI + CLIL) is the transfer of
knowledge and skills from the FI context to the CLIL context, where
meaningful practice provides extra opportunities for linguistic development to
take place. Indeed, grammar abilities are often practised in the FI context.
This might point to a beneficial effect of the additional CLIL hours on the
automatization of previously acquired knowledge through the extensive practice
CLIL contexts allow (DeKeyser 2007). Empirical studies, probably of an
experimental nature, are needed in order to test such an interpretation.
Indeed, further research on CLIL versus FI contexts in secondary education
should help us find an explanation for the fact that, in the present case,
additional hours of CLIL exposure to English have meant a significant
gain in written accuracy. While it should be noted that this finding coincides
with some other studies carried out elsewhere on the continent (see e.g.
Jexenflicker and Dalton-Puffer 2010), it is also true that contradictory results
in CLIL research are highly frequent, perhaps because of the methodological
concerns mentioned above, but also because research is still in its infancy.
Further empirical enquiry should take into account the fact that in the present
study, the skill of writing has been analysed only through a dialogue and two
short narratives. In order to provide substantial evidence of the learners’
overall capacity to write, it would be desirable to also administer descriptive and
argumentative tasks, of a more complex nature, to CLIL learners. Following
Kuiken and Vedder (2007), a more complex task would probably have affected
the accuracy results in writing, particularly with respect to lexical errors.
In conclusion, the results obtained in the present study show that the
superiority of CLIL cannot be confirmed. Our findings show improvement in
written productive abilities for the FI + CLIL group (GA), as writing shows
development, but significantly so only in the case of accuracy. Consequently,
as a final point we can state that, with the sample analysed in the present
study, a CLIL context of learning over the course of one year does not suffice
for the participants to improve significantly in all the domains scrutinized.
It seems evident that more research is needed in this subfield of language
acquisition if we wish to obtain a more accurate and precise picture of
the linguistic impact of European CLIL programmes. This should also allow us
to address a central issue in SLA research, namely the effect of the
language practice which all learners alike, and not only (self-)selected learners,
may avail themselves of in different learning contexts. Having said that, we
would like to argue that CLIL programmes hold the promise of representing an
educational approach which can guarantee successful learning, provided
programmes are carefully designed and developed in each school context, as in
the case of the programme whose data are presented in this study, or the
programmes so carefully and successfully implemented in previously existing
immersion settings.
