English, ‘polyglot’ politicians and polyglot businessmen: Language ideologies in contemporary Bosnian press

Chapter · August 2016
DOI: 10.21832/9781783095971-011
In book: The position of English in Bosnia-Herzegovina, Publisher: Multilingual Matters, Editors: Louisa Buckingham
Author: Adnan Ajšić
Since the arrival of English-using UN and NATO peace-keeping troops in the early 1990s
and the consequent post-war administration of the country by an English-using international
conglomerate, as well as owing to the ongoing influx of the largely Anglophone neoliberal
capitalism, English has seen a surge in importance and interest in independent Bosnia and
Herzegovina. Thus, as a global language and, more importantly, the primary language of powerful
outsiders, English quickly solidified its already dominant position as the foreign language of
choice in prestigious domains, from politics to media to education. At the same time,
paradoxically, with no ratified translations into the country’s official languages, Bosnia-
Herzegovina may be the only country in the world to have an English-language constitution but
no official role for English (see Ajšić, 2014; Buka, 2013). As a consequence of these
developments, English has gained wide acceptance throughout Bosnian society and is a primary
foreign-language choice for anyone from prospective emigrants to academic scholars.¹
During this same period, English also firmly established itself as the global lingua franca
without historical precedent and (for the foreseeable future) without competition (Graddol, 1997,
2006). Surprisingly, however, despite the availability of an extensive critical literature on the
globalization of English (e.g. Blommaert, 2010; Canagarajah, 1999; Pennycook, 1998; Phillipson,
1992) and a vast literature on ideology (e.g. Eagleton, 1991; Thompson, 1984; van Dijk, 1998),
little scholarly attention has been paid outside the Anglophone and postcolonial worlds, and
particularly from a quantitative or mixed-methods perspective, to the obviously important
language-ideological issues emanating from the globalization of English.
This study aims to contribute to filling this gap by providing an empirical account of
English-related language ideologies (i.e. conscious or unconscious, explicit or implicit beliefs
about language) in the context of postwar Bosnia-Herzegovina as a (somewhat) atypical ecology
for global English. Focusing on media language as the principal domain for public discourses and
ideologies (cf. Fowler, 1991; van Dijk, 1998, 2006), this paper examines references to the English
language in contemporary Bosnian press for evidence of language ideologies pertaining to English.
To this end, a comprehensive 11-million-word corpus comprising news articles from five leading
Bosnian-language publications from the period between 2003 and 2010 was compiled. Following
recent developments in corpus linguistics (henceforth CL) and critical discourse analysis (CDA;
e.g. Baker, 2006; Baker, 2010; Baker et al., 2008; Baker, Gabrielatos & McEnery, 2013;
Partington, 2010), the corpus data was subjected to a combination of quantitative corpus-linguistic
analytical procedures (collocation, keyword and exploratory factor analysis) and qualitative CDA
analytical techniques (e.g. examination of discursive strategies such as topoi) to identify and
describe dominant English language-related discourses and underlying language ideologies
circulating in the Bosnian public.
Literature Review
In order to understand how the present study fits into existing language ideology research,
it is helpful to briefly recount the historical trajectory of language ideology research here. The
question of language ideology, as Kroskrity (2004) notes, was neglected for the better part of the
history of linguistics. Considered briefly by the leading figures of early twentieth-century
linguistic theory, it was quickly dismissed as inconsequential and relegated outside the scope of
legitimate linguistic research. Thus, it was not until Silverstein’s seminal 1979 paper (Silverstein,
1979) that language ideology began to return into focus, first of linguistic anthropology and later
also of applied linguistics research. Following the oft-cited 1989 contributions by Susan Gal and
Judith Irvine (Gal, 1989; Irvine, 1989), academic interest in language ideology surged during the
1990s with a series of conferences and publications, culminating with the 1998 and 2000 volumes
titled Language ideologies: Practice and theory (Schieffelin, Woolard & Kroskrity, 1998) and
Language regimes (Kroskrity, 2000), respectively. Research in applied linguistics and particularly
language policy and sociolinguistics soon followed suit (Blackledge & Pavlenko, 2002;
Blommaert, 1999; Duchêne & Heller, 2007; Lippi-Green, 2007; Mar-Molinero & Stevenson,
2006; McGroarty, 2008, 2010; Ricento, 2000; Spolsky, 2009). Importantly, the resurgence of
interest in language ideology in these two fields has been paralleled by similar interests in the field
of CDA, where emphasis has been on the interplay between discourse and ideology in language-
based analyses of power and domination in contemporary societies (e.g. Blackledge, 2005; van
Dijk, 2006; Wodak & Meyer, 2009).
Most recently, this interest has taken a methodological turn following synergistic
developments in CDA and CL (for a seminal paper, see Baker et al., 2008). These developments
resulted from the rapid growth of CL and its application in an increasing number of areas of
linguistics (for a comprehensive overview, see McEnery, Xiao, & Tono, 2006), as well as from
responses to methodological criticisms of CDA (Mautner, 2009; Meyer, 2001; Wodak & Meyer, 2009). As
Baker et al. (2008) explain, a mixed-methods approach relying on both CL and CDA has a number
of advantages, the most important of which are triangulation, generalizability, and replicability.
However, the number of corpus-based or corpus-assisted studies that specifically focus on
language-related discourses (i.e. more or less coherent systems of statements which construct an
object of which they speak, e.g. language) and language ideologies rather than discourses or
ideologies more generally (such as, for example, Baker, Gabrielatos & McEnery, 2013) remains
small.
Similar to many qualitative studies of language ideology (e.g. Blackledge, 2002;
Blommaert & Verschueren, 1998; Bokhorst-Heng, 1999; DiGiacomo, 1999; Kuo & Nakamura,
2005; Ricento, 2003), mixed-methods studies often rely on newspaper and, to a lesser extent,
policy corpora as their main data sources (for an overview of data sources used in studies of
language ideology, see Ajšić & McGroarty, in press). Partington and Morley (2004), for example,
rely on a 500,000-word corpus compiled from editorials published in seven English broadsheet
and tabloid newspapers to identify and analyze frequent phraseology (cf. clusters below) in
political debates. Fitzsimmons-Doolan (2009) uses newspaper corpora compiled from Arizona
newspaper articles to compare the lexical overlap between discourses of language policies and
immigration in this local context. Similarly, Vessey (2013a,b) examines two sets of corpora of
English- and French-language articles from Canadian newspapers for language ideologies and
their links to discourses on national identity. Most recently, Subtirelu (2015) uses a corpus of 14
million online student evaluations of university instructors to examine language-related remarks
made about instructors speaking English as a foreign language. Freake, Gentil and Sheyholislami
(2011), on the other hand, study a bilingual corpus of English and French policy briefs submitted
to a Canadian government commission on religious and cultural accommodation, while
Fitzsimmons-Doolan (2011, 2014) bases her investigations of lexical variables as indicators of
language ideologies on a one-million-word corpus of language-in-education policy documents
harvested from the Arizona Department of Education website. Most pertinently for our purposes,
however, Ensslin and Johnson (2006) use a corpus of newspaper articles from The Times and The
Guardian to investigate ideological representations of language and linguistics in the British press,
ultimately focusing their follow-up qualitative analysis on the occurrences of the phrase the
English language.
Following the research tradition outlined above, this study seeks to answer the following
research questions:
RQ1: How do quantitative corpus-linguistic analytical procedures (keyword, collocation
and exploratory factor analysis) compare in terms of their potential for the identification of
English language-related discourses and ideologies in a corpus of mainstream Bosnian
newspaper discourse?
RQ2: What English language-related discourses and underlying language ideologies can
be identified from the mainstream Bosnian newspaper discourse from the period between
2003 and 2010?
Data and Method
Data
Similar to existing research (e.g. Ensslin & Johnson, 2006; Vessey, 2013a,b), this study is
based on a specialized corpus comprising newspaper articles containing the search term lemma
JEZIK ‘language’ (i.e. jezik, jezika, jeziku, jezikom, jezici, jezike, jezicima).² The parent corpus used
here is a comprehensive corpus of relevant articles (BOSCORP) from five leading Bosnian daily,
weekly, and bi-weekly newspapers and newsmagazines (Dnevni Avaz, Oslobođenje, Dani,
Slobodna Bosna, Start). BOSCORP comprises 11,252,145 words in 11,592 articles from the
period between 2003 and 2010. It was compiled using the following procedure. First, using
the Infobiro digital archive of the Bosnian press (Mediacentar Sarajevo, 2013) and its built-in
search function and the search term jezi*, all articles containing any of the forms of the lemma
JEZIK ‘language’ (as well as, perforce, those containing forms of the lemma JEZIČ[K/N]I ‘linguistic’)
were identified. Second, a custom-written Python program was used to download and save all the
relevant articles. After the corpus was thus compiled, a wordlist was created using the wordlist
tool in WordSmith Tools 6.0 (WST; Scott, 2014) and checked against the selection criteria (i.e.
articles containing forms of the lemma JEZIK). Third, another custom-written Python program was
used to identify all articles in BOSCORP which contained any of the lemma forms of the phrase
ENGLESKI JEZIK ‘English language’ (using the search term englesk* jezi*), and these were then
copied into separate folders to compile a sub-corpus pertaining to the English language
(ENGCORP). This search identified a total of 1,353 articles with a total of 1,209,898 words. Next,
a wordlist was created for ENGCORP and checked against the selection criteria to ensure
representativeness. Finally, both BOSCORP and ENGCORP were examined for errors (e.g.
articles containing words that matched the keywords ‘jezi*’ and ‘engleski jezi*’ but did not
otherwise match the selection criteria).
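The wildcard-based selection described above can be illustrated with a short Python sketch. It is not the custom-written program used in the study: the article ids and texts are invented, the `select` helper is hypothetical, and the two regular expressions are only one assumed way of expressing the search terms jezi* and englesk* jezi*.

```python
import re

# Hypothetical stand-in for the downloaded articles: {article_id: text}.
articles = {
    "a1": "Nastava se odvija na engleskom jeziku.",
    "a2": "Bosanski jezik i književnost.",
    "a3": "Jezičke politike u regiji.",
    "a4": "Engleska liga počinje sutra.",
}

# Search term 'jezi*': any word beginning with 'jezi'.
JEZI = re.compile(r"\bjezi\w*", re.IGNORECASE)
# Search term 'englesk* jezi*': the two words adjacent, in this order.
ENGLESKI_JEZIK = re.compile(r"\benglesk\w*\s+jezi\w*", re.IGNORECASE)


def select(texts, pattern):
    """Return the sorted ids of all texts with at least one match."""
    return sorted(key for key, text in texts.items() if pattern.search(text))


# Step 1: parent corpus (all articles with a form matching jezi*).
parent_ids = select(articles, JEZI)
# Step 2: sub-corpus (parent articles that also match englesk* jezi*).
sub_ids = select({key: articles[key] for key in parent_ids}, ENGLESKI_JEZIK)
```

In this toy example the pattern for jezi* also matches the adjective form in article a3, which mirrors the unavoidable inclusion of forms of JEZIČ[K/N]I noted above.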
Method
As mentioned above, three distinctly different methodological approaches were used in the
process of identification of pertinent lexis and lexical patterns here. All quantitative analyses were
conducted with the help of WST and the Statistical Package for the Social Sciences 21.0 (SPSS;
IBM, 2012). Following is a brief explanation of the theoretical background and the relevant
procedures and parameters.
Keyword analysis. Keyword analysis (Scott, 1997, 2014) uses statistical techniques such
as the chi-square and log likelihood tests to compare the observed frequency of a word in a
corpus (the node corpus) and in a larger reference corpus with the frequencies that would be
expected in each if the word were evenly distributed across the two. The result is a statistical
measure of a word’s salience in the node corpus such that if the difference between the observed
and expected frequencies is statistically significant, the word is identified
as key and given a keyness score based on the strength of the difference. A list of keywords
calculated for a corpus (or a single text) thus suggests the “aboutness” of that corpus, i.e. what a
corpus (or text) is about. Keywords can be positive (when they are significantly more frequent in
the node as compared to the reference corpus) or negative (when they are significantly less frequent
in the node corpus). Whereas positive keywords suggest what a corpus is about, negative
keywords can be used as an indicator of what may be missing from it. Keyword analysis has been
used in a wide variety of discourse studies (see, for example, the essays in Bondi & Scott, 2010)
to identify what characterizes a certain text or corpus, as well as to look for differences between
parallel texts or corpora.
Although keyword analysis has been the object of criticism on several grounds and
particularly for its dependence on the reference corpus chosen, it was used as a starting point here
because it can provide a macroscopic discursive profile of a corpus. It has been suggested that a
reference corpus needs to be at least five times the size of the node corpus (Berber-Sardinha, 1999).
The reference corpus used here was the remainder of BOSCORP (8,275,378 words) after
ENGCORP (1,209,898 words) had been subtracted (i.e. articles containing references to language
but not the English language) and thus meets the size criterion. There are two principal advantages
to this particular choice of reference corpus. First, items identified as key can be expected to be
characteristic of discourses around English as compared to general language-related discourses,
and second, because both corpora comprise newspaper register, the resulting keyword list will be
free from items that characterize newspaper language in general. The analysis was conducted with
the help of the ‘keywords’ tool in WST, using log likelihood (p < .00000001) and the default WST
keywords settings. The results are shown in Table 1.
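To make the keyness calculation concrete, the following is a minimal Python sketch of the widely used two-corpus log likelihood statistic. It assumes the standard formulation (observed frequencies compared with the frequencies expected if the word were evenly spread over both corpora); WST's internal computation may differ in detail, and the function name and return format are illustrative assumptions. The frequencies and corpus sizes are those reported in the text and in Table 1.

```python
import math


def log_likelihood(node_freq, node_size, ref_freq, ref_size):
    """Return (LL statistic, '+' or '-') for one word in two corpora."""
    total_freq = node_freq + ref_freq
    total_size = node_size + ref_size
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_node = node_size * total_freq / total_size
    expected_ref = ref_size * total_freq / total_size
    ll = 0.0
    for observed, expected in ((node_freq, expected_node), (ref_freq, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # Positive keyword: relatively more frequent in the node corpus.
    sign = "+" if node_freq / node_size >= ref_freq / ref_size else "-"
    return ll, sign


ENGCORP_SIZE = 1_209_898    # node corpus size reported in the text
REFERENCE_SIZE = 8_275_378  # reference corpus size reported in the text

# Frequencies taken from Table 1 (JEZIKU, row 7; NARODA, row 96).
jeziku = log_likelihood(1090, ENGCORP_SIZE, 3402, REFERENCE_SIZE)
naroda = log_likelihood(317, ENGCORP_SIZE, 4217, REFERENCE_SIZE)
```

Under these assumptions JEZIKU comes out as a positive keyword and NARODA as a negative one, in line with their placement in Table 1.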
Collocation analysis.³
Strength of association and frequency. Collocation analysis, on the other hand, examines
the co-occurrence patterns between words and does not require a reference corpus. The strength
of association between two words is measured by various statistical techniques such as the t-test,
and z and mutual information (MI) scores (McEnery, Xiao & Tono, 2006). MI score, the preferred
technique in lexical analysis, is calculated by comparing “the probability of observing the two
words together with the probability of observing each word independently, based on the
frequencies of the words” (Biber, Conrad & Reppen, 1998: 266). A score of 0 means that there is
no association between the words, while a score higher than 0 suggests some association. An MI
score of 3 or higher is considered to indicate a significant association (Hunston, 2002: 71). Unlike
keyword analysis, which represents a more general characterization of a corpus, collocation
analysis provides an indication of how individual words are used in a corpus. Such patterns can
be suggestive of particular discourses and underlying ideologies as “[n]o words are neutral [and]
[c]hoice of words represents an ideological position” (Stubbs, 1996: 107). Collocation analysis
was conducted with the help of the ‘concordance’ tool in WST, using the span of five words to the
left of the node word and five words to the right (L5-R5), and cutoff points for minimum item
frequency (≥ 5), minimum number of texts (≥ 5), and minimum strength of association score (MI
≥ 3). The results are shown in Tables 2 and 3.
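The MI definition quoted above, together with the three cutoff points, can be sketched in Python as follows. This is an illustrative sketch only: it implements the plain ratio of the joint probability to the product of the independent probabilities, without any span adjustment, so its values need not match those produced by WST. The function names and the toy counts are assumptions.

```python
import math


def mi_score(joint_freq, node_freq, collocate_freq, corpus_size):
    """log2 of P(node, collocate) / (P(node) * P(collocate)).

    Algebraically this equals log2(joint * N / (f_node * f_collocate)),
    which is the form computed here.
    """
    return math.log2(joint_freq * corpus_size / (node_freq * collocate_freq))


def passes_cutoffs(joint_freq, n_texts, mi, min_freq=5, min_texts=5, min_mi=3.0):
    """Apply the cutoffs from the text: frequency >= 5, texts >= 5, MI >= 3."""
    return joint_freq >= min_freq and n_texts >= min_texts and mi >= min_mi


# Toy counts (not from the study): a corpus of 1,000,000 words in which the
# node occurs 1,000 times, the collocate 500 times, and both together 50 times.
example_mi = mi_score(50, 1000, 500, 1_000_000)
example_kept = passes_cutoffs(50, 20, example_mi)
```

With these toy counts the pair co-occurs 100 times more often than independence would predict, so the score is log2(100), well above the cutoff of 3.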
Pattern analysis. In addition to the two primary perspectives from which to consider
collocates (strength of association and frequency), WST makes possible a mode of collocation
analysis which focuses on the positionality of collocates in the collocational span (L5-R5 in our
case). Put differently, the ‘patterns’ tool in WST uses frequency data to organize collocates into
columns by the position in which they most frequently occur. As a result, any existing lexical
patterns become easier to identify as the most frequent collocates in every position around the node
word surface to the top of each column. Curiously, although collocation patterns are automatically
calculated in WST by the concordance tool and thus are easy to use, this feature is not often used
in this type of research (or, at least, its use and results are not often reported). The results are
shown in Tables 4 and 5.
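The logic of organizing collocates by position can be sketched in Python as below. This is a simplified approximation of what the ‘patterns’ tool is described as doing, not a reimplementation of WST: each collocate is assigned to the position in which it occurs most often, and each column is then ranked by frequency. The function name, the tokenized toy text, and the span of one word are assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


def patterns(tokens, node_forms, span=5):
    """Map each position (-span..+span, node excluded) to its ranked collocates."""
    positions_by_word = defaultdict(Counter)  # word -> Counter of offsets
    for i, token in enumerate(tokens):
        if token not in node_forms:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                positions_by_word[tokens[j]][offset] += 1

    columns = defaultdict(list)
    for word, offsets in positions_by_word.items():
        # Assign the word to the position where it is most frequent.
        best_offset, freq = max(offsets.items(), key=lambda item: item[1])
        columns[best_offset].append((freq, word))

    # Rank each column: most frequent first, ties broken alphabetically.
    return {
        offset: [word for freq, word in sorted(col, key=lambda fw: (-fw[0], fw[1]))]
        for offset, col in columns.items()
    }


tokens = "govori engleski jezik i piše engleski jezik tečno govori engleski dobro".split()
table = patterns(tokens, {"engleski"}, span=1)
```

Here `table[-1]` holds the collocates that most often occur immediately to the left of the node and `table[1]` those immediately to the right.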
Cluster analysis. In line with the recent shift in focus in corpus linguistics and applied
linguistics research generally to phraseology (see, for example, Biber, Conrad & Cortes, 2004;
Chen & Baker, 2010; Gray & Biber, 2013), corpus-based research into discourses and ideologies
has examined clusters (also known as lexical bundles or n-grams, i.e. recurring word combinations
with n number of constituents, e.g. jezik i književnost ‘language and literature’; see Cheng & Lam,
2013 for a discourse-analytic application). Cluster analysis, as it will be referred to here, is useful
as recurrent word combinations can be more informative in discursive terms than individual
collocates considered in isolation. Cluster analysis in this study was conducted using the ‘clusters’
function in the concordance tool in WST. The parameters used were 2-6-word clusters with a
minimum frequency of 5 in the span of 5 words to both left and right of the node word (L5-R5);
analysis was conducted separately for each of the forms of the node lemma ENGLESKI ‘English’.
The results of cluster analysis are shown in Table 6.
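A rough Python sketch of cluster extraction with the parameters named above (2-6 words, minimum frequency of 5) follows. It simplifies the procedure by counting every n-gram that contains a form of the node word rather than working from concordance lines with an L5-R5 window, so it should be read as an approximation; the function name and the toy data are assumptions.

```python
from collections import Counter


def clusters(tokens, node_forms, min_n=2, max_n=6, min_freq=5):
    """Count n-grams (min_n..max_n words) that contain a node form."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if any(token in node_forms for token in gram):
                counts[gram] += 1
    # Keep only clusters that reach the minimum frequency.
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Toy text (not from the corpus): the phrase 'kurs engleskog jezika' five times.
toy_tokens = ("kurs engleskog jezika " * 5).split() + ["kraj"]
toy_clusters = clusters(toy_tokens, {"engleskog"}, max_n=3)
```

Running the analysis once per form of the node lemma, as in the study, would amount to calling the function with a different single-item `node_forms` set each time.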
Downsampling using the ‘plot’ function of the concordance tool in WST. The final
step in the collocation analysis section of this paper was to identify a small number of
representative pertinent texts for follow-up, qualitative language-ideological analysis using the
CDA framework. This is a critical issue for studies combining CL and CDA because objective
selection of representative texts for qualitative analysis using replicable methods has been
identified as one of the main objectives of this methodological synergy (Baker et al., 2008; Wodak
& Meyer, 2009). Following Vessey (2013b), the ‘plot’ function of the concordance tool in WST
was used to identify texts with the highest numbers of occurrences of the search term lemma
ENGLESKI ‘English’ (i.e. hits). A total of four top-scoring texts (one per lemma form) was selected
for qualitative analysis (see Table 7).
Exploratory factor analysis. Exploratory factor analysis (EFA) has been used much less
frequently in discourse and ideology studies than either keyword or collocation analysis. Its
application to linguistic data goes back to Douglas Biber’s seminal work on multidimensional
analysis (e.g. Biber, 1988). In essence, EFA is a multivariate statistical technique which relies on
co-occurrence between variables to identify more or less discrete sets of variables that co-vary to
a significant degree (see also Tabachnick & Fidell, 2007). These sets of variables, called factors,
are considered to represent an underlying construct which accounts for their mutual covariance.
Although the use of EFA is now fairly common in grammar- and pedagogy-related corpus
linguistics research (see, for example, the essays in Cortes & Csomay, 2015), it has, to the best of
the present author’s knowledge, been used only once to analyze language-related discourse and,
particularly, ideology. Fitzsimmons-Doolan (2011, 2014), in her study of language ideologies in
educational settings in Arizona, used a corpus of Arizona Department of Education documents to
identify several factors which she interpreted as indicators of different language ideologies. EFA
is applied here with a similar goal in mind: to identify any factors in the subcorpus of texts
containing references to English and examine them in terms of their discursive and ideological
content.
The procedure (for detailed outlines, see Biber, 1988: 59-99; Fitzsimmons-Doolan, 2011:
137-140 and Appendix H) begins with the identification of strong collocates, in this case of the
search term lemma ENGLESKI ‘English’ (see Tables 2 and 3). A custom-written PERL program
was used to count all occurrences of all collocates in all texts in the corpus, as well as to norm the
total counts per 1,000 words. SPSS was then used to run several factor analyses using each text
as an observation and each collocate as a variable. First, a factor analysis was conducted using all
1,353 texts in ENGCORP and all 183 collocates. However, this data set turned out to be only
marginally factorable, producing a factor solution of marginal interest from the standpoint of
language ideological analysis. EFA results depend to a large degree on the number of occurrences
per observation (Douglas Biber, personal communication), so it was decided at this point to limit
the data set to only those articles that contained a minimum of three occurrences of the phrase
lemma ENGLESKI (for a total of 161 texts). Further, because EFA also requires an absolute
minimum of two occurrences per observation (and, ideally, five; Tabachnick & Fidell, 2007), the
number of variables was reduced to 80 by applying a combination of stricter selection criteria (i.e.
higher cutoff points for item frequency, 7; number of texts, 7; and MI scores, 6). This number
was further reduced to 29 (Table 8) through several consecutive runs for an optimal variables-to-
texts ratio. Finally, a four-factor solution was chosen (Table 9). Interpretation of the resulting
factors was conducted by examining the salient collocates (loading at ≥ .30 on a factor).
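The data preparation steps of this procedure (not the factor extraction itself, which was run in SPSS) can be sketched in Python as follows. The sketch builds a text-by-collocate table of counts normed per 1,000 words, keeps only texts with at least three occurrences of the node, and lists the collocates that are salient on a factor. All names and the toy data are assumptions; the sketch also assumes already lemmatized tokens and treats the .30 cutoff as applying to the absolute value of a loading.

```python
def normed_rates(tokens, collocates, per=1000):
    """Occurrences of each collocate per `per` running words of one text."""
    total = len(tokens)
    return {c: tokens.count(c) * per / total for c in collocates}


def build_matrix(texts, node_forms, collocates, min_hits=3):
    """Rows: texts with at least min_hits node occurrences. Columns: collocates."""
    rows = {}
    for text_id, tokens in texts.items():
        hits = sum(token in node_forms for token in tokens)
        if hits >= min_hits:
            rows[text_id] = normed_rates(tokens, collocates)
    return rows


def salient(loadings, cutoff=0.30):
    """Collocates whose absolute loading on a factor reaches the cutoff."""
    return sorted(word for word, value in loadings.items() if abs(value) >= cutoff)


# Toy data (not from the corpus): t1 has three node hits, t2 only one.
toy_texts = {
    "t1": ["engleski"] * 3 + ["kurs"] * 2 + ["x"] * 5,
    "t2": ["engleski", "kurs"],
}
matrix = build_matrix(toy_texts, {"engleski"}, ["kurs"])
```

Each row of `matrix` corresponds to one observation (text) and each key within a row to one variable (collocate), which is the layout a factor analysis routine would expect as input.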
Downsampling using factor scores. The ultimate goal of EFA here was to select
representative relevant texts for qualitative analysis using a transparent, objective, and replicable
procedure. To this end, regression analysis was used to estimate factor scores for each text,
identifying the top-scoring texts on each factor for qualitative analysis (Table 10). Confirmation
and elaboration of the interpretation of each factor were conducted by a qualitative analysis of the
top-scoring texts.
Results
Keyword Analysis
Keyword analysis produced a total of 124 keywords. Following recommendations in the
literature (e.g. Ensslin & Johnson, 2006), the keyword list was scanned and cleaned up by
removing function words and any newspaper text paraphernalia (e.g. by-line-related words). Thus
cleaned up, the keyword list included a total of 96 keywords, 86 positive (1-86) and 10 negative
(87-96, Table 1). Although research based on keyword analysis typically focuses on a limited
number of top keywords on account of their usually large numbers, arguably all words identified
as key are important and should be considered in the analysis. However, because this study relies
on several different quantitative methods, my observations here will be limited to prominent
patterns deducible from the keyword list.
Table 1
Positive and negative keywords in ENGCORP (by keyness score)
N   Key word      Freq.  %     Texts  RC. Freq.  RC. %  P
1   ENGLESKOM     923    0.08  710    0          -      0.0000000000
2   ENGLESKI      660    0.05  470    0          -      0.0000000000
3   ENGLESKOG     550    0.05  406    0          -      0.0000000000
4   VREMENA       415    0.03  278    1          -      0.0000000000
5   PROFESOR      262    0.02  150    1          -      0.0000000000
6   FAKULTET      400    0.03  117    614        -      0.0000000000
7   JEZIKU        1090   0.09  733    3402       0.04   0.0000000000
8   STUDENATA     314    0.03  101    639        -      0.0000000000
9   STUDIJ        216    0.02  89     320        -      0.0000000000
10  AMERICI       266    0.02  133    510        -      0.0000000000
11  DOLARA        292    0.02  108    605        -      0.0000000000
12  JEZIKA        1074   0.09  534    4222       0.05   0.0000000000
13  STUDENTI      252    0.02  114    480        -      0.0000000000
14  UNIVERZITETA  348    0.03  145    856        0.01   0.0000000000
15  STUDIJA       242    0.02  123    498        -      0.0000000000
16  FAKULTETA     342    0.03  142    893        0.01   0.0000000000
17  THE           286    0.02  132    693        -      0.0000000000
18  SARAJEVU      1423   0.12  559    6427       0.08   0.0000000000
19  UNIVERZITET   181    0.01  69     351        -      0.0000000000
20  OF            177    0.01  119    346        -      0.0000000000
21  LONDONU       126    0.01  80     202        -      0.0000000000
22  FAKULTETU     252    0.02  142    713        -      0.0000000000
23  GODINE        2906   0.24  853    15941      0.19   0.0000000000
24  UNIVERZITETU  194    0.02  115    536        -      0.0000000000
25  PROFESORI     140    0.01  81     320        -      0.0000000000
26  AMBASADORA    129    0.01  67     283        -      0.0000000000
27  KNJIGA        828    0.07  294    3839       0.05   0.0000000000
28  AMERIČKI      219    0.02  136    672        -      0.0000000000
29  AMERIČKE      176    0.01  102    486        -      0.0000000000
30  KM            512    0.04  154    2155       0.03   0.0000000000
31  SARAJEVO      1014   0.08  388    4973       0.06   0.0000000000
32  RADA          375    0.03  231    1453       0.02   0.0000000000
33  AMERIČKOM     89     -     75     162        -      0.0000000000
34  PROGRAM       323    0.03  162    1208       0.01   0.0000000000
35  RAD           495    0.04  278    2103       0.03   0.0000000000
36  POSAO         537    0.04  249    2333       0.03   0.0000000000
37  AMERIČKIM     93     -     72     189        -      0.0000000000
38  NEW           188    0.02  97     598        -      0.0000000000
39  KNJIGE        729    0.06  299    3477       0.04   0.0000000000
40  SUDU          187    0.02  76     604        -      0.0000000000
41  AMERIČKA      103    -     74     243        -      0.0000000000
42  AMERIČKIH     129    0.01  97     356        -      0.0000000000
43  AMERIKANCI    110    -     75     278        -      0.0000000000
44  RADOVA        147    0.01  75     437        -      0.0000000000
45  BORAVKA       118    -     82     321        -      0.0000000000
46  POSLOVA       240    0.02  150    897        0.01   0.0000000000
47  IZDANJE       140    0.01  97     423        -      0.0000000000
48  MINISTAR      293    0.02  146    1194       0.01   0.0000000000
49  KANTONA       221    0.02  96     826        -      0.0000000000
50  NAUKA         208    0.02  136    766        -      0.0000000000
51  OBRAZOVANJE   194    0.02  118    706        -      0.0000000000
52  OBJAŠNJAVA    200    0.02  123    737        -      0.0000000000
53  PROFESORA     210    0.02  116    791        -      0.0000000000
54  PROF          300    0.02  122    1266       0.02   0.0000000000
55  OBLASTI       216    0.02  116    834        0.01   0.0000000000
56  MINISTRA      196    0.02  102    734        -      0.0000000000
57  STUDIJE       111    -     71     330        -      0.0000000000
58  EURA          215    0.02  98     833        0.01   0.0000000000
59  JEZIK         982    0.08  522    5243       0.06   0.0000000000
60  BROJ          590    0.05  321    2938       0.04   0.0000000000
61  IN            115    -     76     364        -      0.0000000000
62  NEKOLIKO      1000   0.08  500    5417       0.07   0.0000000000
63  AMERIČKOG     113    -     87     361        -      0.0000000000
64  AMBASADE      107    -     74     334        -      0.0000000000
65  CENTAR        269    0.02  145    1155       0.01   0.0000000000
66  SAD           635    0.05  283    3244       0.04   0.0000000000
67  PRVE          256    0.02  199    1089       0.01   0.0000000000
68  AGENCIJE      141    0.01  85     498        -      0.0000000000
69  VANJSKIH      123    0.01  77     414        -      0.0000000000
70  RADI          1023   0.08  514    5601       0.07   0.0000000000
71  RADILA        115    -     68     381        -      0.0000000000
72  RADNIKA       146    0.01  77     533        -      0.0000000000
73  MILIONA       358    0.03  153    1670       0.02   0.0000000000
74  TRENUTNO      256    0.02  178    1107       0.01   0.0000000000
75  CENTRA        235    0.02  144    1000       0.01   0.0000000000
76  RODITELJI     150    0.01  83     559        -      0.0000000000
77  OSOBA         270    0.02  169    1204       0.01   0.0000000000
78  DR            627    0.05  221    3267       0.04   0.0000000000
79  ORGANIZACIJA  204    0.02  123    849        0.01   0.0000000000
80  INFORMACIJE   176    0.01  119    703        -      0.0000000001
81  MINISTARSTVO  158    0.01  96     613        -      0.0000000001
82  ŠKOLE         389    0.03  181    1890       0.02   0.0000000004
83  CIJENA        106    -     68     361        -      0.0000000007
84  IZDANJU       102    -     81     347        -      0.0000000038
85  GODIŠNJE      111    -     73     391        -      0.0000000045
86  INFORMACIJA   133    0.01  94     501        -      0.0000000056
87  VLAST         142    0.01  79     1572       0.02   0.0000000012
88  SRPSKI        102    -     68     1232       0.01   0.0000000001
89  SRPSKE        203    0.02  121    2107       0.03   0.0000000001
90  DRŽAVE        392    0.03  220    3662       0.04   0.0000000000
91  SRBIJI        133    0.01  86     1555       0.02   0.0000000000
92  SRBIJE        210    0.02  115    2232       0.03   0.0000000000
93  BOŠNJAKA      166    0.01  80     1882       0.02   0.0000000000
94  NAROD         176    0.01  121    2186       0.03   0.0000000000
95  NE            7183   0.59  931    57263      0.69   0.0000000000
96  NARODA        317    0.03  177    4217       0.05   0.0000000000
Unsurprisingly, the list is topped by three different forms of the lemma ENGLESKI ‘English’,
confirming that the node corpus is primarily about English. Similarly, three forms of the lemma
JEZIK ‘language’ are also identified as keywords (7, 12, 59), which again reflects the selection
criteria used for corpus compilation (articles including all lemma forms of the phrase ENGLESKI
JEZIK ‘English language’). The most prominent semantic field both at the top and throughout the
list is that of education and higher education in particular (professor, 5, 25, 53, 54; faculty, 6, 16;
students, 8, 13; program of study, 9; university, 14, 19, 24; doctor, 78; education, 51; parents, 76;
schools, 82). Similarly prominent is the presence of as many as ten items referring to the United
States (10, 28, 29, 33, 37, 41-43, 63, 66). Other semantic fields of note include government and
foreign affairs (ambassador, 26; embassy, 64; minister, 48, 56; ministry, 81; foreign, 69; affairs,
46; canton, 49; court, 40); work (work, 35, 44; job, 36; to work, 70-71; workers, 72); money
(dollars, 11; convertible marks, 30; euros, 58; millions, 73; price, 83); and publishing (book, 27,
39; edition, 47, 84). Interestingly, four English words are also identified as key (the, 17; of, 20;
new, 38; in, 61). Negative keywords (starting from line 87 in the table in italics), on the other
hand, seem to indicate a relative absence of references to some local identities and political
structures (Serb, 88-89; Serbia, 91-92; Bosniak, 93; people, 94, 96; authorities, 87; state, 90) and
thus suggest a local vs. global dichotomy in discussions of local languages (Bosnian, Croatian,
Serbian) on the one hand and English on the other.
Collocation Analysis
Strength of association and frequency. Similar to keyword analysis, collocation
analysis of a large corpus typically produces a large number of collocates. Using the criteria
outlined above, collocation analysis conducted here produced a total of 298 significant collocates.
Again, as with keywords, although all significant collocates (MI score ≥ 3) are potentially
interesting, this number is too large for an individual human analyst to analyze. The list of
collocates was therefore scanned and cleaned up in a manner similar to that used in keyword
analysis above (i.e. excluding function words except pronouns⁴ and lexical items of marginal
semantic value such as self-collocates and various forms of high-frequency verbs such as be and
have) and the number of collocates to be considered was thus reduced to 183. In addition to this,
and in contrast to keywords, collocates must be considered from at least two perspectives: strength
of collocation and frequency. While statistical techniques such as the MI score used here are
important indicators of the strength of association between pairs of words, low-frequency
collocates (even those with very high MI scores) are of limited value because they are not well
distributed throughout the corpus and do not typically provide much material for analysis (unless,
of course, they share a semantic field or prosody with several other low-frequency collocates in
the corpus, cf. Baker, 2006). Here, we consider the lemma collocates of all four forms of the
lemma ENGLESKI ‘English’ both from a strength of association (Table 2) and frequency perspective
(Table 3).
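The strength-of-association measure discussed here can be sketched as follows. This is a minimal
illustration that assumes the standard pointwise formula MI = log2(O × N / (f(node) × f(collocate)));
the exact computation in WST (for example, any adjustment for span size) may differ. The function
names, the minimum frequency of 5 and the counts in the usage lines are hypothetical and are not
taken from ENGCORP; only the MI ≥ 3 threshold comes from the text.

```python
import math


def mutual_information(cooccurrences, node_freq, collocate_freq, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency."""
    # Expected co-occurrences if node and collocate were independent.
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(cooccurrences / expected)


def is_significant(mi_score, cooccurrences, mi_threshold=3.0, min_freq=5):
    """Apply an MI threshold plus a minimum frequency (min_freq is an assumption)."""
    return mi_score >= mi_threshold and cooccurrences >= min_freq


# Hypothetical counts, for illustration only.
score = mutual_information(cooccurrences=64, node_freq=1000,
                           collocate_freq=500, corpus_size=1_000_000)
print(score, is_significant(score, 64))
```

Because MI divides by the expected frequency, rare words can reach very high scores on only a
handful of co-occurrences, which is why the frequency perspective is considered alongside it.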
Table 2
Lemma collocates of the lemma ENGLESKI ‘English’ (by strength of association)
N | Word | Relation | Texts | Total | N | Word | Relation | Texts | Total
1 | SLUŽI | 12.05 | 9 | 9 | 93 | ČASOPISA | 6.59 | 5 | 5
2 | GOVORITI | 11.15 | 131 | 175 | 94 | DOVOLJNO | 6.57 | 10 | 12
3 | PREVEDEN | 11.14 | 50 | 58 | 95 | SVIJETA | 6.55 | 19 | 19
4 | RAČUNARA | 11.11 | 14 | 14 | 96 | SVI | 6.52 | 47 | 47
5 | PREVOD | 10.93 | 48 | 52 | 97 | NAVEDEN | 6.46 | 5 | 5
6 | POZNAVANJE | 10.90 | 50 | 59 | 98 | RAD | 6.32 | 18 | 18
7 | KURS | 10.89 | 39 | 50 | 99 | POTREBNO | 6.29 | 6 | 6
8 | PREVESTI | 10.78 | 22 | 24 | 100 | GODINA | 6.28 | 56 | 56
9 | UČENJE | 10.78 | 37 | 51 | 101 | PISMO | 6.16 | 7 | 8
10 | ODVIJATI | 10.66 | 20 | 21 | 102 | SVOJ | 6.13 | 18 | 24
11 | PERFEKTNO | 10.33 | 7 | 7 | 103 | ZOVE | 6.13 | 5 | 7
12 | INFORMATIKE | 10.28 | 14 | 18 | 104 | TUZLI | 6.11 | 7 | 7
13 | TEČNO | 10.22 | 10 | 15 | 105 | BRZO | 6.09 | 5 | 5
14 | NASTAVA | 10.22 | 43 | 46 | 106 | ČOVJEK | 6.06 | 13 | 15
15 | UČITI | 10.22 | 26 | 26 | 107 | JEDAN | 6.00 | 41 | 45
16 | NJEMAČKI | 10.03 | 64 | 79 | 108 | DOBRO | 5.99 | 20 | 22
17 | ZNANJE | 9.97 | 53 | 62 | 109 | NAVODI | 5.90 | 5 | 5
18 | FRANCUSKI | 9.82 | 53 | 62 | 110 | RIJEČ | 5.83 | 16 | 17
19 | ŠPANSKI | 9.72 | 11 | 12 | 111 | INTERNET | 5.82 | 5 | 5
20 | ARAPSKI | 9.71 | 29 | 32 | 112 | BAREM | 5.80 | 6 | 6
21 | VERZIJA | 9.66 | 19 | 24 | 113 | OBJAŠNJAVA | 5.78 | 5 | 5
22 | TALIJANSKI | 9.65 | 11 | 11 | 114 | ONI | 5.67 | 30 | 34
23 | DOSTUPNA | 9.52 | 8 | 8 | 115 | IZBOR | 5.66 | 5 | 5
24 | NAPISANA | 9.46 | 21 | 22 | 116 | PJESME | 5.65 | 10 | 10
25 | IZVRSNO | 9.42 | 6 | 6 | 117 | ROMAN | 5.57 | 7 | 7
26 | DVOJEZIČNO | 9.27 | 8 | 8 | 118 | UNIVERZITETU | 5.56 | 5 | 5
27 | ODSJEKU | 9.19 | 10 | 14 | 119 | MORAJU | 5.55 | 6 | 6
28 | TURSKI | 9.13 | 17 | 20 | 120 | STOJI | 5.55 | 6 | 6
29 | TEST | 9.11 | 6 | 9 | 121 | DOBILI | 5.54 | 7 | 7
30 | FILOZOFSKOG | 9.09 | 14 | 14 | 122 | NAPISAO | 5.51 | 7 | 7
31 | PROFESOR | 9.05 | 55 | 63 | 123 | SKORO | 5.48 | 7 | 8
32 | BOSANSKI | 9.03 | 94 | 105 | 124 | NAUKA | 5.46 | 5 | 5
33 | ITALIJANSKI | 9.03 | 6 | 6 | 125 | TRENUTNO | 5.42 | 6 | 6
34 | AKTIVNO | 8.96 | 15 | 16 | 126 | OSIM | 5.41 | 10 | 10
35 | MATEMATIKE | 8.94 | 7 | 8 | 127 | POSEBNO | 5.37 | 8 | 8
36 | PISATI | 8.91 | 23 | 36 | 128 | MALO | 5.37 | 11 | 11
37 | ŠTAMPAN | 8.84 | 7 | 7 | 129 | PRIČA | 5.35 | 11 | 12
38 | ODLIČNO | 8.80 | 18 | 18 | 130 | POTPUNO | 5.33 | 5 | 6
39 | SLUŽBENI | 8.79 | 7 | 7 | 131 | MOJ | 5.31 | 6 | 7
40 | POHAĐAJU | 8.78 | 6 | 6 | 132 | TREBA | 5.28 | 20 | 24
41 | GRČKI | 8.77 | 5 | 5 | 133 | KULTURE | 5.25 | 5 | 5
42 | POZNAJE | 8.76 | 15 | 17 | 134 | DJELA | 5.24 | 11 | 12
43 | NASTAVNIKA | 8.67 | 9 | 10 | 135 | DIREKTOR | 5.20 | 5 | 5
44 | IZDANJE | 8.66 | 38 | 43 | 136 | DVA | 5.19 | 22 | 24
45 | RUSKI | 8.64 | 15 | 15 | 137 | SAMO | 5.17 | 43 | 48
46 | KANAL | 8.59 | 5 | 5 | 138 | VEĆINA | 5.12 | 5 | 5
47 | STUDIJ | 8.56 | 30 | 37 | 139 | NOVI | 5.08 | 5 | 5
48 | ČITATI | 8.56 | 13 | 14 | 140 | FILM | 5.07 | 12 | 13
49 | URAĐEN | 8.55 | 6 | 6 | 141 | DRUGI | 5.03 | 16 | 17
50 | SREDNJOJ | 8.52 | 6 | 7 | 142 | VAM | 5.03 | 5 | 7
51 | PREVOĐENJE | 8.52 | 6 | 6 | 143 | ŠEST | 5.03 | 5 | 5
52 | ŠKOLA | 8.49 | 32 | 36 | 144 | TV | 4.99 | 5 | 5
53 | OBJAVLJEN | 8.48 | 27 | 30 | 145 | DJECA | 4.98 | 5 | 5
54 | PREDAJE | 8.46 | 5 | 5 | 146 | ČETIRI | 4.92 | 8 | 8
55 | NAUČITI | 8.35 | 9 | 11 | 147 | NIKO | 4.89 | 7 | 7
56 | KNJIŽEVNOST | 8.33 | 31 | 35 | 148 | AMERICI | 4.89 | 5 | 6
57 | PODRUČJA | 8.25 | 12 | 13 | 149 | SAD | 4.85 | 10 | 10
58 | RJEČNIK | 8.24 | 5 | 6 | 150 | KASNIJE | 4.82 | 6 | 6
59 | ISPIT | 8.19 | 6 | 7 | 151 | AUTORA | 4.82 | 5 | 5
60 | MATERNJI | 8.07 | 5 | 5 | 152 | VREMENA | 4.73 | 6 | 6
61 | UDŽBENIK | 8.04 | 6 | 6 | 153 | DANAS | 4.72 | 13 | 14
62 | SAVRŠENO | 8.03 | 6 | 6 | 154 | VRLO | 4.72 | 9 | 9
63 | PREDAVANJA | 7.97 | 13 | 13 | 155 | NAM | 4.70 | 21 | 22
64 | OBJAVILA | 7.94 | 14 | 14 | 156 | PRVI | 4.70 | 15 | 16
65 | ZNAM | 7.92 | 47 | 58 | 157 | JA | 4.69 | 32 | 34
66 | MEDRESE | 7.83 | 6 | 6 | 158 | BROJ | 4.64 | 8 | 8
67 | IZLAZI | 7.82 | 11 | 11 | 159 | NIKADA | 4.62 | 6 | 8
68 | IZVODI | 7.82 | 6 | 6 | 160 | VELIKI | 4.61 | 6 | 6
69 | ZAVRŠILA | 7.78 | 7 | 7 | 161 | ZNAČI | 4.57 | 8 | 8
70 | HRVATSKOM | 7.70 | 10 | 10 | 162 | UGLAVNOM | 4.54 | 6 | 6
71 | STRANI | 7.68 | 21 | 22 | 163 | UPRAVO | 4.52 | 5 | 5
72 | DIPLOMU | 7.58 | 6 | 6 | 164 | RADI | 4.43 | 12 | 12
73 | POČELA | 7.56 | 13 | 13 | 165 | BH | 4.42 | 12 | 13
74 | DIPLOMIRAO | 7.52 | 5 | 5 | 166 | VEOMA | 4.39 | 6 | 6
75 | PROGRAM | 7.51 | 24 | 27 | 167 | TRI | 4.38 | 15 | 15
76 | PREVODILAC | 7.47 | 7 | 7 | 168 | DIO | 4.38 | 10 | 10
77 | OMOGUĆAVA | 7.45 | 5 | 6 | 169 | NARAVNO | 4.37 | 5 | 5
78 | PJESNIK | 7.42 | 8 | 8 | 170 | BEZ | 4.30 | 14 | 17
79 | SRPSKI | 7.26 | 12 | 12 | 171 | DR | 4.23 | 9 | 9
80 | STRANICA | 7.26 | 11 | 12 | 172 | ZAISTA | 4.23 | 5 | 5
81 | TEKST | 7.20 | 23 | 24 | 173 | ČESTO | 4.15 | 5 | 5
82 | KNJIGA | 7.17 | 78 | 91 | 174 | ZAPRAVO | 4.12 | 5 | 5
83 | KAZAO | 7.12 | 23 | 25 | 175 | POSTOJI | 4.07 | 6 | 7
84 | UČENICI | 7.09 | 6 | 7 | 176 | NAKON | 3.95 | 13 | 13
85 | STUDENTI | 6.94 | 11 | 14 | 177 | KOLIKO | 3.86 | 5 | 5
86 | ISKLJUČIVO | 6.91 | 8 | 8 | 178 | REKAO | 3.82 | 5 | 5
87 | UGOVORA | 6.90 | 6 | 6 | 179 | VIŠE | 3.52 | 15 | 17
88 | FAKULTET | 6.90 | 23 | 26 | 180 | NAČIN | 3.50 | 7 | 7
89 | NAŠ | 6.88 | 35 | 39 | 181 | PRIJE | 3.49 | 11 | 11
90 | KORISTI | 6.72 | 8 | 8 | 182 | NEKOLIKO | 3.39 | 8 | 8
91 | OBLASTI | 6.67 | 10 | 10 | 183 | BIH | 3.36 | 31 | 36
92 | MOGAO | 6.64 | 23 | 23 | | | | |
As can be seen from Table 2, the strongest lemma collocates (MI ≥ 10) of the lemma ENGLESKI
refer to language proficiency (to use, 1; to speak, 2; knowledge/proficiency, 6; perfectly, 11;
fluently, 13), translation (translate, 3; translation, 5; to translate, 8), computer science (computers,
4; computer science, 12), language instruction (course, 7; learning, 9; language of instruction, 10;
instruction, 14; to learn, 15) and discrete languages (German, 16). A similar picture emerges when
collocates are considered from the perspective of frequency (Table 3). Here, we again find the
semantic fields of language proficiency (to speak, 1; knowledge/proficiency, 6, 8, 11), discrete
languages (Bosnian, 2; German, 4; French, 7), translation (book, 3; translated, 9; translation, 12),
and language instruction (professor, 5; learning, 13; course, 14). Finally, if the full list of 183
collocates (let us use Table 3 for reference here) is examined in terms of their semantic patterning,
four more semantic fields emerge: national identities (Bosnian, 2; our, 20; Bosnia-Herzegovina,
24, 73; United States, 92; America, 149; mother [tongue], 158), publishing (edition, 19; version,
35; published, 66; bilingual, 99; printed, 113; journal, 160), literature (book, 3; literature, 25; works
of art, 80; poems, 90; poet, 101; novel, 122; author, 176), and pop culture/media (film, 72; channel,
156; culture, 169; television, 174). Other semantic fields, traces of which are evident from the full
list of collocates, include science (science, 168) and religion (madrasa, 138).
Table 3
Lemma collocates of the lemma ENGLESKI ‘English’ (by frequency)
N | Word | Relation | Texts | Total | N | Word | Relation | Texts | Total
1 | GOVORITI | 11.15 | 131 | 175 | 93 | DIO | 4.38 | 10 | 10
2 | BOSANSKI | 9.03 | 94 | 105 | 94 | SLUŽI | 12.05 | 9 | 9
3 | KNJIGA | 7.17 | 78 | 91 | 95 | TEST | 9.11 | 6 | 9
4 | NJEMAČKI | 10.03 | 64 | 79 | 96 | VRLO | 4.72 | 9 | 9
5 | PROFESOR | 9.05 | 55 | 63 | 97 | DR | 4.23 | 9 | 9
6 | ZNANJE | 9.97 | 53 | 62 | 98 | DOSTUPNA | 9.52 | 8 | 8
7 | FRANCUSKI | 9.82 | 53 | 62 | 99 | DVOJEZIČNO | 9.27 | 8 | 8
8 | POZNAVANJE | 10.90 | 50 | 59 | 100 | MATEMATIKE | 8.94 | 7 | 8
9 | PREVEDEN | 11.14 | 50 | 58 | 101 | PJESNIK | 7.42 | 8 | 8
10 | ZNAM | 7.92 | 47 | 58 | 102 | ISKLJUČIVO | 6.91 | 8 | 8
11 | GODINA | 6.28 | 56 | 56 | 103 | KORISTI | 6.72 | 8 | 8
12 | PREVOD | 10.93 | 48 | 52 | 104 | PISMO | 6.16 | 7 | 8
13 | UČENJE | 10.78 | 37 | 51 | 105 | SKORO | 5.48 | 7 | 8
14 | KURS | 10.89 | 39 | 50 | 106 | POSEBNO | 5.37 | 8 | 8
15 | SAMO | 5.17 | 43 | 48 | 107 | ČETIRI | 4.92 | 8 | 8
16 | SVI | 6.52 | 47 | 47 | 108 | BROJ | 4.64 | 8 | 8
17 | NASTAVA | 10.22 | 43 | 46 | 109 | NIKADA | 4.62 | 6 | 8
18 | JEDAN | 6.00 | 41 | 45 | 110 | ZNAČI | 4.57 | 8 | 8
19 | IZDANJE | 8.66 | 38 | 43 | 111 | NEKOLIKO | 3.39 | 8 | 8
20 | NAŠ | 6.88 | 35 | 39 | 112 | PERFEKTNO | 10.33 | 7 | 7
21 | STUDIJ | 8.56 | 30 | 37 | 113 | ŠTAMPAN | 8.84 | 7 | 7
22 | PISATI | 8.91 | 23 | 36 | 114 | SLUŽBENI | 8.79 | 7 | 7
23 | ŠKOLA | 8.49 | 32 | 36 | 115 | SREDNJOJ | 8.52 | 6 | 7
24 | BIH | 3.36 | 31 | 36 | 116 | ISPIT | 8.19 | 6 | 7
25 | KNJIŽEVNOST | 8.33 | 31 | 35 | 117 | ZAVRŠILA | 7.78 | 7 | 7
26 | ONI | 5.67 | 30 | 34 | 118 | PREVODILAC | 7.47 | 7 | 7
27 | JA | 4.69 | 32 | 34 | 119 | UČENICI | 7.09 | 6 | 7
28 | ARAPSKI | 9.71 | 29 | 32 | 120 | ZOVE | 6.13 | 5 | 7
29 | OBJAVLJEN | 8.48 | 27 | 30 | 121 | TUZLI | 6.11 | 7 | 7
30 | PROGRAM | 7.51 | 24 | 27 | 122 | ROMAN | 5.57 | 7 | 7
31 | UČITI | 10.22 | 26 | 26 | 123 | DOBILI | 5.54 | 7 | 7
32 | FAKULTET | 6.90 | 23 | 26 | 124 | NAPISAO | 5.51 | 7 | 7
33 | KAZAO | 7.12 | 23 | 25 | 125 | MOJ | 5.31 | 6 | 7
34 | PREVESTI | 10.78 | 22 | 24 | 126 | VAM | 5.03 | 5 | 7
35 | VERZIJA | 9.66 | 19 | 24 | 127 | NIKO | 4.89 | 7 | 7
36 | TEKST | 7.20 | 23 | 24 | 128 | POSTOJI | 4.07 | 6 | 7
37 | SVOJ | 6.13 | 18 | 24 | 129 | NAČIN | 3.50 | 7 | 7
38 | TREBA | 5.28 | 20 | 24 | 130 | IZVRSNO | 9.42 | 6 | 6
39 | DVA | 5.19 | 22 | 24 | 131 | ITALIJANSKI | 9.03 | 6 | 6
40 | MOGAO | 6.64 | 23 | 23 | 132 | POHAĐAJU | 8.78 | 6 | 6
41 | NAPISANA | 9.46 | 21 | 22 | 133 | URAĐEN | 8.55 | 6 | 6
42 | STRANI | 7.68 | 21 | 22 | 134 | PREVOĐENJE | 8.52 | 6 | 6
43 | DOBRO | 5.99 | 20 | 22 | 135 | RJEČNIK | 8.24 | 5 | 6
44 | NAM | 4.70 | 21 | 22 | 136 | UDŽBENIK | 8.04 | 6 | 6
45 | ODVIJATI | 10.66 | 20 | 21 | 137 | SAVRŠENO | 8.03 | 6 | 6
46 | TURSKI | 9.13 | 17 | 20 | 138 | MEDRESE | 7.83 | 6 | 6
47 | SVIJETA | 6.55 | 19 | 19 | 139 | IZVODI | 7.82 | 6 | 6
48 | INFORMATIKE | 10.28 | 14 | 18 | 140 | DIPLOMU | 7.58 | 6 | 6
49 | ODLIČNO | 8.80 | 18 | 18 | 141 | OMOGUĆAVA | 7.45 | 5 | 6
50 | RAD | 6.32 | 18 | 18 | 142 | UGOVORA | 6.90 | 6 | 6
51 | POZNAJE | 8.76 | 15 | 17 | 143 | POTREBNO | 6.29 | 6 | 6
52 | RIJEČ | 5.83 | 16 | 17 | 144 | BAREM | 5.80 | 6 | 6
53 | DRUGI | 5.03 | 16 | 17 | 145 | MORAJU | 5.55 | 6 | 6
54 | BEZ | 4.30 | 14 | 17 | 146 | STOJI | 5.55 | 6 | 6
55 | VIŠE | 3.52 | 15 | 17 | 147 | TRENUTNO | 5.42 | 6 | 6
56 | AKTIVNO | 8.96 | 15 | 16 | 148 | POTPUNO | 5.33 | 5 | 6
57 | PRVI | 4.70 | 15 | 16 | 149 | AMERICI | 4.89 | 5 | 6
58 | TEČNO | 10.22 | 10 | 15 | 150 | KASNIJE | 4.82 | 6 | 6
59 | RUSKI | 8.64 | 15 | 15 | 151 | VREMENA | 4.73 | 6 | 6
60 | ČOVJEK | 6.06 | 13 | 15 | 152 | VELIKI | 4.61 | 6 | 6
61 | TRI | 4.38 | 15 | 15 | 153 | UGLAVNOM | 4.54 | 6 | 6
62 | RAČUNARA | 11.11 | 14 | 14 | 154 | VEOMA | 4.39 | 6 | 6
63 | ODSJEKU | 9.19 | 10 | 14 | 155 | GRČKI | 8.77 | 5 | 5
64 | FILOZOFSKOG | 9.09 | 14 | 14 | 156 | KANAL | 8.59 | 5 | 5
65 | ČITATI | 8.56 | 13 | 14 | 157 | PREDAJE | 8.46 | 5 | 5
66 | OBJAVILA | 7.94 | 14 | 14 | 158 | MATERNJI | 8.07 | 5 | 5
67 | STUDENTI | 6.94 | 11 | 14 | 159 | DIPLOMIRAO | 7.52 | 5 | 5
68 | DANAS | 4.72 | 13 | 14 | 160 | ČASOPISA | 6.59 | 5 | 5
69 | PODRUČJA | 8.25 | 12 | 13 | 161 | NAVEDEN | 6.46 | 5 | 5
70 | PREDAVANJA | 7.97 | 13 | 13 | 162 | BRZO | 6.09 | 5 | 5
71 | POČELA | 7.56 | 13 | 13 | 163 | NAVODI | 5.90 | 5 | 5
72 | FILM | 5.07 | 12 | 13 | 164 | INTERNET | 5.82 | 5 | 5
73 | BH | 4.42 | 12 | 13 | 165 | OBJAŠNJAVA | 5.78 | 5 | 5
74 | NAKON | 3.95 | 13 | 13 | 166 | IZBOR | 5.66 | 5 | 5
75 | ŠPANSKI | 9.72 | 11 | 12 | 167 | UNIVERZITETU | 5.56 | 5 | 5
76 | SRPSKI | 7.26 | 12 | 12 | 168 | NAUKA | 5.46 | 5 | 5
77 | STRANICA | 7.26 | 11 | 12 | 169 | KULTURE | 5.25 | 5 | 5
78 | DOVOLJNO | 6.57 | 10 | 12 | 170 | DIREKTOR | 5.20 | 5 | 5
79 | PRIČA | 5.35 | 11 | 12 | 171 | VEĆINA | 5.12 | 5 | 5
80 | DJELA | 5.24 | 11 | 12 | 172 | NOVI | 5.08 | 5 | 5
81 | RADI | 4.43 | 12 | 12 | 173 | ŠEST | 5.03 | 5 | 5
82 | TALIJANSKI | 9.65 | 11 | 11 | 174 | TV | 4.99 | 5 | 5
83 | NAUČITI | 8.35 | 9 | 11 | 175 | DJECA | 4.98 | 5 | 5
84 | IZLAZI | 7.82 | 11 | 11 | 176 | AUTORA | 4.82 | 5 | 5
85 | MALO | 5.37 | 11 | 11 | 177 | UPRAVO | 4.52 | 5 | 5
86 | PRIJE | 3.49 | 11 | 11 | 178 | NARAVNO | 4.37 | 5 | 5
87 | NASTAVNIKA | 8.67 | 9 | 10 | 179 | ZAISTA | 4.23 | 5 | 5
88 | HRVATSKOM | 7.70 | 10 | 10 | 180 | ČESTO | 4.15 | 5 | 5
89 | OBLASTI | 6.67 | 10 | 10 | 181 | ZAPRAVO | 4.12 | 5 | 5
90 | PJESME | 5.65 | 10 | 10 | 182 | KOLIKO | 3.86 | 5 | 5
91 | OSIM | 5.41 | 10 | 10 | 183 | REKAO | 3.82 | 5 | 5
92 | SAD | 4.85 | 10 | 10 | | | | |
Pattern analysis. Pattern analysis, as mentioned above, is useful as an alternative way of
looking at collocation data because the examination of collocates from the strength of association
and frequency perspectives may not reveal any obvious patterns beyond the possibility of grouping
collocates into semantic fields. Table 4 shows the top 50 collocational patterns for the lemma
ENGLESKI as produced by the ‘patterns’ function of the concordance tool in WST (shortened to
L3-R3 for ease of presentation). Similar to keyword and collocation analysis, this is the point at
which quantitative results must be manually examined by the analyst to identify any patterns for
further examination.
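The idea behind a positional pattern table can be sketched as follows. This is a simplified
illustration, not a reimplementation of the WST ‘patterns’ function: it counts surface word forms
rather than lemmas, ignores sentence boundaries, and the function name and the example sentence
are hypothetical.

```python
from collections import Counter, defaultdict

# The four forms of the lemma ENGLESKI searched for in the study.
NODE_FORMS = {"engleski", "engleskog", "engleskom", "engleskim"}


def positional_collocates(tokens, node_forms=NODE_FORMS, span=3):
    """Count the words found in each slot (L3..L1, R1..R3) around every node."""
    slots = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token.lower() not in node_forms:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(tokens):
                continue  # skip the node itself and positions outside the text
            label = f"L{-offset}" if offset < 0 else f"R{offset}"
            slots[label][tokens[j].lower()] += 1
    return slots


# Hypothetical example sentence, for illustration only.
example = "on tečno govori engleski jezik i odlično govori engleski".split()
for label, counts in sorted(positional_collocates(example).items()):
    print(label, counts.most_common(3))
```

Sorting each slot by frequency and listing the slots side by side gives a layout comparable to
Table 4, in which a row holds the n-th most frequent word of every position.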
Table 4
Top 50 collocational patterns for all four forms of the lemma ENGLESKI ‘English’ (by position
relative to node)
N
L3
L2
L1
NODE
R1
R2
R3
1
NA
I
NA
ENGLESKOM
JEZIK
I
JE
2
I
JE
I
ENGLESKI
I
U
I
3
JE
BOSANSKI
GOVORITI
ENGLESKOG
JE
A
DA
4
U
NE
ZNANJE
ENGLESKIM
NA
NA
U
5
DA
NA
POZNAVANJE
NJEMAČKI
TE
SE
6
SE
KNJIGA
PROFESOR
U
ZA
SU
7
ZA
ZA
UČENJE
A
JE
JEZIK
8
KOJI
PREVOD
U
ALI
KOJI
ĆE
9
JEZIK
NJEMAČKI
ZA
AUTOR
KAO
NA
10
BOSANSKI
PREVEDEN
KURS
FRANCUSKI
KAKO
KNJIŽEVNOST
11
PREVEDEN
KOJI
ZNAM
AKO
ALI
SAM
12
S
FRANCUSKI
IZ
DA
ŠTO
NE
13
GOVORITI
DA
JE
JER
DA
BI
14
SU
SE
S
KOJI
TO
JA
15
SA
U
UČITI
FRANCUSKI
OD
16
ŠTO
PISATI
SA
KOJA
TO
17
PROGRAM
NAŠ
STUDIJ
AUTOR
ZA
18
GODINA
SU
POZNAJE
GOVORITI
ŠTO
19
NE
ARAPSKI
JEZIK
SVI
KOJI
20
OD
ODVIJATI
NASTAVNIKA
SAM
KNJIGA
21
ONI
IZDANJE
ILI
SA
O
22
BITI
AKTIVNO
ŠKOLA
NJEMAČKI
SMO
23
NASTAVA
NAPISANA
RIJEČI
OD
S
24
A
SAMO
FRANCUSKI
NE
NAM
25
KAO
DOBRO
OD
KNJIGA
KAKO
26
BILA
ODLIČNO
SU
JER
NJEMAČKI
27
SVOJ
GOVORITI
SE
IAKO
IMA
28
BIH
NASTAVA
INFORMATIKE
PA
PO
29
NJEMAČKI
JEZIK
SVOJ
DOK
BIH
30
KNJIGA
PREVESTI
UZ
ONI
31
PREVOD
TO
SE
ALI
32
ZBOG
VERZIJA
BOSANSKI
KAZAO
33
ĆE
TEČNO
KAZAO
TAKO
34
STRANICA
STUDIJ
ILI
BOSANSKI
35
PROFESOR
ŠKOLA
S
TOGA
36
SAMO
A
PODRUČJA
OVAJ
37
NIJE
O
THE
NIJE
38
JER
ODSJEKU
TAKO
GODINA
39
KOJA
OBJAVLJEN
PO
RAD
40
IZDANJE
BIH
O
PREVOD
41
DVA
KURS
NEGO
INFORMATIKE
42
ČITATI
NI
NIJE
TRI
43
JEDAN
ZNAM
MEĐUTIM
A
44
IZ
GODINA
KOJIM
GOVORITI
45
STUDIJ
TEKST
IMA
SVI
46
O
IZ
JA
SA
47
PRVI
ŠTO
BI
FRANCUSKI
48
PREDAVANJA
PERFEKTNO
KOJE
MU
49
KOJE
TURSKI
JEZIK
GA
50
ZNAM
IZVRSNO
SU
The full collocational pattern table (not shown here) revealed that the largest number of
collocates occurred in slots R2, L2, and L3, which is somewhat surprising considering that
modifiers (e.g. adjectives) are more likely to show up next to the node (i.e. L1 for premodifiers
and R1 for postmodifiers). Qualitative analysis was therefore focused on the collocates in columns
R2, L2 and L3.⁵ Concentrating on content words (i.e. lexemes), the only relevant pattern that could
be identified was a set of six adverbs in L2 (in bold underline) which function as proficiency
markers (see further below). Subsequent follow-up examination of the cleaned-up list of 183
collocates revealed three more such items, for a total of nine adverbs functioning as proficiency
markers (Table 5). Perhaps expectedly, the markers suggest a cline of proficiency from malo
‘poorly’ at the low end to dovoljno ‘sufficiently’, aktivno ‘actively’, tečno ‘fluently’, and dobro
‘well’ in the middle, to odlično ‘excellently’, izvrsno ‘outstandingly’ and savršeno/perfektno
‘perfectly’ at the high end. Further, it is interesting to note that four of the five most frequent
adverbs refer to middling proficiency, while three of the four adverbs referring to high
proficiency are at the bottom of the list. The only adverb that refers to low proficiency is ranked
sixth of nine. Note also that there are proficiency markers other than adverbs in this data set, such as
the frequent two- and three-word clusters (ne) govori engleski ‘does (not) speak English’ (cf.
further below), whose inclusion would possibly alter the rankings here.
Table 5
Adverb collocates as proficiency indicators (by frequency)
N | Word | Relation | Texts | Total | Total Left | Total Right | Non-zero position counts (L5 to R5, in order)
84 | DOBRO | 5.99 | 20 | 22 | 18 | 4 | 4, 13, 1, 2, 1, 1
100 | ODLIČNO | 8.80 | 18 | 18 | 16 | 2 | 13, 3, 1, 1
111 | AKTIVNO | 8.96 | 15 | 16 | 15 | 1 | 1, 14, 1
120 | TEČNO | 10.22 | 10 | 15 | 14 | 1 | 1, 1, 10, 2, 1
144 | DOVOLJNO | 6.57 | 10 | 12 | 4 | 8 | 1, 3, 1, 2, 3, 1, 1
155 | MALO | 5.37 | 11 | 11 | 7 | 4 | 2, 4, 1, 1, 2, 1
217 | PERFEKTNO | 10.33 | 7 | 7 | 7 | 0 | 6, 1
240 | IZVRSNO | 9.42 | 6 | 6 | 6 | 0 | 6
259 | SAVRŠENO | 8.03 | 6 | 6 | 5 | 1 | 1, 1, 3, 1
Cluster analysis. Cluster analysis, based on the parameters detailed above, yielded a large
number of 2-6-word clusters (i.e. n-grams) for three of the four forms of the lemma ENGLESKI
‘English’ (the form engleskim [instrumental, singular] was the least frequent and did not cluster in
this sample, see Table 6). Predictably, the largest number of clusters were two-word clusters or
bigrams, followed by three- and four-word clusters, and so on, while the most frequent bigrams
were different lemma forms of the phrase engleski jezik ‘English language’ (engleski jezik
[nominative/accusative/instrumental, singular], 422; engleskog jezika [genitive, singular], 470;
engleskom jeziku [dative, singular], 767). Interestingly, many of the clusters extended across the
categories from two- to six-word clusters: for example, (2) engleskom jeziku [dative/instrumental,
singular] ‘English language’ > (3) na engleskom jeziku ‘in the English language’ > (4) bosanskom
i engleskom jeziku ‘Bosnian and English language’ > (5) na bosanskom i engleskom jeziku ‘in
the Bosnian and English languages’ > (6) nastava se odvija na bosanskom i engleskom jeziku
‘instruction is given in the Bosnian and English languages’.
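The way shorter clusters nest inside longer ones can be illustrated with a small n-gram counter.
This is a minimal sketch rather than the WST cluster procedure: the function name is hypothetical,
no minimum-frequency cutoff is applied, and the example sentence is built from the six-word
cluster quoted above.

```python
from collections import Counter


def clusters(tokens, node_forms, n):
    """Count n-word clusters (n-grams) that contain at least one node form."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(t.lower() for t in tokens[i:i + n])
        if any(t in node_forms for t in gram):
            counts[" ".join(gram)] += 1
    return counts


sentence = "nastava se odvija na bosanskom i engleskom jeziku".split()
for n in range(2, 7):
    # Each longer cluster extends one of the shorter clusters found before it.
    print(n, sorted(clusters(sentence, {"engleskom"}, n)))
```

Run over a whole corpus, the same counter would rank clusters by frequency for each lemma form,
which is the information summarized in Table 6.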
Table 6
Most frequent clusters in ENGCORP (by lemma form, number of constituents, and frequency)
Form | 2-gram | Freq. | 3-gram | Freq. | 4-gram | Freq. | 5-gram | Freq. | 6-gram | Freq.
ENGLESKI | engleski jezik | 422 | na engleski jezik | 108 | i na engleski jezik | 14 | na odsjeku za engleski jezik | 8 | n/a |
 | na engleski | 139 | engleski jezik i | 59 | na engleski jezik i | 14 | preveden i na engleski jezik | 6 | n/a |
 | govori engleski | 71 | i engleski jezik | 47 | engleski jezik i književnost | 12 | na bosanski i engleski jezik | 5 | n/a |
ENGLESKOG | engleskog jezika | 470 | engleskog jezika i | 62 | za učenje engleskog jezika | 18 | programa za učenje engleskog jezika | 5 | n/a |
 | jezika i | 65 | i engleskog jezika | 33 | engleskog jezika i književnosti | 11 | n/a | | n/a |
 | i engleskog | 36 | poznavanje engleskog jezika | 29 | je profesorica engleskog jezika | 7 | n/a | | n/a |
ENGLESKOM | engleskom jeziku | 767 | na engleskom jeziku | 585 | bosanskom i engleskom jeziku | 47 | na bosanskom i engleskom jeziku | 45 | nastava se odvija na engleskom jeziku | 6
 | na engleskom | 723 | i engleskom jeziku | 125 | na bosanskom i engleskom | 46 | na našem i engleskom jeziku | 14 | na bosanskom i na engleskom jeziku | 6
 | i engleskom | 130 | engleskom jeziku a | 52 | na engleskom jeziku a | 46 | na engleskom jeziku autor i | 7 | dvojezično na bosanskom i engleskom jeziku | 5
ENGLESKIM | n/a | | n/a | | n/a | | n/a | | n/a |
A closer examination of the more semantically complete frequent clusters confirms some of the
patterns identified earlier such as sets of items referring to translation (preveden na engleski jezik
‘translated into the English language’), higher education (na odsjeku za engleski jezik ‘in the
English language department’, engleski jezik i književnost ‘English language and literature’),
language proficiency (govori engleski jezik ‘speaks the English language’, poznavanje engleskog
jezika ‘a command of the English language’), language learning (programa za učenje engleskog
jezika ‘of program[s] for the study of the English language’), and, arguably, identity maintenance
through binary oppositions (na našem i engleskom jeziku ‘in our and English language’, na
bosanskom i engleskom jeziku ‘in the Bosnian and English languages’).
Importantly, however, frequency alone seems to be of limited value here as many of the
more semantically complete and therefore potentially more interesting clusters are found further
down in the cluster lists. For example, the cluster list for the form engleski includes a set of clusters
referring to different levels of language proficiency which is potentially interesting in terms of
language-ideological analysis (see above): tečno govori (engleski) ‘speaks fluent (English)’, ranks
22 (2-word) and 17 (3-word); odlično govori (engleski) ‘speaks excellent (English)’, ranks 43
(2-word) and 28 (3-word); and perfektno govori (engleski) ‘speaks perfect (English)’, ranks 70
(2-word) and 44 (3-word). But, as can be seen from the clusters’ respective ranks, they are not among
the most frequent clusters identified. Conversely, many of the most frequent clusters, such as jezik
i ‘language and’ (65 occurrences) and i engleskom ‘and (in) English’ (130 occurrences), are
semantically incomplete and rather less informative. Cluster analysis, then, seems to provide little
information beyond a confirmation of patterns identified earlier in the analysis.
Downsampling using the ‘plot’ function of the concordance tool in WST. As noted
above, the final step in the collocation analysis section of this paper was to downsample or identify
a small number of representative texts for qualitative analysis (cf. Hunston, 2002). The results of
the downsampling procedure described above are shown in Table 7. Again, as noted above, only
the four top-scoring texts (one per lemma form) were selected for follow-up analysis.
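The normalized frequencies reported in Table 7 follow directly from the raw counts, as the short
check below shows. The word and hit counts are those given in Table 7; the function name is
hypothetical, and the dispersion values are not recomputed because the formula behind them is
not described here.

```python
def hits_per_thousand(hits, words):
    """Normalized frequency: search-term hits per 1,000 running words."""
    return round(hits / words * 1000, 2)


# (file, words, hits) as reported in Table 7.
TABLE_7 = [
    ("DI-04-05-2007", 2825, 13),
    ("ST-04-11-2003", 2912, 10),
    ("OS-03-11-2009", 467, 9),
    ("ST-20-10-2009", 2573, 3),
]

for name, words, hits in TABLE_7:
    print(name, hits_per_thousand(hits, words))
```

The comparison makes clear why raw and normalized counts can rank texts differently: the short
article OS-03-11-2009 has the fewest words but by far the highest rate of hits per 1,000 words.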
Table 7
Articles with the highest numbers of hits for the search term lemma ENGLESKI ‘English’ (by
lemma form and number of hits)
N | Lemma form | File | Words | Hits | Per 1,000 | Dispersion
1 | ENGLESKI | DI-04-05-2007 | 2825 | 13 | 4.60 | 0.49
2 | ENGLESKOM | ST-04-11-2003 | 2912 | 10 | 3.43 | 0.55
3 | ENGLESKOG | OS-03-11-2009 | 467 | 9 | 19.27 | 0.72
4 | ENGLESKIM | ST-20-10-2009 | 2573 | 3 | 1.17 | -0.07
Content summaries of articles selected for qualitative analysis based on raw
frequency scores. Article DI-04-05-2007 (Selimbegović, 2007) discusses the hiring practices at
one of the most successful privately-owned Bosnian companies, with an emphasis on the hiring of
well-educated Bosnians holding degrees from (prestigious) foreign universities and having a fluent
command of English and other foreign languages. In a nutshell, the author systematically
juxtaposes home and abroad, associating educational and business success with stints abroad and
proficiency in foreign languages, English in particular. Article ST-04-11-2003 (Filipović, 2003)
is a commentary on the attempts by several domestic and foreign contributors to Walter magazine
to discredit the literary success of the Bosnian American writer Aleksandar Hemon, who writes
mainly in English. In addition to the topos (i.e. explicit or implicit premise linking argument with
conclusion) of inauthenticity of literature written in second/foreign languages, this text also
juxtaposes the spheres of home and abroad, associating time abroad and proficiency in English
with success and, as a corollary, a lack thereof with failure and backwardness. The third article,
OS-03-11-2009 (O. M., 2009), reports on a successful English language-training program for
members of the Armed Forces of Bosnia-Herzegovina as part of a larger effort to join NATO. Here
too, the central theme is the relationship between home and abroad, while English is associated
with success. Finally, article ST-20-10-2009 (Redakcija Starta, 2009) is an interview with the
well-known Bosnian linguist Midhat Riđanović in which he opines on several language-related issues
such as the Bosnian linguistic culture, language use in the Bosnian media, language policy in
Bosnia, the lagging of local linguistic science, and English in Bosnia. Similar to the articles
discussed above, this text juxtaposes home and abroad and equates the sphere of abroad and
proficiency in English with success on the one hand, and localness and a lack of proficiency in
English with failure and backwardness on the other.
Exploratory Factor Analysis
Finally, let us look at the results of EFA, which produces sets of collocates based on their
co-occurrence. As shown in Tables 8 and 9, EFA yielded a four-factor solution with a total of
29 highly-loading variables accounting for 40.57% of the total variance in the data. In contrast to
EFA-based studies of distribution of grammatical features (e.g. Biber, 1988), there were no
variables loading negatively on any of the factors. The identified factors were interpreted and
labeled as follows: Factor 1: proficiency in foreign languages (to use, Spanish, perfectly, German,
Italian, Russian, to know, to speak, foreign); Factor 2: published translations from foreign
languages into Bosnian and vice versa (translation, Bosnian, published, version, French, edition,
translated, book, to publish); Factor 3: language in higher education (department, professor,
philosophy, Tuzla, students, faculty, instruction); and Factor 4: computer science and foreign
languages (computer science, course, all, German, French, knowledge).
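The step from a loading matrix to a factor summary can be sketched as follows. The example uses
five of the variables and loadings reported in Table 8 and the ≥ .30 cutoff stated in its caption;
the function name is hypothetical, and the sketch does not reproduce the factor extraction or
rotation itself.

```python
# Loadings on Factors 1-4 for a subset of the collocate variables in Table 8.
LOADINGS = {
    "SLUŽITI": (0.988, -0.021, -0.047, -0.005),
    "PREVOD": (-0.046, 0.734, -0.028, 0.132),
    "ODSJEK": (0.050, 0.049, 0.857, 0.026),
    "INFORMATIKA": (0.006, -0.023, -0.051, 0.797),
    "NJEMAČKI": (0.678, -0.052, -0.125, 0.415),
}


def salient_loadings(loadings, threshold=0.30):
    """Group variables under every factor on which they load at or above threshold."""
    factors = {}
    for variable, values in loadings.items():
        for index, value in enumerate(values, start=1):
            if abs(value) >= threshold:
                factors.setdefault(index, []).append((variable, value))
    for members in factors.values():
        # Strongest loadings first, as in a factor summary table.
        members.sort(key=lambda item: abs(item[1]), reverse=True)
    return factors


for factor, members in sorted(salient_loadings(LOADINGS).items()):
    print(f"Factor {factor}:", members)
```

A variable such as NJEMAČKI ‘German’ clears the cutoff on two factors and is therefore listed
under both, which is why German appears in the labels of Factor 1 and Factor 4.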
Table 8
Collocate variables with significant factor loadings (≥ .30, in alphabetical order)
Variable | Factor 1 | Factor 2 | Factor 3 | Factor 4
BOSANSKI | .029 | .667 | .022 | .012
FAKULTET | -.022 | -.087 | .465 | -.006
FILOZOFSKI | .047 | .054 | .589 | -.015
FRANCUSKI | .186 | .530 | -.0[illegible] | .402
GOVORITI | .612 | .021 | -.022 | -.102
INFORMATIKA | .006 | -.023 | -.051 | .797
ITALIJANSKI | .670 | .096 | -.082 | -.062
IZDANJE | .013 | .462 | -.018 | -.187
KNJIGA | -.011 | .352 | -.046 | -.180
KURS | .016 | -.070 | -.042 | .730
NASTAVA | -.040 | -.113 | .465 | .045
NJEMAČKI | .678 | -.052 | -.125 | .415
OBJAVITI | -.059 | .305 | -.057 | -.147
OBJAVLJEN | -.032 | .593 | -.040 | -.161
ODSJEK | .050 | .049 | .857 | .026
PERFEKTNO | .690 | .005 | -.069 | .025
POZNAVATI | .615 | -.060 | .049 | -.035
PREVEDEN | -.054 | .462 | -.066 | -.149
PREVOD | -.046 | .734 | -.028 | .132
PROFESOR | .008 | -.003 | .661 | .028
RUSKI | .627 | .003 | .108 | -.013
SLUŽITI | .988 | -.021 | -.047 | -.005
ŠPANSKI | .722 | -.054 | -.073 | -.012
STRANI | .494 | -.092 | .116 | -.027
STUDENTI | -.033 | -.114 | .529 | .035
SVI | -.072 | -.195 | .201 | .437
TUZLA | -.008 | -.022 | .550 | .055
VERZIJA | -.036 | .578 | -.064 | .162
ZNANJE | -.089 | -.093 | .088 | .382
Table 9
Summary of the factorial structure
FACTOR 1 | Loading | FACTOR 2 | Loading | FACTOR 3 | Loading | FACTOR 4 | Loading
SLUŽITI | .988 | PREVOD | .734 | ODSJEK | .857 | INFORMATIKA | .797
ŠPANSKI | .722 | BOSANSKI | .667 | PROFESOR | .661 | KURS | .730
PERFEKTNO | .690 | OBJAVLJEN | .593 | FILOZOFSKI | .589 | SVI | .437
NJEMAČKI | .678 | VERZIJA | .578 | TUZLA | .550 | NJEMAČKI | .415
ITALIJANSKI | .670 | FRANCUSKI | .530 | STUDENTI | .529 | FRANCUSKI | .402
RUSKI | .627 | IZDANJE | .462 | FAKULTET | .465 | ZNANJE | .382
POZNAVATI | .615 | PREVEDEN | .462 | NASTAVA | .465 | |
GOVORITI | .612 | KNJIGA | .352 | | | |
STRANI | .494 | OBJAVITI | .305 | | | |
It will be noted that the factors identified here indicate some general themes and discourses
emerging from articles mentioning English and, in contrast to the findings in Fitzsimmons-Doolan
(2011), are only suggestive of language ideologies. We thus see, in Factors 1 and 2, traces of a
discourse of binary linguistic identities (Bosnian vs. foreign languages), a related discourse of
discrete linguistic codes and associated monolingual ethnocultural identities (Bosnian, German,
French, etc.), and a discourse of measurable language proficiency (e.g. perfectly). In addition to
these, there are traces of a discourse of intercultural communication through translation (Factor 2),
a discourse of foreign languages in higher education (Factor 3), and, arguably, a related discourse
centering on computer science and foreign languages as indispensable in a knowledge economy.
Although it is possible to deduce certain language ideologies already at this point, such as the
apparent ideology of monolingualism, a more complete language ideology profile requires
a qualitative analysis of the most representative texts.
Downsampling using factor scores. Following the procedure outlined above, factor
scores were estimated for each of the 161 texts with three or more hits for the search term lemma
ENGLESKI. Table 10 shows the five top-scoring texts for each factor. Only the four top-scoring
texts (one per factor) were selected for qualitative analysis here. Similar to the four texts identified
above, these texts are analyzed in detail further below.
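Selecting one representative text per factor amounts to taking the maximum factor score in each
column, as sketched below. The scores are the first three rows of Table 10; the function name is
hypothetical, and the estimation of the factor scores themselves is not reproduced here.

```python
# Top factor scores per factor (first three rows of Table 10).
TOP_SCORES = {
    1: [("DA-09-12-2005", 8.702), ("DA-11-07-2010", 8.532), ("OS-16-09-2009", 1.214)],
    2: [("DA-29-03-2008", 7.934), ("DA-12-08-2005", 3.566), ("DA-03-03-2008", 2.988)],
    3: [("DA-28-11-2005", 9.515), ("DA-21-07-2005", 2.940), ("DI-30-05-2008", 2.677)],
    4: [("DA-03-08-2009", 10.038), ("DA-29-03-2008", 2.493), ("OS-03-11-2009", 2.185)],
}


def top_texts(scores_by_factor, per_factor=1):
    """Return the highest-scoring text(s) for each factor."""
    selected = {}
    for factor, rows in scores_by_factor.items():
        ranked = sorted(rows, key=lambda row: row[1], reverse=True)
        selected[factor] = [text for text, _ in ranked[:per_factor]]
    return selected


print(top_texts(TOP_SCORES))
```

Raising per_factor would widen the qualitative sample, at the cost of including texts whose
scores, on most factors, fall well below that of the top-scoring article.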
Table 10
Texts with top factor scores (by factor and score)
FACTOR 1 | Score | FACTOR 2 | Score | FACTOR 3 | Score | FACTOR 4 | Score
DA-09-12-2005 | 8.702 | DA-29-03-2008 | 7.934 | DA-28-11-2005 | 9.515 | DA-03-08-2009 | 10.038
DA-11-07-2010 | 8.532 | DA-12-08-2005 | 3.566 | DA-21-07-2005 | 2.940 | DA-29-03-2008 | 2.493
OS-16-09-2009 | 1.214 | DA-03-03-2008 | 2.988 | DI-30-05-2008 | 2.677 | OS-03-11-2009 | 2.185
DI-23-12-2005 | 1.110 | OS-27-04-2009 | 2.411 | DA-12-01-2008 | 2.606 | DI-22-05-2009 | 1.168
SB-19-07-2007 | 1.050 | DA-13-09-2005 | 2.340 | OS-02-10-2005 | 2.410 | SB-10-07-2003 | 1.087
Content summaries of articles selected for qualitative analysis based on factor scores.
Similar to content analysis above, in this section we examine four articles with top scores on each
factor. The top-scoring article on Factor 1 (proficiency in foreign languages) DA-09-12-2005
(Sinanović, 2005) discusses the ability of high-ranking Bosnian politicians to speak foreign
languages. Although it is noted that some are proficient in various foreign languages, the overall
assessment is negative (they are ironically referred to as ‘polyglots’) as their ability to speak
foreign languages is deemed critical in the context of the country’s drive to join the European
Union. English is explicitly treated as the most important foreign language (e.g. it is the only one
mentioned in the title). The top-scoring article on Factor 2 (published translations from foreign
languages into Bosnian and vice versa) DA-29-03-2008 (FENA, 2008) reports on an effort to
translate into Bosnian the French version of the ruling by the International Court of Justice (ICJ)
in the case Bosnia-Herzegovina brought against Serbia and Montenegro for genocide. The French
version is said to be ‘richer’ than the English one, a translation of which was already available.
The top-scoring article on Factor 3 (language in higher education) DA-28-11-2005 (Hadžić, 2005)
is a short vignette about the experiences of a visiting American professor teaching English at the
University of Tuzla. The professor is quoted as saying that her students in Tuzla “are the best
students I have worked with in my career so far.” Finally, the top-scoring article on Factor 4
(computer science and foreign languages) DA-03-08-2009 (DŽ. A., 2009) briefly reports on
computer science and foreign language courses, among others, offered by the public library in the
city of Zenica during the summer months.
Discussion
This study set out to 1) compare three quantitative corpus-linguistic methodological
approaches to the identification of language-related discourses and language ideologies (keyword,
collocation and exploratory factor analysis; RQ1), and 2) identify and describe English language-
related discourses and underlying ideologies in the mainstream Bosnian newspaper discourse
(RQ2). The analysis proceeded by method (from keyword to collocation to exploratory factor
analysis) and epistemological orientation (from quantitative to qualitative). The following sections
discuss, first, the results of the methodological comparison; second, language-related discourses
and ideologies identified through quantitative analysis; third, language-related discourses and
ideologies identified through qualitative analysis; and fourth, the resulting overall language
ideological profile of English in the mainstream Bosnian press. The final section discusses the
limitations and directions for future research.
Methodological Comparison
It was noted above that, while researchers have sometimes combined keyword and
collocation analysis, no previous research has compared the two or combined them with EFA.
This is the first objective of this study. Keyword and collocation analyses identified 124 keywords
and 183 collocates, respectively, grouped into several semantic fields, with a partial overlap.
Exploratory factor analysis, on the other hand, produced four factors, again with a partial overlap
with the semantic fields identified by the other two methods. Table 11 shows all of the semantic
fields and factors organized by method.
Table 11
Semantic fields and factors in ENGCORP (by prominence/significance)
Keyword analysis | Collocation analysis | Exploratory factor analysis
education and higher education | language proficiency | proficiency in foreign languages
United States | translation | translations from foreign languages into Bosnian and vice versa
government and foreign affairs | computer science | language in higher education
work | language instruction | computer science and foreign languages
money | discrete languages |
publishing | national identity |
 | publishing |
 | literature |
 | pop culture/media |
 | science |
 | religion |
Although there is some overlap between the results of all three analyses (semantic fields/factors
related to higher education), it is clear that there are considerable differences as well. The
difference is at its most obvious between the results of keyword analysis on the one hand, and
collocation analysis and EFA on the other. Thus, semantic fields relating to the United States,
government and foreign affairs, work, and money are entirely absent from the results of collocation
analysis and EFA. This is, of course, hardly surprising considering that EFA depends on a similar
set of collocates to that used in collocation analysis. Despite the reliance on similar sets of
collocates, however, collocation analysis and EFA also produced somewhat different results.
Thus, semantic fields relating to national identity, literature, pop culture/media, and religion
identified by collocation analysis are either entirely absent or are much less prominent in the EFA
results. In addition to semantic fields, differences are also apparent in the results of the different
downsampling methods. Here, two sets of four entirely different articles were identified by
collocation analysis and EFA.⁶ Further, it was shown that, beyond the triangulation of the results
of collocation analysis, cluster analysis and pattern analysis can be helpful in the identification of
patterns that are otherwise difficult to detect, such as the set of adverb collocates used as
proficiency markers. Based on these observations, then, it seems reasonable to conclude that all
of the employed analytical methods have certain advantages and can contribute to the overall
analysis, particularly in terms of triangulation. Let us then examine the dominant discourses and
language ideologies identified in this sample.
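The comparison of semantic fields across the three analyses can be pictured as a set of simple set operations. The following Python fragment is only an illustrative sketch: the field lists are a partial reading of the discussion above rather than the full table, fields described as "absent or much less prominent" in the EFA results are treated as absent, and the names `fields`, `shared` and `only_in` are hypothetical.

```python
# Partial, illustrative field sets read off the prose discussion (not the full table).
# Assumption: fields "absent or much less prominent" in EFA are treated as absent.
fields = {
    "keyword": {"higher education", "United States",
                "government and foreign affairs", "work", "money"},
    "collocation": {"higher education", "national identity", "literature",
                    "pop culture/media", "religion"},
    "efa": {"higher education"},
}


def shared(fields):
    """Return the fields identified by every method."""
    return set.intersection(*fields.values())


def only_in(fields, method):
    """Return the fields identified by `method` and by no other method."""
    others = set().union(*(v for k, v in fields.items() if k != method))
    return fields[method] - others


print(sorted(shared(fields)))
print(sorted(only_in(fields, "keyword")))
print(sorted(only_in(fields, "collocation")))
```

On these assumed inputs, the shared set contains only higher education, while the fields unique to keyword analysis and to collocation analysis are those named in the paragraph above.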
Dominant Language-related Discourses and Language Ideologies: Quantitative Findings
Based solely on the quantitative evidence, the lexical data suggests that references to the
English language in the mainstream Bosnian press can be classified into several dominant semantic
fields: higher education, language proficiency, translation and publishing, and discrete languages
and ethnolinguistic identity. In addition to these, there exist marginal semantic fields pertaining
to English and literature and popular culture and media, government and foreign affairs, business,
science, and religion. This evidence suggests conceptualizations of English in terms of an
international ‘high’ variety reserved mainly for prestige domains such as higher education,
government and foreign affairs, business, and media. This is coupled with a very prominent
discourse on language proficiency (e.g. adverb collocates indicating a cline of proficiency, Table
5), as well as discourses of discrete linguistic codes and ethnolinguistic identities, and intercultural
communication. In language-ideological terms, then, the quantitative evidence seems to suggest
a dominant ideology of monolingualism and indispensability of English as the means of global
communication in domains of consequence, as well as, paradoxically, the language of the ‘other’
which is routinely discursively constructed in terms of an implied binary conception of identity
(e.g. naš ‘our’, rank 20 and svoj ‘own’, rank 37 vs. strani ‘foreign’, rank 42, Table 3). Moreover,
there is the prominence of the American identity of English which in postwar Bosnia can be
ascribed to the dominant role the US has played in the country’s political life as well as global US
dominance (cf. Ajsic, 2014).
These results are congruent to a certain extent with Ensslin and Johnson’s (2006) findings
based on a combination of keyword and discourse analysis. They identify four dominant semantic
fields pertaining to English in the mainstream British press: individual languages and language
varieties, education, media culture, and a set of terms relating broadly to identity (Ensslin &
Johnson, 2006: 160-161). Further, their collocation and concordance analyses of the patterning of
the phrase ‘English language’ suggest a discourse of English as a monolithic standard, a discourse
of folk linguistic commentary and endangerment of English, a discourse of commodification and
prestige, and a discourse of symbolic global dominance and triumphalism. The difference between
these findings points to a conspicuous absence in Bosnian discourses around English: namely, that
of threat or contestation (pervasive in discussions of global English), which suggests full
acceptance and naturalization of the dominance of English (albeit only as a means of
communication with the outside world). The question, however, is what English-related
discourses and ideologies can be identified from individual texts and to what extent they
correspond to the quantitative trends, to which we now turn.
Dominant Language-related Discourses and Language Ideologies: Qualitative Findings
One of the major goals of corpus-based discourse analysis, as noted above, is objective
downsampling or selection of representative texts for in-depth qualitative analysis. In this study,
two sets of representative individual texts were selected based on a) frequency counts of the
different forms of the node lemma ENGLESKI ‘English’ in individual texts, and b) individual texts’
factor scores. The content and discourse strategies in both sets of texts were summarized above.
The qualitative analysis conducted in this section is based on topoi (explicit or inferable obligatory
premises which make it possible to connect arguments with conclusions, Wodak, 2001: 72-74).
One dominant topos was identified: a systematic albeit largely implicit juxtaposition of the spheres
of home and abroad in which a foreign education and/or professional experience abroad and/or
proficiency in English (and other languages) were consistently associated with both individual and
societal success and progress, whereas a lack thereof was consistently associated with failure and
backwardness. This topos was evident in all four articles in the first set, as well as the top-scoring
articles on Factors 1 and 3 in the second set.
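The first downsampling criterion, frequency counts of the forms of the node lemma in individual texts, can be sketched as follows. This is a minimal illustration and not the procedure actually used in the study: the inflected forms in `NODE_FORMS` are an assumed partial set, the toy articles and their identifiers are invented, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed, partial set of inflected forms of the node lemma ENGLESKI 'English'.
NODE_FORMS = {"engleski", "engleskog", "engleskom", "engleskim", "engleska",
              "englesku", "engleske", "engleskoj", "englesko"}


def node_count(text):
    """Count the tokens in `text` that are forms of the node lemma."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return sum(counts[form] for form in NODE_FORMS)


def top_texts(corpus, n=4):
    """Return the ids of the `n` texts with the most node-lemma tokens."""
    ranked = sorted(corpus, key=lambda tid: node_count(corpus[tid]), reverse=True)
    return ranked[:n]


# Toy corpus with invented ids and contents, for illustration only.
corpus = {
    "A": "Kurs engleskog jezika. Engleski je obavezan.",
    "B": "Premijer govori njemački.",
    "C": "Prijevod na engleski jezik.",
}
print(top_texts(corpus, n=2))
```

The default of four texts mirrors the size of each set of articles reported above. A ranking by raw counts favors longer articles, so a normalized frequency could be substituted in `top_texts` if that bias is a concern.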
In line with the apparent naturalization of English dominance, its position seems to be
accepted by default and English is therefore rarely ideologized and explicitly thematized in
metalinguistic terms in ENGCORP articles. This makes the traditional CDA approach based on
the selection and close examination of one or several rich, illustrative examples problematic (cf.
Blackledge, 2002). Indeed, the topos of juxtaposition is entirely implicit in four of the six articles
in which it has been identified, and partly so in a fifth. The solution for in-depth qualitative
analysis, therefore, is not necessarily to examine full texts, as has been the case traditionally, but
rather to select illustrative excerpts from the articles identified as representative through
quantitative analyses. Here, we will examine two such excerpts: one from the first set of articles
(ST-20-10-2009, Table 7), and one from the second (DA-09-12-2005, Table 10).
In rare metalinguistic commentary on English in ENGCORP, linguist Riđanović is rather
explicit:
May I now add something you haven’t asked me? I am concerned to say that I believe the main reason we
have been lagging behind in all kinds of areas is that we have a pitifully small number of people who have a
high proficiency in English. In today’s world you cannot do anything at all without decent English. If you
follow CNN, BBC, or Al-Jazeera, you will see, for example, that every Palestinian politician of any standing
speaks fluent English. Why? Well, because they realize they cannot attain independence by way of arms,
but might possibly be able to do so by way of negotiations. Now, negotiating through interpreters is much
less effective than negotiating directly. I do not mean here that type of knowledge of English which suffices
to order water at a restaurant, but that which makes it possible for you to communicate using standard English
in order to attain some higher goal. Attaining such proficiency in English is an exacting, long-term endeavor
and the typical Bosnian would sooner die than expend that much effort. This is why I am fully convinced
that Bosnia has NO hope of progress WHATSOEVER. (original emphasis, my translation)
In this first excerpt, linguist Riđanović, at the end of an interview in which he was asked to discuss
a variety of language-related issues, takes the opportunity to offer his opinion on an issue he had
not been asked about by the interviewer but one which he obviously considers to be very important.
He then proceeds to a diagnosis of the systemic societal failure in Bosnia, which he ascribes,
somewhat hyperbolically, primarily to “a pitifully small number of people who have a high
proficiency in English,” which is to say low English-language standards. His argument is simple:
1) English is necessary for societal progress (“In today’s world you cannot do anything at all
without decent English.”), and 2) only a high proficiency in a certain variety of English is effective
(“I do not mean here that type of knowledge of English which suffices to order water at a restaurant,
but that which makes it possible for you to communicate using standard English in order to attain
some higher goal.”), but 3) attaining such proficiency is a difficult task and Bosnians are typically
not interested in doing things thoroughly (“Attaining such proficiency in English is an exacting,
long-term endeavor and the typical Bosnian would sooner die than expend that much effort.”), so
4) Bosnian society is not likely to make any progress (“This is why I am fully convinced that
Bosnia has NO hope of progress WHATSOEVER.”). What we see here then is an apparent belief
in language as a discrete code with a cline of possible varieties and proficiency levels (“a high
proficiency in English”, “decent English”, “fluent English”, “that type of knowledge of English
which suffices to order water at a restaurant”), only one of which has value (“that which makes it
possible for you to communicate using standard English in order to attain some higher goal”), i.e.
the standard variety.
These are, of course, all classic elements of a language ideology of monolingualism based
on a prestigious standard variety, which has historical origins in the West (see Bauman & Briggs,
2000). What is particularly interesting, however, is a) the exaggerated assessment of the
importance of English in Bosnian society, and b) the parallel Riđanović makes between Bosnia
and Palestine. It is not unreasonable to talk of systemic failure in postwar Bosnia-Herzegovina
with its rampant corruption and severe poverty, but the more likely primary reason for this state of
affairs, of course, is the recent war which left the country in ruins and the society divided along
ethnic lines. The implicit comparison with Palestine, however, sheds some light on the logic
behind the exaggeration of the importance of English. Similar to Palestine, Bosnia seems to have
been drawn into an intractable, long-term conflict with direct involvement of the United States and
other global powers. This means that decisive political power has lain outside of Bosnia itself,
so that appeals to that power must be made in its language. Needless to say, the language
of international diplomacy, regardless of the level of US involvement, is English. Hence the need
for “decent English” as “negotiating through interpreters is much less effective than negotiating
directly.” Finally, it is important to note here the implied negative assessment of the country’s
political establishment, which, of course, is the one “negotiating through interpreters” with,
arguably, pitiful results.
Somewhat similarly to the excerpt above, the author of article DA-09-12-2005 makes the
main topos of juxtaposition (partly) explicit:
A knowledge of one foreign language, above all English which has long been one of the basic means of
communication throughout the world, is one of the requirements for political and diplomatic practice today.
This rule, however, does not seem to apply to Bosnia-Herzegovina. […] Judging by their official biographies,
none of the members of the Presidency of Bosnia-Herzegovina speak English. Sulejman Tihić does not speak
any foreign languages. Borislav Paravac gets by in French, while Chair Ivo Miro Jović speaks German. The
situation in the Council of Ministers [i.e. government] is somewhat better, but Prime Minister Adnan Terzić
cannot speak any languages other than the mother tongue. He is joined by federal ministers Mirsad Kebo,
Ljerka Marić, Dragan Doko and Safet Halilović, as well as entity Prime Ministers Ahmet Hadžipašić and
Pero Bukejlović. Minister of defense, Nikola Radovanović, speaks Italian, Slovene and English, while
Mladen Ivanić [minister of foreign affairs] speaks German and English. Bariša Čolak has a passive
knowledge of German like his colleague Slobodan Kovač who knows Russian, while Branko Dokić speaks
English. Of course, the question is whether these ‘polyglots’ are a sufficient basis for a country hoping to
join European integrations. Moreover, interpreters serving our politicians almost on a daily basis are too
expensive. […] The ‘polyglots’ in the parliament. Speaker of the House of Representatives Nikola Špirić
speaks only his mother tongue, Martin Raguž gets by in English, while Šefik Džaferović gets by in German.
Of the remaining 41 representatives, eleven cannot speak any foreign languages. In the House of Peoples,
seven of the 15 representatives speak only local. How this is done elsewhere. Kolinda Grabar-Kitarović,
Croatian minister of foreign affairs, speaks perfect English, Spanish, and Portuguese, and gets by in Italian,
German, and French. Similarly, [Croatian] Prime Minister Ivo Sanader communicates without problems in
English, German, Italian, and French. In Serbia and Montenegro, President Boris Tadić speaks English and
French, while Serbian Prime Minister Vojislav Koštunica speaks German in addition to those two. (my
translation)
In this excerpt, we again see evidence of a matter-of-fact explicit acceptance of English (“English
[…] has long been one of the basic means of communication throughout the world, [and] is one of
the requirements for political and diplomatic practice today”), as well as the juxtaposition of home
and abroad (“This rule, however, does not seem to apply to Bosnia-Herzegovina”, “How this is
done elsewhere”, “seven of the 15 representatives speak only local”), and a belief in language as
a discrete code with a cline of proficiency levels (“gets by in French”, “speaks German”, “has a
passive knowledge of German”, “knows Russian”). Further, we see the same topoi of 1) English
is indispensable (“the basic means of communication throughout the world”, “one of the
requirements for political and diplomatic practice today”), 2) only a high proficiency in standard
English has value (“speaks perfect English”, “communicates without problems in English”), 3)
Bosnian politicians are (linguistically) inept (“none of the members of the Presidency of Bosnia-
Herzegovina speak English”, “Of the remaining 41 representatives, eleven cannot speak any
foreign languages”, “seven of the 15 representatives speak only local”), and that same conclusion
that 4) there is little hope of societal progress in Bosnia while things are as they are (“the question
is whether these ‘polyglots’ are a sufficient basis for a country hoping to join European
integrations”). Note also that the same topos of externality of decisive political power is evident
here, albeit referring to a different external locus of power (“a country hoping to join European
integrations”).
A Tentative Language-ideological Profile of English in Bosnia
Based on the quantitative and qualitative evidence presented in this study, it is possible to
sketch a tentative language-ideological profile of English in Bosnia. As noted above, the
quantitative evidence suggests conceptualizations of English as an international ‘high’ variety
reserved for prestige domains, in addition to discourses of language proficiency, discrete linguistic
codes and monolingual ethnolinguistic identities, and intercultural communication. Again, in
language-ideological terms, the quantitative evidence seems to suggest a dominant ideology of
societal monolingualism and indispensability of English as the means of international/global
communication. Further, as the prominence of the lexical set referring to the United States among
the keywords in ENGCORP shows, despite its supranational aura English in Bosnia seems to be
strongly associated with the United States.⁷
As might be expected, the qualitative evidence adds a degree of nuance to this macroscopic
view of the language-ideological profile of English. The qualitative analysis of contents and
discursive strategies revealed a dominant topos that can be described as a systematic albeit largely
implicit juxtaposition of the spheres of home and abroad whereby a foreign education and/or
professional experience abroad and/or a high proficiency in English (and other foreign languages)
are consistently associated with both individual and societal success and progress, whereas a lack
thereof is consistently associated with failure and backwardness. The discourse and ideology of
endangerment attested elsewhere (Duchêne & Heller, 2007), then, are turned on their head so that
it is English rather than Bosnian that is endangered by low standards and whose low standards are
a danger to societal progress.⁸ Moreover, a close analysis of the two metalinguistically more or
less explicit excerpts shows that proficiency in English, as a correlate of success, functions as a
sort of litmus test in evaluations of individuals and groups, particularly those with power and
influence such as politicians. This is corroborated by a similar discourse strategy in evidence in
article DI-04-04-2007 which highlights the hiring practices at one of the most successful privately-
owned Bosnian companies, extolling the local hiring of well-educated Bosnians with degrees from
foreign universities and a high proficiency in English (and other foreign languages). Finally, when
juxtaposed, these articles arguably suggest a nascent neoliberal discourse of global vs. local (e.g.
“seven of the 15 representatives speak only local⁹”), pitting the (attested) ineptitude of local
politicians and the public sphere they represent against the private sphere of business and the
(unexamined) success of globe-trotting English-using businessmen.
Limitations and Directions for Future Research
Despite an effort to triangulate the findings by employing multiple methodological
(keyword, collocation and exploratory factor analysis, CDA) and epistemological (quantitative and
qualitative, macro- and microscopic analysis) perspectives, as well as an exhaustive data set
(including all relevant articles from the given time-frame), the language-ideological profile of
English in Bosnia presented here must be understood as tentative. The primary reason for this is
that manifestations of public language-related discourses and language ideologies are not limited
to the press, but can be found throughout the public sphere. Furthermore, while the press is an
excellent source of data on dominant discourses and ideologies purveyed by the elites, it is clear
that it does not represent equally well what the linguistic anthropologist Paul Kroskrity calls
“practical consciousness” (Kroskrity, 2004: 505), i.e. the tacit beliefs of so-called ordinary people who tend to
accept and naturalize dominant ideologies. In addition to this, there are further aspects of the
language-related discourse that merit consideration such as, for example, its diachronic
development or its possible correlation with political affiliation, but which are not considered here.
Future research would therefore do well to consider data from other discursive sites such as various
kinds of institutional documentation, popular culture, and language-related discussions in social
media. Also, consideration of independent variables such as time and political affiliation would
provide further distinctions in our attempt to account for the totality of English language-related
discourses and ideologies in Bosnia as well as elsewhere.
¹ With the possible addition of German with its colonial foothold, geographical proximity, and a roughly
100-million-strong market.
² Bosnian is a heavily inflectional language, which complicates corpus-linguistic analysis of material in this language.
In order to account for the totality of variation, a decision was made to include all lemma forms of all lexis considered
in this study. This means that, depending on the goals of a particular analytical technique, lemma forms were
sometimes considered as a set and sometimes separately.
³ All patterns identified using the different subtypes of collocation analysis were also subjected to concordance analysis
for triangulation purposes. However, in contrast to many other studies which rely heavily on concordance analysis
and base their discussions of findings on it, the results of concordance analysis were deemed to be insufficiently
informative to be reported here, especially because the microscopic qualitative analysis was conducted on full texts
and text excerpts.
⁴ The decision to include pronouns with lexical material stems from their potential importance for the construction of
in- and out-groups (us vs. them) which has been shown to be an important discursive strategy (Wodak, 2001).
⁵ Note that the pattern function features the full original list of 298 collocates, including function words, as these
cannot be easily removed from the concordance tool results.
⁶ Keyword analysis does not offer an obvious or effective way to identify a small number of representative texts for
qualitative analysis.
⁷ Findings such as this cast some doubt on the claims of English’s independence of the inner-circle countries often
made in the English as a lingua franca literature (e.g. Seidlhofer, 2011).
⁸ Arguably, both the sometimes implicit, sometimes explicit negative (self) assessments of Bosnia and Bosnians and
the reverse discourse and ideology of endangerment suggest a kind of (neo?) colonial consciousness, not unlike that
widely attested in many ‘Anglophone’ postcolonial settings (e.g. Pennycook, 1998).
⁹ Referring to the language spoken in Bosnia-Herzegovina as ‘local’ here has a dual function. It is an economical way
of referring to the language without using its unwieldy triple label (Bosnian/Croatian/Serbian) or running the danger
of offending someone or other by only using one of the three labels. Perhaps more importantly, it is a tongue-in-cheek
reference to Bosnian politicians, whereby the discourse of globalization and the built-in binary difference between
local and global are used to implicitly characterize them as ‘local’ and therefore third-rate.
Primary references
DŽ., A. (2009, September 3) “Ljeto u biblioteci za sve [Summer at the library for all]”. Dnevni
Avaz, p. 29.
FENA (2008, March 29) “Za dva mjeseca prvi prijevod presude MSP-a [The first translation of
the International Court of Justice’s ruling will be available in two months]”. Dnevni Avaz,
p. 10.
Filipović, N. (2003, November 4) “Zašto Walter pokušava dezavuirati književni rad Aleksandra
Hemona [Why Walter is trying to discredit the literary work of Aleksandar Hemon]”. Start,
p. 59.
Hadžić, A. (2005, November 28) “Moji studenti u Tuzli ruše sve predrasude o BiH [My students
in Tuzla are breaking down all prejudices towards Bosnia-Herzegovina]”. Dnevni Avaz, p.
11.
O., M. (2009, November 3) “Kursevi engleskog jezika za blizu 800 pripadnika OSBiH [English
language courses for nearly 800 members of the Bosnian Armed Forces]”. Oslobođenje, p.
20.
Redakcija Starta (2009, October 20) “Od lingvističkih neznalica bukvalno mi se povraća [I am
sick of language amateurs]”. Start, p. 56.
Selimbegović, V. (2007, May 4) “Asini lavovi [The lions of ASA]”. Dani, p. 32.
Sinanović, L. (2005, December 9) “Engleski ne zna nijedan od članova Predsjedništva BiH [None
of the members of the Presidency of Bosnia-Herzegovina can speak English]”. Dnevni
Avaz, p. 3.
Secondary references
Ajsic, A. (2014) Political loanwords: Postwar constitutional arrangement and the co-occurrence
tendencies of anglicisms in contemporary Bosnian. Journal of Language and Politics 13
(1), 21-50. doi: 10.1075/jlp.13.1.02ajs.
Ajsic, A. and McGroarty, M. (in press) Mapping language ideologies. In F. M. Hult and D. C.
Johnson (eds) Research Methods in Language Policy and Planning: A Practical Guide.
Wiley-Blackwell.
Buka (2013) Ustav BiH i nakon dvije decenije samo na engleskom jeziku [Even two decades after
signing the constitution of Bosnia-Herzegovina remains available in English only].
Accessed 16 February 2015. http://www.6yka.com/mobile/novost/45279.
Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.
Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyzanowski, M., McEnery, T. and Wodak, R.
(2008) A useful methodological synergy? Combining critical discourse analysis and corpus
linguistics to examine discourses of refugees and asylum seekers in UK press. Discourse
& Society 19 (3), 273-306.
Baker, P., Gabrielatos, C. and McEnery, T. (2013) Sketching Muslims: A corpus driven analysis
of representations around the word ‘Muslim’ in the British Press 1998-2009. Applied
Linguistics 34 (3), 255-278.
Bauman, R. and Briggs, C. L. (2000) Language philosophy as language ideology: John Locke and
Johann Gottfried Herder. In P. V. Kroskrity (ed.) Language Regimes: Ideologies, Polities,
and Identities (pp. 139-204). Santa Fe, New Mexico: School of American Research Press.
Berber Sardinha, T. (1999) Using key words in text analysis: Practical aspects. Direct Papers 42,
1-9. ISSN 1413-442x.
Biber, D. (1988) Variation Across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating Language Structure
and Use. Cambridge: Cambridge University Press.
Blackledge, A. (2002) What sort of people can look at a chicken and think dofednod?: Language,
ideology, and nationalism in public discourse. Multilingua 21 (2/3), 197-226.
Blackledge, A. (2005) Discourse and Power in a Multilingual World. Amsterdam: John Benjamins
Publishing.
Blackledge, A. and Pavlenko, A. (eds) (2002) Language ideologies in multilingual contexts
[Special Issue]. Multilingua 21 (2/3).
Blommaert, J. (ed.) (1999) Language Ideological Debates. Berlin, New York: Mouton de Gruyter.
Blommaert, J. (2010) The Sociolinguistics of Globalization. Cambridge: Cambridge University
Press.
Blommaert, J. and Verschueren, J. (1998) The role of language in European nationalist ideologies.
In B. B. Schieffelin, K. A. Woolard and P. V. Kroskrity (eds) Language Ideologies:
Practice and Theory (pp. 189-210). New York: Oxford University Press.
Bokhorst-Heng, W. (1999) Singapore's Speak Mandarin Campaign: Language ideological debates
in the imagining of the nation. In J. Blommaert (ed.) Language Ideological Debates (pp.
235-265). Berlin, New York: Mouton de Gruyter.
Bondi, M. and Scott, M. (eds) (2010) Keyness in Texts. Amsterdam: John Benjamins.
Canagarajah, A. S. (1999) Resisting Linguistic Imperialism in English Teaching. Oxford: Oxford
University Press.
Chen, Y. and Baker, P. (2010) Lexical bundles in L1 and L2 academic writing. Language Learning
& Technology 14 (2), 30-49.
Cheng, W. and Lam, P. W. Y. (2013) Western perceptions of Hong Kong ten years on: A corpus
driven critical discourse study. Applied Linguistics 34 (2), 173-190.
doi:10.1093/applin/ams038.
Cortes, V. and Csomay, E. (eds) (2015) Corpus-based Research in Applied Linguistics: Studies in
Honor of Doug Biber. Amsterdam: John Benjamins.
DiGiacomo, M. (1999) Language ideological debates in an Olympic city: Barcelona 1992-1996.
In J. Blommaert (ed.) Language Ideological Debates (pp. 105-142). Berlin, New York:
Mouton de Gruyter.
Eagleton, T. (1991) Ideology: An Introduction. London: Verso.
Ensslin, A. and Johnson, S. (2006) Language in the news: Investigating representations of
‘Englishness’ using WordSmith Tools. Corpora 1 (2), 153-185.
Fitzsimmons Doolan, S. (2009) Is public discourse about language policy really public discourse
about immigration? A corpus-based study. Language Policy 8 (4), 377-402.
Fitzsimmons Doolan, S. (2011) Identifying and Describing Language Ideologies Related to
Arizona Educational Language Policy (Doctoral dissertation). Available from ProQuest
Dissertations and Theses Database. (UMI No. 3467048)
Fitzsimmons Doolan, S. (2014) Using lexical variables to identify language ideologies in a policy
corpus. Corpora 9 (1), 57-82.
Fowler, R. (1991) Language in the News. London: Routledge.
Freake, R., Gentil, G. and Sheyholislami, J. (2011) A bilingual corpus-assisted discourse study of
the construction of nationhood and belonging in Quebec. Discourse & Society 22 (1), 21-
47.
Gal, S. (1989) Language and political economy. Annual Review of Anthropology 18, 345-367.
Graddol, D. (2006) English Next: Why Global English May Mean the End of “English as a Foreign
Language”. British Council.
Gray, B. and Biber, D. (2013) Lexical frames in academic prose and conversation. International
Journal of Corpus Linguistics 18, 109-135.
Hunston, S. (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
IBM (2012) Statistical Package for the Social Sciences.
Mediacentar Sarajevo (2013) Infobiro. http://www.idoconline.info/.
Irvine, J. T. (1989) When talk isn’t cheap: Language and political economy. American Ethnologist
16 (2), 248-267.
Kroskrity, P. V. (ed.) (2000) Language Regimes: Ideologies, Polities, and Identities. Santa Fe,
New Mexico: School of American Research Press.
Kroskrity, P. V. (2004) Language ideologies. In A. Duranti (ed.) A Companion to Linguistic
Anthropology (pp. 496-517). Malden, MA: Blackwell Publishing.
Kuo, S. and Nakamura, M. (2005) Translation or transformation? A case study of language and
ideology in the Taiwanese press. Discourse & Society 16 (3), 393-417.
Lippi-Green, R. (2007) English with an Accent: Language, Ideology, and Discrimination in the
United States. London: Routledge.
Mar-Molinero, C. and Stevenson, P. (eds) (2006) Language Ideologies, Policies and Practices:
Language and the Future of Europe. Basingstoke: Palgrave Macmillan.
Mautner, G. (2009) Corpora and critical discourse analysis. In P. Baker (ed.) Contemporary
Corpus Linguistics (pp. 32-46). New York: Continuum.
McEnery, T., Xiao, R. and Tono, Y. (2006) Corpus-based Language Studies: An Advanced
Resource Book. New York: Routledge.
McGroarty, M. (2008) The political matrix of linguistic ideologies. In B. Spolsky and F. M. Hult
(eds) The Handbook of Educational Linguistics (pp. 98-112). Malden, MA: Blackwell
Publishing.
McGroarty, M. (2010) Language and ideologies. In N. N. Hornberger and S. L. McKay (eds)
Sociolinguistics and Language Education (pp. 3-39). Bristol: Multilingual Matters.
Meyer, M. (2001) Between theory, method, and politics: Positioning of the approaches to CDA. In R.
Wodak and M. Meyer (eds) Methods of Critical Discourse Analysis (pp. 14-32). London:
SAGE.
Partington, A. (2010) Modern diachronic corpus-assisted discourse studies (MD-CADS) [Special
Issue]. Corpora 5 (2), 83-108.
Partington, A. and Morley, J. (2004) At the heart of ideology: Word and cluster/bundle frequency
in political debate. In B. Lewandowska-Tomaszczyk (ed.) Practical Applications in
Language and Computers (pp. 179-192). Frankfurt am Main: Peter Lang.
Pennycook, A. (1998) English and the Discourses of Colonialism. London: Routledge.
Phillipson, R. (1992) Linguistic Imperialism. Oxford: Oxford University Press.
Ricento, T. (ed.) (2000) Ideology, Politics and Language Policies: Focus on English. Philadelphia:
John Benjamins.
Ricento, T. (2003) The discursive construction of Americanism. Discourse & Society 14 (5), 611-
637.
Schieffelin, B. B., Woolard, K. A. and Kroskrity, P. V. (eds) (1998) Language Ideologies: Practice
and Theory. New York: Oxford University Press.
Scott, M. (1997) PC analysis of key words and key key words. System 25 (2), 233-245.
Scott, M. (2014) WordSmith Tools Help Manual. Version 6.0. Liverpool: Lexical Analysis
Software.
Seidlhofer, B. (2011) Understanding English as a Lingua Franca. Oxford: Oxford University Press.
Silverstein, M. (1979) Language structure and linguistic ideology. In R. Clyne, W. Hanks and C.
Hofbauer (eds) The Elements: A Parasession on Linguistic Units and Levels (pp. 193-247).
Chicago: Chicago Linguistic Society.
Spolsky, B. (2009) Language Management. Cambridge: Cambridge University Press.
Stubbs, M. (1996) Text and Corpus Analysis. London: Blackwell.
Subtirelu, N. C. (2015) “She does have an accent but…”: Race and language ideology in students’
evaluations of mathematics instructors on RateMyProfessors.com. Language in Society 44
(1), 35-62.
Tabachnick, B. G. and Fidell, L. S. (2007) Using Multivariate Statistics. Boston, MA:
Pearson Education, Inc.
Thompson, J. B. (1984) Studies in the Theory of Ideology. Cambridge: Polity Press.
van Dijk, T. A. (1998) Ideology: A Multidisciplinary Approach. London: SAGE.
van Dijk, T. A. (2006) Ideology and discourse analysis. Journal of Political Ideologies 11 (2), 115-
140.
Vessey, R. (2013a) Too much French? Not enough French?: The Vancouver Olympics and a very
Canadian language ideological debate. Multilingua 32 (5), 659-682.
Vessey, R. (2013b) Language Ideologies and Discourses of National Identity in Canadian
Newspapers: A Cross-linguistic Corpus-assisted Discourse Study. Unpublished PhD
dissertation.
Wodak, R. (2001) The discourse-historical approach. In R. Wodak and M. Meyer (eds) Methods
of Critical Discourse Analysis (pp. 63-94). London: SAGE.
Wodak, R. and Meyer, M. (eds) (2009) Methods of Critical Discourse Analysis. London: SAGE.