Karen McNeil*
‘We don’t speak the same language:’ language choice and identity on a Tunisian internet forum
https://doi.org/10.1515/ijsl-2021-0126
Received December 13, 2021; accepted May 3, 2022
Abstract: The linguistic situation in the Arab world is in an important state of
transition, with the “spoken” vernaculars increasingly functioning as written
languages as well. While this fact is widely acknowledged and the subject of a
growing body of qualitative literature, there is little quantitative research detailing
the process in action. The current project examines this development as it is
occurring in Tunisia: I present the findings from a corpus study comparing the
frequency of Tunisian Arabic–Standard Arabic equivalent pairs in online forum
posts from 2010 with those from 2021. The findings show that the proportion of
Tunisian lexical items, compared to their Standard Arabic equivalents, increased
from a minority (19.7%) to a majority (69.9%) over this period. At the same time,
metalinguistic comments on the forum reveal that, although its status is still
contentious, Tunisian has become unmarked as a written language. These changes
can be attributed to major developments in Tunisian society over the period of
study, including internet access and the 2011 revolution. These findings suggest
a destabilization of the diglossic language situation in Tunisia and a privileging of
national identity vis-à-vis the rest of the Arab world.
Keywords: colloquial; dialects; diglossia; identity; Tunisian Arabic
1 Introduction
In these pages nearly 20 years ago, Walters (2003: 101–102) observed that, due to increased literacy, Arabic diglossia in Tunisia was being “reconfigured”: learned vocabulary from the “High” variety (Standard Arabic/MSA) was increasingly used in educated speech, while opportunities for writing the “Low” variety (Tunisian dialect) were “expanding.” Likewise, Belnap and Bishop (2003: 21) showed that increased literacy in the Arab world had led to new styles of informal writing. For some young Arabs, “unmitigated MSA” had come to be seen as “too formal for personal correspondence with peers” and they instead chose forms that were closer to the vernacular. This was a significant change from the previous generation or two, for whom such writing “meant running the risk of being taken for a semiliterate” (Belnap and Bishop 2003: 21).

*Corresponding author: Karen McNeil, Arabic and Islamic Studies, Georgetown University, Washington, DC, USA, E-mail: km1542@georgetown.edu. https://orcid.org/0000-0002-4875-1124
IJSL 2022; 278: 51–80
Some of this informal correspondence was taking place on the then-nascent internet, where Belnap and Bishop predicted that the immediacy of online communication “may serve to further erode the spoken/written distinction and result in even more [colloquial Arabic] being used in the written mode” (2003: 19). This prediction has proved prescient: the slow growth in literacy rates over the last half century has been followed by the rapid growth of internet access in this century. The same period has seen an expansion in vernacular writing, both online and in print, throughout the Arab world. Though this expansion is the subject of a growing body of literature, quantitative data describing it are sparse, and most existing literature has focused on Egypt and Morocco (e.g. Aboelezz 2012; Caubet 2017a; Elinson 2013; Høigilt and Mejdell 2017; Kindt et al. 2016; Mejdell 2006; Miller 2017).
The current study aims to provide a window on this ongoing change by pre-
senting evidence from Tunisia over the span of a decade. To do this, I compare a
corpus of Tunisian internet forum posts from 2010 with one of 2021 posts collected
from the same site. My analysis shows that the proportion of writing in Tunisian
Arabic, rather than Standard Arabic, has dramatically increased within this 11-year
period. I will argue that this change can be attributed to the assertion of a Tunisian
identity, indexed by Tunisian Arabic, rather than a pan-Arab Islamic identity.
This paper is divided into three main sections. I begin with an overview of the
language situation in the Arab world in general and Tunisia in particular, along
with the importance of language in Arab identity. I then describe the corpus,
composed of hundreds of thousands of posts on the online forum TunisiaSat from
2010 and 2021, and the methods used to quantify the proportion of Tunisian Arabic
writing and to assess language attitudes in these posts. Finally, I present and
discuss the results of the study, before concluding with some implications for the
linguistic situation in Tunisia and the Arab world.
2 Language in Tunisia and the Arab world
2.1 Diglossia
The term diglossia was coined by Charles Ferguson in a 1959 article of that title, based on the French term diglossie that William Marçais had used to describe the linguistic situation in North Africa. Ferguson described a “particular kind of standardization where two varieties of a language exist side by side … each having a definite role to play” (1959: 325). For Ferguson, this “definite role” or “specialization of function” was the defining characteristic of diglossic language situations: “In one set of circumstances only [High] is appropriate and in another only [Low], with the two sets overlapping only very slightly” (1959: 328).
Arabic was one of the defining cases discussed by Ferguson and is, in many
respects, viewed as a prototypical example of Fergusonian diglossia (Snow 2013).
The High language throughout the Arab world is Standard Arabic, referred to by its speakers as al-fusˤħaː ‘the most elegant’ or simply al-ʕarabiyya ‘Arabic’. Standard Arabic is the official language and the language of education in all Arab countries; the few calls to include any vernacular elements in school curricula have been subject to virulent public criticism (Suleiman and Abdelhay 2020). There are strong ideologies associated with the different varieties: Standard Arabic is believed to be the “real” language, with the vernaculars viewed as flawed deviations from it (see Bassiouney 2020: 233; Høigilt and Mejdell 2017).
The vernacular languages are quite different from Standard Arabic, as well as
from each other. To give a typical example from Tunisian radio: one day the host
was discussing a popular Egyptian Arabic novel named ʕayza ʔatgawwiz ‘I want to get married.’ The sequel of that novel had recently been released and was titled miʃ ʕayza ʔatgawwiz ‘I don’t want to get married.’ The host, for emphasis, translated the title into Tunisian Arabic: ma-nħɛbb-ʃ n-ʕarris. These varieties clearly differ in
all linguistic aspects from each other, as well as from Standard Arabic, as seen in
example (1).
(1a) miʃ ʕayz-a ʔa-tgawwiz (Egyptian)
NEG wanting-F I-get.married
(1b) ma n-ħɛbb-ʃ n-ʕarris (Tunisian)
NEG I-want-NEG I-get.married
(1c) laː ʔ-uriːd-u ʔan ʔa-tazawwaj-a (Standard Arabic)
NEG I-want-IND that I-get.married-SBJV
‘I don’t want to get married’
Note here the Tunisian two-part negation (ma before the verb and ʃ following it), which is distinctive of North African varieties of Arabic.¹ Tunisian also differs from the other two varieties in the first-person verb conjugation (n-) as well as in the word it uses for “to get married.” It is clear that (in the terminology of Kloss 1967) there is significant abstand (distance) between these varieties. Their speakers consider them one “language,” however, because the vernaculars lack ausbau: they have not been standardized and are generally not written.

¹ It should be noted that Egyptian Arabic is also one of the varieties that uses this two-part negation; it just happens that “I want” is expressed with a participle rather than a finite verb in Egyptian.
Although there are few signs of standardization, the vernaculars have been
making strides into the written domain. Literature written in vernacular Arabic has
begun to appear in countries like Morocco (Caubet 2017b; Elinson 2013; Miller 2017)
and Tunisia (McNeil 2023; Achour this issue); previously only Egypt (and, to a lesser
extent, Lebanon) had a history of vernacular literature (Davies 2006; Doss and
Davies 2013; Mejdell 2006; Rosenbaum 2011). The biggest expansion of vernacular
writing, however, has been online (Achour Kallel 2016; Caubet 2004, 2012, 2017a;
Daoudi 2011). In a recent survey of language attitudes and practices in Cairo and
Rabat, participants reported using mostly vernacular Arabic (or vernacular and
French in Rabat) online (Kebede et al. 2013; Kebede and Kindt 2016). Other than
this survey, only a few studies have quantified Arabic language choice online
(Al-Khatib and Sabbah 2008; Khalil 2018; Nordenson 2017; Warschauer et al.
2002). Many more have studied the phenomenon of Arabic language written in
Latin letters, often termed Arabizi (e.g. Allehaiby 2013; Palfreyman and Al Khalil
2007; Yaghan 2008). Though the studies available are geographically limited
(most cover either Egypt or Morocco), they suggest that the written use of
vernacular Arabic has greatly increased, led in no small part by its use online.
Some scholars have even suggested that the recent developments indicate that
diglossia in the Arab world is breaking down (Alkhamees et al. 2019). Whether that
is the case or not, the status of Arabic vernaculars as written languages is clearly
changing.
2.2 Tunisia and Tunisian Arabic
Tunisia is a small North African country (roughly the size of the US state of
Georgia), bordered by Algeria to the west and Libya to the southeast, with the Mediterranean Sea to the north and east. The “Low” language variety spoken in Tunisia is Tunisian Arabic, called dɛːrja (‘common’) or tuːnsi (‘Tunisian’) by its speakers. It is similar to eastern Algerian and western Libyan Arabic (see Gibson 2009; Ritt-Benmimoun 2014; Sayahi 2011b for a description of its features). Arabic
was introduced to Tunisia by the Islamic armies in the 7th century and, over
multiple conquests, eventually replaced the native Berber (Amazighi) and Late
Latin which were the dominant languages in North Africa at the time (Sayahi 2014).
Tunisian Arabic is spoken by the nearly 12 million residents of Tunisia, as well as
over a million Tunisians abroad. The dialect of the capital, Tunis, functions as a de
facto standard (Gibson 2013; Walters 2003). This standard is mostly a spoken one:
it is used in Tunisian movies and television series, for example, but most books,
newspapers, and other formal genres are still written in Standard Arabic.
A generation ago, Tunisian Arabic writing was even rarer. In 2003, Walters noted an expansion of the “limited” writing opportunities, such as increased visibility of the language on billboards and in print advertisements (101). He ascribes
much of this change to improved literacy: whereas, at the time of Ferguson’s
writing, youth literacy in Tunisia was only 31% (49% for males and 12% for
females, Walters 2003: 84–86), youth literacy today is nearly universal at 96% and
the gender gap has closed to less than 1%. Widespread literacy tends to destabilize
diglossia because, when ordinary people write to each other about ordinary topics,
a genre of informal writing is created. The standard language is, by definition, a
formal language, so writing informally to personal intimates in the standard
language can feel awkward (Belnap and Bishop 2003). In other words, when
literacy increases, opportunities to write in the “spoken”language arise.
Since the time of Walters’ writing, Tunisian Arabic has continued to spread into domains that were previously limited to Standard Arabic. Although most books, newspapers, and other printed materials are still written in Standard Arabic, there is an increasing trend of fiction writing in Tunisian Arabic (McNeil 2023; Achour this issue). This written expansion is part of an overall growth of Tunisian Arabic in formal spheres over the past couple of decades, including radio and television broadcasts (Achour Kallel 2011; Daoud 2011a), classrooms (Bach Baoueb
and Toumi 2012), and mosques (Sayahi 2014). Since the 2011 revolution and
democratic transition, a limited number of government communications have also
been produced in Tunisian Arabic (Achour Kallel 2015; McNeil 2023; Mejri 2017;
Sayahi 2019).
The spreading domains of Tunisian Arabic can be ascribed not only to literacy but also to the internet and mobile phones: 64% of the population in 2018 had access to the internet, and there were 124 mobile phones in Tunisia for every 100 people.² Coulmas (2013: 131) notes that the “quasi-orality” of new media like chat and SMS encourages a more spoken register in online writing; in Arabic this has meant that such messages are often written in the vernacular. Yasir Suleiman wrote in 2004 that the use of Arabic “dialects” in writing “is resisted because it breaks what is in effect a ‘cultural taboo’ whose ideological validity is sanctioned by the tradition and historical practice” (2004: 72). When mobile phones appeared, texting in the vernacular seemed natural because texting serves a communicative function similar to that of speech. The early internet chat rooms were the same, as were Facebook and Twitter. By the time online genres that were more like traditional writing, such as blogs, appeared, the “taboo” that Suleiman described had already been greatly weakened by all of this online writing.

² https://data.worldbank.org/indicator/SE.ADT.1524.LT.ZS?locations=TN; https://data.worldbank.org/indicator/IT.NET.USER.ZS?end=2019&locations=TN.
2.3 Language and identity
Tunisian Arabic is a variety of Maghrebi, or North African, Arabic. There is an
unequal relationship of power and prestige between Maghrebi and Mashreqi Arabic
(varieties of the Middle East including Egypt). The Middle East (especially Egypt
and Lebanon) has traditionally dominated Arab popular culture, so North Africans
easily understand Mashreqi varieties of Arabic, but the reverse is not true (Hachimi
2013; S’hiri 2003). Middle Eastern Arabs claim to not be able to understand North
Africans, even going so far as to assert that what they are speaking is “not Arabic.”
When two speakers from the different regions meet, the communicative burden is
on the North African to alter their language to be understood; the Middle Easterner
will make no such attempt at accommodation (S’hiri 2003). Although this antipathy has long existed, the new international media have brought speakers from different countries into contact like never before and have heightened tensions (Hachimi 2013).
The status of Tunisia in the Arab world underwent a change after the 2011
Jasmine Revolution, when mass protest forced the longtime dictator of Tunisia to
step down and sparked the “Arab Spring.” Although similar protests spread throughout the region, elsewhere they resulted only in a continuation of the status quo (Morocco and the Gulf), the replacement of one dictator with another (Egypt), or ruinous civil war (Syria, Libya). After long being considered a backwater of the Arab world, Tunisia was suddenly “a beacon of hope” (Hellyer 2015) and the “Arab Anomaly.” The latter is the title of a book in which the Jordanian-American author concludes: “there is something unique and special about Tunisia that is missing in the rest of the Arab world, to which Tunisia both belongs and does not” (Masri 2017: xxvi).
The major slogan of the Jasmine Revolution was in Standard Arabic: aʃ-ʃaʕb yuriːd isqaːtˁ an-niðˤaːm ‘the people want to bring down the regime’.³ Historically, Standard Arabic has been associated with pan-Arab identity, while vernacular Arabic has been a prominent feature of territorial nationalism (Suleiman 2004). This kind of social identity, in the framework of Bucholtz and Hall (2004), is constructed around sameness and difference: emphasizing similarities between oneself and the other people in a group, as well as accentuating one’s differences from people of other groups. This is instantiated through the semiotic processes of practice, indexicality, ideology, and performance (Bucholtz and Hall 2004: 370). Tunisian Arabic acts as an index of national identity, in relation to both Standard Arabic and other vernaculars, while the forum studied here is a locus of the practice, ideology, and performance in which this identity is built. We will see how Tunisians on the forum construct a post-revolution national identity through their use and promotion of their vernacular, often in opposition to “Arabs,” who are constructed as an Other.

³ This is a reference to the first line of a 1933 poem by Tunisian poet Aboul-Qasim Echebbi: “If, one day, the people want to live, then fate will answer their call.”
Tunisians are also constructing this identity in relation to French –and both
the domestic elites and the foreign colonizers which it indexes. Many common
words in Tunisian Arabic were originally borrowings from French and some have
been adapted to Tunisian phonology, e.g. blaːsˁa ‘place’ < Fr. place (Sayahi 2011a: 128). Code-switching is also common in spoken Tunisian, which contributes to the widespread, though mistaken, impression among Middle Eastern Arabs that the
Maghrebi varieties are more French than Arabic. The French language itself is
associated both negatively with colonialism, and positively with modernity: “the
more conservative sectors of the population tend to perceive it as a threat to the
Arabo-Islamic identity, while the more progressive sectors consider it a necessary
tool for scientific and technological advancement” (Sayahi 2014: 42). In the post-independence period, competence in French increased (due to expanded access to education) and French became the language of the Tunisian political and social elite (Sayahi 2014: 31–32). There are indications, however, that the younger
generations have less competence in French, compared with their parents (Daoud
2011b). French is also increasingly in competition with English, which is not
associated with a colonial power in Tunisia and is perceived as more important
internationally (Sayahi 2014: 51).
2.4 Online language use in Tunisia: an example post
To illustrate the TunisiaSat forum and the character of online Tunisian writing, I
will discuss two samples from 2010 (Figures 1 and 2). These posts are characteristic
of the forum and show the multilingual nature of Tunisia: rather than using one
single language, these posts instead resemble the diglossic switching often seen in
speech (Boussofara-Omar 2006; Sayahi 2014). In Figure 1, the thread originator
had asked what household items, clothes, and other necessities a bride needs for
her trousseau. In the replies, several commentators supplied links to existing lists
and offered opinions of the various resources.
Figure 1: An example of a Tunisia-Sat.com post. The quoted text is in a mixed variety that leans
towards standard, and the reply is written in a mixed variety that leans towards Tunisian.
Figure 2: A thread from TunisiaSat, with text in romanized Tunisian with French code-switching
(top), French (middle), and Tunisian Arabic (bottom).
The post in Figure 1 is a response to one of those links. It begins with a
quotation of a previous reply –the smaller text in the gray box. This reply provides
a link to additional items that the writer describes as being beyond “the basics.”
Though it contains some vernacular features, its lexicon and syntax are largely
Standard Arabic. The main text of the post, however, is more markedly Tunisian. In
the following transcription, words that are unambiguously Tunisian Arabic are
underlined, Standard Arabic is in boldface, and indeterminate words are in plain
roman font.⁴ Extralingual elements are noted in chevrons.
(1) ⟨tophat-raising emoji⟩
w illi muʃ min baːrdo ʃ-ya-ʕmel ???
and who not from Bardo what-he-does ???
‘And what are you supposed to do if you’re not from Bardo???’
[= if you’re not rich]
⟨light blue font⟩
maʃkuːra ʕa-l-ʔidˤaːfa (wa law ʔanna-haː ʔiʃhaːr tu-ʕtabar)
thanked.F for-the-addition (and if that-it ad it-is.considered)
‘thanks for your addition (even if it’s something of an advertisement)’
ʔaxuː-k fiː allah
brother-your in God
‘your brother in Islam’
The first line here, w illi muʃ min baːrdo ʃyaʕmel???, is emotionally emphatic, as
expressed through the preceding emoji, the multiple question marks, and the
language choice, which is entirely Tunisian Arabic. The message the writer is
conveying is that the items the previous post linked to are unaffordable for anyone
not from Bardo (an upper-class district in the capital Tunis), in other words, for
most Tunisians. The use of Tunisian Arabic lends emotional force because it makes
the written statement evoke the spoken exclamation, that is, how he would say it spontaneously if the two were speaking face to face. By contrast, the writer lessens the emotional impact of his mild admonishment of the poster (that her link was like an advertisement) by using Standard Arabic. He further softens it by putting it in parentheses and in a lighter-colored font than the rest of the post.
⁴ “Indeterminate” includes words used in both Tunisian and Standard Arabic. Often, the pronunciation would identify the word as one or the other, but because written Arabic usually omits short vowels, such differences are often obscured in writing. Ambiguous words are transcribed here with Standard Arabic vowels, for convenience.
Many posts on this forum are written exclusively in Standard Arabic or
vernacular Arabic, but mixtures of both varieties, as in Figure 1 here, are also
common. We also see posts like those in Figure 2. In this thread, we have a post written entirely in French (middle), one written in Tunisian Arabic (bottom), and, at the top, a post written in Tunisian Arabic (with French code-switching) but in Latin script (Arabizi). These two examples show that, even in 2010, Tunisia had already drifted far from the Fergusonian model, in which specific language varieties are associated with specific domains. In these two examples we have one domain (an asynchronous conversation between people who do not know each other personally) with four different language varieties represented. Clearly, the choice among these codes is not determined simply by context or domain.
Romanization like that in Figure 2 used to be quite common: in the early days of the internet, many systems supported only ASCII (the basic set of Latin characters used for English). As the technology matured and became more widespread, native Arabic interfaces and keyboards became available. Some users, however, continued writing in Arabizi, which carries in-group prestige among young internet users (Alkhamees et al. 2019).
There is some indication that Tunisians (and North Africans in general) may
have been slower to switch to using Arabic script online than Middle Eastern
internet users, due to the dominance of French in the formal sphere. The extent to
which Tunisians, and Arabs in general, still write in Standard Arabic, French, and
Arabizi online is not clear from the existing research, however. Sayahi’s (2014)
study of a Tunisian soccer forum found that half of the forum’s posts were written
in French, likely because the language of the forum interface and administration
was French (2014: 43). The other half were written in Tunisian Arabic, but nearly
entirely in romanized form, which the author attributes to the fact that “it is hard to
find an Arabic keyboard in a public internet space” (Sayahi 2014: 112–118). Another
study from the same year found that 43% of a corpus of Tunisian SMS and online
writing was made up of Tunisian Arabic written in Latin characters, while only 25%
was Arabic, either Standard or Tunisian, written in Arabic script (Younes and
Souissi 2014). A more recent study of Tunisian Facebook posts, however, found
that 61% of posts were Tunisian Arabic written in Arabic script while only 10.8%
were romanized Tunisian Arabic; in addition, only 21.1% of posts were in MSA, and
7% in French (Kashina 2020). It is not clear whether Tunisians’ language use online has changed over the past several years, or whether these studies are simply not comparable, due to their differing methods and corpora. The current study was designed to resolve this confusion by explicitly comparing Tunisians’ usage of different language varieties online and how that usage has changed over time.
3 Tunisian Arabic online: corpus and methods
3.1 The TunisiaSat corpus
The corpus data for this study comes from the website TunisiaSat (tunisia-sat.com), which describes itself as the largest community of Tunisians online and consists of forums on topics such as sports, news, religion, and family life. The most popular boards have hundreds of thousands of threads and participants in the millions. While anyone can read the forums, users must have an account and log in to post to them; all posts are therefore associated with a username. The
user pages do not provide any biographical information such as gender or location:
the only available information about a user is their username, how long they have
been active on the forum, and the posts that they have contributed.
From the topics and language used on the site, participation appears to be
limited to speakers of Tunisian Arabic: in other words, I saw little evidence of other
Arabic vernaculars on the forums. As noted above, approximately one million
speakers of Tunisian Arabic reside outside of Tunisia: it is not uncommon for
Tunisians to spend several years in another Arab country, especially in the Gulf, to
work. Also, in recent years migration to Europe (including clandestine migration)
has increased. The number of speakers abroad, however, is dwarfed by the number (almost 12 million) residing in Tunisia. In addition, Tunisia itself is highly homogeneous and receives little immigration. For these reasons, the participants
on the board can be assumed in most cases to be native speakers of Tunisian Arabic
residing in Tunisia.
The corpus collection and analysis were done with scripts I wrote using the
programming language Python. To build the corpus I programmed a web-crawler to
scrape TunisiaSat posts from 2010 and 2021. After collection, the corpus consisted of just over 16 million words for each year, 32.5 million words in total. The text of each post was normalized as is usual in Arabic text processing (see Habash 2010), and its associated metadata was collected. This metadata includes the thread ID, post ID, date, the username of the author, and the forum category (e.g. “Sports”). The posts collected from the News and Sports forums outnumber those from all other categories combined in both
2010 and 2021. Many other popular forums on the site are related to technology: it
appears that TunisiaSat began as a forum to discuss satellite television technology
(hence the name), before expanding into general interest forums.
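The normalization steps themselves are not listed here. As a minimal sketch, assuming the orthographic normalizations commonly applied in Arabic text processing (folding presentation forms into base letters, stripping diacritics and tatweel, and unifying alef variants, alef maqsura, and ta marbuta), a Python helper might look like the following. The function name and the exact rule set are illustrative assumptions, not the study's own script.

```python
import re
import unicodedata

# Assumed rule set; the study's exact normalization choices are not specified.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")    # tanween, short vowels, shadda, sukun, dagger alef
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda, hamza above, hamza below
TATWEEL = "\u0640"                                  # elongation character


def normalize_arabic(text: str) -> str:
    """Return a normalized copy of an Arabic-script string (hypothetical helper)."""
    # Fold presentation forms (common in PDF or legacy text) into base letters.
    text = unicodedata.normalize("NFKC", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # any alef variant -> bare alef
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")   # ta marbuta -> heh
    return text
```

Whatever rules are chosen, the same function would have to be applied to the search terms as well as to the posts, or otherwise identical forms would fail to match.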
3.2 Analyzing language choice on the forum
As we saw in the samples discussed in Section 2.4, many posts on the forum are
unlikely to be solely in one language variety or another. In addition, many words
are shared between the two varieties or are ambiguous, due to the lack of short
vowels in the script. For this reason, it is impossible to count the number of posts written “in Tunisian.” This study instead compares a discrete list of highly salient Tunisian words with their equivalents in Standard Arabic, Arabizi, and French. In the post in Figure 1, for example, the occurrence of the Tunisian negator muʃ (‘is not’) would be added to the total occurrences of muʃ across all posts and compared with the occurrences of the Standard Arabic equivalent laysa (‘is not’) as well as the Arabizi mouch and the French pas, along with their orthographic and
inflectional variants. In this way, language choice across the site was estimated
without having to categorize individual posts as one variety or another.
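The counting step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's script: the equivalence set for ‘is not’ uses a few common spellings chosen for illustration (the study's actual variant lists are in its Appendix A), posts are assumed to be already normalized, and every form is assumed to be a single word. Matching whole tokens rather than substrings is what keeps Tunisian ⟨مش⟩ from being counted inside a longer word such as ⟨مشكل⟩.

```python
import re
from collections import Counter

# Illustrative equivalence set for 'is not' (not the study's Appendix A list).
IS_NOT = {
    "tunisian": {"مش", "موش"},
    "standard": {"ليس", "ليست"},
    "arabizi": {"mouch"},
    "french": {"pas"},
}

WORD = re.compile(r"\w+")  # Unicode-aware: matches runs of Arabic or Latin letters


def count_by_variety(posts, equivalents):
    """Count whole-word occurrences of each variety's forms across all posts."""
    form_to_variety = {
        form: variety for variety, forms in equivalents.items() for form in forms
    }
    counts = Counter()
    for post in posts:
        for token in WORD.findall(post.lower()):
            variety = form_to_variety.get(token)
            if variety is not None:
                counts[variety] += 1
    return counts


posts = ["موش مشكل", "هذا ليس صحيحا", "c'est pas grave, mouch normal"]
print(count_by_variety(posts, IS_NOT))
```

A consequence of whole-token matching is that multiword equivalents (such as French un peu) would need a separate phrase match, and forms with attached clitics are only counted if they are listed as variants.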
This method is innovative: I am not aware of another study (certainly not on
Arabic) that compares varieties systematically in this way. The objection could be
raised, however, that it may be misleading: perhaps the counted word was the only
Tunisian Arabic word in an otherwise Standard Arabic post. The salience of the
chosen words, however, is key: most of the words chosen for comparison are
function words, such as tense markers and question particles. In the code-switching framework of Myers-Scotton (1995), the language variety of the grammatical markers in a sentence is the matrix language of the sentence, which speakers will identify as “the language” of an utterance, regardless of the source of the content words. All of the Tunisian Arabic words chosen for comparison (even those that are not grammatical markers) are also highly salient, e.g. barʃa ‘a lot.’ Research on written vernacular Arabic has shown that
readers will consider a text to be “in vernacular”even if it contains only a few
highly salient vernacular words (Elinson 2013: n. 20; Miller 2017: 106). In
comparing the occurrence of salient words, then, I am identifying texts which
writers would have considered vernacular as they were writing, and that would
have been received as vernacular by their readers.
In choosing the Tunisian–Standard word pairs, I began with a list of the 500
most frequent words from the Tunisian Arabic Corpus (McNeil 2019). I then chose a
subset based on the following criteria: 1) The Tunisian word has a Standard Arabic
equivalent and is unambiguously distinct from it; and 2) the Tunisian word (and
any of its orthographic variants) does not coincide with any other Standard Arabic
word, and vice versa. In addition, each of the words needed to be independent: it was not possible to search for affixes (like the future tense marker sa- or the negation particle -ʃ, for example).⁵ Following this process, a total of 30 words were chosen and their French and Arabizi equivalents were added. For example, the terms for ‘a little’ are: ʃwayya ⟨شويا⟩ (Tunisian) ∼ chwaya (Arabizi) ∼ qaliːl(an) ⟨قليلا⟩ (Standard Arabic) ∼ un peu (French). In addition to the base form of words, however, common inflectional and orthographic variants were included: a complete list of terms and their variants is in Appendix A.

⁵ This is because the corpus was plain text, with no morphological parsing or other processing, so the search algorithm had no way to distinguish between the future particle sa-, for example, and a word that simply began with s-.
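The selection criteria above lend themselves to a simple automated check over each candidate entry. The sketch below is hypothetical: it assumes each entry maps a variety to its set of surface forms, and that word lists for Standard and Tunisian Arabic are available for testing collisions. The ‘a little’ entry reuses the forms given in the text; the lexicons and the second entry (Tunisian ⟨كان⟩ ‘if’, which is spelled like Standard Arabic ⟨كان⟩ ‘was’) are illustrative only.

```python
# Hypothetical screening of candidate entries against the selection criteria.
# Each entry maps a variety name to a set of surface forms.
A_LITTLE = {
    "tunisian": {"شويا"},
    "standard": {"قليلا"},
    "arabizi": {"chwaya"},
    "french": {"un peu"},
}
IF_TUNISIAN = {"tunisian": {"كان"}, "standard": {"إذا"}, "arabizi": {"ken"}, "french": {"si"}}


def screen_entry(entry, standard_lexicon, tunisian_lexicon):
    """Return a list of problems that would disqualify the entry."""
    problems = []
    seen = {}
    for variety, forms in entry.items():
        for form in sorted(forms):
            # Affixes such as sa- or -ʃ cannot be searched as free-standing words.
            if form.startswith("-") or form.endswith("-"):
                problems.append(f"{form}: affix, not a free-standing word")
            if form in seen and seen[form] != variety:
                problems.append(f"{form}: listed under both {seen[form]} and {variety}")
            seen[form] = variety
    # A Tunisian form must not coincide with some other Standard Arabic word ...
    for form in sorted(entry["tunisian"] & (standard_lexicon - entry["standard"])):
        problems.append(f"{form}: Tunisian form is also a Standard Arabic word")
    # ... and a Standard form must not coincide with some other Tunisian word.
    for form in sorted(entry["standard"] & (tunisian_lexicon - entry["tunisian"])):
        problems.append(f"{form}: Standard form is also a Tunisian word")
    return problems


print(screen_entry(A_LITTLE, {"قليلا", "كتاب"}, {"شويا", "برشا"}))  # []
print(screen_entry(IF_TUNISIAN, {"كان", "إذا"}, {"كان"}))  # flags the Tunisian form كان
```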
To calculate the relative proportion of language varieties, I counted the
frequency of each word in each forum post and summed them. This raw frequency
was then converted into a normalized frequency of occurrence per one million words, which makes the numbers both easier to interpret and directly comparable between the two corpora. I then calculated the median frequency of the words of each language variety for each year.
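The two calculations just described can be written out in a few lines of Python. This is a sketch with invented counts for three hypothetical terms in a corpus of roughly the reported size (16 million words); the function names are illustrative, not the study's. Taking the median rather than the mean presumably keeps a few very frequent terms from dominating the summary for a variety.

```python
from statistics import median


def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalized frequency of occurrence per one million words."""
    return raw_count * 1_000_000 / corpus_words


def median_by_variety(raw_counts: dict, corpus_words: int) -> dict:
    """raw_counts maps each equivalent term to {variety: raw count} for one year."""
    freqs_by_variety = {}
    for by_variety in raw_counts.values():
        for variety, count in by_variety.items():
            freqs_by_variety.setdefault(variety, []).append(per_million(count, corpus_words))
    return {variety: median(freqs) for variety, freqs in freqs_by_variety.items()}


# Invented raw counts for three terms (not data from the study).
toy_counts = {
    "term_a": {"tunisian": 16, "standard": 32},
    "term_b": {"tunisian": 48, "standard": 16},
    "term_c": {"tunisian": 160, "standard": 80},
}
print(median_by_variety(toy_counts, 16_000_000))  # {'tunisian': 3.0, 'standard': 2.0}
```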
For insight into the ideologies behind these changes, I conducted a qualitative analysis of metalinguistic discussion on the forum. I selected two recent threads from TunisiaSat: the first discussed the status of French in Tunisia, while the second dealt with the status of Tunisian Arabic. Though discussions of this kind
were rare on the forum, these two were quite active, running to 20 pages of replies
before being closed. I analyzed and categorized the arguments made on these
threads to ascertain the language ideologies of the forum users, and to what extent
they explicitly associated the various language varieties available to them with
distinct identities.
4 Results: a large increase in Tunisian Arabic
4.1 Quantitative results
The major finding of this study is that the proportion of writing on the forum in Tunisian Arabic, relative to the other language varieties, increased vastly between 2010 and 2021. In 2010, Tunisian Arabic accounted for a minority of usage on the website – less than a fifth – whereas the bulk of the forum used Standard Arabic, with a significant amount of French and a small contribution from Arabizi. By 2021, however, Tunisian Arabic had become the dominant language of the site. Although the proportion of Standard Arabic decreased, much of the growth of Tunisian Arabic came not at the expense of Standard Arabic but at the expense of French and Arabizi. The proportion of French equivalents fell to a sliver, and Arabizi disappeared almost entirely.
Table 1 shows the normalized frequency of the ten most common equivalent terms in all four language varieties; the frequencies for all 30 equivalent terms are given in Appendix B. The “Tunisian proportion” column gives the percentage of all instances of a set of equivalent words in which the Tunisian word is used. For example, in the first row we see that the Tunisian relative pronoun illi ⟨الي⟩ (‘that, what’) had a frequency of 1,074 per million in the 2010 corpus, while the frequency of its Standard Arabic equivalent allaːðiː ⟨الذي⟩ was 5,154 per million. In addition, the romanized form ⟨illi⟩ had a frequency of 196 per million, and the French ⟨qui⟩ appeared at 1,426 per million, outnumbering its Tunisian counterpart. This means that, when forum users wrote the relative pronoun in 2010, they used the Tunisian term less than 14% of the time.
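Reading the “Tunisian proportion” as the Tunisian frequency divided by the summed frequencies of all four equivalents (an interpretation of the column description, assumed here), the figures for illi can be checked in a few lines of Python. The function name `tunisian_proportion` is illustrative; the frequencies are the ones reported in the text.

```python
def tunisian_proportion(freqs: dict[str, float]) -> float:
    """Share of all equivalent-word occurrences (all four varieties) that are Tunisian."""
    return freqs["tunisian"] / sum(freqs.values())

# Normalized frequencies (per million words) for illi ∼ allaːðiː, as reported in the text.
illi_2010 = {"tunisian": 1074, "standard": 5154, "arabizi": 196, "french": 1426}
illi_2021 = {"tunisian": 5679, "standard": 3389, "arabizi": 3, "french": 92}

print(f"2010: {tunisian_proportion(illi_2010):.1%}")  # 2010: 13.7%
print(f"2021: {tunisian_proportion(illi_2021):.1%}")  # 2021: 62.0%
```

Both results agree with the text: under 14% in 2010 and 62% in 2021.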
By contrast, in the 2021 corpus the Standard Arabic term allaːðiː occurred only 3,389 times per million words, compared to 5,679 for Tunisian illi. French had decreased to only 92 per million, and Arabizi was almost unattested at only 3 per million. Writers on the forum used the Tunisian term 62% of the time. This term is typical, as we can see when we look at the median frequency of all terms (Figure 3): a large increase in Tunisian Arabic, a modest decrease in Standard Arabic, and a large decrease in both Arabizi and French.
We see this pattern in almost all the equivalent terms. In 2010, the Tunisian term was the minority choice in 29 out of 30 pairs; the lowest relative frequency of a Tunisian word in 2010 was 1.8% and the highest was 53.0%. By 2021, however, this pattern had been reversed (Figure 4): for 25 out of 30 terms, it is the Tunisian word that is used more frequently. In none of the pairs did the relative frequency of the Tunisian word decrease: every Tunisian word was used more frequently (compared to its Standard Arabic equivalent) in 2021 than in 2010. The lowest relative frequency for a Tunisian word in 2021 was 16.5% and the highest was 95.8%.
It is worth noting the heterogeneity of the relative frequencies in the results. For example, in 2010 the Tunisian word ɣaːdi was used for ‘there’ only 1.8% of the time, whereas its Standard Arabic equivalent hunaːka was used 97% of the time. At the other extreme, Tunisian bɛːʃ (‘will’) was used 53.0% of the time even in 2010, compared with just 22.6% for its Standard Arabic equivalent sawfa. These discrepancies arise because there is rarely perfect semantic and pragmatic overlap between the equivalent pairs. Tunisian ɣaːdi, for example, is used only as an adverb of place (‘[over] there’), whereas hunaːka, in addition to this use, is one of the main existential particles in Standard Arabic (‘there is’). Likewise, Tunisian bɛːʃ (‘will’) is always written independently, whereas Standard Arabic sawfa has a prefixed form (sa-) that could not be included in its count (since it was not possible to search for affixes), making it appear less common than it was.
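The counting limitation behind sawfa’s low figure can be illustrated with a toy example. This is a hypothetical sketch, not the study’s search code: a whole-token match finds the free-standing سوف but misses the prefixed future in a form like سيكتب (‘he will write’), while a naive prefix search for س cannot tell that prefix apart from words that simply begin with the same letter.

```python
def count_exact(tokens: list[str], form: str) -> int:
    """Whole-token match: finds free-standing words such as sawfa (سوف)."""
    return sum(1 for token in tokens if token == form)

def count_prefix(tokens: list[str], prefix: str) -> int:
    """Naive prefix match: cannot tell the future prefix sa- from any word starting with s-."""
    return sum(1 for token in tokens if token.startswith(prefix))

# Toy word list: sawfa, 'we see', 'he will write', 'peace', 'car'.
tokens = "سوف نرى سيكتب سلام سيارة".split()

print(count_exact(tokens, "سوف"))  # 1: the prefixed future in سيكتب is not counted
print(count_prefix(tokens, "س"))   # 4: سوف, سيكتب, سلام and سيارة all match
```

Only one of the four prefix matches (سيكتب) actually carries the future prefix, which is why affixes were left out rather than searched naively; the cost is that sawfa is undercounted.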
Overall, such differences balance out across the word list. We can see this by looking at the equivalent pairs that do have closely matching semantic fields, like the question particle ‘how much’ (qaddɛːʃ in Tunisian, kam in Standard) and the adverb ‘always’ (diːma, daːʔiman). The 2010 proportions of Tunisian use for these two words (19.2% and 17.9%) are close to each other as well as to the overall median of 19.7% for 2010. Likewise, their relative frequencies in 2021 (68.0% and 68.1%) are similar both to each other and to the 2021 median of 69.9%. This internal coherence suggests that, despite the wide variance in relative frequencies, the aggregated results do accurately reflect language use on the site.
Table 1: Normalized frequency of equivalent terms, 2010 and 2021. Occurrences per million words for the most frequent terms – full data in Appendix B.
Word (TunAr ∼ StAr) | Tunisian Arabic freq | Standard Arabic freq | Arabizi freq | French freq | Tunisian proportion
what (rel pron): illi ∼ allaːðiː
will: bɛːʃ ∼ sawfa
(is) not: muːʃ ∼ laysa
this (f.): haːði ∼ haːðahi
very, a lot: barʃa ∼ kaθiːr(an)
now: tawwa ∼ alʔaːn
brother: xuː ∼ ʔax
that (dem pron): haːka ∼ ðaːlika
what (interrog): ʃnuwwa ∼ maːða
like: kiːma ∼ miθla
The increase in Tunisian Arabic was not uniformly distributed among the
different forums on the website. The largest increases were in the Family Life and
General forums, where the Tunisian proportion increased to almost 80%, while
many of the technical forums saw little increase. Whether large or small, however,
all forums saw an increase of some size during the period of the study.
If this sample is taken to be representative of writing on the forum, it suggests that the proportion of writing in Tunisian Arabic increased from less than one-fifth (19.7%) in 2010 to over two-thirds (69.9%) in 2021 (Figure 5). Much of this growth came at the expense of Standard Arabic, the proportion of which decreased from 51.9% in 2010 to 28.5% in 2021. Equally important, however, was the almost complete disappearance of French and Arabizi on the site. While French had made up a significant portion of the 2010 corpus (23.1%), it had dwindled to just 1.5% by 2021. Contrary to the findings of prior studies, Arabizi represented just 5.2% of written tokens even in 2010 – by 2021 it had, for all intents and purposes, disappeared.
Figure 3: Median frequency (per million words) of equivalent terms, 2010 and 2021.
Figure 4: Proportion of each language variety for the fifteen most frequent equivalent terms. Full data given in Appendix B.
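The percentage-point changes implied by these shares can be computed directly. The sketch below uses only the shares reported above; Arabizi is left out because its 2021 share is not given as a number, and the helper name `point_changes` is illustrative.

```python
# Shares (%) of each variety in 2010 and 2021, as reported in the text.
shares = {
    "Tunisian Arabic": (19.7, 69.9),
    "Standard Arabic": (51.9, 28.5),
    "French": (23.1, 1.5),
}

def point_changes(shares: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Change from 2010 to 2021 in percentage points, rounded to one decimal."""
    return {variety: round(after - before, 1) for variety, (before, after) in shares.items()}

for variety, change in point_changes(shares).items():
    print(f"{variety}: {change:+.1f} percentage points")
```

The Tunisian gain (+50.2) is more than twice the Standard Arabic loss (−23.4); French accounts for another 21.6 points, and the remaining 5.2 points roughly match the 2010 Arabizi share.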
4.2 Qualitative results
The qualitative analysis focused on two recent metalinguistic threads on TunisiaSat, in which users debated the place of various language varieties in Tunisia. Such language debates can provide insight into the language ideologies of a society: they are moments when prevailing language conflicts are brought out into the open (Blommaert 1999). These debates are, at heart, political rather than linguistic: the conflicting views about language in Tunisia represent conflicting views of the kind of country Tunisia is or should be.
In the first post, a user complains about receiving a government letter written in French and titles his post, “For the love of God, speak to us in Arabic!”⁶ The poster (writing in Tunisian Arabic) wonders sarcastically whether there are French departments that send letters to French citizens written in Arabic, and asks “who exactly does our country belong to?” (l-blaːd haːði tbaːʕ ʃkuːn biðˁ-ðˁabt?). He later edits his post to specify that he’s not interested in answers like “because we’re still colonized by France”: he wants the “real” reason that is hidden. This suggests that he views the place of French in Tunisian society as a subject of contention between different groups of Tunisians, unrelated to France itself.
Figure 5: Proportion of each language variety in the forum corpus, 2010 and 2021.
⁶ https://www.tunisia-sat.com/forums/threads/4134011/.
The replies reflect an overall negative opinion about the place of French in Tunisian life. Although some made statements like “French is the best language in the world,” they were in the minority and received responses such as “ROFL” emojis. The majority shared the original poster’s annoyance with the use of French in official communications, offering examples of their own that they were “fed up” with. One observed that most foreign embassies in Tunisia post in French on Facebook and that this “shows a lack of respect for Tunisia.” Another described a visit to Tunisia by Turkish president Erdogan (who speaks only Turkish): Erdogan greeted the Tunisian delegation with a couple of memorized phrases in Standard Arabic – the Tunisians responded in French.
Participants also extended the criticism to the prevalent use of French in the media. One writer said that his mother could no longer watch Tunisian cooking shows and had started to watch Algerian ones, where they spoke in understandable Arabic without all the “showing off” in French. The same writer protested the association of French with the cultured elite, saying “How is it that I am highly educated, yet speak Arabic? It can’t be! How scandalous!!”
Most responses empathized with the original poster’s frustrations but did not
offer an explanation for the phenomenon. Those who did generally explained the
conflict as a generational one:
This is the French bourgeois mentality that’s deep rooted in the older generation. I’ll give you an example … There was this old movie from 1988 on TV yesterday … The [Tunisian] main character goes to a cafe in Paris, where he meets this Algerian guy who owns it. And they sit there speaking French to each other, with only a word or two of Arabic … To me it was just further confirmation of the inferiority complex that the old generation has towards France.
This comment was written in Tunisian Arabic, except for the English “inferiority complex.” Recall that the younger generation (presumably well represented on this forum) has had less immersion in French than their parents’ generation, and is considered by the older generation to be lacking in French skills (Daoud 2011b). Here they are turning that criticism around – it is their parents’ generation that has the problem – and positioning French as unimportant and in fact obtrusive in Tunisian life. They have also had much more opportunity than their parents’ generation to use Tunisian Arabic in formal contexts (like writing); in this way they have the option of a different kind of national identity, represented by the vernacular, which is no longer dependent on France.
While the participants in this discussion generally used the term “Arabic” without specifying which variety they meant, there was some discussion of the tension between the Standard and the vernacular. More than one poster made comments to the effect of “Well the president speaks Arabic, and no one understands him” (referring to the highly classical style of Standard Arabic employed by President Kais Saied). Another poster started off defending the use of French in Tunisia, before pivoting to the status of Tunisian Arabic:
Openness to languages is a good thing. Proficiency in Arabic, English and French together not only benefits you in your work life, but also in your level of culture and your view of things …. Whatever the language, the important thing is using it well and to use it correctly. That’s the problem with Tunisians, many of them do not excel in even one of these languages. And the common language dɛːrja is not a language [ma-hiːyaː-ʃ luɣa] at all but a dialect. It’s fit for expressing emotions for example but is not at all capable of expressing thought (except of course for superficial thoughts).
It is notable that this post is largely written in Tunisian Arabic, with some higher-register words borrowed from Standard Arabic. The writer apparently sees no contradiction between writing in Tunisian Arabic and saying that Tunisian Arabic is incapable of expressing complex thought.
These ideologies around Tunisian Arabic are the subject of the second post, in which the original poster asks: “What do you think about the Tunisian language in Latin letters?”⁷ This title is provocative both in its use of the term “Tunisian language” and in its promotion of romanization. The idea of writing in Latin script was not popular and few people engaged with it; most of the thread focused instead on the status of the vernacular. The replies fall clearly into two camps: pro-Tunisian Arabic and pro-Standard Arabic. The pro-Standard partisans had several main objections, all of which were mentioned by Ferguson as common language ideologies in diglossic societies (1959, 1997 [1959]). A recurrent one – in response to the poster’s framing – was that Tunisian is a “dialect” and not a “language.” This assertion was usually treated as self-evident, requiring no proof, marking it as the normative language ideology. Two other main arguments of the pro-Standard camp were also related to Tunisian’s status as a language: 1) that Tunisian “has no grammar” (with the implication that “real” languages have grammar) and 2) that there is too much regional variation in Tunisian for it to be standardized. Other users argued that Standard Arabic is superior because of the richness of its lexicon. One listed 25 words for “rain” in Classical Arabic and wrote (in Tunisian Arabic), “I want the language with 12 million words, not the impoverished one” (aːna n-ħɛbb luɣa al-12 miːlyoːn kɛlma, u ma-nħɛbb-ʃ ɛl-luɣa ɛl-faqiːra). But whereas one writer touts the vastness of the Arabic lexicon as evidence of its superiority, another uses it as evidence of its irrelevance: 25 words for rain, and 24 of them are completely unknown.
⁷ https://www.tunisia-sat.com/forums/threads/4228359/.
Interestingly, few of the participants used another common argument cited by Ferguson: that Arabic is inherently superior because it is the language chosen by God for the revelation of the Quran. Rather, the practical aspects of religion were more likely to be raised: namely, that if Standard Arabic is not maintained as an official language, people will no longer be able to understand the Quran. This was countered by the argument (offered by several different authors) that religion has nothing to do with the question: Indonesia is the largest Muslim country “and they don’t speak Arabic.” It is surprising that this opinion was popular, given the traditionally tight linkage of Islamic and Arab identities, as the term “Arabo-Islamic” used by Sayahi (2014) above indicates.
The pro-Tunisian replies can be divided into several main arguments as well. One was that Standard Arabic is “a dead language” (luːɣa mayta) and is not useful in modern life. The original poster made this point repeatedly, saying that he had wasted years of his childhood learning a language “that’s not used for anything except the 8 o’clock news.” This writer consistently uses the term “Tunisian language,” pointedly contrasted with the normative framing of the “Tunisian dialect.” He further emphasizes the distance between Tunisian and “Arabic” with his orthographic choices. Although he spells luːɣa ‘language’ in the Standard way in his original post title ⟨lɣa لغة⟩, in later posts he spells it with a long vowel, representing the Tunisian pronunciation: ⟨luːɣa لوغة⟩. This is not a common spelling, and it is greeted with incredulity and derision from the pro-Standard camp.⁸
⁸ Representation of such lengthening in other Tunisian words – like raːjil ‘man’ – is common; it just happens that it is not common in this word.
Other pro-Tunisian posts claimed that Tunisian is not Arabic anyway, since it is a mix of Arabic, Berber, Latin, and French. This implies that Tunisian cannot, then, be merely a “dialect” of Arabic. One writer even claimed that Tunisian is more similar to Phoenician than to Standard Arabic (providing a cognate sentence as proof). This argument – like similar arguments made in the Levant (Suleiman 2003) – emphasizes Tunisia’s deep pre-Arab roots. It also reflects a common self-view of Tunisia as a liberal society, open to languages and ideas from elsewhere. This self-view is in dialogue with the stereotype of the Arabian Gulf (the most Arab of Arabs) as closed and insular.
In fact, the relationship of Tunisian Arabic to other vernaculars, particularly Eastern (Mashreqi) ones, was a major theme in the discussion. Some in the pro-vernacular camp argued that the lack of mutual intelligibility between Tunisian and other varieties of Arabic confirms its status as a language: “The Mashreqi or even the non-Tunisian Maghrebi doesn’t understand me for a simple reason: we don’t speak the same language” (manɛħkiyuː-ʃ fard luːɣa). Another user agreed:
The entire Gulf has made a campaign of telling us we’re not Arab. It’s pathetic: they’re trying
to distance themselves from us and we’re running after them trying to prove we’re related.
Note the correspondence of language and identity here: saying that North Africans don’t speak Arabic means that they are not “Arab,” since Arab is not an ethnicity but is rather defined by mother tongue (see Bassiouney 2020: 237–240). Some posters consequently expressed strong dislike for “the Arabs.” One, when challenged about his contempt for his “Muslim brothers,” explained:
Egyptians, Moroccans, Algerians: do they like you? When you go to the Gulf to work, do they respect you? The Emirates were the first to bomb Libya, and look who bombed Yemen. The Arabs are beastly to each other. So what ‘brothers’ are you talking about?
These arguments get at the heart of the struggle for identity, both as expressed in this debate and as reflected in the quantitative increase in Tunisian Arabic use. It is the struggle to define Tunisians as having a distinct identity and speaking a distinct language – not merely a “dialect.” The difference between a dialect and a vernacular is that a dialect lacks autonomy (Fishman 2010); in other words, it is considered a variety of another language, not an independent language itself. In this way, those arguing that Tunisian is a different language from other varieties of Arabic are assigning autonomy to it, trying to promote it from a mere “dialect” to a full-fledged vernacular “language.”⁹ What’s more, they are not only arguing for but also instantiating its autonomy, both by writing it and by creating Abstand through non-standard spellings like ⟨luːɣa لوغة⟩.
⁹ I should note here that there is no Arabic equivalent of “vernacular,” so the argument here is framed as between “dialect” and “language.”
Although these views were countered by the pro-Standard camp, interestingly no one appeared to question the suitability of Tunisian Arabic for writing. Most pro-Standard partisans, in fact, used Tunisian Arabic to express their arguments against its suitability. This shows the extent to which writing in Tunisian Arabic has been normalized, a large change from Walters’ time, when Tunisian Arabic was written in few circumstances (Walters 2003: 94).
5 Conclusion
In this article, I have argued that the momentous societal changes over the past decade have led to Tunisians increasingly expressing a national identity, distinct from a pan-Arab identity, and that they are expressing this identity through their
writing in Tunisian Arabic. To support this, I have drawn upon my research on online forum posts to demonstrate how Tunisian speakers’ language choice online changed between 2010 and 2021. I used a novel method of comparing high-frequency, salient terms between the language varieties as a proxy for language choice. I found a large increase in the use of Tunisian Arabic during the period under study: in 2010, the Tunisian variants were chosen only 19.7% of the time on the forum; they were outnumbered by both Standard Arabic and French variants. By 2021, however, Tunisian had become the dominant language on the forum, with its lexical items used 69.9% of the time. This increase was not uniform: some forum categories saw large increases in Tunisian Arabic, while others saw only slight ones. Whether large or small, however, every category witnessed an increase in Tunisian Arabic. A significant result of the study was that the increase in Tunisian Arabic did not come entirely at the expense of Standard Arabic. While the share of Tunisian Arabic increased by about 50 percentage points (from 19.7% to 69.9%), that of Standard Arabic decreased by only 23.4 percentage points (from 51.9% to 28.5%). The rest of the Tunisian increase came at the expense of French and romanized Arabic (Arabizi). French was more present than Tunisian on the forum in 2010, at 23.1% of the lexical variants, whereas by 2021 it had dwindled to only 1.5%. Arabizi was rare even in 2010 (5.2%) but by 2021 had almost entirely disappeared.
This study has implications for the status of diglossia in Tunisia and in the
Arab world in general. The fact that Tunisian vernacular has become the unmarked
language of choice on the forum indicates a radical departure from the Arabic
language situation described by Ferguson (1959), in which writing was almost
entirely the domain of Standard Arabic. Though the use of Tunisian Arabic in
writing saw a slow but steady increase in the last two decades of the 20th century
(McNeil 2023; Sayahi 2014; Walters 2003), the current study shows the extent to
which online writing has accelerated this change. The slow change can be
considered largely the effect of increased societal literacy, but the rapid change in
the second decade of this century can be attributed to the spread of the internet and
the increased sense of Tunisian exceptionalism following the 2011 revolution. The
increase in use of Tunisian Arabic in writing coincides with its spread into many
public domains (Achour Kallel 2011, 2015; Baoueb et al. 2012; Daoud 2011a; Mejri
2017; Sayahi 2014, 2019). The current study, in combination with these other works,
supports the assertion of Alkhamees et al. (2019) that Arabic diglossia is in the
process of destabilizing.
This development of Tunisian Arabic as a written language is key to its future status and domains. In many cases of former (now resolved) diglossia, such as Tamil, Greek, and the Romance languages, the point at which the vernacular variety became an acceptable choice for writing was an inflection point that set off a self-reinforcing cycle (Hudson 2002: 31). Furthermore, destabilizing diglossia is characterized by exactly the “leakage in function” and “mixing in form” (Fasold 1984: 54) that we see here. Hudson (1991: 13–14) explains this by postulating that the rigid functional compartmentalization between the High and Low varieties is not a “defining characteristic” of diglossic language situations but rather a “necessary prerequisite.” In other words, maintaining a strict separation between the two varieties is necessary to prevent convergence and language shift. This would suggest that, as the barrier between the two varieties becomes more porous, the linguistic situation may destabilize and, eventually, the vernacular may expand into all domains.
This process is not inevitable or “natural” but rather reflects the choices of the speech community (Coulmas 2002). The “desire for a full-fledged standard ‘national’ language as an attribute of autonomy or of sovereignty” was one of the societal trends that Ferguson predicted would lead to the breakdown of diglossia (1959: 338). We can see this development clearly in the metalinguistic comments on the forum, where users explicitly associate Tunisian Arabic with their national identity. This identity is defined in opposition not only to Standard Arabic and French, but also to other Arabic vernaculars – especially Middle Eastern ones. These views are still highly contested, however, with some forum users valuing the Arab identity indexed by Standard Arabic above Tunisian national identity. Notably, even anti-vernacular partisans often wrote their arguments in Tunisian Arabic. This indicates that Tunisian has been normalized as a written language, even if some of those writing it still do not consider it a “language.”
Research funding: This work was funded by the American Institute for Maghrib Studies (AIMS).
Appendix A: Equivalent terms in four varieties
Word | Tunisian Arabic (IPA) | Tunisian Arabic (orthography) | Romanized Tunisian (Arabizi)ᵃ | Standard Arabic (IPA) | Standard Arabic (orthography) | French
a little | ʃwayya | شوية، شويا، شويه | chwaya, chweya | qaliːl(an) | قليل، قليلا | un peu
also | zada | زادا، زادة، زاده | zeda, zada, zede | aydan | ايضا | aussi
always | diːma | ديما | dima, dayma, deyma | daːʔiman | دائما | toujours
be able to | ynejjim | تنجم، اتنجم، ينجم، ننجم، نجم | tnajem, tnajim, tnejem, nejem, najim, najem, ynajem … | yastatˤiːʕu | تستطيع، يستطيع، استطيع | peux, peut
brother | xu | خو، خويا، خوك | ouya, ouk, khouya, khouk, khou, ou | ʔax | اخ، اخي، اخك | frère
come | yji | تجي، يجي، نجي | yji, tji, nji | taʔtiː | تاتي، تجيء، ياتي، يجيء، اتي، اجيء | viens, vient
good | bahi | باهي، باهية | behi, bahi, bahia, bahya | jayyid | جيد، جيدة | bien
he has | ʕandu | عندو | andou, andou, ando | ʕandahu | عنده | il a
himself | ruːħu | روحو | roo, roou, roho, rouo, rouou, rouhou | nafsahu | نفسه | la même
how much | qaddɛːʃ | قداش | kaddech, kadech, qaddech, adech … | kam | كم | combien
like | kiːma | كيما | kima | miθla | مثل | comme
like that | hakka | هكا، هكة | haka, hakeka, heka | haːkaða | هكذا | comme ça
man | raːjil | راجل، الراجل | rajel, rajil | rajul | رجل، الرجل | homme
not/is not | muːʃ | مش، موش، مانيش، ماهوش، ماهاش، ماناش، ماكش | moch, mosh, mouch, mouche, moush, much, mahich … | laysa | ليست، ليس، لست، لسنا، ليسوا | pas
now | tawwa | تو، توا، توه، توى، توة | tawa, taw, twa | alaːn | الان | maintenant
see | yʃuːf | تشوف، نشوف، يشوف | tchouf, nchouf, ychouf | yara | ترى، يرى، ارى | vois, voit
should | laːzim | لازم، يلزمني، يلزمو، يلزمه، يلزمها، يلزمنا | lazem, lazemna, lazmou, lezem, lezemha, yelzemha … | yajib | يجب | devrait
talk | yɛħkiː | تحكي، نحكي، يحكي | tahki, tehki, taki, teki, nahki, nehki, naki … | yatakallam | تتكلم، يتكلم، اتكلم | parles, parle
that (dem pron) | haːka/haðaːka | هاك، هذاكا | hadhaka, hathaka, hatheka, hedhaka, hedheka | ðalika/ðaːk | ذلك، ذاك | cette, cet
them | huːma | هوما | houma, homa | humma | هم | ils
there | ɣaːdiː | غادي | ghadi, adi, gadi | hunaːk | هناك | là bas
this | haːðiː | هاذي، هذي | hedhi, hethi, hathi, hadi, hedhe, hadhi, hédhi | haːðihi | هذه | ça
very, a lot | barʃa/yaːssir | برشا، برشة، برشه، برشى، بارشا، ياسر | barcha, barsha, yasar, yaser, yasser, yeser, yesr, yesser | jiddan/kaθiːran | جدا، كثيرا | beaucoup
we | naħna | نحنا، احنا | ana, ahna | naħnu | نحن | nous
what (interrog) | aʃ/ʃnuwwa | شنوا، شنوة، شنو، شنوه، اشنوه، اشنوا، شنية، شنيا، اشنية، اش | ech, éch, ach, esh, chnawa, chnewa, chneya, chnoua … | maːða | ماذا | quoi
what (rel pron) | illi | الي، اللي | eli, elli, ili, illi | allaːðiː | الذي، التي، الذين | qui
where | wayn | وين | win, wen | ayna | اين | où
why | ʕalɛːʃ | علاه، علاش | lech, leh, alih, alech, lach | limaːða | لماذا | pourquoi
will | bɛːʃ, mɛːʃ | باش، بش، ماش | bech, bch, mech, mch, mche, bach, besh … | sawfa | سوف | je vais
with | mʕaː- | معاه، معاك، معاها، معاي | mah, mak, maha, maya, meya | maʕ- | معه، معها، معك، معي | avec
ᵃ The Arabizi had many variable spellings; lists of variant spellings longer than three lines have been truncated here, as indicated by ‘…’.
Appendix B: Quantitative results
Occurrences per million words in the TunisiaSat corpus, ordered by descending frequency.
Word | Tunisian Arabic freq | Standard Arabic freq | Arabizi freq | French freq | Tunisian proportion
what (rel pron)
will
(is) not
this
very, a lot
now
brother
that (dem pron)
what (interrog)
like
should
be able to
why
with
also
he has
there
talk
good
them
we
where
come
see
a little
always
man
like that
himself
how much
References
Aboelezz, Mariam. 2012. We are young. We are trendy. Buy our product! The use of Latinized Arabic
in edited printed press in Egypt. United Academics Journal of Social Sciences 2. 48–72.
Achour Kallel, Myriam. 2011. Choix langagiers sur la radio Mosaïque FM, dispositifs d’invisibilité
et de normalisation sociales. Langage et société 4(138). 77–96.
Achour Kallel, Myriam. 2015. “Ici on parle tunisien”. Écriture du politique et politique de l’écriture
ou qui ne peut pas être passeur ? In Myriam Achour Kallel (ed.), Le social par le langage. La
parole au quotidien, 95–118. Paris: IRMC-Karthala.
Achour Kallel, Myriam. 2016. «La Rolls et la Volkswagen»: Écrire en tunisien sur Facebook en 2016.
Journal of Arabic and Islamic Studies 16. 253–272.
Alkhamees, Abdulrahman, Rasha Elabdali & Keith Walters. 2019. Destabilizing Arabic diglossia?
New media and translingual practice. In Amel Khalfaoui & Youssef Haddad (eds.),
Perspectives on Arabic linguistics, vol. 31, 105–134. Amsterdam: John Benjamins.
Al-Khatib, Mahmoud A. & Enaq H. Sabbah. 2008. Language choice in mobile text messages among
Jordanian university students. SKY Journal of Linguistics 21. 37–65.
Allehaiby, Wid H. 2013. Arabizi: An analysis of the romanization of the Arabic script from a
sociolinguistic perspective. Arab World English Journal 4(3). 52–62.
Bach Baoueb, Sallouha Lamia & Naouel Toumi. 2012. Code switching in the classroom: A case
study of economics and management students at the University of Sfax, Tunisia. Journal of
Language, Identity & Education 11(4). 261–282.
Bassiouney, Reem. 2020. Arabic sociolinguistics: Topics in diglossia, gender, identity, and
politics, 2nd edn. Washington, DC: Georgetown University Press.
Belnap, R. Kirk & Brian Bishop. 2003. Arabic personal correspondence: A window on change in
progress? International Journal of the Sociology of Language 163. 9–25.
Blommaert, Jan. 1999. The debate is open. In Jan Blommaert (ed.), Language ideological debates,
1–38. Berlin: Mouton de Gruyter.
Boussofara-Omar, Naima. 2006. Neither third language nor middle varieties but diglossic
switching. Zeitschrift für Arabische Linguistik 45. 55–80.
Bucholtz, Mary & Kira Hall. 2004. Language and identity. In Alessandro Duranti (ed.), A companion
to linguistic anthropology, 369–394. Malden, MA: Blackwell.
Caubet, Dominique. 2004. L’intrusion des téléphones portables et des “SMS” dans l’arabe
marocain en 2002–2003. In Dominique Caubet, Thierry Bulot, Isabelle Léglise,
Catherine Miller & Jacqueline Billiez (eds.), Parlers jeunes ici et là-bas, 247–270. Paris:
L’Harmattan.
Caubet, Dominique. 2012. Apparition massive de la darija à l’écrit à partir de 2008–2009: sur le
papier ou sur la toile: quelle graphie ? Quelles régularités? In Mohamed Meouak,
Pablo Sánchez & Ángeles Vicente (eds.), De los manuscritos medievales a internet: la
presencia del árabe vernáculo en las fuentes escritas, 377–402. Zaragoza: Universidad de
Zaragoza.
Caubet, Dominique. 2017a. New elaborate written forms in Darija: Blogging, posting and
slamming in Morocco. In The Routledge handbook of Arabic linguistics, 387–406. London:
Routledge.
Caubet, Dominique. 2017b. Morocco: An informal passage to literacy in dārija (Moroccan Arabic).
In Jacob Høigilt & Gunvor Mejdell (eds.), The politics of written language in the Arab world:
Writing change, 116–141. Leiden: Brill.
Coulmas, Florian. 2002. Writing is crucial. International Journal of the Sociology of Language 157.
59–62.
Coulmas, Florian. 2013. Writing and society: An introduction. Cambridge, UK: Cambridge
University Press.
Daoudi, Anissa. 2011. Globalization, computer-mediated communications and the rise of e-Arabic.
Middle East Journal of Culture and Communication 4(2). 146–163.
Daoud, Mohamed. 2011a. The sociolinguistic situation in Tunisia: Language rivalry or
accommodation? International Journal of the Sociology of Language 211. 9–33.
Daoud, Mohamed. 2011b. The survival of French in Tunisian identity. In Joshua Fishman &
Ofelia García (eds.), Handbook of language and ethnic identity, vol. 2, 54–67. Oxford: Oxford
University Press.
Davies, Humphrey. 2006. Dialect literature. In Kees Versteegh & Mushira Eid (eds.), Encyclopedia
of Arabic language and linguistics, vol. 2, 597–604. Leiden: Brill.
Doss, Madiha & Humphrey Davies. 2013. Al-ʿāmmīyah al-miṣrīyah al-maktūba [Written Egyptian
Colloquial Arabic]. Cairo: The General Egyptian Book Organization.
Elinson, Alexander E. 2013. Dārija and changing writing practices in Morocco. International
Journal of Middle East Studies 45(4). 715–730.
Fasold, Ralph W. 1984. The sociolinguistics of society. Oxford: Blackwell.
Ferguson, Charles. 1959. Diglossia. Word 15(2). 325–340.
Ferguson, Charles. 1997 [1959]. Myths about Arabic. In R. Kirk Belnap & Niloofar Haeri (eds.),
Structuralist studies in Arabic linguistics: Charles A. Ferguson’s papers: 1954–1994,
250–256. Leiden: Brill.
Fishman, Joshua. 2010. European vernacular literacy: A sociolinguistic and historical introduction.
Bristol, UK: Channel View Publications.
Gibson, Maik. 2009. Tunis Arabic. In Kees Versteegh & Mushira Eid (eds.), Encyclopedia of Arabic
language and linguistics, vol. 4, 563–571. Leiden: Brill.
Gibson, Maik. 2013. Dialect levelling in Tunisian Arabic: Towards a new spoken standard. In
Aleya Rouchdy (ed.), Language contact and language conflict in Arabic, 42–58. London:
Routledge.
Habash, Nizar. 2010. Introduction to Arabic natural language processing. San Rafael, CA: Morgan
& Claypool.
Hachimi, Atiqa. 2013. The Maghreb-Mashreq language ideology and the politics of identity in a
globalized Arab world. Journal of Sociolinguistics 17(3). 269–296.
Hellyer, H. A. 2015. Tunisia remains a beacon of hope in the Arab world. The National. Available at:
https://www.thenationalnews.com/opinion/tunisia-remains-a-beacon-of-hope-in-the-
arab-world-1.45419.
Høigilt, Jacob & Gunvor Mejdell (eds.). 2017. The politics of written language in the Arab world:
Writing change. Leiden: Brill.
Hudson, Alan. 1991. Toward the systematic study of diglossia. Southwest Journal of Linguistics
10(1). 1–22.
Hudson, Alan. 2002. Outline of a theory of diglossia. International Journal of the Sociology of
Language 157. 1–48.
Kashina, Anna. 2020. Case study of language preferences in social media of Tunisia. In Advances
in social science, education and humanities research, vol. 489. Proceedings of the
international conference digital age: Traditions, modernity and innovations (ICDATMI 2020),
121–125. Paris.
Kebede, Tewodros Aragie & Kristian Takvam Kindt. 2016. Language and social survey in Morocco:
A tabulation report. Oslo: Fafo.
Kebede, Tewodros Aragie, Kristian Takvam Kindt & Jacob Høigilt. 2013. Language change in Egypt:
Social and cultural indicators survey: A tabulation report. Oslo: Fafo.
Khalil, Saussan. 2018. Fuṣḥá, ‘āmmīyah, or both?: Towards a theoretical framework for written
Cairene Arabic. Leeds, UK: University of Leeds dissertation.
Kindt, Kristian Takvam, Jacob Høigilt & Tewodros Aragie Kebede. 2016. Writing change: Diglossia
and popular writing practices in Egypt. Arabica 63(3–4). 324–376.
Kloss, Heinz. 1967. “Abstand languages” and “Ausbau languages”. Anthropological Linguistics
9(7). 29–41.
Masri, Safwan. 2017. Tunisia: An Arab anomaly. New York: Columbia University Press.
McNeil, Karen. 2019. Tunisian Arabic corpus: Creating a written corpus of an “unwritten”
language. In Andrew Hardie (ed.), Arabic corpus linguistics, 30–55. Edinburgh: Edinburgh
University Press.
McNeil, Karen. 2023. When the leak becomes a flood: The development of vernacular literature in
Tunisia. In Mahmoud Azaz (ed.), Perspectives on Arabic linguistics, 34. Amsterdam: John
Benjamins, in press.
Mejdell, Gunvor. 2006. The use of colloquial in modern Egyptian literature—A survey. In
Lutz Edzard & Jan Retsö (eds.), Current issues in the analysis of semitic grammar and lexicon,
vol. 2, 195–213. Wiesbaden: Harrassowitz Verlag.
Mejri, Salah. 2017. La nouvelle Constitution tunisienne en dialectal. In Veronika Ritt-Benmimoun
(ed.), Tunisian and Libyan Arabic dialects: Common trends – recent developments –
diachronic aspects, 191–204. Zaragoza, Spain: Prensas de la Universidad de Zaragoza.
Miller, Catherine. 2017. Contemporary dārija writings in Morocco: Ideology and practices. In
Jacob Høigilt & Gunvor Mejdell (eds.), The politics of written language in the Arab world:
Writing change, 90–115. Leiden: Brill.
Myers-Scotton, Carol. 1995. Social motivations for codeswitching: Evidence from Africa. Oxford:
Clarendon Press.
Nordenson, Jon. 2017. The language of online activism: A case from Kuwait. In Jacob Høigilt &
Gunvor Mejdell (eds.), The politics of written language in the Arab world: Writing change,
266–289. Leiden: Brill.
Palfreyman, David & Muhamed Al Khalil. 2007. “A funky language for teenzz to use:” Representing
Gulf Arabic in instant messaging. In Brenda Danet & Susan Herring (eds.), The multilingual
internet: Language, culture, and communication online, 43–63. Oxford: Oxford University
Press.
Ritt-Benmimoun, Veronika. 2014. Grammatik des arabischen Beduinendialekts der Region Douz
(Südtunesien). Wiesbaden: Harrassowitz.
Rosenbaum, Gabriel. 2011. The rise and expansion of colloquial Egyptian Arabic as a literary
language. In Rakefet Sela-Sheffy & Gideon Toury (eds.), Culture contacts and the making of
cultures, 323–344. Tel Aviv: Tel Aviv University.
Sayahi, Lotfi. 2011a. Code-switching and language change in Tunisia. International Journal of the
Sociology of Language 211. 113–133.
Sayahi, Lotfi. 2011b. Current perspectives on Tunisian sociolinguistics. International Journal of the
Sociology of Language 211. 1–8.
Sayahi, Lotfi. 2014. Diglossia and language contact: Language variation and change in North
Africa. Cambridge, UK: Cambridge University Press.
Sayahi, Lotfi. 2019. Diglossia and the normalization of the vernacular: Focus on Tunisia. In
Enam Al-Wer & Uri Horesh (eds.), The Routledge handbook of Arabic sociolinguistics,
227–239. London: Routledge.
S’hiri, Sonia. 2003. Speak Arabic please! Tunisian Arabic speakers’ linguistic accommodation to
Middle Easterners. In Aleya Rouchdy (ed.), Language contact and language conflict in Arabic,
149–173. London: Routledge.
Snow, Don. 2013. Revisiting Ferguson’s defining cases of diglossia. Journal of Multilingual and
Multicultural Development 34(1). 61–76.
Suleiman, Yasir. 2003. The Arabic language and national identity. Edinburgh: Edinburgh
University Press.
Suleiman, Yasir. 2004. A war of words: Language and conflict in the Middle East. Cambridge, UK:
Cambridge University Press.
Suleiman, Yasir & Ashraf Abdelhay. 2020. Diglossia, folk-linguistics, and language anxiety. In
Reem Bassiouney & Keith Walters (eds.), The Routledge handbook of Arabic and identity,
147–160. London: Routledge.
Walters, Keith. 2003. Fergie’s prescience: The changing nature of diglossia in Tunisia.
International Journal of the Sociology of Language 163. 77–109.
Warschauer, Mark, Ghada R. El Said & Ayman G. Zohry. 2002. Language choice online:
Globalization and identity in Egypt. Journal of Computer-Mediated Communication 7(4).
1–18.
Yaghan, Mohammad Ali. 2008. “Arabizi”: A contemporary style of Arabic slang. Design Issues
24(2). 39–52.
Younes, Jihene & Emna Souissi. 2014. A quantitative view of Tunisian dialect electronic writing. In
5th international conference on Arabic language processing, 63–72. Oujda, Morocco:
University of Mohammed Premier Oujda.