‘We don’t speak the same language:’ language choice and identity on a Tunisian internet forum

The linguistic situation in the Arab world is in an important state of transition, with the “spoken” vernaculars increasingly functioning as written languages as well. While this fact is widely acknowledged and the subject of a growing body of qualitative literature, there is little quantitative research detailing the process in action. The current project examines this development as it is occurring in Tunisia: I present the findings from a corpus study comparing the frequency of Tunisian Arabic–Standard Arabic equivalent pairs in online forum posts from 2010 with those from 2021. The findings show that the proportion of Tunisian lexical items, compared to their Standard Arabic equivalents, increased from a minority (19.7%) to a majority (69.9%) over this period. At the same time, metalinguistic comments on the forum reveal that, although its status is still contentious, Tunisian has become unmarked as a written language. These changes can be attributed to major developments in Tunisian society over the period of study – including internet access and the 2011 revolution. These findings suggest destabilization of the diglossic language situation in Tunisia and a privileging of national identity vis-à-vis the rest of the Arab world.
Karen McNeil*
https://doi.org/10.1515/ijsl-2021-0126
Received December 13, 2021; accepted May 3, 2022
Keywords: colloquial; dialects; diglossia; identity; Tunisian Arabic
*Corresponding author: Karen McNeil, Arabic and Islamic Studies, Georgetown University,
Washington, DC, USA, E-mail: km1542@georgetown.edu. https://orcid.org/0000-0002-4875-1124
IJSL 2022; 278: 51–80
1 Introduction
In these pages nearly 20 years ago, Walters (2003: 101–102) observed that, due to
increased literacy, Arabic diglossia in Tunisia was being reconfigured: learned
vocabulary from the “High” variety (Standard Arabic/MSA) was increasingly used
in educated speech, while opportunities for writing the “Low” variety (Tunisian
dialect) were expanding. Likewise, Belnap and Bishop (2003: 21) showed that
increased literacy in the Arab world had led to new styles of informal writing. For
some young Arabs, “unmitigated MSA” had come to be seen as “too formal for
personal correspondence with peers” and they instead chose forms that were
closer to the vernacular. This was a significant change from the previous generation
or two, for whom such writing meant running the risk of being taken for a
“semiliterate” (Belnap and Bishop 2003: 21).
Some of this informal correspondence was taking place on the then-nascent
internet, where Belnap and Bishop predicted that “the immediacy of online
communication may serve to further erode the spoken/written distinction and
result in even more [colloquial Arabic] being used in the written mode” (2003:
19). This prediction has proved prescient: the slow growth in literacy rates over
the last half century has been followed by the rapid growth of internet access in
this century. The same period has seen an expansion in vernacular writing, both
online and in print, throughout the Arab world. Though it is the subject of a
growing body of literature, quantitative data describing this expansion is sparse
and most existing literature has focused on Egypt and Morocco (e.g. Aboelezz
2012; Caubet 2017a; Elinson 2013; Høigilt and Mejdell 2017; Kindt et al. 2016;
Mejdell 2006; Miller 2017).
The current study aims to provide a window on this ongoing change by presenting
evidence from Tunisia over the span of a decade. To do this, I compare a
corpus of Tunisian internet forum posts from 2010 with one of 2021 posts collected
from the same site. My analysis shows that the proportion of writing in Tunisian
Arabic, rather than Standard Arabic, has dramatically increased within this 11-year
period. I will argue that this change can be attributed to the assertion of a Tunisian
identity indexed by Tunisian Arabic rather than a pan-Arab Islamic identity.
This paper is divided into three main sections. I begin with an overview of the
language situation in the Arab world in general and Tunisia in particular, along
with the importance of language in Arab identity. I then describe the corpus,
composed of hundreds of thousands of posts on the online forum TunisiaSat from
2010 to 2021, and the methods used to quantify the proportion of Tunisian Arabic
writing and to assess language attitudes in these posts. Finally, I present and
discuss the results of the study, before concluding with some implications for the
linguistic situation in Tunisia and the Arab world.
2 Language in Tunisia and the Arab world
2.1 Diglossia
The term diglossia was coined by Charles Ferguson in a 1959 article of that title,
based on the French term diglossie that William Marçais had used to describe the
linguistic situation in North Africa. Ferguson described a particular kind of
standardization where two varieties of a language exist side by side, each having
a definite role to play (1959: 325). For Ferguson, this “definite role” or “specialization
of function” was the defining characteristic of diglossic language situations:
“In one set of circumstances only [High] is appropriate and in another only [Low],
with the two sets overlapping only very slightly” (1959: 328).
Arabic was one of the defining cases discussed by Ferguson and is, in many
respects, viewed as a prototypical example of Fergusonian diglossia (Snow 2013).
The High language throughout the Arab world is Standard Arabic, referred to by its
speakers as al-fusˤħaː ‘the most elegant’ or simply al-ʕarabiyya ‘Arabic.’ Standard
Arabic is the official language and the language of education in all Arab countries;
the few calls to include any vernacular elements in school curricula have been
subject to virulent public criticism (Suleiman and Abdelhay 2020). There are strong
ideologies associated with the different varieties: Standard Arabic is believed to be
the “real” language, with the vernaculars viewed as flawed deviations from it (see
Bassiouney 2020: 233; Høigilt and Mejdell 2017).
The vernacular languages are quite different from Standard Arabic, as well as
from each other. To give a typical example from Tunisian radio: one day the host
was discussing a popular Egyptian Arabic novel named ʕayza ʔatgawwiz ‘I want to
get married.’ The sequel of that novel had recently been released and was titled miʃ
ʕayza ʔatgawwiz ‘I don’t want to get married.’ The host, for emphasis, translated
the title into Tunisian Arabic: ma-nħɛbb-ʃ n-ʕarris. These varieties clearly differ in
all linguistic aspects from each other, as well as from Standard Arabic, as seen in
example (1).
(1a) miʃ ʕayz-a ʔa-tgawwiz (Egyptian)
     NEG wanting-F I-get.married
(1b) ma n-ħɛbb-ʃ n-ʕarris (Tunisian)
     NEG I-want-NEG I-get.married
(1c) laː ʔ-uriːd-u ʔan ʔa-tazawwaj-a (Standard Arabic)
     NEG I-want-IND that I-get.married-SBJV
     ‘I don’t want to get married’
Note here the Tunisian two-part negation (a ma before the verb and ʃ following),
which is distinctive of North African varieties of Arabic.¹ Tunisian also differs from
the other two varieties in the first-person verb conjugation (n-) as well as in the
word it uses for ‘to get married.’ It is clear that (in the terminology of Kloss 1967)
there is significant abstand (distance) between these varieties. Their speakers
consider them “one language,” however, because the vernaculars are lacking
ausbau: they have not been standardized and are generally not written.
¹ It should be noted that Egyptian Arabic is also one of the varieties that uses this two-part
negation; it just happens that ‘I want’ is expressed with a participle rather than a finite verb in
Egyptian.
Although there are few signs of standardization, the vernaculars have been
making strides into the written domain. Literature written in vernacular Arabic has
begun to appear in countries like Morocco (Caubet 2017b; Elinson 2013; Miller 2017)
and Tunisia (McNeil 2023, Achour this issue); previously only Egypt (and, to a lesser
extent, Lebanon) had a history of vernacular literature (Davies 2006; Doss and
Davies 2013; Mejdell 2006; Rosenbaum 2011). The biggest expansion of vernacular
writing, however, has been online (Achour Kallel 2016; Caubet 2004, 2012, 2017a;
Daoudi 2011). In a recent survey of language attitudes and practices in Cairo and
Rabat, participants reported using mostly vernacular Arabic (or vernacular and
French in Rabat) online (Kebede et al. 2013; Kebede and Kindt 2016). Other than
this survey, only a few studies have quantified Arabic language choice online
(Al-Khatib and Sabbah 2008; Khalil 2018; Nordenson 2017; Warschauer et al.
2002). Many more have studied the phenomenon of Arabic language written in
Latin letters, often termed Arabizi (e.g. Allehaiby 2013; Palfreyman and Al Khalil
2007; Yaghan 2008). Though the studies available are geographically limited
(most cover either Egypt or Morocco), they suggest that the written use of
vernacular Arabic has greatly increased, led in no small part by its use online.
Some scholars have even suggested that the recent developments indicate that
diglossia in the Arab world is breaking down (Alkhamees et al. 2019). Whether that
is the case or not, the status of Arabic vernaculars as written languages is clearly
changing.
2.2 Tunisia and Tunisian Arabic
Tunisia is a small North African country (roughly the size of the US state of
Georgia), bordered by Algeria to the west, Libya to the southeast, with the Mediterranean
Sea to the north and east. The “Low” language variety spoken in Tunisia
is Tunisian Arabic, called dɛːrja (‘common’) or tuːnsi (‘Tunisian’) by its speakers. It
is similar to the dialects of eastern Algerian and western Libyan Arabic (see Gibson
2009; Ritt-Benmimoun 2014; Sayahi 2011b for a description of its features). Arabic
was introduced to Tunisia by the Islamic armies in the 7th century and, over
multiple conquests, eventually replaced the native Berber (Amazighi) and Late
Latin which were the dominant languages in North Africa at the time (Sayahi 2014).
Tunisian Arabic is spoken by the nearly 12 million residents of Tunisia, as well as
over a million Tunisians abroad. The dialect of the capital, Tunis, functions as a de
facto standard (Gibson 2013; Walters 2003). This standard is mostly a spoken one:
it is used in Tunisian movies and television series, for example, but most books,
newspapers, and other formal genres are still written in Standard Arabic.
A generation ago, Tunisian Arabic writing was even rarer. In 2003, Walters
noted an expansion of the “limited” writing opportunities, such as increased
visibility of the language in billboards and print advertisements (101). He ascribes
much of this change to improved literacy: whereas, at the time of Ferguson’s
writing, youth literacy in Tunisia was only 31% (49% for males and 12% for
females, Walters 2003: 84–86), youth literacy today is nearly universal at 96% and
the gender gap has closed to less than 1%. Widespread literacy tends to destabilize
diglossia because, when ordinary people write to each other about ordinary topics,
a genre of informal writing is created. The standard language is, by definition, a
formal language, so writing informally to personal intimates in the standard
language can feel awkward (Belnap and Bishop 2003). In other words, when
literacy increases, opportunities to write in the “spoken” language arise.
Since the time of Walters’ writing, Tunisian Arabic has continued to spread
into domains that were previously limited to Standard Arabic. Although most
books, newspapers, and other printed materials are still written in Standard Arabic,
there is an increasing trend of fiction writing in Tunisian Arabic (McNeil 2023,
Achour this issue). This written expansion is part of an overall growth of Tunisian
Arabic in formal spheres over the past couple of decades, including radio and television
broadcasts (Achour Kallel 2011; Daoud 2011a), classrooms (Bach Baoueb
and Toumi 2012), and mosques (Sayahi 2014). Since the 2011 revolution and
democratic transition, a limited number of government communications have also
been produced in Tunisian Arabic (Achour Kallel 2015; McNeil 2023; Mejri 2017;
Sayahi 2019).
The spreading domains of Tunisian Arabic can be ascribed not only to literacy
but also to the internet and mobile phones: 64% of the population in 2018 had
access to the internet, and there were 124 mobile phones in Tunisia for every 100
people.² Coulmas (2013: 131) notes that the “quasi-orality” of new media like chat
and SMS encourages a more spoken register in online writing; in Arabic this has
meant that such messages are often written in the vernacular. Yasir Suleiman
wrote in 2004 that “the use of Arabic ‘dialects’ in writing is resisted because it
breaks what is in effect a cultural ‘taboo’ whose ideological validity is sanctioned
by the tradition and historical practice” (2004: 72). When mobile phones appeared,
texting in vernacular seemed natural because texting serves a communicative
function similar to that of speech. The early chat rooms on the internet were the same, as
were Facebook and Twitter. By the time that online genres, like blogs, that were
more like traditional writing appeared, the “taboo” that Suleiman described had
already been greatly weakened by all of this online writing.
² https://data.worldbank.org/indicator/SE.ADT.1524.LT.ZS?locations=TN; https://data.worldbank.org/indicator/IT.NET.USER.ZS?end=2019&locations=TN.
2.3 Language and identity
Tunisian Arabic is a variety of MAGHREBI, or North African, Arabic. There is an
unequal relationship of power and prestige between Maghrebi and MASHREQI Arabic
(varieties of the Middle East including Egypt). The Middle East (especially Egypt
and Lebanon) has traditionally dominated Arab popular culture, so North Africans
easily understand Mashreqi varieties of Arabic, but the reverse is not true (Hachimi
2013; Shiri 2003). Middle Eastern Arabs claim to not be able to understand North
Africans, even going so far as to assert that what they are speaking is not Arabic.
When two speakers from the different regions meet, the communicative burden is
on the North African to alter their language to be understood; the Middle Easterner
will make no such attempt at accommodation (Shiri 2003). Although this antipathy
has long existed, the new international media has brought speakers from
different countries into contact like never before and heightened tensions
(Hachimi 2013).
The status of Tunisia in the Arab world underwent a change after the 2011
Jasmine Revolution, when mass protest forced the longtime dictator of Tunisia to
step down and sparked the “Arab Spring.” Although similar protests spread
throughout the region, they resulted only in a continuation of the status quo
(Morocco and the Gulf), replacement of one dictator with another (Egypt), or ruinous
civil war (Syria, Libya). After long being considered a backwater of the Arab world,
Tunisia was suddenly a “beacon of hope” (Hellyer 2015) and the “Arab Anomaly.”
The latter is the title of a book, in which the Jordanian-American author concludes:
“there is something unique and special about Tunisia that is missing in the rest of
the Arab world, to which Tunisia both belongs and does not” (Masri 2017: xxvi).
The major slogan of the Jasmine Revolution was in Standard Arabic: aʃ-ʃaʕb
yuriːd isqaːtˤ an-niðˤaːm ‘the people want to bring down the regime.’³ Historically,
Standard Arabic has been associated with pan-Arab identity, while vernacular
Arabic has been a prominent feature of territorial nationalism (Suleiman 2004).
This kind of social identity, in the framework of Bucholtz and Hall (2004), is
constructed around sameness and difference: emphasizing similarities between
oneself and the other people in a group, as well as accentuating one’s differences
from people of other groups. This is instantiated through the semiotic processes of
practice, indexicality, ideology, and performance (Bucholtz and Hall 2004: 370).
Tunisian Arabic acts as an index of national identity, in relation to both Standard
Arabic and other vernaculars, while the forum studied here is a locus of the
practice, ideology, and performance in which this identity is built. We will see how
Tunisians on the forum construct a post-revolution national identity through their
use and promotion of their vernacular, often in opposition to “Arabs” who are
constructed as an Other.
³ This is a reference to the first line of a 1933 poem by Tunisian poet Aboul-Qasim Echebbi: “If, one
day, the people want to live, then fate will answer their call.”
Tunisians are also constructing this identity in relation to French and to both
the domestic elites and the foreign colonizers that it indexes. Many common
words in Tunisian Arabic were originally borrowings from French and some have
been adapted to Tunisian phonology, e.g. blaːsˤa ‘place’ < Fr. place (Sayahi 2011a:
128). Code-switching is also common in spoken Tunisian; this contributes to the
widespread, though mistaken, impression of Middle Eastern Arabs that the
Maghrebi varieties are more French than Arabic. The French language itself is
associated both negatively with colonialism and positively with modernity: “the
more conservative sectors of the population tend to perceive it as a threat to the
Arabo-Islamic identity, while the more progressive sectors consider it a necessary
tool for scientific and technological advancement” (Sayahi 2014: 42). In the
post-independence period, competence in French increased due to expanded access
to education and French became the language of the Tunisian political and social
elite (Sayahi 2014: 31–32). There are indications, however, that the younger
generations have less competence in French, compared with their parents (Daoud
2011b). French is also increasingly in competition with English, which is not
associated with a colonial power in Tunisia and is perceived as more important
internationally (Sayahi 2014: 51).
2.4 Online language use in Tunisia: an example post
To illustrate the TunisiaSat forum and the character of online Tunisian writing, I
will discuss two samples from 2010 (Figures 1 and 2). These posts are characteristic
of the forum and show the multilingual nature of Tunisia: rather than using one
single language, these posts instead resemble the diglossic switching often seen in
speech (Boussofara-Omar 2006; Sayahi 2014). In Figure 1, the thread originator
had asked what household items, clothes, and other necessities a bride needs for
her trousseau. In the replies, several commentators supplied links to existing lists
and offered opinions of the various resources.
Figure 1: An example of a Tunisia-Sat.com post. The quoted text is in a mixed variety that leans
towards standard, and the reply is written in a mixed variety that leans towards Tunisian.
Figure 2: A thread from TunisiaSat, with text in romanized Tunisian with French code-switching
(top), French (middle), and Tunisian Arabic (bottom).
The post in Figure 1 is a response to one of those links. It begins with a
quotation of a previous reply (the smaller text in the gray box). This reply provides
a link to additional items that the writer describes as being beyond the basics.
Though it contains some vernacular features, its lexicon and syntax are largely
Standard Arabic. The main text of the post, however, is more markedly Tunisian. In
the following transcription, words that are unambiguously Tunisian Arabic are
underlined, Standard Arabic is in boldface, and indeterminate words are in plain
roman font.⁴ Extralingual elements are noted in chevrons.
(1) <tophat-raising emoji>
    w illi muʃ min baːrdo ʃ-ya-ʕmel ???
    and who not from Bardo what-he-does ???
    ‘And what are you supposed to do if you’re not from Bardo???’
    [= if you’re not rich]
    <light blue font>
    maʃkuːra ʕa-l-ʔidˤaːfa (wa law ʔanna-haː ʔiʃhaːr tu-ʕtabar)
    thanked.F for-the-addition (and if that-it ad it-is.considered)
    ‘thanks for your addition (even if it’s something of an advertisement)’
    ʔaxuː-k fiː allah
    brother-your in God
    ‘your brother in Islam’
The first line here (w illi muʃ min baːrdo ʃ-ya-ʕmel???) is emotionally emphatic, as
expressed through the preceding emoji, the multiple question marks, and the
language choice, which is entirely Tunisian Arabic. The message the writer is
conveying is that the items the previous post linked to are unaffordable for anyone
not from Bardo (an upper-class district in the capital Tunis), in other words, for
most Tunisians. The use of Tunisian Arabic lends emotional force because it makes
the written statement evoke the spoken exclamation, or how he would say this
spontaneously if they were speaking face to face. By contrast, the writer lessens the
emotional impact of his mild admonishment of the poster (that her link was like an
advertisement) by using Standard Arabic. He further softens it by putting it in
parentheses and in a lighter-colored font than the rest of the post.
⁴ “Indeterminate” includes words used in both Tunisian and Standard Arabic. Often, the pronunciation
would identify the word as one or the other, but because the Arabic script does not
include short vowels, such differences are often obscured in writing. Ambiguous words are
transcribed here with Standard Arabic vowels, for convenience.
Many posts on this forum are written exclusively in Standard Arabic or
vernacular Arabic, but mixtures of both varieties, as in Figure 1 here, are also
common. In addition, we also see posts like those in Figure 2. In this thread, we
have a post written entirely in French (middle), one written in Tunisian Arabic
(bottom), and, at the top, a post written in Tunisian Arabic (with French code-switching)
but in romanized script (Arabizi). These two examples show
that, even in 2010, Tunisia had already drifted far from the Fergusonian model, in
which specific language varieties are associated with specific domains. In these
two posts we have one domain (an asynchronous conversation between people
who do not know each other personally) with four different language varieties
represented. Clearly, whatever is prompting the variant codes is not simply a matter of
context or domain.
Romanization like that in Figure 2 used to be quite common: in the early days
of the internet, digital technology was only available in ASCII (the basic set of Latin
characters used for English). As the technology matured and became more widespread,
however, native Arabic interfaces and keyboards became available. Some
users, however, continued writing in Arabizi, which carries in-group prestige
among young internet users (Alkhamees et al. 2019).
There is some indication that Tunisians (and North Africans in general) may
have been slower to switch to using Arabic script online than Middle Eastern
internet users, due to the dominance of French in the formal sphere. The extent to
which Tunisians, and Arabs in general, still write in Standard Arabic, French, and
Arabizi online is not clear from the existing research, however. Sayahi’s (2014)
study of a Tunisian soccer forum found that half of the forum’s posts were written
in French, likely because the language of the forum interface and administration
was French (2014: 43). The other half were written in Tunisian Arabic, but nearly
entirely in romanized form, which the author attributes to the fact that “it is hard to
find an Arabic keyboard in a public internet space” (Sayahi 2014: 112–118). Another
study from the same year found that 43% of a corpus of Tunisian SMS and online
writing was made up of Tunisian Arabic written in Latin characters, while only 25%
was Arabic, either Standard or Tunisian, written in Arabic script (Younes and
Souissi 2014). A more recent study of Tunisian Facebook posts, however, found
that 61% of posts were Tunisian Arabic written in Arabic script while only 10.8%
were romanized Tunisian Arabic; in addition, only 21.1% of posts were in MSA, and
7% in French (Kashina 2020). It is not clear whether Tunisians’ language use online
has changed over the past several years, or if these studies are just not comparable,
due to their differing methods and corpora. The current study was designed to
resolve this confusion by explicitly comparing Tunisians’ usage of different
language varieties online and how that usage has changed over time.
3 Tunisian Arabic online: corpus and methods
3.1 The TunisiaSat corpus
The corpus data for this study comes from the website TunisiaSat (tunisia-sat.
com), which describes itself as the largest community of Tunisians online and
consists of forums on topics such as sports, news, religion, and family life. The
most popular boards have hundreds of thousands of threads and participants in
the millions. While anyone can read the forums, users must have an account and
log in to post to them; all the posts are associated therefore with a username. The
user pages do not provide any biographical information such as gender or location:
the only available information about a user is their username, how long they have
been active on the forum, and the posts that they have contributed.
From the topics and language used on the site, participation appears to be
limited to speakers of Tunisian Arabic: in other words, I saw little evidence of other
Arabic vernaculars on the forums. As noted above, approximately one million
speakers of Tunisian Arabic reside outside of Tunisia: it is not uncommon for
Tunisians to spend several years in another Arab country, especially in the Gulf, to
work. Also, in recent years migration to Europe (including clandestine migration)
has increased. The number of speakers abroad, however, is dwarfed by the
number (almost 12 million) residing in Tunisia. In addition, Tunisia itself is
highly homogeneous and gets little immigration. For these reasons, the participants
on the board can be assumed in most cases to be native speakers of Tunisian Arabic
residing in Tunisia.
The corpus collection and analysis were done with scripts I wrote using the
programming language Python. To build the corpus I programmed a web-crawler to
scrape TunisiaSat posts from 2010 to 2021. After collection, the corpus consisted of
just over 16 million words for each year, 32.5 million words total. The text of each post
was normalized as is usual in Arabic textual processing (see Habash 2010) and its
associated metadata was collected. This metadata includes the thread ID, post ID,
date, the username of the author, and the forum category (e.g. Sports). The posts
collected from the News and Sports forums outnumber all other categories combined in both
2010 and 2021. Many other popular forums on the site are related to technology: it
appears that TunisiaSat began as a forum to discuss satellite television technology
(hence the name), before expanding into general interest forums.
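The normalization and metadata steps described above might look roughly like the following Python sketch. The article does not list its exact normalization rules, so the rules here (stripping short-vowel diacritics and tatweel, collapsing alef variants, mapping alef maqsura to ya) are assumptions based on common practice in Arabic text processing, and the `Post` record and `normalize_arabic` function are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass

# Assumed rule set; the article only says normalization followed usual practice.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # short vowels, shadda, sukun, dagger alef
TATWEEL = "\u0640"                                # decorative elongation character
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below


def normalize_arabic(text: str) -> str:
    """Return text with common orthographic variation collapsed."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)   # map all alef variants to bare alef
    text = text.replace("\u0649", "\u064A")    # alef maqsura -> ya (assumption)
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


@dataclass(frozen=True)
class Post:
    """Hypothetical record mirroring the metadata fields named in the text."""
    thread_id: str
    post_id: str
    date: str
    username: str
    category: str
    text: str
```

Keeping the normalized text alongside the metadata in one record makes it straightforward to group counts by year or forum category later.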
3.2 Analyzing language choice on the forum
As we saw in the samples discussed in Section 2.4, many posts on the forum are
unlikely to be solely in one language variety or another. In addition, many words
are shared between the two varieties or are ambiguous, due to the lack of short
vowels in the script. For this reason, it is impossible to count the number of posts
written “in Tunisian.” This study instead compares a discrete list of highly salient
Tunisian words with their equivalents in Standard Arabic, Arabizi, and French. In
the example post in Figure 1, for example, the occurrence of the Tunisian negator
muʃ (‘is not’) would be added to the total occurrences of muʃ across all posts and
compared with the occurrences of the Standard Arabic equivalent laysa (‘is not’), as
well as the Arabizi mouch and the French pas, along with their orthographic and
inflectional variants. In this way, language choice across the site was estimated
without having to categorize individual posts as one variety or another.
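As a minimal sketch of this counting approach (not the author's actual script), the following Python tallies whole-word occurrences of one equivalence set across a list of posts. The variant spellings in `VARIANTS` are illustrative assumptions for the "is not" set; the lists actually used in the study are those given in its Appendix A.

```python
import re
from collections import Counter

# Illustrative variant lists for one equivalence set ("is not").
# These spellings are assumptions, not the study's Appendix A lists.
VARIANTS = {
    "tunisian": {"مش", "موش"},    # muʃ
    "standard": {"ليس", "ليست"},  # laysa, laysat
    "arabizi": {"mouch"},
    "french": {"pas"},
}

TOKEN = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3


def count_varieties(posts, variants=VARIANTS):
    """Sum whole-word matches per language variety over all posts."""
    totals = Counter()
    for text in posts:
        for token in TOKEN.findall(text.lower()):
            for variety, forms in variants.items():
                if token in forms:
                    totals[variety] += 1
    return totals
```

Because the tallies are kept per variety rather than per post, a mixed post contributes to every variety it contains, which matches the goal of estimating language choice without labelling individual posts.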
This method is innovative: I am not aware of another study (certainly not on
Arabic) that compares varieties systematically in this way. The objection could be
raised, however, that it may be misleading: perhaps the counted word was the only
Tunisian Arabic word in an otherwise Standard Arabic post. The salience of the
chosen words, however, is key: most of the words chosen for comparison are
function words, such as tense markers and question particles. In the code-switching
framework of Myers-Scotton (1995), the language variety of the grammatical
markers in a sentence is the matrix language of the sentence, which
speakers will identify as the “language” of an utterance, regardless of the source of
the content words (Myers-Scotton 1995). All of the Tunisian Arabic words chosen
for comparison (even those that are not grammatical markers) are also highly
salient, e.g. barʃa ‘a lot.’ Research on written vernacular Arabic has shown that
readers will consider a text to be “in vernacular” even if it contains only a few
highly salient vernacular words (Elinson 2013: n. 20; Miller 2017: 106). In
comparing the occurrence of salient words, then, I am identifying texts which
writers would have considered vernacular as they were writing, and that would
have been received as vernacular by their readers.
In choosing the Tunisian–Standard word pairs, I began with a list of the 500
most frequent words from the Tunisian Arabic Corpus (McNeil 2019). I then chose a
subset based on the following criteria: 1) The Tunisian word has a Standard Arabic
equivalent and is unambiguously distinct from it; and 2) the Tunisian word (and
any of its orthographic variants) does not coincide with any other Standard Arabic
word, and vice versa. In addition, each of the words needed to be independent: it
was not possible to search for affixes (like the future tense marker sa- or the
negation particle -ʃ, for example).⁵ Following this process, a total of 30 words were
chosen and their French and Arabizi equivalents were added. For example, the
terms for ‘a little’ would be: ʃwayya (Tunisian), chwaya (Arabizi), qaliːl(an)
(Standard Arabic), and un peu (French). In addition to the base form of words,
however, common inflectional and orthographic variants were included: a
complete list of terms and their variants is in Appendix A.
⁵ This is because the corpus was plain text, with no morphological parsing or other processing. So
the search algorithm had no way to distinguish between the future particle sa-, for example, and a
word that simply began with s-.
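The restriction to independent words can be illustrated with a short Python sketch. In unparsed plain text, a prefix search for the future marker sa- also matches any word that merely begins with the same letter, whereas a whole-word search does not over-match. The sample tokens and both function names are hypothetical and chosen only to show the problem; they are not taken from the study's word list.

```python
def prefix_hits(tokens, prefix):
    """Tokens that start with the prefix: what a naive affix search would return."""
    return [t for t in tokens if len(t) > len(prefix) and t.startswith(prefix)]


def whole_word_hits(tokens, forms):
    """Tokens that exactly equal one of the listed independent word forms."""
    return [t for t in tokens if t in forms]


# Hypothetical sample: only the first token actually contains future sa-.
SAMPLE_TOKENS = [
    "سيكتب",  # sa-yaktub 'he will write'
    "سيارة",  # sayyaara 'car'
    "سؤال",   # su'aal 'question'
]
```

On this sample the prefix search returns all three tokens although only one is a future form, which is the ambiguity that a search over plain text cannot resolve without morphological parsing.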
To calculate the relative proportion of language varieties, I counted the
frequency of each word in each forum post and summed them. This raw frequency
was then converted into a normalized frequency of occurrence per one million
words, which both makes the numbers easier to interpret and directly comparable
between the two corpora. I then calculated the average (median) frequency of
words of each language variety for each year.
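The normalization step can be sketched as follows. The raw counts and corpus size below are hypothetical placeholders, not figures from the study; only the per-million scaling and the use of the median follow the description above.

```python
from statistics import median


def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Convert a raw count into occurrences per one million words."""
    return raw_count * 1_000_000 / corpus_tokens


# Hypothetical raw counts for three terms of one variety in one year's corpus.
raw_counts = {"term_a": 120, "term_b": 45, "term_c": 300}
corpus_tokens = 2_500_000  # hypothetical corpus size in words

normalized = {term: per_million(n, corpus_tokens) for term, n in raw_counts.items()}

# The median is less sensitive than the mean to a few very frequent terms.
median_freq = median(normalized.values())
```

Scaling both corpora to the same base is what allows the 2010 and 2021 figures to be compared directly even if the two samples differ in size.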
For insight into the ideologies behind these changes, I conducted a qualitative
analysis of metalinguistic discussion on the forum. I selected two recent threads
from TunisiaSat: the first post discussed the status of French in Tunisia, while the
second dealt with the status of Tunisian Arabic. Though discussions of this kind
were rare on the forum, these two were quite active, running to 20 pages of replies
before being closed. I analyzed and categorized the arguments made on these
threads to ascertain the language ideologies of the forum users, and to what extent
they explicitly associated the various language varieties available to them with
distinct identities.
4 Results: a large increase in Tunisian Arabic
4.1 Quantitative results
The major finding of this study is that the proportion of writing on the forum in
Tunisian Arabic, relative to the other language varieties, vastly increased between
2010 and 2021. In 2010, Tunisian Arabic figured in a minority of the posts on the
website – less than a fifth of usage – whereas the bulk of the forum used Standard
Arabic, with a significant amount of French and a small contribution of Arabizi. By
2021, however, Tunisian Arabic had become the dominant language of the site.
Although the proportion of Standard Arabic decreased, much of the growth of
Tunisian Arabic was not at the expense of Standard, but rather at the expense of
French and Arabizi. The proportion of the French equivalents used fell to just a
sliver, and Arabizi disappeared almost entirely.
Table 1 shows the normalized frequency of the ten most common equivalent
terms, in all four language varieties – the frequency for all 30 equivalent terms is
given in Appendix B. The “Tunisian proportion” column gives the percentage of
time that the Tunisian word is used, among all instances of equivalent words. For
example, in the first row we see that the Tunisian relative pronoun illi (‘that,
what’) had a frequency of 1,074 per million in the 2010 corpus, while the frequency
of its Standard Arabic equivalent allaːðiː was 5,154 per million. In addition,
the romanized form ‘illi’ had a frequency of 196 per million, and the French qui
appeared at 1,426 per million, outnumbering its Tunisian counterpart. This means
that, when forum users wrote the relative pronoun in 2010, they used the Tunisian
term less than 14% of the time.
By contrast, in the 2021 corpus the Standard Arabic term allaːðiː had a frequency of only
3,389 per million, compared to 5,679 per million for Tunisian illi. French had decreased to
only 92 per million, and Arabizi was almost unattested at only 3 per million. Writers
on the forum used the Tunisian term 62% of the time. This term is typical, as we can
see when we look at the median frequency of all terms (Figure 3): a large increase in
Tunisian Arabic, a modest decrease in Standard Arabic, and a large decrease in
both Arabizi and French.
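The relative-pronoun example can be checked with a short calculation. The frequencies are those reported in the text; the function name is an assumption, and the choice of denominator (all four varieties, with Arabizi counted separately from Arabic-script Tunisian) is inferred from the reported percentages rather than stated as a formula in the source.

```python
def tunisian_proportion(tunisian: float, standard: float,
                        arabizi: float, french: float) -> float:
    """Percentage of all equivalent-term tokens that are the Tunisian word."""
    total = tunisian + standard + arabizi + french
    return 100 * tunisian / total


# Per-million frequencies of the relative pronoun, as reported in the text.
p2010 = tunisian_proportion(tunisian=1074, standard=5154, arabizi=196, french=1426)
p2021 = tunisian_proportion(tunisian=5679, standard=3389, arabizi=3, french=92)

print(f"2010: {p2010:.1f}%  2021: {p2021:.1f}%")
```

Under this reading the 2010 figure falls just below 14% and the 2021 figure rounds to 62%, matching the values given above.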
We see this pattern in almost all the equivalent terms. In 2010, 29 out of 30
Tunisian terms are the minority choice; the lowest relative frequency of a Tunisian
word in 2010 was 1.8% and the highest was 53.0%. By 2021, however, this pattern
had been reversed (Figure 4). For 25 out of 30 terms, it is the Tunisian word that is
used more frequently. In none of the pairs did the relative frequency of the Tuni-
sian word decrease: every Tunisian word was used more frequently (compared to
its Standard Arabic equivalent) in 2021 than in 2010. The lowest relative frequency
for a Tunisian word in 2021 was 16.5% and the highest was 95.8%.
It is worth noting the heterogeneity of the relative frequencies in the results.
For example, in 2010 the Tunisian word ɣaːdi was used for ‘there’ only 1.8% of the
time, whereas its Standard Arabic equivalent hunaːka was used 97% of the time. At
the other extreme, Tunisian bɛːʃ (‘will’) was used 53.0% of the time even in
2010, compared with just 22.6% for its Standard Arabic equivalent sawfa. These
discrepancies arise because there is rarely perfect semantic and pragmatic overlap
between the equivalent pairs. Tunisian ɣaːdi, for example, is used only as an
adverb of place (‘[over] there’), whereas hunaːka, in addition to this use, is one of
the main existential particles in Standard Arabic (‘there is’). Likewise, Tunisian bɛːʃ
(‘will’) is always written independently, whereas the Standard Arabic sawfa has a
prefixed form (sa-) that could not be included in its count (since it was not possible
to search for affixes), making it appear less common than it was.
Overall, such differences balance out throughout the word list. We can see this
by looking at the equivalent pairs that do have close semantic fields, like the
question particle ‘how much’ (qaddɛːʃ in Tunisian, kam in Standard) and the
adverb ‘always’ (diːma, daːʔiman). The 2010 proportions of Tunisian use for
these two words (19.2 and 17.9%) are close to each other as well as to the overall median
of 19.7% for 2010. Likewise, their relative frequencies in 2021 (68.0 and 68.1%) are
similar both to each other and to the 2021 median of 69.9%. This internal coherence
suggests that, despite the wide variance in relative frequencies, the aggregated
results do accurately reflect language use on the site.
Table 1: Normalized frequency of equivalent terms, 2010 and 2021. Occurrences per million words for the ten most frequent terms; full data in Appendix B.
Columns, each given for 2010 and 2021: Tunisian Arabic frequency | Standard Arabic frequency | Arabizi frequency | French frequency | Tunisian proportion
Word | TunAr | StAr
what (rel pron) | illi | allaːðiː
will | bɛːʃ | sawfa
(is) not | muːʃ | laysa
this (f.) | haːði | haːðihi
very, a lot | barʃa | kaθiːr(an)
now | tawwa | alʔaːn
brother | xuː | ʔax
that (dem pron) | haːka | ðaːlika
what (interrog) | ʃnuwwa | maːða
like | kiːma | miθla
The increase in Tunisian Arabic was not uniformly distributed among the
different forums on the website. The largest increases were in the Family Life and
General forums, where the Tunisian proportion increased to almost 80%, while
many of the technical forums saw little increase. Whether large or small, however,
all forums saw an increase of some size during the period of the study.
Figure 3: Median frequency (per million words) of equivalent terms, 2010 and 2021.
Figure 4: Proportion of each language variety for the fifteen most frequent equivalent terms. Full
data given in Appendix B.
If this sample is taken to be representative of writing on the forum, it suggests
that the proportion of writing in Tunisian Arabic increased from less than one-fifth
(19.7%) in 2010 to over two-thirds (69.9%) in 2021 (Figure 5). Part of this growth
came at the expense of Standard Arabic, the proportion of which decreased from
51.9% in 2010 to 28.5% in 2021. Equally important, however, was the almost
complete disappearance of French and Arabizi on the site. While French had made
up a significant portion of the 2010 corpus (23.1%), it had dwindled to just 1.5% by
2021. Contrary to the data of prior studies, Arabizi represented just 5.2% of written
tokens even in 2010 – by 2021 it had, for all intents and purposes, disappeared.
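The size of each shift can be made explicit in percentage points. The shares are those reported in the text; the variable names are assumptions, and the 2021 Arabizi share is left out because the text gives no exact figure for it.

```python
# Shares of each variety (percent of equivalent-term tokens), as reported.
shares_2010 = {"Tunisian Arabic": 19.7, "Standard Arabic": 51.9,
               "French": 23.1, "Arabizi": 5.2}
shares_2021 = {"Tunisian Arabic": 69.9, "Standard Arabic": 28.5, "French": 1.5}

# Change in percentage points for every variety with a figure in both years.
change = {variety: round(shares_2021[variety] - shares_2010[variety], 1)
          for variety in shares_2021}

# Part of the Tunisian gain that the Standard Arabic loss does not account for.
not_from_standard = round(change["Tunisian Arabic"] + change["Standard Arabic"], 1)

for variety, delta in change.items():
    print(f"{variety}: {delta:+.1f} points")
print(f"Tunisian gain not offset by Standard Arabic: {not_from_standard} points")
```

On these figures, a little under half of the Tunisian gain corresponds to the Standard Arabic decline, with the remainder matched by the losses in French and Arabizi.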
4.2 Qualitative results
The qualitative analysis was of two recent metalinguistic threads on TunisiaSat, in
which users were debating the place of various language varieties in Tunisia. These
kinds of language debates can provide insight into the language ideologies in a
society – they are when prevailing language conflicts are brought out into the open
(Blommaert 1999). These debates are, at heart, political rather than linguistic
debates: the conflicting views about language in Tunisia represent conflicting
views of the kind of country Tunisia is – or should be.
In the first post, a user complains about receiving a government letter written
in French and titles his post, “For the love of God, speak to us in Arabic!”⁶ The
poster (writing in Tunisian Arabic) wonders sarcastically if there are French
departments that send out letters to the French citizens written in Arabic, and asks
“who exactly does our country belong to?” (l-blaːd haːði tbaːʕ ʃkuːn biðˁˁabt?). He
later edits his post to specify that he’s not interested in answers like “because we’re
still colonized by France:” he wants the “real” reason that is hidden. This suggests
that he views the place of French in Tunisian society to be a subject of contention
between different groups of Tunisians, unrelated to France itself.
Figure 5: Proportion of each language variety in the forum corpus, 2010 and 2021.
⁶ https://www.tunisia-sat.com/forums/threads/4134011/.
The replies reflect an overall negative opinion about the place of French in
Tunisian life. Although there were some who made statements like “French is the
best language in the world,” they were in the minority and received responses like
“ROFL” emojis. The majority opinion shared the original poster’s annoyance with
the use of French in official communications, offering examples of their own that
they were “fed up” with. One observed that most foreign embassies in Tunisia post
in French on Facebook and that this shows “a lack of respect for Tunisia.” Another
described a visit of Turkish president Erdogan (who only speaks Turkish) to
Tunisia: Erdogan greeted the Tunisian delegation with a couple of memorized
phrases in Standard Arabic – the Tunisians responded in French.
Participants also extended the criticism to the prevalent use of French in the
media. One writer said that his mother could no longer watch Tunisian cooking
shows and had started to watch Algerian ones, where they spoke in understandable
Arabic without all the “showing off” in French. The same writer protested the
association of French with the cultured elite, saying “How is it that I am
highly-educated, yet speak Arabic? It can’t be! How scandalous!!”
Most responses empathized with the original poster’s frustrations but did not
offer an explanation for the phenomenon. Those who did generally explained the
conflict as a generational one:
This is the French bourgeois mentality that’s deep rooted in the older generation. I’ll give you
an example … There was this old movie from 1988 on TV yesterday … The [Tunisian] main
character goes to a cafe in Paris, where he meets this Algerian guy who owns it. And they sit
there speaking French to each other, with only a word or two of Arabic … To me it was just
further confirmation of the inferiority complex that the old generation has towards France.
This comment was written in Tunisian Arabic, except for the English “inferiority
complex.” Recall that the younger generation (presumably highly represented on
this forum) has had less immersion in French than their parents’ generation, and
is considered by the older generation to be lacking in French skills (Daoud
2011b). Here they are turning that criticism around – it is their parents’ generation
that has the problem – and positioning French as unimportant and in fact obtrusive
in Tunisian life. They have also had much more opportunity to use Tunisian
Arabic in formal contexts (like writing) compared to their parents’ generation; in
this way they have the option of a different kind of national identity, represented
by vernacular, which is no longer dependent on France.
While the participants in this discussion generally used the term “Arabic”
without specifying which variety they meant, there was some discussion of the
tension between the Standard and the vernacular. More than one poster made
comments to the effect of “Well the president speaks Arabic, and no one understands
him” (referring to the highly classical style of Standard Arabic employed
by President Kais Saied). Another poster started off defending the use of French in
Tunisia, before pivoting to the status of Tunisian Arabic:
Openness to languages is a good thing. Proficiency in Arabic, English and French together not
only benefits you in your work life, but also in your level of culture and your view of things …
Whatever the language, the important thing is using it well and to use it correctly. That’s the
problem with Tunisians, many of them do not excel in even one of these languages. And the
common language dɛːrja is not a language [ma-hiːyaː-ʃ luɣa] at all but a dialect. It’s fit for
expressing emotions for example but is not at all capable of expressing thought (except of
course for superficial thoughts).
It is notable that this post is largely written in Tunisian Arabic, with some higher
level words borrowed from Standard Arabic. The writer apparently sees no
contradiction between writing in Tunisian Arabic while saying that Tunisian
Arabic is incapable of expressing complex thought.
These ideologies around Tunisian Arabic are the subject of the second post, in
which the originating poster asks: “What do you think about the Tunisian
language in Latin letters?”⁷ This title is provocative in both its use of the term
“Tunisian language” and its promotion of romanization. The idea of writing in
Latin script was not popular and few people engaged with it; most of the thread
focused instead on the status of the vernacular. All the replies can be clearly
divided into two camps: pro-Tunisian Arabic and pro-Standard Arabic. The
pro-Standard partisans had several main objections, all of which were mentioned by
Ferguson as common language ideologies in diglossic societies (1959, 1997 [1959]).
A recurrent one – in response to the poster’s framing – was that Tunisian is a
“dialect” and not a “language.” This assertion was usually considered self-evident,
requiring no proof, marking it as the normative language ideology. Two
other main arguments of the pro-Standard camp were also related to Tunisian’s
status as a language: 1) that Tunisian has “no grammar” (with the implication that
“real” languages have grammar) and 2) that there is too much regional variation in
Tunisian for it to be standardized. Other users argued that Standard Arabic is
superior because of the richness of its lexicon. One listed 25 words for ‘rain’ in
Classical Arabic and wrote (in Tunisian Arabic), “I want the language with 12
million words, not the impoverished one” (aːna n-ħɛbb luɣa al-12 miːlyoːn kɛlma, u
ma-nħɛbb-ʃ ɛl-luɣa ɛl-faqiːra). But whereas one writer touts the vastness of the
Arabic lexicon as evidence of its superiority, another uses it as evidence for its
irrelevance: “25 words for rain and 24 of them are completely unknown.”
⁷ https://www.tunisia-sat.com/forums/threads/4228359/.
Interestingly, few of the participants used another common argument cited by
Ferguson: that Arabic is inherently superior because it is the language chosen by
God for the revelation of the Quran. Rather, the practical aspects of religion were
more likely to be raised: namely, that if Standard Arabic is not maintained as an
official language, people will no longer be able to understand the Quran. This was
countered by the argument (offered by several different authors) that religion has
nothing to do with the question: “Indonesia is the largest Muslim country and they
don’t speak Arabic.” It is surprising that this opinion was popular, given the
traditionally tight linkage of Islam and Arab identities, as the term “Arabo-Islamic”
used by Sayahi (2014) above indicates.
The pro-Tunisian replies can be divided into several main arguments as well.
One was that Standard Arabic is a “dead language” (luːɣa mayta) and is not useful in
modern life. The original poster made this point repeatedly, saying that he had
wasted years of his childhood learning a language “that’s not used for anything
except the 8 o’clock news.” This writer consistently uses the term “Tunisian
language,” pointedly contrasted with the normative framing of “the Tunisian
dialect.” He further emphasizes the distance between “Tunisian” and “Arabic” with his
orthographic choices. Although he spelled luːɣa ‘language’ in the Standard way in
his originating post title (lɣa), in later posts he spells it with a long vowel,
representing the Tunisian pronunciation: luːɣa. This is not a common spelling,
and it is greeted with incredulity and derision from the pro-Standard camp.⁸
Other pro-Tunisian posts claimed that Tunisian is not Arabic anyway, since it
is a mix of Arabic, Berber, Latin, and French. This implies that Tunisian cannot,
then, be merely a “dialect” of Arabic. One writer even claimed that Tunisian is more
similar to Phoenician than to Standard Arabic (providing a cognate sentence as
proof). This argument – like the similar arguments made in the Levant (Suleiman
2003) – emphasizes Tunisia’s deep pre-Arab roots. It also reflects a common
self-view of Tunisia as a liberal society, open to languages and ideas from elsewhere.
This is in dialogue with the stereotype of the Arabian Gulf (the “most Arab of
Arabs”) as closed and insular.
In fact, the relationship of Tunisian Arabic to other vernaculars, particularly
Eastern (Mashreqi) ones, was a major theme in the discussion. Some in the
pro-vernacular camp argued that the lack of mutual intelligibility between Tunisian
and other varieties of Arabic confirms its status as a language: “The Mashreqi or
even the non-Tunisian Maghrebi doesn’t understand me for a simple reason: we
don’t speak the same language” (manɛħkiyuː-ʃ fard luːɣa). Another user agreed:
The entire Gulf has made a campaign of telling us we’re not Arab. It’s pathetic: they’re trying
to distance themselves from us and we’re running after them trying to prove we’re related.
⁸ Representation of such lengthening in other Tunisian words – like raːjil ‘man’ – is common; it
just happens that it is not common in this word.
Note the correspondence of language and identity here: saying that North Africans
don’t speak Arabic means that they are not “Arab,” since Arab is not an ethnicity
but is rather defined by mother tongue (see Bassiouney 2020: 237–240). Some
posters consequently expressed strong dislike for “the Arabs.” One, when challenged
about his contempt for his “Muslim brothers,” explained:
Egyptians, Moroccans, Algerians: do they like you? When you go to the Gulf to work, do they
respect you? The Emirates were the first to bomb Libya, and look who bombed Yemen. The
Arabs are beastly to each other. So what “brothers” are you talking about?
These arguments get at the heart of the struggle for identity, both as expressed in
this debate and as reflected in the quantitative increase in Tunisian Arabic use. It is
the struggle to define Tunisians as a distinct identity, speaking a distinct language –
not merely a “dialect.” The difference between a dialect and a vernacular
is that a dialect lacks autonomy (Fishman 2010); in other words, it is considered a
variety of another language, not an independent language itself. In this way, those
arguing that Tunisian is a different language than other varieties of Arabic are
assigning autonomy to it, trying to promote it from a mere “dialect” to a
full-fledged vernacular language.⁹ What’s more, they are not only arguing for, but
also instantiating its autonomy, both by writing it and by creating Abstand through
non-standard spellings like luːɣa.
Although these views were countered by the pro-Standard camp, interestingly
no one appeared to question the suitability of Tunisian Arabic for writing. Most
pro-Standard partisans, in fact, used Tunisian Arabic to express their arguments
against its suitability. This shows the extent to which writing in Tunisian Arabic
has been normalized, a large change from Walters’ time, when Tunisian Arabic
was written in few circumstances (Walters 2003: 94).
⁹ I should note here that there is no Arabic equivalent of “vernacular,” so the argument here is
framed as between “dialect” and “language.”
5 Conclusion
In this article, I have argued that the momentous societal changes over the past
decade have led to Tunisians increasingly expressing a national identity, distinct
from a pan-Arab identity, and that they are expressing this identity through their
writing in Tunisian Arabic. To support this, I have drawn upon my research of
online forum posts to demonstrate how Tunisian speakers’ language choice online
has changed between 2010 and 2021. I used a novel method of comparing
high-frequency, salient terms between the language varieties as a proxy for language
choice. I found a large increase in the use of Tunisian Arabic during the period
under study: in 2010, the Tunisian variants were chosen only 19.7% of the time on
the forum; they were outnumbered by both Standard Arabic and French variants.
By 2021, however, Tunisian had become the dominant language on the forum, with
its lexical items used 69.9% of the time. This increase was not uniform: some forum
categories saw large increases in Tunisian Arabic, while others saw only slight
ones. Whether large or small, however, every category witnessed an increase in
Tunisian Arabic. A significant result of the study was that the increase in Tunisian
Arabic did not come entirely at the expense of Standard Arabic. While use of
Tunisian Arabic increased 50 percentage points, that of Standard Arabic decreased
only 23.4 percentage points. The rest of the Tunisian increase came at the expense
of French and romanized Arabic (Arabizi). French was more present than Tunisian
on the forum in 2010, at 23.1% of the lexical variants, whereas by 2021 it had
dwindled to only 1.5%. Arabizi was rare even in 2010 (5.2%) but by 2021 had almost
entirely disappeared.
This study has implications for the status of diglossia in Tunisia and in the
Arab world in general. The fact that Tunisian vernacular has become the unmarked
language of choice on the forum indicates a radical departure from the Arabic
language situation described by Ferguson (1959), in which writing was almost
entirely the domain of Standard Arabic. Though the use of Tunisian Arabic in
writing saw a slow but steady increase in the last two decades of the 20th century
(McNeil 2023; Sayahi 2014; Walters 2003), the current study shows the extent to
which online writing has accelerated this change. The slow change can be
considered largely the effect of increased societal literacy, but the rapid change in
the second decade of this century can be attributed to the spread of the internet and
the increased sense of Tunisian exceptionalism following the 2011 revolution. The
increase in use of Tunisian Arabic in writing coincides with its spread into many
public domains (Achour Kallel 2011, 2015; Baoueb et al. 2012; Daoud 2011a; Mejri
2017; Sayahi 2014, 2019). The current study, in combination with these other works,
supports the assertion of Alkhamees et al. (2019) that Arabic diglossia is in the
process of destabilizing.
This development of Tunisian Arabic as a written language is key to its future
status and domains. In many cases of former (now resolved) diglossia such as
Tamil, Greek, and the Romance languages, the point at which the vernacular
variety became an acceptable choice for writing was an inflection point that
became a self-reinforcing cycle (Hudson 2002: 31). Furthermore, destabilizing
diglossia is characterized by exactly the “leakage in function” and “mixing in
form” (Fasold 1984: 54) that we see here. Hudson (1991: 13–14) explains this by
postulating that the rigid functional compartmentalization between the High and
Low varieties is not a “defining characteristic” of diglossic language situations but
rather a “necessary prerequisite.” In other words, maintaining a strict separation
between the two varieties is necessary to prevent convergence and language shift.
This would suggest that, as the barrier between the two varieties becomes more
porous, it may lead to a destabilization of the linguistic situation and, eventually,
the expansion of the vernacular into all domains.
This process is not inevitable or “natural” but rather reflects the choices of the
speech community (Coulmas 2002). The desire for a “full-fledged standard ‘national’
language as an attribute of autonomy or of sovereignty” was one of the
societal trends that Ferguson predicted would lead to the breakdown in diglossia
(1959: 338). We can see this development clearly in the metalinguistic comments
on the forum, where users explicitly associate Tunisian Arabic with their national
identity. This identity is not only in opposition to Standard Arabic and French, but
also to other Arabic vernaculars – especially Middle Eastern ones. These views are
still highly contested, however, with some forum users valuing the Arab identity
indexed by Standard Arabic above that of Tunisian national identity. It is impor-
tant to note, however, that even anti-vernacular partisans often wrote their argu-
ments in Tunisian Arabic. This indicates that Tunisian has been normalized as a
written language, even if some of those writing it still do not consider it a
language.
Research funding: This work was funded by the American Institute for Maghrib
Studies (AIMS).
Appendix A: Equivalent terms in four varieties
Word | Tunisian Arabic (IPA) | Romanized Tunisian (Arabizi)ᵃ | Standard Arabic (IPA) | French
a little | ʃwayya | chwaya, chweya | qaliːl(an) | un peu
also | zada | zeda, zada, zede | aydan | aussi
always | diːma | dima, dayma, deyma | daːʔiman | toujours
be able to | ynejjim | tnajem, tnajim, tnejem, nejem, najim, najem, ynajem | yastatˤiːʕu | peux, peut
brother | xu | ouya, ouk, khouya, khouk, khou, ou | ʕax | frère
come | yji | yji, tji, nji | taʔtiː | viens, vient
good | bahi | behi, bahi, bahia, bahya | jayyid | bien
he has | ʕandu | andou, andou, ando | ʕandahu | il a
himself | ruːħu | roo, roou, roho, rouo, rouou, rouhou | nafsahu | la même
how much | qaddɛːʃ | kaddech, kadech, qaddech, adech | kam | combien
like | kiːma | kima | miθla | comme
like that | hakka | haka, hakeka, heka | haːkaða | comme ça
man | raːjil | rajel, rajil | rajul | homme
not/is not | muːʃ | moch, mosh, mouch, mouche, moush, much, mahich | laysa | pas
now | tawwa | tawa, taw, twa | alaːn | maintenant
see | yʃuːf | tchouf, nchouf, ychouf | yara | vois, voit
should | laːzim | lazem, lazemna, lazmou, lezem, lezemha, yelzemha | yajib | devrait
talk | yɛħkiː | tahki, tehki, taki, teki, nahki, nehki, naki | yatakallam | parles, parle
that (dem pron) | haːka/haðaːka | hadhaka, hathaka, hatheka, hedhaka, hedheka | ðalika/ðaːk | cette, cet
them | huːma | houma, homa | humma | ils
there | ɣaːdiː | ghadi, adi, gadi | hunaːk | bas
this | haːðiː | hedhi, hethi, hathi, hadi, hedhe, hadhi, hédhi | haːðihi | ça
very, a lot | barʃa/yaːssir | barcha, barsha, yasar, yaser, yasser, yeser, yesr, yesser | jiddan/kaθiːran | beaucoup
we | naħna | ana, ahna | naħnu | nous
what (interrog) | aʃ/ʃnuwwa | ech, éch, ach, esh, chnawa, chnewa, chneya, chnoua | maːða | quoi
what (rel pron) | illi | eli, elli, ili, illi | allaːðiː | qui
where | wayn | win, wen | ayna |
why | ʕalɛːʃ | lech, leh, alih, alech, lach | limaːða | pourquoi
will | bɛːʃ, mɛːʃ | bech, bch, mech, mch, mche, bach, besh | sawfa | je vais
with | mʕaː- | mah, mak, maha, maya, meya | maʕ- | avec
ᵃ The Arabizi had many variable spellings; variable spellings of more than three lines are truncated here, indicated
by ….
Appendix B: Quantitative results
Occurrences per million words in the TunisiaSat corpus, ordered by descending frequency.
Columns, each given for 2010 and 2021: Tunisian Arabic freq | Standard Arabic freq | Arabizi freq | French freq | Tunisian proportion
Rows, in order: what (rel pron); will; (is) not; this; very, a lot; now; brother; that (dem pron);
what (interrog); like; should; be able to; why; with; also; he has; there; talk; good; them; we;
where; come; see; a little; always; man; like that; himself; how much
References
Aboelezz, Mariam. 2012. We are young. We are trendy. Buy our product! The use of Latinized Arabic
in edited printed press in Egypt. United Academics Journal of Social Sciences 2. 4872.
Achour Kallel, Myriam. 2011. Choix langagiers sur la radio Mosaïque FM, dispositifs dinvisibilité
et de normalisation sociales. Langage et société 4(138). 7796.
Achour Kallel, Myriam. 2015. Ici on parle tunisien. Écriture du politique et politique de lécriture
ou qui ne peut pas être passeur ? In Myriam Achour Kallel (ed.), Le social par le langage. La
parole au quotidien,95118. Paris: IRMC-Karthala.
Achour Kallel, Myriam. 2016. «La Rolls et la Volkswagen»: Écrire en tunisien sur Facebook en 2016.
Journal of Arabic and Islamic Studies 16. 253272.
Alkhamees, Abdulrahman, Rasha Elabdali & Keith Walters. 2019. Destabilizing Arabic diglossia?
New media and translingual practice. In Amel Khalfaoui & Youssef Haddad (eds.),
Perspectives on Arabic linguistics, vol. 31, 105134. Amsterdam: John Benjamins.
Al-Khatib, Mahmoud A. & Enaq H. Sabbah. 2008. Language choice in mobile text messages among
Jordanian university students. SKY Journal of Linguistics 21. 3765.
Allehaiby, Wid H. 2013. Arabizi: An analysis of the romanization of the Arabic script from a
sociolinguistic perspective. Arab World English Journal 4(3). 5262.
Bach Baoueb, Sallouha Lamia & Naouel Toumi. 2012. Code switching in the classroom: A case
study of economics and management students at the University of Sfax, Tunisia. Journal of
Language, Identity & Education 11(4). 261282.
Bassiouney, Reem. 2020. Arabic sociolinguistics: Topics in diglossia, gender, identity, and
politics, 2nd edn. Washington, DC: Georgetown University Press.
Belnap, R. Kirk & Brian Bishop. 2003. Arabic personal correspondence: A window on change in
progress? International Journal of the Sociology of Language 163. 925.
Blommaert, Jan. 1999. The debate is open. In Jan Bommaert (ed.), Language ideological debates,
138. Germany: Mouton Publishers.
Boussofara-Omar, Naima. 2006. Neither third language nor middle varieties but diglossic
switching. Zeitschrift für Arabische Linguistik 45. 5580.
Bucholtz, Mary & Kira Hall. 2004. Language and identity. In Alessandro Duranti (ed.), A companion
to linguistic anthropology, 369394. Malden, MA: Blackwell.
Caubet, Dominique. 2004. Lintrusion des téléphones portables et des SMSdans larabe
marocain en 20022003. In Dominique Caubet, Thierry Bulot, Isabelle Léglise,
Catherine Miller & Jacqueline Billiez (eds.), Parlers jeunes ici et là-bas, 247270. Paris:
LHarmattan.
Caubet, Dominique. 2012. Apparition massive de la darija à l’écrit à partir de 2008–2009: sur le
papier ou sur la toile: quelle graphie? Quelles régularités? In Mohamed Meouak,
Pablo Sánchez & Ángeles Vicente (eds.), De los manuscritos medievales a internet: la
presencia del árabe vernáculo en las fuentes escritas, 377–402. Zaragoza: Universidad de
Zaragoza.
Caubet, Dominique. 2017a. New elaborate written forms in Darija: Blogging, posting and
slamming in Morocco. In The Routledge handbook of Arabic linguistics, 387–406. London:
Routledge.
Caubet, Dominique. 2017b. Morocco: An informal passage to literacy in dārija (Moroccan Arabic).
In Jacob Høigilt & Gunvor Mejdell (eds.), The politics of written language in the Arab world:
Writing change, 116–141. Leiden: Brill.
Coulmas, Florian. 2002. Writing is crucial. International Journal of the Sociology of Language 157.
59–62.
Coulmas, Florian. 2013. Writing and society: An introduction. Cambridge, UK: Cambridge
University Press.
Daoudi, Anissa. 2011. Globalization, computer-mediated communications and the rise of e-Arabic.
Middle East Journal of Culture and Communication 4(2). 146–163.
Daoud, Mohamed. 2011a. The sociolinguistic situation in Tunisia: Language rivalry or
accommodation? International Journal of the Sociology of Language 211. 9–33.
Daoud, Mohamed. 2011b. The survival of French in Tunisian identity. In Joshua Fishman &
Ofelia García (eds.), Handbook of language and ethnic identity, vol. 2, 54–67. Oxford: Oxford
University Press.
Davies, Humphrey. 2006. Dialect literature. In Kees Versteegh & Mushira Eid (eds.), Encyclopedia
of Arabic language and linguistics, vol. 2, 597–604. Leiden: Brill.
Doss, Madiha & Humphrey Davies. 2013. Al-ʿāmmīyah al-miṣrīyah al-maktūba [Written Egyptian
Colloquial Arabic]. Cairo: The General Egyptian Book Organization.
Elinson, Alexander E. 2013. Dārija and changing writing practices in Morocco. International
Journal of Middle East Studies 45(4). 715–730.
Fasold, Ralph W. 1984. The sociolinguistics of society. Oxford: Blackwell.
Ferguson, Charles. 1959. Diglossia. Word 15(2). 325–340.
Ferguson, Charles. 1997 [1959]. Myths about Arabic. In R. Kirk Belnap & Niloofar Haeri (eds.),
Structuralist studies in Arabic linguistics: Charles A. Ferguson’s papers: 1954–1994,
250–256. Leiden: Brill.
Fishman, Joshua. 2010. European vernacular literacy: A sociolinguistic and historical introduction.
Bristol, UK: Channel View Publications.
Gibson, Maik. 2009. Tunis Arabic. In Kees Versteegh & Mushira Eid (eds.), Encyclopedia of Arabic
language and linguistics, vol. 4, 563–571. Leiden: Brill.
Gibson, Maik. 2013. Dialect levelling in Tunisian Arabic: Towards a new spoken standard. In
Aleya Rouchdy (ed.), Language contact and language conflict in Arabic, 42–58. London:
Routledge.
Habash, Nizar. 2010. Introduction to Arabic natural language processing. San Rafael, CA: Morgan
& Claypool.
Hachimi, Atiqa. 2013. The Maghreb-Mashreq language ideology and the politics of identity in a
globalized Arab world. Journal of Sociolinguistics 17(3). 269–296.
Hellyer, H. A. 2015. Tunisia remains a beacon of hope in the Arab world. The National. Available at:
https://www.thenationalnews.com/opinion/tunisia-remains-a-beacon-of-hope-in-the-
arab-world-1.45419.
Høigilt, Jacob & Gunvor Mejdell (eds.). 2017. The politics of written language in the Arab world:
Writing change. Leiden: Brill.
Hudson, Alan. 1991. Toward the systematic study of diglossia. Southwest Journal of Linguistics
10(1). 1–22.
Hudson, Alan. 2002. Outline of a theory of diglossia. International Journal of the Sociology of
Language 157. 1–48.
Kashina, Anna. 2020. Case study of language preferences in social media of Tunisia. In Advances
in social science, education and humanities research, vol. 489. Proceedings of the
international conference digital age: Traditions, modernity and innovations (ICDATMI 2020),
121–125. Paris.
Kebede, Tewodros Aragie & Kristian Takvam Kindt. 2016. Language and social survey in Morocco:
A tabulation report. Oslo: Fafo.
Kebede, Tewodros Aragie, Kristian Takvam Kindt & Jacob Høigilt. 2013. Language change in Egypt:
Social and cultural indicators survey: A tabulation report. Oslo: Fafo.
Khalil, Saussan. 2018. Fuṣḥá, ʿāmmīyah, or both?: Towards a theoretical framework for written
Cairene Arabic. Leeds, UK: University of Leeds dissertation.
Kindt, Kristian Takvam, Jacob Høigilt & Tewodros Aragie Kebede. 2016. Writing change: Diglossia
and popular writing practices in Egypt. Arabica 63(3–4). 324–376.
Kloss, Heinz. 1967. ‘Abstand languages’ and ‘Ausbau languages’. Anthropological Linguistics
9(7). 29–41.
Masri, Safwan. 2017. Tunisia: An Arab anomaly. New York: Columbia University Press.
McNeil, Karen. 2019. Tunisian Arabic corpus: Creating a written corpus of an unwritten
language. In Andrew Hardie (ed.), Arabic corpus linguistics, 30–55. Edinburgh: Edinburgh
University Press.
McNeil, Karen. 2023. When the leak becomes a flood: The development of vernacular literature in
Tunisia. In Mahmoud Azaz (ed.), Perspectives on Arabic linguistics, 34. Amsterdam: John
Benjamins, In press.
Mejdell, Gunvor. 2006. The use of colloquial in modern Egyptian literature – A survey. In
Lutz Edzard & Jan Retsö (eds.), Current issues in the analysis of Semitic grammar and lexicon,
vol. 2, 195–213. Wiesbaden: Harrassowitz Verlag.
Mejri, Salah. 2017. La nouvelle Constitution tunisienne en dialectal. In Veronika Ritt-Benmimoun
(ed.), Tunisian and Libyan Arabic dialects: Common trends – recent developments –
diachronic aspects, 191–204. Zaragoza, Spain: Prensas de la Universidad de Zaragoza.
Miller, Catherine. 2017. Contemporary dārija writings in Morocco: Ideology and practices. In
Jacob Høigilt & Gunvor Mejdell (eds.), The politics of written language in the Arab world:
Writing change, 90–115. Leiden: Brill.
Myers-Scotton, Carol. 1995. Social motivations for codeswitching: Evidence from Africa. Oxford:
Clarendon Press.
Nordenson, Jon. 2017. The language of online activism: A case from Kuwait. In Jacob Høigilt &
Gunvor Mejdell (eds.), The politics of written language in the Arab world: Writing change,
266–289. Leiden: Brill.
Palfreyman, David & Muhamed Al Khalil. 2007. ‘A funky language for teenzz to use:’ Representing
Gulf Arabic in instant messaging. In Brenda Danet & Susan Herring (eds.), The multilingual
internet: Language, culture, and communication online, 43–63. Oxford: Oxford University
Press.
Ritt-Benmimoun, Veronika. 2014. Grammatik des arabischen Beduinendialekts der Region Douz
(Südtunesien). Wiesbaden: Harrassowitz.
Rosenbaum, Gabriel. 2011. The rise and expansion of colloquial Egyptian Arabic as a literary
language. In Rakefet Sela-Sheffy & Gideon Toury (eds.), Culture contacts and the making of
cultures, 323–344. Tel Aviv: Tel Aviv University.
Sayahi, Lotfi. 2011a. Code-switching and language change in Tunisia. International Journal of the
Sociology of Language 211. 113–133.
Sayahi, Lotfi. 2011b. Current perspectives on Tunisian sociolinguistics. International Journal of the
Sociology of Language 211. 1–8.
Sayahi, Lotfi. 2014. Diglossia and language contact: Language variation and change in North
Africa. Cambridge, UK: Cambridge University Press.
Sayahi, Lotfi. 2019. Diglossia and the normalization of the vernacular: Focus on Tunisia. In
Enam Al-Wer & Uri Horesh (eds.), The Routledge handbook of Arabic sociolinguistics,
227–239. London: Routledge.
Shiri, Sonia. 2003. ‘Speak Arabic please!’ Tunisian Arabic speakers’ linguistic accommodation to
Middle Easterners. In Aleya Rouchdy (ed.), Language contact and language conflict in Arabic,
149–173. London: Routledge.
Snow, Don. 2013. Revisiting Ferguson’s defining cases of diglossia. Journal of Multilingual and
Multicultural Development 34(1). 61–76.
Suleiman, Yasir. 2003. The Arabic language and national identity. Edinburgh: Edinburgh
University Press.
Suleiman, Yasir. 2004. A war of words: Language and conflict in the Middle East. Cambridge, UK:
Cambridge University Press.
Suleiman, Yasir & Ashraf Abdelhay. 2020. Diglossia, folk-linguistics, and language anxiety. In
Reem Bassiouney & Keith Walters (eds.), The Routledge handbook of Arabic and identity,
147–160. London: Routledge.
Walters, Keith. 2003. Fergie’s prescience: The changing nature of diglossia in Tunisia.
International Journal of the Sociology of Language 163. 77–109.
Warschauer, Mark, Ghada R. El Said & Ayman G. Zohry. 2002. Language choice online:
Globalization and identity in Egypt. Journal of Computer-Mediated Communication 7(4).
1–18.
Yaghan, Mohammad Ali. 2008. Arabizi: A contemporary style of Arabic slang. Design Issues
24(2). 39–52.
Younes, Jihene & Emna Souissi. 2014. A quantitative view of Tunisian dialect electronic writing. In
5th international conference on Arabic language processing, 63–72. Oujda, Morocco:
University of Mohammed Premier Oujda.