Emna Souissi

Ecole Nationale Supérieure d'Ingénieurs de Tunis · Informatique

About

17 Publications
4,237 Reads
214 Citations


Publications (17)
Article
Full-text available
Language resources such as corpora, lexicons and dictionaries are key to automatically processing any natural language. In this paper, we focus on the written Tunisian dialect (TD), which is abundantly present on social media and yet still considered a low-resource language. We automatically construct sizable bi-script TD language resources...
Article
Full-text available
Diglossia is one of the main characteristics of the Arabic language. In Arab countries, three forms of Arabic co-exist: Classical Arabic (CA), which is mainly used in the Quran and in several classical literary texts; Modern Standard Arabic (MSA), which descends from CA and is used as the official language; and various regional colloquial varietie...
Article
Full-text available
Language identification is an important task in natural language processing that consists of determining the language of a given text. It has increasingly piqued the interest of researchers over the past few years, especially for informal textual content with code-switching. In this paper, we focus on the identification of the Romanized user-generated Tu...
Article
Full-text available
In recent years, social web users in Arab countries have been resorting to dialects as a written language in their social exchanges. Arabic dialects derive from Modern Standard Arabic (MSA) and differ significantly from one country to another and from one region to another. The use of these dialects has led to an increase in interest in the specif...
Chapter
With the increasing use of social networks and the multilingualism that characterizes the Internet in general and social media in particular, a growing number of recent research works on Sentiment Analysis and Opinion Mining are tackling the analysis of informal textual content, which includes language alternation, known as code-switching....
Article
Full-text available
In the field of Arabic natural language processing, most of the research conducted and the results achieved have focused mainly on Modern Standard Arabic (MSA). The various Arabic dialects (AD) still count among the low-resource languages. Only in the last ten years or so have these dialects begun to attract...
Conference Paper
The study and automatic processing of language require the availability of large raw and annotated corpora. Collecting data and constructing such language resources are non-trivial tasks in the NLP field, especially when dealing with low-resource languages. In this paper, we are concerned with the Tunisian dialect (TD) and propose to survey t...
Article
Full-text available
Transliteration consists of automatically transforming a grapheme’s transcription from one writing system to another while preserving its pronunciation. It is usually used in the context of machine translation and cross-language information retrieval, mainly to deal with the issue of named entities and technical terms. In the case of some Arabic d...
Conference Paper
In the Arabic-speaking world, textual productions on social networks are often informal and generally characterized by the use of various dialects, which can be transcribed in Latin or Arabic characters. More specifically, electronic writing in Tunisia is characterized in large part by a mixture of Tunisian dialect with other languages and by a mar...
Conference Paper
The Tunisian dialect is the naturally spoken language in Tunisia and, unlike Modern Standard Arabic (MSA), it is an informal and non-written variant of Arabic. However, with the growing use of ICT, and especially the internet, a written form of the Tunisian dialect is emerging: the electronic Tunisian dialect (ETD). Originally used in the SMS w...
Conference Paper
In Arab countries, dialect is gaining ground daily in social interaction on the web and swiftly adapting to globalization. Strengthening its speakers' relationship with the outside world and facilitating their social exchanges, the dialect takes on new transcriptions every day that arouse the curiosity of researchers in the NL...
Conference Paper
With technological development and the emergence of advanced forms of communication such as chat and SMS, a new form of writing that deviates strongly from the standard language has appeared. This paper focuses specifically on electronic writing with Latin letters in the Tunisian dialect. We describe the methodology used for the construction of a dialecta...
Article
Full-text available
We address the two related problems of automatic part-of-speech tagging and automatic vowelization of Arabic. We describe the difficulties that each of them poses and, by means of numerous counts, give a quantitative characterization of them. The question raised is to what extent does the resolution of grammatical ambiguities allow...
Conference Paper
Full-text available
We address the problem of part-of-speech tagging for Arabic by revisiting the commonly used methods, which are based on rules governing the succession of two or three grammatical tags. We show that the algorithms recommended for French or English cannot be reused as they stand, the reason being that the...
Article
This paper describes the work achieved in the first half of a 4-year cooperative research project (ARCADE), financed by AUPELF-UREF. The project is devoted to the evaluation of parallel text alignment techniques. In its first period ARCADE ran a competition between six systems on a sentence-to-sentence alignment task which yielded two main types of re...
