Injy Khairy Hamed
University of Stuttgart · Institute for Natural Language Processing
32 Publications · 7,247 Reads · 214 Citations
Introduction
Doctoral student at the Institute for Natural Language Processing, University of Stuttgart, working on speech recognition and machine translation for code-mixed Arabic-English. I'm particularly interested in Arabic NLP, as it is a great opportunity to apply my knowledge of the Arabic language and of natural language processing to fill existing research gaps in the field.
Publications (32)
We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that allow researchers to collaboratively compete on pre-specified tasks. NADI 2024 targeted both dia...
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minima...
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic-English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual Englis...
Data sparsity is one of the main challenges posed by Code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of Machine Translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings....
Code-switching (CS) is a common linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation. CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behavior across speakers. Given th...
Code-switching (CS) is a common linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation. CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. Given...
Code-switching (CS) poses several challenges to NLP tasks, where data sparsity is a main problem hindering the development of CS NLP systems. In this paper, we investigate data augmentation techniques for synthesizing Dialectal Arabic-English CS text. We perform lexical replacements using parallel corpora and alignments where CS points are either r...
Multilingual speakers tend to alternate between languages within a conversation, a phenomenon referred to as "code-switching" (CS). CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. This dynamic behaviour has been studied by soc...
Code-switching (CS), defined as the mixing of languages in conversations, has become a worldwide phenomenon. The prevalence of CS has been recently met with a growing demand and interest to build CS automatic speech recognition (ASR) systems. In this paper, we present our work on code-switched Egyptian Arabic-English ASR. We first contribute in fil...
Code-switching (CS), defined as the mixing of languages in conversations, has become a worldwide phenomenon. The prevalence of CS has been recently met with a growing demand and interest to build CS ASR systems. In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR). We first contribute in fil...
With the rise of globalization, the use of mixed languages in daily conversations, referred to as “code-switching” (CS), has become a common linguistic phenomenon among bilingual/multilingual communities. It has become common for people to alternate between distinct languages or “codes” in daily conversations. This has placed a high demand on Natura...
Code-switching has become a prevalent phenomenon across many communities. It poses a challenge to NLP researchers, mainly due to the lack of available data needed for training and testing applications. In this paper, we introduce a new resource: a corpus of Egyptian Arabic code-switch speech data that is fully tokenized, lemmatized and annotated fo...
In this paper, we present our ArzEn corpus, an Egyptian Arabic-English code-switching (CS) spontaneous speech corpus. The corpus is collected through informal interviews with 38 Egyptian bilingual university students and employees held in a soundproof room. A total of 12 hours are recorded, transcribed, validated and sentence segmented. The corpus...
Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for CS language modeling (LM) in the low-resource Egyptian Arabic-English language pair. We evaluate differ...
It has become common, especially among urban youth, for people to use more than one language in their everyday conversations, a phenomenon referred to by linguists as “code-switching”. With the rise in globalization and the widespread use of code-switching among multilingual societies, a great demand has been placed on Natural Language Processing (NLP...
Speech corpora are key components needed by both linguists (for language analysis, research, and language teaching) and Natural Language Processing (NLP) researchers (for training and evaluating several NLP tasks such as speech recognition, text-to-speech and speech-to-text synthesis). Despite the great demand, there is still a huge shortage in a...
Speech therapists and researchers are becoming more concerned with the use of computer-based systems in the therapy of speech disorders. In this paper, we propose a computer-based game with a purpose (GWAP) for speech therapy of Egyptian-speaking children suffering from Dyslalia. Our aim is to detect whether a certain phoneme is pronounced correctly. An...
The use of mixed languages in daily conversations, referred to as “code-switching”, has become a common linguistic phenomenon among bilingual/multilingual communities. Code-switching involves the alternating use of distinct languages or “codes” at sentence boundaries or within the same sentence. With the rise of globalization, code-switching has be...
Edutainment is a neologism that combines education and entertainment. It has been found to promote learning in a fun and interesting environment. Edutainment applications covered several fields including language learning. However, applications that focused on reading in language learning suffered from some limitations. These limitations include lo...
Adapting to the emotions of a user and reacting to his/her mood in computer applications has recently become a concern in research. This can help accomplish a required task in a very user-friendly way. Emotionally intelligent systems depend on their detection of emotions from the outward behavior of the user such as facial expressions, head g...
Building Automatic Speech Recognition (ASR) systems for spoken languages usually suffers from the problem of limited available transcriptions. ASR systems require large speech corpora that contain speech and its corresponding transcriptions for training acoustic models. In this paper, we target the Egyptian dialectal...
Collaborative tagging systems allow their users to mark resources with labels, thus providing metadata about shared content. This paper presents a new reasoning engine for collaborative tagging systems. The engine was built using Constraint Handling Rules (CHR). Through the new engine, different application-specific properties are captured. In addit...
With the rise of Web 2.0, the amount of information available to the users has grown tremendously. Recommendation systems have emerged as successful tools that look into the users' perspectives and accordingly provide users with information presumed to be of interest to them. Early generations of recommendation systems have achieved great success....